1. 16 Nov, 2019 6 commits
    • ipmr: Fix skb headroom in ipmr_get_route(). · 7901cd97
      Guillaume Nault authored
      In route.c, inet_rtm_getroute_build_skb() creates an skb with no
      headroom. This skb is then used by inet_rtm_getroute() which may pass
      it to rt_fill_info() and, from there, to ipmr_get_route(). The latter
      might try to reuse this skb by cloning it and prepending an IPv4
      header. But since the original skb has no headroom, skb_push() triggers
      skb_under_panic():
      
      skbuff: skb_under_panic: text:00000000ca46ad8a len:80 put:20 head:00000000cd28494e data:000000009366fd6b tail:0x3c end:0xec0 dev:veth0
      ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:108!
      invalid opcode: 0000 [#1] SMP KASAN PTI
      CPU: 6 PID: 587 Comm: ip Not tainted 5.4.0-rc6+ #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      RIP: 0010:skb_panic+0xbf/0xd0
      Code: 41 a2 ff 8b 4b 70 4c 8b 4d d0 48 c7 c7 20 76 f5 8b 44 8b 45 bc 48 8b 55 c0 48 8b 75 c8 41 54 41 57 41 56 41 55 e8 75 dc 7a ff <0f> 0b 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
      RSP: 0018:ffff888059ddf0b0 EFLAGS: 00010286
      RAX: 0000000000000086 RBX: ffff888060a315c0 RCX: ffffffff8abe4822
      RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88806c9a79cc
      RBP: ffff888059ddf118 R08: ffffed100d9361b1 R09: ffffed100d9361b0
      R10: ffff88805c68aee3 R11: ffffed100d9361b1 R12: ffff88805d218000
      R13: ffff88805c689fec R14: 000000000000003c R15: 0000000000000ec0
      FS:  00007f6af184b700(0000) GS:ffff88806c980000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffc8204a000 CR3: 0000000057b40006 CR4: 0000000000360ee0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       skb_push+0x7e/0x80
       ipmr_get_route+0x459/0x6fa
       rt_fill_info+0x692/0x9f0
       inet_rtm_getroute+0xd26/0xf20
       rtnetlink_rcv_msg+0x45d/0x630
       netlink_rcv_skb+0x1a5/0x220
       rtnetlink_rcv+0x15/0x20
       netlink_unicast+0x305/0x3a0
       netlink_sendmsg+0x575/0x730
       sock_sendmsg+0xb5/0xc0
       ___sys_sendmsg+0x497/0x4f0
       __sys_sendmsg+0xcb/0x150
       __x64_sys_sendmsg+0x48/0x50
       do_syscall_64+0xd2/0xac0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Actually the original skb used to have enough headroom, but the
      skb_reserve() call was lost with the introduction of
      inet_rtm_getroute_build_skb() by commit 404eb77e ("ipv4: support
      sport, dport and ip_proto in RTM_GETROUTE").
      
      We could reserve some headroom again in inet_rtm_getroute_build_skb(),
      but this function shouldn't be responsible for handling the special
      case of ipmr_get_route(). Let's handle that directly in
      ipmr_get_route() by calling skb_realloc_headroom() instead of
      skb_clone().
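The headroom problem can be illustrated with a small user-space model. This is only a sketch: `toy_skb` and the `toy_*` helpers are hypothetical stand-ins, not the kernel's sk_buff API. It shows that a push fails when there is no headroom, and that copying into a buffer with reserved headroom makes the same push succeed. The sizes 80 and 20 match `len:80 put:20` in the trace above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for struct sk_buff: one allocation, payload starts at 'data'. */
struct toy_skb {
    unsigned char *head;  /* start of the allocation */
    unsigned char *data;  /* start of the payload */
    size_t len;           /* payload length in bytes */
};

/* Bytes available in front of the payload. */
static size_t toy_headroom(const struct toy_skb *skb)
{
    return (size_t)(skb->data - skb->head);
}

/* Allocate a buffer with 'headroom' spare bytes before 'len' payload bytes. */
static struct toy_skb *toy_alloc(size_t headroom, size_t len)
{
    size_t total = headroom + len;
    struct toy_skb *skb = malloc(sizeof(*skb));

    if (!skb)
        return NULL;
    skb->head = malloc(total ? total : 1);
    if (!skb->head) {
        free(skb);
        return NULL;
    }
    memset(skb->head, 0, total);
    skb->data = skb->head + headroom;
    skb->len = len;
    return skb;
}

/* Prepend n bytes. Returns NULL (instead of panicking) if headroom is short. */
static unsigned char *toy_push(struct toy_skb *skb, size_t n)
{
    if (toy_headroom(skb) < n)
        return NULL;
    skb->data -= n;
    skb->len += n;
    return skb->data;
}

/* Copy the payload into a new buffer that has 'headroom' bytes in front. */
static struct toy_skb *toy_realloc_headroom(const struct toy_skb *skb,
                                            size_t headroom)
{
    struct toy_skb *copy = toy_alloc(headroom, skb->len);

    if (copy)
        memcpy(copy->data, skb->data, skb->len);
    return copy;
}

static void toy_free(struct toy_skb *skb)
{
    if (skb) {
        free(skb->head);
        free(skb);
    }
}
```

In this model, pushing 20 bytes onto an 80-byte buffer with no headroom fails, while the same push on a copy made with 20 bytes of headroom succeeds.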
      
      Fixes: 404eb77e ("ipv4: support sport, dport and ip_proto in RTM_GETROUTE")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: fix fastopen for non-blocking connect() · 8204df72
      Ursula Braun authored
      FASTOPEN does not work with SMC-sockets. Since SMC allows fallback to
      TCP native during connection start, the FASTOPEN setsockopts trigger
      this fallback, if the SMC-socket is still in state SMC_INIT.
      But if a FASTOPEN setsockopt is called after a non-blocking connect(),
      this is broken, and fallback does not make sense.
      This change complements
      commit cd206360 ("net/smc: avoid fallback in case of non-blocking connect")
      and fixes the syzbot reported problem "WARNING in smc_unhash_sk".
      
      Reported-by: syzbot+8488cc4cf1c9e09b8b86@syzkaller.appspotmail.com
      Fixes: e1bbdd57 ("net/smc: reduce sock_put() for fallback sockets")
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rds: ib: update WR sizes when bringing up connection · a36e629e
      Dag Moxnes authored
      Currently, WR sizes are updated from rds_ib_sysctl_max_send_wr and
      rds_ib_sysctl_max_recv_wr when a connection is shut down. As a result,
      a connection that is down while rds_ib_sysctl_max_send_wr or
      rds_ib_sysctl_max_recv_wr is updated will not pick up the new sizes
      when it comes back up.
      
      Move the resizing of WRs to rds_ib_setup_qp so that connections are
      set up with the most current WR sizes.
      Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvid · c80ed84e
      Vladimir Oltean authored
      This sequence of operations:
      ip link set dev br0 type bridge vlan_filtering 1
      bridge vlan del dev swp2 vid 1
      ip link set dev br0 type bridge vlan_filtering 1
      ip link set dev br0 type bridge vlan_filtering 0
      
      apparently fails with the message:
      
      [   31.305716] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering
      [   31.322161] sja1105 spi0.1: Couldn't determine PVID attributes (pvid 0)
      [   31.328939] sja1105 spi0.1: Failed to setup VLAN tagging for port 1: -2
      [   31.335599] ------------[ cut here ]------------
      [   31.340215] WARNING: CPU: 1 PID: 194 at net/switchdev/switchdev.c:157 switchdev_port_attr_set_now+0x9c/0xa4
      [   31.349981] br0: Commit of attribute (id=6) failed.
      [   31.354890] Modules linked in:
      [   31.357942] CPU: 1 PID: 194 Comm: ip Not tainted 5.4.0-rc6-01792-gf4f632e07665-dirty #2062
      [   31.366167] Hardware name: Freescale LS1021A
      [   31.370437] [<c03144dc>] (unwind_backtrace) from [<c030e184>] (show_stack+0x10/0x14)
      [   31.378153] [<c030e184>] (show_stack) from [<c11d1c1c>] (dump_stack+0xe0/0x10c)
      [   31.385437] [<c11d1c1c>] (dump_stack) from [<c034c730>] (__warn+0xf4/0x10c)
      [   31.392373] [<c034c730>] (__warn) from [<c034c7bc>] (warn_slowpath_fmt+0x74/0xb8)
      [   31.399827] [<c034c7bc>] (warn_slowpath_fmt) from [<c11ca204>] (switchdev_port_attr_set_now+0x9c/0xa4)
      [   31.409097] [<c11ca204>] (switchdev_port_attr_set_now) from [<c117036c>] (__br_vlan_filter_toggle+0x6c/0x118)
      [   31.418971] [<c117036c>] (__br_vlan_filter_toggle) from [<c115d010>] (br_changelink+0xf8/0x518)
      [   31.427637] [<c115d010>] (br_changelink) from [<c0f8e9ec>] (__rtnl_newlink+0x3f4/0x76c)
      [   31.435613] [<c0f8e9ec>] (__rtnl_newlink) from [<c0f8eda8>] (rtnl_newlink+0x44/0x60)
      [   31.443329] [<c0f8eda8>] (rtnl_newlink) from [<c0f89f20>] (rtnetlink_rcv_msg+0x2cc/0x51c)
      [   31.451477] [<c0f89f20>] (rtnetlink_rcv_msg) from [<c1008df8>] (netlink_rcv_skb+0xb8/0x110)
      [   31.459796] [<c1008df8>] (netlink_rcv_skb) from [<c1008648>] (netlink_unicast+0x17c/0x1f8)
      [   31.468026] [<c1008648>] (netlink_unicast) from [<c1008980>] (netlink_sendmsg+0x2bc/0x3b4)
      [   31.476261] [<c1008980>] (netlink_sendmsg) from [<c0f43858>] (___sys_sendmsg+0x230/0x250)
      [   31.484408] [<c0f43858>] (___sys_sendmsg) from [<c0f44c84>] (__sys_sendmsg+0x50/0x8c)
      [   31.492209] [<c0f44c84>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x28)
      [   31.500090] Exception stack(0xedf47fa8 to 0xedf47ff0)
      [   31.505122] 7fa0:                   00000002 b6f2e060 00000003 beabd6a4 00000000 00000000
      [   31.513265] 7fc0: 00000002 b6f2e060 5d6e3213 00000128 00000000 00000001 00000006 000619c4
      [   31.521405] 7fe0: 00086078 beabd658 0005edbc b6e7ce68
      
      The reason is the implementation of br_get_pvid:
      
      static inline u16 br_get_pvid(const struct net_bridge_vlan_group *vg)
      {
      	if (!vg)
      		return 0;
      
      	smp_rmb();
      	return vg->pvid;
      }
      
      Since VID 0 is an invalid pvid from the bridge's point of view, let's
      add this check in dsa_8021q_restore_pvid to avoid restoring a pvid that
      doesn't really exist.
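A minimal user-space sketch of the check described above. The `toy_*` names are hypothetical and this is not the actual dsa_8021q_restore_pvid() code. It treats a pvid of 0 as "no pvid configured" and skips the restore instead of looking up a VLAN that cannot exist. The lookup is assumed to fail with -ENOENT, which matches the "-2" in the log above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the bridge VLAN group; only the pvid matters here. */
struct toy_vlan_group {
    uint16_t pvid;
};

/* Mirrors the quoted br_get_pvid() logic: 0 means "no pvid". */
static uint16_t toy_get_pvid(const struct toy_vlan_group *vg)
{
    return vg ? vg->pvid : 0;
}

/* Hypothetical lookup: only VIDs 1..4094 can be found. */
static int toy_lookup_vlan(uint16_t vid)
{
    return (vid >= 1 && vid <= 4094) ? 0 : -ENOENT;
}

/* Restore the pvid, skipping the lookup when there is none to restore. */
static int toy_restore_pvid(const struct toy_vlan_group *vg)
{
    uint16_t pvid = toy_get_pvid(vg);

    if (!pvid)
        return 0;  /* nothing to restore, not an error */
    return toy_lookup_vlan(pvid);
}
```

Without the early return, a port whose pvid was deleted would reach the lookup with VID 0 and report -ENOENT.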
      
      Fixes: 5f33183b ("net: dsa: tag_8021q: Restore bridge VLANs when enabling vlan_filtering")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • seg6: fix skb transport_header after decap_and_validate() · c71644d0
      Andrea Mayer authored
      In the receive path (more precisely in ip6_rcv_core()),
      skb->transport_header is set to skb->network_header + sizeof(*hdr). As a
      consequence, after routing operations, destination input expects to find
      skb->transport_header correctly set to the next protocol (or extension
      header) that follows the network protocol. However, decap behaviors (DX*,
      DT*) remove the outer IPv6 header and SRH extension and do not set
      skb->transport_header again afterwards. For this reason, this patch sets
      skb->transport_header to skb->network_header + sizeof(*hdr) in each
      DX* and DT* behavior.
      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • seg6: fix srh pointer in get_srh() · 7f91ed8c
      Andrea Mayer authored
      pskb_may_pull() may change pointers into the skb header. For this reason,
      it is mandatory to reload any pointer that points into the skb header
      after calling it.
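The same rule can be shown in a user-space analogy. This is a sketch only: `toy_buf`, `toy_may_pull()` and `toy_get_hdr()` are hypothetical and do not model the real skb layout. Here the "pull" always moves the data to a new allocation, so any pointer computed before the call would be stale. The helper therefore derives the header pointer from the buffer only after the pull.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy linear buffer whose storage may move when it is grown. */
struct toy_buf {
    unsigned char *data;
    size_t len;
};

/*
 * Hypothetical stand-in for pskb_may_pull(): make sure 'need' bytes are
 * available. When it has to grow the buffer, the data moves to a new
 * allocation. Returns 1 on success, 0 on allocation failure.
 */
static int toy_may_pull(struct toy_buf *buf, size_t need)
{
    unsigned char *moved;

    if (need <= buf->len)
        return 1;
    moved = malloc(need);
    if (!moved)
        return 0;
    memcpy(moved, buf->data, buf->len);
    memset(moved + buf->len, 0, need - buf->len);
    free(buf->data);            /* old pointers into the buffer are now stale */
    buf->data = moved;
    buf->len = need;
    return 1;
}

/* Return a pointer to the header at 'off', computed after the pull. */
static unsigned char *toy_get_hdr(struct toy_buf *buf, size_t off,
                                  size_t hdr_len)
{
    if (!toy_may_pull(buf, off + hdr_len))
        return NULL;
    return buf->data + off;     /* reloaded from buf->data, never cached */
}
```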
      Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 15 Nov, 2019 1 commit
  3. 13 Nov, 2019 9 commits
  4. 12 Nov, 2019 3 commits
    • net/smc: fix refcount non-blocking connect() -part 2 · 6d6dd528
      Ursula Braun authored
      If an SMC socket is immediately terminated after a non-blocking connect()
      has been called, a memory leak is possible.
      Due to the sock_hold move in
      commit 301428ea ("net/smc: fix refcounting for non-blocking connect()")
      an extra sock_put() is needed in smc_connect_work(), if the internal
      TCP socket is aborted and cancels the sk_stream_wait_connect() of the
      connect worker.
      
      Reported-by: syzbot+4b73ad6fc767e576e275@syzkaller.appspotmail.com
      Fixes: 301428ea ("net/smc: fix refcounting for non-blocking connect()")
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xfrm: release device reference for invalid state · 4944a4b1
      Xiaodong Xu authored
      An ESP packet could be decrypted in async mode if the input handler for
      this packet returns -EINPROGRESS in xfrm_input(). At this moment the device
      reference in skb is held. Later xfrm_input() will be invoked again to
      resume the processing.
      If the transform state is still valid it would continue to release the
      device reference and there won't be a problem; however if the transform
      state is not valid when async resumption happens, the packet will be
      dropped while the device reference is still being held.
      When the device is later deleted and the reference to it has not been
      properly released, the kernel will keep logging messages like:
      
      unregister_netdevice: waiting for ppp2 to become free. Usage count = 1
      
      The issue is observed when running IPsec traffic over a PPPoE device based
      on a bridge interface. After the PPPoE connection is terminated on the
      server end multiple times, the PPPoE device on the client side eventually
      gets stuck on the above warning message.
      
      This patch will check the async mode first and continue to release device
      reference in async resumption, before it is dropped due to invalid state.
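The reference imbalance can be sketched with a toy counter. The `toy_*` names are hypothetical and this is not the xfrm_input() code. The point is only that the resume path must drop the reference taken before going asynchronous on both the normal path and the invalid-state error path.

```c
#include <stdbool.h>

/* Toy device with a bare reference counter. */
struct toy_dev {
    int refcnt;
};

static void toy_dev_hold(struct toy_dev *dev) { dev->refcnt++; }
static void toy_dev_put(struct toy_dev *dev)  { dev->refcnt--; }

/* First pass: decryption goes asynchronous, so a reference is held. */
static void toy_input_start_async(struct toy_dev *dev)
{
    toy_dev_hold(dev);
}

/*
 * Async resumption. Returns 0 if the packet is processed, -1 if dropped.
 * The reference is released on both paths; releasing it only on the
 * valid-state path is the leak described above.
 */
static int toy_input_resume(struct toy_dev *dev, bool state_valid)
{
    if (!state_valid) {
        toy_dev_put(dev);   /* error path: release before dropping */
        return -1;
    }
    toy_dev_put(dev);       /* normal path */
    return 0;
}
```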
      
      v2: Do not assign address family from outer_mode in the transform if the
      state is invalid
      
      v3: Release device reference in the error path instead of jumping to resume
      
      Fixes: 4ce3dbe3 ("xfrm: Fix xfrm_input() to verify state is valid when (encap_type < 0)")
      Signed-off-by: Xiaodong Xu <stid.smth@gmail.com>
      Reported-by: Bo Chen <chenborfc@163.com>
      Tested-by: Bo Chen <chenborfc@163.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • devlink: Add method for time-stamp on reporter's dump · d279505b
      Aya Levin authored
      When setting the dump's time-stamp, use ktime_get_real in addition to
      jiffies. This simplifies the user space implementation and bypasses
      some inconsistent behavior with translating jiffies to current time.
      The timestamp is converted to nanoseconds to avoid the y2038 issue.
      
      Fixes: c8e1da0b ("devlink: Add health report functionality")
      Signed-off-by: Aya Levin <ayal@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 10 Nov, 2019 1 commit
  6. 08 Nov, 2019 5 commits
    • vsock/virtio: fix sock refcnt holding during the shutdown · ad8a7220
      Stefano Garzarella authored
      The "42f5cda5" commit rightly set SOCK_DONE on peer shutdown,
      but there is an issue if we receive the SHUTDOWN(RDWR) while the
      virtio_transport_close_timeout() is scheduled.
      In this case, when the timeout fires, the SOCK_DONE is already
      set and the virtio_transport_close_timeout() will not call
      virtio_transport_reset() and virtio_transport_do_close().
      As a result, both sockets remain open and are never released,
      preventing the unloading of [virtio|vhost]_transport modules.
      
      This patch fixes this issue, calling virtio_transport_reset() and
      virtio_transport_do_close() when we receive the SHUTDOWN(RDWR)
      and there is nothing left to read.
      
      Fixes: 42f5cda5 ("vsock/virtio: set SOCK_DONE on peer shutdown")
      Cc: Stephen Barber <smbarber@chromium.org>
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mac80211: fix station inactive_time shortly after boot · 285531f9
      Ahmed Zaki authored
      In the first 5 minutes after boot (time of INITIAL_JIFFIES),
      ieee80211_sta_last_active() returns zero if last_ack is zero. This
      leads to "inactive time" showing jiffies_to_msecs(jiffies).
      
       # iw wlan0 station get fc:ec:da:64:a6:dd
       Station fc:ec:da:64:a6:dd (on wlan0)
      	inactive time:	4294894049 ms
      	.
      	.
      	connected time:	70 seconds
      
      Fix by returning last_rx if last_ack == 0.
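The effect can be reproduced with 32-bit arithmetic in user space. This is a sketch under stated assumptions: HZ is taken to be 1000, jiffies is modelled as a `uint32_t`, and the `toy_*` and `last_active_*` helpers are hypothetical simplifications of ieee80211_sta_last_active(). Because jiffies starts five minutes before wrapping, a last_rx value from shortly after boot is not considered "after" a last_ack of 0, so the old logic returns 0 and the inactive time becomes the full jiffies value.

```c
#include <stdint.h>

#define TOY_HZ 1000u  /* assumption: 1000 ticks per second */
/* jiffies starts 300 seconds before the 32-bit wrap point. */
#define TOY_INITIAL_JIFFIES ((uint32_t)(-300 * (int32_t)TOY_HZ))

/* Wrap-safe "a is after b", in the style of the kernel's time_after(). */
static int toy_time_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Old behaviour: last_ack == 0 can win over a valid last_rx. */
static uint32_t last_active_old(uint32_t last_rx, uint32_t last_ack)
{
    if (toy_time_after(last_rx, last_ack))
        return last_rx;
    return last_ack;
}

/* Fixed behaviour: fall back to last_rx when last_ack was never set. */
static uint32_t last_active_fixed(uint32_t last_rx, uint32_t last_ack)
{
    if (!last_ack || toy_time_after(last_rx, last_ack))
        return last_rx;
    return last_ack;
}

/* Inactive time in milliseconds (exact for TOY_HZ == 1000). */
static uint32_t inactive_ms(uint32_t now, uint32_t last_active)
{
    return (now - last_active) * (1000u / TOY_HZ);
}
```

With an uptime of 227 seconds and a frame received 50 ms ago, the old logic reports an inactive time of over four billion milliseconds, while the fixed logic reports 50 ms.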
      Signed-off-by: Ahmed Zaki <anzaki@gmail.com>
      Link: https://lore.kernel.org/r/20191031121243.27694-1-anzaki@gmail.com
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • mac80211: fix ieee80211_txq_setup_flows() failure path · 6dd47d97
      Johannes Berg authored
      If ieee80211_txq_setup_flows() fails, we don't clean up LED
      state properly, leading to crashes later on. Fix that.
      
      Fixes: dc8b274f ("mac80211: Move up init of TXQs")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
      Link: https://lore.kernel.org/r/20191105154110.1ccf7112ba5d.I0ba865792446d051867b33153be65ce6b063d98c@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • ipv4: Fix table id reference in fib_sync_down_addr · e0a31262
      David Ahern authored
      Hendrik reported that routes in the main table that use a source address
      are not removed when that address is removed. The problem is that
      fib_sync_down_addr does not account for devices in the default VRF, which
      are associated with the main table. Fix by updating the table id reference.
      
      Fixes: 5a56a0b3 ("net: Don't delete routes in different VRFs")
      Reported-by: Hendrik Donner <hd@os-cillation.de>
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: fixes rt6_probe() and fib6_nh->last_probe init · 1bef4c22
      Eric Dumazet authored
      While looking at a syzbot KCSAN report [1], I found multiple
      issues in this code:
      
      1) fib6_nh->last_probe has an initial value of 0.
      
         While probably okay on 64bit kernels, this causes an issue
         on 32bit kernels since the time_after(jiffies, 0 + interval)
         might be false ~24 days after boot (for HZ=1000)
      
      2) The data-race found by KCSAN
         I could use READ_ONCE() and WRITE_ONCE(), but we can also
         take the opportunity to avoid piling up too many rt6_probe_deferred()
         works by using cmpxchg() instead, so that only one cpu wins the race.
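Both points can be sketched in user-space C11. The assumptions are: jiffies is modelled as a 32-bit counter, `toy_time_after()` imitates the kernel's time_after(), and `toy_try_claim_probe()` is a hypothetical helper that uses `atomic_compare_exchange_strong()` in place of cmpxchg(). A stale last_probe of 0 makes the wrap-safe comparison fail once jiffies is more than half the counter range ahead. The compare-and-exchange lets only the caller that still sees the old value claim the probe.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is after b" on a 32-bit tick counter. */
static bool toy_time_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/*
 * Returns true if the caller won the right to schedule a probe.
 * 'now' plays the role of jiffies; 'interval' is the probe interval.
 */
static bool toy_try_claim_probe(_Atomic uint32_t *last_probe, uint32_t now,
                                uint32_t interval)
{
    uint32_t last = atomic_load(last_probe);

    if (!toy_time_after(now, last + interval))
        return false;
    /* Only one caller can replace 'last' with 'now'; the others fail. */
    return atomic_compare_exchange_strong(last_probe, &last, now);
}
```

With last_probe left at 0 and the counter at 0x90000000, no probe is ever claimed. With last_probe initialised to a recent tick value, the first caller claims the probe and a second call at the same tick is refused.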
      
      [1]
      BUG: KCSAN: data-race in find_match / find_match
      
      write to 0xffff8880bb7aabe8 of 8 bytes by interrupt on cpu 1:
       rt6_probe net/ipv6/route.c:663 [inline]
       find_match net/ipv6/route.c:757 [inline]
       find_match+0x5bd/0x790 net/ipv6/route.c:733
       __find_rr_leaf+0xe3/0x780 net/ipv6/route.c:831
       find_rr_leaf net/ipv6/route.c:852 [inline]
       rt6_select net/ipv6/route.c:896 [inline]
       fib6_table_lookup+0x383/0x650 net/ipv6/route.c:2164
       ip6_pol_route+0xee/0x5c0 net/ipv6/route.c:2200
       ip6_pol_route_output+0x48/0x60 net/ipv6/route.c:2452
       fib6_rule_lookup+0x3d6/0x470 net/ipv6/fib6_rules.c:117
       ip6_route_output_flags_noref+0x16b/0x230 net/ipv6/route.c:2484
       ip6_route_output_flags+0x50/0x1a0 net/ipv6/route.c:2497
       ip6_dst_lookup_tail+0x25d/0xc30 net/ipv6/ip6_output.c:1049
       ip6_dst_lookup_flow+0x68/0x120 net/ipv6/ip6_output.c:1150
       inet6_csk_route_socket+0x2f7/0x420 net/ipv6/inet6_connection_sock.c:106
       inet6_csk_xmit+0x91/0x1f0 net/ipv6/inet6_connection_sock.c:121
       __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
       tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline]
       tcp_xmit_probe_skb+0x19b/0x1d0 net/ipv4/tcp_output.c:3735
      
      read to 0xffff8880bb7aabe8 of 8 bytes by interrupt on cpu 0:
       rt6_probe net/ipv6/route.c:657 [inline]
       find_match net/ipv6/route.c:757 [inline]
       find_match+0x521/0x790 net/ipv6/route.c:733
       __find_rr_leaf+0xe3/0x780 net/ipv6/route.c:831
       find_rr_leaf net/ipv6/route.c:852 [inline]
       rt6_select net/ipv6/route.c:896 [inline]
       fib6_table_lookup+0x383/0x650 net/ipv6/route.c:2164
       ip6_pol_route+0xee/0x5c0 net/ipv6/route.c:2200
       ip6_pol_route_output+0x48/0x60 net/ipv6/route.c:2452
       fib6_rule_lookup+0x3d6/0x470 net/ipv6/fib6_rules.c:117
       ip6_route_output_flags_noref+0x16b/0x230 net/ipv6/route.c:2484
       ip6_route_output_flags+0x50/0x1a0 net/ipv6/route.c:2497
       ip6_dst_lookup_tail+0x25d/0xc30 net/ipv6/ip6_output.c:1049
       ip6_dst_lookup_flow+0x68/0x120 net/ipv6/ip6_output.c:1150
       inet6_csk_route_socket+0x2f7/0x420 net/ipv6/inet6_connection_sock.c:106
       inet6_csk_xmit+0x91/0x1f0 net/ipv6/inet6_connection_sock.c:121
       __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 18894 Comm: udevd Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: cc3a86c8 ("ipv6: Change rt6_probe to take a fib6_nh")
      Fixes: f547fac6 ("ipv6: rate-limit probes for neighbourless routes")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 07 Nov, 2019 5 commits
  8. 06 Nov, 2019 2 commits
    • net/tls: fix sk_msg trim on fallback to copy mode · 683916f6
      Jakub Kicinski authored
      sk_msg_trim() tries to only update curr pointer if it falls into
      the trimmed region. The logic, however, does not take into account
      the pointer wrapping that sk_msg_iter_var_prev() does nor
      (as John points out) the fact that msg->sg is a ring buffer.
      
      This means that when the message was trimmed completely, the new
      curr pointer would have the value of MAX_MSG_FRAGS - 1, which is
      neither smaller than any other value, nor would it actually be
      correct.
      
      Special case the trimming to 0 length a little bit and rework
      the comparison between curr and end to take into account wrapping.
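A wrap-aware comparison of ring indices can be sketched as follows. The ring size and the `toy_*` helpers are assumptions for illustration, not the sk_msg code. Instead of comparing raw indices, each index is converted to its forward distance from the ring's start, and the distances are compared.

```c
#include <stdbool.h>

#define TOY_RING_SIZE 8u  /* assumption: small ring for illustration */

/* Slots from 'start' to 'idx', walking forward around the ring. */
static unsigned int toy_ring_dist(unsigned int start, unsigned int idx)
{
    return idx >= start ? idx - start : idx + (TOY_RING_SIZE - start);
}

/* True when 'curr' lies at or beyond 'end', measured from 'start'. */
static bool toy_curr_past_end(unsigned int start, unsigned int curr,
                              unsigned int end)
{
    return toy_ring_dist(start, curr) >= toy_ring_dist(start, end);
}
```

For a ring that starts at slot 6 and ends at slot 1, slot 7 is inside the valid region even though 7 >= 1 as raw integers, so a plain index comparison would wrongly treat it as trimmed.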
      
      This bug caused the TLS code to not copy all of the message, if
      zero copy filled in fewer sg entries than memcopy would need.
      
      Big thanks to Alexander Potapenko for the non-KMSAN reproducer.
      
      v2:
       - take into account that msg->sg is a ring buffer (John).
      
      Link: https://lore.kernel.org/netdev/20191030160542.30295-1-jakub.kicinski@netronome.com/ (v1)
      
      Fixes: d829e9c4 ("tls: convert to generic sk_msg interface")
      Reported-by: syzbot+f8495bff23a879a6d0bd@syzkaller.appspotmail.com
      Reported-by: syzbot+6f50c99e8f6194bf363f@syzkaller.appspotmail.com
      Co-developed-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: prevent duplicate flower rules from tcf_proto destroy race · 59eb87cb
      John Hurley authored
      When a new filter is added to cls_api, the function
      tcf_chain_tp_insert_unique() looks up the protocol/priority/chain to
      determine if the tcf_proto is duplicated in the chain's hashtable. It then
      creates a new entry or continues with an existing one. In cls_flower, this
      allows the function fl_ht_insert_unique to determine if a filter is a
      duplicate and reject appropriately, meaning that the duplicate will not be
      passed to drivers via the offload hooks. However, when a tcf_proto is
      destroyed it is removed from its chain before a hardware remove hook is
      hit. This can lead to a race whereby the driver has not received the
      remove message but duplicate flows can be accepted. This, in turn, can
      lead to the offload driver receiving incorrect duplicate flows and out of
      order add/delete messages.
      
      Prevent duplicates by utilising an approach suggested by Vlad Buslov. A
      hash table per block stores each unique chain/protocol/prio being
      destroyed. This entry is only removed when the full destroy (and hardware
      offload) has completed. If a new flow is being added with the same
      identifiers as a tcf_proto being destroyed, then the add request is
      replayed until the destroy is complete.
      
      Fixes: 8b64678e ("net: sched: refactor tp insert/delete for concurrent execution")
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Reported-by: Louis Peens <louis.peens@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 05 Nov, 2019 1 commit
    • taprio: fix panic while hw offload sched list swap · 0763b3e8
      Ivan Khoronzhuk authored
      Don't swap the oper and admin schedules too early; doing so is incorrect
      and causes a crash.
      
      Steps to reproduce:
      
      1)
      tc qdisc replace dev eth0 parent root handle 100 taprio \
          num_tc 3 \
          map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
          queues 1@0 1@1 1@2 \
          base-time $SOME_BASE_TIME \
          sched-entry S 01 80000 \
          sched-entry S 02 15000 \
          sched-entry S 04 40000 \
          flags 2
      
      2)
      tc qdisc replace dev eth0 parent root handle 100 taprio \
          base-time $SOME_BASE_TIME \
          sched-entry S 01 90000 \
          sched-entry S 02 20000 \
          sched-entry S 04 40000 \
          flags 2
      
      3)
      tc qdisc replace dev eth0 parent root handle 100 taprio \
          base-time $SOME_BASE_TIME \
          sched-entry S 01 150000 \
          sched-entry S 02 200000 \
          sched-entry S 04 40000 \
          flags 2
      
      Repeat steps 2, 3, 2, ... a few more times if the crash does not happen
      right away, and observe:
      
      [  305.832319] Unable to handle kernel write to read-only memory at
      virtual address ffff0000087ce7f0
      [  305.910887] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
      [  305.919306] Hardware name: Texas Instruments AM654 Base Board (DT)
      
      [...]
      
      [  306.017119] x1 : ffff800848031d88 x0 : ffff800848031d80
      [  306.022422] Call trace:
      [  306.024866]  taprio_free_sched_cb+0x4c/0x98
      [  306.029040]  rcu_process_callbacks+0x25c/0x410
      [  306.033476]  __do_softirq+0x10c/0x208
      [  306.037132]  irq_exit+0xb8/0xc8
      [  306.040267]  __handle_domain_irq+0x64/0xb8
      [  306.044352]  gic_handle_irq+0x7c/0x178
      [  306.048092]  el1_irq+0xb0/0x128
      [  306.051227]  arch_cpu_idle+0x10/0x18
      [  306.054795]  do_idle+0x120/0x138
      [  306.058015]  cpu_startup_entry+0x20/0x28
      [  306.061931]  rest_init+0xcc/0xd8
      [  306.065154]  start_kernel+0x3bc/0x3e4
      [  306.068810] Code: f2fbd5b7 f2fbd5b6 d503201f f9400422 (f9000662)
      [  306.074900] ---[ end trace 96c8e2284a9d9d6e ]---
      [  306.079507] Kernel panic - not syncing: Fatal exception in interrupt
      [  306.085847] SMP: stopping secondary CPUs
      [  306.089765] Kernel Offset: disabled
      
      One of the possible crash scenarios is as follows:
      
      The "real" admin list is assigned when admin_sched is set to
      new_admin. This happens after the "swap", which assigns NULL to
      oper_sched. Thus calling qdisc show at this point can crash.
      
      Further, the second time the sched list is updated, admin_sched is
      not NULL and becomes oper_sched; the previous oper_sched was NULL, so
      it is just skipped. But then admin_sched is assigned new_admin, and the
      previously assigned admin_sched (which has already become oper_sched)
      is scheduled to be freed.
      
      Further, the third time the sched list is updated, during one more
      swap, oper_sched is not NULL but has already been freed (during the
      previous admin update), so the attempt to free oper_sched panics the
      kernel in taprio_free_sched_cb().
      
      So, move the "swap emulation" to where it should be according to the
      function comment in the code.
      
      Fixes: 9c66d156 ("taprio: Add support for hardware offloading")
      Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Tested-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 04 Nov, 2019 7 commits