1. 28 Apr, 2017 14 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 0e911788
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Just a couple more stragglers, I really hope this is it.
      
        1) Don't let frags slip down into the GRO segmentation handlers, from
           Steffen Klassert.
      
        2) Truesize under-estimation triggers warnings in TCP over loopback
           with socket filters, 2 part fix from Eric Dumazet.
      
        3) Fix undesirable reset of bonding MTU to ETH_HLEN on slave removal,
           from Paolo Abeni.
      
        4) If we flush the XFRM policy after garbage collection, it doesn't
           work because stray entries can be created afterwards. Fix from Xin
           Long.
      
        5) Hung socket connection fixes in TIPC from Parthasarathy Bhuvaragan.
      
        6) Fix GRO regression with IPSEC when netfilter is disabled, from
           Sabrina Dubroca.
      
        7) Fix cpsw driver Kconfig dependency regression, from Arnd Bergmann"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: hso: register netdev later to avoid a race condition
        net: adjust skb->truesize in ___pskb_trim()
        tcp: do not underestimate skb->truesize in tcp_trim_head()
        bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal
        ipv4: Don't pass IP fragments to upper layer GRO handlers.
        cpsw/netcp: refine cpts dependency
        tipc: close the connection if protocol messages contain errors
        tipc: improve error validations for sockets in CONNECTING state
        tipc: Fix missing connection request handling
        xfrm: fix GRO for !CONFIG_NETFILTER
        xfrm: do the garbage collection after flushing policy
      0e911788
    • Andreas Kemnade's avatar
      net: hso: register netdev later to avoid a race condition · 4c761daf
      Andreas Kemnade authored
      
      
      If the netdev is accessed before the urbs are initialized,
      there will be NULL pointer dereferences. That is avoided by
      registering it when it is fully initialized.
      
      This case occurs e.g. if dhcpcd is running in the background
      and the device is probed, either after insmod hso or
      when the device appears on the usb bus.
      
      A backtrace is the following:
      
      [ 1357.356048] usb 1-2: new high-speed USB device number 12 using ehci-omap
      [ 1357.551177] usb 1-2: New USB device found, idVendor=0af0, idProduct=8800
      [ 1357.558654] usb 1-2: New USB device strings: Mfr=3, Product=2, SerialNumber=0
      [ 1357.568572] usb 1-2: Product: Globetrotter HSUPA Modem
      [ 1357.574096] usb 1-2: Manufacturer: Option N.V.
      [ 1357.685882] hso 1-2:1.5: Not our interface
      [ 1460.886352] hso: unloaded
      [ 1460.889984] usbcore: deregistering interface driver hso
      [ 1513.769134] hso: ../drivers/net/usb/hso.c: Option Wireless
      [ 1513.846771] Unable to handle kernel NULL pointer dereference at virtual address 00000030
      [ 1513.887664] hso 1-2:1.5: Not our interface
      [ 1513.906890] usbcore: registered new interface driver hso
      [ 1513.937988] pgd = ecdec000
      [ 1513.949890] [00000030] *pgd=acd15831, *pte=00000000, *ppte=00000000
      [ 1513.956573] Internal error: Oops: 817 [#1] PREEMPT SMP ARM
      [ 1513.962371] Modules linked in: hso usb_f_ecm omap2430 bnep bluetooth g_ether usb_f_rndis u_ether libcomposite configfs ipv6 arc4 wl18xx wlcore mac80211 cfg80211 bq27xxx_battery panel_tpo_td028ttec1 omapdrm drm_kms_helper cfbfillrect snd_soc_simple_card syscopyarea cfbimgblt snd_soc_simple_card_utils sysfillrect sysimgblt fb_sys_fops snd_soc_omap_twl4030 cfbcopyarea encoder_opa362 drm twl4030_madc_hwmon wwan_on_off snd_soc_gtm601 pwm_omap_dmtimer generic_adc_battery connector_analog_tv pwm_bl extcon_gpio omap3_isp wlcore_sdio videobuf2_dma_contig videobuf2_memops w1_bq27000 videobuf2_v4l2 videobuf2_core omap_hdq snd_soc_omap_mcbsp ov9650 snd_soc_omap bmp280_i2c bmg160_i2c v4l2_common snd_pcm_dmaengine bmp280 bmg160_core at24 bmc150_magn_i2c nvmem_core videodev phy_twl4030_usb bmc150_accel_i2c tsc2007
      [ 1514.037384]  bmc150_magn bmc150_accel_core media leds_tca6507 bno055 industrialio_triggered_buffer kfifo_buf gpio_twl4030 musb_hdrc snd_soc_twl4030 twl4030_vibra twl4030_madc twl4030_pwrbutton twl4030_charger industrialio w2sg0004 ehci_omap omapdss [last unloaded: hso]
      [ 1514.062622] CPU: 0 PID: 3433 Comm: dhcpcd Tainted: G        W       4.11.0-rc8-letux+ #1
      [ 1514.071136] Hardware name: Generic OMAP36xx (Flattened Device Tree)
      [ 1514.077758] task: ee748240 task.stack: ecdd6000
      [ 1514.082580] PC is at hso_start_net_device+0x50/0xc0 [hso]
      [ 1514.088287] LR is at hso_net_open+0x68/0x84 [hso]
      [ 1514.093231] pc : [<bf79c304>]    lr : [<bf79ced8>]    psr: a00f0013
      sp : ecdd7e20  ip : 00000000  fp : ffffffff
      [ 1514.105316] r10: 00000000  r9 : ed0e080c  r8 : ecd8fe2c
      [ 1514.110839] r7 : bf79cef4  r6 : ecd8fe00  r5 : 00000000  r4 : ed0dbd80
      [ 1514.117706] r3 : 00000000  r2 : c0020c80  r1 : 00000000  r0 : ecdb7800
      [ 1514.124572] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      [ 1514.132110] Control: 10c5387d  Table: acdec019  DAC: 00000051
      [ 1514.138153] Process dhcpcd (pid: 3433, stack limit = 0xecdd6218)
      [ 1514.144470] Stack: (0xecdd7e20 to 0xecdd8000)
      [ 1514.149078] 7e20: ed0dbd80 ecd8fe98 00000001 00000000 ecd8f800 ecd8fe00 ecd8fe60 00000000
      [ 1514.157714] 7e40: ed0e080c bf79ced8 bf79ce70 ecd8f800 00000001 bf7a0258 ecd8f830 c068d958
      [ 1514.166320] 7e60: c068d8b8 ecd8f800 00000001 00001091 00001090 c068dba4 ecd8f800 00001090
      [ 1514.174926] 7e80: ecd8f940 ecd8f800 00000000 c068dc60 00000000 00000001 ed0e0800 ecd8f800
      [ 1514.183563] 7ea0: 00000000 c06feaa8 c0ca39c2 beea57dc 00000020 00000000 306f7368 00000000
      [ 1514.192169] 7ec0: 00000000 00000000 00001091 00000000 00000000 00000000 00000000 00008914
      [ 1514.200805] 7ee0: eaa9ab60 beea57dc c0c9bfc0 eaa9ab40 00000006 00000000 00046858 c066a948
      [ 1514.209411] 7f00: beea57dc eaa9ab60 ecc6b0c0 c02837b0 00000006 c0282c90 0000c000 c0283654
      [ 1514.218017] 7f20: c09b0c00 c098bc31 00000001 c0c5e513 c0c5e513 00000000 c0151354 c01a20c0
      [ 1514.226654] 7f40: c0c5e513 c01a3134 ecdd6000 c01a3160 ee7487f0 600f0013 00000000 ee748240
      [ 1514.235260] 7f60: ee748734 00000000 ecc6b0c0 ecc6b0c0 beea57dc 00008914 00000006 00000000
      [ 1514.243896] 7f80: 00046858 c02837b0 00001091 0003a1f0 00046608 0003a248 00000036 c01071e4
      [ 1514.252502] 7fa0: ecdd6000 c0107040 0003a1f0 00046608 00000006 00008914 beea57dc 00001091
      [ 1514.261108] 7fc0: 0003a1f0 00046608 0003a248 00000036 0003ac0c 00046608 00046610 00046858
      [ 1514.269744] 7fe0: 0003a0ac beea57d4 000167eb b6f23106 400f0030 00000006 00000000 00000000
      [ 1514.278411] [<bf79c304>] (hso_start_net_device [hso]) from [<bf79ced8>] (hso_net_open+0x68/0x84 [hso])
      [ 1514.288238] [<bf79ced8>] (hso_net_open [hso]) from [<c068d958>] (__dev_open+0xa0/0xf4)
      [ 1514.296600] [<c068d958>] (__dev_open) from [<c068dba4>] (__dev_change_flags+0x8c/0x130)
      [ 1514.305023] [<c068dba4>] (__dev_change_flags) from [<c068dc60>] (dev_change_flags+0x18/0x48)
      [ 1514.313934] [<c068dc60>] (dev_change_flags) from [<c06feaa8>] (devinet_ioctl+0x348/0x714)
      [ 1514.322540] [<c06feaa8>] (devinet_ioctl) from [<c066a948>] (sock_ioctl+0x2b0/0x308)
      [ 1514.330627] [<c066a948>] (sock_ioctl) from [<c0282c90>] (vfs_ioctl+0x20/0x34)
      [ 1514.338165] [<c0282c90>] (vfs_ioctl) from [<c0283654>] (do_vfs_ioctl+0x82c/0x93c)
      [ 1514.346038] [<c0283654>] (do_vfs_ioctl) from [<c02837b0>] (SyS_ioctl+0x4c/0x74)
      [ 1514.353759] [<c02837b0>] (SyS_ioctl) from [<c0107040>] (ret_fast_syscall+0x0/0x1c)
      [ 1514.361755] Code: e3822103 e3822080 e1822781 e5981014 (e5832030)
      [ 1514.510833] ---[ end trace dfb3e53c657f34a0 ]---
      Reported-by: default avatarH. Nikolaus Schaller <hns@goldelico.com>
      Signed-off-by: default avatarAndreas Kemnade <andreas@kemnade.info>
      Reviewed-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4c761daf
    • Eric Dumazet's avatar
      net: adjust skb->truesize in ___pskb_trim() · c21b48cc
      Eric Dumazet authored
      Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
      skb_try_coalesce() using syzkaller and a filter attached to a TCP
      socket.
      
      As we did recently in commit 158f323b
      
       ("net: adjust skb->truesize in
      pskb_expand_head()") we can adjust skb->truesize from ___pskb_trim(),
      via a call to skb_condense().
      
      If all frags were freed, then skb->truesize can be recomputed.
      
      This call can be done if skb is not yet owned, or destructor is
      sock_edemux().
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c21b48cc
    • Eric Dumazet's avatar
      tcp: do not underestimate skb->truesize in tcp_trim_head() · 7162fb24
      Eric Dumazet authored
      
      
      Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
      skb_try_coalesce() using syzkaller and a filter attached to a TCP
      socket over loopback interface.
      
      I believe one issue with looped skbs is that tcp_trim_head() can end up
      producing skb with under estimated truesize.
      
      It hardly matters for normal conditions, since packets sent over
      loopback are never truncated.
      
      Bytes trimmed from skb->head should not change skb truesize, since
      skb->head is not reallocated.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7162fb24
    • Paolo Abeni's avatar
      bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal · 19cdead3
      Paolo Abeni authored
      On slave list updates, the bonding driver computes its hard_header_len
      as the maximum of all enslaved devices's hard_header_len.
      If the slave list is empty, e.g. on last enslaved device removal,
      ETH_HLEN is used.
      
      Since the bonding header_ops are set only when the first enslaved
      device is attached, the above can lead to header_ops->create()
      being called with the wrong skb headroom in place.
      
      If bond0 is configured on top of ipoib devices, with the
      following commands:
      
      ifup bond0
      for slave in $BOND_SLAVES_LIST; do
      	ip link set dev $slave nomaster
      done
      ping -c 1 <ip on bond0 subnet>
      
      we will obtain a skb_under_panic() with a similar call trace:
      	skb_push+0x3d/0x40
      	push_pseudo_header+0x17/0x30 [ib_ipoib]
      	ipoib_hard_header+0x4e/0x80 [ib_ipoib]
      	arp_create+0x12f/0x220
      	arp_send_dst.part.19+0x28/0x50
      	arp_solicit+0x115/0x290
      	neigh_probe+0x4d/0x70
      	__neigh_event_send+0xa7/0x230
      	neigh_resolve_output+0x12e/0x1c0
      	ip_finish_output2+0x14b/0x390
      	ip_finish_output+0x136/0x1e0
      	ip_output+0x76/0xe0
      	ip_local_out+0x35/0x40
      	ip_send_skb+0x19/0x40
      	ip_push_pending_frames+0x33/0x40
      	raw_sendmsg+0x7d3/0xb50
      	inet_sendmsg+0x31/0xb0
      	sock_sendmsg+0x38/0x50
      	SYSC_sendto+0x102/0x190
      	SyS_sendto+0xe/0x10
      	do_syscall_64+0x67/0x180
      	entry_SYSCALL64_slow_path+0x25/0x25
      
      This change addresses the issue avoiding updating the bonding device
      hard_header_len when the slaves list become empty, forbidding to
      shrink it below the value used by header_ops->create().
      
      The bug is there since commit 54ef3137 ("[PATCH] bonding: Handle large
      hard_header_len") but the panic can be triggered only since
      commit fc791b63
      
       ("IB/ipoib: move back IB LL address into the hard
      header").
      Reported-by: default avatarNorbert P <noe@physik.uzh.ch>
      Fixes: 54ef3137 ("[PATCH] bonding: Handle large hard_header_len")
      Fixes: fc791b63
      
       ("IB/ipoib: move back IB LL address into the hard header")
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      19cdead3
    • Steffen Klassert's avatar
      ipv4: Don't pass IP fragments to upper layer GRO handlers. · 9b83e031
      Steffen Klassert authored
      Upper layer GRO handlers can not handle IP fragments, so
      exit GRO processing in this case.
      
      This fixes ESP GRO because the packet must be reassembled
      before we can decapsulate, otherwise we get authentication
      failures.
      
      It also aligns IPv4 to IPv6 where packets with fragmentation
      headers are not passed to upper layer GRO handlers.
      
      Fixes: 7785bba2
      
       ("esp: Add a software GRO codepath")
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9b83e031
    • Arnd Bergmann's avatar
      cpsw/netcp: refine cpts dependency · 504926df
      Arnd Bergmann authored
      Tony Lindgren reports a kernel oops that resulted from my compile-time
      fix on the default config. This shows two problems:
      
      a) configurations that did not already enable PTP_1588_CLOCK will
         now miss the cpts driver
      
      b) when cpts support is disabled, the driver crashes. This is a
         preexisting problem that we did not notice before my patch.
      
      While the second problem is still being investigated, this modifies
      the dependencies again, getting us back to the original state, with
      another 'select NET_PTP_CLASSIFY' added in to avoid the original
      link error we got, and the 'depends on POSIX_TIMERS' to hide
      the CPTS support when turning it on would be useless.
      
      Cc: stable@vger.kernel.org # 4.11 needs this
      Fixes: 07fef362
      
       ("cpsw/netcp: cpts depends on posix_timers")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Tested-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      504926df
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 5577e679
      David S. Miller authored
      
      
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2017-04-28
      
      1) Do garbage collecting after a policy flush to remove old
         bundles immediately. From Xin Long.
      
      2) Fix GRO if netfilter is not defined.
         From Sabrina Dubroca.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5577e679
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · affb852d
      Linus Torvalds authored
      Pull input fix from Dmitry Torokhov:
       "Yet another quirk to i8042 to get touchpad recognized on some laptops"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: i8042 - add Clevo P650RS to the i8042 reset list
      affb852d
    • Linus Torvalds's avatar
      Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 28b20135
      Linus Torvalds authored
      Pull btrfs fix from Chris Mason:
       "We have one more fix for btrfs.
      
        This gets rid of a new WARN_ON from rc1 that ended up making more
        noise than we really want. The larger fix for the underflow got
        delayed a bit and it's better for now to put it under
        CONFIG_BTRFS_DEBUG"
      
      * 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        btrfs: qgroup: move noisy underflow warning to debugging build
      28b20135
    • David S. Miller's avatar
      Merge branch 'tipc-socket-connection-hangs' · c5184717
      David S. Miller authored
      
      
      Parthasarathy Bhuvaragan says:
      
      ====================
      tipc: fix hanging socket connections
      
      This patch series contains fixes for the socket layer to
      prevent hanging / stale connections.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c5184717
    • Parthasarathy Bhuvaragan's avatar
      tipc: close the connection if protocol messages contain errors · c1be7756
      Parthasarathy Bhuvaragan authored
      
      
      When a socket is shutting down, we notify the peer node about the
      connection termination by reusing an incoming message if possible.
      If the last received message was a connection acknowledgment
      message, we reverse this message and set the error code to
      TIPC_ERR_NO_PORT and send it to peer.
      
      In tipc_sk_proto_rcv(), we never check for message errors while
      processing the connection acknowledgment or probe messages. Thus
      this message performs the usual flow control accounting and leaves
      the session hanging.
      
      In this commit, we terminate the connection when we receive such
      error messages.
      Signed-off-by: default avatarParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c1be7756
    • Parthasarathy Bhuvaragan's avatar
      tipc: improve error validations for sockets in CONNECTING state · 4e0df495
      Parthasarathy Bhuvaragan authored
      
      
      Until now, the checks for sockets in CONNECTING state was based on
      the assumption that the incoming message was always from the
      peer's accepted data socket.
      
      However an application using a non-blocking socket sends an implicit
      connect, this socket which is in CONNECTING state can receive error
      messages from the peer's listening socket. As we discard these
      messages, the application socket hangs as there due to inactivity.
      In addition to this, there are other places where we process errors
      but do not notify the user.
      
      In this commit, we process such incoming error messages and notify
      our users about them using sk_state_change().
      Signed-off-by: default avatarParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4e0df495
    • Parthasarathy Bhuvaragan's avatar
      tipc: Fix missing connection request handling · 42b531de
      Parthasarathy Bhuvaragan authored
      
      
      In filter_connect, we use waitqueue_active() to check for any
      connections to wakeup. But waitqueue_active() is missing memory
      barriers while accessing the critical sections, leading to
      inconsistent results.
      
      In this commit, we replace this with an SMP safe wq_has_sleeper()
      using the generic socket callback sk_data_ready().
      Signed-off-by: default avatarParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      42b531de
  2. 27 Apr, 2017 6 commits
    • Linus Torvalds's avatar
      Merge tag 'nfsd-4.11-3' of git://linux-nfs.org/~bfields/linux · 8b5d11e4
      Linus Torvalds authored
      Pull nfsd fixes from Bruce Fields:
       "Thanks to Ari Kauppi and Tuomas Haanpää at Synopsis for spotting bugs
        in our NFSv2/v3 xdr code that could crash the server or leak memory"
      
      * tag 'nfsd-4.11-3' of git://linux-nfs.org/~bfields/linux:
        nfsd: stricter decoding of write-like NFSv2/v3 ops
        nfsd4: minor NFSv2/v3 write decoding cleanup
        nfsd: check for oversized NFSv2/v3 arguments
      8b5d11e4
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-4.11-rc9' of git://github.com/ceph/ceph-client · 19ac4474
      Linus Torvalds authored
      Pull ceph fix from Ilya Dryomov:
       "A fix for a kernel stack overflow bug in ceph setattr code, marked for
        stable"
      
      * tag 'ceph-for-4.11-rc9' of git://github.com/ceph/ceph-client:
        ceph: fix recursion between ceph_set_acl() and __ceph_setattr()
      19ac4474
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · f56fc7bd
      Linus Torvalds authored
      Pull vfs fixes from Al Viro:
      
       - fix orangefs handling of faults on write() - I'd missed that one back
         when orangefs was going through review.
      
       - readdir counterpart of "9p: cope with bogus responses from server in
         p9_client_{read,write}" - server might be lying or broken, and we'd
         better not overrun the kmalloc'ed buffer we are copying the results
         into.
      
       - NFS O_DIRECT read/write can leave iov_iter advanced by too much;
         that's what had been causing iov_iter_pipe() warnings davej had been
         seeing.
      
       - statx_timestamp.tv_nsec type fix (s32 -> u32). That one really should
         go in before 4.11.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        uapi: change the type of struct statx_timestamp.tv_nsec to unsigned
        fix nfs O_DIRECT advancing iov_iter too much
        p9_client_readdir() fix
        orangefs_bufmap_copy_from_iovec(): fix EFAULT handling
      f56fc7bd
    • Michael Kerrisk (man-pages)'s avatar
      statx: correct error handling of NULL pathname · 59372bbf
      Michael Kerrisk (man-pages) authored
      The change in commit 1e2f82d1 ("statx: Kill fd-with-NULL-path
      support in favour of AT_EMPTY_PATH") to error on a NULL pathname to
      statx() is inconsistent.
      
      It results in the error EINVAL for a NULL pathname.  Other system calls
      with similar APIs (fchownat(), fstatat(), linkat()), return EFAULT.
      
      The solution is simply to remove the EINVAL check.  As I already pointed
      out in [1], user_path_at*() and filename_lookup() will handle the NULL
      pathname as per the other APIs, to correctly produce the error EFAULT.
      
      [1] https://lkml.org/lkml/2017/4/26/561
      
      Signed-off-by: default avatarMichael Kerrisk <mtk.manpages@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Sandeen <sandeen@sandeen.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      59372bbf
    • Sabrina Dubroca's avatar
      xfrm: fix GRO for !CONFIG_NETFILTER · cfcf99f9
      Sabrina Dubroca authored
      In xfrm_input() when called from GRO, async == 0, and we end up
      skipping the processing in xfrm4_transport_finish(). GRO path will
      always skip the NF_HOOK, so we don't need the special-case for
      !NETFILTER during GRO processing.
      
      Fixes: 7785bba2
      
       ("esp: Add a software GRO codepath")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      cfcf99f9
    • Dmitry V. Levin's avatar
      uapi: change the type of struct statx_timestamp.tv_nsec to unsigned · 1741937d
      Dmitry V. Levin authored
      The comment asserting that the value of struct statx_timestamp.tv_nsec
      must be negative when statx_timestamp.tv_sec is negative, is wrong, as
      could be seen from the following example:
      
      	#define _FILE_OFFSET_BITS 64
      	#include <assert.h>
      	#include <fcntl.h>
      	#include <stdio.h>
      	#include <sys/stat.h>
      	#include <unistd.h>
      	#include <asm/unistd.h>
      	#include <linux/stat.h>
      
      	int main(void)
      	{
      		static const struct timespec ts[2] = {
      			{ .tv_nsec = UTIME_OMIT },
      			{ .tv_sec = -2, .tv_nsec = 42 }
      		};
      		assert(utimensat(AT_FDCWD, ".", ts, 0) == 0);
      
      		struct stat st;
      		assert(stat(".", &st) == 0);
      		printf("st_mtim.tv_sec = %lld, st_mtim.tv_nsec = %lu\n",
      		       (long long) st.st_mtim.tv_sec,
      		       (unsigned long) st.st_mtim.tv_nsec);
      
      		struct statx stx;
      		assert(syscall(__NR_statx, AT_FDCWD, ".", 0, 0, &stx) == 0);
      		printf("stx_mtime.tv_sec = %lld, stx_mtime.tv_nsec = %lu\n",
      		       (long long) stx.stx_mtime.tv_sec,
      		       (unsigned long) stx.stx_mtime.tv_nsec);
      
      		return 0;
      	}
      
      It expectedly prints:
      st_mtim.tv_sec = -2, st_mtim.tv_nsec = 42
      stx_mtime.tv_sec = -2, stx_mtime.tv_nsec = 42
      
      The more generic comment asserting that the value of struct
      statx_timestamp.tv_nsec might be negative is confusing to say the least.
      
      It contradicts both the struct stat.st_[acm]time_nsec tradition and
      struct timespec.tv_nsec requirements in utimensat syscall.
      If statx syscall ever returns a stx_[acm]time containing a negative
      tv_nsec that cannot be passed unmodified to utimensat syscall,
      it will cause an immense confusion.
      
      Fix this source of confusion by changing the type of struct
      statx_timestamp.tv_nsec from __s32 to __u32.
      
      Fixes: a528d35e
      
       ("statx: Add a system call to make enhanced file info available")
      Signed-off-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: linux-api@vger.kernel.org
      cc: mtk.manpages@gmail.com
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      1741937d
  3. 26 Apr, 2017 12 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · f8324608
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
       "I didn't want the release to go out without the statx system call
        properly hooked up"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc: Update syscall tables.
        sparc64: Fill in rest of HAVE_REGS_AND_STACK_ACCESS_API
      f8324608
    • David Howells's avatar
      statx: Kill fd-with-NULL-path support in favour of AT_EMPTY_PATH · 1e2f82d1
      David Howells authored
      With the new statx() syscall, the following both allow the attributes of
      the file attached to a file descriptor to be retrieved:
      
      	statx(dfd, NULL, 0, ...);
      
      and:
      
      	statx(dfd, "", AT_EMPTY_PATH, ...);
      
      Change the code to reject the first option, though this means copying
      the path and engaging pathwalk for the fstat() equivalent.  dfd can be a
      non-directory provided path is "".
      
      [ The timing of this isn't wonderful, but applying this now before we
        have statx() in any released kernel, before anybody starts using the
        NULL special case.    - Linus ]
      
      Fixes: a528d35e
      
       ("statx: Add a system call to make enhanced file info available")
      Reported-by: default avatarMichael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Eric Sandeen <sandeen@sandeen.net>
      cc: fstests@vger.kernel.org
      cc: linux-api@vger.kernel.org
      cc: linux-man@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1e2f82d1
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · fc08b197
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) MLX5 bug fixes from Saeed Mahameed et al:
           - released wrong resources when firmware timeout happens
           - fix wrong check for encapsulation size limits
           - UAR memory leak
           - ETHTOOL_GRXCLSRLALL failed to fill in info->data
      
       2) Don't cache l3mdev on mis-matches local route, causes net devices to
          leak refs. From Robert Shearman.
      
       3) Handle fragmented SKBs properly in macsec driver, the problem is
          that we were mis-sizing the sgvec table. From Jason A. Donenfeld.
      
       4) We cannot have checksum offload enabled for inner UDP tunneled
          packet during IPSEC, from Ansis Atteka.
      
       5) Fix double SKB free in ravb driver, from Dan Carpenter.
      
       6) Fix CPU port handling in b53 DSA driver, from Florian Dainelli.
      
       7) Don't use on-stack buffers for usb_control_msg() in CAN usb driver,
          from Maksim Salau.
      
       8) Fix device leak in macvlan driver, from Herbert Xu. We have to purge
          the broadcast queue properly on port destroy.
      
       9) Fix tx ring entry limit on EF10 devices in sfc driver. From Bert
          Kenward.
      
      10) Fix memory leaks in team driver, from Pan Bian.
      
      11) Don't setup ipv6_stub before it can be actually used, from Paolo
          Abeni.
      
      12) Fix tipc socket flow control accounting, from Parthasarathy
          Bhuvaragan.
      
      13) Fix crash on module unload in hso driver, from Andreas Kemnade.
      
      14) Fix purging of bridge multicast entries, the problem is that if we
          don't defer it to ndo_uninit it's possible for new entries to get
          added after we purge. Fix from Xin Long.
      
      15) Don't return garbage for PACKET_HDRLEN getsockopt, from Alexander
          Potapenko.
      
      16) Fix autoneg stall properly in PHY layer, and revert micrel driver
          change that was papering over it. From Alexander Kochetkov.
      
      17) Don't dereference an ipv4 route as an ipv6 one in the ip6_tunnnel
          code, from Cong Wang.
      
      18) Clear out the congestion control private of the TCP socket in all of
          the right places, from Wei Wang.
      
      19) rawv6_ioctl measures SKB length incorrectly, fix from Jamie
          Bainbridge.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
        ipv6: check raw payload size correctly in ioctl
        tcp: memset ca_priv data to 0 properly
        ipv6: check skb->protocol before lookup for nexthop
        net: core: Prevent from dereferencing null pointer when releasing SKB
        macsec: dynamically allocate space for sglist
        Revert "phy: micrel: Disable auto negotiation on startup"
        net: phy: fix auto-negotiation stall due to unavailable interrupt
        net/packet: check length in getsockopt() called with PACKET_HDRLEN
        net: ipv6: regenerate host route if moved to gc list
        bridge: move bridge multicast cleanup to ndo_uninit
        ipv6: fix source routing
        qed: Fix error in the dcbx app meta data initialization.
        netvsc: fix calculation of available send sections
        net: hso: fix module unloading
        tipc: fix socket flow control accounting error at tipc_recv_stream
        tipc: fix socket flow control accounting error at tipc_send_stream
        ipv6: move stub initialization after ipv6 setup completion
        team: fix memory leaks
        sfc: tx ring can only have 2048 entries for all EF10 NICs
        macvlan: Fix device ref leak when purging bc_queue
        ...
      fc08b197
    • Jamie Bainbridge's avatar
      ipv6: check raw payload size correctly in ioctl · 105f5528
      Jamie Bainbridge authored
      
      
      In situations where an skb is paged, the transport header pointer and
      tail pointer can be the same because the skb contents are in frags.
      
      This results in ioctl(SIOCINQ/FIONREAD) incorrectly returning a
      length of 0 when the length to receive is actually greater than zero.
      
      skb->len is already correctly set in ip6_input_finish() with
      pskb_pull(), so use skb->len as it always returns the correct result
      for both linear and paged data.
      Signed-off-by: default avatarJamie Bainbridge <jbainbri@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      105f5528
    • Wei Wang's avatar
      tcp: memset ca_priv data to 0 properly · c1201444
      Wei Wang authored
      Always zero out ca_priv data in tcp_assign_congestion_control() so that
      ca_priv data is cleared out during socket creation.
      Also always zero out ca_priv data in tcp_reinit_congestion_control() so
      that when cc algorithm is changed, ca_priv data is cleared out as well.
      We should still zero out ca_priv data even in TCP_CLOSE state because
      user could call connect() on AF_UNSPEC to disconnect the socket and
      leave it in TCP_CLOSE state and later call setsockopt() to switch cc
      algorithm on this socket.
      
      Fixes: 2b0a8c9e
      
       ("tcp: add CDG congestion control")
      Reported-by: default avatarAndrey Konovalov  <andreyknvl@google.com>
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c1201444
    • WANG Cong's avatar
      ipv6: check skb->protocol before lookup for nexthop · 199ab00f
      WANG Cong authored
      Andrey reported a out-of-bound access in ip6_tnl_xmit(), this
      is because we use an ipv4 dst in ip6_tnl_xmit() and cast an IPv4
      neigh key as an IPv6 address:
      
              neigh = dst_neigh_lookup(skb_dst(skb),
                                       &ipv6_hdr(skb)->daddr);
              if (!neigh)
                      goto tx_err_link_failure;
      
              addr6 = (struct in6_addr *)&neigh->primary_key; // <=== HERE
              addr_type = ipv6_addr_type(addr6);
      
              if (addr_type == IPV6_ADDR_ANY)
                      addr6 = &ipv6_hdr(skb)->daddr;
      
              memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
      
      Also the network header of the skb at this point should be still IPv4
      for 4in6 tunnels, we shold not just use it as IPv6 header.
      
      This patch fixes it by checking if skb->protocol is ETH_P_IPV6: if it
      is, we are safe to do the nexthop lookup using skb_dst() and
      ipv6_hdr(skb)->daddr; if not (aka IPv4), we have no clue about which
      dest address we can pick here, we have to rely on callers to fill it
      from tunnel config, so just fall to ip6_route_output() to make the
      decision.
      
      Fixes: ea3dc960
      
       ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      199ab00f
    • Myungho Jung's avatar
      net: core: Prevent from dereferencing null pointer when releasing SKB · 9899886d
      Myungho Jung authored
      Added NULL check to make __dev_kfree_skb_irq consistent with kfree
      family of functions.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=195289
      
      Signed-off-by: default avatarMyungho Jung <mhjungk@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9899886d
    • Jason A. Donenfeld's avatar
      macsec: dynamically allocate space for sglist · 5294b830
      Jason A. Donenfeld authored
      We call skb_cow_data, which is good anyway to ensure we can actually
      modify the skb as such (another error from prior). Now that we have the
      number of fragments required, we can safely allocate exactly that amount
      of memory.
      
      Fixes: c09440f7
      
       ("macsec: introduce IEEE 802.1AE driver")
      Signed-off-by: default avatarJason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5294b830
    • David S. Miller's avatar
      Revert "phy: micrel: Disable auto negotiation on startup" · b43bd728
      David S. Miller authored
      This reverts commit 99f81afc.
      
      It was papering over the real problem, which is fixed by commit
      f555f34f
      
       ("net: phy: fix auto-negotiation stall due to unavailable
      interrupt")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b43bd728
    • Alexander Kochetkov's avatar
      net: phy: fix auto-negotiation stall due to unavailable interrupt · f555f34f
      Alexander Kochetkov authored
      The Ethernet link on an interrupt driven PHY was not coming up if the Ethernet
      cable was plugged before the Ethernet interface was brought up.
      
      The patch trigger PHY state machine to update link state if PHY was requested to
      do auto-negotiation and auto-negotiation complete flag already set.
      
      During power-up cycle the PHY do auto-negotiation, generate interrupt and set
      auto-negotiation complete flag. Interrupt is handled by PHY state machine but
      doesn't update link state because PHY is in PHY_READY state. After some time
      MAC bring up, start and request PHY to do auto-negotiation. If there are no new
      settings to advertise genphy_config_aneg() doesn't start PHY auto-negotiation.
      PHY continue to stay in auto-negotiation complete state and doesn't fire
      interrupt. At the same time PHY state machine expect that PHY started
      auto-negotiation and is waiting for interrupt from PHY and it won't get it.
      
      Fixes: 321beec5
      
       ("net: phy: Use interrupts when available in NOLINK state")
      Signed-off-by: default avatarAlexander Kochetkov <al.kochet@gmail.com>
      Cc: stable <stable@vger.kernel.org> # v4.9+
      Tested-by: default avatarRoger Quadros <rogerq@ti.com>
      Tested-by: default avatarAlexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f555f34f
    • Linus Torvalds's avatar
      Merge tag 'sound-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · ea3a8596
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Since we got a bonus week, let me try to screw a few pending fixes.
      
        A slightly large fix is the locking fix in ASoC STI driver, but it's
        pretty board-specific, and the risk is fairly low.
      
        All the rest are small / trivial fixes, mostly marked as stable, for
        ALSA sequencer core, ASoC topology, ASoC Intel bytcr and Firewire
        drivers"
      
      * tag 'sound-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ASoC: intel: Fix PM and non-atomic crash in bytcr drivers
        ALSA: firewire-lib: fix inappropriate assignment between signed/unsigned type
        ALSA: seq: Don't break snd_use_lock_sync() loop by timeout
        ASoC: topology: Fix to store enum text values
        ASoC: STI: Fix null ptr deference in IRQ handler
        ALSA: oxfw: fix regression to handle Stanton SCS.1m/1d
      ea3a8596
    • Xin Long's avatar
      xfrm: do the garbage collection after flushing policy · 35db0691
      Xin Long authored
      
      
      Now xfrm garbage collection can be triggered by 'ip xfrm policy del'.
      These is no reason not to do it after flushing policies, especially
      considering that 'garbage collection deferred' is only triggered
      when it reaches gc_thresh.
      
      It's no good that the policy is gone but the xdst still hold there.
      The worse thing is that xdst->route/orig_dst is also hold and can
      not be released even if the orig_dst is already expired.
      
      This patch is to do the garbage collection if there is any policy
      removed in xfrm_policy_flush.
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      35db0691
  4. 25 Apr, 2017 8 commits
    • Linus Torvalds's avatar
      Merge tag 'arc-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · ea839b41
      Linus Torvalds authored
      Pull ARC fix from Vineet Gupta:
       "Last minute fixes for ARC:
      
         - build error in Mellanox nps platform
      
         - addressing lack of saving FPU regs in releavnt configs"
      
      * tag 'arc-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
        ARCv2: entry: save Accumulator register pair (r58:59) if present
        ARC: [plat-eznps] Fix build error
      ea839b41
    • J. Bruce Fields's avatar
      nfsd: stricter decoding of write-like NFSv2/v3 ops · 13bf9fbf
      J. Bruce Fields authored
      
      
      The NFSv2/v3 code does not systematically check whether we decode past
      the end of the buffer.  This generally appears to be harmless, but there
      are a few places where we do arithmetic on the pointers involved and
      don't account for the possibility that a length could be negative.  Add
      checks to catch these.
      Reported-by: default avatarTuomas Haanpää <thaan@synopsys.com>
      Reported-by: default avatarAri Kauppi <ari@synopsys.com>
      Reviewed-by: default avatarNeilBrown <neilb@suse.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      13bf9fbf
    • J. Bruce Fields's avatar
      nfsd4: minor NFSv2/v3 write decoding cleanup · db44bac4
      J. Bruce Fields authored
      
      
      Use a couple shortcuts that will simplify a following bugfix.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      db44bac4
    • J. Bruce Fields's avatar
      nfsd: check for oversized NFSv2/v3 arguments · e6838a29
      J. Bruce Fields authored
      
      
      A client can append random data to the end of an NFSv2 or NFSv3 RPC call
      without our complaining; we'll just stop parsing at the end of the
      expected data and ignore the rest.
      
      Encoded arguments and replies are stored together in an array of pages,
      and if a call is too large it could leave inadequate space for the
      reply.  This is normally OK because NFS RPC's typically have either
      short arguments and long replies (like READ) or long arguments and short
      replies (like WRITE).  But a client that sends an incorrectly long reply
      can violate those assumptions.  This was observed to cause crashes.
      
      Also, several operations increment rq_next_page in the decode routine
      before checking the argument size, which can leave rq_next_page pointing
      well past the end of the page array, causing trouble later in
      svc_free_pages.
      
      So, following a suggestion from Neil Brown, add a central check to
      enforce our expectation that no NFSv2/v3 call has both a large call and
      a large reply.
      
      As followup we may also want to rewrite the encoding routines to check
      more carefully that they aren't running off the end of the page array.
      
      We may also consider rejecting calls that have any extra garbage
      appended.  That would be safer, and within our rights by spec, but given
      the age of our server and the NFS protocol, and the fact that we've
      never enforced this before, we may need to balance that against the
      possibility of breaking some oddball client.
      Reported-by: default avatarTuomas Haanpää <thaan@synopsys.com>
      Reported-by: default avatarAri Kauppi <ari@synopsys.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      e6838a29
    • Yan, Zheng's avatar
      ceph: fix recursion between ceph_set_acl() and __ceph_setattr() · 8179a101
      Yan, Zheng authored
      ceph_set_acl() calls __ceph_setattr() if the setacl operation needs
      to modify inode's i_mode. __ceph_setattr() updates inode's i_mode,
      then calls posix_acl_chmod().
      
      The problem is that __ceph_setattr() calls posix_acl_chmod() before
      sending the setattr request. The get_acl() call in posix_acl_chmod()
      can trigger a getxattr request. The reply of the getxattr request
      can restore inode's i_mode to its old value. The set_acl() call in
      posix_acl_chmod() sees old value of inode's i_mode, so it calls
      __ceph_setattr() again.
      
      Cc: stable@vger.kernel.org # needs backporting for < 4.9
      Link: http://tracker.ceph.com/issues/19688
      
      Reported-by: default avatarJerry Lee <leisurelysw24@gmail.com>
      Signed-off-by: default avatar"Yan, Zheng" <zyan@redhat.com>
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Tested-by: default avatarLuis Henriques <lhenriques@suse.com>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      8179a101
    • Alexander Potapenko's avatar
      net/packet: check length in getsockopt() called with PACKET_HDRLEN · fd2c83b3
      Alexander Potapenko authored
      
      
      In the case getsockopt() is called with PACKET_HDRLEN and optlen < 4
      |val| remains uninitialized and the syscall may behave differently
      depending on its value, and even copy garbage to userspace on certain
      architectures. To fix this we now return -EINVAL if optlen is too small.
      
      This bug has been detected with KMSAN.
      Signed-off-by: default avatarAlexander Potapenko <glider@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fd2c83b3
    • David Ahern's avatar
      net: ipv6: regenerate host route if moved to gc list · 8048ced9
      David Ahern authored
      Taking down the loopback device wreaks havoc on IPv6 routing. By
      extension, taking down a VRF device wreaks havoc on its table.
      
      Dmitry and Andrey both reported heap out-of-bounds reports in the IPv6
      FIB code while running syzkaller fuzzer. The root cause is a dead dst
      that is on the garbage list gets reinserted into the IPv6 FIB. While on
      the gc (or perhaps when it gets added to the gc list) the dst->next is
      set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the
      out-of-bounds access.
      
      Andrey's reproducer was the key to getting to the bottom of this.
      
      With IPv6, host routes for an address have the dst->dev set to the
      loopback device. When the 'lo' device is taken down, rt6_ifdown initiates
      a walk of the fib evicting routes with the 'lo' device which means all
      host routes are removed. That process moves the dst which is attached to
      an inet6_ifaddr to the gc list and marks it as dead.
      
      The recent change to keep global IPv6 addresses added a new function,
      fixup_permanent_addr, that is called on admin up. That function restarts
      dad for an inet6_ifaddr and when it completes the host route attached
      to it is inserted into the fib. Since the route was marked dead and
      moved to the gc list, re-inserting the route causes the reported
      out-of-bounds accesses. If the device with the address is taken down
      or the address is removed, the WARN_ON in fib6_del is triggered.
      
      All of those faults are fixed by regenerating the host route if the
      existing one has been moved to the gc list, something that can be
      determined by checking if the rt6i_ref counter is 0.
      
      Fixes: f1705ec1
      
       ("net: ipv6: Make address flushing on ifdown optional")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8048ced9
    • Xin Long's avatar
      bridge: move bridge multicast cleanup to ndo_uninit · b1b9d366
      Xin Long authored
      During removing a bridge device, if the bridge is still up, a new mdb entry
      still can be added in br_multicast_add_group() after all mdb entries are
      removed in br_multicast_dev_del(). Like the path:
      
        mld_ifc_timer_expire ->
          mld_sendpack -> ...
            br_multicast_rcv ->
              br_multicast_add_group
      
      The new mp's timer will be set up. If the timer expires after the bridge
      is freed, it may cause use-after-free panic in br_multicast_group_expired.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
      IP: [<ffffffffa07ed2c8>] br_multicast_group_expired+0x28/0xb0 [bridge]
      Call Trace:
       <IRQ>
       [<ffffffff81094536>] call_timer_fn+0x36/0x110
       [<ffffffffa07ed2a0>] ? br_mdb_free+0x30/0x30 [bridge]
       [<ffffffff81096967>] run_timer_softirq+0x237/0x340
       [<ffffffff8108dcbf>] __do_softirq+0xef/0x280
       [<ffffffff8169889c>] call_softirq+0x1c/0x30
       [<ffffffff8102c275>] do_softirq+0x65/0xa0
       [<ffffffff8108e055>] irq_exit+0x115/0x120
       [<ffffffff81699515>] smp_apic_timer_interrupt+0x45/0x60
       [<ffffffff81697a5d>] apic_timer_interrupt+0x6d/0x80
      
      Nikolay also found it would cause a memory leak - the mdb hash is
      reallocated and not freed due to the mdb rehash.
      
      unreferenced object 0xffff8800540ba800 (size 2048):
        backtrace:
          [<ffffffff816e2287>] kmemleak_alloc+0x67/0xc0
          [<ffffffff81260bea>] __kmalloc+0x1ba/0x3e0
          [<ffffffffa05c60ee>] br_mdb_rehash+0x5e/0x340 [bridge]
          [<ffffffffa05c74af>] br_multicast_new_group+0x43f/0x6e0 [bridge]
          [<ffffffffa05c7aa3>] br_multicast_add_group+0x203/0x260 [bridge]
          [<ffffffffa05ca4b5>] br_multicast_rcv+0x945/0x11d0 [bridge]
          [<ffffffffa05b6b10>] br_dev_xmit+0x180/0x470 [bridge]
          [<ffffffff815c781b>] dev_hard_start_xmit+0xbb/0x3d0
          [<ffffffff815c8743>] __dev_queue_xmit+0xb13/0xc10
          [<ffffffff815c8850>] dev_queue_xmit+0x10/0x20
          [<ffffffffa02f8d7a>] ip6_finish_output2+0x5ca/0xac0 [ipv6]
          [<ffffffffa02fbfc6>] ip6_finish_output+0x126/0x2c0 [ipv6]
          [<ffffffffa02fc245>] ip6_output+0xe5/0x390 [ipv6]
          [<ffffffffa032b92c>] NF_HOOK.constprop.44+0x6c/0x240 [ipv6]
          [<ffffffffa032bd16>] mld_sendpack+0x216/0x3e0 [ipv6]
          [<ffffffffa032d5eb>] mld_ifc_timer_expire+0x18b/0x2b0 [ipv6]
      
      This could happen when ip link remove a bridge or destroy a netns with a
      bridge device inside.
      
      With Nikolay's suggestion, this patch is to clean up bridge multicast in
      ndo_uninit after bridge dev is shutdown, instead of br_dev_delete, so
      that netif_running check in br_multicast_add_group can avoid this issue.
      
      v1->v2:
        - fix this issue by moving br_multicast_dev_del to ndo_uninit, instead
          of calling dev_close in br_dev_delete.
      
      (NOTE: Depends upon b6fe0440 ("bridge: implement missing ndo_uninit()"))
      
      Fixes: e10177ab
      
       ("bridge: multicast: fix handling of temp and perm entries")
      Reported-by: default avatarJianwen Ji <jiji@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Reviewed-by: default avatarStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b1b9d366