  1. Feb 27, 2020
    • vsock: fix potential deadlock in transport->release() · 3f74957f
      Stefano Garzarella authored
      
      Some transports (hyperv, virtio) acquire the sock lock during the
      .release() callback.
      
      In vsock_stream_connect() we call vsock_assign_transport(); if
      the socket was previously assigned to another transport,
      vsk->transport->release() is called. The sock lock is already
      held by vsock_stream_connect() at that point, causing the deadlock
      reported by syzbot:
      
          INFO: task syz-executor280:9768 blocked for more than 143 seconds.
            Not tainted 5.6.0-rc1-syzkaller #0
          "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
          syz-executor280 D27912  9768   9766 0x00000000
          Call Trace:
           context_switch kernel/sched/core.c:3386 [inline]
           __schedule+0x934/0x1f90 kernel/sched/core.c:4082
           schedule+0xdc/0x2b0 kernel/sched/core.c:4156
           __lock_sock+0x165/0x290 net/core/sock.c:2413
           lock_sock_nested+0xfe/0x120 net/core/sock.c:2938
           virtio_transport_release+0xc4/0xd60 net/vmw_vsock/virtio_transport_common.c:832
           vsock_assign_transport+0xf3/0x3b0 net/vmw_vsock/af_vsock.c:454
           vsock_stream_connect+0x2b3/0xc70 net/vmw_vsock/af_vsock.c:1288
           __sys_connect_file+0x161/0x1c0 net/socket.c:1857
           __sys_connect+0x174/0x1b0 net/socket.c:1874
           __do_sys_connect net/socket.c:1885 [inline]
           __se_sys_connect net/socket.c:1882 [inline]
           __x64_sys_connect+0x73/0xb0 net/socket.c:1882
           do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      To avoid this issue, this patch removes the lock acquisition from
      the .release() callback of the hyperv and virtio transports, and
      holds the lock when vsk->transport->release() is called in the
      vsock core.
      
      Reported-by: <syzbot+731710996d79d0d58fbc@syzkaller.appspotmail.com>
      Fixes: 408624af ("vsock: use local transport when it is loaded")
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
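
The locking rule described in 3f74957f (a callback must not take a non-recursive lock that its caller already holds) can be modelled in userspace. The following is a minimal sketch, not the kernel code: `toy_sock`, `toy_lock()` and the two release variants are hypothetical names, and a would-be deadlock is reported as -1 instead of blocking.

```c
#include <stdbool.h>

/* Toy stand-in for a socket with a non-recursive lock. */
struct toy_sock {
    bool locked;
    int released;   /* how many times a transport released this socket */
};

/* Returns 0 on success, -1 where a real non-recursive lock would block forever. */
int toy_lock(struct toy_sock *sk)
{
    if (sk->locked)
        return -1;
    sk->locked = true;
    return 0;
}

void toy_unlock(struct toy_sock *sk)
{
    sk->locked = false;
}

/* Old shape: the release callback takes the lock itself. */
int release_taking_lock(struct toy_sock *sk)
{
    if (toy_lock(sk) != 0)
        return -1;          /* caller already holds it: self-deadlock */
    sk->released++;
    toy_unlock(sk);
    return 0;
}

/* New shape: the release callback relies on the caller holding the lock. */
int release_lock_held(struct toy_sock *sk)
{
    if (!sk->locked)
        return -1;          /* contract violation: caller forgot the lock */
    sk->released++;
    return 0;
}

/* Models a connect path that holds the lock while switching transports. */
int connect_path(struct toy_sock *sk, int (*release)(struct toy_sock *))
{
    int ret;

    if (toy_lock(sk) != 0)
        return -1;
    ret = release(sk);
    toy_unlock(sk);
    return ret;
}
```

Calling `connect_path()` with `release_taking_lock` fails in this model, while `release_lock_held` succeeds. The cost of the second shape is that every caller of the callback has to take the lock first.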
    • unix: It's CONFIG_PROC_FS not CONFIG_PROCFS · 5c05a164
      David S. Miller authored
      
      Fixes: 3a12500e ("unix: define and set show_fdinfo only if procfs is enabled")
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • unix: define and set show_fdinfo only if procfs is enabled · 3a12500e
      Tobias Klauser authored
      
      Follow the pattern used with other *_show_fdinfo functions and only
      define unix_show_fdinfo and set it in proto_ops if CONFIG_PROCFS
      is set.
      
      Fixes: 3c32da19 ("unix: Show number of pending scm files of receive queue in fdinfo")
      Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
      Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: check for valid ib_client_data · a2f2ef4a
      Karsten Graul authored
      
      In smc_ib_remove_dev() check if the provided ib device was actually
      initialized for SMC before.
      
      Reported-by: <syzbot+84484ccebdd4e5451d91@syzkaller.appspotmail.com>
      Fixes: a4cf0443 ("smc: introduce SMC as an IB-client")
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: add dummy icsk_sync_mss() · dc24f8b4
      Paolo Abeni authored
      
      syzbot noted that the master MPTCP socket lacks the icsk_sync_mss
      callback, and was able to trigger a null pointer dereference:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD 8e171067 P4D 8e171067 PUD 93fa2067 PMD 0
      Oops: 0010 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 8984 Comm: syz-executor066 Not tainted 5.6.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:0x0
      Code: Bad RIP value.
      RSP: 0018:ffffc900020b7b80 EFLAGS: 00010246
      RAX: 1ffff110124ba600 RBX: 0000000000000000 RCX: ffff88809fefa600
      RDX: ffff8880994cdb18 RSI: 0000000000000000 RDI: ffff8880925d3140
      RBP: ffffc900020b7bd8 R08: ffffffff870225be R09: fffffbfff140652a
      R10: fffffbfff140652a R11: 0000000000000000 R12: ffff8880925d35d0
      R13: ffff8880925d3140 R14: dffffc0000000000 R15: 1ffff110124ba6ba
      FS:  0000000001a0b880(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 00000000a6d6f000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       cipso_v4_sock_setattr+0x34b/0x470 net/ipv4/cipso_ipv4.c:1888
       netlbl_sock_setattr+0x2a7/0x310 net/netlabel/netlabel_kapi.c:989
       smack_netlabel security/smack/smack_lsm.c:2425 [inline]
       smack_inode_setsecurity+0x3da/0x4a0 security/smack/smack_lsm.c:2716
       security_inode_setsecurity+0xb2/0x140 security/security.c:1364
       __vfs_setxattr_noperm+0x16f/0x3e0 fs/xattr.c:197
       vfs_setxattr fs/xattr.c:224 [inline]
       setxattr+0x335/0x430 fs/xattr.c:451
       __do_sys_fsetxattr fs/xattr.c:506 [inline]
       __se_sys_fsetxattr+0x130/0x1b0 fs/xattr.c:495
       __x64_sys_fsetxattr+0xbf/0xd0 fs/xattr.c:495
       do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440199
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffcadc19e48 EFLAGS: 00000246 ORIG_RAX: 00000000000000be
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440199
      RDX: 0000000020000200 RSI: 00000000200001c0 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000003 R09: 00000000004002c8
      R10: 0000000000000009 R11: 0000000000000246 R12: 0000000000401a20
      R13: 0000000000401ab0 R14: 0000000000000000 R15: 0000000000000000
      Modules linked in:
      CR2: 0000000000000000
      
      Address the issue by adding a dummy icsk_sync_mss callback.
      To properly sync the subflows' mss and options list we need some
      additional infrastructure, which will land in net-next.
      
      Reported-by: <syzbot+f4dfece964792d80b139@syzkaller.appspotmail.com>
      Fixes: 2303f994 ("mptcp: Associate MPTCP context with TCP socket")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
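
The "RIP: 0010:0x0" in the trace for dc24f8b4 is what a call through a NULL function pointer looks like. As a rough userspace illustration of the dummy-callback approach, the sketch below uses hypothetical names throughout (`toy_ops`, `toy_setattr()`, `toy_dummy_sync_mss()`); it is not the MPTCP code.

```c
struct toy_sock;

/* Hypothetical ops table, loosely modelled on a per-socket callback set. */
struct toy_ops {
    unsigned int (*sync_mss)(struct toy_sock *sk, unsigned int pmtu);
};

struct toy_sock {
    const struct toy_ops *ops;
};

/* Generic code that calls the hook unconditionally, without a NULL check. */
unsigned int toy_setattr(struct toy_sock *sk, unsigned int pmtu)
{
    return sk->ops->sync_mss(sk, pmtu);
}

/* Dummy hook: does nothing, but gives callers a valid address to jump to. */
unsigned int toy_dummy_sync_mss(struct toy_sock *sk, unsigned int pmtu)
{
    (void)sk;
    (void)pmtu;
    return 0;
}

/* Ops table for the toy "master" socket, with the hook filled in. */
const struct toy_ops toy_master_ops = {
    .sync_mss = toy_dummy_sync_mss,
};
```

Filling the slot with a no-op keeps every existing caller safe without touching them; the alternative, adding a NULL check at each call site, would have to be repeated wherever the hook is invoked.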
    • ipv6: restrict IPV6_ADDRFORM operation · b6f61189
      Eric Dumazet authored
      
      IPV6_ADDRFORM is able to transform an IPv6 socket into an IPv4 one.
      While this operation sounds illogical, we have to support it.

      One of the things it does for a TCP socket is to switch sk->sk_prot
      to tcp_prot.

      We now have other layers playing with sk->sk_prot, so we should make
      sure not to interfere with them.

      This patch makes sure sk_prot is the default pointer for a TCP IPv6
      socket.

      syzbot reported:
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD a0113067 P4D a0113067 PUD a8771067 PMD 0
      Oops: 0010 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 10686 Comm: syz-executor.0 Not tainted 5.6.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:0x0
      Code: Bad RIP value.
      RSP: 0018:ffffc9000281fce0 EFLAGS: 00010246
      RAX: 1ffffffff15f48ac RBX: ffffffff8afa4560 RCX: dffffc0000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a69a8f40
      RBP: ffffc9000281fd10 R08: ffffffff86ed9b0c R09: ffffed1014d351f5
      R10: ffffed1014d351f5 R11: 0000000000000000 R12: ffff8880920d3098
      R13: 1ffff1101241a613 R14: ffff8880a69a8f40 R15: 0000000000000000
      FS:  00007f2ae75db700(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 00000000a3b85000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       inet_release+0x165/0x1c0 net/ipv4/af_inet.c:427
       __sock_release net/socket.c:605 [inline]
       sock_close+0xe1/0x260 net/socket.c:1283
       __fput+0x2e4/0x740 fs/file_table.c:280
       ____fput+0x15/0x20 fs/file_table.c:313
       task_work_run+0x176/0x1b0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:188 [inline]
       exit_to_usermode_loop arch/x86/entry/common.c:164 [inline]
       prepare_exit_to_usermode+0x480/0x5b0 arch/x86/entry/common.c:195
       syscall_return_slowpath+0x113/0x4a0 arch/x86/entry/common.c:278
       do_syscall_64+0x11f/0x1c0 arch/x86/entry/common.c:304
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x45c429
      Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f2ae75dac78 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: 0000000000000000 RBX: 00007f2ae75db6d4 RCX: 000000000045c429
      RDX: 0000000000000001 RSI: 000000000000011a RDI: 0000000000000004
      RBP: 000000000076bf20 R08: 0000000000000038 R09: 0000000000000000
      R10: 0000000020000180 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 0000000000000a9d R14: 00000000004ccfb4 R15: 000000000076bf2c
      Modules linked in:
      CR2: 0000000000000000
      ---[ end trace 82567b5207e87bae ]---
      RIP: 0010:0x0
      Code: Bad RIP value.
      RSP: 0018:ffffc9000281fce0 EFLAGS: 00010246
      RAX: 1ffffffff15f48ac RBX: ffffffff8afa4560 RCX: dffffc0000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a69a8f40
      RBP: ffffc9000281fd10 R08: ffffffff86ed9b0c R09: ffffed1014d351f5
      R10: ffffed1014d351f5 R11: 0000000000000000 R12: ffff8880920d3098
      R13: 1ffff1101241a613 R14: ffff8880a69a8f40 R15: 0000000000000000
      FS:  00007f2ae75db700(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 00000000a3b85000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: <syzbot+1938db17e275e85dc328@syzkaller.appspotmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: fix cleanup for linkgroup setup failures · 51e3dfa8
      Ursula Braun authored
      
      If an SMC connection to a certain peer is set up for the first time,
      a new linkgroup is created. In case of setup failures, such a
      linkgroup is unusable and should disappear. As a first step the
      linkgroup is removed from the linkgroup list in smc_lgr_forget().
      
      There are 2 problems:
      * smc_listen_decline() might be called before linkgroup creation,
        resulting in a crash due to calling smc_lgr_forget() with
        parameter NULL.
      * If a setup failure occurs after linkgroup creation, the connection
        is never unregistered from the linkgroup, preventing linkgroup
        freeing.
      
      This patch introduces an enhanced smc_lgr_cleanup_early() function
      which
      * contains a linkgroup check for early smc_listen_decline()
        invocations
      * invokes smc_conn_free() to guarantee unregistering of the
        connection
      * schedules fast linkgroup removal of the unusable linkgroup

      In addition, the unused function smcd_conn_free() is removed from
      smc_core.h.
      
      Fixes: 3b2dec26 ("net/smc: restructure client and server code in af_smc")
      Fixes: 2a0674ff ("net/smc: improve abnormal termination of link groups")
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sched: act: count in the size of action flags bitfield · 1521a67e
      Jiri Pirko authored
      
      The put of the flags was added by the commit referenced in the Fixes
      tag; however, the size of the message was not extended accordingly.
      
      Fix this by adding size of the flags bitfield to the message size.
      
      Fixes: e3822678 ("net: sched: update action implementations to support flags")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
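
To give a feel for how much space 1521a67e has to account for, the sketch below re-derives the size of one netlink attribute carrying a 32-bit bitfield. It is a userspace approximation under stated assumptions: the `toy_*` names are hypothetical, the 4-byte attribute header and 4-byte alignment follow the netlink attribute layout, and `toy_bitfield32` mirrors the value/selector pair of `struct nla_bitfield32`. `toy_action_size()` is an invented helper, not the kernel's size function.

```c
#include <stddef.h>
#include <stdint.h>

/* Netlink attributes are padded to 4-byte boundaries. */
size_t toy_nla_align(size_t len)
{
    return (len + 3) & ~(size_t)3;
}

/* Attribute header: 16-bit length plus 16-bit type. */
#define TOY_NLA_HDRLEN ((size_t)4)

/* Same shape as struct nla_bitfield32: a value and a selector mask. */
struct toy_bitfield32 {
    uint32_t value;
    uint32_t selector;
};

/* Bytes one attribute occupies in a message: header plus padded payload. */
size_t toy_nla_total_size(size_t payload)
{
    return toy_nla_align(TOY_NLA_HDRLEN + payload);
}

/* Size estimate for a message, with or without the flags attribute. */
size_t toy_action_size(size_t base, int with_flags)
{
    if (with_flags)
        base += toy_nla_total_size(sizeof(struct toy_bitfield32));
    return base;
}
```

Under these assumptions the flags attribute adds 12 bytes (4 of header, 8 of payload). A size estimate that leaves it out can make the message buffer too small once the attribute is actually put.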
    • net: core: devlink.c: Use built-in RCU list checking · 2eb51c75
      Madhuparna Bhowmik authored
      
      list_for_each_entry_rcu() has built-in RCU and lock checking.
      
      Pass cond argument to list_for_each_entry_rcu() to silence
      false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled.
      
      The devlink->lock is held when devlink_dpipe_table_find() is called
      outside an RCU read-side critical section. Therefore, pass struct
      devlink to devlink_dpipe_table_find() for lockdep checking.
      
      Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. Feb 26, 2020
    • netfilter: xt_hashlimit: unregister proc file before releasing mutex · 99b79c39
      Cong Wang authored
      
      Before releasing the global mutex, we only unlink the hashtable
      from the hash list; its proc file is still not unregistered at
      this point. So syzbot could trigger a race condition where a
      parallel htable_create() could register the same file immediately
      after the mutex is released.
      
      Move htable_remove_proc_entry() back to mutex protection to
      fix this. And, fold htable_destroy() into htable_put() to make
      the code slightly easier to understand.
      
      Reported-and-tested-by: <syzbot+d195fd3b9a364ddd6731@syzkaller.appspotmail.com>
      Fixes: c4a3922d ("netfilter: xt_hashlimit: reduce hashlimit_mutex scope for htable_put()")
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • ethtool: limit bitset size · e34f1753
      Michal Kubecek authored
      
      Syzbot reported that ethnl_compact_sanity_checks() can be tricked into
      reading past the end of ETHTOOL_A_BITSET_VALUE and ETHTOOL_A_BITSET_MASK
      attributes and even the message by passing a value between (u32)(-31)
      and (u32)(-1) as ETHTOOL_A_BITSET_SIZE.
      
      The problem is that DIV_ROUND_UP(attr_nbits, 32) is 0 for such values so
      that zero length ETHTOOL_A_BITSET_VALUE will pass the length check but
      ethnl_bitmap32_not_zero() check would try to access up to 512 MB of
      attribute "payload".
      
      Prevent this overflow by limiting the bitset size. Technically, compact
      bitset format would allow bitset sizes up to almost 2^18 (so that the
      nest size does not exceed U16_MAX) but bitsets used by ethtool are much
      shorter. S16_MAX, the largest value which can be directly used as an
      upper limit in policy, should be a reasonable compromise.
      
      Fixes: 10b518d4 ("ethtool: netlink bitset handling")
      Reported-by: <syzbot+7fd4ed5b4234ab1fdccd@syzkaller.appspotmail.com>
      Reported-by: <syzbot+709b7a64d57978247e44@syzkaller.appspotmail.com>
      Reported-by: <syzbot+983cb8fb2d17a7af549d@syzkaller.appspotmail.com>
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
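
The wraparound described in e34f1753 is easy to reproduce with plain 32-bit arithmetic. The sketch below assumes the usual round-up division `(n + d - 1) / d` evaluated in 32 bits; `div_round_up_u32()`, `bitset_size_ok()` and `TOY_BITSET_MAX_BITS` are hypothetical names for illustration, with the cap set to the S16_MAX value (32767) that the commit message settles on.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit round-up division: the addition wraps when n is within d - 1 of UINT32_MAX. */
uint32_t div_round_up_u32(uint32_t n, uint32_t d)
{
    return (uint32_t)(n + d - 1) / d;
}

/* Cap used by this sketch: S16_MAX, as chosen in the commit message. */
#define TOY_BITSET_MAX_BITS 32767u

/* Reject oversized bit counts before deriving a word count from them. */
bool bitset_size_ok(uint32_t nbits)
{
    return nbits <= TOY_BITSET_MAX_BITS;
}
```

For any bit count from (u32)(-31) to (u32)(-1), adding 31 wraps past zero, so the computed word count is 0 even though the count claims close to 2^32 bits, which is 2^29 bytes or 512 MB. Checking the size against a small cap first means the round-up can no longer wrap.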
    • net: Fix Tx hash bound checking · 6e11d157
      Amritha Nambiar authored
      
      Fixes the lower and upper bounds when there are multiple TCs and
      traffic is on the same TC on the same device.
      
      The lower bound is represented by 'qoffset' and the upper limit for
      hash value is 'qcount + qoffset'. This gives a clean Rx to Tx queue
      mapping when there are multiple TCs, as the queue indices for upper TCs
      will be offset by 'qoffset'.
      
      v2: Fixed commit description based on comments.
      
      Fixes: 1b837d48 ("net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash")
      Fixes: eadec877 ("net: Add support for subordinate traffic classes to netdev_pick_tx")
      Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
      Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
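
The bounds named in 6e11d157 (lower bound 'qoffset', upper limit 'qcount + qoffset') can be illustrated with a small mapping function. This is a sketch of the idea only, not the patched kernel function: the function names are hypothetical, `scale_u32()` is a generic multiply-shift range reduction, and `qcount` is assumed to be non-zero.

```c
#include <stdint.h>

/* Scale a 32-bit hash into [0, range) without a division (multiply-shift). */
uint32_t scale_u32(uint32_t hash, uint32_t range)
{
    return (uint32_t)(((uint64_t)hash * range) >> 32);
}

/* Pick a Tx queue for a flow hash inside one TC's window of queues. */
uint16_t pick_txq_from_hash(uint32_t hash, uint16_t qoffset, uint16_t qcount)
{
    return (uint16_t)(qoffset + scale_u32(hash, qcount));
}

/*
 * Map a recorded Rx queue index into the window [qoffset, qoffset + qcount).
 * Assumes qcount > 0; otherwise the loop below would never terminate.
 */
uint16_t pick_txq_from_rxq(uint16_t rxq, uint16_t qoffset, uint16_t qcount)
{
    uint16_t idx = rxq;

    if (idx >= qoffset)
        idx -= qoffset;         /* make the index relative to the TC */
    while (idx >= qcount)
        idx -= qcount;          /* wrap into the TC's queue count */
    return (uint16_t)(idx + qoffset);
}
```

With `qoffset = 4` and `qcount = 4`, every result lands in queues 4 to 7, and an Rx queue that already lies in that window maps to the Tx queue with the same index.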
    • nft_set_pipapo: Actually fetch key data in nft_pipapo_remove() · 212d58c1
      Stefano Brivio authored
      
      Phil reports that adding elements, flushing and re-adding them
      right away:
      
        nft add table t '{ set s { type ipv4_addr . inet_service; flags interval; }; }'
        nft add element t s '{ 10.0.0.1 . 22-25, 10.0.0.1 . 10-20 }'
        nft flush set t s
        nft add element t s '{ 10.0.0.1 . 10-20, 10.0.0.1 . 22-25 }'
      
      triggers, almost reliably, a crash like this one:
      
        [   71.319848] general protection fault, probably for non-canonical address 0x6f6b6e696c2e756e: 0000 [#1] PREEMPT SMP PTI
        [   71.321540] CPU: 3 PID: 1201 Comm: kworker/3:2 Not tainted 5.6.0-rc1-00377-g2bb07f4e1d861 #192
        [   71.322746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014
        [   71.324430] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
        [   71.325387] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables]
        [   71.326164] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b
        [   71.328423] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282
        [   71.329225] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0
        [   71.330365] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a
        [   71.331473] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000
        [   71.332627] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2
        [   71.333615] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0
        [   71.334596] FS:  0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
        [   71.335780] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [   71.336577] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0
        [   71.337533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [   71.338557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [   71.339718] Call Trace:
        [   71.340093]  nft_pipapo_destroy+0x7a/0x170 [nf_tables_set]
        [   71.340973]  nft_set_destroy+0x20/0x50 [nf_tables]
        [   71.341879]  nf_tables_trans_destroy_work+0x246/0x260 [nf_tables]
        [   71.342916]  process_one_work+0x1d5/0x3c0
        [   71.343601]  worker_thread+0x4a/0x3c0
        [   71.344229]  kthread+0xfb/0x130
        [   71.344780]  ? process_one_work+0x3c0/0x3c0
        [   71.345477]  ? kthread_park+0x90/0x90
        [   71.346129]  ret_from_fork+0x35/0x40
        [   71.346748] Modules linked in: nf_tables_set nf_tables nfnetlink 8021q [last unloaded: nfnetlink]
        [   71.348153] ---[ end trace 2eaa8149ca759bcc ]---
        [   71.349066] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables]
        [   71.350016] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b
        [   71.350017] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282
        [   71.350019] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0
        [   71.350019] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a
        [   71.350020] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000
        [   71.350021] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2
        [   71.350022] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0
        [   71.350025] FS:  0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
        [   71.350026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [   71.350027] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0
        [   71.350028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [   71.350028] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [   71.350030] Kernel panic - not syncing: Fatal exception
        [   71.350412] Kernel Offset: disabled
        [   71.365922] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      which is caused by dangling elements that have been deactivated, but
      never removed.
      
      On a flush operation, nft_pipapo_walk() walks through all the elements
      in the mapping table, which are then deactivated by nft_flush_set(),
      one by one, and added to the commit list for removal. Element data is
      then freed.
      
      On transaction commit, nft_pipapo_remove() is called, and fails to
      remove these elements, leading to stale references in the mapping.
      The first symptom of this, revealed by KASan, is a one-byte
      use-after-free in subsequent calls to nft_pipapo_walk(), which is
      usually not enough to trigger a panic. When stale elements are used
      more heavily, though, such as double-free via nft_pipapo_destroy()
      as in Phil's case, the problem becomes more noticeable.
      
      The issue comes from the fact that, on a flush operation,
      nft_pipapo_remove() won't get the actual key data via elem->key,
      elements to be deleted upon commit won't be found by the lookup via
      pipapo_get(), and removal will be skipped. Key data should be fetched
      via nft_set_ext_key(), instead.
      
      Reported-by: Phil Sutter <phil@nwl.cc>
      Fixes: 3c4287f6 ("nf_tables: Add set type for arbitrary concatenation of ranges")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. Feb 24, 2020
  4. Feb 23, 2020
  5. Feb 22, 2020
    • netfilter: ipset: Fix forceadd evaluation path · 8af1c6fb
      Jozsef Kadlecsik authored
      
      When the forceadd option is enabled, the hash:* types should find and replace
      the first entry in the bucket with the new one if there are no reusable
      (deleted or timed out) entries. However, the position index was not set
      to zero and remained at the invalid -1 if there were no reusable entries.
      
      Reported-by: <syzbot+6a86565c74ebe30aea18@syzkaller.appspotmail.com>
      Fixes: 23c42a40 ("netfilter: ipset: Introduction of new commands and protocol version 7")
      Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
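
The slot selection that 8af1c6fb describes (reuse a deleted or timed-out entry if there is one, otherwise replace the first entry when forceadd is set) can be sketched as below. The bucket model and all names (`slot_state`, `pick_slot()`) are hypothetical and much simpler than the ipset hash types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Slot states for a toy hash bucket. */
enum slot_state { SLOT_LIVE, SLOT_DELETED, SLOT_EXPIRED };

/*
 * Pick the slot a new entry should go into when the bucket is full.
 * Returns -1 if no slot can be used.
 */
int pick_slot(const enum slot_state *slots, size_t n, bool forceadd)
{
    int pos = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        if (slots[i] != SLOT_LIVE) {
            pos = (int)i;       /* first reusable slot wins */
            break;
        }
    }
    if (pos == -1 && forceadd && n > 0)
        pos = 0;                /* nothing reusable: replace the first entry */
    return pos;
}
```

The bug class here is using `pos` as an array index while it still holds the -1 sentinel; the explicit fallback to 0 is what turns "no reusable entry" into "replace the first one".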
    • netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reports · f66ee041
      Jozsef Kadlecsik authored
      
      In the case of huge hash:* types of sets, due to the single spinlock of
      a set the processing of the whole set under spinlock protection could take
      too long.
      
      There were four places where the whole hash table of the set was processed
      from bucket to bucket while holding the spinlock:
      
      - During resizing a set, the original set was locked to exclude kernel side
        add/del element operations (userspace add/del is excluded by the
        nfnetlink mutex). The original set is actually just read during the
        resize, so the spinlocking is replaced with rcu locking of regions.
        However, this means there can be parallel kernel side add/del of entries.
        In order not to lose those operations, a backlog is added and replayed
        after the successful resize.
      - Garbage collection of timed out entries was also protected by the spinlock.
        In order not to lock too long, region locking is introduced and a single
        region is processed in one gc go. Also, the simple timer based gc running
        is replaced with a workqueue based solution. The internal book-keeping
        (number of elements, size of extensions) is moved to region level due to
        the region locking.
      - Adding elements: when the max number of the elements is reached, the gc
        was called to evict the timed out entries. The new approach is that the gc
        is called just for the matching region, assuming that if the region
        (proportionally) seems to be full, then the whole set is. We could scan
        the other regions to check every entry under rcu locking, but for huge
        sets it'd mean a slowdown at adding elements.
      - Listing the set header data: when the set was defined with timeout
        support, the garbage collector was called to clean up timed out entries
        to get the correct element numbers and set size values. Now the set is
        scanned to check non-timed out entries, without actually calling the gc
        for the whole set.
      
      Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe ->
      SOFTIRQ-unsafe lock order issues while working on the patch.
      
      Reported-by: <syzbot+4b0e9d4ff3cf117837e5@syzkaller.appspotmail.com>
      Reported-by: <syzbot+c27b8d5010f45c666ed1@syzkaller.appspotmail.com>
      Reported-by: <syzbot+68a806795ac89df3aa1c@syzkaller.appspotmail.com>
      Fixes: 23c42a40 ("netfilter: ipset: Introduction of new commands and protocol version 7")
      Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
  6. Feb 21, 2020
  7. Feb 20, 2020
    • net: disable BRIDGE_NETFILTER by default · 98bda63e
      Roman Kiryanov authored
      
      The description says 'If unsure, say N.' but
      the module is built as M by default (once
      the dependencies are satisfied).
      
      When the module is selected (Y or M), it enables
      NETFILTER_FAMILY_BRIDGE and SKB_EXTENSIONS
      which alter kernel internal structures.
      
      We (Android Studio Emulator) currently do not
      use this module and think it is more consistent
      to have it disabled by default, as opposed to
      disabling it explicitly to prevent enabling
      NETFILTER_FAMILY_BRIDGE and SKB_EXTENSIONS.
      
      Signed-off-by: Roman Kiryanov <rkir@google.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Distribute switch variables for initialization · 16a556ee
      Kees Cook authored

      Variables declared in a switch statement before any case statements
      cannot be automatically initialized with compiler instrumentation (as
      they are not part of any execution flow). With GCC's proposed automatic
      stack variable initialization feature, this triggers a warning (and they
      don't get initialized). Clang's automatic stack variable initialization
      (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also
      doesn't initialize such variables[1]. Note that these warnings (or silent
      skipping) happen before the dead-store elimination optimization phase,
      so even when the automatic initializations are later elided in favor of
      direct initializations, the warnings remain.
      
      To avoid these problems, move such variables into the "case" where
      they're used or lift them up into the main function body.
      
      net/openvswitch/flow_netlink.c: In function ‘validate_set’:
      net/openvswitch/flow_netlink.c:2711:29: warning: statement will never be executed [-Wswitch-unreachable]
       2711 |  const struct ovs_key_ipv4 *ipv4_key;
            |                             ^~~~~~~~
      
      [1] https://bugs.llvm.org/show_bug.cgi?id=44916
      
      
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
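
The two remedies named in 16a556ee (move the variable into the "case" that uses it, or lift it into the function body) look roughly like this in isolation. The functions are invented for illustration and are not taken from the patched files; the "before" variant is valid C, but its declaration sits where no statement is ever executed, so instrumentation has nowhere to place an initializer.

```c
/* Before: 'len' is declared between 'switch (...) {' and the first case label. */
int payload_len_before(int kind)
{
    switch (kind) {
        int len;    /* control never flows through this point */
    case 1:
        len = 4;
        return len;
    case 2:
        len = 16;
        return len;
    default:
        return -1;
    }
}

/* After, option 1: scope the variable to the case that uses it. */
int payload_len_scoped(int kind)
{
    switch (kind) {
    case 1: {
        int len = 4;

        return len;
    }
    case 2:
        return 16;
    default:
        return -1;
    }
}

/* After, option 2: lift the variable into the main function body. */
int payload_len_lifted(int kind)
{
    int len;

    switch (kind) {
    case 1:
        len = 4;
        break;
    case 2:
        len = 16;
        break;
    default:
        len = -1;
        break;
    }
    return len;
}
```

All three variants return the same values; only the placement of the declaration differs, which is why the change is safe to make mechanically.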
    • net: ip6_gre: Distribute switch variables for initialization · 46d30cb1
      Kees Cook authored

      Variables declared in a switch statement before any case statements
      cannot be automatically initialized with compiler instrumentation (as
      they are not part of any execution flow). With GCC's proposed automatic
      stack variable initialization feature, this triggers a warning (and they
      don't get initialized). Clang's automatic stack variable initialization
      (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also
      doesn't initialize such variables[1]. Note that these warnings (or silent
      skipping) happen before the dead-store elimination optimization phase,
      so even when the automatic initializations are later elided in favor of
      direct initializations, the warnings remain.
      
      To avoid these problems, move such variables into the "case" where
      they're used or lift them up into the main function body.
      
      net/ipv6/ip6_gre.c: In function ‘ip6gre_err’:
      net/ipv6/ip6_gre.c:440:32: warning: statement will never be executed [-Wswitch-unreachable]
        440 |   struct ipv6_tlv_tnl_enc_lim *tel;
            |                                ^~~
      
      net/ipv6/ip6_tunnel.c: In function ‘ip6_tnl_err’:
      net/ipv6/ip6_tunnel.c:520:32: warning: statement will never be executed [-Wswitch-unreachable]
        520 |   struct ipv6_tlv_tnl_enc_lim *tel;
            |                                ^~~
      
      [1] https://bugs.llvm.org/show_bug.cgi?id=44916
      
      
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      46d30cb1
    • Kees Cook's avatar
      net: core: Distribute switch variables for initialization · 161d1792
      Kees Cook authored
      Variables declared in a switch statement before any case statements
      cannot be automatically initialized with compiler instrumentation (as
      they are not part of any execution flow). With GCC's proposed automatic
      stack variable initialization feature, this triggers a warning (and they
      don't get initialized). Clang's automatic stack variable initialization
      (via CONFIG_INIT_STACK_ALL=y) doesn't throw a warning, but it also
      doesn't initialize such variables[1]. Note that these warnings (or silent
      skipping) happen before the dead-store elimination optimization phase,
      so even when the automatic initializations are later elided in favor of
      direct initializations, the warnings remain.
      
      To avoid these problems, move such variables into the "case" where
      they're used or lift them up into the main function body.
      
      net/core/skbuff.c: In function ‘skb_checksum_setup_ip’:
      net/core/skbuff.c:4809:7: warning: statement will never be executed [-Wswitch-unreachable]
       4809 |   int err;
            |       ^~~
      
      [1] https://bugs.llvm.org/show_bug.cgi?id=44916
      
      
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      161d1792
    • Willem de Bruijn's avatar
      udp: rehash on disconnect · 303d0403
      Willem de Bruijn authored
      
      As of the below commit, udp sockets bound to a specific address can
      coexist with one bound to the any addr for the same port.
      
      The commit also phased out the use of socket hashing based only on
      port (hslot), in favor of always hashing on {addr, port} (hslot2).
      
      The change broke the following behavior with disconnect (AF_UNSPEC):
      
          server binds to 0.0.0.0:1337
          server connects to 127.0.0.1:80
          server disconnects
          client connects to 127.0.0.1:1337
          client sends "hello"
          server reads "hello"	// times out, packet did not find sk
      
      On connect the server acquires a specific source addr suitable for
      routing to its destination. On disconnect it reverts to the any addr.
      
      The connect call triggers a rehash to a different hslot2. On
      disconnect, add the same to return to the original hslot2.
      
      Skip this step if the socket is going to be unhashed completely.
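
      The failing sequence described above can be reproduced from user space
      with ordinary socket calls. The following is a minimal sketch, not part
      of the patch: the function name is hypothetical, the port is passed in
      by the caller, and a one-second receive timeout is an added assumption
      so that an affected kernel does not block forever.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Bind to 0.0.0.0:port, connect, disconnect with AF_UNSPEC, then check
 * whether a datagram sent to 127.0.0.1:port still reaches the socket.
 * Returns the number of bytes read by the server, or -1 on error/timeout.
 */
static int udp_disconnect_roundtrip(uint16_t port)
{
	struct sockaddr_in local, remote;
	struct sockaddr unspec;
	struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
	char buf[16];
	int srv, cli, n = -1;

	srv = socket(AF_INET, SOCK_DGRAM, 0);
	cli = socket(AF_INET, SOCK_DGRAM, 0);
	if (srv < 0 || cli < 0)
		goto out;

	/* server binds to the any addr on an explicit port */
	memset(&local, 0, sizeof(local));
	local.sin_family = AF_INET;
	local.sin_addr.s_addr = htonl(INADDR_ANY);
	local.sin_port = htons(port);
	if (bind(srv, (struct sockaddr *)&local, sizeof(local)) < 0)
		goto out;

	/* server connects to 127.0.0.1:80 (no packet is sent for UDP) */
	memset(&remote, 0, sizeof(remote));
	remote.sin_family = AF_INET;
	remote.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	remote.sin_port = htons(80);
	if (connect(srv, (struct sockaddr *)&remote, sizeof(remote)) < 0)
		goto out;

	/* server disconnects: connect() with an AF_UNSPEC address */
	memset(&unspec, 0, sizeof(unspec));
	unspec.sa_family = AF_UNSPEC;
	if (connect(srv, &unspec, sizeof(unspec)) < 0)
		goto out;

	/* assumption: bound the wait so a lost packet shows up as -1 */
	if (setsockopt(srv, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
		goto out;

	/* client connects to 127.0.0.1:port and sends "hello" */
	local.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (connect(cli, (struct sockaddr *)&local, sizeof(local)) < 0)
		goto out;
	if (send(cli, "hello", 5, 0) != 5)
		goto out;

	/* server reads; on an affected kernel this times out */
	n = (int)recv(srv, buf, sizeof(buf), 0);
out:
	if (srv >= 0)
		close(srv);
	if (cli >= 0)
		close(cli);
	return n;
}
```

      Calling udp_disconnect_roundtrip(1337) mirrors the scenario in the
      commit message. On a kernel that rehashes on disconnect the read should
      return the 5 bytes of "hello"; on an affected kernel the lookup misses
      the socket and the call is expected to time out and return -1.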
      
      Fixes: 4cdeeee9 ("net: udp: prefer listeners bound to an address")
      Reported-by: Pavel Roskin <plroskin@gmail.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      303d0403
    • Rohit Maheshwari's avatar
      net/tls: Fix to avoid gettig invalid tls record · 06f5201c
      Rohit Maheshwari authored
      
      The current code doesn't check whether the TCP sequence number is at or
      after the first record's start sequence number. It only checks whether
      the sequence number is before the first record's end sequence number.
      This can always happen in the retransmit case: if the record that a
      requested sequence number belongs to has already been deleted,
      tls_get_record() walks the list from the start, and the check "is the
      sequence number before the end seq of the first record" is always true,
      so it always returns the first record when it should return NULL.

      As part of the fix, walk the records only if the sequence number lies
      within the list; otherwise return NULL.

      One more check is added: the driver looks for the start marker record to
      handle TCP packets that precede the TLS offload start sequence number,
      so return the first record if it is the TLS start marker and the
      sequence number is before that record's starting sequence number.
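
      The lookup rule described above can be modelled in a few lines of
      user-space C. This is a simplified sketch, not the kernel's
      tls_get_record(): the struct, the array-based list and the function
      names are hypothetical, and treating a zero-length record as the start
      marker is an assumption of this model.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record: covers [end_seq - len, end_seq). */
struct rec {
	uint32_t end_seq;
	uint32_t len;	/* assumption: len == 0 denotes the start marker */
};

/* Wraparound-safe "a is before b" on 32-bit sequence numbers. */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Wraparound-safe "lo <= s <= hi". */
static int seq_between(uint32_t s, uint32_t lo, uint32_t hi)
{
	return hi - lo >= s - lo;
}

/*
 * Return the record containing 'seq', or NULL if 'seq' is not covered
 * by the list (for example, its record was already deleted).
 */
static const struct rec *find_record(const struct rec *list, size_t n,
				     uint32_t seq)
{
	size_t i;

	if (n == 0)
		return NULL;

	if (list[0].len != 0) {
		/* Not a start marker: seq must lie within the list. */
		uint32_t start = list[0].end_seq - list[0].len;

		if (!seq_between(seq, start, list[n - 1].end_seq))
			return NULL;
	}

	/*
	 * With a start marker at the head, a seq before the marker's
	 * end_seq matches the marker itself on the first iteration.
	 */
	for (i = 0; i < n; i++) {
		if (seq_before(seq, list[i].end_seq))
			return &list[i];
	}
	return NULL;
}
```

      Without the range check, a sequence number that precedes the first
      record would satisfy seq_before() against that record's end and the
      first record would be returned, which is the bug the commit describes.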
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      06f5201c
  8. Feb 19, 2020
  9. Feb 18, 2020