1. 17 Apr, 2020 3 commits
      svcrdma: Fix leak of svc_rdma_recv_ctxt objects · 23cf1ee1
      Chuck Lever authored
      Utilize the xpo_release_rqst transport method to ensure that each
      rqstp's svc_rdma_recv_ctxt object is released even when the server
      cannot return a Reply for that rqstp.
      
      Without this fix, each RPC whose Reply cannot be sent leaks one
      svc_rdma_recv_ctxt. This is a 2.5KB structure, a 4KB DMA-mapped
      Receive buffer, and any pages that might be part of the Reply
      message.
      
      The leak is infrequent unless the network fabric is unreliable or
      Kerberos is in use, as GSS sequence window overruns, which result
      in connection loss, are more common on fast transports.
      
      Fixes: 3a88092e ("svcrdma: Preserve Receive buffer until svc_rdma_sendto")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      svcrdma: Fix trace point use-after-free race · e28b4fc6
      Chuck Lever authored
      I hit this while testing nfsd-5.7 with kernel memory debugging
      enabled on my server:
      
      Mar 30 13:21:45 klimt kernel: BUG: unable to handle page fault for address: ffff8887e6c279a8
      Mar 30 13:21:45 klimt kernel: #PF: supervisor read access in kernel mode
      Mar 30 13:21:45 klimt kernel: #PF: error_code(0x0000) - not-present page
      Mar 30 13:21:45 klimt kernel: PGD 3601067 P4D 3601067 PUD 87c519067 PMD 87c3e2067 PTE 800ffff8193d8060
      Mar 30 13:21:45 klimt kernel: Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
      Mar 30 13:21:45 klimt kernel: CPU: 2 PID: 1933 Comm: nfsd Not tainted 5.6.0-rc6-00040-g881e87a3c6f9 #1591
      Mar 30 13:21:45 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
      Mar 30 13:21:45 klimt kernel: RIP: 0010:svc_rdma_post_chunk_ctxt+0xab/0x284 [rpcrdma]
      Mar 30 13:21:45 klimt kernel: Code: c1 83 34 02 00 00 29 d0 85 c0 7e 72 48 8b bb a0 02 00 00 48 8d 54 24 08 4c 89 e6 48 8b 07 48 8b 40 20 e8 5a 5c 2b e1 41 89 c6 <8b> 45 20 89 44 24 04 8b 05 02 e9 01 00 85 c0 7e 33 e9 5e 01 00 00
      Mar 30 13:21:45 klimt kernel: RSP: 0018:ffffc90000dfbdd8 EFLAGS: 00010286
      Mar 30 13:21:45 klimt kernel: RAX: 0000000000000000 RBX: ffff8887db8db400 RCX: 0000000000000030
      Mar 30 13:21:45 klimt kernel: RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000246
      Mar 30 13:21:45 klimt kernel: RBP: ffff8887e6c27988 R08: 0000000000000000 R09: 0000000000000004
      Mar 30 13:21:45 klimt kernel: R10: ffffc90000dfbdd8 R11: 00c068ef00000000 R12: ffff8887eb4e4a80
      Mar 30 13:21:45 klimt kernel: R13: ffff8887db8db634 R14: 0000000000000000 R15: ffff8887fc931000
      Mar 30 13:21:45 klimt kernel: FS:  0000000000000000(0000) GS:ffff88885bd00000(0000) knlGS:0000000000000000
      Mar 30 13:21:45 klimt kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Mar 30 13:21:45 klimt kernel: CR2: ffff8887e6c279a8 CR3: 000000081b72e002 CR4: 00000000001606e0
      Mar 30 13:21:45 klimt kernel: Call Trace:
      Mar 30 13:21:45 klimt kernel: ? svc_rdma_vec_to_sg+0x7f/0x7f [rpcrdma]
      Mar 30 13:21:45 klimt kernel: svc_rdma_send_write_chunk+0x59/0xce [rpcrdma]
      Mar 30 13:21:45 klimt kernel: svc_rdma_sendto+0xf9/0x3ae [rpcrdma]
      Mar 30 13:21:45 klimt kernel: ? nfsd_destroy+0x51/0x51 [nfsd]
      Mar 30 13:21:45 klimt kernel: svc_send+0x105/0x1e3 [sunrpc]
      Mar 30 13:21:45 klimt kernel: nfsd+0xf2/0x149 [nfsd]
      Mar 30 13:21:45 klimt kernel: kthread+0xf6/0xfb
      Mar 30 13:21:45 klimt kernel: ? kthread_queue_delayed_work+0x74/0x74
      Mar 30 13:21:45 klimt kernel: ret_from_fork+0x3a/0x50
      Mar 30 13:21:45 klimt kernel: Modules linked in: ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue ib_umad ib_ipoib mlx4_ib sb_edac x86_pkg_temp_thermal iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel glue_helper crypto_simd cryptd pcspkr rpcrdma i2c_i801 rdma_ucm lpc_ich mfd_core ib_iser rdma_cm iw_cm ib_cm mei_me raid0 libiscsi mei sg scsi_transport_iscsi ioatdma wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables xfs libcrc32c mlx4_en sd_mod sr_mod cdrom mlx4_core crc32c_intel igb nvme i2c_algo_bit ahci i2c_core libahci nvme_core dca libata t10_pi qedr dm_mirror dm_region_hash dm_log dm_mod dax qede qed crc8 ib_uverbs ib_core
      Mar 30 13:21:45 klimt kernel: CR2: ffff8887e6c279a8
      Mar 30 13:21:45 klimt kernel: ---[ end trace 87971d2ad3429424 ]---
      
      It's absolutely not safe to use resources pointed to by the @send_wr
      argument of ib_post_send() _after_ that function returns. Those
      resources are typically freed by the Send completion handler, which
      can run before ib_post_send() returns.
      
      Thus the trace points currently around ib_post_send() in the
      server's RPC/RDMA transport are a hazard, even when they are
      disabled. Rearrange them so that they touch the Work Request only
      _before_ ib_post_send() is invoked.
      
      Fixes: bd2abef3 ("svcrdma: Trace key RDMA API events")
      Fixes: 4201c746 ("svcrdma: Introduce svc_rdma_send_ctxt")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
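
The ordering rule described in this commit (touch the Work Request only before posting it) can be sketched with a small userspace model. This is not kernel code: `WorkRequest`, `post_send`, `post_unsafe` and `post_safe` are hypothetical names, and the model assumes the worst case in which the completion handler frees the request before the post call returns.

```python
class WorkRequest:
    """Toy stand-in for a Send Work Request (hypothetical, not struct ib_send_wr)."""

    def __init__(self, num_sge):
        self.num_sge = num_sge
        self.freed = False

    def read_num_sge(self):
        # Any access after release is treated as a use-after-free.
        if self.freed:
            raise RuntimeError("use-after-free: work request already released")
        return self.num_sge


def post_send(wr):
    """Hypothetical post call: the completion handler has already run on return."""
    wr.freed = True  # worst case: the handler released wr immediately
    return 0


def post_unsafe(wr, trace):
    """Records the trace event after posting, so it reads wr too late."""
    rc = post_send(wr)
    trace.append(("post", wr.read_num_sge(), rc))
    return rc


def post_safe(wr, trace):
    """Records everything taken from wr before posting; afterwards uses only rc."""
    trace.append(("post", wr.read_num_sge()))
    rc = post_send(wr)
    if rc:
        trace.append(("post_err", rc))
    return rc
```

In this model `post_safe` never reads the request once it has been handed over, which is the property the rearranged trace points are meant to have.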
      SUNRPC: Fix backchannel RPC soft lockups · 6221f1d9
      Chuck Lever authored
      Currently, after the forward channel connection goes away,
      backchannel operations are causing soft lockups on the server
      because call_transmit_status's SOFTCONN logic ignores ENOTCONN.
      Such backchannel Calls are aggressively retried until the client
      reconnects.
      
      Backchannel Calls should use RPC_TASK_NOCONNECT rather than
      RPC_TASK_SOFTCONN. If there is no forward connection, the server is
      not capable of establishing a connection back to the client, thus
      that backchannel request should fail before the server attempts to
      send it. Commit 58255a4e ("NFSD: NFSv4 callback client should
      use RPC_TASK_SOFTCONN") was merged several years before
      RPC_TASK_NOCONNECT was available.
      
      Because setup_callback_client() explicitly sets NOPING, the NFSv4.0
      callback connection depends on the first callback RPC to initiate
      a connection to the client. Thus NFSv4.0 needs to continue to use
      RPC_TASK_SOFTCONN.
      Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: <stable@vger.kernel.org> # v4.20+
  2. 15 Apr, 2020 6 commits
  3. 14 Apr, 2020 4 commits
  4. 13 Apr, 2020 2 commits
      SUNRPC/cache: Fix unsafe traverse caused double-free in cache_purge · 43e33924
      Yihao Wu authored
      Deleting a list entry within hlist_for_each_entry_safe is not safe unless
      the next pointer (tmp) is protected too. It is not, because once hash_lock
      is released, cache_clean may delete the entry that tmp points to. Then
      cache_purge can walk to a deleted entry and try to double-free it.
      
      Fix this bug by holding only the deleted entry's reference.
      Suggested-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
      Reviewed-by: NeilBrown <neilb@suse.de>
      [ cel: removed unused variable ]
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
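
The traversal hazard in this commit can be illustrated with a toy model. It is a sketch only: the bucket is a plain Python list, `cleaner` stands in for a concurrent cache_clean that runs while the lock is dropped, and all function names are hypothetical. The "unsafe" variant remembers its upcoming entries across the unlocked window; the "safe" variant re-reads the head of the bucket each time and holds only the entry it has just unhashed.

```python
def cleaner(bucket, freed):
    """Concurrent cleaner: frees the first remaining entry, if there is one."""
    if bucket:
        freed.append(bucket.pop(0))


def purge_unsafe(bucket, on_unlock):
    """Keeps cached 'next' entries while the lock is dropped."""
    freed = []
    cached = list(bucket)          # models the saved next pointers
    for entry in cached:
        if entry in bucket:
            bucket.remove(entry)   # unhash under the lock
        on_unlock(bucket, freed)   # lock dropped: the cleaner may run here
        freed.append(entry)        # may free an entry the cleaner already freed
    return freed


def purge_safe(bucket, on_unlock):
    """Takes the current head each time; never trusts a cached next entry."""
    freed = []
    while bucket:
        entry = bucket.pop(0)      # unhash under the lock, hold only this entry
        on_unlock(bucket, freed)   # lock dropped: the cleaner may run here
        freed.append(entry)
    return freed
```

With three entries and the cleaner running in every unlocked window, `purge_unsafe` frees some entries twice, while `purge_safe` frees each entry exactly once.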
      mptcp: fix double-unlock in mptcp_poll · e154659b
      Florian Westphal authored
      mptcp_connect/28740 is trying to release lock (sk_lock-AF_INET) at:
      [<ffffffff82c15869>] mptcp_poll+0xb9/0x550
      but there are no more locks to release!
      Call Trace:
       lock_release+0x50f/0x750
       release_sock+0x171/0x1b0
       mptcp_poll+0xb9/0x550
       sock_poll+0x157/0x470
       ? get_net_ns+0xb0/0xb0
       do_sys_poll+0x63c/0xdd0
      
      The problem is that __mptcp_tcp_fallback() releases the mptcp socket lock,
      but after a recent change it doesn't do this in all of its return paths.
      
      To fix this, remove the unlock from __mptcp_tcp_fallback() and
      always do the unlock in the caller.
      
      Also add a small comment as to why we have this
      __mptcp_needs_tcp_fallback().
      
      Fixes: 0b4f33de ("mptcp: fix tcp fallback crash")
      Reported-by: syzbot+e56606435b7bfeea8cf5@syzkaller.appspotmail.com
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
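
The locking rule adopted here (the helper never unlocks, the caller always does) can be sketched in userspace. This is a minimal model, not the mptcp code: the function names and the `needs_fallback` flag are hypothetical, and `threading.Lock` stands in for the socket lock. Releasing an unlocked `threading.Lock` raises `RuntimeError`, which plays the role of the "no more locks to release" report.

```python
import threading


def fallback_buggy(lock, needs_fallback):
    """Helper that drops the caller's lock on only one of its return paths."""
    if needs_fallback:
        lock.release()
        return "tcp"
    return None  # lock is still held on this path


def poll_buggy(lock, needs_fallback):
    """Caller that unlocks unconditionally, unaware of what the helper did."""
    lock.acquire()
    result = fallback_buggy(lock, needs_fallback)
    lock.release()  # second release if the helper already unlocked
    return result


def fallback_fixed(needs_fallback):
    """Helper that never touches the lock."""
    return "tcp" if needs_fallback else None


def poll_fixed(lock, needs_fallback):
    """Caller owns the lock for the whole call and releases it exactly once."""
    lock.acquire()
    try:
        return fallback_fixed(needs_fallback)
    finally:
        lock.release()
```

Keeping acquire and release in the same function makes every return path of the helper trivially balanced.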
  5. 09 Apr, 2020 5 commits
  6. 08 Apr, 2020 9 commits
      bpf: Fix use of sk->sk_reuseport from sk_assign · 8e368dc7
      Joe Stringer authored
      In testing, we found that for request sockets the sk->sk_reuseport field
      may yet be uninitialized, which caused bpf_sk_assign() to randomly
      succeed or return -ESOCKTNOSUPPORT when handling the forward ACK in a
      three-way handshake.
      
      Fix it by only applying the reuseport check for full sockets.
      
      Fixes: cf7fbe66 ("bpf: Add socket assign support")
      Signed-off-by: Joe Stringer <joe@wand.net.nz>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20200408033540.10339-1-joe@wand.net.nz
      net/tls: fix const assignment warning · f691a25c
      Arnd Bergmann authored
      Building with some experimental patches, I came across a warning
      in the tls code:
      
      include/linux/compiler.h:215:30: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        215 |  *(volatile typeof(x) *)&(x) = (val);  \
            |                              ^
      net/tls/tls_main.c:650:4: note: in expansion of macro 'smp_store_release'
        650 |    smp_store_release(&saved_tcpv4_prot, prot);
      
      This appears to be a legitimate warning about assigning a const pointer
      into the non-const 'saved_tcpv4_prot' global. Annotate both the ipv4 and
      ipv6 pointers 'const' to make the code internally consistent.
      
      Fixes: 5bb4c45d ("net/tls: Read sk_prot once when building tls proto ops")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      l2tp: Allow management of tunnels and session in user namespace · 2abe0523
      Michael Weiß authored
      Creation and management of L2TPv3 tunnels and sessions through netlink
      requires CAP_NET_ADMIN. However, a process with CAP_NET_ADMIN in a
      non-initial user namespace gets an EPERM due to the use of the
      genetlink GENL_ADMIN_PERM flag. Thus, management of L2TP VPNs inside
      an unprivileged container won't work.
      
      We replaced the GENL_ADMIN_PERM by the GENL_UNS_ADMIN_PERM flag
      similar to other network modules which also had this problem, e.g.,
      openvswitch (commit 4a92602a "openvswitch: allow management from
      inside user namespaces") and nl80211 (commit 5617c6cd "nl80211:
      Allow privileged operations from user namespaces").
      
      I tested this in the container runtime trustm3 (trustm3.github.io)
      and was able to create l2tp tunnels and sessions in unprivileged
      (user namespaced) containers using a private network namespace.
      For other runtimes such as docker or lxc this should work, too.
      Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      x86: update AS_* macros to binutils >=2.23, supporting ADX and AVX2 · e6abef61
      Jason A. Donenfeld authored
      Now that the kernel specifies binutils 2.23 as the minimum version, we
      can remove ifdefs for AVX2 and ADX throughout.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      x86: probe assembler capabilities via kconfig instead of makefile · 5e8ebd84
      Jason A. Donenfeld authored
      Doing this probing inside of the Makefiles means we have a maze of
      ifdefs inside the source code and child Makefiles that need to make
      proper decisions on this too. Instead, we do it at Kconfig time, like
      many other compiler and assembler options, which allows us to set up the
      dependencies normally for full compilation units. In the process, the
      ADX test changes to use %eax instead of %r10 so that it's valid in both
      32-bit and 64-bit mode.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      hsr: check protocol version in hsr_newlink() · 4faab8c4
      Taehee Yoo authored
      In the current hsr code, only protocol versions 0 and 1 are valid.
      But the current hsr code doesn't check the version, which is received
      from userspace.
      
      Test commands:
          ip link add dummy0 type dummy
          ip link add dummy1 type dummy
          ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1 version 4
      
      In the test commands, version 4 is invalid,
      so the command should fail.
      
      After this patch, the following error is reported:
      "Error: hsr: Only versions 0..1 are supported."
      
      Fixes: ee1c2797 ("net/hsr: Added support for HSR v1")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
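
The check added by this commit amounts to validating a userspace-supplied value before using it. A minimal sketch follows, assuming a hypothetical helper `check_hsr_version` and modeling the rejection as a Python `ValueError`; the message text is taken from the commit description above.

```python
SUPPORTED_HSR_VERSIONS = (0, 1)


def check_hsr_version(version):
    """Accept only the protocol versions the commit describes as valid."""
    if version not in SUPPORTED_HSR_VERSIONS:
        # Reject before any device state is created from the bad value.
        raise ValueError("hsr: Only versions 0..1 are supported.")
    return version
```

Under this model, the `version 4` request from the test commands is refused, while versions 0 and 1 pass through unchanged.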
      net: sched: Fix setting last executed chain on skb extension · a080da6a
      Paul Blakey authored
      After the driver sets the missed chain on the tc skb extension, it is
      consumed (deleted) by tc_classify_ingress and tc jumps to that chain.
      If tc now misses on this chain (either no match, or no goto action),
      then the last executed chain remains 0, the skb extension is not
      re-added, and the next datapath (ovs) will start from chain 0.
      
      Fix that by setting last executed chain to the chain read from the skb
      extension, so if there is a miss, we set it back.
      
      Fixes: af699626 ("net: sched: Support specifying a starting chain via tc skb ext")
      Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: revert default NAPI poll timeout to 2 jiffies · a4837980
      Konstantin Khlebnikov authored
      For HZ < 1000, a timeout of 2000us rounds up to 1 jiffy, but it expires
      randomly because the next timer interrupt could come shortly after the
      softirq starts.
      
      For commonly used CONFIG_HZ=1000 nothing changes.
      
      Fixes: 7acf8a1e ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")
      Reported-by: Dmitry Yakunin <zeil@yandex-team.ru>
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
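
The rounding described in this commit can be checked with a little arithmetic. The sketch below is a simplified userspace model, not the kernel's conversion routine: `usecs_to_jiffies` here just rounds a microsecond interval up to whole ticks, and `two_jiffies_in_usecs` is a hypothetical helper that shows one way to express "2 jiffies" as a microsecond default.

```python
USEC_PER_SEC = 1_000_000


def usecs_to_jiffies(usecs, hz):
    """Round a microsecond interval up to whole timer ticks (simplified model)."""
    usecs_per_jiffy = USEC_PER_SEC // hz
    return -(-usecs // usecs_per_jiffy)  # ceiling division


def two_jiffies_in_usecs(hz):
    """Microsecond value that corresponds to two timer ticks at the given HZ."""
    return 2 * USEC_PER_SEC // hz


for hz in (100, 250, 1000):
    fixed_2000us = usecs_to_jiffies(2000, hz)
    scaled = two_jiffies_in_usecs(hz)
    print(f"HZ={hz}: 2000us -> {fixed_2000us} jiffies, 2 jiffies = {scaled}us")
```

With a budget of a single jiffy, the deadline is the next tick, which may arrive almost immediately after the softirq starts; a budget of two jiffies guarantees at least one full tick of run time. For HZ=1000 both expressions give the same 2000us.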
      net: icmp6: do not select saddr from iif when route has prefsrc set · b93cfb9c
      Tim Stallard authored
      Since commit fac6fce9 ("net: icmp6: provide input address for
      traceroute6"), ICMPv6 errors have taken their source address from the
      ingress interface. However, this overrides source address selection
      that is configured by setting preferred source addresses on routes.
      
      This can result in ICMP errors being lost to upstream BCP38 filters
      when the wrong source addresses are used, breaking path MTU discovery
      and traceroute.
      
      This patch sets the modified source address selection to only take place
      when the route used has no prefsrc set.
      
      It can be tested with:
      
      ip link add v1 type veth peer name v2
      ip netns add test
      ip netns exec test ip link set lo up
      ip link set v2 netns test
      ip link set v1 up
      ip netns exec test ip link set v2 up
      ip addr add 2001:db8::1/64 dev v1 nodad
      ip addr add 2001:db8::3 dev v1 nodad
      ip netns exec test ip addr add 2001:db8::2/64 dev v2 nodad
      ip netns exec test ip route add unreachable 2001:db8:1::1
      ip netns exec test ip addr add 2001:db8:100::1 dev lo
      ip netns exec test ip route add 2001:db8::1 dev v2 src 2001:db8:100::1
      ip route add 2001:db8:1000::1 via 2001:db8::2
      traceroute6 -s 2001:db8::1 2001:db8:1000::1
      traceroute6 -s 2001:db8::3 2001:db8:1000::1
      ip netns delete test
      
      Output before:
      $ traceroute6 -s 2001:db8::1 2001:db8:1000::1
      traceroute to 2001:db8:1000::1 (2001:db8:1000::1), 30 hops max, 80 byte packets
       1  2001:db8::2 (2001:db8::2)  0.843 ms !N  0.396 ms !N  0.257 ms !N
      $ traceroute6 -s 2001:db8::3 2001:db8:1000::1
      traceroute to 2001:db8:1000::1 (2001:db8:1000::1), 30 hops max, 80 byte packets
       1  2001:db8::2 (2001:db8::2)  0.772 ms !N  0.257 ms !N  0.357 ms !N
      
      After:
      $ traceroute6 -s 2001:db8::1 2001:db8:1000::1
      traceroute to 2001:db8:1000::1 (2001:db8:1000::1), 30 hops max, 80 byte packets
       1  2001:db8:100::1 (2001:db8:100::1)  8.885 ms !N  0.310 ms !N  0.174 ms !N
      $ traceroute6 -s 2001:db8::3 2001:db8:1000::1
      traceroute to 2001:db8:1000::1 (2001:db8:1000::1), 30 hops max, 80 byte packets
       1  2001:db8::2 (2001:db8::2)  1.403 ms !N  0.205 ms !N  0.313 ms !N
      
      Fixes: fac6fce9 ("net: icmp6: provide input address for traceroute6")
      Signed-off-by: Tim Stallard <code@timstallard.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
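
The selection rule in this commit can be condensed into a small decision function. This is a sketch of the behavior shown in the "After" output above, not the kernel implementation: `icmp6_error_source` is a hypothetical name, and the route's preferred source is passed in as `None` when no prefsrc is set.

```python
def icmp6_error_source(ingress_addr, route_prefsrc):
    """Pick the source address for an ICMPv6 error in this simplified model."""
    if route_prefsrc is not None:
        # The route carries a preferred source: do not override it.
        return route_prefsrc
    # No prefsrc on the route: fall back to the ingress interface address.
    return ingress_addr


# Addresses from the test setup above.
print(icmp6_error_source("2001:db8::2", "2001:db8:100::1"))
print(icmp6_error_source("2001:db8::2", None))
```

The first call corresponds to the reply routed via the `src 2001:db8:100::1` route, the second to the reply toward 2001:db8::3, for which no preferred source is configured.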
  7. 07 Apr, 2020 2 commits
  8. 06 Apr, 2020 3 commits
  9. 05 Apr, 2020 4 commits
      netfilter: nf_tables: do not leave dangling pointer in nf_tables_set_alloc_name · 7fb6f78d
      Eric Dumazet authored
      If nf_tables_set_alloc_name() frees set->name, we better
      clear set->name to avoid a future use-after-free or invalid-free.
      
      BUG: KASAN: double-free or invalid-free in nf_tables_newset+0x1ed6/0x2560 net/netfilter/nf_tables_api.c:4148
      
      CPU: 0 PID: 28233 Comm: syz-executor.0 Not tainted 5.6.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x188/0x20d lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xd3/0x315 mm/kasan/report.c:374
       kasan_report_invalid_free+0x61/0xa0 mm/kasan/report.c:468
       __kasan_slab_free+0x129/0x140 mm/kasan/common.c:455
       __cache_free mm/slab.c:3426 [inline]
       kfree+0x109/0x2b0 mm/slab.c:3757
       nf_tables_newset+0x1ed6/0x2560 net/netfilter/nf_tables_api.c:4148
       nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433
       nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline]
       nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561
       netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
       netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
       netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2345
       ___sys_sendmsg+0x100/0x170 net/socket.c:2399
       __sys_sendmsg+0xec/0x1b0 net/socket.c:2432
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x45c849
      Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fe5ca21dc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007fe5ca21e6d4 RCX: 000000000045c849
      RDX: 0000000000000000 RSI: 0000000020000c40 RDI: 0000000000000003
      RBP: 000000000076bf00 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 000000000000095b R14: 00000000004cc0e9 R15: 000000000076bf0c
      
      Allocated by task 28233:
       save_stack+0x1b/0x80 mm/kasan/common.c:72
       set_track mm/kasan/common.c:80 [inline]
       __kasan_kmalloc mm/kasan/common.c:515 [inline]
       __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:488
       __do_kmalloc mm/slab.c:3656 [inline]
       __kmalloc_track_caller+0x159/0x790 mm/slab.c:3671
       kvasprintf+0xb5/0x150 lib/kasprintf.c:25
       kasprintf+0xbb/0xf0 lib/kasprintf.c:59
       nf_tables_set_alloc_name net/netfilter/nf_tables_api.c:3536 [inline]
       nf_tables_newset+0x1543/0x2560 net/netfilter/nf_tables_api.c:4088
       nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433
       nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline]
       nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561
       netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
       netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
       netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2345
       ___sys_sendmsg+0x100/0x170 net/socket.c:2399
       __sys_sendmsg+0xec/0x1b0 net/socket.c:2432
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 28233:
       save_stack+0x1b/0x80 mm/kasan/common.c:72
       set_track mm/kasan/common.c:80 [inline]
       kasan_set_free_info mm/kasan/common.c:337 [inline]
       __kasan_slab_free+0xf7/0x140 mm/kasan/common.c:476
       __cache_free mm/slab.c:3426 [inline]
       kfree+0x109/0x2b0 mm/slab.c:3757
       nf_tables_set_alloc_name net/netfilter/nf_tables_api.c:3544 [inline]
       nf_tables_newset+0x1f73/0x2560 net/netfilter/nf_tables_api.c:4088
       nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433
       nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline]
       nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561
       netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
       netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
       netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6b9/0x7d0 net/socket.c:2345
       ___sys_sendmsg+0x100/0x170 net/socket.c:2399
       __sys_sendmsg+0xec/0x1b0 net/socket.c:2432
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8880a6032d00
       which belongs to the cache kmalloc-32 of size 32
      The buggy address is located 0 bytes inside of
       32-byte region [ffff8880a6032d00, ffff8880a6032d20)
      The buggy address belongs to the page:
      page:ffffea0002980c80 refcount:1 mapcount:0 mapping:ffff8880aa0001c0 index:0xffff8880a6032fc1
      flags: 0xfffe0000000200(slab)
      raw: 00fffe0000000200 ffffea0002a3be88 ffffea00029b1908 ffff8880aa0001c0
      raw: ffff8880a6032fc1 ffff8880a6032000 000000010000003e 0000000000000000
      page dumped because: kasan: bad access detected
      
      Fixes: 65038428 ("netfilter: nf_tables: allow to specify stateful expression in set definition")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      netfilter: xt_IDLETIMER: target v1 - match Android layout · bc9fe614
      Maciej Żenczykowski authored
      Android has long had an extension to IDLETIMER to send netlink
      messages to userspace, see:
        https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/include/uapi/linux/netfilter/xt_IDLETIMER.h#42
      Note: this is idletimer target rev 1, there is no rev 0 in
      the Android common kernel sources, see registration at:
        https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/net/netfilter/xt_IDLETIMER.c#483
      
      When we compare that to upstream's new idletimer target rev 1:
        https://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git/tree/include/uapi/linux/netfilter/xt_IDLETIMER.h#n46
      
      We immediately notice that these two rev 1 structs are the
      same size and layout, and that while timer_type and send_nl_msg
      are differently named and serve a different purpose, they're
      at the same offset.
      
      This makes them impossible to tell apart - and thus one cannot
      know in a mixed Android/vanilla environment whether one means
      timer_type or send_nl_msg.
      
      Since this is iptables/netfilter uapi it introduces a problem
      between iptables (vanilla vs Android) userspace and kernel
      (vanilla vs Android) if the two don't match each other.
      
      Additionally when at some point in the future Android picks up
      5.7+ it's not at all clear how to resolve the resulting merge
      conflict.
      
      Furthermore, since upgrading the kernel on old Android phones
      is pretty much impossible there does not seem to be an easy way
      out of this predicament.
      
      The only thing I've been able to come up with is some super
      disgusting kernel version >= 5.7 check in the iptables binary
      to flip between different struct layouts.
      
      By adding a dummy field to the vanilla Linux kernel header file
      we can force the two structs to be compatible with each other.
      
      Long term I think I would like to deprecate send_nl_msg out of
      Android entirely, but I haven't quite been able to figure out
      exactly how we depend on it.  It seems to be very similar to
      sysfs notifications but with some extra info.
      
      Currently it's actually always enabled whenever Android uses
      the IDLETIMER target, so we could also probably entirely
      remove it from the uapi in favour of just always enabling it,
      but again we can't upgrade old kernels already in the field.
      
      (Also note that this doesn't change the structure's size,
      as it is simply fitting into the pre-existing padding, and
      that since 5.7 hasn't been released yet, there's still time
      to make this uapi visible change)
      
      Cc: Manoj Basapathi <manojbm@codeaurora.org>
      Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
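
The claim that a dummy field can sit in pre-existing padding without changing the structure's size can be demonstrated with `ctypes`. The layouts below are illustrative assumptions, not copies of the uapi header: the label length, the field order and the exact field types are hypothetical, chosen only so that a one-byte field is followed by padding before a pointer.

```python
import ctypes

LABEL_SIZE = 28  # hypothetical label length, chosen for illustration


class InfoWithoutDummy(ctypes.Structure):
    """Illustrative layout with a single one-byte field before the pointer."""
    _fields_ = [
        ("timeout", ctypes.c_uint32),
        ("label", ctypes.c_char * LABEL_SIZE),
        ("timer_type", ctypes.c_uint8),
        ("timer", ctypes.c_void_p),  # alignment leaves padding before this field
    ]


class InfoWithDummy(ctypes.Structure):
    """Same layout with a placeholder byte inserted ahead of timer_type."""
    _fields_ = [
        ("timeout", ctypes.c_uint32),
        ("label", ctypes.c_char * LABEL_SIZE),
        ("send_nl_msg", ctypes.c_uint8),  # placeholder occupying former padding
        ("timer_type", ctypes.c_uint8),
        ("timer", ctypes.c_void_p),
    ]


print(ctypes.sizeof(InfoWithoutDummy), ctypes.sizeof(InfoWithDummy))
print(InfoWithoutDummy.timer_type.offset, InfoWithDummy.timer_type.offset)
```

In this sketch the placeholder takes the offset that `timer_type` used to have, `timer_type` moves one byte further, and the pointer offset and total size stay the same because the extra byte is absorbed by alignment padding.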
      netfilter: nf_tables: do not update stateful expressions if lookup is inverted · a26c1e49
      Pablo Neira Ayuso authored
      Initialize the set lookup matching element to NULL. Otherwise, the
      NFT_LOOKUP_F_INV flag reverses the matching logic, which leads to
      dereferencing an uninitialized pointer to the matching element. Make
      sure the element data area and stateful expression are accessed only
      if there is a matching set element.
      
      This patch undoes 24791b9a ("netfilter: nft_set_bitmap: initialize set
      element extension in lookups") which is not required anymore.
      
      Fixes: 339706bc ("netfilter: nft_lookup: update element stateful expression")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
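
The control flow this commit describes can be modeled in a few lines. Python has no uninitialized pointers, so the sketch only shows the shape of the logic: the matching element starts out as `None`, an inverted lookup can "succeed" without any element, and the element's state is touched only when one was really found. The function name, the dictionary-based set and the `"break"`/`"continue"` verdict strings are all hypothetical.

```python
def lookup_eval(set_elems, key, invert):
    """Evaluate a (possibly inverted) set lookup in this simplified model."""
    ext = None                       # matching element starts out empty
    found = key in set_elems
    if found:
        ext = set_elems[key]

    matched = found != invert        # the invert flag reverses the result
    if not matched:
        return "break", None

    if ext is not None:              # only touch state of a real element
        ext["counter"] += 1
    return "continue", ext
```

An inverted lookup for a key that is absent continues rule evaluation with no element at all, which is exactly the case where an unguarded access would go wrong.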
      netfilter: nft_set_rbtree: Drop spurious condition for overlap detection on insertion · 72239f27
      Stefano Brivio authored
      Case a1. for overlap detection in __nft_rbtree_insert() is not a valid
      one: start-after-start is not needed to detect any type of interval
      overlap and it actually results in a false positive if, while
      descending the tree, this is the only step we hit after starting from
      the root.
      
      This introduced a regression, as reported by Pablo, in Python test
      cases ip/ip.t and ip/numgen.t:
      
        ip/ip.t: ERROR: line 124: add rule ip test-ip4 input ip hdrlength vmap { 0-4 : drop, 5 : accept, 6 : continue } counter: This rule should not have failed.
        ip/numgen.t: ERROR: line 7: add rule ip test-ip4 pre dnat to numgen inc mod 10 map { 0-5 : 192.168.10.100, 6-9 : 192.168.20.200}: This rule should not have failed.
      
      Drop case a1. and renumber others, so that they are a bit clearer. In
      order for these diagrams to be readily understandable, a bigger rework
      is probably needed, such as an ASCII art of the actual rbtree (instead
      of a flattened version).
      
      Shell script test sets/0044interval_overlap_0 should cover all
      possible cases for false negatives, so I consider that test case still
      sufficient after this change.
      
      v2: Fix comments for cases a3. and b3.
      Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Fixes: 7c84d414 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  10. 04 Apr, 2020 1 commit
  11. 03 Apr, 2020 1 commit
      mptcp: add some missing pr_fmt defines · c85adced
      Geliang Tang authored
      Some of the mptcp logs didn't print out the format string:
      
      [  185.651493] DSS
      [  185.651494] data_fin=0 dsn64=0 use_map=0 ack64=1 use_ack=1
      [  185.651494] data_ack=13792750332298763796
      [  185.651495] MPTCP: msk=00000000c4b81cfc ssk=000000009743af53 data_avail=0 skb=0000000063dc595d
      [  185.651495] MPTCP: msk=00000000c4b81cfc ssk=000000009743af53 status=0
      [  185.651495] MPTCP: msk ack_seq=9bbc894565aa2f9a subflow ack_seq=9bbc894565aa2f9a
      [  185.651496] MPTCP: msk=00000000c4b81cfc ssk=000000009743af53 data_avail=1 skb=0000000012e809e1
      
      So this patch adds these missing pr_fmt defines. Then we get the same
      format string "MPTCP" in all mptcp logs, like this:
      
      [  142.795829] MPTCP: DSS
      [  142.795829] MPTCP: data_fin=0 dsn64=0 use_map=0 ack64=1 use_ack=1
      [  142.795829] MPTCP: data_ack=8089704603109242421
      [  142.795830] MPTCP: msk=00000000133a24e0 ssk=000000002e508c64 data_avail=0 skb=00000000d5f230df
      [  142.795830] MPTCP: msk=00000000133a24e0 ssk=000000002e508c64 status=0
      [  142.795831] MPTCP: msk ack_seq=66790290f1199d9b subflow ack_seq=66790290f1199d9b
      [  142.795831] MPTCP: msk=00000000133a24e0 ssk=000000002e508c64 data_avail=1 skb=00000000de5aca2e
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>