Skip to content
Snippets Groups Projects
  1. Mar 19, 2021
  2. Mar 17, 2021
    • Pablo Neira Ayuso's avatar
      netfilter: nftables: allow to update flowtable flags · 7b35582c
      Pablo Neira Ayuso authored
      
      Honor flowtable flags from the control update path. Disallow disabling
      to toggle hardware offload support though.
      
      Fixes: 8bb69f3b ("netfilter: nf_tables: add flowtable offload control plane")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      7b35582c
    • Alexei Starovoitov's avatar
      bpf: Fix fexit trampoline. · e21aa341
      Alexei Starovoitov authored
      
      The fexit/fmod_ret programs can be attached to kernel functions that can sleep.
      The synchronize_rcu_tasks() will not wait for such tasks to complete.
      In such case the trampoline image will be freed and when the task
      wakes up the return IP will point to freed memory causing the crash.
      Solve this by adding percpu_ref_get/put for the duration of trampoline
      and separate trampoline vs its image life times.
      The "half page" optimization has to be removed, since
      first_half->second_half->first_half transition cannot be guaranteed to
      complete in deterministic time. Every trampoline update becomes a new image.
      The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and
      call_rcu_tasks. Together they will wait for the original function and
      trampoline asm to complete. The trampoline is patched from nop to jmp to skip
      fexit progs. They are freed independently from the trampoline. The image with
      fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which
      will wait for both sleepable and non-sleepable progs to complete.
      
      Fixes: fec56f58 ("bpf: Introduce BPF trampoline")
      Reported-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: Paul E. McKenney <paulmck@kernel.org>  # for RCU
      Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
      e21aa341
    • Wei Wang's avatar
      net: fix race between napi kthread mode and busy poll · cb038357
      Wei Wang authored
      
      Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
      determine if the kthread owns this napi and could call napi->poll() on
      it. However, if socket busy poll is enabled, it is possible that the
      busy poll thread grabs this SCHED bit (after the previous napi->poll()
      invokes napi_complete_done() and clears SCHED bit) and tries to poll
      on the same napi. napi_disable() could grab the SCHED bit as well.
      This patch tries to fix this race by adding a new bit
      NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
      ____napi_schedule() if the threaded mode is enabled, and gets cleared
      in napi_complete_done(), and we only poll the napi in kthread if this
      bit is set. This helps distinguish the ownership of the napi between
      kthread and other scenarios and fixes the race issue.
      
      Fixes: 29863d41 ("net: implement threaded-able napi poll loop support")
      Reported-by: default avatarMartin Zaharinov <micron10@gmail.com>
      Suggested-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Cc: Alexander Duyck <alexanderduyck@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cb038357
  3. Mar 16, 2021
    • wenxu's avatar
      net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct · d29334c1
      wenxu authored
      
      When openvswitch conntrack offload with act_ct action. The first rule
      do conntrack in the act_ct in tc subsystem. And miss the next rule in
      the tc and fallback to the ovs datapath but miss set post_ct flag
      which will lead the ct_state_key with -trk flag.
      
      Fixes: 7baf2429 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
      Signed-off-by: default avatarwenxu <wenxu@ucloud.cn>
      Reviewed-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d29334c1
    • Martin Willi's avatar
      can: dev: Move device back to init netns on owning netns delete · 3a5ca857
      Martin Willi authored
      When a non-initial netns is destroyed, the usual policy is to delete
      all virtual network interfaces contained, but move physical interfaces
      back to the initial netns. This keeps the physical interface visible
      on the system.
      
      CAN devices are somewhat special, as they define rtnl_link_ops even
      if they are physical devices. If a CAN interface is moved into a
      non-initial netns, destroying that netns lets the interface vanish
      instead of moving it back to the initial netns. default_device_exit()
      skips CAN interfaces due to having rtnl_link_ops set. Reproducer:
      
        ip netns add foo
        ip link set can0 netns foo
        ip netns delete foo
      
      WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60
      CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1
      Workqueue: netns cleanup_net
      [<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14)
      [<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8)
      [<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114)
      [<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac)
      [<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60)
      [<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380)
      [<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438)
      [<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8)
      [<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c)
      [<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c)
      
      To properly restore physical CAN devices to the initial netns on owning
      netns exit, introduce a flag on rtnl_link_ops that can be set by drivers.
      For CAN devices setting this flag, default_device_exit() considers them
      non-virtual, applying the usual namespace move.
      
      The issue was introduced in the commit mentioned below, as at that time
      CAN devices did not have a dellink() operation.
      
      Fixes: e008b5fc ("net: Simplfy default_device_exit and improve batching.")
      Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.org
      
      
      Signed-off-by: default avatarMartin Willi <martin@strongswan.org>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      3a5ca857
  4. Mar 15, 2021
  5. Mar 12, 2021
  6. Mar 10, 2021
    • Eric Dumazet's avatar
      net: sched: validate stab values · e323d865
      Eric Dumazet authored
      
      iproute2 package is well behaved, but malicious user space can
      provide illegal shift values and trigger UBSAN reports.
      
      Add stab parameter to red_check_params() to validate user input.
      
      syzbot reported:
      
      UBSAN: shift-out-of-bounds in ./include/net/red.h:312:18
      shift exponent 111 is too large for 64-bit type 'long unsigned int'
      CPU: 1 PID: 14662 Comm: syz-executor.3 Not tainted 5.12.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x141/0x1d7 lib/dump_stack.c:120
       ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
       __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327
       red_calc_qavg_from_idle_time include/net/red.h:312 [inline]
       red_calc_qavg include/net/red.h:353 [inline]
       choke_enqueue.cold+0x18/0x3dd net/sched/sch_choke.c:221
       __dev_xmit_skb net/core/dev.c:3837 [inline]
       __dev_queue_xmit+0x1943/0x2e00 net/core/dev.c:4150
       neigh_hh_output include/net/neighbour.h:499 [inline]
       neigh_output include/net/neighbour.h:508 [inline]
       ip6_finish_output2+0x911/0x1700 net/ipv6/ip6_output.c:117
       __ip6_finish_output net/ipv6/ip6_output.c:182 [inline]
       __ip6_finish_output+0x4c1/0xe10 net/ipv6/ip6_output.c:161
       ip6_finish_output+0x35/0x200 net/ipv6/ip6_output.c:192
       NF_HOOK_COND include/linux/netfilter.h:290 [inline]
       ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:215
       dst_output include/net/dst.h:448 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       NF_HOOK include/linux/netfilter.h:295 [inline]
       ip6_xmit+0x127e/0x1eb0 net/ipv6/ip6_output.c:320
       inet6_csk_xmit+0x358/0x630 net/ipv6/inet6_connection_sock.c:135
       dccp_transmit_skb+0x973/0x12c0 net/dccp/output.c:138
       dccp_send_reset+0x21b/0x2b0 net/dccp/output.c:535
       dccp_finish_passive_close net/dccp/proto.c:123 [inline]
       dccp_finish_passive_close+0xed/0x140 net/dccp/proto.c:118
       dccp_terminate_connection net/dccp/proto.c:958 [inline]
       dccp_close+0xb3c/0xe60 net/dccp/proto.c:1028
       inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
       inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:478
       __sock_release+0xcd/0x280 net/socket.c:599
       sock_close+0x18/0x20 net/socket.c:1258
       __fput+0x288/0x920 fs/file_table.c:280
       task_work_run+0xdd/0x1a0 kernel/task_work.c:140
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
      
      Fixes: 8afa10cb ("net_sched: red: Avoid illegal values")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e323d865
    • Eric Dumazet's avatar
      macvlan: macvlan_count_rx() needs to be aware of preemption · dd4fa1da
      Eric Dumazet authored
      
      macvlan_count_rx() can be called from process context, it is thus
      necessary to disable preemption before calling u64_stats_update_begin()
      
      syzbot was able to spot this on 32bit arch:
      
      WARNING: CPU: 1 PID: 4632 at include/linux/seqlock.h:271 __seqprop_assert include/linux/seqlock.h:271 [inline]
      WARNING: CPU: 1 PID: 4632 at include/linux/seqlock.h:271 __seqprop_assert.constprop.0+0xf0/0x11c include/linux/seqlock.h:269
      Modules linked in:
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 1 PID: 4632 Comm: kworker/1:3 Not tainted 5.12.0-rc2-syzkaller #0
      Hardware name: ARM-Versatile Express
      Workqueue: events macvlan_process_broadcast
      Backtrace:
      [<82740468>] (dump_backtrace) from [<827406dc>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
       r7:00000080 r6:60000093 r5:00000000 r4:8422a3c4
      [<827406c4>] (show_stack) from [<82751b58>] (__dump_stack lib/dump_stack.c:79 [inline])
      [<827406c4>] (show_stack) from [<82751b58>] (dump_stack+0xb8/0xe8 lib/dump_stack.c:120)
      [<82751aa0>] (dump_stack) from [<82741270>] (panic+0x130/0x378 kernel/panic.c:231)
       r7:830209b4 r6:84069ea4 r5:00000000 r4:844350d0
      [<82741140>] (panic) from [<80244924>] (__warn+0xb0/0x164 kernel/panic.c:605)
       r3:8404ec8c r2:00000000 r1:00000000 r0:830209b4
       r7:0000010f
      [<80244874>] (__warn) from [<82741520>] (warn_slowpath_fmt+0x68/0xd4 kernel/panic.c:628)
       r7:81363f70 r6:0000010f r5:83018e50 r4:00000000
      [<827414bc>] (warn_slowpath_fmt) from [<81363f70>] (__seqprop_assert include/linux/seqlock.h:271 [inline])
      [<827414bc>] (warn_slowpath_fmt) from [<81363f70>] (__seqprop_assert.constprop.0+0xf0/0x11c include/linux/seqlock.h:269)
       r8:5a109000 r7:0000000f r6:a568dac0 r5:89802300 r4:00000001
      [<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (u64_stats_update_begin include/linux/u64_stats_sync.h:128 [inline])
      [<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (macvlan_count_rx include/linux/if_macvlan.h:47 [inline])
      [<81363e80>] (__seqprop_assert.constprop.0) from [<81364af0>] (macvlan_broadcast+0x154/0x26c drivers/net/macvlan.c:291)
       r5:89802300 r4:8a927740
      [<8136499c>] (macvlan_broadcast) from [<81365020>] (macvlan_process_broadcast+0x258/0x2d0 drivers/net/macvlan.c:317)
       r10:81364f78 r9:8a86d000 r8:8a9c7e7c r7:8413aa5c r6:00000000 r5:00000000
       r4:89802840
      [<81364dc8>] (macvlan_process_broadcast) from [<802696a4>] (process_one_work+0x2d4/0x998 kernel/workqueue.c:2275)
       r10:00000008 r9:8404ec98 r8:84367a02 r7:ddfe6400 r6:ddfe2d40 r5:898dac80
       r4:8a86d43c
      [<802693d0>] (process_one_work) from [<80269dcc>] (worker_thread+0x64/0x54c kernel/workqueue.c:2421)
       r10:00000008 r9:8a9c6000 r8:84006d00 r7:ddfe2d78 r6:898dac94 r5:ddfe2d40
       r4:898dac80
      [<80269d68>] (worker_thread) from [<80271f40>] (kthread+0x184/0x1a4 kernel/kthread.c:292)
       r10:85247e64 r9:898dac80 r8:80269d68 r7:00000000 r6:8a9c6000 r5:89a2ee40
       r4:8a97bd00
      [<80271dbc>] (kthread) from [<80200114>] (ret_from_fork+0x14/0x20 arch/arm/kernel/entry-common.S:158)
      Exception stack(0x8a9c7fb0 to 0x8a9c7ff8)
      
      Fixes: 412ca155 ("macvlan: Move broadcasts into a work queue")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Acked-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dd4fa1da
    • Wei Wang's avatar
      ipv6: fix suspecious RCU usage warning · 28259bac
      Wei Wang authored
      
      Syzbot reported the suspecious RCU usage in nexthop_fib6_nh() when
      called from ipv6_route_seq_show(). The reason is ipv6_route_seq_start()
      calls rcu_read_lock_bh(), while nexthop_fib6_nh() calls
      rcu_dereference_rtnl().
      The fix proposed is to add a variant of nexthop_fib6_nh() to use
      rcu_dereference_bh_rtnl() for ipv6_route_seq_show().
      
      The reported trace is as follows:
      ./include/net/nexthop.h:416 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by syz-executor.0/17895:
           at: seq_read+0x71/0x12a0 fs/seq_file.c:169
           at: seq_file_net include/linux/seq_file_net.h:19 [inline]
           at: ipv6_route_seq_start+0xaf/0x300 net/ipv6/ip6_fib.c:2616
      
      stack backtrace:
      CPU: 1 PID: 17895 Comm: syz-executor.0 Not tainted 4.15.0-syzkaller #0
      Call Trace:
       [<ffffffff849edf9e>] __dump_stack lib/dump_stack.c:17 [inline]
       [<ffffffff849edf9e>] dump_stack+0xd8/0x147 lib/dump_stack.c:53
       [<ffffffff8480b7fa>] lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5745
       [<ffffffff8459ada6>] nexthop_fib6_nh include/net/nexthop.h:416 [inline]
       [<ffffffff8459ada6>] ipv6_route_native_seq_show net/ipv6/ip6_fib.c:2488 [inline]
       [<ffffffff8459ada6>] ipv6_route_seq_show+0x436/0x7a0 net/ipv6/ip6_fib.c:2673
       [<ffffffff81c556df>] seq_read+0xccf/0x12a0 fs/seq_file.c:276
       [<ffffffff81dbc62c>] proc_reg_read+0x10c/0x1d0 fs/proc/inode.c:231
       [<ffffffff81bc28ae>] do_loop_readv_writev fs/read_write.c:714 [inline]
       [<ffffffff81bc28ae>] do_loop_readv_writev fs/read_write.c:701 [inline]
       [<ffffffff81bc28ae>] do_iter_read+0x49e/0x660 fs/read_write.c:935
       [<ffffffff81bc81ab>] vfs_readv+0xfb/0x170 fs/read_write.c:997
       [<ffffffff81c88847>] kernel_readv fs/splice.c:361 [inline]
       [<ffffffff81c88847>] default_file_splice_read+0x487/0x9c0 fs/splice.c:416
       [<ffffffff81c86189>] do_splice_to+0x129/0x190 fs/splice.c:879
       [<ffffffff81c86f66>] splice_direct_to_actor+0x256/0x890 fs/splice.c:951
       [<ffffffff81c8777d>] do_splice_direct+0x1dd/0x2b0 fs/splice.c:1060
       [<ffffffff81bc4747>] do_sendfile+0x597/0xce0 fs/read_write.c:1459
       [<ffffffff81bca205>] SYSC_sendfile64 fs/read_write.c:1520 [inline]
       [<ffffffff81bca205>] SyS_sendfile64+0x155/0x170 fs/read_write.c:1506
       [<ffffffff81015fcf>] do_syscall_64+0x1ff/0x310 arch/x86/entry/common.c:305
       [<ffffffff84a00076>] entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Fixes: f88d8ea6 ("ipv6: Plumb support for nexthop object in a fib6_info")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Ido Schimmel <idosch@idosch.org>
      Cc: Petr Machata <petrm@nvidia.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28259bac
    • Daniel Borkmann's avatar
      net: Consolidate common blackhole dst ops · c4c877b2
      Daniel Borkmann authored
      
      Move generic blackhole dst ops to the core and use them from both
      ipv4_dst_blackhole_ops and ip6_dst_blackhole_ops where possible. No
      functional change otherwise. We need these also in other locations
      and having to define them over and over again is not great.
      
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c4c877b2
    • Maor Gottlieb's avatar
      net/mlx5: Set QP timestamp mode to default · 4806f1e2
      Maor Gottlieb authored
      
      QPs which don't care from timestamp mode, should set the ts_format
      to default, otherwise the QP creation could be failed if the timestamp
      mode is not supported.
      
      Fixes: 2fe8d4b8 ("RDMA/mlx5: Fail QP creation if the device can not support the CQE TS")
      Signed-off-by: default avatarMaor Gottlieb <maorg@nvidia.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      4806f1e2
    • Balazs Nemeth's avatar
      net: check if protocol extracted by virtio_net_hdr_set_proto is correct · 924a9bc3
      Balazs Nemeth authored
      
      For gso packets, virtio_net_hdr_set_proto sets the protocol (if it isn't
      set) based on the type in the virtio net hdr, but the skb could contain
      anything since it could come from packet_snd through a raw socket. If
      there is a mismatch between what virtio_net_hdr_set_proto sets and
      the actual protocol, then the skb could be handled incorrectly later
      on.
      
      An example where this poses an issue is with the subsequent call to
      skb_flow_dissect_flow_keys_basic which relies on skb->protocol being set
      correctly. A specially crafted packet could fool
      skb_flow_dissect_flow_keys_basic preventing EINVAL to be returned.
      
      Avoid blindly trusting the information provided by the virtio net header
      by checking that the protocol in the packet actually matches the
      protocol set by virtio_net_hdr_set_proto. Note that since the protocol
      is only checked if skb->dev implements header_ops->parse_protocol,
      packets from devices without the implementation are not checked at this
      stage.
      
      Fixes: 9274124f ("net: stricter validation of untrusted gso packets")
      Signed-off-by: default avatarBalazs Nemeth <bnemeth@redhat.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      924a9bc3
  7. Mar 09, 2021
    • Yonghong Song's avatar
      bpf: Don't do bpf_cgroup_storage_set() for kuprobe/tp programs · 05a68ce5
      Yonghong Song authored
      For kuprobe and tracepoint bpf programs, kernel calls
      trace_call_bpf() which calls BPF_PROG_RUN_ARRAY_CHECK()
      to run the program array. Currently, BPF_PROG_RUN_ARRAY_CHECK()
      also calls bpf_cgroup_storage_set() to set percpu
      cgroup local storage with NULL value. This is
      due to Commit 394e40a2 ("bpf: extend bpf_prog_array to store
      pointers to the cgroup storage") which modified
      __BPF_PROG_RUN_ARRAY() to call bpf_cgroup_storage_set()
      and this macro is also used by BPF_PROG_RUN_ARRAY_CHECK().
      
      kuprobe and tracepoint programs are not allowed to call
      bpf_get_local_storage() helper hence does not
      access percpu cgroup local storage. Let us
      change BPF_PROG_RUN_ARRAY_CHECK() not to
      modify percpu cgroup local storage.
      
      The issue is observed when I tried to debug [1] where
      percpu data is overwritten due to
        preempt_disable -> migration_disable
      change. This patch does not completely fix the above issue,
      which will be addressed separately, e.g., multiple cgroup
      prog runs may preempt each other. But it does fix
      any potential issue caused by tracing program
      overwriting percpu cgroup storage:
       - in a busy system, a tracing program is to run between
         bpf_cgroup_storage_set() and the cgroup prog run.
       - a kprobe program is triggered by a helper in cgroup prog
         before bpf_get_local_storage() is called.
      
       [1] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@mail.gmail.com/T
      
      
      
      Fixes: 394e40a2 ("bpf: extend bpf_prog_array to store pointers to the cgroup storage")
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarRoman Gushchin <guro@fb.com>
      Link: https://lore.kernel.org/bpf/20210309185028.3763817-1-yhs@fb.com
      05a68ce5
  8. Mar 08, 2021
  9. Mar 04, 2021
  10. Mar 03, 2021
  11. Mar 02, 2021
  12. Mar 01, 2021
    • Willem de Bruijn's avatar
      net: expand textsearch ts_state to fit skb_seq_state · b228c9b0
      Willem de Bruijn authored
      The referenced commit expands the skb_seq_state used by
      skb_find_text with a 4B frag_off field, growing it to 48B.
      
      This exceeds container ts_state->cb, causing a stack corruption:
      
      [   73.238353] Kernel panic - not syncing: stack-protector: Kernel stack
      is corrupted in: skb_find_text+0xc5/0xd0
      [   73.247384] CPU: 1 PID: 376 Comm: nping Not tainted 5.11.0+ #4
      [   73.252613] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS 1.14.0-2 04/01/2014
      [   73.260078] Call Trace:
      [   73.264677]  dump_stack+0x57/0x6a
      [   73.267866]  panic+0xf6/0x2b7
      [   73.270578]  ? skb_find_text+0xc5/0xd0
      [   73.273964]  __stack_chk_fail+0x10/0x10
      [   73.277491]  skb_find_text+0xc5/0xd0
      [   73.280727]  string_mt+0x1f/0x30
      [   73.283639]  ipt_do_table+0x214/0x410
      
      The struct is passed between skb_find_text and its callbacks
      skb_prepare_seq_read, skb_seq_read and skb_abort_seq read through
      the textsearch interface using TS_SKB_CB.
      
      I assumed that this mapped to skb->cb like other .._SKB_CB wrappers.
      skb->cb is 48B. But it maps to ts_state->cb, which is only 40B.
      
      skb->cb was increased from 40B to 48B after ts_state was introduced,
      in commit 3e3850e9 ("[NETFILTER]: Fix xfrm lookup in
      ip_route_me_harder/ip6_route_me_harder").
      
      Increase ts_state.cb[] to 48 to fit the struct.
      
      Also add a BUILD_BUG_ON to avoid a repeat.
      
      The alternative is to directly add a dependency from textsearch onto
      linux/skbuff.h, but I think the intent is textsearch to have no such
      dependencies on its callers.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=211911
      
      
      Fixes: 97550f6f ("net: compound page support in skb_seq_read")
      Reported-by: default avatarKris Karas <bugs-a17@moonlit-rail.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b228c9b0
    • Jean Delvare's avatar
      block: Drop leftover references to RQF_SORTED · 5218e12e
      Jean Delvare authored
      
      Commit a1ce35fa ("block: remove dead
      elevator code") removed all users of RQF_SORTED. However it is still
      defined, and there is one reference left to it (which in effect is
      dead code). Clear it all up.
      
      Signed-off-by: default avatarJean Delvare <jdelvare@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      5218e12e
    • Oleksij Rempel's avatar
      can: skb: can_skb_set_owner(): fix ref counting if socket was closed before setting skb ownership · e940e089
      Oleksij Rempel authored
      There are two ref count variables controlling the free()ing of a socket:
      - struct sock::sk_refcnt - which is changed by sock_hold()/sock_put()
      - struct sock::sk_wmem_alloc - which accounts the memory allocated by
        the skbs in the send path.
      
      In case there are still TX skbs on the fly and the socket() is closed,
      the struct sock::sk_refcnt reaches 0. In the TX-path the CAN stack
      clones an "echo" skb, calls sock_hold() on the original socket and
      references it. This produces the following back trace:
      
      | WARNING: CPU: 0 PID: 280 at lib/refcount.c:25 refcount_warn_saturate+0x114/0x134
      | refcount_t: addition on 0; use-after-free.
      | Modules linked in: coda_vpu(E) v4l2_jpeg(E) videobuf2_vmalloc(E) imx_vdoa(E)
      | CPU: 0 PID: 280 Comm: test_can.sh Tainted: G            E     5.11.0-04577-gf8ff6603c617 #203
      | Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
      | Backtrace:
      | [<80bafea4>] (dump_backtrace) from [<80bb0280>] (show_stack+0x20/0x24) r7:00000000 r6:600f0113 r5:00000000 r4:81441220
      | [<80bb0260>] (show_stack) from [<80bb593c>] (dump_stack+0xa0/0xc8)
      | [<80bb589c>] (dump_stack) from [<8012b268>] (__warn+0xd4/0x114) r9:00000019 r8:80f4a8c2 r7:83e4150c r6:00000000 r5:00000009 r4:80528f90
      | [<8012b194>] (__warn) from [<80bb09c4>] (warn_slowpath_fmt+0x88/0xc8) r9:83f26400 r8:80f4a8d1 r7:00000009 r6:80528f90 r5:00000019 r4:80f4a8c2
      | [<80bb0940>] (warn_slowpath_fmt) from [<80528f90>] (refcount_warn_saturate+0x114/0x134) r8:00000000 r7:00000000 r6:82b44000 r5:834e5600 r4:83f4d540
      | [<80528e7c>] (refcount_warn_saturate) from [<8079a4c8>] (__refcount_add.constprop.0+0x4c/0x50)
      | [<8079a47c>] (__refcount_add.constprop.0) from [<8079a57c>] (can_put_echo_skb+0xb0/0x13c)
      | [<8079a4cc>] (can_put_echo_skb) from [<8079ba98>] (flexcan_start_xmit+0x1c4/0x230) r9:00000010 r8:83f48610 r7:0fdc0000 r6:0c080000 r5:82b44000 r4:834e5600
      | [<8079b8d4>] (flexcan_start_xmit) from [<80969078>] (netdev_start_xmit+0x44/0x70) r9:814c0ba0 r8:80c8790c r7:00000000 r6:834e5600 r5:82b44000 r4:82ab1f00
      | [<80969034>] (netdev_start_xmit) from [<809725a4>] (dev_hard_start_xmit+0x19c/0x318) r9:814c0ba0 r8:00000000 r7:82ab1f00 r6:82b44000 r5:00000000 r4:834e5600
      | [<80972408>] (dev_hard_start_xmit) from [<809c6584>] (sch_direct_xmit+0xcc/0x264) r10:834e5600 r9:00000000 r8:00000000 r7:82b44000 r6:82ab1f00 r5:834e5600 r4:83f27400
      | [<809c64b8>] (sch_direct_xmit) from [<809c6c0c>] (__qdisc_run+0x4f0/0x534)
      
      To fix this problem, only set skb ownership to sockets which have still
      a ref count > 0.
      
      Fixes: 0ae89beb ("can: add destructor for self generated skbs")
      Cc: Oliver Hartkopp <socketcan@hartkopp.net>
      Cc: Andre Naujoks <nautsch2@gmail.com>
      Link: https://lore.kernel.org/r/20210226092456.27126-1-o.rempel@pengutronix.de
      
      
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarOleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      e940e089
  13. Feb 27, 2021
  14. Feb 26, 2021
Loading