  1. May 21, 2021
  2. May 20, 2021
  3. May 19, 2021
  4. May 18, 2021
    • RDMA/core: Don't access cm_id after its destruction · 889d916b
      Shay Drory authored
      restrack should only be attached to a cm_id while the ID has a valid
      device pointer. It is set up when the device is first loaded, but not
      cleared when the device is removed. There are also two copies of the device
      pointer, one private and one in the public API, and these were left out of
      sync.
      
      Make everything go to NULL together and manipulate restrack right around
      the device assignments.
      
      Found by syzkaller:
      BUG: KASAN: wild-memory-access in __list_del include/linux/list.h:112 [inline]
      BUG: KASAN: wild-memory-access in __list_del_entry include/linux/list.h:135 [inline]
      BUG: KASAN: wild-memory-access in list_del include/linux/list.h:146 [inline]
      BUG: KASAN: wild-memory-access in cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline]
      BUG: KASAN: wild-memory-access in cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline]
      BUG: KASAN: wild-memory-access in cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783
      Write of size 8 at addr dead000000000108 by task syz-executor716/334
      
      CPU: 0 PID: 334 Comm: syz-executor716 Not tainted 5.11.0+ #271
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0xbe/0xf9 lib/dump_stack.c:120
       __kasan_report mm/kasan/report.c:400 [inline]
       kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413
       __list_del include/linux/list.h:112 [inline]
       __list_del_entry include/linux/list.h:135 [inline]
       list_del include/linux/list.h:146 [inline]
       cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline]
       cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline]
       cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783
       _destroy_id+0x29/0x460 drivers/infiniband/core/cma.c:1862
       ucma_close_id+0x36/0x50 drivers/infiniband/core/ucma.c:185
       ucma_destroy_private_ctx+0x58d/0x5b0 drivers/infiniband/core/ucma.c:576
       ucma_close+0x91/0xd0 drivers/infiniband/core/ucma.c:1797
       __fput+0x169/0x540 fs/file_table.c:280
       task_work_run+0xb7/0x100 kernel/task_work.c:140
       exit_task_work include/linux/task_work.h:30 [inline]
       do_exit+0x7da/0x17f0 kernel/exit.c:825
       do_group_exit+0x9e/0x190 kernel/exit.c:922
       __do_sys_exit_group kernel/exit.c:933 [inline]
       __se_sys_exit_group kernel/exit.c:931 [inline]
       __x64_sys_exit_group+0x2d/0x30 kernel/exit.c:931
       do_syscall_64+0x2d/0x40 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 255d0c14 ("RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count")
      Link: https://lore.kernel.org/r/3352ee288fe34f2b44220457a29bfc0548686363.1620711734.git.leonro@nvidia.com
      
      
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
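
The rule in this commit message (both device pointers go to NULL together, and tracking is attached only while a device is set) can be illustrated with a minimal user-space sketch. This is not the kernel code: `toy_dev`, `toy_id`, `toy_attach_dev`, `toy_release_dev` and the `tracked` flag are hypothetical names invented for illustration, with `tracked` standing in for "restrack is attached".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the kernel. */
struct toy_dev {
    int refcount;
};

struct toy_id {
    struct toy_dev *priv_dev;   /* private copy of the device pointer */
    struct toy_dev *public_dev; /* copy exposed through the public API */
    bool tracked;               /* stands in for "restrack is attached" */
};

/* Invariant: both copies agree, and tracking exists only with a device. */
static bool toy_id_consistent(const struct toy_id *id)
{
    return id->priv_dev == id->public_dev &&
           id->tracked == (id->priv_dev != NULL);
}

/* Set both pointers and attach tracking in one place. */
static void toy_attach_dev(struct toy_id *id, struct toy_dev *dev)
{
    assert(id->priv_dev == NULL);
    dev->refcount++;
    id->priv_dev = dev;
    id->public_dev = dev;
    id->tracked = true;
}

/* Detach tracking and clear both pointers together; safe to call twice. */
static void toy_release_dev(struct toy_id *id)
{
    if (id->priv_dev == NULL)
        return;
    id->tracked = false;
    id->priv_dev->refcount--;
    id->priv_dev = NULL;
    id->public_dev = NULL;
}
```

Funnelling every assignment through one attach helper and one release helper is what keeps the two copies from drifting apart; a second release is then a no-op rather than a touch of stale state.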
  5. May 17, 2021
    • RDMA/rxe: Return CQE error if invalid lkey was supplied · dc07628b
      Leon Romanovsky authored
      RXE does not update the WQE status on LOCAL_WRITE failures. This caused
      the following kernel panic if someone sent an atomic operation with an
      explicitly wrong lkey.
      
      [leonro@vm ~]$ mkt test
      test_atomic_invalid_lkey (tests.test_atomic.AtomicTest) ...
       WARNING: CPU: 5 PID: 263 at drivers/infiniband/sw/rxe/rxe_comp.c:740 rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
       Modules linked in: crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core ptp pps_core
       CPU: 5 PID: 263 Comm: python3 Not tainted 5.13.0-rc1+ #2936
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       RIP: 0010:rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
       Code: 03 0f 8e 65 0e 00 00 3b 93 10 06 00 00 0f 84 82 0a 00 00 4c 89 ff 4c 89 44 24 38 e8 2d 74 a9 e1 4c 8b 44 24 38 e9 1c f5 ff ff <0f> 0b e9 0c e8 ff ff b8 05 00 00 00 41 bf 05 00 00 00 e9 ab e7 ff
       RSP: 0018:ffff8880158af090 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ffff888016a78000 RCX: ffffffffa0cf1652
       RDX: 1ffff9200004b442 RSI: 0000000000000004 RDI: ffffc9000025a210
       RBP: dffffc0000000000 R08: 00000000ffffffea R09: ffff88801617740b
       R10: ffffed1002c2ee81 R11: 0000000000000007 R12: ffff88800f3b63e8
       R13: ffff888016a78008 R14: ffffc9000025a180 R15: 000000000000000c
       FS:  00007f88b622a740(0000) GS:ffff88806d540000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f88b5a1fa10 CR3: 000000000d848004 CR4: 0000000000370ea0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        rxe_do_task+0x130/0x230 [rdma_rxe]
        rxe_rcv+0xb11/0x1df0 [rdma_rxe]
        rxe_loopback+0x157/0x1e0 [rdma_rxe]
        rxe_responder+0x5532/0x7620 [rdma_rxe]
        rxe_do_task+0x130/0x230 [rdma_rxe]
        rxe_rcv+0x9c8/0x1df0 [rdma_rxe]
        rxe_loopback+0x157/0x1e0 [rdma_rxe]
        rxe_requester+0x1efd/0x58c0 [rdma_rxe]
        rxe_do_task+0x130/0x230 [rdma_rxe]
        rxe_post_send+0x998/0x1860 [rdma_rxe]
        ib_uverbs_post_send+0xd5f/0x1220 [ib_uverbs]
        ib_uverbs_write+0x847/0xc80 [ib_uverbs]
        vfs_write+0x1c5/0x840
        ksys_write+0x176/0x1d0
        do_syscall_64+0x3f/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: 8700e3e7 ("Soft RoCE driver")
      Link: https://lore.kernel.org/r/11e7b553f3a6f5371c6bb3f57c494bb52b88af99.1620711734.git.leonro@nvidia.com
      
      
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    • RDMA/mlx5: Recover from fatal event in dual port mode · 97f30d32
      Maor Gottlieb authored
      When there is a fatal event on the slave port, the device is marked as not
      active. We need to mark it as active again when the slave is recovered to
      regain full functionality.
      
      Fixes: d69a24e0 ("IB/mlx5: Move IB event processing onto a workqueue")
      Link: https://lore.kernel.org/r/8906754455bb23019ef223c725d2c0d38acfb80b.1620711734.git.leonro@nvidia.com
      
      
      Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>