  Apr 08, 2021
    • add support for Clang CFI · cf68fffb
      Sami Tolvanen authored
      This change adds support for Clang’s forward-edge Control Flow
      Integrity (CFI) checking. With CONFIG_CFI_CLANG, the compiler
      injects a runtime check before each indirect function call to ensure
      the target is a valid function with the correct static type. This
      restricts possible call targets and makes it more difficult for
      an attacker to exploit bugs that allow the modification of stored
      function pointers. For more details, see:
      
        https://clang.llvm.org/docs/ControlFlowIntegrity.html
      
      
      
      Clang requires CONFIG_LTO_CLANG to be enabled with CFI to gain
      visibility into possible call targets. Kernel modules are supported
      with Clang’s cross-DSO CFI mode, which allows checking between
      independently compiled components.
      
      With CFI enabled, the compiler injects a __cfi_check() function into
      the kernel and each module for validating local call targets. For
      cross-module calls that cannot be validated locally, the compiler
      calls the global __cfi_slowpath_diag() function, which determines
      the target module and calls the correct __cfi_check() function. This
      patch includes a slowpath implementation that uses __module_address()
      to resolve call targets, and with CONFIG_CFI_CLANG_SHADOW enabled, a
      shadow map that speeds up module look-ups by ~3x.
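
      Conceptually, the check emitted around an indirect call behaves
      roughly like the sketch below. Only __cfi_check() and
      __cfi_slowpath_diag() are real symbols here; the local jump-table
      test is inlined by the compiler and the helper name is illustrative:

        void (*fn)(void *) = ptr->callback;

        if (!in_local_jump_table(fn))   /* illustrative; compiler-emitted */
                /* cross-DSO: find the target module, run its __cfi_check() */
                __cfi_slowpath_diag(call_site_type_id, fn, diag_data);
        fn(arg);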
      
      Clang implements indirect call checking using jump tables and
      offers two methods of generating them. With canonical jump tables,
      the compiler renames each address-taken function to <function>.cfi
      and points the original symbol to a jump table entry, which passes
      __cfi_check() validation. This isn’t compatible with stand-alone
      assembly code, which the compiler doesn’t instrument, and would
      cause indirect calls to assembly code to fail. Therefore, we
      default to using non-canonical jump tables instead, where the compiler
      generates a local jump table entry <function>.cfi_jt for each
      address-taken function, and replaces all references to the function
      with the address of the jump table entry.
      
      Note that because non-canonical jump table addresses are local
      to each component, they break cross-module function address
      equality. Specifically, the address of a global function will be
      different in each module, as it's replaced with the address of a local
      jump table entry. If this address is passed to a different module,
      it won’t match the address of the same function taken there. This
      may break code that relies on comparing addresses passed from other
      components.
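
      For example, a hedged sketch of the caveat (module and function
      names are made up):

        /* module_a.c */
        extern void shared_helper(void);
        unsigned long addr_from_a =
                (unsigned long)shared_helper;   /* A's shared_helper.cfi_jt */

        /* module_b.c: compare an address received from module A */
        bool same = addr_from_a == (unsigned long)shared_helper;
        /* false: B takes the address of its own local jump-table entry,
         * not A's */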
      
      CFI checking can be disabled in a function with the __nocfi attribute.
      Additionally, CFI can be disabled for an entire compilation unit by
      filtering out CC_FLAGS_CFI.
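
      For instance, a hedged sketch of both opt-outs (the function and
      file names are hypothetical; CFLAGS_REMOVE_<obj> is the usual
      kbuild mechanism for dropping flags per object):

        /* firmware_glue.c: skip checking in one function that calls
         * uninstrumented code */
        static void __nocfi jump_to_firmware(void (*entry)(void))
        {
                entry();        /* target is in no CFI jump table */
        }

        # Makefile: disable CFI for the whole compilation unit
        CFLAGS_REMOVE_firmware_glue.o := $(CC_FLAGS_CFI)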
      
      By default, CFI failures result in a kernel panic to stop a potential
      exploit. CONFIG_CFI_PERMISSIVE enables a permissive mode, where the
      kernel prints out a rate-limited warning instead, and allows execution
      to continue. This option is helpful for locating type mismatches, but
      should only be enabled during development.
      
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20210408182843.1754385-2-samitolvanen@google.com
  Mar 29, 2021
    • ACPI: tables: x86: Reserve memory occupied by ACPI tables · 1a1c130a
      Rafael J. Wysocki authored
      The following problem has been reported by George Kennedy:
      
       Since commit 7fef431b ("mm/page_alloc: place pages to tail
       in __free_pages_core()") the following use after free occurs
       intermittently when ACPI tables are accessed.
      
       BUG: KASAN: use-after-free in ibft_init+0x134/0xc49
       Read of size 4 at addr ffff8880be453004 by task swapper/0/1
       CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc1-7a7fd0de #1
       Call Trace:
        dump_stack+0xf6/0x158
        print_address_description.constprop.9+0x41/0x60
        kasan_report.cold.14+0x7b/0xd4
        __asan_report_load_n_noabort+0xf/0x20
        ibft_init+0x134/0xc49
        do_one_initcall+0xc4/0x3e0
        kernel_init_freeable+0x5af/0x66b
        kernel_init+0x16/0x1d0
        ret_from_fork+0x22/0x30
      
       ACPI tables mapped via kmap() do not have their mapped pages
       reserved and the pages can be "stolen" by the buddy allocator.
      
      Apparently, on the affected system, the ACPI table in question is
      not located in "reserved" memory, such as ACPI NVS or ACPI Data,
      which the buddy allocator will not use, so the memory occupied by
      that table has to be explicitly reserved to prevent the buddy
      allocator from using it.
      
      In order to address this problem, rearrange the initialization of the
      ACPI tables on x86 to locate the initial tables earlier and reserve
      the memory occupied by them.
      
      The other architectures using ACPI should not be affected by this
      change.
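
      A minimal sketch of the reservation step, assuming the table's
      physical address and mapped header have already been obtained
      during early init (the variable names are illustrative):

        #include <linux/memblock.h>

        /* Keep the buddy allocator away from the table's backing pages. */
        memblock_reserve(table_phys, table->length);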
      
      Link: https://lore.kernel.org/linux-acpi/1614802160-29362-1-git-send-email-george.kennedy@oracle.com/
      
      
      Reported-by: George Kennedy <george.kennedy@oracle.com>
      Tested-by: George Kennedy <george.kennedy@oracle.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: 5.10+ <stable@vger.kernel.org> # 5.10+
  Mar 25, 2021
    • mm: memblock: fix section mismatch warning again · a024b7c2
      Mike Rapoport authored
      Commit 34dc2efb ("memblock: fix section mismatch warning") marked
      memblock_bottom_up() and memblock_set_bottom_up() as __init, but they
      could be referenced from non-init functions like
      memblock_find_in_range_node() on architectures that enable
      CONFIG_ARCH_KEEP_MEMBLOCK.
      
      For such builds kernel test robot reports:
      
         WARNING: modpost: vmlinux.o(.text+0x74fea4): Section mismatch in reference from the function memblock_find_in_range_node() to the function .init.text:memblock_bottom_up()
         The function memblock_find_in_range_node() references the function __init memblock_bottom_up().
         This is often because memblock_find_in_range_node lacks a __init  annotation or the annotation of memblock_bottom_up is wrong.
      
      Replace __init annotations with __init_memblock annotations so that the
      appropriate section will be selected depending on
      CONFIG_ARCH_KEEP_MEMBLOCK.
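
      Roughly, the annotation resolves as in the sketch below (the exact
      definitions live in the memblock headers; this is the idea, not a
      verbatim copy):

        #ifdef CONFIG_ARCH_KEEP_MEMBLOCK
        #define __init_memblock                 /* stays in regular .text */
        #else
        #define __init_memblock __meminit       /* may be discarded */
        #endif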
      
      Link: https://lore.kernel.org/lkml/202103160133.UzhgY0wt-lkp@intel.com
      Link: https://lkml.kernel.org/r/20210316171347.14084-1-rppt@kernel.org
      
      
      Fixes: 34dc2efb ("memblock: fix section mismatch warning")
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Reported-by: kernel test robot <lkp@intel.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/mmu_notifiers: ensure range_end() is paired with range_start() · c2655835
      Sean Christopherson authored
      If one or more notifiers fails .invalidate_range_start(), invoke
      .invalidate_range_end() for "all" notifiers.  If there are multiple
      notifiers, those that did not fail are expecting _start() and _end() to
      be paired, e.g.  KVM's mmu_notifier_count would become imbalanced.
      Disallow notifiers that can fail _start() from implementing _end(),
      so that it's unnecessary to track either which notifiers rejected
      _start() or which had already succeeded prior to a failed _start().
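
      A hedged sketch of the resulting pairing logic (simplified from the
      subscription walk in mm/mmu_notifier.c; locking, diagnostics and
      the helper name are approximate):

        int ret = 0;

        hlist_for_each_entry(sub, &subscriptions->list, hlist) {
                if (sub->ops->invalidate_range_start &&
                    sub->ops->invalidate_range_start(sub, range))
                        ret = -EAGAIN;  /* keep calling _start() on the rest */
        }
        if (ret)
                /* Notifiers that can fail _start() must not implement
                 * _end(), so invoking _end() for everyone stays balanced. */
                mn_hlist_invalidate_end(subscriptions, range);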
      
      Note, the existing behavior of calling _start() on all notifiers
      even after a previous notifier failed _start() was an unintended
      "feature". Make it canonical, now that the behavior is depended on
      for correctness.
      
      As of today, the bug is likely benign:
      
        1. The only caller of the non-blocking notifier is OOM kill.
        2. The only notifiers that can fail _start() are the i915 and Nouveau
           drivers.
        3. The only notifiers that utilize _end() are the SGI UV GRU driver
           and KVM.
        4. The GRU driver will never coincide with the i915/Nouveau drivers.
        5. An imbalanced kvm->mmu_notifier_count only causes soft lockup in the
           _guest_, and the guest is already doomed due to being an OOM victim.
      
      Fix the bug now to play nice with future usage, e.g.  KVM has a
      potential use case for blocking memslot updates while an
      invalidation is in-progress, and failure to unblock would result in
      said updates being blocked indefinitely and hanging.
      
      Found by inspection.  Verified by adding a second notifier in KVM that
      periodically returns -EAGAIN on non-blockable ranges, triggering OOM,
      and observing that KVM exits with an elevated notifier count.
      
      Link: https://lkml.kernel.org/r/20210311180057.1582638-1-seanjc@google.com
      
      
      Fixes: 93065ac7 ("mm, oom: distinguish blockable mode for mmu notifiers")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ben Gardon <bgardon@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kasan: fix per-page tags for non-page_alloc pages · cf10bd4c
      Andrey Konovalov authored
      To allow performing tag checks on page_alloc addresses obtained via
      page_address(), tag-based KASAN modes store tags for page_alloc
      allocations in page->flags.
      
      Currently, the default tag value stored in page->flags is 0x00.
      Therefore, page_address() returns a 0x00ffff...  address for pages that
      were not allocated via page_alloc.
      
      This might cause problems.  A particular case we encountered is a
      conflict with KFENCE.  If a KFENCE-allocated slab object is being freed
      via kfree(page_address(page) + offset), the address passed to kfree()
      will get tagged with 0x00 (as slab pages keep the default per-page
      tags).  This leads to is_kfence_address() check failing, and a KFENCE
      object ending up in normal slab freelist, which causes memory
      corruptions.
      
      This patch changes the way KASAN stores tags in page->flags: they
      are now stored XORed with 0xff. This way, KASAN doesn't need to
      initialize per-page flags for every created page, which might be
      slow.
      
      With this change, page_address() returns natively-tagged (with 0xff)
      pointers for pages that didn't have tags set explicitly.
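
      A hedged sketch of the accessors after this change (shift and mask
      names as in include/linux/mm.h; details may differ):

        static inline u8 page_kasan_tag(const struct page *page)
        {
                u8 tag = (page->flags >> KASAN_TAG_PGSHIFT) & KASAN_TAG_MASK;

                return tag ^ 0xff;      /* default 0x00 reads back as 0xff */
        }

        static inline void page_kasan_tag_set(struct page *page, u8 tag)
        {
                page->flags &= ~(KASAN_TAG_MASK << KASAN_TAG_PGSHIFT);
                page->flags |= (unsigned long)(tag ^ 0xff) << KASAN_TAG_PGSHIFT;
        }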
      
      This patch fixes the encountered conflict with KFENCE and prevents more
      similar issues that can occur in the future.
      
      Link: https://lkml.kernel.org/r/1a41abb11c51b264511d9e71c303bb16d5cb367b.1615475452.git.andreyknvl@google.com
      
      
      Fixes: 2813b9c0 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
      Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Peter Collingbourne <pcc@google.com>
      Cc: Evgenii Stepanov <eugenis@google.com>
      Cc: Branislav Rankov <Branislav.Rankov@arm.com>
      Cc: Kevin Brodsky <kevin.brodsky@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings · d85aecf2
      Miaohe Lin authored
      The current implementation of hugetlb_cgroup for shared mappings
      can end up with different css reference counts for the same
      reservation. Consider the following two scenarios:
      
       1. Assume the initial css reference count of hugetlb_cgroup is 1:
        1.1 Call hugetlb_reserve_pages with from = 1, to = 2. So css reference
            count is 2 associated with 1 file_region.
        1.2 Call hugetlb_reserve_pages with from = 2, to = 3. So css reference
            count is 3 associated with 2 file_region.
        1.3 coalesce_file_region will coalesce these two file_regions into
            one. So css reference count is 3 associated with 1 file_region
            now.
      
       2. Assume the initial css reference count of hugetlb_cgroup is 1 again:
        2.1 Call hugetlb_reserve_pages with from = 1, to = 3. So css reference
            count is 2 associated with 1 file_region.
      
      Therefore, we might have one file_region while holding one or more
      css reference counts. This inconsistency could lead to an imbalanced
      css_get() and css_put() pair. If we do css_put one by one (e.g. the
      hole punch case), scenario 2 would put one more css reference. If we
      do css_put all together (e.g. the truncate case), scenario 1 will
      leak one css reference.
      
      The imbalanced css_get() and css_put() pair would result in a
      non-zero reference count when we try to destroy the hugetlb cgroup.
      The hugetlb cgroup directory is removed, but the associated
      resources are not freed. Ultimately this might result in OOM, or in
      being unable to create a new hugetlb cgroup in a busy workload.
      
      In order to fix this, we have to make sure that one file_region must
      hold exactly one css reference. So in coalesce_file_region case, we
      should release one css reference before coalescence. Also only put css
      reference when the entire file_region is removed.
      
      The last thing to note is that the caller of region_add() will only
      hold one reference to h_cg->css for the whole contiguous reservation
      region. But this area might be scattered when some file_regions
      already reside in it. As a result, many file_regions may share only
      one h_cg->css reference. In order to ensure that one file_region
      must hold exactly one css reference, we should do css_get() for each
      file_region and release the reference held by the caller when they
      are done, as sketched below.
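
      A hedged sketch of the coalesce-side change (helper names
      approximate mm/hugetlb.c; treat as illustrative):

        if (has_same_uncharge_info(prg, rg)) {
                prg->to = rg->to;
                /* Drop the css reference held by the absorbed region,
                 * keeping the "one file_region, one css ref" invariant. */
                put_uncharge_info(rg);
                list_del(&rg->link);
                kfree(rg);
        }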
      
      [linmiaohe@huawei.com: fix imbalanced css_get and css_put pair for shared mappings]
        Link: https://lkml.kernel.org/r/20210316023002.53921-1-linmiaohe@huawei.com
      
      Link: https://lkml.kernel.org/r/20210301120540.37076-1-linmiaohe@huawei.com
      
      
      Fixes: 075a61d0 ("hugetlb_cgroup: add accounting for shared mappings")
      Reported-by: kernel test robot <lkp@intel.com> (auto build test ERROR)
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Wanpeng Li <liwp.linux@gmail.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  Mar 22, 2021
    • ACPI: scan: Use unique number for instance_no · eb50aaf9
      Andy Shevchenko authored
      
      Decrementing acpi_device_bus_id->instance_no in acpi_device_del() is
      incorrect, because it may cause a duplicate instance number to be
      allocated the next time a device with the same acpi_device_bus_id
      is added.

      Replace the above-mentioned approach with the IDA framework.
      
      While at it, define the instance range to be [0, 4096).
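
      A hedged sketch of the allocation with the IDA API (assuming an ida
      embedded in acpi_device_bus_id, per this patch; names approximate):

        #define ACPI_MAX_DEVICE_INSTANCES      4096

        result = ida_alloc_max(&acpi_device_bus_id->instance_ida,
                               ACPI_MAX_DEVICE_INSTANCES - 1, GFP_KERNEL);
        if (result < 0)
                goto err;
        acpi_device->pnp.instance_no = result;

        /* and in acpi_device_del(): */
        ida_free(&acpi_device_bus_id->instance_ida,
                 acpi_device->pnp.instance_no);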
      
      Fixes: e49bd2dd ("ACPI: use PNPID:instance_no as bus_id of ACPI device")
      Fixes: ca9dc8d4 ("ACPI / scan: Fix acpi_bus_id_list bookkeeping")
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: 4.10+ <stable@vger.kernel.org> # 4.10+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • dm table: Fix zoned model check and zone sectors check · 2d669ceb
      Shin'ichiro Kawasaki authored
      
      Commit 24f6b603 ("dm table: fix zoned iterate_devices based device
      capability checks") triggered a dm table load failure when a
      dm-zoned device is set up on zoned block devices with a regular
      device for cache.
      
      The commit inverted the logic of two callback functions for
      iterate_devices: device_is_zoned_model() and
      device_matches_zone_sectors(). With the logic of
      device_is_zoned_model() inverted, all destination devices of all
      targets in the dm table are required to have the expected zoned
      model. This is fine for dm-linear, dm-flakey and dm-crypt on zoned
      block devices, since each of those targets has only one destination
      device. However, this results in failure for dm-zoned with a regular
      cache device, since that target has both a regular block device and
      zoned block devices.
      
      As for device_matches_zone_sectors(), the commit inverted the logic
      to require that all zoned block devices in each target have the
      specified zone_sectors. This check also fails for a regular block
      device, which does not have zones.
      
      To avoid the check failures, fix the zone model check and the zone
      sectors check. For the zone model check, introduce the new feature
      flag DM_TARGET_MIXED_ZONED_MODEL and set it for the dm-zoned target.
      When a target has this flag, allow it to have destination devices
      with any zoned model. For the zone sectors check, skip the check if
      the destination device is not a zoned block device. Also add
      comments and improve an error message to clarify the expectations
      of the two checks.
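
      A hedged sketch of the adjusted zone-sectors callback (the callback
      name and exact helpers are approximate):

        static int device_not_matches_zone_sectors(struct dm_target *ti,
                        struct dm_dev *dev, sector_t start, sector_t len,
                        void *data)
        {
                unsigned int *zone_sectors = data;

                if (!bdev_is_zoned(dev->bdev))
                        return 0;       /* regular device: skip the check */

                return blk_queue_zone_sectors(bdev_get_queue(dev->bdev)) !=
                       *zone_sectors;
        }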
      
      Fixes: 24f6b603 ("dm table: fix zoned iterate_devices based device capability checks")
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  Mar 19, 2021
    • bpf: Fix umd memory leak in copy_process() · f60a85ca
      Zqiang authored
      
      syzbot reported a memory leak as follows:
      
      BUG: memory leak
      unreferenced object 0xffff888101b41d00 (size 120):
        comm "kworker/u4:0", pid 8, jiffies 4294944270 (age 12.780s)
        backtrace:
          [<ffffffff8125dc56>] alloc_pid+0x66/0x560
          [<ffffffff81226405>] copy_process+0x1465/0x25e0
          [<ffffffff81227943>] kernel_clone+0xf3/0x670
          [<ffffffff812281a1>] kernel_thread+0x61/0x80
          [<ffffffff81253464>] call_usermodehelper_exec_work
          [<ffffffff81253464>] call_usermodehelper_exec_work+0xc4/0x120
          [<ffffffff812591c9>] process_one_work+0x2c9/0x600
          [<ffffffff81259ab9>] worker_thread+0x59/0x5d0
          [<ffffffff812611c8>] kthread+0x178/0x1b0
          [<ffffffff8100227f>] ret_from_fork+0x1f/0x30
      
      unreferenced object 0xffff888110ef5c00 (size 232):
        comm "kworker/u4:0", pid 8414, jiffies 4294944270 (age 12.780s)
        backtrace:
          [<ffffffff8154a0cf>] kmem_cache_zalloc
          [<ffffffff8154a0cf>] __alloc_file+0x1f/0xf0
          [<ffffffff8154a809>] alloc_empty_file+0x69/0x120
          [<ffffffff8154a8f3>] alloc_file+0x33/0x1b0
          [<ffffffff8154ab22>] alloc_file_pseudo+0xb2/0x140
          [<ffffffff81559218>] create_pipe_files+0x138/0x2e0
          [<ffffffff8126c793>] umd_setup+0x33/0x220
          [<ffffffff81253574>] call_usermodehelper_exec_async+0xb4/0x1b0
          [<ffffffff8100227f>] ret_from_fork+0x1f/0x30
      
      After the UMD process exits, the pipe_to_umh/pipe_from_umh and
      tgid need to be released.
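
      A hedged sketch of the required cleanup (field names follow struct
      umd_info in include/linux/usermode_driver.h; the helper itself is
      illustrative):

        static void umd_cleanup(struct umd_info *info)
        {
                /* release both ends of the kernel<->UMD pipe ... */
                fput(info->pipe_to_umh);
                fput(info->pipe_from_umh);
                /* ... and the pid reference taken for the UMD task */
                put_pid(info->tgid);
                info->tgid = NULL;
        }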
      
      Fixes: d71fa5c9 ("bpf: Add kernel module with user mode driver that populates bpffs.")
      Reported-by: <syzbot+44908bb56d2bfe56b28e@syzkaller.appspotmail.com>
      Signed-off-by: Zqiang <qiang.zhang@windriver.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20210317030915.2865-1-qiang.zhang@windriver.com
    • sch_red: Fix a typo · 8a2dc6af
      Bhaskar Chowdhury authored
      
      s/recalcultion/recalculation/
      
      Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • efi: use 32-bit alignment for efi_guid_t literals · fb98cc0b
      Ard Biesheuvel authored
      
      Commit 494c704f ("efi: Use 32-bit alignment for efi_guid_t") updated
      the type definition of efi_guid_t to ensure that it always appears
      sufficiently aligned (the UEFI spec is ambiguous about this, but given
      the fact that its EFI_GUID type is defined in terms of a struct carrying
      a uint32_t, the natural alignment is definitely >= 32 bits).
      
      However, we missed the EFI_GUID() macro which is used to instantiate
      efi_guid_t literals: that macro is still based on the guid_t type,
      which does not have a minimum alignment at all. This results in warnings
      such as
      
        In file included from drivers/firmware/efi/mokvar-table.c:35:
        include/linux/efi.h:1093:34: warning: passing 1-byte aligned argument to
            4-byte aligned parameter 2 of 'get_var' may result in an unaligned pointer
            access [-Walign-mismatch]
                status = get_var(L"SecureBoot", &EFI_GLOBAL_VARIABLE_GUID, NULL, &size,
                                                ^
        include/linux/efi.h:1101:24: warning: passing 1-byte aligned argument to
            4-byte aligned parameter 2 of 'get_var' may result in an unaligned pointer
            access [-Walign-mismatch]
                get_var(L"SetupMode", &EFI_GLOBAL_VARIABLE_GUID, NULL, &size, &setupmode);
      
      The distinction only matters on CPUs that do not fully support
      misaligned loads. However, 32-bit ARM's load-multiple instructions
      fall into that category, and these are likely to be emitted by the
      compiler that built the firmware when loading word-aligned 128-bit
      GUIDs from memory.
      
      Therefore, re-implement the initializer in terms of our own
      efi_guid_t type, so that the alignment becomes a property of the
      literal's type.
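
      A hedged sketch of the reworked macro, building the little-endian
      byte layout directly into an efi_guid_t compound literal (compare
      with the in-tree definition before relying on it):

        #define EFI_GUID(a, b, c, d...) ((efi_guid_t){ {          \
                (a) & 0xff, ((a) >> 8) & 0xff,                    \
                ((a) >> 16) & 0xff, ((a) >> 24) & 0xff,           \
                (b) & 0xff, ((b) >> 8) & 0xff,                    \
                (c) & 0xff, ((c) >> 8) & 0xff, d } })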
      
      Fixes: 494c704f ("efi: Use 32-bit alignment for efi_guid_t")
      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Link: https://github.com/ClangBuiltLinux/linux/issues/1327
      
      
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
  Mar 17, 2021
    • netfilter: nftables: allow to update flowtable flags · 7b35582c
      Pablo Neira Ayuso authored
      
      Honor flowtable flags from the control update path. However,
      disallow toggling hardware offload support on or off via an update.
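
      A hedged sketch of the update-path check implied above (mask and
      error choice are illustrative):

        if ((flowtable->data.flags ^ flags) & NFT_FLOWTABLE_HW_OFFLOAD)
                return -EOPNOTSUPP;     /* offload bit cannot be toggled */

        flowtable->data.flags = flags;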
      
      Fixes: 8bb69f3b ("netfilter: nf_tables: add flowtable offload control plane")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • bpf: Fix fexit trampoline. · e21aa341
      Alexei Starovoitov authored
      
      The fexit/fmod_ret programs can be attached to kernel functions that can sleep.
      The synchronize_rcu_tasks() will not wait for such tasks to complete.
      In such a case the trampoline image will be freed, and when the task
      wakes up the return IP will point to freed memory, causing a crash.
      Solve this by adding percpu_ref_get/put for the duration of the
      trampoline's use, and by separating the lifetimes of the trampoline
      and its image. The "half page" optimization has to be removed, since
      the first_half->second_half->first_half transition cannot be
      guaranteed to complete in deterministic time. Every trampoline
      update becomes a new image.
      The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and
      call_rcu_tasks. Together they will wait for the original function and
      trampoline asm to complete. The trampoline is patched from nop to jmp to skip
      fexit progs. They are freed independently from the trampoline. The image with
      fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which
      will wait for both sleepable and non-sleepable progs to complete.
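
      A hedged sketch of the image-retirement sequence (names approximate
      the patch; treat as illustrative):

        /* pin the image while trampoline asm may still run into it */
        percpu_ref_get(&im->pcref);

        /* on a trampoline update, retire the old image entirely */
        percpu_ref_kill(&im->pcref);            /* no new references */
        call_rcu_tasks(&im->rcu, bpf_tramp_image_put_rcu_tasks);
        /* freed only after the ref drains and every task has voluntarily
         * rescheduled, so no return IP can still point into the image */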
      
      Fixes: fec56f58 ("bpf: Introduce BPF trampoline")
      Reported-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Paul E. McKenney <paulmck@kernel.org>  # for RCU
      Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
    • net: fix race between napi kthread mode and busy poll · cb038357
      Wei Wang authored
      
      Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
      determine if the kthread owns this napi and could call napi->poll() on
      it. However, if socket busy poll is enabled, it is possible that the
      busy poll thread grabs this SCHED bit (after the previous napi->poll()
      invokes napi_complete_done() and clears SCHED bit) and tries to poll
      on the same napi. napi_disable() could grab the SCHED bit as well.
      This patch tries to fix this race by adding a new bit
      NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
      ____napi_schedule() if the threaded mode is enabled, and gets cleared
      in napi_complete_done(), and we only poll the napi in kthread if this
      bit is set. This helps distinguish the ownership of the napi between
      kthread and other scenarios and fixes the race issue.
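
      A hedged sketch of the two sides of the handshake (simplified from
      the description above):

        /* ____napi_schedule(): hand ownership to the kthread explicitly */
        if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
                set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
                wake_up_process(napi->thread);
                return;
        }

        /* napi_thread_wait(): poll only when the kthread truly owns it;
         * a busy-poll thread holding bare SCHED won't match */
        woken = test_bit(NAPI_STATE_SCHED_THREADED, &napi->state);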
      
      Fixes: 29863d41 ("net: implement threaded-able napi poll loop support")
      Reported-by: Martin Zaharinov <micron10@gmail.com>
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Wei Wang <weiwan@google.com>
      Cc: Alexander Duyck <alexanderduyck@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usb-storage: Add quirk to defeat Kindle's automatic unload · 546aa0e4
      Alan Stern authored
      
      Matthias reports that the Amazon Kindle automatically removes its
      emulated media if it doesn't receive another SCSI command within about
      one second after a SYNCHRONIZE CACHE.  It does so even when the host
      has sent a PREVENT MEDIUM REMOVAL command.  The reason for this
      behavior isn't clear, although it's not hard to make some guesses.
      
      At any rate, the results can be unexpected for anyone who tries to
      access the Kindle in an unusual fashion, and in theory they can lead
      to data loss (for example, if one file is closed and synchronized
      while other files are still in the middle of being written).
      
      To avoid such problems, this patch creates a new usb-storage quirks
      flag telling the driver always to issue a REQUEST SENSE following a
      SYNCHRONIZE CACHE command, and adds an unusual_devs entry for the
      Kindle with the flag set.  This is sufficient to prevent the Kindle
      from doing its automatic unload, without interfering with proper
      operation.
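
      A hedged sketch of the corresponding unusual_devs.h entry (the
      Kindle's commonly reported IDs; verify against the tree):

        UNUSUAL_DEV(0x1949, 0x0004, 0x0000, 0x9999,
                        "Amazon",
                        "Kindle",
                        USB_SC_DEVICE, USB_PR_DEVICE, NULL,
                        US_FL_SENSE_AFTER_SYNC),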
      
      Another possible way to deal with this would be to increase the
      frequency of TEST UNIT READY polling that the kernel normally carries
      out for removable-media storage devices.  However that would increase
      the overall load on the system and it is not as reliable, because the
      user can override the polling interval.  Changing the driver's
      behavior is safer and has minimal overhead.
      
      CC: <stable@vger.kernel.org>
      Reported-and-tested-by: Matthias Schwarzott <zzam@gentoo.org>
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Link: https://lore.kernel.org/r/20210317190654.GA497856@rowland.harvard.edu
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • module: remove never implemented MODULE_SUPPORTED_DEVICE · 6417f031
      Leon Romanovsky authored
      
      MODULE_SUPPORTED_DEVICE was added in the pre-git era and was never
      implemented. We can safely remove it, because the kernel has grown
      many more reliable mechanisms for determining whether a device is
      supported.
      
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini() · bee64578
      Waiman Long authored
      
      In ww_acquire_init(), mutex_acquire() is gated by CONFIG_DEBUG_LOCK_ALLOC.
      The dep_map in the ww_acquire_ctx structure is also gated by the
      same config. However mutex_release() in ww_acquire_fini() is gated by
      CONFIG_DEBUG_MUTEXES. It is possible to set CONFIG_DEBUG_MUTEXES without
      setting CONFIG_DEBUG_LOCK_ALLOC though it is an unlikely configuration.
      That may cause a compilation error as dep_map isn't defined in this
      case. Fix this potential problem by enclosing mutex_release() inside
      CONFIG_DEBUG_LOCK_ALLOC.
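
      A hedged sketch of the resulting guard structure in
      ww_acquire_fini() (other debug checks abbreviated):

        static inline void ww_acquire_fini(struct ww_acquire_ctx *ctx)
        {
        #ifdef CONFIG_DEBUG_LOCK_ALLOC
                mutex_release(&ctx->dep_map, _THIS_IP_);
        #endif
        #ifdef CONFIG_DEBUG_MUTEXES
                DEBUG_LOCKS_WARN_ON(ctx->acquired);
                /* ... remaining CONFIG_DEBUG_MUTEXES-only checks ... */
        #endif
        }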
      
      Signed-off-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20210316153119.13802-3-longman@redhat.com