  1. Apr 08, 2021
    • add support for Clang CFI · cf68fffb
      Sami Tolvanen authored
      This change adds support for Clang’s forward-edge Control Flow
      Integrity (CFI) checking. With CONFIG_CFI_CLANG, the compiler
      injects a runtime check before each indirect function call to ensure
      the target is a valid function with the correct static type. This
      restricts possible call targets and makes it more difficult for
      an attacker to exploit bugs that allow the modification of stored
      function pointers. For more details, see:
      
        https://clang.llvm.org/docs/ControlFlowIntegrity.html
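
      As a rough illustration (not code from this patch; all names are made
      up), the kind of mismatch the injected check catches looks like this:

        /* Hypothetical example of a type-mismatched indirect call. */
        static int parse_len(const char *s)     /* type: int (*)(const char *) */
        {
                return s ? 1 : 0;
        }

        static void (*handler)(void *);         /* type: void (*)(void *) */

        static void install(void)
        {
                /* The cast silences the compiler's type checker... */
                handler = (void (*)(void *))parse_len;
        }

        static void dispatch(void *arg)
        {
                /* ...but the CFI check injected before this indirect call
                 * fails, because the target's static type does not match. */
                handler(arg);
        }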
      
      
      
      Clang requires CONFIG_LTO_CLANG to be enabled with CFI to gain
      visibility to possible call targets. Kernel modules are supported
      with Clang’s cross-DSO CFI mode, which allows checking between
      independently compiled components.
      
      With CFI enabled, the compiler injects a __cfi_check() function into
      the kernel and each module for validating local call targets. For
      cross-module calls that cannot be validated locally, the compiler
      calls the global __cfi_slowpath_diag() function, which determines
      the target module and calls the correct __cfi_check() function. This
      patch includes a slowpath implementation that uses __module_address()
      to resolve call targets, and with CONFIG_CFI_CLANG_SHADOW enabled, a
      shadow map that speeds up module look-ups by ~3x.
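
      A rough sketch of such a slowpath (the exact signature and the
      per-module cfi_check pointer are assumptions based on the description
      above, not necessarily the code in this patch):

        void __cfi_slowpath_diag(uint64_t id, void *ptr, void *diag)
        {
                struct module *mod;

                rcu_read_lock_sched();          /* keeps the module alive */
                mod = __module_address((unsigned long)ptr);
                if (mod)
                        mod->cfi_check(id, ptr, diag);  /* module's __cfi_check() */
                else
                        __cfi_check(id, ptr, diag);     /* target is in vmlinux   */
                rcu_read_unlock_sched();
        }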
      
      Clang implements indirect call checking using jump tables and
      offers two methods of generating them. With canonical jump tables,
      the compiler renames each address-taken function to <function>.cfi
      and points the original symbol to a jump table entry, which passes
      __cfi_check() validation. This isn’t compatible with stand-alone
      assembly code, which the compiler doesn’t instrument, and would
      cause indirect calls to assembly code to fail. Therefore, we
      default to using non-canonical jump tables instead, where the compiler
      generates a local jump table entry <function>.cfi_jt for each
      address-taken function, and replaces all references to the function
      with the address of the jump table entry.
      
      Note that because non-canonical jump table addresses are local
      to each component, they break cross-module function address
      equality. Specifically, the address of a global function will be
      different in each module, as it's replaced with the address of a local
      jump table entry. If this address is passed to a different module,
      it won’t match the address of the same function taken there. This
      may break code that relies on comparing addresses passed from other
      components.
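
      A hypothetical illustration of the pattern that breaks (all names are
      made up; my_exported_func stands for any globally visible function):

        /* Hypothetical declarations: my_exported_func() is defined elsewhere,
         * register_callback() hands its address over to module B. */
        extern void my_exported_func(void);
        extern void register_callback(void (*cb)(void));

        /* In module A: passes A's local <my_exported_func>.cfi_jt address. */
        static void module_a_init(void)
        {
                register_callback(my_exported_func);
        }

        /* In module B: compares against B's own local jump table entry,
         * so the test is never true even for the very same function. */
        void module_b_check(void (*cb)(void))
        {
                if (cb == my_exported_func)
                        pr_info("same function\n");     /* not reached under CFI */
        }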
      
      CFI checking can be disabled in a function with the __nocfi attribute.
      Additionally, CFI can be disabled for an entire compilation unit by
      filtering out CC_FLAGS_CFI.
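
      For example, a caller that must jump into uninstrumented code could be
      annotated like this (the function below is hypothetical; __nocfi is the
      real attribute):

        /* No CFI check is emitted for indirect calls made from this function. */
        static void __nocfi jump_to_firmware(void (*entry)(void))
        {
                entry();
        }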
      
      By default, CFI failures result in a kernel panic to stop a potential
      exploit. CONFIG_CFI_PERMISSIVE enables a permissive mode, where the
      kernel prints out a rate-limited warning instead, and allows execution
      to continue. This option is helpful for locating type mismatches, but
      should only be enabled during development.
      
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20210408182843.1754385-2-samitolvanen@google.com
  2. Mar 13, 2021
    • init/Kconfig: make COMPILE_TEST depend on HAS_IOMEM · ea29b20a
      Masahiro Yamada authored
      I read the commit log of the following two:
      
      - bc083a64 ("init/Kconfig: make COMPILE_TEST depend on !UML")
      - 334ef6ed ("init/Kconfig: make COMPILE_TEST depend on !S390")
      
      Both are talking about HAS_IOMEM dependency missing in many drivers.
      
      So, 'depends on HAS_IOMEM' seems the direct, sensible solution to me.
      
      This does not change the behavior of UML. UML still cannot enable
      COMPILE_TEST because it does not provide HAS_IOMEM.
      
      The current dependency for S390 is too strong: with CONFIG_PCI=y,
      S390 provides HAS_IOMEM and hence can enable COMPILE_TEST.
      
      I also removed the meaningless 'default n'.
      
      Link: https://lkml.kernel.org/r/20210224140809.1067582-1-masahiroy@kernel.org
      
      
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Arnd Bergmann <arnd@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KP Singh <kpsingh@google.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Quentin Perret <qperret@google.com>
      Cc: Valentin Schneider <valentin.schneider@arm.com>
      Cc: "Enrico Weigelt, metux IT consult" <lkml@metux.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. Mar 11, 2021
  4. Feb 28, 2021
  5. Feb 26, 2021
  6. Feb 24, 2021
  7. Feb 23, 2021
    • Kbuild: disable TRIM_UNUSED_KSYMS option · 5cf0fd59
      Linus Torvalds authored
      
      The removal of EXPORT_UNUSED_SYMBOL() in commit 36794822 looks like
      (and was sold as) a no-op, but it actually had a rather serious and
      subtle side effect: the UNUSED_SYMBOLS option not only enabled the
      removed (unused) functionality, it also _disabled_ the TRIM_UNUSED_KSYMS
      functionality.
      
      And it turns out that TRIM_UNUSED_KSYMS is a huge time waste, and takes
      up a third of the kernel build time for me.  For no actual upside, since
      no distro is likely to ever be able to enable it (because they all
      support external kernel modules).
      
      Rather than re-enable EXPORT_UNUSED_SYMBOL, this just disables the
      TRIM_UNUSED_KSYMS option by marking it broken.  I'm tempted to just
      remove the support entirely, but maybe somebody has a use-case and can
      fix the behavior of it.
      
      I could have just disabled it for COMPILE_TEST, but it really smells
      like the TRIM_UNUSED_KSYMS option is badly done and not really useful,
      so this takes the more direct approach - let's see if anybody ever
      actually notices or complains.
      
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Emil Velikov <emil.l.velikov@gmail.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jessica Yu <jeyu@kernel.org>
      Fixes: 36794822 ("module: remove EXPORT_UNUSED_SYMBOL*")
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. Feb 21, 2021
    • kbuild: check the minimum linker version in Kconfig · 02aff859
      Masahiro Yamada authored
      
      Unify the two scripts/ld-version.sh and scripts/lld-version.sh, and
      check the minimum linker version like scripts/cc-version.sh did.
      
      I tested this script for some corner cases reported in the past:
      
       - GNU ld version 2.25-15.fc23
         as reported by commit 8083013f ("ld-version: Fix it on Fedora")
      
       - GNU ld (GNU Binutils) 2.20.1.20100303
         as reported by commit 0d61ed17 ("ld-version: Drop the 4th and
         5th version components")
      
      This script shows an error message if the linker is too old:
      
        $ make LD=ld.lld-9
          SYNC    include/config/auto.conf
        ***
        *** Linker is too old.
        ***   Your LLD version:    9.0.1
        ***   Minimum LLD version: 10.0.1
        ***
        scripts/Kconfig.include:50: Sorry, this linker is not supported.
        make[2]: *** [scripts/kconfig/Makefile:71: syncconfig] Error 1
        make[1]: *** [Makefile:600: syncconfig] Error 2
        make: *** [Makefile:708: include/config/auto.conf] Error 2
      
      I also moved the check for gold to this script, so gold is still rejected:
      
        $ make LD=gold
          SYNC    include/config/auto.conf
        gold linker is not supported as it is not capable of linking the kernel proper.
        scripts/Kconfig.include:50: Sorry, this linker is not supported.
        make[2]: *** [scripts/kconfig/Makefile:71: syncconfig] Error 1
        make[1]: *** [Makefile:600: syncconfig] Error 2
        make: *** [Makefile:708: include/config/auto.conf] Error 2
      
      Thanks to David Laight for suggesting shell script improvements.
      
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Acked-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
  9. Feb 19, 2021
  10. Feb 16, 2021
  11. Feb 15, 2021
  12. Feb 08, 2021
  13. Feb 05, 2021
  14. Jan 29, 2021
  15. Jan 27, 2021
  16. Jan 14, 2021
  17. Jan 08, 2021
  18. Dec 22, 2020
  19. Dec 15, 2020
    • mm, page_alloc: do not rely on the order of page_poison and init_on_alloc/free parameters · 04013513
      Vlastimil Babka authored
      Patch series "cleanup page poisoning", v3.
      
      I have identified a number of issues and opportunities for cleanup with
      CONFIG_PAGE_POISON and friends:
      
       - interaction with init_on_alloc and init_on_free parameters depends on
         the order of parameters (Patch 1)
      
       - the boot-time enabling uses a static key, but inefficiently (Patch 2)
      
       - sanity checking is incompatible with hibernation (Patch 3)
      
       - CONFIG_PAGE_POISONING_NO_SANITY can be removed now that we have
         init_on_free (Patch 4)
      
       - CONFIG_PAGE_POISONING_ZERO can most likely be removed now that we
         have init_on_free (Patch 5)
      
      This patch (of 5):
      
      Enabling page_poison=1 together with init_on_alloc=1 or init_on_free=1
      produces a warning in dmesg that page_poison takes precedence.  However,
      as these warnings are printed in early_param handlers for
      init_on_alloc/free, they are not printed if page_poison is enabled later
      on the command line (handlers are called in the order of their
      parameters), or when init_on_alloc/free is always enabled by the
      respective config option - before the page_poison early param handler is
      called, it is not considered to be enabled.  This is inconsistent.
      
      We can remove the dependency on order by making the init_on_* parameters
      only set a boolean variable, and postponing the evaluation after all early
      params have been processed.  Introduce a new
      init_mem_debugging_and_hardening() function for that, and move the related
      debug_pagealloc processing there as well.
      
      As a result, init_mem_debugging_and_hardening() always knows accurately
      whether the init_on_* and/or page_poison options were enabled.  Thus we
      can also optimize want_init_on_alloc() and want_init_on_free(): instead
      of checking page_poisoning_enabled() there, we simply do not enable the
      init_on_* static keys at all if page poisoning is enabled.  This results
      in simpler and more efficient code.
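
      A condensed sketch of the idea (close to, but not necessarily identical
      to, the patch; it reuses the existing init_on_alloc static key):

        static bool _init_on_alloc_enabled_early __initdata =
                        IS_ENABLED(CONFIG_INIT_ON_ALLOC_DEFAULT_ON);

        static int __init early_init_on_alloc(char *buf)
        {
                /* only record the request; the decision is made later */
                return kstrtobool(buf, &_init_on_alloc_enabled_early);
        }
        early_param("init_on_alloc", early_init_on_alloc);

        void __init init_mem_debugging_and_hardening(void)
        {
                /* runs after all early params, so ordering no longer matters;
                 * page_poison takes precedence over init_on_alloc/free */
                if (_init_on_alloc_enabled_early && !page_poisoning_enabled())
                        static_branch_enable(&init_on_alloc);
                /* ... likewise for init_on_free and debug_pagealloc ... */
        }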
      
      Link: https://lkml.kernel.org/r/20201113104033.22907-1-vbabka@suse.cz
      Link: https://lkml.kernel.org/r/20201113104033.22907-2-vbabka@suse.cz
      
      
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mateusz Nosek <mateusznosek0@gmail.com>
      Cc: Laura Abbott <labbott@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • init/main: fix broken buffer_init when DEFERRED_STRUCT_PAGE_INIT set · ba8f3587
      Lin Feng authored
      In the booting phase, if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set,
      we have the following call chain:
      
      start_kernel
      ...
        mm_init
          mem_init
           memblock_free_all
             reset_all_zones_managed_pages
             free_low_memory_core_early
      ...
        buffer_init
          nr_free_buffer_pages
            zone->managed_pages
      ...
        rest_init
          kernel_init
            kernel_init_freeable
              page_alloc_init_late
                kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
                wait_for_completion(&pgdat_init_all_done_comp);
                ...
                files_maxfiles_init
      
      It's clear that buffer_init depends on zone->managed_pages, but that
      counter is reset in reset_all_zones_managed_pages and pages are only
      re-added to it afterwards.  When buffer_init runs, this process is only
      half done, and most pages are not added back until deferred_init_memmap
      finishes.  On large-memory systems the count from nr_free_buffer_pages
      therefore drifts too much, and also drifts from kernel to kernel on the
      same hardware.
      
      The fix is simple: delay buffer_init until deferred_init_memmap is all
      done.
      
      But as corrected by this patch, max_buffer_heads becomes very large,
      roughly 4 times totalram_pages, per the formula:

        max_buffer_heads = nrpages * (10%) * (PAGE_SIZE / sizeof(struct buffer_head));
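
      For reference, this is roughly the computation done in buffer_init()
      (fs/buffer.c):

        unsigned long nrpages;

        /* allow buffer_heads to consume up to ~10% of buffer memory */
        nrpages = (nr_free_buffer_pages() * 10) / 100;
        max_buffer_heads = nrpages * (PAGE_SIZE / sizeof(struct buffer_head));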
      
      Say in a 64GB memory box we have 16777216 pages; then max_buffer_heads
      turns out to be roughly 67,108,864.  In common cases, shouldn't a
      buffer_head be mapped to one page/block (4KB)?  If so, the number of
      buffer_heads never exceeds totalram_pages and so never reaches
      max_buffer_heads.  IMO this is likely to make the buffer_heads_over_limit
      bool value always false, and thus make the 'if (buffer_heads_over_limit)'
      test in vmscan unnecessary.
      
      So this patch changes the original behavior related to
      buffer_heads_over_limit in vmscan, since we previously used a half-done
      value of zone->managed_pages; alternatively, should we use a smaller
      factor (<10%) in the formula above?
      
      akpm: I think this is OK - the max_buffer_heads code is only needed on
      highmem machines, to prevent ZONE_NORMAL from being consumed by large
      amounts of buffer_heads attached to highmem pagecache.  This problem will
      not occur on 64-bit machines, so this feature's non-functionality on such
      machines is a feature, not a bug.
      
      Link: https://lkml.kernel.org/r/20201123110500.103523-1-linf@wangsu.com
      
      
      Signed-off-by: Lin Feng <linf@wangsu.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix page_owner initializing issue for arm32 · 7fb7ab6d
      Zhenhua Huang authored
      Page owner information for the pages used by page owner itself is
      missing on arm32 targets.  The reason is that dummy_handle and
      failure_handle are not initialized correctly: the buddy allocator is
      used to initialize these two handles, but it is not ready when page
      owner calls it.  This change fixes that by initializing page owner
      after buddy initialization.
      
      The working flows before and after this change are:
      original logic:
       1. allocate memory for page_ext (using memblock).
       2. invoke the init callbacks of page_ext_ops, like page_owner (using
          the buddy allocator).
       3. initialize buddy.

      after this change:
       1. allocate memory for page_ext (using memblock).
       2. initialize buddy.
       3. invoke the init callbacks of page_ext_ops, like page_owner (using
          the buddy allocator).
      
      With the change, failure_handle and dummy_handle get their correct
      values, and the page owner output now includes, for example, the entry
      for page owner itself:
      
        Page allocated via order 2, mask 0x6202c0(GFP_USER|__GFP_NOWARN), pid 1006, ts 67278156558 ns
        PFN 543776 type Unmovable Block 531 type Unmovable Flags 0x0()
          init_page_owner+0x28/0x2f8
          invoke_init_callbacks_flatmem+0x24/0x34
          start_kernel+0x33c/0x5d8
      
      Link: https://lkml.kernel.org/r/1603104925-5888-1-git-send-email-zhenhuah@codeaurora.org
      
      
      Signed-off-by: Zhenhua Huang <zhenhuah@codeaurora.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. Dec 14, 2020
  21. Dec 11, 2020
  22. Dec 10, 2020
    • exec: Transform exec_update_mutex into a rw_semaphore · f7cfd871
      Eric W. Biederman authored
      Recently syzbot reported[0] that there is a deadlock amongst the users
      of exec_update_mutex.  The problematic lock ordering found by lockdep
      was:
      
         perf_event_open  (exec_update_mutex -> ovl_i_mutex)
         chown            (ovl_i_mutex       -> sb_writes)
         sendfile         (sb_writes         -> p->lock)
           by reading from a proc file and writing to overlayfs
         proc_pid_syscall (p->lock           -> exec_update_mutex)
      
      While looking at possible solutions it occurred to me that all of the
      users and possible users involved only wanted the state of the given
      process to remain the same.  They are all readers.  The only writer is
      exec.
      
      There is no reason for readers to block on each other.  So fix
      this deadlock by transforming exec_update_mutex into a rw_semaphore
      named exec_update_lock that only exec takes for writing.
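
      A minimal sketch of the resulting pattern (the lock name is from this
      patch; the call sites shown are illustrative assumptions):

        DECLARE_RWSEM(exec_update_lock);

        /* readers such as perf_event_open() or proc_pid_syscall() can now
         * run concurrently while they inspect the target task's state: */
        static void reader_path(void)
        {
                down_read(&exec_update_lock);
                /* ... read the task's credentials, mm, etc. ... */
                up_read(&exec_update_lock);
        }

        /* exec, the only writer, still excludes everyone else: */
        static void writer_path(void)
        {
                down_write(&exec_update_lock);
                /* ... replace the process's mm, creds, etc. ... */
                up_write(&exec_update_lock);
        }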
      
      Cc: Jann Horn <jannh@google.com>
      Cc: Vasiliy Kulikov <segoon@openwall.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Bernd Edlinger <bernd.edlinger@hotmail.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Christopher Yeoh <cyeoh@au1.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Sargun Dhillon <sargun@sargun.me>
      Cc: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Fixes: eea96732 ("exec: Add exec_update_mutex to replace cred_guard_mutex")
      [0] https://lkml.kernel.org/r/00000000000063640c05ade8e3de@google.com
      
      
      Reported-by: <syzbot+db9cdf3dd1f64252c6ef@syzkaller.appspotmail.com>
      Link: https://lkml.kernel.org/r/87ft4mbqen.fsf@x220.int.ebiederm.org
      
      
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  23. Dec 01, 2020