  Apr 08, 2021
    • add support for Clang CFI · cf68fffb
      Sami Tolvanen authored
      This change adds support for Clang’s forward-edge Control Flow
      Integrity (CFI) checking. With CONFIG_CFI_CLANG, the compiler
      injects a runtime check before each indirect function call to ensure
      the target is a valid function with the correct static type. This
      restricts possible call targets and makes it more difficult for
      an attacker to exploit bugs that allow the modification of stored
      function pointers. For more details, see:
      
        https://clang.llvm.org/docs/ControlFlowIntegrity.html
      
      
      
      Clang requires CONFIG_LTO_CLANG to be enabled with CFI to gain
      visibility to possible call targets. Kernel modules are supported
      with Clang’s cross-DSO CFI mode, which allows checking between
      independently compiled components.
      
      With CFI enabled, the compiler injects a __cfi_check() function into
      the kernel and each module for validating local call targets. For
      cross-module calls that cannot be validated locally, the compiler
      calls the global __cfi_slowpath_diag() function, which determines
      the target module and calls the correct __cfi_check() function. This
      patch includes a slowpath implementation that uses __module_address()
      to resolve call targets, and with CONFIG_CFI_CLANG_SHADOW enabled, a
      shadow map that speeds up module look-ups by ~3x.
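
      Conceptually, the injected instrumentation behaves like the sketch
      below. This is illustrative pseudo-C, not the code the compiler
      actually generates: the real check is a compiler-generated jump-table
      comparison, and only __cfi_check() and __cfi_slowpath_diag() are real
      symbols here; in_local_jump_table() and type_id_of() are invented
      stand-ins for compiler-internal logic.

          /* Sketch of the check emitted before an indirect call. */
          void indirect_call(void (*fn)(int))
          {
                  if (!in_local_jump_table(fn)) {
                          /* Target lies outside this component (e.g. in
                           * another module): resolve the owning module and
                           * let its __cfi_check() validate the target. */
                          __cfi_slowpath_diag(type_id_of(fn), fn, NULL);
                  }
                  fn(0);  /* reached only for a valid target of the correct
                           * static type (or, in permissive mode, after the
                           * failure handler returns) */
          }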
      
      Clang implements indirect call checking using jump tables and
      offers two methods of generating them. With canonical jump tables,
      the compiler renames each address-taken function to <function>.cfi
      and points the original symbol to a jump table entry, which passes
      __cfi_check() validation. This isn’t compatible with stand-alone
      assembly code, which the compiler doesn’t instrument, and would
      cause indirect calls to assembly code to fail. Therefore, we
      default to using non-canonical jump tables instead, where the compiler
      generates a local jump table entry <function>.cfi_jt for each
      address-taken function, and replaces all references to the function
      with the address of the jump table entry.
      
      Note that because non-canonical jump table addresses are local
      to each component, they break cross-module function address
      equality. Specifically, the address of a global function will be
      different in each module, as it's replaced with the address of a local
      jump table entry. If this address is passed to a different module,
      it won’t match the address of the same function taken there. This
      may break code that relies on comparing addresses passed from other
      components.
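
      As a hypothetical illustration of the kind of code this breaks (the
      names here are invented):

          /* Module A takes the address of a shared function and passes it
           * on. Under non-canonical jump tables, my_handler here resolves
           * to the address of A's local my_handler.cfi_jt entry. */
          register_handler(my_handler);

          /* Module B compares the received pointer against its own view of
           * the same symbol -- B's local jump-table entry -- so the test
           * fails even though both refer to my_handler(). */
          static void check_handler(void (*cb)(void))
          {
                  if (cb == my_handler)   /* never true across modules */
                          pr_info("known handler\n");
          }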
      
      CFI checking can be disabled in a function with the __nocfi attribute.
      Additionally, CFI can be disabled for an entire compilation unit by
      filtering out CC_FLAGS_CFI.
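
      For example (a usage sketch; the function is invented, but __nocfi is
      the attribute added by this patch):

          /* Skip CFI instrumentation in a function that must make indirect
           * calls the compiler cannot validate, e.g. into firmware or other
           * non-instrumented code. */
          static void __nocfi call_firmware_entry(void (*entry)(void))
          {
                  entry();        /* no CFI check emitted for this call */
          }

      For a whole object file, the usual kbuild flag filtering applies,
      e.g. CFLAGS_REMOVE_foo.o += $(CC_FLAGS_CFI) (the file name is
      hypothetical).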
      
      By default, CFI failures result in a kernel panic to stop a potential
      exploit. CONFIG_CFI_PERMISSIVE enables a permissive mode, where the
      kernel prints out a rate-limited warning instead, and allows execution
      to continue. This option is helpful for locating type mismatches, but
      should only be enabled during development.
      
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20210408182843.1754385-2-samitolvanen@google.com
    • stack: Optionally randomize kernel stack offset each syscall · 39218ff4
      Kees Cook authored
      This provides the ability for architectures to enable kernel stack base
      address offset randomization. This feature is controlled by the boot
      param "randomize_kstack_offset=on/off", with its default value set by
      CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT.
      
      This feature is based on the original idea from the last public release
      of PaX's RANDKSTACK feature: https://pax.grsecurity.net/docs/randkstack.txt
      All the credit for the original idea goes to the PaX team. Note that
      the design and implementation of this upstream randomize_kstack_offset
      feature differs greatly from the RANDKSTACK feature (see below).
      
      Reasoning for the feature:
      
      This feature aims to make the various stack-based attacks that rely
      on a deterministic stack structure harder. We have had many such
      attacks in the past (just to name a few):
      
      https://jon.oberheide.org/files/infiltrate12-thestackisback.pdf
      https://jon.oberheide.org/files/stackjacking-infiltrate11.pdf
      https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
      
      As Linux kernel stack protections have been constantly improving
      (vmap-based stack allocation with guard pages, removal of thread_info,
      STACKLEAK), attackers have had to find new ways for their exploits
      to work. They have done so, continuing to rely on the kernel's stack
      determinism, in situations where VMAP_STACK and THREAD_INFO_IN_TASK_STRUCT
      were not relevant. For example, the following recent attacks would have
      been hampered if the stack offset was non-deterministic between syscalls:
      
      https://repositorio-aberto.up.pt/bitstream/10216/125357/2/374717.pdf
      (page 70: targeting the pt_regs copy with linear stack overflow)
      
      https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
      (leaked stack address from one syscall as a target during next syscall)
      
      The main idea is that since the stack offset is randomized on each system
      call, it is harder for an attacker to reliably land in any particular place
      on the thread stack, even with address exposures, as the stack base will
      change on the next syscall. Also, since randomization is performed after
      placing pt_regs, the ptrace-based approach[1] to discover the randomized
      offset during a long-running syscall should not be possible.
      
      Design description:
      
      During most of the kernel's execution, it runs on the "thread stack",
      which is pretty deterministic in its structure: it is fixed in size,
      and on every entry from userspace to kernel on a syscall the thread
      stack starts construction from an address fetched from the per-cpu
      cpu_current_top_of_stack variable. The first element to be pushed to the
      thread stack is the pt_regs struct that stores all required CPU registers
      and syscall parameters. Finally the specific syscall function is called,
      with the stack being used as the kernel executes the resulting request.
      
      The goal of the randomize_kstack_offset feature is to add a random offset
      after the pt_regs has been pushed to the stack and before the rest of the
      thread stack is used during the syscall processing, and to change it every
      time a process issues a syscall. The source of randomness is currently
      architecture-defined (but x86 is using the low byte of rdtsc()). Future
      improvements with different entropy sources are possible, but out of scope
      for this patch. Furthermore, to add more unpredictability, new offsets
      are chosen at the end of syscalls (the timing of which should be less
      easy to measure from userspace than at syscall entry time), and stored
      in a per-CPU variable, so that the life of the value does not stay
      explicitly tied to a single task.
      
      As suggested by Andy Lutomirski, the offset is added using alloca()
      and an empty asm() statement with an output constraint, since it avoids
      changes to assembly syscall entry code, to the unwinder, and provides
      correct stack alignment as defined by the compiler.
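
      In outline, the two helpers look like the sketch below. This is a
      simplified sketch of the approach described above, not the patch's
      exact code; the macro and variable names are illustrative:

          /* Boot-time static key: a no-op when the feature is off. */
          DEFINE_STATIC_KEY_FALSE(randomize_kstack_offset);
          DEFINE_PER_CPU(u32, kstack_offset);

          /* Syscall entry, after pt_regs is pushed: move the stack pointer
           * down by a bounded random amount. The empty asm() with an output
           * constraint keeps the compiler from optimizing the alloca()
           * away, and the compiler maintains correct stack alignment. */
          #define add_random_kstack_offset() do {                         \
                  if (static_branch_unlikely(&randomize_kstack_offset)) { \
                          u32 offset = raw_cpu_read(kstack_offset);       \
                          u8 *ptr = __builtin_alloca(offset & 0x3FF);     \
                          asm volatile("" : "=o"(*ptr) :: "memory");      \
                  }                                                       \
          } while (0)

          /* Syscall exit: mix fresh, arch-provided entropy (e.g. the low
           * byte of rdtsc()) into the per-CPU value used by the *next*
           * syscall on this CPU. */
          #define choose_random_kstack_offset(rand) do {                  \
                  if (static_branch_unlikely(&randomize_kstack_offset)) { \
                          u32 offset = raw_cpu_read(kstack_offset);       \
                          raw_cpu_write(kstack_offset, offset ^ (rand));  \
                  }                                                       \
          } while (0)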
      
      In order to make this available by default with zero performance impact
      for those that don't want it, it is boot-time selectable with static
      branches. This way, if the overhead is not wanted, it can simply be
      left turned off.
      
      The generated assembly for x86_64 with GCC looks like this:
      
      ...
      ffffffff81003977: 65 8b 05 02 ea 00 7f  mov %gs:0x7f00ea02(%rip),%eax
      					    # 12380 <kstack_offset>
      ffffffff8100397e: 25 ff 03 00 00        and $0x3ff,%eax
      ffffffff81003983: 48 83 c0 0f           add $0xf,%rax
      ffffffff81003987: 25 f8 07 00 00        and $0x7f8,%eax
      ffffffff8100398c: 48 29 c4              sub %rax,%rsp
      ffffffff8100398f: 48 8d 44 24 0f        lea 0xf(%rsp),%rax
      ffffffff81003994: 48 83 e0 f0           and $0xfffffffffffffff0,%rax
      ...
      
      As a result of the above stack alignment, this patch introduces about
      5 bits of randomness after pt_regs is spilled to the thread stack on
      x86_64, and 6 bits on x86_32 (since it has 1 fewer bit required for
      stack alignment). The amount of entropy could be adjusted based on how
      much of the stack space we wish to trade for security.
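
      For readability, here is the masking sequence above re-expressed in C
      (an illustrative transcription of the disassembly, not code from the
      patch):

          /* Mirror of the generated x86_64 sequence shown above. */
          unsigned long randomized_rsp(unsigned long rsp, u32 kstack_offset)
          {
                  u32 off = kstack_offset & 0x3ff;         /* and $0x3ff,%eax */
                  unsigned long adj = (off + 0xf) & 0x7f8; /* add $0xf; and $0x7f8 */
                  rsp -= adj;                              /* sub %rax,%rsp */
                  return (rsp + 0xf) & ~0xfUL;             /* lea/and: 16-byte align */
          }

      The final 16-byte alignment is what reduces the usable entropy to the
      ~5 bits quoted above.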
      
      My measure of syscall performance overhead (on x86_64):
      
      lmbench: /usr/lib/lmbench/bin/x86_64-linux-gnu/lat_syscall -N 10000 null
          randomize_kstack_offset=y	Simple syscall: 0.7082 microseconds
          randomize_kstack_offset=n	Simple syscall: 0.7016 microseconds
      
      So, roughly 0.9% overhead growth for a no-op syscall, which is very
      manageable. And for people that don't want this, it's off by default.
      
      There are two gotchas with using the alloca() trick. First,
      compilers that have Stack Clash protection (-fstack-clash-protection)
      enabled by default (e.g. Ubuntu[3]) add page-size stack probes to
      any dynamic stack allocation. While the randomization offset is
      always less than a page, the resulting assembly would still contain
      (unreachable!) probing routines, bloating the output. To
      avoid this, -fno-stack-clash-protection is unconditionally added to
      the kernel Makefile since this is the only dynamic stack allocation in
      the kernel (now that VLAs have been removed) and it is provably safe
      from Stack Clash style attacks.
      
      The second gotcha with alloca() is a negative interaction with
      -fstack-protector*, in that it sees the alloca() as an array allocation,
      which triggers the unconditional addition of the stack canary function
      pre/post-amble which slows down syscalls regardless of the static
      branch. In order to avoid adding this unneeded check and its associated
      performance impact, architectures need to carefully remove uses of
      -fstack-protector-strong (or -fstack-protector) in the compilation units
      that use the add_random_kstack() macro, and audit the resulting stack
      mitigation coverage (to make sure no desired coverage disappears). No
      change is visible for this on x86 because the stack protector is already
      unconditionally disabled for the compilation unit, but the change is
      required on arm64. There is, unfortunately, no attribute that can be
      used to disable stack protector for specific functions.
      
      Comparison to PaX RANDKSTACK feature:
      
      The RANDKSTACK feature randomizes the location of the stack start
      (cpu_current_top_of_stack), i.e. including the location of pt_regs
      structure itself on the stack. Initially this patch followed the same
      approach, but during the recent discussions[2], it has been determined
      to be of little value since, if ptrace functionality is available to
      an attacker, they can use PTRACE_PEEKUSR/PTRACE_POKEUSR to read/write
      different offsets in the pt_regs struct, observe the cache behavior of
      the pt_regs accesses, and figure out the random stack offset. Another
      difference is that the random offset is stored in a per-cpu variable,
      rather than per-thread. As a result, the two implementations differ a
      fair bit in their details and results, though the intent is obviously
      similar.
      
      [1] https://lore.kernel.org/kernel-hardening/2236FBA76BA1254E88B949DDB74E612BA4BC57C1@IRSMSX102.ger.corp.intel.com/
      [2] https://lore.kernel.org/kernel-hardening/20190329081358.30497-1-elena.reshetova@intel.com/
      [3] https://lists.ubuntu.com/archives/ubuntu-devel/2019-June/040741.html
      
      
      
      Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20210401232347.2791257-4-keescook@chromium.org
  Feb 23, 2021
    • kbuild: lto: force rebuilds when switching CONFIG_LTO · 5e95325f
      Sami Tolvanen authored
      
      When doing non-clean builds and switching between CONFIG_LTO=n and
      CONFIG_LTO=y, the build system (correctly) didn't notice that assembly
      and LTO-excluded C object files were rewritten in place by objtool (to
      add the .orc_unwind* sections), since their build command lines were the
      same between CONFIG_LTO=y and CONFIG_LTO=n. The objtool step would fail:
      
      vmlinux.o: warning: objtool: file already has .orc_unwind section, skipping
      make: *** [Makefile:1194: vmlinux] Error 255
      
      Avoid this by making sure the build will see a difference between an LTO
      and non-LTO build (by including "-fno-lto" in KBUILD_*FLAGS). This will
      get ignored when CC_FLAGS_LTO is present, and will not be included at
      all when CONFIG_LTO=n.
      
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
    • tracing: add support for objtool mcount · 22c8542d
      Sami Tolvanen authored
      
      This change adds build support for using objtool to generate
      __mcount_loc sections.
      
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
  Feb 16, 2021
    • kbuild: simplify access to the kernel's version · 88a68672
      Sasha Levin authored
      
      Instead of storing the version in a single integer and having various
      pieces of kernel (and userspace) code know how it's constructed, export
      the individual (major, patchlevel, sublevel) components and simplify the
      kernel code that uses them.
      
      This should also make it easier on userspace.
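
      For consumers, the difference looks roughly like this (a sketch;
      LINUX_VERSION_MAJOR/PATCHLEVEL/SUBLEVEL are the components this
      change exports):

          #include <linux/version.h>

          static void version_example(void)
          {
                  /* Before: decode the packed LINUX_VERSION_CODE by hand. */
                  int major = LINUX_VERSION_CODE >> 16;
                  int patch = (LINUX_VERSION_CODE >> 8) & 0xff;

                  /* After: read the exported components directly. */
                  int major2 = LINUX_VERSION_MAJOR;
                  int patch2 = LINUX_VERSION_PATCHLEVEL;
                  int sub    = LINUX_VERSION_SUBLEVEL;
          }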
      
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • kbuild: clamp SUBLEVEL to 255 · 9b82f13e
      Sasha Levin authored
      
      Right now, if SUBLEVEL becomes larger than 255, it will overflow into
      the territory of PATCHLEVEL, causing havoc in userspace code that tests
      for specific kernel versions.
      
      While userspace code tests for MAJOR and PATCHLEVEL, it doesn't test
      SUBLEVEL at any point, as ABI changes don't happen in the context of a
      stable tree.
      
      Thus, to avoid overflows, simply clamp SUBLEVEL to its maximum value in
      the context of LINUX_VERSION_CODE. This does not affect "make
      kernelversion" and such.
      
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • Kconfig: allow explicit opt in to DWARF v5 · 98cd6f52
      Nick Desaulniers authored
      DWARF v5 is the latest standard of the DWARF debug info format. GCC 11
      will change the implicit default DWARF version, if left unspecified, to
      DWARF v5.
      
      Allow users of Clang, and of older GCC versions whose implicit default
      has not yet moved to DWARF v5, to opt in explicitly. This can help with
      testing consumers of DWARF debug info in preparation for v5 becoming more
      widespread, and can result in significant binary size savings for the
      pre-stripped vmlinux image.
      
      DWARF5 wins significantly in terms of size when mixed with compression
      (CONFIG_DEBUG_INFO_COMPRESSED).
      
      363M    vmlinux.clang12.dwarf5.compressed
      434M    vmlinux.clang12.dwarf4.compressed
      439M    vmlinux.clang12.dwarf2.compressed
      457M    vmlinux.clang12.dwarf5
      536M    vmlinux.clang12.dwarf4
      548M    vmlinux.clang12.dwarf2
      
      515M    vmlinux.gcc10.2.dwarf5.compressed
      599M    vmlinux.gcc10.2.dwarf4.compressed
      624M    vmlinux.gcc10.2.dwarf2.compressed
      630M    vmlinux.gcc10.2.dwarf5
      765M    vmlinux.gcc10.2.dwarf4
      809M    vmlinux.gcc10.2.dwarf2
      
      The quality of debug info is harder to quantify, though; size is not a
      proxy for quality.
      
      Jakub notes:
        One thing is GCC DWARF-5 support, that is whether the compiler will
        support -gdwarf-5 flag, and that support should be there from GCC 7
        onwards.
      
        All [GCC] 5.1 - 6.x did was start accepting -gdwarf-5 as experimental
        option that enabled some small DWARF subset (initially only a few
        DW_LANG_* codes newly added to DWARF5 drafts).  Only GCC 7 (released
        after DWARF 5 has been finalized) started emitting DWARF5 section
        headers and got most of the DWARF5 changes in...
      
        Another separate thing is whether the assembler does support
        the -gdwarf-5 option (i.e. if you can compile assembler files
        with -Wa,-gdwarf-5) ... That option is about whether the assembler
        will emit DWARF5 or DWARF2 .debug_line.  It is fine to compile C sources
        with -gdwarf-5 and use DWARF2 .debug_line for assembler files if as
        doesn't support it.
      
      Version check GCC so that we don't need to worry about the difference in
      command line args between GNU readelf and llvm-readelf/llvm-dwarfdump to
      validate the DWARF Version in the assembler feature detection script.
      
      Most issues with clang produced assembler were fixed in binutils 2.35.1,
      but 2.35.2 fixed issues related to requiring the flag -Wa,-gdwarf-5
      explicitly. The added shell script test checks for the latter, and is
      only required when using clang without its integrated assembler, though
      we use it for clang regardless, as we do not yet have a way to query the
      assembler from Kconfig.
      
      Disabled for now if CONFIG_DEBUG_INFO_BTF is set; pahole doesn't yet
      recognize the new additions to the DWARF debug info.
      
      This only modifies the DWARF version emitted by the compiler, not the
      assembler.
      
      The DWARF version of a binary can be validated with:
      $ llvm-dwarfdump <object file> | head -n 4 | grep version
      or
      $ readelf --debug-dump=info <object file> 2>/dev/null | grep Version
      
      Parts of the tree don't reuse DEBUG_CFLAGS as they should; such cleanup
      is left as a follow up.
      
      Link: http://www.dwarfstd.org/doc/DWARF5.pdf
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=1922707
      
      
      Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
      Suggested-by: Arvind Sankar <nivedita@alum.mit.edu>
      Suggested-by: Caroline Tice <cmtice@google.com>
      Suggested-by: Fangrui Song <maskray@google.com>
      Suggested-by: Jakub Jelinek <jakub@redhat.com>
      Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
      Suggested-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v12.0.0-rc1 x86-64
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • Kbuild: make DWARF version a choice · a66049e2
      Nick Desaulniers authored
      
      Adds a default CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT which allows
      the implicit default version of DWARF emitted by the toolchain to
      progress over time.
      
      Modifies CONFIG_DEBUG_INFO_DWARF4 to be a member of a choice, making it
      mutually exclusive with CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT. Users
      may want to select this if they are using a newer toolchain, but have
      consumers of the DWARF debug info that aren't yet ready for newer DWARF
      versions' debug info.
      
      Does so in a way that's forward compatible with existing
      configs, and makes adding future versions more straightforward. This
      patch does not change the current behavior or selection of DWARF
      version for users upgrading to kernels with this patch.
      
      GCC since ~4.8 has defaulted to DWARF v4 implicitly, and GCC 11 has
      bumped this to v5.
      
      Remove the Kconfig help text about DWARF v4 being larger. It's
      empirically false for the latest toolchains for x86_64 defconfig, has no
      point of reference (I suspect it was DWARF v2, but that's still
      empirically false), and debug info size is not a qualitative measure.
      
      Suggested-by: Arvind Sankar <nivedita@alum.mit.edu>
      Suggested-by: Fangrui Song <maskray@google.com>
      Suggested-by: Jakub Jelinek <jakub@redhat.com>
      Suggested-by: Mark Wielaard <mark@klomp.org>
      Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
      Suggested-by: Nathan Chancellor <nathan@kernel.org>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • kbuild: stop removing stale <linux/version.h> file · 0dd77e95
      Masahiro Yamada authored
      
      Revert commit 223c24a7 ("kbuild: Automatically remove stale
      <linux/version.h> file").
      
      That commit was merged more than 6 years ago; I do not expect anybody
      to git-bisect across such a big window.
      
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>