  1. May 05, 2017
    • initramfs: Always do fput() and load modules after rootfs populate · 17a9be31
      Stafford Horne authored
      
      In OpenRISC we do not have a bootloader-passed initrd, but the built-in
      initramfs does contain /init and other binaries, including modules.
      The previous commit 08865514 ("initramfs: finish fput() before
      accessing any binary from initramfs") changed the code to call fput()
      only if the bootloader initrd was available; this caused intermittent
      crashes on OpenRISC.
      
      This patch changes the fput() to happen unconditionally if any rootfs is
      loaded. Also, I added some comments to make it a bit more clear why we
      call unpack_to_rootfs() multiple times.
      
      Fixes: 08865514 ("initramfs: finish fput() before accessing any binary from initramfs")
      Cc: stable@vger.kernel.org
      Cc: Lokesh Vutla <lokeshvutla@ti.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Stafford Horne <shorne@gmail.com>
      17a9be31
  2. Apr 01, 2017
    • mm: move mm_percpu_wq initialization earlier · 597b7305
      Michal Hocko authored
      Yang Li has reported that drain_all_pages triggers a WARN_ON which means
      that this function is called earlier than the mm_percpu_wq is
      initialized on arm64 with CMA configured:
      
        WARNING: CPU: 2 PID: 1 at mm/page_alloc.c:2423 drain_all_pages+0x244/0x25c
        Modules linked in:
        CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc1-next-20170310-00027-g64dfbc5 #127
        Hardware name: Freescale Layerscape 2088A RDB Board (DT)
        task: ffffffc07c4a6d00 task.stack: ffffffc07c4a8000
        PC is at drain_all_pages+0x244/0x25c
        LR is at start_isolate_page_range+0x14c/0x1f0
        [...]
         drain_all_pages+0x244/0x25c
         start_isolate_page_range+0x14c/0x1f0
         alloc_contig_range+0xec/0x354
         cma_alloc+0x100/0x1fc
         dma_alloc_from_contiguous+0x3c/0x44
         atomic_pool_init+0x7c/0x208
         arm64_dma_init+0x44/0x4c
         do_one_initcall+0x38/0x128
         kernel_init_freeable+0x1a0/0x240
         kernel_init+0x10/0xfc
         ret_from_fork+0x10/0x20
      
      Fix this by moving the whole of setup_vmstat, which is currently an
      initcall, into init_mm_internals, which is called right after the WQ
      subsystem is initialized.
      
      Link: http://lkml.kernel.org/r/20170315164021.28532-1-mhocko@kernel.org
      
      
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Yang Li <pku.leo@gmail.com>
      Tested-by: Yang Li <pku.leo@gmail.com>
      Tested-by: Xiaolong Ye <xiaolong.ye@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      597b7305
  3. Mar 02, 2017
  4. Feb 28, 2017
  5. Feb 23, 2017
    • slub: make sysfs directories for memcg sub-caches optional · 1663f26d
      Tejun Heo authored
      SLUB creates a per-cache directory under /sys/kernel/slab which hosts a
      bunch of debug files.  Usually, there aren't that many caches on a
      system and this doesn't really matter; however, if memcg is in use, each
      cache can have per-cgroup sub-caches.  SLUB creates the same directories
      for these sub-caches under /sys/kernel/slab/$CACHE/cgroup.
      
      Unfortunately, because there can be a lot of cgroups, active or
      draining, the product of the numbers of caches, cgroups and files in
      each directory can reach a very high number - hundreds of thousands is
      commonplace.  Millions and beyond aren't difficult to reach either.
      
      What's under /sys/kernel/slab is primarily for debugging and the
      information and control on the root cache already cover its
      sub-caches.  While having a separate directory for each sub-cache can be
      helpful for development, it doesn't make much sense to pay this amount
      of overhead by default.
      
      This patch introduces a boot parameter slub_memcg_sysfs which determines
      whether to create sysfs directories for per-memcg sub-caches.  It also
      adds CONFIG_SLUB_MEMCG_SYSFS_ON which determines the boot parameter's
      default value and defaults to 0.
      
      [akpm@linux-foundation.org: kset_unregister(NULL) is legal]
      Link: http://lkml.kernel.org/r/20170204145203.GB26958@mtj.duckdns.org
      
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1663f26d
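The boot-parameter behaviour described in this commit can be illustrated with a small userspace sketch. It is not the kernel code: the kernel registers boot parameters through its own setup machinery, whereas here a hypothetical helper `parse_slub_memcg_sysfs()` scans a command-line string directly, and `SLUB_MEMCG_SYSFS_DEFAULT` is an assumed stand-in for the CONFIG_SLUB_MEMCG_SYSFS_ON default of 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed stand-in for CONFIG_SLUB_MEMCG_SYSFS_ON (defaults to 0 per the commit). */
#define SLUB_MEMCG_SYSFS_DEFAULT false

/*
 * Hypothetical helper: scan a space-separated command line for
 * "slub_memcg_sysfs=<value>". Returns the parsed boolean, or the
 * build-time default when the option is absent or its value is unknown.
 */
static bool parse_slub_memcg_sysfs(const char *cmdline)
{
    static const char key[] = "slub_memcg_sysfs=";
    const size_t key_len = sizeof(key) - 1;
    const char *p = cmdline;

    while ((p = strstr(p, key)) != NULL) {
        /* Accept the match only if it starts a token. */
        if (p == cmdline || p[-1] == ' ') {
            char v = p[key_len];

            if (v == '1' || v == 'y' || v == 'Y')
                return true;
            if (v == '0' || v == 'n' || v == 'N')
                return false;
            return SLUB_MEMCG_SYSFS_DEFAULT;
        }
        p += key_len;
    }
    return SLUB_MEMCG_SYSFS_DEFAULT;
}
```

With the default left at false, per-memcg sysfs directories would only be created when the command line explicitly enables them.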
  6. Feb 14, 2017
    • Reimplement IDR and IDA using the radix tree · 0a835c4f
      Matthew Wilcox authored
      
      The IDR is very similar to the radix tree.  It has some functionality that
      the radix tree did not have (alloc next free, cyclic allocation, a
      callback-based for_each, destroy tree), which is readily implementable on
      top of the radix tree.  A few small changes were needed in order to use a
      tag to represent nodes with free space below them.  More extensive
      changes were needed to support storing NULL as a valid entry in an IDR.
      Plain radix trees still interpret NULL as a not-present entry.
      
      The IDA is reimplemented as a client of the newly enhanced radix tree.  As
      in the current implementation, it uses a bitmap at the last level of the
      tree.
      
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      0a835c4f
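The "alloc next free" operation and the bitmap at the last level of the tree can be illustrated in isolation. The sketch below is a deliberately simplified assumption: it uses one flat bitmap instead of a radix tree, and `struct toy_ida`, `toy_ida_alloc()` and `toy_ida_free()` are hypothetical names, not the kernel API.

```c
#include <assert.h>
#include <limits.h>

#define TOY_IDA_BITS  128
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define TOY_IDA_WORDS ((TOY_IDA_BITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical flat stand-in for the bitmap at the last level of the tree. */
struct toy_ida {
    unsigned long bitmap[TOY_IDA_WORDS];
};

/* Allocate the lowest free ID that is >= start; return -1 if none is left. */
static int toy_ida_alloc(struct toy_ida *ida, unsigned int start)
{
    for (unsigned int id = start; id < TOY_IDA_BITS; id++) {
        unsigned long mask = 1UL << (id % BITS_PER_WORD);
        unsigned long *word = &ida->bitmap[id / BITS_PER_WORD];

        if (!(*word & mask)) {
            *word |= mask;      /* mark the ID as in use */
            return (int)id;
        }
    }
    return -1;
}

/* Release an ID so that a later allocation can hand it out again. */
static void toy_ida_free(struct toy_ida *ida, unsigned int id)
{
    assert(id < TOY_IDA_BITS);
    ida->bitmap[id / BITS_PER_WORD] &= ~(1UL << (id % BITS_PER_WORD));
}
```

The linear scan is the part that the tag described in the commit message would speed up: a tag marking nodes with free space below them lets the search skip full subtrees.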
  7. Feb 09, 2017
    • core: migrate exception table users off module.h and onto extable.h · 8a293be0
      Paul Gortmaker authored
      
      These files were including module.h for exception table related
      functions.  We've now separated that content out into its own file
      "extable.h" so now move over to that and where possible, avoid all
      the extra header content in module.h that we don't really need to
      compile these non-modular files.
      
      Note:
         init/main.c still needs module.h for __init_or_module
         kernel/extable.c still needs module.h for is_module_text_address
      
      ...and so we don't get the benefit of removing module.h from the cpp
      feed for these two files, unlike the almost universal 1:1 exchange
      of module.h for extable.h we were able to do in the arch dirs.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Jessica Yu <jeyu@redhat.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      8a293be0
  8. Feb 08, 2017
  9. Feb 07, 2017
  10. Feb 03, 2017
    • kbuild: modversions: add infrastructure for emitting relative CRCs · 56067812
      Ard Biesheuvel authored
      
      This adds the kbuild infrastructure that will allow architectures to emit
      vmlinux symbol CRCs as 32-bit offsets to another location in the kernel
      where the actual value is stored. This works around problems with CRCs
      being mistaken for relocatable symbols on kernels that self-relocate at
      runtime (i.e., powerpc with CONFIG_RELOCATABLE=y).
      
      For the kbuild side of things, this comes down to the following:
      
       - introducing a Kconfig symbol MODULE_REL_CRCS
      
       - adding a -R switch to genksyms to instruct it to emit the CRC symbols
         as references into the .rodata section
      
       - making modpost distinguish such references from absolute CRC symbols
         by the section index (SHN_ABS)
      
       - making kallsyms disregard non-absolute symbols with a __crc_ prefix
      
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      56067812
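To make the idea of a CRC stored as a 32-bit offset concrete, here is a minimal userspace sketch of how a consumer might follow such a reference. This commit only covers the kbuild side; the struct layout and the names `demo_crc_table` and `demo_resolve_rel_crc()` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout: `rel` holds a signed 32-bit offset from its own
 * address to the place where the actual CRC value is stored.
 */
struct demo_crc_table {
    int32_t  rel;      /* relative reference, as emitted for the symbol */
    uint32_t padding;  /* unrelated data between reference and value */
    uint32_t crc;      /* the actual CRC value */
};

/* Follow a relative reference: target address = address of entry + offset. */
static uint32_t demo_resolve_rel_crc(const int32_t *entry)
{
    const char *base = (const char *)entry;

    return *(const uint32_t *)(base + *entry);
}
```

Because the stored quantity is a distance between two locations in the same image, it stays valid when the image is moved as a whole, which is why such a reference need not be treated as a relocatable symbol.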
  11. Feb 01, 2017
  12. Jan 27, 2017
    • random: use chacha20 for get_random_int/long · f5b98461
      Jason A. Donenfeld authored
      
      Now that our crng uses chacha20, we can rely on its speedy
      characteristics for replacing MD5, while simultaneously achieving a
      higher security guarantee. Previously, the idea was to use these
      functions if you wanted random integers that aren't stupidly insecure
      but aren't necessarily secure either, a vague gray zone that hopefully
      was "good enough" for its users. With chacha20, we can strengthen this
      claim, since either we're using an rdrand-like instruction, or we're
      using the same crng as /dev/urandom. And it's faster than what was
      there before.
      
      We could have chosen to replace this with a SipHash-derived function,
      which might be slightly faster, but at the cost of having yet another
      RNG construction in the kernel. By moving to chacha20, we have a single
      RNG to analyze and verify, and we also already get good performance
      improvements on all platforms.
      
      Implementation-wise, rather than use a generic buffer for both
      get_random_int/long and memcpy based on the size needs, we use a
      specific buffer for 32-bit reads and for 64-bit reads. This way, we're
      guaranteed to always have aligned accesses on all platforms. While
      slightly more verbose in C, the assembly this generates is a lot
      simpler than otherwise.
      
      Finally, on 32-bit platforms where longs and ints are the same size,
      we simply alias get_random_int to get_random_long.
      
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Suggested-by: Theodore Ts'o <tytso@mit.edu>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      f5b98461
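The batching scheme described above (one typed buffer for 32-bit reads and another for 64-bit reads, so every access is naturally aligned) can be sketched in userspace as follows. All `demo_` names are hypothetical, and `demo_next64()` is a splitmix64-style placeholder that is NOT chacha20 and NOT secure; it only stands in for extracting a block from the crng.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BLOCK_BYTES 64   /* one ChaCha20 block is 64 bytes */
#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Placeholder generator (splitmix64 step). Not cryptographic. */
static uint64_t demo_state = 1;

static uint64_t demo_next64(void)
{
    uint64_t z = (demo_state += 0x9e3779b97f4a7c15ULL);

    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* One typed buffer per read width keeps every element naturally aligned. */
struct demo_batch_u64 {
    uint64_t entropy[DEMO_BLOCK_BYTES / sizeof(uint64_t)];
    unsigned int position;
};
struct demo_batch_u32 {
    uint32_t entropy[DEMO_BLOCK_BYTES / sizeof(uint32_t)];
    unsigned int position;
};

static struct demo_batch_u64 demo_batch64;
static struct demo_batch_u32 demo_batch32;
static unsigned int demo_refills64, demo_refills32;

static uint64_t demo_get_random_u64(void)
{
    struct demo_batch_u64 *b = &demo_batch64;

    /* Refill when the buffer is fresh or fully consumed. */
    if (b->position % DEMO_ARRAY_SIZE(b->entropy) == 0) {
        for (size_t i = 0; i < DEMO_ARRAY_SIZE(b->entropy); i++)
            b->entropy[i] = demo_next64();
        b->position = 0;
        demo_refills64++;
    }
    return b->entropy[b->position++];
}

static uint32_t demo_get_random_u32(void)
{
    struct demo_batch_u32 *b = &demo_batch32;

    if (b->position % DEMO_ARRAY_SIZE(b->entropy) == 0) {
        for (size_t i = 0; i < DEMO_ARRAY_SIZE(b->entropy); i++)
            b->entropy[i] = (uint32_t)demo_next64();
        b->position = 0;
        demo_refills32++;
    }
    return b->entropy[b->position++];
}
```

A 64-byte block serves sixteen 32-bit reads or eight 64-bit reads before the next refill. The sketch also omits the per-CPU handling a kernel implementation would need.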
  13. Jan 23, 2017
  14. Jan 19, 2017
  15. Jan 17, 2017
  16. Jan 14, 2017
    • locking/atomic, kref: Add KREF_INIT() · 1e24edca
      Peter Zijlstra authored
      
      Since we need to change the implementation, stop exposing internals.
      
      Provide KREF_INIT() to allow static initialization of struct kref.
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1e24edca
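A static-initializer macro of this kind lets users set up a reference count at compile time without naming the struct's fields. The userspace sketch below shows the pattern under stated assumptions: `struct demo_kref` uses a plain int where the kernel uses an atomic type, and `DEMO_KREF_INIT()`, `struct demo_object` and `demo_kref_read()` are hypothetical names.

```c
#include <assert.h>

/* Simplified stand-in for struct kref; a plain int replaces the atomic counter. */
struct demo_kref {
    int refcount;
};

/* Static initializer: callers never spell out the internal field name. */
#define DEMO_KREF_INIT(n) { .refcount = (n) }

/* Hypothetical object embedding a reference count. */
struct demo_object {
    struct demo_kref ref;
    const char *name;
};

/* Initialized at compile time, with no runtime init call required. */
static struct demo_object demo_static_obj = {
    .ref  = DEMO_KREF_INIT(1),
    .name = "static-object",
};

/* Accessor so that callers do not touch the counter directly either. */
static int demo_kref_read(const struct demo_kref *k)
{
    return k->refcount;
}
```

If the counter's type changes later, only the macro and the accessor need updating, which matches the commit's stated goal of no longer exposing internals.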
    • sched/clock: Delay switching sched_clock to stable · 9881b024
      Peter Zijlstra authored
      
      Currently we switch to the stable sched_clock if we guess the TSC is
      usable, and then switch back to the unstable path if it turns out TSC
      isn't stable during SMP bringup after all.
      
      Delay switching to the stable path until after SMP bringup is
      complete. This way we'll avoid switching during the time we detect the
      worst of the TSC offences.
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9881b024
  17. Jan 11, 2017
    • cgroup: move CONFIG_SOCK_CGROUP_DATA to init/Kconfig · 73b35147
      Arnd Bergmann authored
      
      We now 'select SOCK_CGROUP_DATA' but Kconfig complains that this is
      not right when CONFIG_NET is disabled and there is no socket interface:
      
      warning: (CGROUP_BPF) selects SOCK_CGROUP_DATA which has unmet direct dependencies (NET)
      
      I don't know what the correct solution for this is, but simply removing
      the dependency on NET from SOCK_CGROUP_DATA by moving it out of the
      'if NET' section avoids the warning and does not produce other build
      errors.
      
      Fixes: 483c4933 ("cgroup: Fix CGROUP_BPF config")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      73b35147
  18. Jan 10, 2017
    • rdmacg: Added rdma cgroup controller · 39d3e758
      Parav Pandit authored
      
      Added rdma cgroup controller that does accounting and limit
      enforcement on rdma/IB resources.
      
      Added rdma cgroup header file which defines its APIs to perform
      charging/uncharging functionality. It also defines APIs for the RDMA/IB
      stack for device registration. Devices which are registered will
      participate in the controller functions of accounting and limit
      enforcement. It defines the rdmacg_device structure to bind the IB
      stack and the RDMA cgroup controller.
      
      RDMA resources are tracked using a resource pool. A resource pool is a
      per-device, per-cgroup entity which allows setting up accounting
      limits on a per-device basis.
      
      Currently resources are defined by the RDMA cgroup.
      
      A resource pool is created/destroyed dynamically whenever
      charging/uncharging occurs respectively and whenever user
      configuration is done. It's a tradeoff of memory vs. a little more
      code space that creates a resource pool object whenever necessary,
      instead of creating them during cgroup creation and device
      registration time.
      
      Signed-off-by: Parav Pandit <pandit.parav@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      39d3e758
  19. Dec 25, 2016
    • mm: add PageWaiters indicating tasks are waiting for a page bit · 62906027
      Nicholas Piggin authored
      
      Add a new page flag, PageWaiters, to indicate the page waitqueue has
      tasks waiting. This can be tested rather than testing waitqueue_active
      which requires another cacheline load.
      
      This bit is always set when the page has tasks on page_waitqueue(page),
      and is set and cleared under the waitqueue lock. It may be set when
      there are no tasks on the waitqueue, which will cause a harmless extra
      wakeup check that will clear the bit.
      
      The generic bit-waitqueue infrastructure is no longer used for pages.
      Instead, waitqueues are used directly with a custom key type. The
      generic code was not flexible enough to have PageWaiters manipulation
      under the waitqueue lock (which simplifies concurrency).
      
      This improves the performance of page lock intensive microbenchmarks by
      2-3%.
      
      Putting two bits in the same word opens the opportunity to remove the
      memory barrier between clearing the lock bit and testing the waiters
      bit, after some work on the arch primitives (e.g., ensuring memory
      operand widths match and cover both bits).
      
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Lutomirski <luto@kernel.org>
      Cc: Andreas Gruenbacher <agruenba@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      62906027
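The benefit of keeping a waiters hint in the same word as the lock bit can be sketched with C11 atomics. This is a toy model with assumed names and bit positions (`DEMO_PG_LOCKED`, `DEMO_PG_WAITERS`, `demo_unlock_page()`); it has no real waitqueue and omits the waitqueue lock under which the kernel sets and clears the bit.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit positions; the point is only that both live in one word. */
#define DEMO_PG_LOCKED  (1u << 0)
#define DEMO_PG_WAITERS (1u << 7)

/* Counts how often the (simulated) waitqueue had to be visited. */
static unsigned int demo_waitqueue_visits;

/* Stand-in for walking the page waitqueue and waking sleepers. */
static void demo_wake_page_waiters(atomic_uint *flags)
{
    demo_waitqueue_visits++;
    /* No waiters remain in this toy model, so clear the hint bit. */
    atomic_fetch_and(flags, ~DEMO_PG_WAITERS);
}

static void demo_unlock_page(atomic_uint *flags)
{
    /* Drop the lock bit and observe the old word in one atomic operation. */
    unsigned int old = atomic_fetch_and(flags, ~DEMO_PG_LOCKED);

    assert(old & DEMO_PG_LOCKED);   /* unlocking an unlocked page is a bug */

    /* Fast path: nobody is waiting, so the waitqueue is never touched. */
    if (old & DEMO_PG_WAITERS)
        demo_wake_page_waiters(flags);
}
```

In the uncontended case the unlock path only touches the flags word, which is the cacheline saving the commit message refers to.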
  20. Dec 24, 2016
  21. Dec 18, 2016
  22. Dec 13, 2016
  23. Dec 09, 2016
    • x86/amd: Check for the C1E bug post ACPI subsystem init · e7ff3a47
      Thomas Gleixner authored
      
      AMD CPUs affected by the E400 erratum suffer from the issue that the
      local APIC timer stops when the CPU goes into C1E. Unfortunately there
      is no way to detect the affected CPUs on early boot. It's only possible
      to determine the range of possibly affected CPUs from the family/model
      range.
      
      The actual decision whether to enter C1E and thus cause the bug is done
      by the firmware and we need to detect that case late, after ACPI has
      been initialized.
      
      The current solution is to check in the idle routine whether the CPU is
      affected by reading the MSR_K8_INT_PENDING_MSG MSR and checking for the
      K8_INTP_C1E_ACTIVE_MASK bits. If one of the bits is set then the CPU is
      affected and the system is switched into forced broadcast mode.
      
      This is inefficient: on non-affected CPUs every entry to idle does
      the extra RDMSR.
      
      After doing some research it turns out that the bits are visible on the
      boot CPU right after the ACPI subsystem is initialized in the early
      boot process. So instead of polling for the bits in the idle loop, add
      a detection function after acpi_subsystem_init() and check for the MSR
      bits. If set, then the X86_BUG_AMD_APIC_C1E is set on the boot CPU and
      the TSC is marked unstable when X86_FEATURE_NONSTOP_TSC is not set as it
      will stop in C1E state as well.
      
      The switch to broadcast mode cannot be done at this point because the
      boot CPU still uses HPET as a clockevent device and the local APIC timer
      is not yet calibrated and installed. The switch to broadcast mode on the
      affected CPUs needs to be done when the local APIC timer is actually set
      up.
      
      This allows cleaning up the amd_e400_idle() function in the next step.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/20161209182912.2726-4-bp@alien8.de
      
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      e7ff3a47
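The one-time detection described above reduces to a mask test on the MSR value plus a decision about the TSC. The sketch below models only that decision in userspace: it takes the MSR contents as a parameter instead of reading hardware, the mask value is an assumption that should be checked against the kernel's MSR definitions, and `demo_check_c1e()` and `struct demo_c1e_result` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of K8_INTP_C1E_ACTIVE_MASK; verify against the kernel headers. */
#define DEMO_K8_INTP_C1E_ACTIVE_MASK 0x18000000u

struct demo_c1e_result {
    bool bug_amd_apic_c1e;  /* models setting X86_BUG_AMD_APIC_C1E */
    bool tsc_unstable;      /* models marking the TSC unstable */
};

/*
 * int_pending_msg: contents of MSR_K8_INT_PENDING_MSG as read on the boot
 * CPU after ACPI subsystem init (passed in here, not read from hardware).
 * has_nonstop_tsc: models X86_FEATURE_NONSTOP_TSC.
 */
static struct demo_c1e_result demo_check_c1e(uint64_t int_pending_msg,
                                             bool has_nonstop_tsc)
{
    struct demo_c1e_result r = { false, false };

    if (int_pending_msg & DEMO_K8_INTP_C1E_ACTIVE_MASK) {
        r.bug_amd_apic_c1e = true;
        /* Without a non-stop TSC, the TSC also stops in C1E. */
        r.tsc_unstable = !has_nonstop_tsc;
    }
    return r;
}
```

Running this check once at boot replaces the per-idle-entry RDMSR that the earlier approach required.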
  24. Nov 30, 2016
    • Re-enable CONFIG_MODVERSIONS in a slightly weaker form · faaae2a5
      Linus Torvalds authored
      
      This enables CONFIG_MODVERSIONS again, but allows for missing symbol CRC
      information in order to work around the issue that newer binutils
      versions seem to occasionally drop the CRC on the floor.  binutils 2.26
      seems to work fine, while binutils 2.27 seems to break MODVERSIONS of
      symbols that have been defined in assembler files.
      
      [ We've had random missing CRC's before - it may be an old problem that
        just is now reliably triggered with the weak asm symbols and a new
        version of binutils ]
      
      Some day I really do want to remove MODVERSIONS entirely.  Sadly, today
      does not appear to be that day: Debian people apparently do want the
      option to enable MODVERSIONS to make it easier to have external modules
      across kernel versions, and this seems to be a fairly minimal fix for
      the annoying problem.
      
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Acked-by: Michal Marek <mmarek@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      faaae2a5
  25. Nov 28, 2016
  26. Nov 25, 2016
    • Fix subtle CONFIG_MODVERSIONS problems · cd3caefb
      Linus Torvalds authored
      
      CONFIG_MODVERSIONS has been broken for pretty much the whole 4.9 series,
      and quite frankly, nobody has cared very deeply.  We absolutely know how
      to fix it, and it's not _complicated_, but it's not exactly pretty
      either.
      
      This oneliner fixes it without the ugliness, and allows for further
      future cleanups.
      
        "We've secretly replaced their regular MODVERSIONS with nothing at
         all, let's see if they notice"
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cd3caefb
    • cgroup: add support for eBPF programs · 30070984
      Daniel Mack authored
      
      This patch adds two sets of eBPF program pointers to struct cgroup:
      one for programs that are directly pinned to a cgroup, and one for
      programs that are effective for it.
      
      To illustrate the logic behind that, assume the following example
      cgroup hierarchy.
      
        A - B - C
              \ D - E
      
      If only B has a program attached, it will be effective for B, C, D
      and E. If D then attaches a program itself, that will be effective for
      both D and E, and the program in B will only affect B and C. Only one
      program of a given type is effective for a cgroup.
      
      Attaching and detaching programs will be done through the bpf(2)
      syscall. For now, ingress and egress inet socket filtering are the
      only supported use-cases.
      
      Signed-off-by: Daniel Mack <daniel@zonque.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      30070984
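The inheritance rule from the A/B/C/D/E example can be expressed as "the effective program is the one attached to the nearest ancestor, starting with the cgroup itself". The commit stores the effective pointers in struct cgroup; the sketch below is an assumption-laden simplification that derives the answer by walking toward the root at lookup time, with hypothetical names `struct demo_cgroup` and `demo_effective_prog()` and programs represented as plain strings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cgroup node; one program type only. */
struct demo_cgroup {
    const struct demo_cgroup *parent;  /* NULL for the root */
    const char *attached;              /* program pinned directly here, or NULL */
};

/*
 * Return the program that is effective for cg: its own attached program
 * if present, otherwise the one attached to the nearest ancestor.
 */
static const char *demo_effective_prog(const struct demo_cgroup *cg)
{
    for (; cg != NULL; cg = cg->parent) {
        if (cg->attached != NULL)
            return cg->attached;
    }
    return NULL;  /* nothing attached anywhere on the path to the root */
}
```

Precomputing the result on attach and detach, as the two pointer sets in the commit suggest, trades a little memory per cgroup for avoiding this walk on every packet.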
  27. Nov 24, 2016
  28. Nov 16, 2016
  29. Oct 24, 2016
  30. Oct 11, 2016
    • relay: Use irq_work instead of plain timer for deferred wakeup · 26b5679e
      Peter Zijlstra authored
      Relay avoids calling wake_up_interruptible() to wake up
      readers/consumers, which wait for the generation of new data, from the
      context of a process which produced the data.  This is apparently done to
      prevent the possibility of a deadlock in case the scheduler itself is
      generating data for the relay, after acquiring rq->lock.
      
      The following patch used a timer (to be scheduled at next jiffy), for
      delegating the wakeup to another context.
      	commit 7c9cb383
      	Author: Tom Zanussi <zanussi@comcast.net>
      	Date:   Wed May 9 02:34:01 2007 -0700
      
      	relay: use plain timer instead of delayed work
      
      	relay doesn't need to use schedule_delayed_work() for waking readers
      	when a simple timer will do.
      
      Scheduling a plain timer, at the next jiffies boundary, to do the wakeup
      causes a significant wakeup latency for the userspace client, which makes
      relay less suitable for the high-frequency low-payload use cases where the
      data gets generated at a very high rate, like multiple sub buffers getting
      filled within a millisecond.  Moreover the timer is re-scheduled on every
      newly produced sub buffer so the timer keeps getting pushed out if sub
      buffers are filled in very quick succession (less than a jiffy gap
      between filling of 2 sub buffers).  As a result relay runs out of sub
      buffers to store the new data.
      
      By using irq_work it is ensured that wakeup of the userspace client, blocked
      in the poll call, is done at the earliest opportunity (through self IPI or
      next timer tick), enabling it to always consume the data in time.  Also this
      makes relay consistent with printk & ring buffers (trace), as they too use
      irq_work for deferred wake up of readers.
      
      [arnd@arndb.de: select CONFIG_IRQ_WORK]
      Link: http://lkml.kernel.org/r/20160912154035.3222156-1-arnd@arndb.de
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/1472906487-1559-1-git-send-email-akash.goel@intel.com
      
      
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Akash Goel <akash.goel@intel.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      26b5679e
  31. Oct 10, 2016
    • gcc-plugins: Add latent_entropy plugin · 38addce8
      Emese Revfy authored
      
      This adds a new gcc plugin named "latent_entropy". It is designed to
      extract as much uncertainty as possible from a running system at boot
      time, hoping to capitalize on any possible variation in CPU operation
      (due to runtime data differences, hardware differences, SMP ordering,
      thermal timing variation, cache behavior, etc).
      
      At the very least, this plugin is a much more comprehensive example for
      how to manipulate kernel code using the gcc plugin internals.
      
      The need for very-early boot entropy tends to be very architecture or
      system design specific, so this plugin is more suited for those sorts
      of special cases. The existing kernel RNG already attempts to extract
      entropy from reliable runtime variation, but this plugin takes the idea to
      a logical extreme by permuting a global variable based on any variation
      in code execution (e.g. a different value (and permutation function)
      is used to permute the global based on loop count, case statement,
      if/then/else branching, etc).
      
      To do this, the plugin starts by inserting a local variable in every
      marked function. The plugin then adds logic so that the value of this
      variable is modified by randomly chosen operations (add, xor and rol) and
      random values (gcc generates separate static values for each location at
      compile time and also injects the stack pointer at runtime). The resulting
      value depends on the control flow path (e.g., loops and branches taken).
      
      Before the function returns, the plugin mixes this local variable into
      the latent_entropy global variable. The value of this global variable
      is added to the kernel entropy pool in do_one_initcall() and _do_fork(),
      though it does not credit any bytes of entropy to the pool; the contents
      of the global are just used to mix the pool.
      
      Additionally, the plugin can pre-initialize arrays with build-time
      random contents, so that two different kernel builds running on identical
      hardware will not have the same starting values.
      
      Signed-off-by: Emese Revfy <re.emese@gmail.com>
      [kees: expanded commit message and code comments]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      38addce8
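To picture what the instrumentation does, the sketch below hand-writes the kind of code the plugin is described as inserting into a marked function. It is an illustration under assumptions only: the constants are small arbitrary values chosen for readability (the plugin picks random ones at compile time), the stack-pointer injection is left out, and `demo_latent_entropy`, `demo_rol64()` and `demo_count_bits()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the global that the local values are mixed into. */
static uint64_t demo_latent_entropy;

/* Rotate left; callers must pass 0 < shift < 64. */
static uint64_t demo_rol64(uint64_t v, unsigned int shift)
{
    return (v << shift) | (v >> (64 - shift));
}

/* An ordinary function (population count) with hand-written "instrumentation". */
static unsigned int demo_count_bits(unsigned int x)
{
    uint64_t local_entropy = 0x1;   /* per-function starting constant */
    unsigned int n = 0;

    while (x) {
        local_entropy += 0x11;      /* inserted: one step per loop iteration */
        n += x & 1u;
        x >>= 1;
    }

    if (n & 1u)
        local_entropy ^= 0xA5;      /* inserted: a different op per branch */
    else
        local_entropy = demo_rol64(local_entropy, 3);

    /* Inserted before return: mix the path-dependent value into the global. */
    demo_latent_entropy ^= local_entropy;
    return n;
}
```

The function's return value is unaffected, while the value mixed into the global depends on how many loop iterations ran and which branch was taken.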