1. 28 Aug, 2020 1 commit
  2. 12 Aug, 2020 5 commits
  3. 27 Jul, 2020 1 commit
  4. 08 Jun, 2020 2 commits
    • panic: add sysctl to dump all CPUs backtraces on oops event · 60c958d8
      Guilherme G. Piccoli authored
      
      
      Usually when the kernel reaches an oops condition, it's a point of no
      return; in case not enough debug information is available in the kernel
      splat, one of the last resorts would be to collect a kernel crash dump
      and analyze it.  The problem with this approach is that in order to
      collect the dump, a panic is required (to kexec-load the crash kernel).
      When in an environment of multiple virtual machines, users may prefer to
      try living with the oops, at least until they are able to properly shut down
      their VMs / finish their important tasks.
      
      This patch implements a way to collect a bit more debug details when an
      oops event is reached, by printing all CPUs' backtraces through the
      use of NMIs (on architectures that support them).  The sysctl added
      (and documented) here is called "oops_all_cpu_backtrace" and, when set,
      will (as the name suggests) dump all CPUs' backtraces.
      
      Although far from ideal, this may be the last option for users who for
      some reason cannot panic on oops.  Most of the time oopses are clear
      enough to indicate the kernel portion that must be investigated, but in
      virtual environments it's possible to observe hypervisor/KVM issues that
      could lead to oopses shown in other guests' CPUs (like virtual APIC
      crashes).
      This patch hence aims to help debug such complex issues without
      resorting to kdump.
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Link: http://lkml.kernel.org/r/20200327224116.21030-1-gpiccoli@canonical.com
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      60c958d8
    • kernel: add panic_on_taint · db38d5c1
      Rafael Aquini authored
      
      
      Analogously to the introduction of panic_on_warn, this patch introduces
      a kernel option named panic_on_taint in order to provide a simple and
      generic way to stop execution and catch a coredump when the kernel gets
      tainted by any given flag.
      
      This is useful for debugging sessions as it avoids having to rebuild the
      kernel to explicitly add calls to panic() into the code sites that
      introduce the taint flags of interest.
      
      For instance, if one is interested in proceeding with a post-mortem
      analysis at the point a given code path is hitting a bad page (e.g.
      unaccount_page_cache_page(), or slab_bug()), a coredump can be collected
      by rebooting the kernel with 'panic_on_taint=0x20' appended to the
      command line.
      
      Another, perhaps less frequent, use for this option would be as a means
      of enforcing a security policy where only a subset of taints, or no
      taint at all (in paranoid mode), is allowed for the running system.  The
      optional switch 'nousertaint' is handy in this particular scenario, as
      it keeps writes to the sysctl interface /proc/sys/kernel/tainted from
      inducing crashes from userspace, which would be false positive hits for
      such policies.
      
      [akpm@linux-foundation.org: tweak kernel-parameters.txt wording]
      Suggested-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Bunk <bunk@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Takashi Iwai <tiwai@suse.de>
      Link: http://lkml.kernel.org/r/20200515175502.146720-1-aquini@redhat.com
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      db38d5c1
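
The mask check described in this commit can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the kernel code: `tainted_mask`, `panic_on_taint_mask` and `add_taint_model()` are hypothetical names, and the function merely reports whether a panic would be triggered. The one fact taken from the message is that the bad-page taint corresponds to the mask 0x20, i.e. bit 5.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit 5 is the bad-page taint; 1 << 5 == 0x20, the value quoted above. */
#define TAINT_BAD_PAGE_BIT 5

static_assert((1UL << TAINT_BAD_PAGE_BIT) == 0x20, "bad-page mask is 0x20");

/* Hypothetical model state: taint bits seen so far and the configured mask. */
static unsigned long tainted_mask;
static unsigned long panic_on_taint_mask;

/*
 * Record a taint flag and report whether a system configured with
 * panic_on_taint_mask would panic at this point.
 */
static bool add_taint_model(unsigned int bit)
{
	tainted_mask |= 1UL << bit;
	return (tainted_mask & panic_on_taint_mask) != 0;
}
```

With `panic_on_taint_mask` set to 0x20, adding any other taint bit leaves the model running, while adding bit 5 makes `add_taint_model()` return true.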
  5. 20 Feb, 2020 1 commit
    • sched: Provide cant_migrate() · 4e139c77
      Thomas Gleixner authored
      
      
      Some code paths rely on preempt_disable() to prevent migration on a non RT
      enabled kernel. These preempt_disable/enable() pairs are substituted by
      migrate_disable/enable() pairs or other forms of RT specific protection. On
      RT these protections prevent migration but not preemption. Obviously a
      cant_sleep() check in such a section will trigger on RT because preemption
      is not disabled.
      
      Provide a cant_migrate() macro which maps to cant_sleep() on a non RT
      kernel and an empty placeholder for RT for now. The placeholder will be
      changed to a proper debug check along with the RT specific migration
      protection mechanism.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20200214161503.070487511@linutronix.de
      4e139c77
  6. 30 Dec, 2019 1 commit
  7. 05 Dec, 2019 1 commit
  8. 25 Nov, 2019 1 commit
  9. 07 Sep, 2019 1 commit
    • kernel.h: Add non_block_start/end() · 312364f3
      Daniel Vetter authored
      In some special cases we must not block, but there's not a spinlock,
      preempt-off, irqs-off or similar critical section already that arms the
      might_sleep() debug checks. Add a non_block_start/end() pair to annotate
      these.
      
      This will be used in the oom paths of mmu-notifiers, where blocking is not
      allowed to make sure there's forward progress. Quoting Michal:
      
      "The notifier is called from quite a restricted context - oom_reaper -
      which shouldn't depend on any locks or sleepable conditionals. The code
      should be swift as well but we mostly do care about it to make a forward
      progress. Checking for sleepable context is the best thing we could come
      up with that would describe these demands at least partially."
      
      Peter also asked whether we want to catch spinlocks on top, but Michal
      said those are less of a problem because spinlocks can't have an indirect
      dependency upon the page allocator and hence close the loop with the oom
      reaper.
      
      Suggested by Michal Hocko.
      
      Link: https://lore.kernel.org/r/20190826201425.17547-4-daniel.vetter@ffwll.ch
      
      
      Acked-by: Christian König <christian.koenig@amd.com> (v1)
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      312364f3
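
The annotation idea can be pictured with a small userspace model. Everything here is an assumption made for illustration: the plain counter stands in for whatever per-task state the kernel keeps, and `might_sleep_would_warn()` is a hypothetical helper that only reports whether the debug check would fire.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-task state; sections may nest. */
static int non_block_count;

static void non_block_start(void) { non_block_count++; }
static void non_block_end(void)   { non_block_count--; }

/* Model of the debug check: blocking is a bug inside an annotated section. */
static bool might_sleep_would_warn(void)
{
	return non_block_count > 0;
}

static void oom_path_example(void)
{
	assert(!might_sleep_would_warn());
	non_block_start();
	assert(might_sleep_would_warn());	/* blocking here would be flagged */
	non_block_end();
	assert(!might_sleep_would_warn());
}
```

A counter rather than a flag is used in this sketch so that nested sections stay armed until the outermost one ends.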
  10. 17 Jul, 2019 1 commit
  11. 29 Jun, 2019 1 commit
  12. 15 May, 2019 1 commit
  13. 06 Apr, 2019 1 commit
    • block: remove CONFIG_LBDAF · 72deb455
      Christoph Hellwig authored
      
      
      Currently support for 64-bit sector_t and blkcnt_t is optional on 32-bit
      architectures.  These types are required to support block device and/or
      file sizes larger than 2 TiB, and have generally defaulted to on for
      a long time.  Enabling the option only increases the i386 tinyconfig
      size by 145 bytes, and many data structures already always use
      64-bit values for their in-core and on-disk data structures anyway,
      so there should not be a large change in dynamic memory usage either.
      
      Dropping this option removes a somewhat weird non-default config that
      has caused various bugs or compiler warnings when actually used.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      72deb455
  14. 03 Apr, 2019 1 commit
    • linux/kernel.h: Use parentheses around argument in u64_to_user_ptr() · a0fe2c64
      Jann Horn authored
      
      
      Use parentheses around uses of the argument in u64_to_user_ptr() to
      ensure that the cast doesn't apply to part of the argument.
      
      There are existing uses of the macro of the form
      
        u64_to_user_ptr(A + B)
      
      which expands to
      
        (void __user *)(uintptr_t)A + B
      
      (the cast applies only to the first operand of the addition, so the
      addition is a pointer addition). This happens to still work as intended;
      the semantic difference doesn't cause a difference in behavior.
      
      But I want to use u64_to_user_ptr() with a ternary operator in the
      argument, like so:
      
        u64_to_user_ptr(A ? B : C)
      
      This currently doesn't work as intended: the cast binds only to A, so
      the expression evaluates to B or C without the pointer cast.
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
      Cc: Andrei Vagin <avagin@openvz.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190329214652.258477-1-jannh@google.com
      a0fe2c64
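
The precedence problem can be reproduced outside the kernel. The sketch below uses simplified, hypothetical stand-ins (`TO_PTR_UNPARENTHESIZED`, `TO_PTR_PARENTHESIZED`, `IS_VOID_PTR`, `select_ptr`) rather than the real macro, which does more than a bare cast. C11 `_Generic` is used only to inspect the type each expansion produces.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the macro before and after the fix. */
#define TO_PTR_UNPARENTHESIZED(x) (void *)(uintptr_t)x
#define TO_PTR_PARENTHESIZED(x)   (void *)(uintptr_t)(x)

/* 1 if the expression has type void *, 0 otherwise. */
#define IS_VOID_PTR(e) _Generic((e), void *: 1, default: 0)

/*
 * With a ternary argument the old form casts only the condition, so the
 * whole expression keeps the integer type of its second and third operands.
 */
static_assert(!IS_VOID_PTR(TO_PTR_UNPARENTHESIZED(1 ? (uint64_t)2 : (uint64_t)3)),
	      "unparenthesized form yields an integer");
static_assert(IS_VOID_PTR(TO_PTR_PARENTHESIZED(1 ? (uint64_t)2 : (uint64_t)3)),
	      "parenthesized form yields a pointer");

/* The parenthesized form converts whichever value the ternary selects. */
static void *select_ptr(int use_first, uint64_t first, uint64_t second)
{
	return TO_PTR_PARENTHESIZED(use_first ? first : second);
}
```

Both static assertions are checked at compile time, so the file building at all is the demonstration.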
  15. 08 Mar, 2019 4 commits
  16. 19 Feb, 2019 1 commit
  17. 04 Jan, 2019 1 commit
  18. 22 Aug, 2018 1 commit
  19. 21 Jun, 2018 1 commit
  20. 08 Jun, 2018 1 commit
  21. 27 May, 2018 1 commit
    • PM / suspend: Prevent might sleep splats · c1a957d1
      Thomas Gleixner authored
      
      
      timekeeping suspend/resume calls read_persistent_clock() which takes
      rtc_lock. That results in might sleep warnings because at that point
      we run with interrupts disabled.
      
      We cannot convert rtc_lock to a raw spinlock as that would trigger
      other might sleep warnings.
      
      As a workaround we disable the might sleep warnings by setting
      system_state to SYSTEM_SUSPEND before calling sysdev_suspend() and
      restoring it to SYSTEM_RUNNING after sysdev_resume(). There is no lock
      contention because hibernate / suspend to RAM is single-CPU at this
      point.
      
      In s2idle's case the system_state is set to SYSTEM_SUSPEND before
      timekeeping_suspend(), which is invoked by the last CPU. In the resume
      case it is set back to SYSTEM_RUNNING after timekeeping_resume(), which
      is invoked by the first CPU. The other CPUs will block on
      tick_freeze_lock.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      [bigeasy: cover s2idle in tick_freeze() / tick_unfreeze()]
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      c1a957d1
  22. 26 Apr, 2018 1 commit
  23. 23 Apr, 2018 1 commit
    • staging: lustre: add container_of_safe() · 05e6557b
      NeilBrown authored
      
      
      Lustre has a container_of0() function which is similar to
      container_of() but passes an IS_ERR_OR_NULL() pointer through
      unchanged.
      This could be generally useful: bcache at least has a similar function.
      
      Naming is hard, but the precedent set by hlist_entry_safe() suggests
      a _safe suffix might be most consistent.
      
      So add container_of_safe() to kernel.h, and replace all occurrences of
      container_of0() with one of
        - list_first_entry, list_next_entry, when that is a better fit,
        - container_of(), when the pointer is used as a valid pointer in
          surrounding code,
        - container_of_safe() when there is no obviously better alternative.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Reviewed-by: James Simmons <jsimmons@infradead.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      05e6557b
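
A portable approximation of the pass-through behaviour described in this commit is sketched below. It is not the kernel macro: `is_err_or_null()`, the `MAX_ERRNO` threshold, and the `node`/`item` types are assumptions made for the example, and this version evaluates its pointer argument more than once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* True for NULL and for error codes encoded in the top MAX_ERRNO addresses. */
static int is_err_or_null(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Map a pointer to a member back to its enclosing structure. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Like container_of(), but NULL and error pointers pass through unchanged. */
#define container_of_safe(ptr, type, member) \
	(is_err_or_null(ptr) ? (type *)(void *)(ptr) : \
			       container_of(ptr, type, member))

/* Hypothetical types used only for this example. */
struct node { struct node *next; };
struct item { int id; struct node link; };

static void container_of_safe_example(void)
{
	struct item it = { .id = 7 };
	struct node *missing = NULL;

	assert(container_of_safe(&it.link, struct item, link) == &it);
	assert(container_of_safe(missing, struct item, link) == NULL);
}
```

The plain container_of() would instead subtract the member offset from NULL or from an error value, yielding a pointer that no longer tests as either.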
  24. 11 Apr, 2018 3 commits
  25. 09 Apr, 2018 1 commit
    • Fix subtle macro variable shadowing in min_not_zero() · e9092d0d
      Linus Torvalds authored
      Commit 3c8ba0d6 ("kernel.h: Retain constant expression output for
      max()/min()") rewrote our min/max macros to be very clever, but in the
      meantime resurrected a variable name shadow issue that we had
      previously fixed in commit 589a9785 ("min/max: remove sparse
      warnings when they're nested").
      
      That commit talks about the sparse warnings that this shadowing causes,
      which we ignored as just a minor annoyance.  But it turns out that the
      sparse warning is the least of our problems.  We actually have a real
      bug due to the shadowing through the interaction with "min_not_zero()",
      which ends up doing
      
         min(__x, __y)
      
      internally, and then the new declaration of "__x" and "__y" as new
      variables in __cmp_once() results in a complete mess of an expression,
      and "min_not_zero()" doesn't work at all.
      
      For some odd reason, this only ever caused (reported) problems on s390,
      even though it is a generic issue and most of the (obviously successful)
      testing of the problematic commit had happened on other architectures.
      
      Quoting Sebastian Ott:
       "What happened is that the bio build by the partition detection code
        was attempted to be split by the block layer because the block queue
        had a max_sector setting of 0. blk_queue_max_hw_sectors uses
        min_not_zero."
      
      So re-introduce the use of __UNIQUE_ID() to make sure that the min/max
      macros do not have these kinds of clashes.
      
      [ That said, __UNIQUE_ID() itself has several issues that make it less
        than wonderful.
      
        In particular, the "uniqueness" has a fallback on the line number,
        which means that it's not actually unique in more complex cases if you
        don't build with gcc or clang (which have working unique counters that
        aren't tied to line numbers).
      
        That historical broken fallback also means that we have that pointless
        "prefix" argument that doesn't actually make much sense _except_ for
        the known-broken case. Oh well. ]
      
      Fixes: 3c8ba0d6 ("kernel.h: Retain constant expression output for max()/min()")
      Reported-and-tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e9092d0d
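
The fix can be illustrated with a reduced version of the macros. This sketch assumes a compiler with the GNU C extensions it relies on (statement expressions, `__typeof__`, `__COUNTER__`), as gcc and clang provide. The helper names `PASTE`, `UNIQUE_NAME`, `MIN_ONCE`, `nz_x` and `nz_y` are made up for the example and are not the kernel's.

```c
#include <assert.h>

#define PASTE_(a, b) a##b
#define PASTE(a, b) PASTE_(a, b)
#define UNIQUE_NAME(prefix) PASTE(prefix, __COUNTER__)

/* Single-evaluation compare whose temporaries get caller-supplied names. */
#define MIN_ONCE(x, y, ux, uy) ({	\
	__typeof__(x) ux = (x);		\
	__typeof__(y) uy = (y);		\
	ux < uy ? ux : uy; })

/* Every expansion of min() gets fresh temporary names. */
#define min(x, y) MIN_ONCE(x, y, UNIQUE_NAME(min_x_), UNIQUE_NAME(min_y_))

/*
 * If MIN_ONCE always named its temporaries nz_x and nz_y, the nested
 * min(nz_x, nz_y) below would expand to "__typeof__(nz_x) nz_x = (nz_x);",
 * a variable initialised from itself. Unique names avoid that clash.
 */
#define min_not_zero(x, y) ({					\
	__typeof__(x) nz_x = (x);				\
	__typeof__(y) nz_y = (y);				\
	nz_x == 0 ? nz_y : (nz_y == 0 ? nz_x : min(nz_x, nz_y)); })

static void min_not_zero_example(void)
{
	unsigned int unset = 0, limit = 255;

	/* A limit of 0 means "not set", so the other value wins. */
	assert(min_not_zero(unset, limit) == 255);
	assert(min_not_zero(limit, 1024u) == 255);
}
```

The zero-means-unset case in the example mirrors the max_sector setting of 0 in the report quoted in this commit.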
  26. 05 Apr, 2018 1 commit
    • kernel.h: Retain constant expression output for max()/min() · 3c8ba0d6
      Kees Cook authored
      In the effort to remove all VLAs from the kernel[1], it is desirable to
      build with -Wvla.  However, this warning is overly pessimistic, in that
      it is only happy with stack array sizes that are declared as constant
      expressions, and not constant values.  One case of this is the
      evaluation of the max() macro which, due to its construction, ends up
      converting constant expression arguments into a constant value result.
      
      All attempts to rewrite this macro with __builtin_constant_p() failed
      with older compilers (e.g.  gcc 4.4)[2].  However, Martin Uecker
      constructed[3] a mind-shattering solution that works everywhere.
      Cthulhu fhtagn!
      
      This patch updates the min()/max() macros to evaluate to a constant
      expression when called on constant expression arguments.  This removes
      several false-positive stack VLA warnings from an x86 allmodconfig build
      when -Wvla is added:
      
        $ diff -u before.txt after.txt | grep ^-
        -drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids variable length array ‘ids’ [-Wvla]
        -fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length array ‘namebuf’ [-Wvla]
        -lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array ‘sym’ [-Wvla]
        -net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array ‘buff’ [-Wvla]
        -net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array ‘buff’ [-Wvla]
        -net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array ‘buff64’ [-Wvla]
      
      This also updates two cases where different enums were being compared
      and explicitly casts them to int (which matches the old side-effect of
      the single-evaluation code): one in tpm/tpm_tis_core.h, and one in
      drm/drm_color_mgmt.c.
      
       [1] https://lkml.org/lkml/2018/3/7/621
       [2] https://lkml.org/lkml/2018/3/10/170
       [3] https://lkml.org/lkml/2018/3/20/845
      
      Co-Developed-by: Linus Torvalds <torvalds@linux-foundation.org>
      Co-Developed-by: Martin Uecker <Martin.Uecker@med.uni-goettingen.de>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3c8ba0d6
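
Why a constant-expression result matters for array sizes can be shown with a deliberately simple macro. This is not the macro from the commit, and the trick referenced as [3] is not reproduced here. `MAX_CONST`, the two length constants and `name_buf` are hypothetical names for a plain ternary that, unlike a single-evaluation form, evaluates its arguments more than once.

```c
#include <assert.h>

/*
 * Plain ternary form: the result stays an integer constant expression
 * when both arguments are constant expressions.
 */
#define MAX_CONST(a, b) ((a) > (b) ? (a) : (b))

#define SHORT_NAME_LEN 8
#define LONG_NAME_LEN  16

/*
 * File-scope arrays cannot be variable length arrays, so this declaration
 * only compiles because MAX_CONST() yields a constant expression.
 */
char name_buf[MAX_CONST(SHORT_NAME_LEN, LONG_NAME_LEN)];

static_assert(sizeof(name_buf) == LONG_NAME_LEN,
	      "buffer is sized by the larger constant");
```

A max() that turns constant arguments into a non-constant value would make the same declaration inside a function a VLA, which is the kind of -Wvla warning listed in this commit.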
  27. 28 Mar, 2018 1 commit
  28. 21 Feb, 2018 1 commit
  29. 04 Feb, 2018 1 commit
    • lib: Add strongly typed 64bit int_sqrt · 47a36163
      Crt Mori authored
      
      
      There is no option to perform a 64bit integer sqrt on 32bit platforms.
      The added, more strongly typed int_sqrt64() enables 64bit calculations
      to be performed on 32bit platforms. Using the same algorithm as
      int_sqrt() with strong typing also provides enough precision on 32bit
      platforms, but it sacrifices some performance. When the value is smaller
      than ULONG_MAX, the standard int_sqrt() is used for the calculation, to
      maximize performance through more native calculations.
      Signed-off-by: Crt Mori <cmo@melexis.com>
      Acked-by: Joe Perches <joe@perches.com>
      Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      47a36163
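
A 64-bit integer square root can be written with nothing but shifts, additions and comparisons. The function below is a userspace sketch of that general bit-by-bit technique, assumed to be in the same family as the algorithm the commit refers to. `isqrt64()` is a hypothetical name, and the fallback to the native-width routine for small values is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Integer square root of a 64-bit value, rounded down. Uses only shifts,
 * adds and compares, so it needs no 64-bit division on 32-bit targets.
 */
static uint32_t isqrt64(uint64_t x)
{
	uint64_t bit, trial, root = 0;

	if (x <= 1)
		return (uint32_t)x;

	/* Start at the highest power of four that does not exceed x. */
	bit = 1ULL << 62;
	while (bit > x)
		bit >>= 2;

	while (bit != 0) {
		trial = root + bit;
		root >>= 1;
		if (x >= trial) {
			x -= trial;
			root += bit;
		}
		bit >>= 2;
	}
	return (uint32_t)root;
}

static void isqrt64_example(void)
{
	assert(isqrt64(16) == 4);
	assert(isqrt64(15) == 3);	/* result is rounded down */
}
```

The result always fits in 32 bits, which is why the sketch returns `uint32_t` for a `uint64_t` input.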
  30. 18 Nov, 2017 1 commit
    • kernel/panic.c: add TAINT_AUX · 4efb442c
      Borislav Petkov authored
      This is the gist of a patch which we've been forward-porting in our
      kernels for a long time now, and it would probably make good sense to
      have such a TAINT_AUX flag upstream, which can be used by each distro
      etc. however they see fit.  This way, we won't need to forward-port a
      distro-only version indefinitely.
      
      Add an auxiliary taint flag to be used by distros and others.  This
      obviates the need to forward-port whatever internal solutions people
      have in favor of a single flag which they can map arbitrarily to a
      definition of their pleasing.
      
      The "X" mnemonic could also mean eXternal, which would be taint from a
      distro or something else but not the upstream kernel.  We will use it to
      mark modules for which we don't provide support.  I.e., a really
      eXternal module.
      
      Link: http://lkml.kernel.org/r/20170911134533.dp5mtyku5bongx4c@pd.tnic
      
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4efb442c