1. 20 May, 2011 1 commit
    • sanitize <linux/prefetch.h> usage · 268bb0ce
      Linus Torvalds authored
      Commit e66eed65 ("list: remove prefetching from regular list
      iterators") removed the include of prefetch.h from list.h, which
      uncovered several cases that had apparently relied on that rather
      obscure header file dependency.
      
      So this fixes things up a bit, using
      
         grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
         grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')
      
      to guide us in finding files that either need <linux/prefetch.h>
      inclusion, or have it despite not needing it.
      
      There are more of them around (mostly network drivers), but this gets
      many core ones.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 10 Mar, 2011 2 commits
    • x86/mm: Fix pgd_lock deadlock · a79e53d8
      Andrea Arcangeli authored
      
      
      It's forbidden to take the page_table_lock with IRQs disabled:
      if there's contention, the IPIs (for TLB flushes) sent with
      the page_table_lock held will never run, leading to a deadlock.
      
      Nobody takes the pgd_lock from irq context so the _irqsave can be
      removed.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@kernel.org>
      LKML-Reference: <201102162345.p1GNjMjm021738@imap1.linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86/mm: Handle mm_fault_error() in kernel space · f8626854
      Andrey Vagin authored
      
      
      mm_fault_error() should not invoke the OOM killer if the page
      fault occurs in kernel space, e.g. in copy_from_user() or
      copy_to_user().
      
      This would happen if we find ourselves in OOM on a
      copy_to_user(), or a copy_from_user() which faults.
      
      Without this patch, the kernel hangs in copy_from_user():
      the OOM killer sends SIGKILL to the current process, but the
      process can't handle a signal while in a syscall, so the kernel
      returns to copy_from_user(), re-executes the faulting instruction
      and provokes the page fault again.
      
      With this patch the kernel returns -EFAULT from copy_from_user().
      
      The code, which checks that page fault occurred in kernel space,
      has been copied from do_sigbus().
      
      This situation is handled the same way on powerpc, xtensa,
      tile, ...
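As a rough illustration of the rule this commit describes, the user-space sketch below models the out-of-memory branch of the fault path. Only the PF_USER bit (bit 2 of the x86 page-fault error code) reflects real hardware; `enum fault_outcome` and `model_oom_fault()` are hypothetical names invented for this example, and the real kernel code does considerably more.

```c
/* Bit 2 of the x86 page-fault error code: set when the fault came from user mode. */
#define PF_USER (1UL << 2)

/* Hypothetical outcomes, for illustration only. */
enum fault_outcome {
    OUTCOME_OOM_KILL,      /* user-mode fault: let the OOM killer run */
    OUTCOME_KERNEL_FIXUP   /* kernel-mode fault: take the exception fixup path */
};

/* Model of the OOM branch: never run the OOM killer for kernel-mode faults. */
enum fault_outcome model_oom_fault(unsigned long error_code)
{
    if (!(error_code & PF_USER))
        return OUTCOME_KERNEL_FIXUP;
    return OUTCOME_OOM_KILL;
}
```

In this model a kernel-mode fault (PF_USER clear) takes the fixup path, which is what lets an interrupted copy_from_user() return -EFAULT instead of retrying forever.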
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@kernel.org>
      LKML-Reference: <201103092322.p29NMNPH001682@imap1.linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 26 Oct, 2010 2 commits
  4. 20 Oct, 2010 1 commit
  5. 19 Oct, 2010 1 commit
  6. 14 Oct, 2010 1 commit
    • x86: Barf when vmalloc and kmemcheck faults happen in NMI · ebc8827f
      Frederic Weisbecker authored
      
      
      In x86, faults exit by executing the iret instruction, which then
      reenables NMIs if we faulted in NMI context. Then if a fault
      happens in NMI, another NMI can nest after the fault exits.
      
      But we don't yet support nested NMIs because we have only one NMI
      stack. To prevent that, check that vmalloc and kmemcheck
      faults don't happen in this context. Most of the other kernel faults
      in NMIs can be more easily spotted by finding explicit
      copy_{from,to}_user() calls on review.
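The check described above can be pictured with a small user-space model. Everything here is a stand-in chosen for illustration: `model_in_nmi`, `model_warn_on_once()` and `model_vmalloc_fault()` are hypothetical names, not kernel symbols, and the sketch only assumes that the kernel-side check is a one-shot warning when a fault handler is entered from NMI context.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state; not real kernel symbols. */
bool model_in_nmi;     /* true while "running" in NMI context */
int model_warnings;    /* number of warnings emitted so far */

/* Warn the first time cond is true, then stay silent (one-shot check). */
bool model_warn_on_once(bool cond)
{
    static bool warned;

    if (cond && !warned) {
        warned = true;
        model_warnings++;
        fprintf(stderr, "WARNING: fault taken in NMI context\n");
    }
    return cond;
}

/* Model of a vmalloc fault handler entry: complain if reached from an NMI. */
int model_vmalloc_fault(void)
{
    model_warn_on_once(model_in_nmi);
    return 0;   /* pretend the fault was handled */
}
```

The one-shot behaviour matters in this setting: a fault that recurs in NMI context would otherwise flood the log.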
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
  7. 08 Oct, 2010 1 commit
    • x86: HWPOISON: Report correct address granuality for huge hwpoison faults · f672b49b
      Andi Kleen authored
      
      
      An earlier patch fixed the hwpoison fault handling to encode the
      huge page size in the fault code of the page fault handler.
      
      This is needed to report this information in SIGBUS to user space.
      
      This is a straightforward patch to pass this information
      through to the signal handling in the x86-specific fault.c.
      
      Cc: x86@kernel.org
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: fengguang.wu@intel.com
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
  8. 26 Aug, 2010 2 commits
  9. 13 Aug, 2010 1 commit
    • x86: don't send SIGBUS for kernel page faults · 96054569
      Linus Torvalds authored
      
      
      It's wrong for several reasons, but the most direct one is that the
      fault may be for the stack accesses to set up a previous SIGBUS.  When
      we have a kernel exception, the kernel exception handler does all the
      fixups, not some user-level signal handler.
      
      Even apart from the nested SIGBUS issue, it's also wrong to give out
      kernel fault addresses in the signal handler info block, or to send a
      SIGBUS when a system call already returns EFAULT.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 23 Nov, 2009 1 commit
  11. 21 Sep, 2009 1 commit
    • perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar authored
      
      
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has outgrown its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring and analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in all, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names (in an ABI-compatible fashion).
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks go to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility are not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: Stephane Eranian <eranian@google.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  12. 16 Sep, 2009 1 commit
  13. 30 Aug, 2009 1 commit
  14. 11 Jul, 2009 1 commit
    • x86: Remove spurious printk level from segfault message · a1a08d1c
      Roland Dreier authored
      Since commit 5fd29d6c ("printk: clean up handling of log-levels
      and newlines"), the kernel logs segfaults like:
      
          <6>gnome-power-man[24509]: segfault at 20 ip 00007f9d4950465a sp 00007fffbb50fc70 error 4 in libgobject-2.0.so.0.2103.0[7f9d494f7000+45000]
      
      with the extra "<6>" being KERN_INFO.  This happens because the
      printk in show_signal_msg() started with KERN_CONT and then
      used "%s" to pass in the real level; and KERN_CONT is no longer
      an empty string, and printk only pays attention to the level at
      the very beginning of the format string.
      
      Therefore, remove the KERN_CONT from this printk, since it is
      now actively causing problems (and never really made any
      sense).
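To make the failure mode concrete, here is a toy user-space logger that, like the behaviour described above, honors a level marker only at the very start of the message. It assumes the continuation marker is the three-character string `<c>` and the info level is `<6>`; `model_log()` is a hypothetical helper written for this example and is not how printk is implemented.

```c
#include <stdio.h>

/*
 * Toy logger: formats the message, then treats a leading "<X>" as the
 * log level. Returns the level character (0 if none) and stores the
 * message without that leading marker in out.
 */
char model_log(char *out, size_t outsz, const char *fmt, const char *level_arg)
{
    char tmp[128];
    const char *msg = tmp;
    char level = 0;

    snprintf(tmp, sizeof(tmp), fmt, level_arg);

    /* Only a marker at position 0 is recognized and stripped. */
    if (tmp[0] == '<' && tmp[1] != '\0' && tmp[2] == '>') {
        level = tmp[1];
        msg = tmp + 3;
    }
    snprintf(out, outsz, "%s", msg);
    return level;
}
```

With a format of `"<c>%sprog[1]: segfault"` and `"<6>"` passed as the argument, the model consumes `<c>` as the level and leaves `<6>` in the visible text, which mirrors the stray prefix in the log line quoted above. Dropping the leading `<c>` lets `<6>` be recognized as the level instead.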
      Signed-off-by: Roland Dreier <roland@digitalvampire.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <874otjitkj.fsf@shaolin.home.digitalvampire.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 08 Jul, 2009 1 commit
  16. 29 Jun, 2009 1 commit
  17. 21 Jun, 2009 1 commit
  18. 16 Jun, 2009 1 commit
    • x86: mm: Read cr2 before prefetching the mmap_lock · 5dfaf90f
      Ingo Molnar authored
      
      
      Prefetch instructions can generate spurious faults on certain
      models of older CPUs. The faults themselves cannot be stopped
      and they can occur pretty much anywhere - so the way we solve
      them is that we detect certain patterns and ignore the fault.
      
      There is one small path of code where we must not take faults
      though: the #PF handler execution leading up to the reading
      of the CR2 (the faulting address). If we take a fault there
      then we destroy the CR2 value (with that of the prefetching
      instruction's) and possibly mishandle user-space or
      kernel-space pagefaults.
      
      It turns out that in current upstream we do exactly that:
      
      	prefetchw(&mm->mmap_sem);
      
      	/* Get the faulting address: */
      	address = read_cr2();
      
      This is not good.
      
      So turn around the order: first read the cr2 then prefetch
      the lock address. Reading cr2 is plenty fast (2 cycles) so
      delaying the prefetch by this amount shouldn't be a big issue
      performance-wise.
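The ordering problem can be reproduced in a small user-space model. The sketch assumes a single global standing in for CR2 and a "prefetch" that always takes a spurious fault, which is the worst case described above; `sim_cr2`, `sim_faulting_prefetch()` and the two handler functions are hypothetical names, not kernel code.

```c
#include <stdint.h>

/* Simulated CR2: holds the address of the most recent page fault. */
uintptr_t sim_cr2;

/* Hypothetical stand-in for a prefetch that takes a spurious fault. */
void sim_faulting_prefetch(const void *p)
{
    sim_cr2 = (uintptr_t)p;   /* the nested fault overwrites CR2 */
}

uintptr_t sim_read_cr2(void)
{
    return sim_cr2;
}

/* Problematic order: prefetch first, then read CR2. */
uintptr_t handler_prefetch_first(const void *lock)
{
    sim_faulting_prefetch(lock);
    return sim_read_cr2();    /* now reports the prefetch address */
}

/* Fixed order: latch CR2 first, then prefetch. */
uintptr_t handler_cr2_first(const void *lock)
{
    uintptr_t address = sim_read_cr2();

    sim_faulting_prefetch(lock);
    return address;           /* still the original faulting address */
}
```

In the model, the first handler returns the address of the lock rather than the address that originally faulted, while the second returns the original address regardless of what the prefetch does afterwards.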
      
      [ And this might explain a mystery fault.c warning that sometimes
        occurs on an old AMD/Sempron-based test system I have -
        which does have such prefetch problems. ]
      
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      LKML-Reference: <20090616030522.GA22162@Krystal>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  19. 15 Jun, 2009 1 commit
    • x86: add hooks for kmemcheck · f8561296
      Vegard Nossum authored
      
      
      The hooks that we modify are:
      - Page fault handler (to handle kmemcheck faults)
      - Debug exception handler (to hide pages after single-stepping
        the instruction that caused the page fault)
      
      Also redefine memset() to use the optimized version if kmemcheck is
      enabled.
      
      (Thanks to Pekka Enberg for minimizing the impact on the page fault
      handler.)
      
      As kmemcheck doesn't handle MMX/SSE instructions (yet), we also disable
      the optimized xor code, and rely instead on the generic C implementation
      in order to avoid false-positive warnings.
      Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
      
      [whitespace fixlet]
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      
      [rebased for mainline inclusion]
      Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
  20. 11 Jun, 2009 1 commit
  21. 03 May, 2009 1 commit
  22. 08 Apr, 2009 1 commit
    • perf_counter: allow for data addresses to be recorded · 78f13e95
      Peter Zijlstra authored
      
      
      Paul suggested we allow for data addresses to be recorded along with
      the traditional IPs as power can provide these.
      
      For now, only the software pagefault events provide data addresses,
      but in the future power might as well for some events.
      
      x86 doesn't seem capable of providing this atm.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      LKML-Reference: <20090408130409.394816925@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  23. 06 Apr, 2009 2 commits
  24. 30 Mar, 2009 2 commits
  25. 22 Feb, 2009 1 commit
  26. 20 Feb, 2009 10 commits
    • x86, mm: fault.c, update copyrights · f8eeb2e6
      Ingo Molnar authored
      
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, mm: fault.c, give another attempt at prefetch handing before SIGBUS · cd1b68f0
      Ingo Molnar authored
      
      
      Impact: extend prefetch handling on 64-bit
      
      Currently there's an extra is_prefetch() check done in do_sigbus(),
      which we only do on 32 bits.
      
      This is a last-ditch check before we terminate a task, so it's worth
      giving prefetch instructions another chance - should none of our
      existing quirks have caught a prefetch instruction related spurious
      fault.
      
      The only risk is if a prefetch causes a real sigbus, in that case
      we'll not OOM but try another fault. But this code has been on
      32-bit for a long time, so it should be fine in practice.
      
      So do this on 64-bit too - and thus remove one more #ifdef.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, mm: fault.c, remove #ifdef from fault_in_kernel_space() · 7c178a26
      Ingo Molnar authored
      
      
      Impact: cleanup
      
      Removal of an #ifdef in fault_in_kernel_space(), by making
      use of the new TASK_SIZE_MAX symbol which is now available
      on 32-bit too.
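A minimal user-space sketch of the resulting check might look like the following. The constant is an illustrative assumption (a 3 GiB user/kernel split, as commonly configured on 32-bit x86), and the `MODEL_` and `model_` names are hypothetical; the point is only that a single comparison against one symbol works without any #ifdef.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative value only: assumes a 3 GiB user/kernel split. */
#define MODEL_TASK_SIZE      UINT64_C(0xC0000000)

/* On 32-bit the new symbol simply maps to the existing one. */
#define MODEL_TASK_SIZE_MAX  MODEL_TASK_SIZE

/* An address at or above the user-space limit is treated as a kernel address. */
bool model_fault_in_kernel_space(uint64_t address)
{
    return address >= MODEL_TASK_SIZE_MAX;
}
```

A 64-bit build would only need a different definition of the limit; the function body stays the same, which is what makes the #ifdef unnecessary.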
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, mm: rename TASK_SIZE64 => TASK_SIZE_MAX · d9517346
      Ingo Molnar authored
      
      
      Impact: cleanup
      
      Rename TASK_SIZE64 to TASK_SIZE_MAX, and provide the
      define on 32-bit too (mapped to TASK_SIZE).
      
      This allows 32-bit code to make use of the (former-) TASK_SIZE64
      symbol as well, in a clean way.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, mm: fault.c, remove #ifdef from do_page_fault() · c3731c68
      Ingo Molnar authored
      
      
      Impact: cleanup
      
      do_page_fault() has this ugly #ifdef in its prototype:
      
        #ifdef CONFIG_X86_64
        asmlinkage
        #endif
        void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)
      
      Replace it with 'dotraplinkage' which maps to exactly the above
      construct: nothing on 32-bit and asmlinkage on 64-bit.
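The mapping described above can be sketched as a pair of macro definitions. This is a user-space approximation under stated assumptions: `asmlinkage` is defined as empty here purely as a stand-in, and `model_do_page_fault()` and `model_last_error_code` are hypothetical names used so the snippet compiles on its own.

```c
/*
 * Stand-in for the kernel's asmlinkage annotation, which marks functions
 * called from assembly. It expands to nothing in this user-space sketch.
 */
#define asmlinkage

/* Define CONFIG_X86_64 at build time to model a 64-bit kernel. */
#ifdef CONFIG_X86_64
# define dotraplinkage asmlinkage
#else
# define dotraplinkage
#endif

struct pt_regs;   /* opaque here; only the pointer type is needed */

/* One prototype for both word sizes, with no #ifdef around it. */
dotraplinkage void model_do_page_fault(struct pt_regs *regs,
                                       unsigned long error_code);

/* Hypothetical body: record the error code so the call can be observed. */
unsigned long model_last_error_code;

dotraplinkage void model_do_page_fault(struct pt_regs *regs,
                                       unsigned long error_code)
{
    (void)regs;
    model_last_error_code = error_code;
}
```

The conditional now lives in one macro definition instead of being repeated at each handler's prototype.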
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, mm: fault.c, unify oops handling · 1cc99544
      Ingo Molnar authored
      
      
      Impact: add oops-recursion check to 32-bit
      
      Unify the oops state machine on the 64-bit version, which is
      slightly more careful: it does a recursion check in
      oops_begin(), and is thus more likely to show the relevant
      oops.
      
      It also means that 32-bit will print one more line at the
      end of pagefault triggered oopses:
      
       	printk(KERN_EMERG "CR2: %016lx\n", address);
      
      Which is generally good information to be seen in partial-dump
      digital-camera jpegs ;-)
      
      The downside is the somewhat more complex critical path. Both
      variants have meanwhile been well tested by kernel developers
      crashing their boxes, so I don't think this is a practical worry.
      
      This removes 3 ugly #ifdefs from no_context() and makes the
      function a lot nicer to read.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, mm: fault.c, unify oops printing · 8f766149
      Ingo Molnar authored
      
      
      Impact: refine/extend page fault related oops printing on 64-bit
      
       - honor the pause_on_oops logic on 64-bit too
       - print out NX fault warnings on 64-bit as well
       - factor out the NX fault message to make it git-greppable and readable
      
      Note that this means that we do the PF_INSTR check on 32-bit non-PAE
      as well where it should not occur ... normally. Cannot hurt.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, mm: fault.c, reorder functions · f2f13a85
      Ingo Molnar authored
      
      
      Impact: cleanup
      
      Avoid a couple more #ifdefs by moving fundamentally non-unifiable
      functions into a single #ifdef 32-bit / #else / #endif block in
      fault.c: vmalloc*(), dump_pagetable(), check_vm8086_mode().
      
      No code changed:
      
         text	   data	    bss	    dec	    hex	filename
         4618	     32	     24	   4674	   1242	fault.o.before
         4618	     32	     24	   4674	   1242	fault.o.after
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, mm, kprobes: fault.c, simplify notify_page_fault() · b1801812
      Ingo Molnar authored
      
      
      Impact: cleanup
      
      Remove an #ifdef from notify_page_fault(). The function still
      compiles to nothing in the !CONFIG_KPROBES case.
      
      Introduce kprobes_built_in() and kprobe_fault_handler() helpers
      to allow this - they return 0 if !CONFIG_KPROBES.
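The stub-helper pattern this commit relies on can be sketched in user space as follows. All `model_`/`MODEL_` names are hypothetical, and the enabled variant of the handler is a placeholder rather than real kprobes logic; the sketch only aims to show how constant-returning stubs let the caller drop its #ifdef.

```c
#include <stdbool.h>

/* Define MODEL_CONFIG_KPROBES at build time to model a kernel with kprobes. */
#ifdef MODEL_CONFIG_KPROBES
static inline int model_kprobes_built_in(void)
{
    return 1;
}

/* Placeholder: claims the fault only for trap number 14 (page fault). */
static inline int model_kprobe_fault_handler(unsigned long trapnr)
{
    return trapnr == 14;
}
#else
/* Stubs: constant 0, so callers need no #ifdef of their own. */
static inline int model_kprobes_built_in(void)
{
    return 0;
}

static inline int model_kprobe_fault_handler(unsigned long trapnr)
{
    (void)trapnr;
    return 0;
}
#endif

/* No #ifdef here: with the stubs the condition is constant-false. */
int model_notify_page_fault(bool user_mode)
{
    int ret = 0;

    if (model_kprobes_built_in() && !user_mode) {
        if (model_kprobe_fault_handler(14))
            ret = 1;
    }
    return ret;
}
```

Because the stubs are inline and return a constant, an optimizing compiler can typically discard the whole conditional when the feature is disabled, which is consistent with the unchanged object sizes reported by the commit.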
      
      No code changed:
      
         text	   data	    bss	    dec	    hex	filename
         4618	     32	     24	   4674	   1242	fault.o.before
         4618	     32	     24	   4674	   1242	fault.o.after
      
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, mm: fault.c, simplify kmmio_fault() · b814d41f
      Ingo Molnar authored
      
      
      Impact: cleanup
      
      Remove an #ifdef from kmmio_fault() - we can do this by
      providing default implementations for is_kmmio_active()
      and kmmio_handler(). The compiler optimizes it all away
      in the !CONFIG_MMIOTRACE case.
      
      Also, while at it, clean up mmiotrace.h a bit:
      
       - standard header guards
       - standard vertical spaces for structure definitions
      
      No code changed (both with mmiotrace on and off in the config):
      
         text	   data	    bss	    dec	    hex	filename
         2947	     12	     12	   2971	    b9b	fault.o.before
         2947	     12	     12	   2971	    b9b	fault.o.after
      
      Cc: Pekka Paalanen <pq@iki.fi>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>