1. 29 Sep, 2017 1 commit
  2. 17 Sep, 2017 1 commit
  3. 13 Sep, 2017 1 commit
  4. 10 Sep, 2017 1 commit
  5. 07 Sep, 2017 1 commit
    • Andy Lutomirski's avatar
      x86/mm: Reinitialize TLB state on hotplug and resume · 72c0098d
      Andy Lutomirski authored
      
      
      When Linux brings a CPU down and back up, it switches to init_mm and then
      loads swapper_pg_dir into CR3.  With PCID enabled, this has the side effect
      of masking off the ASID bits in CR3.
      
      This can result in some confusion in the TLB handling code.  If we
      bring a CPU down and back up with any ASID other than 0, we end up
      with the wrong ASID active on the CPU after resume.  This could
      cause our internal state to become corrupt, although major
      corruption is unlikely because init_mm doesn't have any user pages.
      More obviously, if CONFIG_DEBUG_VM=y, we'll trip over an assertion
      in the next context switch.  The result of *that* is a failure to
      resume from suspend with probability 1 - 1/6^(cpus-1).
      
      Fix it by reinitializing cpu_tlbstate on resume and CPU bringup.
      Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Reported-by: default avatarJiri Kosina <jikos@kernel.org>
      Fixes: 10af6235
      
       ("x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID")
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      72c0098d
  6. 25 Jul, 2017 1 commit
    • Andy Lutomirski's avatar
      x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID · 10af6235
      Andy Lutomirski authored
      PCID is a "process context ID" -- it's what other architectures call
      an address space ID.  Every non-global TLB entry is tagged with a
      PCID, only TLB entries that match the currently selected PCID are
      used, and we can switch PGDs without flushing the TLB.  x86's
      PCID is 12 bits.
      
      This is an unorthodox approach to using PCID.  x86's PCID is far too
      short to uniquely identify a process, and we can't even really
      uniquely identify a running process because there are monster
      systems with over 4096 CPUs.  To make matters worse, past attempts
      to use all 12 PCID bits have resulted in slowdowns instead of
      speedups.
      
      This patch uses PCID differently.  We use a PCID to identify a
      recently-used mm on a per-cpu basis.  An mm has no fixed PCID
      binding at all; instead, we give it a fresh PCID each time it's
      loaded except in cases where we want to preserve the TLB, in which
      case we reuse a recent value.
      
      Here are some benchmark results, done on a Skylake laptop at 2.3 GHz
      (turbo off, intel_pstate requesting max performance) under KVM with
      the guest using idle=poll (to avoid artifacts when bouncing between
      CPUs).  I haven't done any real statistics here -- I just ran them
      in a loop and picked the fastest results that didn't look like
      outliers.  Unpatched means commit a4eb8b99
      
      , so all the
      bookkeeping overhead is gone.
      
      ping-pong between two mms on the same CPU using eventfd:
      
        patched:         1.22µs
        patched, nopcid: 1.33µs
        unpatched:       1.34µs
      
      Same ping-pong, but now touch 512 pages (all zero-page to minimize
      cache misses) each iteration.  dTLB misses are measured by
      dtlb_load_misses.miss_causes_a_walk:
      
        patched:         1.8µs  11M  dTLB misses
        patched, nopcid: 6.2µs, 207M dTLB misses
        unpatched:       6.1µs, 190M dTLB misses
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Reviewed-by: default avatarNadav Amit <nadav.amit@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/9ee75f17a81770feed616358e6860d98a2a5b1e7.1500957502.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      10af6235
  7. 18 Jul, 2017 1 commit
    • Tom Lendacky's avatar
      x86/mm: Provide general kernel support for memory encryption · 21729f81
      Tom Lendacky authored
      
      
      Changes to the existing page table macros will allow the SME support to
      be enabled in a simple fashion with minimal changes to files that use these
      macros.  Since the memory encryption mask will now be part of the regular
      pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and
      _KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization
      without the encryption mask before SME becomes active.  Two new pgprot()
      macros are defined to allow setting or clearing the page encryption mask.
      
      The FIXMAP_PAGE_NOCACHE define is introduced for use with MMIO.  SME does
      not support encryption for MMIO areas so this define removes the encryption
      mask from the page attribute.
      
      Two new macros are introduced (__sme_pa() / __sme_pa_nodebug()) to allow
      creating a physical address with the encryption mask.  These are used when
      working with the cr3 register so that the PGD can be encrypted. The current
      __va() macro is updated so that the virtual address is generated based off
      of the physical address without the encryption mask thus allowing the same
      virtual address to be generated regardless of whether encryption is enabled
      for that physical location or not.
      
      Also, an early initialization function is added for SME.  If SME is active,
      this function:
      
       - Updates the early_pmd_flags so that early page faults create mappings
         with the encryption mask.
      
       - Updates the __supported_pte_mask to include the encryption mask.
      
       - Updates the protection_map entries to include the encryption mask so
         that user-space allocations will automatically have the encryption mask
         applied.
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Toshimitsu Kani <toshi.kani@hpe.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kvm@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/b36e952c4c39767ae7f0a41cf5345adf27438480.1500319216.git.thomas.lendacky@amd.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      21729f81
  8. 05 Jul, 2017 4 commits
    • Andy Lutomirski's avatar
      x86/mm: Stop calling leave_mm() in idle code · 43858b4f
      Andy Lutomirski authored
      
      
      Now that lazy TLB suppresses all flush IPIs (as opposed to all but
      the first), there's no need to leave_mm() when going idle.
      
      This means we can get rid of the rcuidle hack in
      switch_mm_irqs_off() and we can unexport leave_mm().
      
      This also removes acpi_unlazy_tlb() from the x86 and ia64 headers,
      since it has no callers any more.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Reviewed-by: default avatarNadav Amit <nadav.amit@gmail.com>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/03c699cfd6021e467be650d6b73deaccfe4b4bd7.1498751203.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      43858b4f
    • Andy Lutomirski's avatar
      x86/mm: Rework lazy TLB mode and TLB freshness tracking · 94b1b03b
      Andy Lutomirski authored
      x86's lazy TLB mode used to be fairly weak -- it would switch to
      init_mm the first time it tried to flush a lazy TLB.  This meant an
      unnecessary CR3 write and, if the flush was remote, an unnecessary
      IPI.
      
      Rewrite it entirely.  When we enter lazy mode, we simply remove the
      CPU from mm_cpumask.  This means that we need a way to figure out
      whether we've missed a flush when we switch back out of lazy mode.
      I use the tlb_gen machinery to track whether a context is up to
      date.
      
      Note to reviewers: this patch, my itself, looks a bit odd.  I'm
      using an array of length 1 containing (ctx_id, tlb_gen) rather than
      just storing tlb_gen, and making it at array isn't necessary yet.
      I'm doing this because the next few patches add PCID support, and,
      with PCID, we need ctx_id, and the array will end up with a length
      greater than 1.  Making it an array now means that there will be
      less churn and therefore less stress on your eyeballs.
      
      NB: This is dubious but, AFAICT, still correct on Xen and UV.
      xen_exit_mmap() uses mm_cpumask() for nefarious purposes and this
      patch changes the way that mm_cpumask() works.  This should be okay,
      since Xen *also* iterates all online CPUs to find all the CPUs it
      needs to twiddle.
      
      The UV tlbflush code is rather dated and should be changed.
      
      Here are some benchmark results, done on a Skylake laptop at 2.3 GHz
      (turbo off, intel_pstate requesting max performance) under KVM with
      the guest using idle=poll (to avoid artifacts when bouncing between
      CPUs).  I haven't done any real statistics here -- I just ran them
      in a loop and picked the fastest results that didn't look like
      outliers.  Unpatched means commit a4eb8b99
      
      , so all the
      bookkeeping overhead is gone.
      
      MADV_DONTNEED; touch the page; switch CPUs using sched_setaffinity.  In
      an unpatched kernel, MADV_DONTNEED will send an IPI to the previous CPU.
      This is intended to be a nearly worst-case test.
      
        patched:         13.4µs
        unpatched:       21.6µs
      
      Vitaly's pthread_mmap microbenchmark with 8 threads (on four cores),
      nrounds = 100, 256M data
      
        patched:         1.1 seconds or so
        unpatched:       1.9 seconds or so
      
      The sleepup on Vitaly's test appearss to be because it spends a lot
      of time blocked on mmap_sem, and this patch avoids sending IPIs to
      blocked CPUs.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Reviewed-by: default avatarNadav Amit <nadav.amit@gmail.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Banman <abanman@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/ddf2c92962339f4ba39d8fc41b853936ec0b44f1.1498751203.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      94b1b03b
    • Andy Lutomirski's avatar
      x86/mm: Track the TLB's tlb_gen and update the flushing algorithm · b0579ade
      Andy Lutomirski authored
      
      
      There are two kernel features that would benefit from tracking
      how up-to-date each CPU's TLB is in the case where IPIs aren't keeping
      it up to date in real time:
      
       - Lazy mm switching currently works by switching to init_mm when
         it would otherwise flush.  This is wasteful: there isn't fundamentally
         any need to update CR3 at all when going lazy or when returning from
         lazy mode, nor is there any need to receive flush IPIs at all.  Instead,
         we should just stop trying to keep the TLB coherent when we go lazy and,
         when unlazying, check whether we missed any flushes.
      
       - PCID will let us keep recent user contexts alive in the TLB.  If we
         start doing this, we need a way to decide whether those contexts are
         up to date.
      
      On some paravirt systems, remote TLBs can be flushed without IPIs.
      This won't update the target CPUs' tlb_gens, which may cause
      unnecessary local flushes later on.  We can address this if it becomes
      a problem by carefully updating the target CPU's tlb_gen directly.
      
      By itself, this patch is a very minor optimization that avoids
      unnecessary flushes when multiple TLB flushes targetting the same CPU
      race.  The complexity in this patch would not be worth it on its own,
      but it will enable improved lazy TLB tracking and PCID.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Reviewed-by: default avatarNadav Amit <nadav.amit@gmail.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1210fb244bc9cbe7677f7f0b72db4d359675f24b.1498751203.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b0579ade
    • Andy Lutomirski's avatar
      x86/mm: Give each mm TLB flush generation a unique ID · f39681ed
      Andy Lutomirski authored
      
      
      This adds two new variables to mmu_context_t: ctx_id and tlb_gen.
      ctx_id uniquely identifies the mm_struct and will never be reused.
      For a given mm_struct (and hence ctx_id), tlb_gen is a monotonic
      count of the number of times that a TLB flush has been requested.
      The pair (ctx_id, tlb_gen) can be used as an identifier for TLB
      flush actions and will be used in subsequent patches to reliably
      determine whether all needed TLB flushes have occurred on a given
      CPU.
      
      This patch is split out for ease of review.  By itself, it has no
      real effect other than creating and updating the new variables.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Reviewed-by: default avatarNadav Amit <nadav.amit@gmail.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/413a91c24dab3ed0caa5f4e4d017d87b0857f920.1498751203.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f39681ed
  9. 30 Jun, 2017 2 commits
  10. 22 Jun, 2017 1 commit
    • Andy Lutomirski's avatar
      x86/ldt: Simplify the LDT switching logic · 73534258
      Andy Lutomirski authored
      
      
      Originally, Linux reloaded the LDT whenever the prev mm or the next
      mm had an LDT. It was changed in 2002 in:
      
        0bbed3beb4f2 ("[PATCH] Thread-Local Storage (TLS) support")
      
      (commit from the historical tree), like this:
      
      -		/* load_LDT, if either the previous or next thread
      -		 * has a non-default LDT.
      +		/*
      +		 * load the LDT, if the LDT is different:
      		 */
      -		if (next->context.size+prev->context.size)
      +		if (unlikely(prev->context.ldt != next->context.ldt))
      			load_LDT(&next->context);
      
      The current code is unlikely to avoid any LDT reloads, since different
      mms won't share an LDT.
      
      When we redo lazy mode to stop flush IPIs without switching to
      init_mm, though, the current logic would become incorrect: it will
      be possible to have real_prev == next but nonetheless have a stale
      LDT descriptor.
      
      Simplify the code to update LDTR if either the previous or the next
      mm has an LDT, i.e. effectively restore the historical logic..
      While we're at it, clean up the code by moving all the ifdeffery to
      a header where it belongs.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/2a859ac01245f9594c58f9d0a8b2ed8a7cd2507e.1498022414.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      73534258
  11. 05 Jun, 2017 7 commits
    • Andy Lutomirski's avatar
      x86/mm: Be more consistent wrt PAGE_SHIFT vs PAGE_SIZE in tlb flush code · be4ffc0d
      Andy Lutomirski authored
      
      
      Nadav pointed out that some code used PAGE_SIZE and other code used
      PAGE_SHIFT.  Use PAGE_SHIFT instead of multiplying or dividing by
      PAGE_SIZE.
      Requested-by: default avatarNadav Amit <nadav.amit@gmail.com>
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      be4ffc0d
    • Andy Lutomirski's avatar
      x86/mm: Rework lazy TLB to track the actual loaded mm · 3d28ebce
      Andy Lutomirski authored
      
      
      Lazy TLB state is currently managed in a rather baroque manner.
      AFAICT, there are three possible states:
      
       - Non-lazy.  This means that we're running a user thread or a
         kernel thread that has called use_mm().  current->mm ==
         current->active_mm == cpu_tlbstate.active_mm and
         cpu_tlbstate.state == TLBSTATE_OK.
      
       - Lazy with user mm.  We're running a kernel thread without an mm
         and we're borrowing an mm_struct.  We have current->mm == NULL,
         current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state
         != TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0).  The current cpu is set
         in mm_cpumask(current->active_mm).  CR3 points to
         current->active_mm->pgd.  The TLB is up to date.
      
       - Lazy with init_mm.  This happens when we call leave_mm().  We
         have current->mm == NULL, current->active_mm ==
         cpu_tlbstate.active_mm, but that mm is only relelvant insofar as
         the scheduler is tracking it for refcounting.  cpu_tlbstate.state
         != TLBSTATE_OK.  The current cpu is clear in
         mm_cpumask(current->active_mm).  CR3 points to swapper_pg_dir,
         i.e. init_mm->pgd.
      
      This patch simplifies the situation.  Other than perf, x86 stops
      caring about current->active_mm at all.  We have
      cpu_tlbstate.loaded_mm pointing to the mm that CR3 references.  The
      TLB is always up to date for that mm.  leave_mm() just switches us
      to init_mm.  There are no longer any special cases for mm_cpumask,
      and switch_mm() switches mms without worrying about laziness.
      
      After this patch, cpu_tlbstate.state serves only to tell the TLB
      flush code whether it may switch to init_mm instead of doing a
      normal flush.
      
      This makes fairly extensive changes to xen_exit_mmap(), which used
      to look a bit like black magic.
      
      Perf is unchanged.  With or without this change, perf may behave a bit
      erratically if it tries to read user memory in kernel thread context.
      We should build on this patch to teach perf to never look at user
      memory when cpu_tlbstate.loaded_mm != current->mm.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3d28ebce
    • Andy Lutomirski's avatar
      x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code · ce4a4e56
      Andy Lutomirski authored
      
      
      The UP asm/tlbflush.h generates somewhat nicer code than the SMP version.
      Aside from that, it's fallen quite a bit behind the SMP code:
      
       - flush_tlb_mm_range() didn't flush individual pages if the range
         was small.
      
       - The lazy TLB code was much weaker.  This usually wouldn't matter,
         but, if a kernel thread flushed its lazy "active_mm" more than
         once (due to reclaim or similar), it wouldn't be unlazied and
         would instead pointlessly flush repeatedly.
      
       - Tracepoints were missing.
      
      Aside from that, simply having the UP code around was a maintanence
      burden, since it means that any change to the TLB flush code had to
      make sure not to break it.
      
      Simplify everything by deleting the UP code.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ce4a4e56
    • Andy Lutomirski's avatar
      x86/mm: Use new merged flush logic in arch_tlbbatch_flush() · 3f79e4c7
      Andy Lutomirski authored
      
      
      Now there's only one copy of the local tlb flush logic for
      non-kernel pages on SMP kernels.
      
      The only functional change is that arch_tlbbatch_flush() will now
      leave_mm() on the local CPU if that CPU is in the batch and is in
      TLBSTATE_LAZY mode.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3f79e4c7
    • Andy Lutomirski's avatar
      x86/mm: Refactor flush_tlb_mm_range() to merge local and remote cases · 454bbad9
      Andy Lutomirski authored
      
      
      The local flush path is very similar to the remote flush path.
      Merge them.
      
      This is intended to make no difference to behavior whatsoever.  It
      removes some code and will make future changes to the flushing
      mechanics simpler.
      
      This patch does remove one small optimization: flush_tlb_mm_range()
      now has an unconditional smp_mb() instead of using MOV to CR3 or
      INVLPG as a full barrier when applicable.  I think this is okay for
      a few reasons.  First, smp_mb() is quite cheap compared to the cost
      of a TLB flush.  Second, this rearrangement makes a bigger
      optimization available: with some work on the SMP function call
      code, we could do the local and remote flushes in parallel.  Third,
      I'm planning a rework of the TLB flush algorithm that will require
      an atomic operation at the beginning of each flush, and that
      operation will replace the smp_mb().
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      454bbad9
    • Andy Lutomirski's avatar
      x86/mm: Change the leave_mm() condition for local TLB flushes · 59f537c1
      Andy Lutomirski authored
      
      
      On a remote TLB flush, we leave_mm() if we're TLBSTATE_LAZY.  For a
      local flush_tlb_mm_range(), we leave_mm() if !current->mm.  These
      are approximately the same condition -- the scheduler sets lazy TLB
      mode when switching to a thread with no mm.
      
      I'm about to merge the local and remote flush code, but for ease of
      verifying and bisecting the patch, I want the local and remote flush
      behavior to match first.  This patch changes the local code to match
      the remote code.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      59f537c1
    • Andy Lutomirski's avatar
      x86/mm: Pass flush_tlb_info to flush_tlb_others() etc · a2055abe
      Andy Lutomirski authored
      
      
      Rather than passing all the contents of flush_tlb_info to
      flush_tlb_others(), pass a pointer to the structure directly. For
      consistency, this also removes the unnecessary cpu parameter from
      uv_flush_tlb_others() to make its signature match the other
      *flush_tlb_others() functions.
      
      This serves two purposes:
      
       - It will dramatically simplify future patches that change struct
         flush_tlb_info, which I'm planning to do.
      
       - struct flush_tlb_info is an adequate description of what to do
         for a local flush, too, so by reusing it we can remove duplicated
         code between local and remove flushes in a future patch.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      [ Fix build warning. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a2055abe
  12. 24 May, 2017 3 commits
    • Andy Lutomirski's avatar
      mm, x86/mm: Make the batched unmap TLB flush API more generic · e73ad5ff
      Andy Lutomirski authored
      
      
      try_to_unmap_flush() used to open-code a rather x86-centric flush
      sequence: local_flush_tlb() + flush_tlb_others().  Rearrange the
      code so that the arch (only x86 for now) provides
      arch_tlbbatch_add_mm() and arch_tlbbatch_flush() and the core code
      calls those functions instead.
      
      I'll want this for x86 because, to enable address space ids, I can't
      support the flush_tlb_others() mode used by exising
      try_to_unmap_flush() implementation with good performance.  I can
      support the new API fairly easily, though.
      
      I imagine that other architectures may be in a similar position.
      Architectures with strong remote flush primitives (arm64?) may have
      even worse performance problems with flush_tlb_others() the way that
      try_to_unmap_flush() uses it.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/19f25a8581f9fb77876b7ff3b001f89835e34ea3.1495492063.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e73ad5ff
    • Andy Lutomirski's avatar
      x86/mm: Reduce indentation in flush_tlb_func() · b3b90e5a
      Andy Lutomirski authored
      
      
      The leave_mm() case can just exit the function early so we don't
      need to indent the entire remainder of the function.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/97901ddcc9821d7bc7b296d2918d1179f08aaf22.1495492063.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      b3b90e5a
    • Andy Lutomirski's avatar
      x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range() · ca6c99c0
      Andy Lutomirski authored
      
      
      flush_tlb_page() was very similar to flush_tlb_mm_range() except that
      it had a couple of issues:
      
       - It was missing an smp_mb() in the case where
         current->active_mm != mm.  (This is a longstanding bug reported by Nadav Amit)
      
       - It was missing tracepoints and vm counter updates.
      
      The only reason that I can see for keeping it at as a separate
      function is that it could avoid a few branches that
      flush_tlb_mm_range() needs to decide to flush just one page.  This
      hardly seems worthwhile.  If we decide we want to get rid of those
      branches again, a better way would be to introduce an
      __flush_tlb_mm_range() helper and make both flush_tlb_page() and
      flush_tlb_mm_range() use it.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/3cc3847cf888d8907577569b8bac3f01992ef8f9.1495492063.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ca6c99c0
  13. 26 Apr, 2017 3 commits
    • Andy Lutomirski's avatar
      x86/mm: Fix flush_tlb_page() on Xen · dbd68d8e
      Andy Lutomirski authored
      
      
      flush_tlb_page() passes a bogus range to flush_tlb_others() and
      expects the latter to fix it up.  native_flush_tlb_others() has the
      fixup but Xen's version doesn't.  Move the fixup to
      flush_tlb_others().
      
      AFAICS the only real effect is that, without this fix, Xen would
      flush everything instead of just the one page on remote vCPUs in
      when flush_tlb_page() was called.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: e7b52ffd ("x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range")
      Link: http://lkml.kernel.org/r/10ed0e4dfea64daef10b87fb85df1746999b4dba.1492844372.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      dbd68d8e
    • Andy Lutomirski's avatar
      x86/mm: Make flush_tlb_mm_range() more predictable · ce27374f
      Andy Lutomirski authored
      
      
      I'm about to rewrite the function almost completely, but first I
      want to get a functional change out of the way.  Currently, if
      flush_tlb_mm_range() does not flush the local TLB at all, it will
      never do individual page flushes on remote CPUs.  This seems to be
      an accident, and preserving it will be awkward.  Let's change it
      first so that any regressions in the rewrite will be easier to
      bisect and so that the rewrite can attempt to change no visible
      behavior at all.
      
      The fix is simple: we can simply avoid short-circuiting the
      calculation of base_pages_to_flush.
      
      As a side effect, this also eliminates a potential corner case: if
      tlb_single_page_flush_ceiling == TLB_FLUSH_ALL, flush_tlb_mm_range()
      could have ended up flushing the entire address space one page at a
      time.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Acked-by: default avatarDave Hansen <dave.hansen@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/4b29b771d9975aad7154c314534fec235618175a.1492844372.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ce27374f
    • Andy Lutomirski's avatar
      x86/mm: Remove flush_tlb() and flush_tlb_current_task() · 29961b59
      Andy Lutomirski authored
      
      
      I was trying to figure out what how flush_tlb_current_task() would
      possibly work correctly if current->mm != current->active_mm, but I
      realized I could spare myself the effort: it has no callers except
      the unused flush_tlb() macro.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/e52d64c11690f85e9f1d69d7b48cc2269cd2e94b.1492844372.git.luto@kernel.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      29961b59
  14. 24 Aug, 2016 1 commit
    • Andy Lutomirski's avatar
      x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y) · e37e43a4
      Andy Lutomirski authored
      
      
      This allows x86_64 kernels to enable vmapped stacks by setting
      HAVE_ARCH_VMAP_STACK=y - which enables the CONFIG_VMAP_STACK=y
      high level Kconfig option.
      
      There are a couple of interesting bits:
      
      First, x86 lazily faults in top-level paging entries for the vmalloc
      area.  This won't work if we get a page fault while trying to access
      the stack: the CPU will promote it to a double-fault and we'll die.
      To avoid this problem, probe the new stack when switching stacks and
      forcibly populate the pgd entry for the stack when switching mms.
      
      Second, once we have guard pages around the stack, we'll want to
      detect and handle stack overflow.
      
      I didn't enable it on x86_32.  We'd need to rework the double-fault
      code a bit and I'm concerned about running out of vmalloc virtual
      addresses under some workloads.
      
      This patch, by itself, will behave somewhat erratically when the
      stack overflows while RSP is still more than a few tens of bytes
      above the bottom of the stack.  Specifically, we'll get #PF and make
      it to no_context and them oops without reliably triggering a
      double-fault, and no_context doesn't know about stack overflows.
      The next patch will improve that case.
      
      Thank you to Nadav and Brian for helping me pay enough attention to
      the SDM to hopefully get this right.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/c88f3e2920b18e6cc621d772a04a62c06869037e.1470907718.git.luto@kernel.org
      
      
      [ Minor edits. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e37e43a4
  15. 14 Jul, 2016 1 commit
    • Paul Gortmaker's avatar
      x86/mm: Audit and remove any unnecessary uses of module.h · 4b599fed
      Paul Gortmaker authored
      
      
      Historically a lot of these existed because we did not have
      a distinction between what was modular code and what was providing
      support to modules via EXPORT_SYMBOL and friends.  That changed
      when we forked out support for the latter into the export.h file.
      
      This means we should be able to reduce the usage of module.h
      in code that is obj-y Makefile or bool Kconfig.  The advantage
      in doing so is that module.h itself sources about 15 other headers;
      adding significantly to what we feed cpp, and it can obscure what
      headers we are effectively using.
      
      Since module.h was the source for init.h (for __init) and for
      export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
      for the presence of either and replace accordingly where needed.
      
      Note that some bool/obj-y instances remain since module.h is
      the header for some exception table entry stuff, and for things
      like __init_or_module (code that is tossed when MODULES=n).
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160714001901.31603-3-paul.gortmaker@windriver.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      4b599fed
  16. 28 Apr, 2016 3 commits
  17. 01 Apr, 2016 2 commits
  18. 11 Jan, 2016 1 commit
    • Andy Lutomirski's avatar
      x86/mm: Add barriers and document switch_mm()-vs-flush synchronization · 71b3c126
      Andy Lutomirski authored
      
      
      When switch_mm() activates a new PGD, it also sets a bit that
      tells other CPUs that the PGD is in use so that TLB flush IPIs
      will be sent.  In order for that to work correctly, the bit
      needs to be visible prior to loading the PGD and therefore
      starting to fill the local TLB.
      
      Document all the barriers that make this work correctly and add
      a couple that were missing.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      71b3c126
  19. 04 Sep, 2015 1 commit
    • Mel Gorman's avatar
      x86, mm: trace when an IPI is about to be sent · 5b74283a
      Mel Gorman authored
      
      
      When unmapping pages it is necessary to flush the TLB.  If that page was
      accessed by another CPU then an IPI is used to flush the remote CPU.  That
      is a lot of IPIs if kswapd is scanning and unmapping >100K pages per
      second.
      
      There already is a window between when a page is unmapped and when it is
      TLB flushed.  This series increases the window so multiple pages can be
      flushed using a single IPI.  This should be safe or the kernel is hosed
      already.
      
      Patch 1 simply made the rest of the series easier to write as ftrace
              could identify all the senders of TLB flush IPIS.
      
      Patch 2 tracks what CPUs potentially map a PFN and then sends an IPI
              to flush the entire TLB.
      
      Patch 3 tracks when there potentially are writable TLB entries that
              need to be batched differently
      
      Patch 4 increases SWAP_CLUSTER_MAX to further batch flushes
      
      The performance impact is documented in the changelogs but in the optimistic
      case on a 4-socket machine the full series reduces interrupts from 900K
      interrupts/second to 60K interrupts/second.
      
      This patch (of 4):
      
      It is easy to trace when an IPI is received to flush a TLB but harder to
      detect what event sent it.  This patch makes it easy to identify the
      source of IPIs being transmitted for TLB flushes on x86.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Reviewed-by: default avatarDave Hansen <dave.hansen@intel.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5b74283a
  20. 21 Jul, 2015 1 commit
  21. 04 Feb, 2015 1 commit
  22. 10 Aug, 2014 1 commit
  23. 08 Aug, 2014 1 commit