1. 10 Aug, 2017 2 commits
    • Minchan Kim's avatar
      mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem · 99baac21
      Minchan Kim authored
      Nadav reported parallel MADV_DONTNEED on same range has a stale TLB
      problem and Mel fixed it[1] and found same problem on MADV_FREE[2].
      
      Quote from Mel Gorman:
       "The race in question is CPU 0 running madv_free and updating some PTEs
        while CPU 1 is also running madv_free and looking at the same PTEs.
        CPU 1 may have writable TLB entries for a page but fail the pte_dirty
        check (because CPU 0 has updated it already) and potentially fail to
        flush.
      
        Hence, when madv_free on CPU 1 returns, there are still potentially
        writable TLB entries and the underlying PTE is still present so that a
        subsequent write does not necessarily propagate the dirty bit to the
        underlying PTE any more. Reclaim at some unknown time at the future
        may then see that the PTE is still clean and discard the page even
        though a write has happened in the meantime. I think this is possible
        but I could have missed some protection in madv_free that prevents it
        happening."
      
      This patch aims for solving both problems all at once and is ready for
      other problem with KSM, MADV_FREE and soft-dirty story[3].
      
      TLB batch API(tlb_[gather|finish]_mmu] uses [inc|dec]_tlb_flush_pending
      and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can
      catch there are parallel threads going on.  In that case, forcefully,
      flush TLB to prevent for user to access memory via stale TLB entry
      although it fail to gather page table entry.
      
      I confirmed this patch works with [4] test program Nadav gave so this
      patch supersedes "mm: Always flush VMA ranges affected by zap_page_range
      v2" in current mmotm.
      
      NOTE:
      
      This patch modifies arch-specific TLB gathering interface(x86, ia64,
      s390, sh, um).  It seems most of architecture are straightforward but
      s390 need to be careful because tlb_flush_mmu works only if
      mm->context.flush_mm is set to non-zero which happens only a pte entry
      really is cleared by ptep_get_and_clear and friends.  However, this
      problem never changes the pte entries but need to flush to prevent
      memory access from stale tlb.
      
      [1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
      [2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
      [3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
      [4] https://patchwork.kernel.org/patch/9861621/
      
      [minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu]
        Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox
      Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com
      
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Signed-off-by: default avatarNadav Amit <namit@vmware.com>
      Reported-by: default avatarNadav Amit <namit@vmware.com>
      Reported-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Acked-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      99baac21
    • Minchan Kim's avatar
      mm: refactor TLB gathering API · 56236a59
      Minchan Kim authored
      This patch is a preparatory patch for solving race problems caused by
      TLB batch.  For that, we will increase/decrease TLB flush pending count
      of mm_struct whenever tlb_[gather|finish]_mmu is called.
      
      Before making it simple, this patch separates architecture specific part
      and rename it to arch_tlb_[gather|finish]_mmu and generic part just
      calls it.
      
      It shouldn't change any behavior.
      
      Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com
      
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Signed-off-by: default avatarNadav Amit <namit@vmware.com>
      Acked-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      56236a59
  2. 13 Dec, 2016 3 commits
  3. 26 Jul, 2016 2 commits
  4. 10 Oct, 2014 1 commit
  5. 25 Apr, 2014 1 commit
    • Linus Torvalds's avatar
      mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts · 1cf35d47
      Linus Torvalds authored
      
      
      The mmu-gather operation 'tlb_flush_mmu()' has done two things: the
      actual tlb flush operation, and the batched freeing of the pages that
      the TLB entries pointed at.
      
      This splits the operation into separate phases, so that the forced
      batched flushing done by zap_pte_range() can now do the actual TLB flush
      while still holding the page table lock, but delay the batched freeing
      of all the pages to after the lock has been dropped.
      
      This in turn allows us to avoid a race condition between
      set_page_dirty() (as called by zap_pte_range() when it finds a dirty
      shared memory pte) and page_mkclean(): because we now flush all the
      dirty page data from the TLB's while holding the pte lock,
      page_mkclean() will be held up walking the (recently cleaned) page
      tables until after the TLB entries have been flushed from all CPU's.
      Reported-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Tested-by: default avatarDave Hansen <dave.hansen@intel.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1cf35d47
  6. 16 Aug, 2013 1 commit
    • Linus Torvalds's avatar
      Fix TLB gather virtual address range invalidation corner cases · 2b047252
      Linus Torvalds authored
      Ben Tebulin reported:
      
       "Since v3.7.2 on two independent machines a very specific Git
        repository fails in 9/10 cases on git-fsck due to an SHA1/memory
        failures.  This only occurs on a very specific repository and can be
        reproduced stably on two independent laptops.  Git mailing list ran
        out of ideas and for me this looks like some very exotic kernel issue"
      
      and bisected the failure to the backport of commit 53a59fc6 ("mm:
      limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT").
      
      That commit itself is not actually buggy, but what it does is to make it
      much more likely to hit the partial TLB invalidation case, since it
      introduces a new case in tlb_next_batch() that previously only ever
      happened when running out of memory.
      
      The real bug is that the TLB gather virtual memory range setup is subtly
      buggered.  It was introduced in commit 597e1c35 ("mm/mmu_gather:
      enable tlb flush range in generic mmu_gather"), and the range handling
      was already fixed at least once in commit e6c495a9
      
       ("mm: fix the TLB
      range flushed when __tlb_remove_page() runs out of slots"), but that fix
      was not complete.
      
      The problem with the TLB gather virtual address range is that it isn't
      set up by the initial tlb_gather_mmu() initialization (which didn't get
      the TLB range information), but it is set up ad-hoc later by the
      functions that actually flush the TLB.  And so any such case that forgot
      to update the TLB range entries would potentially miss TLB invalidates.
      
      Rather than try to figure out exactly which particular ad-hoc range
      setup was missing (I personally suspect it's the hugetlb case in
      zap_huge_pmd(), which didn't have the same logic as zap_pte_range()
      did), this patch just gets rid of the problem at the source: make the
      TLB range information available to tlb_gather_mmu(), and initialize it
      when initializing all the other tlb gather fields.
      
      This makes the patch larger, but conceptually much simpler.  And the end
      result is much more understandable; even if you want to play games with
      partial ranges when invalidating the TLB contents in chunks, now the
      range information is always there, and anybody who doesn't want to
      bother with it won't introduce subtle bugs.
      
      Ben verified that this fixes his problem.
      Reported-bisected-and-tested-by: default avatarBen Tebulin <tebulin@googlemail.com>
      Build-testing-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Build-testing-by: default avatarRichard Weinberger <richard.weinberger@gmail.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2b047252
  7. 06 Jun, 2013 1 commit
    • Peter Zijlstra's avatar
      arch, mm: Remove tlb_fast_mode() · 29eb7782
      Peter Zijlstra authored
      
      
      Since the introduction of preemptible mmu_gather TLB fast mode has been
      broken. TLB fast mode relies on there being absolutely no concurrency;
      it frees pages first and invalidates TLBs later.
      
      However now we can get concurrency and stuff goes *bang*.
      
      This patch removes all tlb_fast_mode() code; it was found the better
      option vs trying to patch the hole by entangling tlb invalidation with
      the scheduler.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Reported-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      29eb7782
  8. 04 Jun, 2013 1 commit
  9. 25 Aug, 2012 1 commit
  10. 02 Feb, 2012 1 commit
  11. 08 Dec, 2011 2 commits
  12. 25 May, 2011 1 commit
    • Peter Zijlstra's avatar
      arm: mmu_gather rework · 9e14f674
      Peter Zijlstra authored
      
      
      Fix up the arm mmu_gather code to conform to the new API.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9e14f674
  13. 21 Feb, 2011 2 commits
  14. 27 Jul, 2009 1 commit
    • Benjamin Herrenschmidt's avatar
      mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() · 9e1b32ca
      Benjamin Herrenschmidt authored
      
      
      mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()
      
      Upcoming paches to support the new 64-bit "BookE" powerpc architecture
      will need to have the virtual address corresponding to PTE page when
      freeing it, due to the way the HW table walker works.
      
      Basically, the TLB can be loaded with "large" pages that cover the whole
      virtual space (well, sort-of, half of it actually) represented by a PTE
      page, and which contain an "indirect" bit indicating that this TLB entry
      RPN points to an array of PTEs from which the TLB can then create direct
      entries. Thus, in order to invalidate those when PTE pages are deleted,
      we need the virtual address to pass to tlbilx or tlbivax instructions.
      
      The old trick of sticking it somewhere in the PTE page struct page sucks
      too much, the address is almost readily available in all call sites and
      almost everybody implemets these as macros, so we may as well add the
      argument everywhere. I added it to the pmd and pud variants for consistency.
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV]
      Acked-by: default avatarNick Piggin <npiggin@suse.de>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9e1b32ca
  15. 15 Apr, 2009 1 commit
    • Aaro Koskinen's avatar
      [ARM] 5450/1: Flush only the needed range when unmapping a VMA · 7fccfc00
      Aaro Koskinen authored
      
      
      When unmapping N pages (e.g. shared memory) the amount of TLB flushes
      done can be (N*PAGE_SIZE/ZAP_BLOCK_SIZE)*N although it should be N at
      maximum. With PREEMPT kernel ZAP_BLOCK_SIZE is 8 pages, so there is a
      noticeable performance penalty when unmapping a large VMA and the system
      is spending its time in flush_tlb_range().
      
      The problem is that tlb_end_vma() is always flushing the full VMA
      range. The subrange that needs to be flushed can be calculated by
      tlb_remove_tlb_entry(). This approach was suggested by Hugh Dickins,
      and is also used by other arches.
      
      The speed increase is roughly 3x for 8M mappings and for larger mappings
      even more.
      Signed-off-by: default avatarAaro Koskinen <Aaro.Koskinen@nokia.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      7fccfc00
  16. 02 Aug, 2008 1 commit
  17. 05 Feb, 2008 1 commit
  18. 21 Mar, 2006 1 commit
  19. 30 Oct, 2005 3 commits
    • Hugh Dickins's avatar
      [PATCH] mm: tlb_finish_mmu forget rss · fc2acab3
      Hugh Dickins authored
      
      
      zap_pte_range has been counting the pages it frees in tlb->freed, then
      tlb_finish_mmu has used that to update the mm's rss.  That got stranger when I
      added anon_rss, yet updated it by a different route; and stranger when rss and
      anon_rss became mm_counters with special access macros.  And it would no
      longer be viable if we're relying on page_table_lock to stabilize the
      mm_counter, but calling tlb_finish_mmu outside that lock.
      
      Remove the mmu_gather's freed field, let tlb_finish_mmu stick to its own
      business, just decrement the rss mm_counter in zap_pte_range (yes, there was
      some point to batching the update, and a subsequent patch restores that).  And
      forget the anal paranoia of first reading the counter to avoid going negative
      - if rss does go negative, just fix that bug.
      
      Remove the mmu_gather's flushes and avoided_flushes from arm and arm26: no use
      was being made of them.  But arm26 alone was actually using the freed, in the
      way some others use need_flush: give it a need_flush.  arm26 seems to prefer
      spaces to tabs here: respect that.
      Signed-off-by: default avatarHugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      fc2acab3
    • Hugh Dickins's avatar
      [PATCH] mm: tlb_is_full_mm was obscure · 4d6ddfa9
      Hugh Dickins authored
      
      
      tlb_is_full_mm?  What does that mean?  The TLB is full?  No, it means that the
      mm's last user has gone and the whole mm is being torn down.  And it's an
      inline function because sparc64 uses a different (slightly better)
      "tlb_frozen" name for the flag others call "fullmm".
      
      And now the ptep_get_and_clear_full macro used in zap_pte_range refers
      directly to tlb->fullmm, which would be wrong for sparc64.  Rather than
      correct that, I'd prefer to scrap tlb_is_full_mm altogether, and change
      sparc64 to just use the same poor name as everyone else - is that okay?
      Signed-off-by: default avatarHugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      4d6ddfa9
    • Hugh Dickins's avatar
      [PATCH] mm: tlb_gather_mmu get_cpu_var · 15a23ffa
      Hugh Dickins authored
      
      
      tlb_gather_mmu dates from before kernel preemption was allowed, and uses
      smp_processor_id or __get_cpu_var to find its per-cpu mmu_gather.  That works
      because it's currently only called after getting page_table_lock, which is not
      dropped until after the matching tlb_finish_mmu.  But don't rely on that, it
      will soon change: now disable preemption internally by proper get_cpu_var in
      tlb_gather_mmu, put_cpu_var in tlb_finish_mmu.
      Signed-off-by: default avatarHugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      15a23ffa
  20. 16 Apr, 2005 1 commit
    • Linus Torvalds's avatar
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4