1. 22 Mar, 2006 4 commits
    • David Gibson's avatar
      [PATCH] hugepage: Fix hugepage logic in free_pgtables() · 9da61aef
      David Gibson authored
      free_pgtables() has special logic to call hugetlb_free_pgd_range() instead
      of the normal free_pgd_range() on hugepage VMAs.  However, the test it uses
      to do so is incorrect: it calls is_hugepage_only_range on a hugepage sized
      range at the start of the vma.  is_hugepage_only_range() will return true
      if the given range has any intersection with a hugepage address region, and
      in this case the given region need not be hugepage aligned.  So, for
      example, this test can return true if called on, say, a 4k VMA immediately
      preceding a (nicely aligned) hugepage VMA.
      At present we get away with this because the powerpc version of
      hugetlb_free_pgd_range() is just a call to free_pgd_range().  On ia64 (the
      only other arch with a non-trivial is_hugepage_only_range()) we get away
      with it for a different reason; the hugepage area is not contiguous with
      the rest of the user address space, and VMAs are not permitted in between,
      so the test can't return a false positive there.
      Nonetheless this should be fixed.  We do that in the patch below by
      replacing the is_hugepage_only_range() test with an explicit test of the
      VMA using is_vm_hugetlb_page().
      This in turn changes behaviour for platforms where is_hugepage_only_range()
      returns false always (everything except powerpc and ia64).  We address this
      by ensuring that hugetlb_free_pgd_range() is defined to be identical to
      free_pgd_range() (instead of a no-op) on everything except ia64.  Even so,
      it will prevent some otherwise possible coalescing of calls down to
      free_pgd_range().  Since this only happens for hugepage VMAs, removing this
      small optimization seems unlikely to cause any trouble.
      This patch causes no regressions on the libhugetlbfs testsuite - ppc64
      POWER5 (8-way), ppc64 G5 (2-way) and i386 Pentium M (UP).
      Signed-off-by: default avatarDavid Gibson <dwg@au1.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Acked-by: default avatarHugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
    • David Gibson's avatar
      [PATCH] hugepage: Make {alloc,free}_huge_page() local · 27a85ef1
      David Gibson authored
      Originally, mm/hugetlb.c just handled the hugepage physical allocation path
      and its {alloc,free}_huge_page() functions were used from the arch specific
      hugepage code.  These days those functions are only used with mm/hugetlb.c
      itself.  Therefore, this patch makes them static and removes their
      prototypes from hugetlb.h.  This requires a small rearrangement of code in
      mm/hugetlb.c to avoid a forward declaration.
      This patch causes no regressions on the libhugetlbfs testsuite (ppc64,
      Signed-off-by: default avatarDavid Gibson <dwg@au1.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
    • David Gibson's avatar
      [PATCH] hugepage: Strict page reservation for hugepage inodes · b45b5bd6
      David Gibson authored
      These days, hugepages are demand-allocated at first fault time.  There's a
      somewhat dubious (and racy) heuristic when making a new mmap() to check if
      there are enough available hugepages to fully satisfy that mapping.
      A particularly obvious case where the heuristic breaks down is where a
      process maps its hugepages not as a single chunk, but as a bunch of
      individually mmap()ed (or shmat()ed) blocks without touching and
      instantiating the pages in between allocations.  In this case the size of
      each block is compared against the total number of available hugepages.
      It's thus easy for the process to become overcommitted, because each block
      mapping will succeed, although the total number of hugepages required by
      all blocks exceeds the number available.  In particular, this defeats such
      a program which will detect a mapping failure and adjust its hugepage usage
      downward accordingly.
      The patch below addresses this problem, by strictly reserving a number of
      physical hugepages for hugepage inodes which have been mapped, but not
      instatiated.  MAP_SHARED mappings are thus "safe" - they will fail on
      mmap(), not later with an OOM SIGKILL.  MAP_PRIVATE mappings can still
      trigger an OOM.  (Actually SHARED mappings can technically still OOM, but
      only if the sysadmin explicitly reduces the hugepage pool between mapping
      and instantiation)
      This patch appears to address the problem at hand - it allows DB2 to start
      correctly, for instance, which previously suffered the failure described
      This patch causes no regressions on the libhugetblfs testsuite, and makes a
      test (designed to catch this problem) pass which previously failed (ppc64,
      Signed-off-by: default avatarDavid Gibson <dwg@au1.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
    • Zhang, Yanmin's avatar
      [PATCH] Enable mprotect on huge pages · 8f860591
      Zhang, Yanmin authored
      2.6.16-rc3 uses hugetlb on-demand paging, but it doesn_t support hugetlb
      From: David Gibson <david@gibson.dropbear.id.au>
        Remove a test from the mprotect() path which checks that the mprotect()ed
        range on a hugepage VMA is hugepage aligned (yes, really, the sense of
        is_aligned_hugepage_range() is the opposite of what you'd guess :-/).
        In fact, we don't need this test.  If the given addresses match the
        beginning/end of a hugepage VMA they must already be suitably aligned.  If
        they don't, then mprotect_fixup() will attempt to split the VMA.  The very
        first test in split_vma() will check for a badly aligned address on a
        hugepage VMA and return -EINVAL if necessary.
      From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
        On i386 and x86-64, pte flag _PAGE_PSE collides with _PAGE_PROTNONE.  The
        identify of hugetlb pte is lost when changing page protection via mprotect.
        A page fault occurs later will trigger a bug check in huge_pte_alloc().
        The fix is to always make new pte a hugetlb pte and also to clean up
        legacy code where _PAGE_PRESENT is forced on in the pre-faulting day.
      Signed-off-by: default avatarZhang Yanmin <yanmin.zhang@intel.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: default avatarKen Chen <kenneth.w.chen@intel.com>
      Signed-off-by: default avatarNishanth Aravamudan <nacc@us.ibm.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  2. 06 Jan, 2006 1 commit
  3. 14 Nov, 2005 1 commit
    • Robin Holt's avatar
      [PATCH] mm: ZAP_BLOCK causes redundant work · 51c6f666
      Robin Holt authored
      The address based work estimate for unmapping (for lockbreak) is and always
      was horribly inefficient for sparse mappings.  The problem is most simply
      explained with an example:
      If we find a pgd is clear, we still have to call into unmap_page_range
      PGDIR_SIZE / ZAP_BLOCK_SIZE times, each time checking the clear pgd, in
      order to progress the working address to the next pgd.
      The fundamental way to solve the problem is to keep track of the end
      address we've processed and pass it back to the higher layers.
      From: Nick Piggin <npiggin@suse.de>
        Modification to completely get away from address based work estimate
        and instead use an abstract count, with a very small cost for empty
        entries as opposed to present pages.
        On 2.6.14-git2, ppc64, and CONFIG_PREEMPT=y, mapping and unmapping 1TB
        of virtual address space takes 1.69s; with the following patch applied,
        this operation can be done 1000 times in less than 0.01s
      From: Andrew Morton <akpm@osdl.org>
      mm/memory.c: In function `unmap_vmas':
      mm/memory.c:779: warning: division by zero
      Due to
      			zap_work -= (end - start) /
      					(HPAGE_SIZE / PAGE_SIZE);
      So make the dummy HPAGE_SIZE non-zero
      Signed-off-by: default avatarRobin Holt <holt@sgi.com>
      Signed-off-by: default avatarNick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  4. 30 Oct, 2005 1 commit
    • Hugh Dickins's avatar
      [PATCH] mm: unmap_vmas with inner ptlock · 508034a3
      Hugh Dickins authored
      Remove the page_table_lock from around the calls to unmap_vmas, and replace
      the pte_offset_map in zap_pte_range by pte_offset_map_lock: all callers are
      now safe to descend without page_table_lock.
      Don't attempt fancy locking for hugepages, just take page_table_lock in
      unmap_hugepage_range.  Which makes zap_hugepage_range, and the hugetlb test in
      zap_page_range, redundant: unmap_vmas calls unmap_hugepage_range anyway.  Nor
      does unmap_vmas have much use for its mm arg now.
      The tlb_start_vma and tlb_end_vma in unmap_page_range are now called without
      page_table_lock: if they're implemented at all, they typically come down to
      flush_cache_range (usually done outside page_table_lock) and flush_tlb_range
      (which we already audited for the mprotect case).
      Signed-off-by: default avatarHugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  5. 20 Oct, 2005 1 commit
  6. 19 Oct, 2005 1 commit
    • Seth, Rohit's avatar
      [PATCH] Handle spurious page fault for hugetlb region · 3359b54c
      Seth, Rohit authored
      The hugetlb pages are currently pre-faulted.  At the time of mmap of
      hugepages, we populate the new PTEs.  It is possible that HW has already
      cached some of the unused PTEs internally.  These stale entries never
      get a chance to be purged in existing control flow.
      This patch extends the check in page fault code for hugepages.  Check if
      a faulted address falls with in size for the hugetlb file backing it.
      We return VM_FAULT_MINOR for these cases (assuming that the arch
      specific page-faulting code purges the stale entry for the archs that
      need it).
      Signed-off-by: default avatarRohit Seth <rohit.seth@intel.com>
      [ This is apparently arguably an ia64 port bug. But the code won't
        hurt, and for now it fixes a real problem on some ia64 machines ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  7. 05 Sep, 2005 1 commit
    • Chen, Kenneth W's avatar
      [PATCH] remove hugetlb_clean_stale_pgtable() and fix huge_pte_alloc() · 0e5c9f39
      Chen, Kenneth W authored
      I don't think we need to call hugetlb_clean_stale_pgtable() anymore
      in 2.6.13 because of the rework with free_pgtables().  It now collect
      all the pte page at the time of munmap.  It used to only collect page
      table pages when entire one pgd can be freed and left with staled pte
      pages.  Not anymore with 2.6.13.  This function will never be called
      and We should turn it into a BUG_ON.
      I also spotted two problems here, not Adam's fault :-)
      (1) in huge_pte_alloc(), it looks like a bug to me that pud is not
          checked before calling pmd_alloc()
      (2) in hugetlb_clean_stale_pgtable(), it also missed a call to
          pmd_free_tlb.  I think a tlb flush is required to flush the mapping
          for the page table itself when we clear out the pmd pointing to a
          pte page.  However, since hugetlb_clean_stale_pgtable() is never
          called, so it won't trigger the bug.
      Signed-off-by: default avatarKen Chen <kenneth.w.chen@intel.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  8. 22 Jun, 2005 1 commit
    • David Gibson's avatar
      [PATCH] Hugepage consolidation · 63551ae0
      David Gibson authored
      A lot of the code in arch/*/mm/hugetlbpage.c is quite similar.  This patch
      attempts to consolidate a lot of the code across the arch's, putting the
      combined version in mm/hugetlb.c.  There are a couple of uglyish hacks in
      order to covert all the hugepage archs, but the result is a very large
      reduction in the total amount of code.  It also means things like hugepage
      lazy allocation could be implemented in one place, instead of six.
      Tested, at least a little, on ppc64, i386 and x86_64.
      	- this patch changes the meaning of set_huge_pte() to be more
      	  analagous to set_pte()
      	- does SH4 need s special huge_ptep_get_and_clear()??
      Acked-by: default avatarWilliam Lee Irwin <wli@holomorphy.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  9. 19 Apr, 2005 1 commit
    • Hugh Dickins's avatar
      [PATCH] freepgt: hugetlb_free_pgd_range · 3bf5ee95
      Hugh Dickins authored
      ia64 and ppc64 had hugetlb_free_pgtables functions which were no longer being
      called, and it wasn't obvious what to do about them.
      The ppc64 case turns out to be easy: the associated tables are noted elsewhere
      and freed later, safe to either skip its hugetlb areas or go through the
      motions of freeing nothing.  Since ia64 does need a special case, restore to
      ppc64 the special case of skipping them.
      The ia64 hugetlb case has been broken since pgd_addr_end went in, though it
      probably appeared to work okay if you just had one such area; in fact it's
      been broken much longer if you consider a long munmap spanning from another
      region into the hugetlb region.
      In the ia64 hugetlb region, more virtual address bits are available than in
      the other regions, yet the page tables are structured the same way: the page
      at the bottom is larger.  Here we need to scale down each addr before passing
      it to the standard free_pgd_range.  Was about to write a hugely_scaled_down
      macro, but found htlbpage_to_page already exists for just this purpose.  Fixed
      off-by-one in ia64 is_hugepage_only_range.
      Uninline free_pgd_range to make it available to ia64.  Make sure the
      vma-gathering loop in free_pgtables cannot join a hugepage_only_range to any
      other (safe to join huges?  probably but don't bother).
      Signed-off-by: default avatarHugh Dickins <hugh@veritas.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  10. 16 Apr, 2005 1 commit
    • Linus Torvalds's avatar
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      Let it rip!