1. 15 Aug, 2017 1 commit
  2. 12 Jul, 2017 1 commit
    • Michal Hocko's avatar
      mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic · dcda9b04
      Michal Hocko authored
      __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
      the page allocator.  This has been true but only for allocations
      requests larger than PAGE_ALLOC_COSTLY_ORDER.  It has been always
      ignored for smaller sizes.  This is a bit unfortunate because there is
      no way to express the same semantic for those requests and they are
      considered too important to fail so they might end up looping in the
      page allocator for ever, similarly to GFP_NOFAIL requests.
      
      Now that the whole tree has been cleaned up and accidental or misled
      usage of __GFP_REPEAT flag has been removed for !costly requests we can
      give the original flag a better name and more importantly a more useful
      semantic.  Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
      that the allocator would try really hard but there is no promise of a
      success.  This will work independent of the order and overrides the
      default allocator behavior.  Page allocator users have several levels of
      guarantee vs.  cost options (take GFP_KERNEL as an example)
      
       - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
         attempt to free memory at all. The most light weight mode which even
         doesn't kick the background reclaim. Should be used carefully because
         it might deplete the memory and the next user might hit the more
         aggressive reclaim
      
       - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic
         allocation without any attempt to free memory from the current
         context but can wake kswapd to reclaim memory if the zone is below
         the low watermark. Can be used from either atomic contexts or when
         the request is a performance optimization and there is another
         fallback for a slow path.
      
       - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
         non sleeping allocation with an expensive fallback so it can access
         some portion of memory reserves. Usually used from interrupt/bh
         context with an expensive slow path fallback.
      
       - GFP_KERNEL - both background and direct reclaim are allowed and the
         _default_ page allocator behavior is used. That means that !costly
         allocation requests are basically nofail but there is no guarantee of
         that behavior so failures have to be checked properly by callers
         (e.g. OOM killer victim is allowed to fail currently).
      
       - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
         and all allocation requests fail early rather than cause disruptive
         reclaim (one round of reclaim in this implementation). The OOM killer
         is not invoked.
      
       - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
         behavior and all allocation requests try really hard. The request
         will fail if the reclaim cannot make any progress. The OOM killer
         won't be triggered.
      
       - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
         and all allocation requests will loop endlessly until they succeed.
         This might be really dangerous especially for larger orders.
      
      Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
      because they already had their semantic.  No new users are added.
      __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
      there is no progress and we have already passed the OOM point.
      
      This means that all the reclaim opportunities have been exhausted except
      the most disruptive one (the OOM killer) and a user defined fallback
      behavior is more sensible than keep retrying in the page allocator.
      
      [akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
      [mhocko@suse.com: semantic fix]
        Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
      [mhocko@kernel.org: address other thing spotted by Vlastimil]
        Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.org
      
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Alex Belits <alex.belits@cavium.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dcda9b04
  3. 05 Jun, 2017 1 commit
    • Balbir Singh's avatar
      powerpc/mm/book(e)(3s)/64: Add page table accounting · de3b8761
      Balbir Singh authored
      Introduce a helper pgtable_gfp_flags() which
      just returns the current gfp flags and adds
      __GFP_ACCOUNT to account for page table allocation.
      The generic helper is added to include/asm/pgalloc.h
      and has two variants - WARNING ugly bits ahead
      
      1. If the header is included from a module, no check
      for mm == &init_mm is done, since init_mm is not
      exported
      2. For kernel includes, the check is done and required
      see (3e79ec7d
      
       arch: x86: charge page tables to kmemcg)
      
      The fundamental assumption is that no module should be
      doing pgd/pud/pmd and pte alloc's on behalf of init_mm
      directly.
      
      NOTE: This adds an overhead to pmd/pud/pgd allocations
      similar to x86.  The other alternative was to implement
      pmd_alloc_kernel/pud_alloc_kernel and pgd_alloc_kernel
      with their offset variants.
      
      For 4k page size, pte_alloc_one no longer calls
      pte_alloc_one_kernel.
      Signed-off-by: default avatarBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      de3b8761
  4. 25 Jun, 2016 2 commits
    • Michal Hocko's avatar
      powerpc: get rid of superfluous __GFP_REPEAT · 2379a23e
      Michal Hocko authored
      __GFP_REPEAT has a rather weak semantic but since it has been introduced
      around 2.6.12 it has been ignored for low order allocations.
      
      {pud,pmd}_alloc_one are allocating from {PGT,PUD}_CACHE initialized in
      pgtable_cache_init which doesn't have larger than sizeof(void *) << 12
      size and that fits into !costly allocation request size.
      
      PGALLOC_GFP is used only in radix__pgd_alloc which uses either order-0
      or order-4 requests.  The first one doesn't need the flag while the
      second does.  Drop __GFP_REPEAT from PGALLOC_GFP and add it for the
      order-4 one.
      
      This means that this flag has never been actually useful here because it
      has always been used only for PAGE_ALLOC_COSTLY requests.
      
      Link: http://lkml.kernel.org/r/1464599699-30131-12-git-send-email-mhocko@kernel.org
      
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2379a23e
    • Michal Hocko's avatar
      tree wide: get rid of __GFP_REPEAT for order-0 allocations part I · 32d6bd90
      Michal Hocko authored
      This is the third version of the patchset previously sent [1].  I have
      basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get
      rid of superfluous gfp flags" which went through dm tree.  I am sending
      it now because it is tree wide and chances for conflicts are reduced
      considerably when we want to target rc2.  I plan to send the next step
      and rename the flag and move to a better semantic later during this
      release cycle so we will have a new semantic ready for 4.8 merge window
      hopefully.
      
      Motivation:
      
      While working on something unrelated I've checked the current usage of
      __GFP_REPEAT in the tree.  It seems that a majority of the usage is and
      always has been bogus because __GFP_REPEAT has always been about costly
      high order allocations while we are using it for order-0 or very small
      orders very often.  It seems that a big pile of them is just a
      copy&paste when a code has been adopted from one arch to another.
      
      I think it makes some sense to get rid of them because they are just
      making the semantic more unclear.  Please note that GFP_REPEAT is
      documented as
      
      * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
      
      * _might_ fail.  This depends upon the particular VM implementation.
        while !costly requests have basically nofail semantic.  So one could
        reasonably expect that order-0 request with __GFP_REPEAT will not loop
        for ever.  This is not implemented right now though.
      
      I would like to move on with __GFP_REPEAT and define a better semantic
      for it.
      
        $ git grep __GFP_REPEAT origin/master | wc -l
        111
        $ git grep __GFP_REPEAT | wc -l
        36
      
      So we are down to the third after this patch series.  The remaining
      places really seem to be relying on __GFP_REPEAT due to large allocation
      requests.  This still needs some double checking which I will do later
      after all the simple ones are sorted out.
      
      I am touching a lot of arch specific code here and I hope I got it right
      but as a matter of fact I even didn't compile test for some archs as I
      do not have cross compiler for them.  Patches should be quite trivial to
      review for stupid compile mistakes though.  The tricky parts are usually
      hidden by macro definitions and thats where I would appreciate help from
      arch maintainers.
      
      [1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org
      
      This patch (of 19):
      
      __GFP_REPEAT has a rather weak semantic but since it has been introduced
      around 2.6.12 it has been ignored for low order allocations.  Yet we
      have the full kernel tree with its usage for apparently order-0
      allocations.  This is really confusing because __GFP_REPEAT is
      explicitly documented to allow allocation failures which is a weaker
      semantic than the current order-0 has (basically nofail).
      
      Let's simply drop __GFP_REPEAT from those places.  This would allow to
      identify place which really need allocator to retry harder and formulate
      a more specific semantic for what the flag is supposed to do actually.
      
      Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.org
      
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: John Crispin <blogic@openwrt.org>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      32d6bd90
  5. 10 Jun, 2016 1 commit
    • Aneesh Kumar K.V's avatar
      powerpc/mm/radix: Flush page walk cache when freeing page table · a145abf1
      Aneesh Kumar K.V authored
      Even though a tlb_flush() does a flush with invalidate all cache,
      we can end up doing an RCU page table free before calling tlb_flush().
      That means we can have page walk cache entries even after we free the
      page table pages. This can result in us doing wrong page table walk.
      
      Avoid this by doing pwc flush on every page table free. We can't batch
      the pwc flush, because the rcu call back function where we free the
      page table pages doesn't have information of the mmu gather. Thus we
      have to do a pwc on every page table page freed.
      
      Note: I also removed the dummy tlb_flush_pgtable call functions for
      hash 32.
      
      Fixes: 1a472c9d
      
       ("powerpc/mm/radix: Add tlbflush routines")
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      a145abf1
  6. 11 May, 2016 6 commits
  7. 03 Mar, 2016 1 commit
  8. 29 Feb, 2016 1 commit
  9. 14 Dec, 2015 2 commits
  10. 10 Dec, 2013 1 commit
    • Hong H. Pham's avatar
      powerpc: Fix PTE page address mismatch in pgtable ctor/dtor · cf77ee54
      Hong H. Pham authored
      In pte_alloc_one(), pgtable_page_ctor() is passed an address that has
      not been converted by page_address() to the newly allocated PTE page.
      
      When the PTE is freed, __pte_free_tlb() calls pgtable_page_dtor()
      with an address to the PTE page that has been converted by page_address().
      The mismatch in the PTE's page address causes pgtable_page_dtor() to access
      invalid memory, so resources for that PTE (such as the page lock) is not
      properly cleaned up.
      
      On PPC32, only SMP kernels are affected.
      
      On PPC64, only SMP kernels with 4K page size are affected.
      
      This bug was introduced by commit d614bb04
      "powerpc: Move the pte free routines from common header".
      
      On a preempt-rt kernel, a spinlock is dynamically allocated for each
      PTE in pgtable_page_ctor().  When the PTE is freed, calling
      pgtable_page_dtor() with a mismatched page address causes a memory leak,
      as the pointer to the PTE's spinlock is bogus.
      
      On mainline, there isn't any immediately obvious symptoms, but the
      problem still exists here.
      
      Fixes: d614bb04
      
       "powerpc: Move the pte free routes from common header"
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: linux-stable <stable@vger.kernel.org> # v3.10+
      Signed-off-by: default avatarHong H. Pham <hong.pham@windriver.com>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      cf77ee54
  11. 25 Nov, 2013 1 commit
    • Hari Bathini's avatar
      powerpc/kdump: Adding symbols in vmcoreinfo to facilitate dump filtering · 8ff81271
      Hari Bathini authored
      
      
      When CONFIG_SPARSEMEM_VMEMMAP option is used in kernel, makedumpfile fails
      to filter vmcore dump as it fails to do vmemmap translations. So far
      dump filtering on ppc64 never had to deal with vmemmap addresses seperately
      as vmemmap regions where mapped in zone normal. But with the inclusion of
      CONFIG_SPARSEMEM_VMEMMAP config option in kernel, this vmemmap address
      translation support becomes necessary for dump filtering. For vmemmap adress
      translation, few kernel symbols are needed by dump filtering tool. This patch
      adds those symbols to vmcoreinfo, which a dump filtering tool can use for
      filtering the kernel dump. Tested this changes successfully with makedumpfile
      tool that supports vmemmap to physical address translation outside zone normal.
      
      [ Removed unneeded #ifdef as suggested by Michael Ellerman --BenH ]
      Signed-off-by: default avatarHari Bathini <hbathini@linux.vnet.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8ff81271
  12. 15 Nov, 2013 1 commit
  13. 21 Jun, 2013 1 commit
  14. 14 May, 2013 1 commit
  15. 30 Apr, 2013 3 commits
  16. 06 May, 2010 1 commit
    • Mark Nelson's avatar
      powerpc/mm: Track backing pages allocated by vmemmap_populate() · 91eea67c
      Mark Nelson authored
      
      
      We need to keep track of the backing pages that get allocated by
      vmemmap_populate() so that when we use kdump, the dump-capture kernel knows
      where these pages are.
      
      We use a simple linked list of structures that contain the physical address
      of the backing page and corresponding virtual address to track the backing
      pages.
      To save space, we just use a pointer to the next struct vmemmap_backing. We
      can also do this because we never remove nodes.  We call the pointer "list"
      to be compatible with changes made to the crash utility.
      
      vmemmap_populate() is called either at boot-time or on a memory hotplug
      operation. We don't have to worry about the boot-time calls because they
      will be inherently single-threaded, and for a memory hotplug operation
      vmemmap_populate() is called through:
      sparse_add_one_section()
                  |
                  V
      kmalloc_section_memmap()
                  |
                  V
      sparse_mem_map_populate()
                  |
                  V
      vmemmap_populate()
      and in sparse_add_one_section() we're protected by pgdat_resize_lock().
      So, we don't need a spinlock to protect the vmemmap_list.
      
      We allocate space for the vmemmap_backing structs by allocating whole pages
      in vmemmap_list_alloc() and then handing out chunks of this to
      vmemmap_list_populate().
      
      This means that we waste at most just under one page, but this keeps the code
      is simple.
      Signed-off-by: default avatarMark Nelson <markn@au1.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      91eea67c
  17. 08 Dec, 2009 1 commit
  18. 01 Dec, 2009 1 commit
  19. 27 Nov, 2009 1 commit
  20. 30 Oct, 2009 1 commit
    • David Gibson's avatar
      powerpc/mm: Cleanup management of kmem_caches for pagetables · a0668cdc
      David Gibson authored
      
      
      Currently we have a fair bit of rather fiddly code to manage the
      various kmem_caches used to store page tables of various levels.  We
      generally have two caches holding some combination of PGD, PUD and PMD
      tables, plus several more for the special hugepage pagetables.
      
      This patch cleans this all up by taking a different approach.  Rather
      than the caches being designated as for PUDs or for hugeptes for 16M
      pages, the caches are simply allocated to be a specific size.  Thus
      sharing of caches between different types/levels of pagetables happens
      naturally.  The pagetable size, where needed, is passed around encoded
      in the same way as {PGD,PUD,PMD}_INDEX_SIZE; that is n where the
      pagetable contains 2^n pointers.
      Signed-off-by: default avatarDavid Gibson <dwg@au1.ibm.com>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a0668cdc
  21. 27 Jul, 2009 1 commit
    • Benjamin Herrenschmidt's avatar
      mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() · 9e1b32ca
      Benjamin Herrenschmidt authored
      
      
      mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()
      
      Upcoming paches to support the new 64-bit "BookE" powerpc architecture
      will need to have the virtual address corresponding to PTE page when
      freeing it, due to the way the HW table walker works.
      
      Basically, the TLB can be loaded with "large" pages that cover the whole
      virtual space (well, sort-of, half of it actually) represented by a PTE
      page, and which contain an "indirect" bit indicating that this TLB entry
      RPN points to an array of PTEs from which the TLB can then create direct
      entries. Thus, in order to invalidate those when PTE pages are deleted,
      we need the virtual address to pass to tlbilx or tlbivax instructions.
      
      The old trick of sticking it somewhere in the PTE page struct page sucks
      too much, the address is almost readily available in all call sites and
      almost everybody implemets these as macros, so we may as well add the
      argument everywhere. I added it to the pmd and pud variants for consistency.
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV]
      Acked-by: default avatarNick Piggin <npiggin@suse.de>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9e1b32ca
  22. 03 Dec, 2008 1 commit
  23. 04 Aug, 2008 1 commit
  24. 24 Jul, 2008 1 commit
  25. 08 Feb, 2008 1 commit
    • Martin Schwidefsky's avatar
      CONFIG_HIGHPTE vs. sub-page page tables. · 2f569afd
      Martin Schwidefsky authored
      
      
      Background: I've implemented 1K/2K page tables for s390.  These sub-page
      page tables are required to properly support the s390 virtualization
      instruction with KVM.  The SIE instruction requires that the page tables
      have 256 page table entries (pte) followed by 256 page status table entries
      (pgste).  The pgstes are only required if the process is using the SIE
      instruction.  The pgstes are updated by the hardware and by the hypervisor
      for a number of reasons, one of them is dirty and reference bit tracking.
      To avoid wasting memory the standard pte table allocation should return
      1K/2K (31/64 bit) and 2K/4K if the process is using SIE.
      
      Problem: Page size on s390 is 4K, page table size is 1K or 2K.  That means
      the s390 version for pte_alloc_one cannot return a pointer to a struct
      page.  Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
      cannot return a pointer to a pte either, since that would require more than
      32 bit for the return value of pte_alloc_one (and the pte * would not be
      accessible since its not kmapped).
      
      Solution: The only solution I found to this dilemma is a new typedef: a
      pgtable_t.  For s390 pgtable_t will be a (pte *) - to be introduced with a
      later patch.  For everybody else it will be a (struct page *).  The
      additional problem with the initialization of the ptl lock and the
      NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
      a destructor pgtable_page_dtor.  The page table allocation and free
      functions need to call these two whenever a page table page is allocated or
      freed.  pmd_populate will get a pgtable_t instead of a struct page pointer.
       To get the pgtable_t back from a pmd entry that has been installed with
      pmd_populate a new function pmd_pgtable is added.  It replaces the pmd_page
      call in free_pte_range and apply_to_pte_range.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2f569afd
  26. 05 Feb, 2008 1 commit
  27. 23 Jan, 2008 1 commit
    • Paul Mackerras's avatar
      [POWERPC] Provide a way to protect 4k subpages when using 64k pages · fa28237c
      Paul Mackerras authored
      
      
      Using 64k pages on 64-bit PowerPC systems makes life difficult for
      emulators that are trying to emulate an ISA, such as x86, which use a
      smaller page size, since the emulator can no longer use the MMU and
      the normal system calls for controlling page protections.  Of course,
      the emulator can emulate the MMU by checking and possibly remapping
      the address for each memory access in software, but that is pretty
      slow.
      
      This provides a facility for such programs to control the access
      permissions on individual 4k sub-pages of 64k pages.  The idea is
      that the emulator supplies an array of protection masks to apply to a
      specified range of virtual addresses.  These masks are applied at the
      level where hardware PTEs are inserted into the hardware page table
      based on the Linux PTEs, so the Linux PTEs are not affected.  Note
      that this new mechanism does not allow any access that would otherwise
      be prohibited; it can only prohibit accesses that would otherwise be
      allowed.  This new facility is only available on 64-bit PowerPC and
      only when the kernel is configured for 64k pages.
      
      The masks are supplied using a new subpage_prot system call, which
      takes a starting virtual address and length, and a pointer to an array
      of protection masks in memory.  The array has a 32-bit word per 64k
      page to be protected; each 32-bit word consists of 16 2-bit fields,
      for which 0 allows any access (that is otherwise allowed), 1 prevents
      write accesses, and 2 or 3 prevent any access.
      
      Implicit in this is that the regions of the address space that are
      protected are switched to use 4k hardware pages rather than 64k
      hardware pages (on machines with hardware 64k page support).  In fact
      the whole process is switched to use 4k hardware pages when the
      subpage_prot system call is used, but this could be improved in future
      to switch only the affected segments.
      
      The subpage protection bits are stored in a 3 level tree akin to the
      page table tree.  The top level of this tree is stored in a structure
      that is appended to the top level of the page table tree, i.e., the
      pgd array.  Since it will often only be 32-bit addresses (below 4GB)
      that are protected, the pointers to the first four bottom level pages
      are also stored in this structure (each bottom level page contains the
      protection bits for 1GB of address space), so the protection bits for
      addresses below 4GB can be accessed with one fewer loads than those
      for higher addresses.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      fa28237c
  28. 02 Jun, 2007 1 commit
  29. 09 May, 2007 1 commit
  30. 02 May, 2007 1 commit
    • David Gibson's avatar
      [POWERPC] Remove arch/powerpc's dependence on asm-ppc/pg{alloc,table}.h · f88df14b
      David Gibson authored
      
      
      Currently, all 32-bit powerpc platforms use asm-ppc/pgtable.h and
      asm-ppc/pgalloc.h, even when otherwise compiled with ARCH=powerpc.
      Those asm-ppc files are a fairly nasty tangle of #ifdefs including a
      bunch of things which shouldn't be necessary any more in arch/powerpc.
      
      Cleaning up that mess is going to take a while, but this patch is a
      first step.  It separates the asm-powerpc/pg{alloc,table}.h into 64
      bit and 32 bit versions in asm-powerpc, which the basic .h files in
      asm-powerpc select based on config.  We make a few tiny tweaks to the
      innards of the files along the way, making the outermost ifdefs
      (double-inclusion protection and __KERNEL__) a little cleaner, and
      #including asm-generic/pgtable.h from the top-level
      asm-powerpc/pgtable.h (since both the old 32-bit and 64-bit versions
      ended with such an #include).
      Signed-off-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      f88df14b
  31. 07 Dec, 2006 1 commit