  1. 07 Apr, 2014 2 commits
    • x86: use generic early_ioremap · 5b7c73e0
      Mark Salter authored

      Move x86 over to the generic early ioremap implementation.
      Signed-off-by: Mark Salter <msalter@redhat.com>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5b7c73e0
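
      A minimal sketch of how early-boot code typically uses this
      interface (the surrounding function and its arguments are
      illustrative, not taken from the patch):

        /* Temporary mapping, only valid before paging_init() makes
         * the normal ioremap() available. */
        void __init probe_boot_table(resource_size_t phys, unsigned long len)
        {
                void __iomem *p = early_ioremap(phys, len);

                if (!p)
                        return;
                /* ... read the firmware table through p ... */
                early_iounmap(p, len);  /* always balance the mapping */
        }
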
    • x86/mm: sparse warning fix for early_memremap · 6b550f6f
      Dave Young authored

      This patch series takes the common bits from the x86 early ioremap
      implementation and creates a generic implementation which may be used by
      other architectures.  The early ioremap interfaces are intended for
      situations where boot code needs to make temporary virtual mappings
      before the normal ioremap interfaces are available.  Typically, this
      means before paging_init() has run.
      
      This patch (of 6):
      
      There are a lot of sparse warnings for code like the following:

        void *a = early_memremap(phys_addr, size);
      
      early_memremap is intended to map kernel memory with the ioremap
      facility, so the returned pointer should be a plain kernel RAM
      pointer rather than an __iomem one.

      To make the function clearer and to suppress the sparse warnings,
      this patch does two things:
      1. cast the return value of early_memremap to (__force void *)
      2. add an early_memunmap function and pass (__force void __iomem *) to iounmap
      
      From Boris:
        "Ingo told me yesterday, it makes sense too.  I'd guess we can try it.
         FWIW, all callers of early_memremap use the memory they get remapped
         as normal memory so we should be safe"
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Signed-off-by: Mark Salter <msalter@redhat.com>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6b550f6f
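
      A sketch of the resulting pattern (simplified; the actual
      wrappers live in the x86 ioremap code):

        /* early_memremap() returns ordinary kernel memory, so the
         * __iomem address-space marker is cast away once, inside
         * the wrapper: */
        void __init *early_memremap(resource_size_t phys_addr,
                                    unsigned long size)
        {
                return (__force void *)early_ioremap(phys_addr, size);
        }

        void __init early_memunmap(void *addr, unsigned long size)
        {
                early_iounmap((__force void __iomem *)addr, size);
        }

      Callers are then sparse-clean and release the mapping with
      early_memunmap(a, size) instead of calling iounmap directly.
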
  2. 22 Jan, 2014 4 commits
    • x86/mm: memblock: switch to use NUMA_NO_NODE · 9a28f9dc
      Grygorii Strashko authored
      Update x86 code to use NUMA_NO_NODE instead of MAX_NUMNODES when
      calling memblock APIs, because the memblock API will be changed to
      use NUMA_NO_NODE and will otherwise produce a warning during boot.
      
      See:
       https://lkml.org/lkml/2013/12/9/898
      
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9a28f9dc
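
      The change itself is mechanical; an illustrative before/after
      (the exact call sites vary across arch/x86):

        - nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, MAX_NUMNODES);
        + nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, NUMA_NO_NODE);
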
    • acpi, numa, mem_hotplug: mark all nodes the kernel resides un-hotpluggable · a0acda91
      Tang Chen authored

      Very early in boot, the kernel has to use some memory, for example
      to load the kernel image.  We cannot prevent this, so any node the
      kernel resides in should be marked un-hotpluggable.
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a0acda91
    • acpi, numa, mem_hotplug: mark hotpluggable memory in memblock · 05d1d8cb
      Tang Chen authored

      When parsing the SRAT, we know which memory areas are hotpluggable.
      So we invoke the memblock_mark_hotplug() function introduced by the
      previous patch to mark hotpluggable memory in memblock.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      05d1d8cb
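
      The hook itself is small; a sketch of what the SRAT parser gains
      (simplified from the description above, inside the memory
      affinity handler):

        /* Remember hotpluggable ranges so the allocator can avoid them. */
        if (ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)
                memblock_mark_hotplug(ma->base_address, ma->length);
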
    • memblock: make memblock_set_node() support different memblock_type · e7e8de59
      Tang Chen authored

      [sfr@canb.auug.org.au: fix powerpc build]
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e7e8de59
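
      A sketch of the new interface (prototype inferred from the commit
      title; the extra argument lets callers pick which region list the
      node id is set on):

        int memblock_set_node(phys_addr_t base, phys_addr_t size,
                              struct memblock_type *type, int nid);

        /* e.g. an x86 caller now passes the type explicitly: */
        memblock_set_node(start, end - start, &memblock.memory, nid);
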
  3. 19 Dec, 2013 2 commits
    • x86/mm/numa: Fix 32-bit kernel NUMA boot · f3d815cb
      Lans Zhang authored

      When booting a 32-bit x86 kernel on a NUMA machine, node data
      cannot be allocated from the local node if node 0's memory covers
      the low memory space entirely:
      
        [    0.000000] Initmem setup node 0 [mem 0x00000000-0x83fffffff]
        [    0.000000]   NODE_DATA [mem 0x367ed000-0x367edfff]
        [    0.000000] Initmem setup node 1 [mem 0x840000000-0xfffffffff]
        [    0.000000] Cannot find 4096 bytes in node 1
        [    0.000000] 64664MB HIGHMEM available.
        [    0.000000] 871MB LOWMEM available.
      
      To fix this issue, node data is allowed to be allocated from other
      nodes if the local node's memory has not yet been mapped.  The
      expected result looks like this:
      
        [    0.000000] Initmem setup node 0 [mem 0x00000000-0x83fffffff]
        [    0.000000]   NODE_DATA [mem 0x367ed000-0x367edfff]
        [    0.000000] Initmem setup node 1 [mem 0x840000000-0xfffffffff]
        [    0.000000]   NODE_DATA [mem 0x367ec000-0x367ecfff]
        [    0.000000]     NODE_DATA(1) on node 0
        [    0.000000] 64664MB HIGHMEM available.
        [    0.000000] 871MB LOWMEM available.
      Signed-off-by: Lans Zhang <jia.zhang@windriver.com>
      Cc: <andi@firstfloor.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1386303510-18574-1-git-send-email-jia.zhang@windriver.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f3d815cb
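
      A sketch of the fallback (simplified from the description above,
      in the x86 NODE_DATA setup path):

        /* Try the local node first; if its memory is not mapped yet,
         * take the allocation from any accessible node. */
        nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
        if (!nd_pa) {
                nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
                                              MEMBLOCK_ALLOC_ACCESSIBLE);
                if (!nd_pa) {
                        pr_err("Cannot find %zu bytes in any node\n",
                               nd_size);
                        return;
                }
        }
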
    • mm: numa: serialise parallel get_user_page against THP migration · 2b4847e7
      Mel Gorman authored
      Base pages are unmapped and flushed from cache and TLB during normal
      page migration and replaced with a migration entry that causes any
      parallel NUMA hinting fault or gup to block until migration completes.
      
      THP does not unmap pages due to a lack of support for migration
      entries at a PMD level.  This allows races with get_user_pages and
      get_user_pages_fast, which commit 3f926ab9 ("mm: Close races
      between THP migration and PMD numa clearing") made worse by
      introducing a pmd_clear_flush().
      
      This patch forces get_user_page (fast and normal) on a pmd_numa page to
      go through the slow get_user_page path where it will serialise against
      THP migration and properly account for the NUMA hinting fault.  On the
      migration side the page table lock is taken for each PTE update.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2b4847e7
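
      A sketch of the fast-gup side of the fix (simplified; per the
      description, a pmd_numa pmd must bail out of the lockless walk so
      the slow path can serialise against migration):

        /* in the x86 gup_pmd_range() walk: */
        if (pmd_none(pmd) || pmd_trans_splitting(pmd) || pmd_numa(pmd))
                return 0;  /* fall back to slow get_user_pages() */
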
  4. 19 Nov, 2013 1 commit
    • x86/mm: Implement ASLR for hugetlb mappings · fd8526ad
      Kirill A. Shutemov authored

      Matthew noticed that hugetlb mappings don't participate in ASLR on x86-64:
      
        %  for i in `seq 3`; do
        > tools/testing/selftests/vm/map_hugetlb | grep address
        > done
        Returned address is 0x2aaaaac00000
        Returned address is 0x2aaaaac00000
        Returned address is 0x2aaaaac00000
      
      /proc/PID/maps entries for the mapping are always the same
      (except inode number):
      
        2aaaaac00000-2aaabac00000 rw-p 00000000 00:0c 8200              /anon_hugepage (deleted)
        2aaaaac00000-2aaabac00000 rw-p 00000000 00:0c 256               /anon_hugepage (deleted)
        2aaaaac00000-2aaabac00000 rw-p 00000000 00:0c 7180              /anon_hugepage (deleted)
      
      The reason is the generic hugetlb_get_unmapped_area() function,
      which is used on x86-64.  It doesn't support randomization and uses
      a bottom-up unmapped-area lookup instead of the usual top-down
      lookup on x86-64.

      x86 has an arch-specific hugetlb_get_unmapped_area(), but it's used
      only on x86-32.
      
      Let's use the arch-specific hugetlb_get_unmapped_area() on x86-64
      too.  That adds ASLR and switches hugetlb mappings to a top-down
      unmapped-area lookup:
      
        %  for i in `seq 3`; do
        > tools/testing/selftests/vm/map_hugetlb | grep address
        > done
        Returned address is 0x7f4f08a00000
        Returned address is 0x7fdda4200000
        Returned address is 0x7febe0000000
      
      /proc/PID/maps entries:
      
        7f4f08a00000-7f4f18a00000 rw-p 00000000 00:0c 1168              /anon_hugepage (deleted)
        7fdda4200000-7fddb4200000 rw-p 00000000 00:0c 7092              /anon_hugepage (deleted)
        7febe0000000-7febf0000000 rw-p 00000000 00:0c 7183              /anon_hugepage (deleted)
      
      The unmapped-area lookup policy for hugetlb mappings is now
      consistent with normal mappings -- the only difference is the
      alignment requirement for huge pages.
      
      The libhugetlbfs test-suite didn't detect any regressions with the
      patch applied (although it shows a few failures on my machine
      regardless of the patch).
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mel Gorman <mgorman@suse.de>
      Link: http://lkml.kernel.org/r/20131119131750.EA45CE0090@blue.fi.intel.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fd8526ad
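
      A sketch of the top-down lookup the arch helper performs
      (simplified; the real x86 helper also handles MAP_FIXED and the
      legacy bottom-up layout):

        static unsigned long
        hugetlb_topdown_sketch(struct file *file, unsigned long len)
        {
                struct hstate *h = hstate_file(file);
                struct vm_unmapped_area_info info;

                info.flags = VM_UNMAPPED_AREA_TOPDOWN;
                info.length = len;
                info.low_limit = PAGE_SIZE;
                info.high_limit = current->mm->mmap_base; /* randomised */
                info.align_mask = PAGE_MASK & ~huge_page_mask(h);
                info.align_offset = 0;
                return vm_unmapped_area(&info);
        }

      Because mm->mmap_base is already randomised for normal mappings,
      routing hugetlb through the same top-down lookup gives it ASLR
      essentially for free.
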
  5. 15 Nov, 2013 2 commits
    • x86: handle pgtable_page_ctor() fail · cecbd1b5
      Kirill A. Shutemov authored
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cecbd1b5
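
      Since pgtable_page_ctor() can now fail (it may need to allocate
      the split-ptl spinlock dynamically), the x86 pte allocator unwinds
      on failure; a simplified sketch:

        pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
        {
                struct page *pte;

                pte = alloc_pages(__userpte_alloc_gfp, 0);
                if (!pte)
                        return NULL;
                if (!pgtable_page_ctor(pte)) {
                        __free_page(pte);  /* ctor failed: undo alloc */
                        return NULL;
                }
                return pte;
        }
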
    • x86: add missed pgtable_pmd_page_ctor/dtor calls for preallocated pmds · 09ef4939
      Kirill A. Shutemov authored

      In the split page table lock case, we embed a spinlock_t into
      struct page.  For obvious reasons, we don't want to increase the
      size of struct page if spinlock_t is too big, as with
      DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC or on an -rt kernel.  So we
      disable the split page table lock if spinlock_t is too big.
      
      This patchset allows the lock to be allocated dynamically if
      spinlock_t is big.  In that case, page->ptl is used to store a
      pointer to the spinlock instead of the spinlock itself.  It costs
      an additional cache line for the indirect access, but fixes page
      fault scalability for multi-threaded applications.
      
      LOCK_STAT depends on DEBUG_SPINLOCK, so on current kernels enabling
      LOCK_STAT to analyse scalability issues breaks scalability.  ;)
      
      The patchset mostly fixes this.  Results for ./thp_memscale -c 80 -b 512M
      on a 4-socket machine:
      
      baseline, no CONFIG_LOCK_STAT:	9.115460703 seconds time elapsed
      baseline, CONFIG_LOCK_STAT=y:	53.890567123 seconds time elapsed
      patched, no CONFIG_LOCK_STAT:	8.852250368 seconds time elapsed
      patched, CONFIG_LOCK_STAT=y:	11.069770759 seconds time elapsed
      
      Patch count is scary, but most of them are trivial. Overview:

       Patches 1-4	A few bug fixes. No dependencies on other patches.
      		Probably should be applied as soon as possible.

       Patch 5	Changes the signature of pgtable_page_ctor(). We will use it
      		for dynamic lock allocation, so it can fail.

       Patches 6-8	Add missing constructor/destructor calls on a few archs.
      		This fixes NR_PAGETABLE accounting and prepares for using
      		the split ptl.

       Patches 9-33	Add pgtable_page_ctor() fail handling to all archs.

       Patch 34	Finally adds support for a dynamically-allocated page->ptl.
      		Also contains documentation for the split page table lock.
      
      This patch (of 34):
      
      I had missed that we preallocate a few pmds in pgd_alloc() if
      X86_PAE is enabled.  Let's add the missing constructor/destructor
      calls.

      I hadn't noticed it during testing since prep_new_page() clears
      page->mapping and therefore page->ptl.  That is effectively
      equivalent to spin_lock_init(&page->ptl).
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09ef4939
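
      A sketch of the fix in the PAE pmd preallocation loop (simplified
      from the description above):

        for (i = 0; i < PREALLOCATED_PMDS; i++) {
                pmd_t *pmd = (pmd_t *)__get_free_page(PGALLOC_GFP);

                if (!pmd)
                        failed = true;
                if (pmd && !pgtable_pmd_page_ctor(virt_to_page(pmd))) {
                        free_page((unsigned long)pmd);
                        pmd = NULL;
                        failed = true;
                }
                pmds[i] = pmd;
        }
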
  6. 13 Nov, 2013 4 commits
    • mem-hotplug: introduce movable_node boot option · c5320926
      Tang Chen authored

      The Hot Pluggable field in the SRAT specifies which memory is
      hotpluggable.  As mentioned before, if hotpluggable memory is used
      by the kernel, it cannot be hot-removed.  So memory hotplug users
      may want to place all hotpluggable memory in ZONE_MOVABLE so that
      the kernel won't use it.
      
      Memory hotplug users may also set a node as a movable node, which
      contains only ZONE_MOVABLE, so that the whole node can be
      hot-removed.

      But the kernel cannot use memory in ZONE_MOVABLE, so with this
      setup the kernel cannot use memory in movable nodes.  This degrades
      NUMA performance, and other users may be unhappy.
      
      So we need a way to allow users to enable and disable this
      functionality.  In this patch, we introduce the movable_node boot
      option to let users choose not to consume hotpluggable memory at
      early boot time, so that it can later be placed in ZONE_MOVABLE.
      
      To achieve this, the movable_node boot option controls the memblock
      allocation direction.  That is, after memblock is ready but before
      the SRAT is parsed, we should allocate memory near the kernel
      image, as explained in the previous patches.  So if the
      movable_node boot option is set, the kernel does the following:
      
      1. After memblock is ready, make memblock allocate memory bottom up.
      2. After the SRAT is parsed, restore memblock's default behaviour
         and allocate memory top down.
      
      Users can specify "movable_node" on the kernel command line to
      enable this functionality.  Those who don't use memory hotplug, or
      who don't want to lose any NUMA performance, can simply not specify
      anything; the kernel will work as before.
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Toshi Kani <toshi.kani@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c5320926
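
      A sketch of the boot-option hook (per the description: flip
      memblock to bottom-up allocation until the SRAT has been parsed):

        static int __init cmdline_parse_movable_node(char *p)
        {
        #ifdef CONFIG_MOVABLE_NODE
                memblock_set_bottom_up(true);
        #else
                pr_warn("movable_node option not supported\n");
        #endif
                return 0;
        }
        early_param("movable_node", cmdline_parse_movable_node);
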
    • x86/mem-hotplug: support initialize page tables in bottom-up · b959ed6c
      Tang Chen authored

      The Linux kernel cannot migrate pages used by the kernel itself.
      As a result, kernel pages cannot be hot-removed, so we cannot
      allocate hotpluggable memory for the kernel.
      
      In a memory hotplug system, any NUMA node the kernel resides in
      should be un-hotpluggable.  And on a modern server, each node can
      have at least 16GB of memory, so memory around the kernel image is
      very likely un-hotpluggable.
      
      The ACPI SRAT (System Resource Affinity Table) contains the memory
      hotplug info.  But before the SRAT is parsed, memblock has already
      started to allocate memory for the kernel, so we need to prevent
      memblock from doing this.
      
      The direct memory mapping page table setup is one such case:
      init_mem_mapping() is called before the SRAT is parsed.  To prevent
      page tables from being allocated within hotpluggable memory, we use
      the bottom-up direction and allocate page tables from the end of
      the kernel image towards higher memory.
      
      Note:
      As for allocating page tables in lower memory, TJ said:
      
      : This is an optional behavior which is triggered by a very specific kernel
      : boot param, which I suspect is gonna need to stick around to support
      : memory hotplug in the current setup unless we add another layer of address
      : translation to support memory hotplug.
      
      As for the concern that page tables may occupy too much low memory
      when 4K mappings are used (CONFIG_DEBUG_PAGEALLOC and
      CONFIG_KMEMCHECK both disable the use of >4k pages), TJ said:
      
      : But as I said in the same paragraph, parsing SRAT earlier doesn't solve
      : the problem in itself either.  Ignoring the option if 4k mapping is
      : required and memory consumption would be prohibitive should work, no?
      : Something like that would be necessary if we're gonna worry about cases
      : like this no matter how we implement it, but, frankly, I'm not sure this
      : is something worth worrying about.
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Toshi Kani <toshi.kani@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b959ed6c
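
      A sketch of the resulting decision in init_mem_mapping()
      (simplified):

        if (memblock_bottom_up()) {
                unsigned long kernel_end = __pa_symbol(_end);

                /* Map [kernel_end, end) first, then the range below
                 * the kernel, so early page tables land just above
                 * the kernel image. */
                memory_map_bottom_up(kernel_end, end);
                memory_map_bottom_up(ISA_END_ADDRESS, kernel_end);
        } else {
                memory_map_top_down(ISA_END_ADDRESS, end);
        }
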
    • x86/mm: factor out of top-down direct mapping setup · 0167d7d8
      Tang Chen authored

      Create a new function, memory_map_top_down(), to factor out the
      top-down direct memory mapping pagetable setup.  This is also a
      preparation for the following patch, which will introduce bottom-up
      memory mapping.  That is, we put the two ways of setting up
      pagetables into separate functions and choose which one to use in
      init_mem_mapping(), which makes the code clearer.
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Toshi Kani <toshi.kani@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0167d7d8
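
      A sketch of the factored-out helper (heavily simplified; the real
      function also grows the mapping step size as more memory becomes
      mapped):

        static void __init memory_map_top_down(unsigned long map_start,
                                               unsigned long map_end)
        {
                unsigned long last_start = map_end;

                /* Walk down from map_end so each chunk's page tables
                 * can be allocated from memory already mapped above. */
                while (last_start > map_start) {
                        unsigned long start =
                                round_down(last_start - 1, PMD_SIZE);

                        if (start < map_start)
                                start = map_start;
                        init_range_memory_mapping(start, last_start);
                        last_start = start;
                }
        }
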
  7. 12 Nov, 2013 1 commit
    • x86/dumpstack: Fix printk_address for direct addresses · 5f01c988
      Jiri Slaby authored
      Consider a kernel crash in a module, simulated in the following way:
      
       static int my_init(void)
       {
               char *map = (void *)0x5;
               *map = 3;
               return 0;
       }
       module_init(my_init);
      
      When we turn off FRAME_POINTERs, the very first instruction in that
      function causes a BUG.  The problem is that we print the IP in the
      BUG report using %pB (from printk_address), and %pB decrements the
      pointer by one to compensate when printing addresses of functions
      with tail calls.
      
      This was added in commit 71f9e598 ("x86, dumpstack: Use %pB format
      specifier for stack trace") to fix the call stack printouts.
      
      So instead of correct output:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
        IP: [<ffffffffa01ac000>] my_init+0x0/0x10 [pb173]
      
      We get:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
        IP: [<ffffffffa0152000>] 0xffffffffa0151fff
      
      To fix that, we use %pB only for stack address printouts (via the
      newly added printk_stack_address) and %pS for regs->ip (via
      printk_address).  I.e. we revert to the old behaviour for
      everything except call stacks.  And since reliable is 1 in all
      those cases, we remove that parameter from printk_address.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: joe@perches.com
      Cc: jirislaby@gmail.com
      Link: http://lkml.kernel.org/r/1382706418-8435-1-git-send-email-jslaby@suse.cz
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5f01c988
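
      A sketch of the split described above (format strings follow the
      commit's reasoning: %pB compensates for tail calls, which is right
      for return addresses in a stack trace but wrong for a faulting ip;
      the bodies are simplified):

        void printk_address(unsigned long address)
        {
                pr_cont(" [<%p>] %pS\n", (void *)address, (void *)address);
        }

        static void printk_stack_address(unsigned long address)
        {
                pr_cont(" [<%p>] %pB\n", (void *)address, (void *)address);
        }
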