1. 29 Mar, 2020 1 commit
    • Aneesh Kumar K.V's avatar
      mm/sparse: fix kernel crash with pfn_section_valid check · b943f045
      Aneesh Kumar K.V authored
      Fix the crash like this:
      
          BUG: Kernel NULL pointer dereference on read at 0x00000000
          Faulting instruction address: 0xc000000000c3447c
          Oops: Kernel access of bad area, sig: 11 [#1]
          LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
          CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
          ...
          NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
          LR [c000000000088354] vmemmap_free+0x144/0x320
          Call Trace:
             section_deactivate+0x220/0x240
             __remove_pages+0x118/0x170
             arch_remove_memory+0x3c/0x150
             memunmap_pages+0x1cc/0x2f0
             devm_action_release+0x30/0x50
             release_nodes+0x2f8/0x3e0
             device_release_driver_internal+0x168/0x270
             unbind_store+0x130/0x170
             drv_attr_store+0x44/0x60
             sysfs_kf_write+0x68/0x80
             kernfs_fop_write+0x100/0x290
             __vfs_write+0x3c/0x70
             vfs_write+0xcc/0x240
             ksys_write+0x7c/0x140
             system_call+0x5c/0x68
      
      The crash is due to NULL dereference at
      
      	test_bit(idx, ms->usage->subsection_map);
      
      due to ms->usage = NULL in pfn_section_valid()
      
      With commit d41e2f3b ("mm/hotplug: fix hot remove failure in
      SPARSEMEM|!VMEMMAP case") section_mem_map is set to NULL after
      depopulate_section_mem().  This was done so that pfn_page() can work
      correctly with kernel config that disables SPARSEMEM_VMEMMAP.  With that
      config pfn_to_page does
      
      	__section_mem_map_addr(__sec) + __pfn;
      
      where
      
        static inline struct page *__section_mem_map_addr(struct mem_section *section)
        {
      	unsigned long map = section->section_mem_map;
      	map &= SECTION_MAP_MASK;
      	return (struct page *)map;
        }
      
      Now with SPASEMEM_VMEMAP enabled, mem_section->usage->subsection_map is
      used to check the pfn validity (pfn_valid()).  Since section_deactivate
      release mem_section->usage if a section is fully deactivated,
      pfn_valid() check after a subsection_deactivate cause a kernel crash.
      
        static inline int pfn_valid(unsigned long pfn)
        {
        ...
      	return early_section(ms) || pfn_section_valid(ms, pfn);
        }
      
      where
      
        static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
        {
      	int idx = subsection_map_index(pfn);
      
      	return test_bit(idx, ms->usage->subsection_map);
        }
      
      Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is
      freed.  For architectures like ppc64 where large pages are used for
      vmmemap mapping (16MB), a specific vmemmap mapping can cover multiple
      sections.  Hence before a vmemmap mapping page can be freed, the kernel
      needs to make sure there are no valid sections within that mapping.
      Clearing the section valid bit before depopulate_section_memap enables
      this.
      
      [aneesh.kumar@linux.ibm.com: add comment]
        Link: http://lkml.kernel.org/r/20200326133235.343616-1-aneesh.kumar@linux.ibm.comLink: http://lkml.kernel.org/r/20200325031914.107660-1-aneesh.kumar@linux.ibm.com
      Fixes: d41e2f3b
      
       ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
      Reported-by: default avatarSachin Sant <sachinp@linux.vnet.ibm.com>
      Signed-off-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: default avatarSachin Sant <sachinp@linux.vnet.ibm.com>
      Reviewed-by: default avatarBaoquan He <bhe@redhat.com>
      Reviewed-by: default avatarWei Yang <richard.weiyang@gmail.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarPankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b943f045
  2. 22 Mar, 2020 1 commit
  3. 21 Feb, 2020 1 commit
    • Wei Yang's avatar
      mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEM · 18e19f19
      Wei Yang authored
      When we use SPARSEMEM instead of SPARSEMEM_VMEMMAP, pfn_to_page()
      doesn't work before sparse_init_one_section() is called.
      
      This leads to a crash when hotplug memory:
      
          BUG: unable to handle page fault for address: 0000000006400000
          #PF: supervisor write access in kernel mode
          #PF: error_code(0x0002) - not-present page
          PGD 0 P4D 0
          Oops: 0002 [#1] SMP PTI
          CPU: 3 PID: 221 Comm: kworker/u16:1 Tainted: G        W         5.5.0-next-20200205+ #343
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
          Workqueue: kacpi_hotplug acpi_hotplug_work_fn
          RIP: 0010:__memset+0x24/0x30
          Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
          RSP: 0018:ffffb43ac0373c80 EFLAGS: 00010a87
          RAX: ffffffffffffffff RBX: ffff8a1518800000 RCX: 0000000000050000
          RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000006400000
          RBP: 0000000000140000 R08: 0000000000100000 R09: 0000000006400000
          R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000
          R13: 0000000000000028 R14: 0000000000000000 R15: ffff8a153ffd9280
          FS:  0000000000000000(0000) GS:ffff8a153ab00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 0000000006400000 CR3: 0000000136fca000 CR4: 00000000000006e0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Call Trace:
           sparse_add_section+0x1c9/0x26a
           __add_pages+0xbf/0x150
           add_pages+0x12/0x60
           add_memory_resource+0xc8/0x210
           __add_memory+0x62/0xb0
           acpi_memory_device_add+0x13f/0x300
           acpi_bus_attach+0xf6/0x200
           acpi_bus_scan+0x43/0x90
           acpi_device_hotplug+0x275/0x3d0
           acpi_hotplug_work_fn+0x1a/0x30
           process_one_work+0x1a7/0x370
           worker_thread+0x30/0x380
           kthread+0x112/0x130
           ret_from_fork+0x35/0x40
      
      We should use memmap as it did.
      
      On x86 the impact is limited to x86_32 builds, or x86_64 configurations
      that override the default setting for SPARSEMEM_VMEMMAP.
      
      Other memory hotplug archs (arm64, ia64, and ppc) also default to
      SPARSEMEM_VMEMMAP=y.
      
      [dan.j.williams@intel.com: changelog update]
      {rppt@linux.ibm.com: changelog update]
      Link: http://lkml.kernel.org/r/20200219030454.4844-1-bhe@redhat.com
      Fixes: ba72b4c8
      
       ("mm/sparsemem: support sub-section hotplug")
      Signed-off-by: default avatarWei Yang <richardw.yang@linux.intel.com>
      Signed-off-by: default avatarBaoquan He <bhe@redhat.com>
      Acked-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarBaoquan He <bhe@redhat.com>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      18e19f19
  4. 04 Feb, 2020 1 commit
  5. 31 Jan, 2020 1 commit
  6. 14 Jan, 2020 1 commit
    • David Hildenbrand's avatar
      mm/memory_hotplug: don't free usage map when removing a re-added early section · 8068df3b
      David Hildenbrand authored
      When we remove an early section, we don't free the usage map, as the
      usage maps of other sections are placed into the same page.  Once the
      section is removed, it is no longer an early section (especially, the
      memmap is freed).  When we re-add that section, the usage map is reused,
      however, it is no longer an early section.  When removing that section
      again, we try to kfree() a usage map that was allocated during early
      boot - bad.
      
      Let's check against PageReserved() to see if we are dealing with an
      usage map that was allocated during boot.  We could also check against
      !(PageSlab(usage_page) || PageCompound(usage_page)), but PageReserved() is
      cleaner.
      
      Can be triggered using memtrace under ppc64/powernv:
      
        $ mount -t debugfs none /sys/kernel/debug/
        $ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
        $ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
         ------------[ cut here ]------------
         kernel BUG at mm/slub.c:3969!
         Oops: Exception in kernel mode, sig: 5 [#1]
         LE PAGE_SIZE=3D64K MMU=3DHash SMP NR_CPUS=3D2048 NUMA PowerNV
         Modules linked in:
         CPU: 0 PID: 154 Comm: sh Not tainted 5.5.0-rc2-next-20191216-00005-g0be1dba7b7c0 #61
         NIP kfree+0x338/0x3b0
         LR section_deactivate+0x138/0x200
         Call Trace:
           section_deactivate+0x138/0x200
           __remove_pages+0x114/0x150
           arch_remove_memory+0x3c/0x160
           try_remove_memory+0x114/0x1a0
           __remove_memory+0x20/0x40
           memtrace_enable_set+0x254/0x850
           simple_attr_write+0x138/0x160
           full_proxy_write+0x8c/0x110
           __vfs_write+0x38/0x70
           vfs_write+0x11c/0x2a0
           ksys_write+0x84/0x140
           system_call+0x5c/0x68
         ---[ end trace 4b053cbd84e0db62 ]---
      
      The first invocation will offline+remove memory blocks.  The second
      invocation will first add+online them again, in order to offline+remove
      them again (usually we are lucky and the exact same memory blocks will
      get "reallocated").
      
      Tested on powernv with boot memory: The usage map will not get freed.
      Tested on x86-64 with DIMMs: The usage map will get freed.
      
      Using Dynamic Memory under a Power DLAPR can trigger it easily.
      
      Triggering removal (I assume after previously removed+re-added) of
      memory from the HMC GUI can crash the kernel with the same call trace
      and is fixed by this patch.
      
      Link: http://lkml.kernel.org/r/20191217104637.5509-1-david@redhat.com
      Fixes: 326e1b8f
      
       ("mm/sparsemem: introduce a SECTION_IS_EARLY flag")
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Tested-by: default avatarPingfan Liu <piliu@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8068df3b
  7. 01 Dec, 2019 4 commits
  8. 07 Oct, 2019 1 commit
  9. 24 Sep, 2019 5 commits
  10. 19 Jul, 2019 11 commits
    • Dan Williams's avatar
      mm/sparsemem: cleanup 'section number' data types · 9a845030
      Dan Williams authored
      David points out that there is a mixture of 'int' and 'unsigned long'
      usage for section number data types.  Update the memory hotplug path to
      use 'unsigned long' consistently for section numbers.
      
      [akpm@linux-foundation.org: fix printk format]
      Link: http://lkml.kernel.org/r/156107543656.1329419.11505835211949439815.stgit@dwillia2-desk3.amr.corp.intel.com
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reported-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9a845030
    • Dan Williams's avatar
      mm/sparsemem: support sub-section hotplug · ba72b4c8
      Dan Williams authored
      The libnvdimm sub-system has suffered a series of hacks and broken
      workarounds for the memory-hotplug implementation's awkward
      section-aligned (128MB) granularity.
      
      For example the following backtrace is emitted when attempting
      arch_add_memory() with physical address ranges that intersect 'System
      RAM' (RAM) with 'Persistent Memory' (PMEM) within a given section:
      
          # cat /proc/iomem | grep -A1 -B1 Persistent\ Memory
          100000000-1ffffffff : System RAM
          200000000-303ffffff : Persistent Memory (legacy)
          304000000-43fffffff : System RAM
          440000000-23ffffffff : Persistent Memory
          2400000000-43bfffffff : Persistent Memory
            2400000000-43bfffffff : namespace2.0
      
          WARNING: CPU: 38 PID: 928 at arch/x86/mm/init_64.c:850 add_pages+0x5c/0x60
          [..]
          RIP: 0010:add_pages+0x5c/0x60
          [..]
          Call Trace:
           devm_memremap_pages+0x460/0x6e0
           pmem_attach_disk+0x29e/0x680 [nd_pmem]
           ? nd_dax_probe+0xfc/0x120 [libnvdimm]
           nvdimm_bus_probe+0x66/0x160 [libnvdimm]
      
      It was discovered that the problem goes beyond RAM vs PMEM collisions as
      some platform produce PMEM vs PMEM collisions within a given section.
      The libnvdimm workaround for that case revealed that the libnvdimm
      section-alignment-padding implementation has been broken for a long
      while.
      
      A fix for that long-standing breakage introduces as many problems as it
      solves as it would require a backward-incompatible change to the
      namespace metadata interpretation.  Instead of that dubious route [1],
      address the root problem in the memory-hotplug implementation.
      
      Note that EEXIST is no longer treated as success as that is how
      sparse_add_section() reports subsection collisions, it was also obviated
      by recent changes to perform the request_region() for 'System RAM'
      before arch_add_memory() in the add_memory() sequence.
      
      [1] https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com
      
      [osalvador@suse.de: fix deactivate_section for early sections]
        Link: http://lkml.kernel.org/r/20190715081549.32577-2-osalvador@suse.de
      Link: http://lkml.kernel.org/r/156092354368.979959.6232443923440952359.stgit@dwillia2-desk3.amr.corp.intel.com
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarOscar Salvador <osalvador@suse.de>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ba72b4c8
    • Dan Williams's avatar
      mm/sparsemem: prepare for sub-section ranges · 7ea62160
      Dan Williams authored
      Prepare the memory hot-{add,remove} paths for handling sub-section
      ranges by plumbing the starting page frame and number of pages being
      handled through arch_{add,remove}_memory() to
      sparse_{add,remove}_one_section().
      
      This is simply plumbing, small cleanups, and some identifier renames.
      No intended functional changes.
      
      Link: http://lkml.kernel.org/r/156092353780.979959.9713046515562743194.stgit@dwillia2-desk3.amr.corp.intel.com
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7ea62160
    • Dan Williams's avatar
      mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() · e9c0a3f0
      Dan Williams authored
      Allow sub-section sized ranges to be added to the memmap.
      
      populate_section_memmap() takes an explict pfn range rather than
      assuming a full section, and those parameters are plumbed all the way
      through to vmmemap_populate().  There should be no sub-section usage in
      current deployments.  New warnings are added to clarify which memmap
      allocation paths are sub-section capable.
      
      Link: http://lkml.kernel.org/r/156092352058.979959.6551283472062305149.stgit@dwillia2-desk3.amr.corp.intel.com
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e9c0a3f0
    • Dan Williams's avatar
      mm/sparsemem: add helpers track active portions of a section at boot · f46edbd1
      Dan Williams authored
      Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
      sub-section active bitmask, each bit representing a PMD_SIZE span of the
      architecture's memory hotplug section size.
      
      The implications of a partially populated section is that pfn_valid()
      needs to go beyond a valid_section() check and either determine that the
      section is an "early section", or read the sub-section active ranges
      from the bitmask.  The expectation is that the bitmask (subsection_map)
      fits in the same cacheline as the valid_section() / early_section()
      data, so the incremental performance overhead to pfn_valid() should be
      negligible.
      
      The rationale for using early_section() to short-ciruit the
      subsection_map check is that there are legacy code paths that use
      pfn_valid() at section granularity before validating the pfn against
      pgdat data.  So, the early_section() check allows those traditional
      assumptions to persist while also permitting subsection_map to tell the
      truth for purposes of populating the unused portions of early sections
      with PMEM and other ZONE_DEVICE mappings.
      
      Link: http://lkml.kernel.org/r/156092350874.979959.18185938451405518285.stgit@dwillia2-desk3.amr.corp.intel.com
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reported-by: default avatarQian Cai <cai@lca.pw>
      Tested-by: default avatarJane Chu <jane.chu@oracle.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f46edbd1
    • Dan Williams's avatar
      mm/sparsemem: introduce a SECTION_IS_EARLY flag · 326e1b8f
      Dan Williams authored
      In preparation for sub-section hotplug, track whether a given section
      was created during early memory initialization, or later via memory
      hotplug.  This distinction is needed to maintain the coarse expectation
      that pfn_valid() returns true for any pfn within a given section even if
      that section has pages that are reserved from the page allocator.
      
      For example one of the of goals of subsection hotplug is to support
      cases where the system physical memory layout collides System RAM and
      PMEM within a section.  Several pfn_valid() users expect to just check
      if a section is valid, but they are not careful to check if the given
      pfn is within a "System RAM" boundary and instead expect pgdat
      information to further validate the pfn.
      
      Rather than unwind those paths to make their pfn_valid() queries more
      precise a follow on patch uses the SECTION_IS_EARLY flag to maintain the
      traditional expectation that pfn_valid() returns true for all early
      sections.
      
      Link: https://lore.kernel.org/lkml/1560366952-10660-1-git-send-email-cai@lca.pw/
      Link: http://lkml.kernel.org/r/156092350358.979959.5817209875548072819.stgit@dwillia2-desk3.amr.corp.intel.com
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reported-by: default avatarQian Cai <cai@lca.pw>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      326e1b8f
    • Dan Williams's avatar
      mm/sparsemem: introduce struct mem_section_usage · f1eca35a
      Dan Williams authored
      Patch series "mm: Sub-section memory hotplug support", v10.
      
      The memory hotplug section is an arbitrary / convenient unit for memory
      hotplug.  'Section-size' units have bled into the user interface
      ('memblock' sysfs) and can not be changed without breaking existing
      userspace.  The section-size constraint, while mostly benign for typical
      memory hotplug, has and continues to wreak havoc with 'device-memory'
      use cases, persistent memory (pmem) in particular.  Recall that pmem
      uses devm_memremap_pages(), and subsequently arch_add_memory(), to
      allocate a 'struct page' memmap for pmem.  However, it does not use the
      'bottom half' of memory hotplug, i.e.  never marks pmem pages online and
      never exposes the userspace memblock interface for pmem.  This leaves an
      opening to redress the section-size constraint.
      
      To date, the libnvdimm subsystem has attempted to inject padding to
      satisfy the internal constraints of arch_add_memory().  Beyond
      complicating the code, leading to bugs [2], wasting memory, and limiting
      configuration flexibility, the padding hack is broken when the platform
      changes this physical memory alignment of pmem from one boot to the
      next.  Device failure (intermittent or permanent) and physical
      reconfiguration are events that can cause the platform firmware to
      change the physical placement of pmem on a subsequent boot, and device
      failure is an everyday event in a data-center.
      
      It turns out that sections are only a hard requirement of the
      user-facing interface for memory hotplug and with a bit more
      infrastructure sub-section arch_add_memory() support can be added for
      kernel internal usages like devm_memremap_pages().  Here is an analysis
      of the current design assumptions in the current code and how they are
      addressed in the new implementation:
      
      Current design assumptions:
      
       - Sections that describe boot memory (early sections) are never
         unplugged / removed.
      
       - pfn_valid(), in the CONFIG_SPARSEMEM_VMEMMAP=y, case devolves to a
         valid_section() check
      
       - __add_pages() and helper routines assume all operations occur in
         PAGES_PER_SECTION units.
      
       - The memblock sysfs interface only comprehends full sections
      
      New design assumptions:
      
       - Sections are instrumented with a sub-section bitmask to track (on
         x86) individual 2MB sub-divisions of a 128MB section.
      
       - Partially populated early sections can be extended with additional
         sub-sections, and those sub-sections can be removed with
         arch_remove_memory(). With this in place we no longer lose usable
         memory capacity to padding.
      
       - pfn_valid() is updated to look deeper than valid_section() to also
         check the active-sub-section mask. This indication is in the same
         cacheline as the valid_section() so the performance impact is
         expected to be negligible. So far the lkp robot has not reported any
         regressions.
      
       - Outside of the core vmemmap population routines which are replaced,
         other helper routines like shrink_{zone,pgdat}_span() are updated to
         handle the smaller granularity. Core memory hotplug routines that
         deal with online memory are not touched.
      
       - The existing memblock sysfs user api guarantees / assumptions are not
         touched since this capability is limited to !online
         !memblock-sysfs-accessible sections.
      
      Meanwhile the issue reports continue to roll in from users that do not
      understand when and how the 128MB constraint will bite them.  The current
      implementation relied on being able to support at least one misaligned
      namespace, but that immediately falls over on any moderately complex
      namespace creation attempt.  Beyond the initial problem of 'System RAM'
      colliding with pmem, and the unsolvable problem of physical alignment
      changes, Linux is now being exposed to platforms that collide pmem ranges
      with other pmem ranges by default [3].  In short, devm_memremap_pages()
      has pushed the venerable section-size constraint past the breaking point,
      and the simplicity of section-aligned arch_add_memory() is no longer
      tenable.
      
      These patches are exposed to the kbuild robot on a subsection-v10 branch
      [4], and a preview of the unit test for this functionality is available
      on the 'subsection-pending' branch of ndctl [5].
      
      [2]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com
      [3]: https://github.com/pmem/ndctl/issues/76
      [4]: https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=subsection-v10
      [5]: https://github.com/pmem/ndctl/commit/7c59b4867e1c
      
      This patch (of 13):
      
      Towards enabling memory hotplug to track partial population of a section,
      introduce 'struct mem_section_usage'.
      
      A pointer to a 'struct mem_section_usage' instance replaces the existing
      pointer to a 'pageblock_flags' bitmap.  Effectively it adds one more
      'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to house
      a new 'subsection_map' bitmap.  The new bitmap enables the memory
      hot{plug,remove} implementation to act on incremental sub-divisions of a
      section.
      
      SUBSECTION_SHIFT is defined as global constant instead of per-architecture
      value like SECTION_SIZE_BITS in order to allow cross-arch compatibility of
      subsection users.  Specifically a common subsection size allows for the
      possibility that persistent memory namespace configurations be made
      compatible across architectures.
      
      The primary motivation for this functionality is to support platforms that
      mix "System RAM" and "Persistent Memory" within a single section, or
      multiple PMEM ranges with different mapping lifetimes within a single
      section.  The section restriction for hotplug has caused an ongoing saga
      of hacks and bugs for devm_memremap_pages() users.
      
      Beyond the fixups to teach existing paths how to retrieve the 'usemap'
      from a section, and updates to usemap allocation path, there are no
      expected behavior changes.
      
      Link: http://lkml.kernel.org/r/156092349845.979959.73333291612799019.stgit@dwillia2-desk3.amr.corp.intel.com
      
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Reviewed-by: default avatarWei Yang <richardw.yang@linux.intel.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f1eca35a
    • David Hildenbrand's avatar
      mm: section numbers use the type "unsigned long" · 2491f0a2
      David Hildenbrand authored
      Patch series "mm: Further memory block device cleanups", v1.
      
      Some further cleanups around memory block devices.  Especially, clean up
      and simplify walk_memory_range().  Including some other minor cleanups.
      
      This patch (of 6):
      
      We are using a mixture of "int" and "unsigned long".  Let's make this
      consistent by using "unsigned long" everywhere.  We'll do the same with
      memory block ids next.
      
      While at it, turn the "unsigned long i" in removable_show() into an int
      - sections_per_block is an int.
      
      [akpm@linux-foundation.org: s/unsigned long i/unsigned long nr/]
      [david@redhat.com: v3]
        Link: http://lkml.kernel.org/r/20190620183139.4352-2-david@redhat.com
      Link: http://lkml.kernel.org/r/20190614100114.311-2-david@redhat.com
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2491f0a2
    • Wei Yang's avatar
      mm/sparse.c: set section nid for hot-add memory · 26f26bed
      Wei Yang authored
      In case of NODE_NOT_IN_PAGE_FLAGS is set, we store section's node id in
      section_to_node_table[].  While for hot-add memory, this is missed.
      Without this information, page_to_nid() may not give the right node id.
      
      BTW, current online_pages works because it leverages nid in
      memory_block.  But the granularity of node id should be mem_section
      wide.
      
      Link: http://lkml.kernel.org/r/20190618005537.18878-1-richardw.yang@linux.intel.com
      
      Signed-off-by: default avatarWei Yang <richardw.yang@linux.intel.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarAnshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      26f26bed
    • David Hildenbrand's avatar
      mm/memory_hotplug: remove "zone" parameter from sparse_remove_one_section · b9bf8d34
      David Hildenbrand authored
      The parameter is unused, so let's drop it.  Memory removal paths should
      never care about zones.  This is the job of memory offlining and will
      require more refactorings.
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-12-david@redhat.com
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarWei Yang <richardw.yang@linux.intel.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b9bf8d34
    • David Hildenbrand's avatar
      mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVE · 80ec922d
      David Hildenbrand authored
      We want to improve error handling while adding memory by allowing to use
      arch_remove_memory() and __remove_pages() even if
      CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like:
      
      	arch_add_memory()
      	rc = do_something();
      	if (rc) {
      		arch_remove_memory();
      	}
      
      We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
      quite some dependencies for memory offlining.
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.com
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      80ec922d
  11. 14 May, 2019 1 commit
  12. 29 Mar, 2019 1 commit
    • Qian Cai's avatar
      mm/hotplug: fix offline undo_isolate_page_range() · 9b7ea46a
      Qian Cai authored
      Commit f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded
      memory to zones until online") introduced move_pfn_range_to_zone() which
      calls memmap_init_zone() during onlining a memory block.
      memmap_init_zone() will reset pagetype flags and makes migrate type to
      be MOVABLE.
      
      However, in __offline_pages(), it also call undo_isolate_page_range()
      after offline_isolated_pages() to do the same thing.  Due to commit
      2ce13640 ("mm: __first_valid_page skip over offline pages") changed
      __first_valid_page() to skip offline pages, undo_isolate_page_range()
      here just waste CPU cycles looping around the offlining PFN range while
      doing nothing, because __first_valid_page() will return NULL as
      offline_isolated_pages() has already marked all memory sections within
      the pfn range as offline via offline_mem_sections().
      
      Also, after calling the "useless" undo_isolate_page_range() here, it
      reaches the point of no returning by notifying MEM_OFFLINE.  Those pages
      will be marked as MIGRATE_MOVABLE again once onlining.  The only thing
      left to do is to decrease the number of isolated pageblocks zone counter
      which would make some paths of the page allocation slower that the above
      commit introduced.
      
      Even if alloc_contig_range() can be used to isolate 16GB-hugetlb pages
      on ppc64, an "int" should still be enough to represent the number of
      pageblocks there.  Fix an incorrect comment along the way.
      
      [cai@lca.pw: v4]
        Link: http://lkml.kernel.org/r/20190314150641.59358-1-cai@lca.pw
      Link: http://lkml.kernel.org/r/20190313143133.46200-1-cai@lca.pw
      Fixes: 2ce13640
      
       ("mm: __first_valid_page skip over offline pages")
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>	[4.13+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9b7ea46a
  13. 12 Mar, 2019 2 commits
    • Mike Rapoport's avatar
      memblock: drop memblock_alloc_*_nopanic() variants · 26fb3dae
      Mike Rapoport authored
      As all the memblock allocation functions return NULL in case of error
      rather than panic(), the duplicates with _nopanic suffix can be removed.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-22-git-send-email-rppt@linux.ibm.com
      
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Acked-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Petr Mladek <pmladek@suse.com>		[printk]
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      26fb3dae
    • Mike Rapoport's avatar
      treewide: add checks for the return value of memblock_alloc*() · 8a7f97b9
      Mike Rapoport authored
      Add check for the return value of memblock_alloc*() functions and call
      panic() in case of error.  The panic message repeats the one used by
      panicing memblock allocators with adjustment of parameters to include
      only relevant ones.
      
      The replacement was mostly automated with semantic patches like the one
      below with manual massaging of format strings.
      
        @@
        expression ptr, size, align;
        @@
        ptr = memblock_alloc(size, align);
        + if (!ptr)
        + 	panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align);
      
      [anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type]
        Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org
      [rppt@linux.ibm.com: fix format strings for panics after memblock_alloc]
        Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com
      [rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails]
        Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx
      [akpm@linux-foundation.org: fix xtensa printk warning]
      Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.com
      
      Signed-off-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: default avatarAnders Roxell <anders.roxell@linaro.org>
      Reviewed-by: Guo Ren <ren_guo@c-sky.com>		[c-sky]
      Acked-by: Paul Burton <paul.burton@mips.com>		[MIPS]
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>	[s390]
      Reviewed-by: Juergen Gross <jgross@suse.com>		[Xen]
      Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Acked-by: Max Filippov <jcmvbkbc@gmail.com>		[xtensa]
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8a7f97b9
  14. 06 Mar, 2019 1 commit
    • Qian Cai's avatar
      mm/sparse: fix a bad comparison · d778015a
      Qian Cai authored
      next_present_section_nr() could only return an unsigned number -1, so
      just check it specifically where compilers will convert -1 to unsigned
      if needed.
      
        mm/sparse.c: In function 'sparse_init_nid':
        mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
               ((section_nr >= 0) &&    \
                            ^~
        mm/sparse.c:478:2: note: in expansion of macro
        'for_each_present_section_nr'
          for_each_present_section_nr(pnum_begin, pnum) {
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
        mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
               ((section_nr >= 0) &&    \
                            ^~
        mm/sparse.c:497:2: note: in expansion of macro
        'for_each_present_section_nr'
          for_each_present_section_nr(pnum_begin, pnum) {
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
        mm/sparse.c: In function 'sparse_init':
        mm/sparse.c:200:20: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
               ((section_nr >= 0) &&    \
                            ^~
        mm/sparse.c:520:2: note: in expansion of macro
        'for_each_present_section_nr'
          for_each_present_section_nr(pnum_begin + 1, pnum_end) {
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Link: http://lkml.kernel.org/r/20190228181839.86504-1-cai@lca.pw
      Fixes: c4e1be9e
      
       ("mm, sparsemem: break out of loops early")
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d778015a
  15. 28 Dec, 2018 3 commits
  16. 14 Dec, 2018 1 commit
  17. 31 Oct, 2018 4 commits
    • Mike Rapoport's avatar
      memblock: stop using implicit alignment to SMP_CACHE_BYTES · 7e1c4e27
      Mike Rapoport authored
      When a memblock allocation APIs are called with align = 0, the alignment
      is implicitly set to SMP_CACHE_BYTES.
      
      Implicit alignment is done deep in the memblock allocator and it can
      come as a surprise.  Not that such an alignment would be wrong even
      when used incorrectly but it is better to be explicit for the sake of
      clarity and the prinicple of the least surprise.
      
      Replace all such uses of memblock APIs with the 'align' parameter
      explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
      in the memblock internal allocation functions.
      
      For the case when memblock APIs are used via helper functions, e.g.  like
      iommu_arena_new_node() in Alpha, the helper functions were detected with
      Coccinelle's help and then manually examined and updated where
      appropriate.
      
      The direct memblock APIs users were updated using the semantic patch below:
      
      @@
      expression size, min_addr, max_addr, nid;
      @@
      (
      |
      - memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
      + memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr,
      nid)
      |
      - memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
      + memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr,
      nid)
      |
      - memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
      + memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
      |
      - memblock_alloc(size, 0)
      + memblock_alloc(size, SMP_CACHE_BYTES)
      |
      - memblock_alloc_raw(size, 0)
      + memblock_alloc_raw(size, SMP_CACHE_BYTES)
      |
      - memblock_alloc_from(size, 0, min_addr)
      + memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
      |
      - memblock_alloc_nopanic(size, 0)
      + memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
      |
      - memblock_alloc_low(size, 0)
      + memblock_alloc_low(size, SMP_CACHE_BYTES)
      |
      - memblock_alloc_low_nopanic(size, 0)
      + memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
      |
      - memblock_alloc_from_nopanic(size, 0, min_addr)
      + memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
      |
      - memblock_alloc_node(size, 0, nid)
      + memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
      )
      
      [mhocko@suse.com: changelog update]
      [akpm@linux-foundation.org: coding-style fixes]
      [rppt@linux.ibm.com: fix missed uses of implicit alignment]
        Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
      Link: http://lkml.kernel.org/r/1538687224-17535-1-git-send-email-rppt@linux.vnet.ibm.com
      
      Signed-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Suggested-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: Paul Burton <paul.burton@mips.com>	[MIPS]
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7e1c4e27
    • Mike Rapoport's avatar
      mm: remove include/linux/bootmem.h · 57c8a661
      Mike Rapoport authored
      Move remaining definitions and declarations from include/linux/bootmem.h
      into include/linux/memblock.h and remove the redundant header.
      
      The includes were replaced with the semantic patch below and then
      semi-automated removal of duplicated '#include <linux/memblock.h>
      
      @@
      @@
      - #include <linux/bootmem.h>
      + #include <linux/memblock.h>
      
      [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
        Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
      [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
        Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
      [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
        Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
      Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
      
      Signed-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      57c8a661
    • Mike Rapoport's avatar
      memblock: replace BOOTMEM_ALLOC_* with MEMBLOCK variants · 97ad1087
      Mike Rapoport authored
      Drop BOOTMEM_ALLOC_ACCESSIBLE and BOOTMEM_ALLOC_ANYWHERE in favor of
      identical MEMBLOCK definitions.
      
      Link: http://lkml.kernel.org/r/1536927045-23536-29-git-send-email-rppt@linux.vnet.ibm.com
      
      Signed-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      97ad1087
    • Mike Rapoport's avatar
      memblock: add align parameter to memblock_alloc_node() · 3913c8f9
      Mike Rapoport authored
      With the align parameter memblock_alloc_node() can be used as drop in
      replacement for alloc_bootmem_pages_node() and __alloc_bootmem_node(),
      which is done in the following patches.
      
      Link: http://lkml.kernel.org/r/1536927045-23536-15-git-send-email-rppt@linux.vnet.ibm.com
      
      Signed-off-by: default avatarMike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3913c8f9