1. 29 Mar, 2020 1 commit
    • David Hildenbrand's avatar
      drivers/base/memory.c: indicate all memory blocks as removable · 53cdc1cb
      David Hildenbrand authored
      We see multiple issues with the implementation/interface to compute
      whether a memory block can be offlined (exposed via
      /sys/devices/system/memory/memoryX/removable) and would like to simplify
      it (remove the implementation).
      
      1. It runs basically lockless. While this might be good for performance,
         we see possible races with memory offlining that will require at
         least some sort of locking to fix.
      
      2. Nowadays, more false positives are possible. No arch-specific checks
         are performed that validate if memory offlining will not be denied
         right away (and such check will require locking). For example, arm64
         won't allow to offline any memory block that was added during boot -
         which will imply a very high error rate. Other archs have other
         constraints.
      
      3. The interface is inherently racy. E.g., if a memory block is detected
         to be removable (and was not a false positive at that time), there is
         still no guarantee that offlining will actually succeed. So any
         caller already has to deal with false positives.
      
      4. It is unclear which performance benefit this interface actually
         provides. The introducing commit 5c755e9f ("memory-hotplug: add
         sysfs removable attribute for hotplug memory remove") mentioned
      
      	"A user-level agent must be able to identify which sections
      	 of memory are likely to be removable before attempting the
      	 potentially expensive operation."
      
         However, no actual performance comparison was included.
      
      Known users:
      
       - lsmem: Will group memory blocks based on the "removable" property. [1]
      
       - chmem: Indirect user. It has a RANGE mode where one can specify
                removable ranges identified via lsmem to be offlined. However,
                it also has a "SIZE" mode, which allows a sysadmin to skip the
                manual "identify removable blocks" step. [2]
      
       - powerpc-utils: Uses the "removable" attribute to skip some memory
                blocks right away when trying to find some to offline+remove.
                However, with ballooning enabled, it already skips this
                information completely (because it once resulted in many false
                negatives). Therefore, the implementation can deal with false
                positives properly already. [3]
      
      According to Nathan Fontenot, DLPAR on powerpc is nowadays no longer
      driven from userspace via the drmgr command (powerpc-utils).  Nowadays
      it's managed in the kernel - including onlining/offlining of memory
      blocks - triggered by drmgr writing to /sys/kernel/dlpar.  So the
      affected legacy userspace handling is only active on old kernels.  Only
      very old versions of drmgr on a new kernel (unlikely) might execute
      slower - totally acceptable.
      
      With CONFIG_MEMORY_HOTREMOVE, always indicating "removable" should not
      break any user space tool.  We implement a very bad heuristic now.
      Without CONFIG_MEMORY_HOTREMOVE we cannot offline anything, so report
      "not removable" as before.
      
      Original discussion can be found in [4] ("[PATCH RFC v1] mm:
      is_mem_section_removable() overhaul").
      
      Other users of is_mem_section_removable() will be removed next, so that
      we can remove is_mem_section_removable() completely.
      
      [1] http://man7.org/linux/man-pages/man1/lsmem.1.html
      [2] http://man7.org/linux/man-pages/man8/chmem.8.html
      [3] https://github.com/ibm-power-utilities/powerpc-utils
      [4] https://lkml.kernel.org/r/20200117105759.27905-1-david@redhat.com
      
      Also, this patch probably fixes a crash reported by Steve.
      http://lkml.kernel.org/r/CAPcyv4jpdaNvJ67SkjyUJLBnBnXXQv686BiVW042g03FUmWLXw@mail.gmail.com
      
      Reported-by: default avatar"Scargall, Steve" <steve.scargall@intel.com>
      Suggested-by: default avatarMichal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarNathan Fontenot <ndfont@gmail.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Karel Zak <kzak@redhat.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200128093542.6908-1-david@redhat.com
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      53cdc1cb
  2. 04 Feb, 2020 1 commit
  3. 31 Jan, 2020 2 commits
  4. 01 Dec, 2019 2 commits
    • David Hildenbrand's avatar
      drivers/base/memory.c: drop the mem_sysfs_mutex · 848e19ad
      David Hildenbrand authored
      The mem_sysfs_mutex isn't really helpful.  Also, it's not really clear
      what the mutex protects at all.
      
      The device lists of the memory subsystem are protected separately.  We
      don't need that mutex when looking up.  creating, or removing
      independent devices.  find_memory_block_by_id() will perform locking on
      its own and grab a reference of the returned device.
      
      At the time memory_dev_init() is called, we cannot have concurrent
      hot(un)plug operations yet - we're still fairly early during boot.  We
      don't need any locking.
      
      The creation/removal of memory block devices should be protected on a
      higher level - especially using the device hotplug lock to avoid
      documented issues (see Documentation/core-api/memory-hotplug.rst) - or
      if that is reworked, using similar locking.
      
      Protecting in the context of these functions only doesn't really make
      sense.  Especially, if we would have a situation where the same memory
      blocks are created/deleted at the same time, there is something horribly
      going wrong (imagining adding/removing a DIMM at the same time from two
      call paths) - after the functions succeeded something else in the
      callers would blow up (e.g., create_memory_block_devices() succeeded but
      there are no memory block devices anymore).
      
      All relevant call paths (except when adding memory early during boot via
      ACPI, which is now documented) hold the device hotplug lock when adding
      memory, and when removing memory.  Let's document that instead.
      
      Add a simple safety net to create_memory_block_devices() in case we
      would actually remove memory blocks while adding them, so we'll never
      dereference a NULL pointer.  Simplify memory_dev_init() now that the
      lock is gone.
      
      Link: http://lkml.kernel.org/r/20190925082621.4927-1-david@redhat.com
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      848e19ad
    • Naoya Horiguchi's avatar
      mm, soft-offline: convert parameter to pfn · feec24a6
      Naoya Horiguchi authored
      Currently soft_offline_page() receives struct page, and its sibling
      memory_failure() receives pfn.  This discrepancy looks weird and makes
      precheck on pfn validity tricky.  So let's align them.
      
      Link: http://lkml.kernel.org/r/20191016234706.GA5493@www9186uo.sakura.ne.jp
      
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      feec24a6
  5. 16 Nov, 2019 1 commit
    • David Hildenbrand's avatar
      mm/memory_hotplug: fix try_offline_node() · 2c91f8fc
      David Hildenbrand authored
      try_offline_node() is pretty much broken right now:
      
       - The node span is updated when onlining memory, not when adding it. We
         ignore memory that was mever onlined. Bad.
      
       - We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily
         trigger a kernel panic. Bad for memory that is offline but also bad
         for subsection hotadd with ZONE_DEVICE, whereby the memmap of the
         first PFN of a section might contain garbage.
      
       - Sections belonging to mixed nodes are not properly considered.
      
      As memory blocks might belong to multiple nodes, we would have to walk
      all pageblocks (or at least subsections) within present sections.
      However, we don't have a way to identify whether a memmap that is not
      online was initialized (relevant for ZONE_DEVICE).  This makes things
      more complicated.
      
      Luckily, we can piggy pack on the node span and the nid stored in memory
      blocks.  Currently, the node span is grown when calling
      move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when
      removing memory, before calling try_offline_node().  Sysfs links are
      created via link_mem_sections(), e.g., during boot or when adding
      memory.
      
      If the node still spans memory or if any memory block belongs to the
      nid, we don't set the node offline.  As memory blocks that span multiple
      nodes cannot get offlined, the nid stored in memory blocks is reliable
      enough (for such online memory blocks, the node still spans the memory).
      
      Introduce for_each_memory_block() to efficiently walk all memory blocks.
      
      Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span
      when removing ZONE_DEVICE memory to fix similar issues (access of
      garbage memmaps) - until we have a reliable way to identify whether
      these memmaps were properly initialized.  This implies later, that once
      a node had ZONE_DEVICE memory, we won't be able to set a node offline -
      which should be acceptable.
      
      Since commit f1dd2cd1 ("mm, memory_hotplug: do not associate
      hotadded memory to zones until online") memory that is added is not
      assoziated with a zone/node (memmap not initialized).  The introducing
      commit 60a5a19e ("memory-hotplug: remove sysfs file of node")
      already missed that we could have multiple nodes for a section and that
      the zone/node span is updated when onlining pages, not when adding them.
      
      I tested this by hotplugging two DIMMs to a memory-less and cpu-less
      NUMA node.  The node is properly onlined when adding the DIMMs.  When
      removing the DIMMs, the node is properly offlined.
      
      Masayoshi Mizuma reported:
      
      : Without this patch, memory hotplug fails as panic:
      :
      :  BUG: kernel NULL pointer dereference, address: 0000000000000000
      :  ...
      :  Call Trace:
      :   remove_memory_block_devices+0x81/0xc0
      :   try_remove_memory+0xb4/0x130
      :   __remove_memory+0xa/0x20
      :   acpi_memory_device_remove+0x84/0x100
      :   acpi_bus_trim+0x57/0x90
      :   acpi_bus_trim+0x2e/0x90
      :   acpi_device_hotplug+0x2b2/0x4d0
      :   acpi_hotplug_work_fn+0x1a/0x30
      :   process_one_work+0x171/0x380
      :   worker_thread+0x49/0x3f0
      :   kthread+0xf8/0x130
      :   ret_from_fork+0x35/0x40
      
      [david@redhat.com: v3]
        Link: http://lkml.kernel.org/r/20191102120221.7553-1-david@redhat.com
      Link: http://lkml.kernel.org/r/20191028105458.28320-1-david@redhat.com
      Fixes: 60a5a19e ("memory-hotplug: remove sysfs file of node")
      Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visiable after d0dc12e8
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Tested-by: default avatarMasayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Nayna Jain <nayna@linux.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2c91f8fc
  6. 19 Oct, 2019 1 commit
  7. 24 Sep, 2019 4 commits
  8. 19 Jul, 2019 8 commits
    • David Hildenbrand's avatar
      drivers/base/memory.c: get rid of find_memory_block_hinted() · dd625285
      David Hildenbrand authored
      No longer needed, let's remove it.  Also, drop the "hint" parameter
      completely from "find_memory_block_by_id", as nobody needs it anymore.
      
      [david@redhat.com: v3]
        Link: http://lkml.kernel.org/r/20190620183139.4352-7-david@redhat.com
      [david@redhat.com: handle zero-length walks]
        Link: http://lkml.kernel.org/r/1c2edc22-afd7-2211-c4c7-40e54e5007e8@redhat.com
      Link: http://lkml.kernel.org/r/20190614100114.311-7-david@redhat.com
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: default avatarQian Cai <cai@lca.pw>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dd625285
    • David Hildenbrand's avatar
      mm/memory_hotplug: move and simplify walk_memory_blocks() · ea884641
      David Hildenbrand authored
      Let's move walk_memory_blocks() to the place where memory block logic
      resides and simplify it.  While at it, add a type for the callback
      function.
      
      Link: http://lkml.kernel.org/r/20190614100114.311-6-david@redhat.com
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Qian Cai <cai@lca.pw>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ea884641
    • David Hildenbrand's avatar
      drivers/base/memory: use "unsigned long" for block ids · 90ec010f
      David Hildenbrand authored
      Block ids are just shifted section numbers, so let's also use "unsigned
      long" for them, too.
      
      Link: http://lkml.kernel.org/r/20190614100114.311-3-david@redhat.com
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      90ec010f
    • David Hildenbrand's avatar
      mm: section numbers use the type "unsigned long" · 2491f0a2
      David Hildenbrand authored
      Patch series "mm: Further memory block device cleanups", v1.
      
      Some further cleanups around memory block devices.  Especially, clean up
      and simplify walk_memory_range().  Including some other minor cleanups.
      
      This patch (of 6):
      
      We are using a mixture of "int" and "unsigned long".  Let's make this
      consistent by using "unsigned long" everywhere.  We'll do the same with
      memory block ids next.
      
      While at it, turn the "unsigned long i" in removable_show() into an int
      - sections_per_block is an int.
      
      [akpm@linux-foundation.org: s/unsigned long i/unsigned long nr/]
      [david@redhat.com: v3]
        Link: http://lkml.kernel.org/r/20190620183139.4352-2-david@redhat.com
      Link: http://lkml.kernel.org/r/20190614100114.311-2-david@redhat.com
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2491f0a2
    • David Hildenbrand's avatar
      mm/memory_hotplug: remove memory block devices before arch_remove_memory() · 4c4b7f9b
      David Hildenbrand authored
      Let's factor out removing of memory block devices, which is only
      necessary for memory added via add_memory() and friends that created
      memory block devices.  Remove the devices before calling
      arch_remove_memory().
      
      This finishes factoring out memory block device handling from
      arch_add_memory() and arch_remove_memory().
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-10-david@redhat.com
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4c4b7f9b
    • David Hildenbrand's avatar
      mm/memory_hotplug: create memory block devices after arch_add_memory() · db051a0d
      David Hildenbrand authored
      Only memory to be added to the buddy and to be onlined/offlined by user
      space using /sys/devices/system/memory/...  needs (and should have!)
      memory block devices.
      
      Factor out creation of memory block devices.  Create all devices after
      arch_add_memory() succeeded.  We can later drop the want_memblock
      parameter, because it is now effectively stale.
      
      Only after memory block devices have been added, memory can be onlined
      by user space.  This implies, that memory is not visible to user space
      at all before arch_add_memory() succeeded.
      
      While at it
       - use WARN_ON_ONCE instead of BUG_ON in moved unregister_memory()
       - introduce find_memory_block_by_id() to search via block id
       - Use find_memory_block_by_id() in init_memory_block() to catch
         duplicates
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-8-david@redhat.com
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      db051a0d
    • David Hildenbrand's avatar
      mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVE · 80ec922d
      David Hildenbrand authored
      We want to improve error handling while adding memory by allowing to use
      arch_remove_memory() and __remove_pages() even if
      CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like:
      
      	arch_add_memory()
      	rc = do_something();
      	if (rc) {
      		arch_remove_memory();
      	}
      
      We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
      quite some dependencies for memory offlining.
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.com
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarPavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      80ec922d
    • David Hildenbrand's avatar
      drivers/base/memory: pass a block_id to init_memory_block() · 18115825
      David Hildenbrand authored
      We'll rework hotplug_memory_register() shortly, so it no longer consumes
      pass a section.
      
      [cai@lca.pw: fix a compilation warning]
        Link: http://lkml.kernel.org/r/1559320186-28337-1-git-send-email-cai@lca.pw
      Link: http://lkml.kernel.org/r/20190527111152.16324-6-david@redhat.com
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      18115825
  9. 14 May, 2019 2 commits
  10. 25 Apr, 2019 1 commit
  11. 19 Apr, 2019 1 commit
  12. 28 Feb, 2019 1 commit
    • Dave Hansen's avatar
      device-dax: "Hotplug" persistent memory for use like normal RAM · c221c0b0
      Dave Hansen authored
      
      
      This is intended for use with NVDIMMs that are physically persistent
      (physically like flash) so that they can be used as a cost-effective
      RAM replacement.  Intel Optane DC persistent memory is one
      implementation of this kind of NVDIMM.
      
      Currently, a persistent memory region is "owned" by a device driver,
      either the "Direct DAX" or "Filesystem DAX" drivers.  These drivers
      allow applications to explicitly use persistent memory, generally
      by being modified to use special, new libraries. (DIMM-based
      persistent memory hardware/software is described in great detail
      here: Documentation/nvdimm/nvdimm.txt).
      
      However, this limits persistent memory use to applications which
      *have* been modified.  To make it more broadly usable, this driver
      "hotplugs" memory into the kernel, to be managed and used just like
      normal RAM would be.
      
      To make this work, management software must remove the device from
      being controlled by the "Device DAX" infrastructure:
      
      	echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
      
      and then tell the new driver that it can bind to the device:
      
      	echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
      
      After this, there will be a number of new memory sections visible
      in sysfs that can be onlined, or that may get onlined by existing
      udev-initiated memory hotplug rules.
      
      This rebinding procedure is currently a one-way trip.  Once memory
      is bound to "kmem", it's there permanently and can not be
      unbound and assigned back to device_dax.
      
      The kmem driver will never bind to a dax device unless the device
      is *explicitly* bound to the driver.  There are two reasons for
      this: One, since it is a one-way trip, it can not be undone if
      bound incorrectly.  Two, the kmem driver destroys data on the
      device.  Think of if you had good data on a pmem device.  It
      would be catastrophic if you compile-in "kmem", but leave out
      the "device_dax" driver.  kmem would take over the device and
      write volatile data all over your good data.
      
      This inherits any existing NUMA information for the newly-added
      memory from the persistent memory device that came from the
      firmware.  On Intel platforms, the firmware has guarantees that
      require each socket's persistent memory to be in a separate
      memory-only NUMA node.  That means that this patch is not expected
      to create NUMA nodes, but will simply hotplug memory into existing
      nodes.
      
      Because NUMA nodes are created, the existing NUMA APIs and tools
      are sufficient to create policies for applications or memory areas
      to have affinity for or an aversion to using this memory.
      
      There is currently some metadata at the beginning of pmem regions.
      The section-size memory hotplug restrictions, plus this small
      reserved area can cause the "loss" of a section or two of capacity.
      This should be fixable in follow-on patches.  But, as a first step,
      losing 256MB of memory (worst case) out of hundreds of gigabytes
      is a good tradeoff vs. the required code to fix this up precisely.
      This calculation is also the reason we export
      memory_block_size_bytes().
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: default avatarDan Williams <dan.j.williams@intel.com>
      Reviewed-by: default avatarKeith Busch <keith.busch@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: linux-nvdimm@lists.01.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Reviewed-by: default avatarVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      c221c0b0
  13. 28 Dec, 2018 2 commits
  14. 20 Dec, 2018 1 commit
  15. 06 Dec, 2018 1 commit
  16. 31 Oct, 2018 2 commits
    • David Hildenbrand's avatar
      mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock · 381eab4a
      David Hildenbrand authored
      There seem to be some problems as result of 30467e0b ("mm, hotplug:
      fix concurrent memory hot-add deadlock"), which tried to fix a possible
      lock inversion reported and discussed in [1] due to the two locks
      	a) device_lock()
      	b) mem_hotplug_lock
      
      While add_memory() first takes b), followed by a) during
      bus_probe_device(), onlining of memory from user space first took a),
      followed by b), exposing a possible deadlock.
      
      In [1], and it was decided to not make use of device_hotplug_lock, but
      rather to enforce a locking order.
      
      The problems I spotted related to this:
      
      1. Memory block device attributes: While .state first calls
         mem_hotplug_begin() and the calls device_online() - which takes
         device_lock() - .online does no longer call mem_hotplug_begin(), so
         effectively calls online_pages() without mem_hotplug_lock.
      
      2. device_online() should be called under device_hotplug_lock, however
         onlining memory during add_memory() does not take care of that.
      
      In addition, I think there is also something wrong about the locking in
      
      3. arch/powerpc/platforms/powernv/memtrace.c calls offline_pages()
         without locks. This was introduced after 30467e0b. And skimming over
         the code, I assume it could need some more care in regards to locking
         (e.g. device_online() called without device_hotplug_lock. This will
         be addressed in the following patches.
      
      Now that we hold the device_hotplug_lock when
      - adding memory (e.g. via add_memory()/add_memory_resource())
      - removing memory (e.g. via remove_memory())
      - device_online()/device_offline()
      
      We can move mem_hotplug_lock usage back into
      online_pages()/offline_pages().
      
      Why is mem_hotplug_lock still needed? Essentially to make
      get_online_mems()/put_online_mems() be very fast (relying on
      device_hotplug_lock would be very slow), and to serialize against
      addition of memory that does not create memory block devices (hmm).
      
      [1] http://driverdev.linuxdriverproject.org/pipermail/ driverdev-devel/
          2015-February/065324.html
      
      This patch is partly based on a patch by Vitaly Kuznetsov.
      
      Link: http://lkml.kernel.org/r/20180925091457.28651-4-david@redhat.com
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarPavel Tatashin <pavel.tatashin@microsoft.com>
      Reviewed-by: default avatarRashmica Gupta <rashmica.g@gmail.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Rashmica Gupta <rashmica.g@gmail.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: John Allen <jallen@linux.vnet.ibm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      381eab4a
    • David Hildenbrand's avatar
      mm/memory_hotplug: make add_memory() take the device_hotplug_lock · 8df1d0e4
      David Hildenbrand authored
      add_memory() currently does not take the device_hotplug_lock, however
      is aleady called under the lock from
      	arch/powerpc/platforms/pseries/hotplug-memory.c
      	drivers/acpi/acpi_memhotplug.c
      to synchronize against CPU hot-remove and similar.
      
      In general, we should hold the device_hotplug_lock when adding memory to
      synchronize against online/offline request (e.g.  from user space) - which
      already resulted in lock inversions due to device_lock() and
      mem_hotplug_lock - see 30467e0b ("mm, hotplug: fix concurrent memory
      hot-add deadlock").  add_memory()/add_memory_resource() will create memory
      block devices, so this really feels like the right thing to do.
      
      Holding the device_hotplug_lock makes sure that a memory block device
      can really only be accessed (e.g. via .online/.state) from user space,
      once the memory has been fully added to the system.
      
      The lock is not held yet in
      	drivers/xen/balloon.c
      	arch/powerpc/platforms/powernv/memtrace.c
      	drivers/s390/char/sclp_cmd.c
      	drivers/hv/hv_balloon.c
      So, let's either use the locked variants or take the lock.
      
      Don't export add_memory_resource(), as it once was exported to be used by
      XEN, which is never built as a module.  If somebody requires it, we also
      have to export a locked variant (as device_hotplug_lock is never
      exported).
      
      Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.com
      
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarPavel Tatashin <pavel.tatashin@microsoft.com>
      Reviewed-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: default avatarRashmica Gupta <rashmica.g@gmail.com>
      Reviewed-by: default avatarOscar Salvador <osalvador@suse.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: John Allen <jallen@linux.vnet.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8df1d0e4
  17. 04 Sep, 2018 1 commit
    • Mikhail Zaslonko's avatar
      memory_hotplug: fix kernel_panic on offline page processing · 4e8346d0
      Mikhail Zaslonko authored
      Within show_valid_zones() the function test_pages_in_a_zone() should be
      called for online memory blocks only.
      
      Otherwise it might lead to the VM_BUG_ON due to uninitialized struct
      pages (when CONFIG_DEBUG_VM_PGFLAGS kernel option is set):
      
       page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
       ------------[ cut here ]------------
       Call Trace:
       ([<000000000038f91e>] test_pages_in_a_zone+0xe6/0x168)
        [<0000000000923472>] show_valid_zones+0x5a/0x1a8
        [<0000000000900284>] dev_attr_show+0x3c/0x78
        [<000000000046f6f0>] sysfs_kf_seq_show+0xd0/0x150
        [<00000000003ef662>] seq_read+0x212/0x4b8
        [<00000000003bf202>] __vfs_read+0x3a/0x178
        [<00000000003bf3ca>] vfs_read+0x8a/0x148
        [<00000000003bfa3a>] ksys_read+0x62/0xb8
        [<0000000000bc2220>] system_call+0xdc/0x2d8
      
      That VM_BUG_ON was triggered by the page poisoning introduced in
      mm/sparse.c with the git commit d0dc12e8 ("mm/memory_hotplug:
      optimize memory hotplug").
      
      With the same commit the new 'nid' field has been added to the struct
      memory_block in order to store and later on derive the node id for
      offline pages (instead of accessing struct page which might be
      uninitialized).  But one reference to nid in show_valid_zones() function
      has been overlooked.  Fixed with current commit.  Also, nr_pages will
      not be used any more after test_pages_in_a_zone() call, do not update
      it.
      
      Link: http://lkml.kernel.org/r/20180828090539.41491-1-zaslonko@linux.ibm.com
      Fixes: d0dc12e8
      
       ("mm/memory_hotplug: optimize memory hotplug")
      Signed-off-by: default avatarMikhail Zaslonko <zaslonko@linux.ibm.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarPavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: <stable@vger.kernel.org>	[4.17+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4e8346d0
  18. 17 Aug, 2018 1 commit
  19. 14 May, 2018 1 commit
  20. 11 Apr, 2018 1 commit
  21. 06 Apr, 2018 2 commits
  22. 23 Jan, 2018 1 commit
  23. 02 Nov, 2017 1 commit
    • Greg Kroah-Hartman's avatar
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman authored
      
      
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: default avatarKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: default avatarPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  24. 07 Sep, 2017 1 commit
    • Michal Hocko's avatar
      mm, memory_hotplug: remove zone restrictions · c6f03e29
      Michal Hocko authored
      Historically we have enforced that any kernel zone (e.g ZONE_NORMAL) has
      to precede the Movable zone in the physical memory range.  The purpose
      of the movable zone is, however, not bound to any physical memory
      restriction.  It merely defines a class of migrateable and reclaimable
      memory.
      
      There are users (e.g.  CMA) who might want to reserve specific physical
      memory ranges for their own purpose.  Moreover our pfn walkers have to
      be prepared for zones overlapping in the physical range already because
      we do support interleaving NUMA nodes and therefore zones can interleave
      as well.  This means we can allow each memory block to be associated
      with a different zone.
      
      Loosen the current onlining semantic and allow explicit onlining type on
      any memblock.  That means that online_{kernel,movable} will be allowed
      regardless of the physical address of the memblock as long as it is
      offline of course.  This might result in moveble zone overlapping with
      other kernel zones.  Default onlining then becomes a bit tricky but
      still sensible.  echo online > memoryXY/state will online the given
      block to
      
      	1) the default zone if the given range is outside of any zone
      	2) the enclosing zone if such a zone doesn't interleave with
      	   any other zone
              3) the default zone if more zones interleave for this range
      
      where default zone is movable zone only if movable_node is enabled
      otherwise it is a kernel zone.
      
      Here is an example of the semantic with (movable_node is not present but
      it work in an analogous way). We start with following memblocks, all of
      them offline:
      
        memory34/valid_zones:Normal Movable
        memory35/valid_zones:Normal Movable
        memory36/valid_zones:Normal Movable
        memory37/valid_zones:Normal Movable
        memory38/valid_zones:Normal Movable
        memory39/valid_zones:Normal Movable
        memory40/valid_zones:Normal Movable
        memory41/valid_zones:Normal Movable
      
      Now, we online block 34 in default mode and block 37 as movable
      
        root@test1:/sys/devices/system/node/node1# echo online > memory34/state
        root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
        memory34/valid_zones:Normal
        memory35/valid_zones:Normal Movable
        memory36/valid_zones:Normal Movable
        memory37/valid_zones:Movable
        memory38/valid_zones:Normal Movable
        memory39/valid_zones:Normal Movable
        memory40/valid_zones:Normal Movable
        memory41/valid_zones:Normal Movable
      
      As we can see all other blocks can still be onlined both into Normal and
      Movable zones and the Normal is default because the Movable zone spans
      only block37 now.
      
        root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state
        memory34/valid_zones:Normal
        memory35/valid_zones:Normal Movable
        memory36/valid_zones:Normal Movable
        memory37/valid_zones:Movable
        memory38/valid_zones:Movable Normal
        memory39/valid_zones:Movable Normal
        memory40/valid_zones:Movable Normal
        memory41/valid_zones:Movable
      
      Now the default zone for blocks 37-41 has changed because movable zone
      spans that range.
      
        root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state
        memory34/valid_zones:Normal
        memory35/valid_zones:Normal Movable
        memory36/valid_zones:Normal Movable
        memory37/valid_zones:Movable
        memory38/valid_zones:Normal Movable
        memory39/valid_zones:Normal
        memory40/valid_zones:Movable Normal
        memory41/valid_zones:Movable
      
      Note that the block 39 now belongs to the zone Normal and so block38
      falls into Normal by default as well.
      
      For completness
      
        root@test1:/sys/devices/system/node/node1# for i in memory[34]?
        do
      	echo online > $i/state 2>/dev/null
        done
      
        memory34/valid_zones:Normal
        memory35/valid_zones:Normal
        memory36/valid_zones:Normal
        memory37/valid_zones:Movable
        memory38/valid_zones:Normal
        memory39/valid_zones:Normal
        memory40/valid_zones:Movable
        memory41/valid_zones:Movable
      
      Implementation wise the change is quite straightforward.  We can get rid
      of allow_online_pfn_range altogether.  online_pages allows only offline
      nodes already.  The original default_zone_for_pfn will become
      default_kernel_zone_for_pfn.  New default_zone_for_pfn implements the
      above semantic.  zone_for_pfn_range is slightly reorganized to implement
      kernel and movable online type explicitly and MMOP_ONLINE_KEEP becomes a
      catch all default behavior.
      
      Link: http://lkml.kernel.org/r/20170714121233.16861-3-mhocko@kernel.org
      
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarReza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Kani Toshimitsu <toshi.kani@hpe.com>
      Cc: <slaoub@gmail.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: <linux-api@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c6f03e29