    mm/memory_hotplug: fix try_offline_node() · 2c91f8fc
    David Hildenbrand authored
    try_offline_node() is pretty much broken right now:
    
     - The node span is updated when onlining memory, not when adding it. We
       ignore memory that was never onlined. Bad.
    
     - We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily
       trigger a kernel panic. Bad for memory that is offline but also bad
       for subsection hotadd with ZONE_DEVICE, whereby the memmap of the
       first PFN of a section might contain garbage.
    
     - Sections belonging to mixed nodes are not properly considered.
    
    As memory blocks might belong to multiple nodes, we would have to walk
    all pageblocks (or at least subsections) within present sections.
    However, we don't have a way to identify whether a memmap that is not
    online was initialized (relevant for ZONE_DEVICE).  This makes things
    more complicated.
    
    Luckily, we can piggyback on the node span and the nid stored in memory
    blocks.  Currently, the node span is grown when calling
    move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when
    removing memory, before calling try_offline_node().  Sysfs links are
    created via link_mem_sections(), e.g., during boot or when adding
    memory.
    
    If the node still spans memory or if any memory block belongs to the
    nid, we don't set the node offline.  As memory blocks that span multiple
    nodes cannot get offlined, the nid stored in memory blocks is reliable
    enough (for such online memory blocks, the node still spans the memory).
    
    Introduce for_each_memory_block() to efficiently walk all memory blocks.
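
    The check described above can be modeled in userspace roughly as
    follows. This is only a sketch under stated assumptions: struct
    mem_block, struct node_model, walk_all_blocks() and
    node_may_go_offline() are hypothetical stand-ins invented for
    illustration, not the kernel's structures or the code added by this
    patch, and everything else the kernel does when offlining a node is
    left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's data structures. */
struct mem_block {
    int nid;                      /* node id recorded for this memory block */
};

struct node_model {
    int nid;
    unsigned long spanned_pages;  /* 0 once the node span is empty */
};

typedef int (*walk_fn)(const struct mem_block *blk, void *arg);

/*
 * Walk every block and call fn on it. The walk stops at the first
 * non-zero return value and hands that value back to the caller.
 */
static int walk_all_blocks(const struct mem_block *blocks, size_t nr,
                           walk_fn fn, void *arg)
{
    assert(fn != NULL);

    for (size_t i = 0; i < nr; i++) {
        int rc = fn(&blocks[i], arg);

        if (rc)
            return rc;
    }
    return 0;
}

/* Callback: non-zero as soon as one block belongs to the given nid. */
static int block_belongs_to_node(const struct mem_block *blk, void *arg)
{
    int nid = *(int *)arg;

    return blk->nid == nid;
}

/*
 * The node may go offline only if it spans no memory and no memory
 * block carries its nid.
 */
static bool node_may_go_offline(const struct node_model *node,
                                const struct mem_block *blocks, size_t nr)
{
    int nid = node->nid;

    if (node->spanned_pages)
        return false;
    if (walk_all_blocks(blocks, nr, block_belongs_to_node, &nid))
        return false;
    return true;
}
```

    Stopping the walk on the first non-zero callback result keeps the
    check cheap in the common case where the node still has memory
    blocks.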
    
    Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span
    when removing ZONE_DEVICE memory to fix similar issues (access of
    garbage memmaps) - until we have a reliable way to identify whether
    these memmaps were properly initialized.  This implies that, later,
    once a node has had ZONE_DEVICE memory, we won't be able to set that
    node offline - which should be acceptable.
    
    Since commit f1dd2cd1 ("mm, memory_hotplug: do not associate
    hotadded memory to zones until online") memory that is added is not
    associated with a zone/node (memmap not initialized).  The introducing
    commit 60a5a19e ("memory-hotplug: remove sysfs file of node")
    already missed that we could have multiple nodes for a section and that
    the zone/node span is updated when onlining pages, not when adding them.
    
    I tested this by hotplugging two DIMMs to a memory-less and cpu-less
    NUMA node.  The node is properly onlined when adding the DIMMs.  When
    removing the DIMMs, the node is properly offlined.
    
    Masayoshi Mizuma reported:
    
    : Without this patch, memory hotplug fails as panic:
    :
    :  BUG: kernel NULL pointer dereference, address: 0000000000000000
    :  ...
    :  Call Trace:
    :   remove_memory_block_devices+0x81/0xc0
    :   try_remove_memory+0xb4/0x130
    :   __remove_memory+0xa/0x20
    :   acpi_memory_device_remove+0x84/0x100
    :   acpi_bus_trim+0x57/0x90
    :   acpi_bus_trim+0x2e/0x90
    :   acpi_device_hotplug+0x2b2/0x4d0
    :   acpi_hotplug_work_fn+0x1a/0x30
    :   process_one_work+0x171/0x380
    :   worker_thread+0x49/0x3f0
    :   kthread+0xf8/0x130
    :   ret_from_fork+0x35/0x40
    
    [david@redhat.com: v3]
      Link: http://lkml.kernel.org/r/20191102120221.7553-1-david@redhat.com
    Link: http://lkml.kernel.org/r/20191028105458.28320-1-david@redhat.com
    Fixes: 60a5a19e ("memory-hotplug: remove sysfs file of node")
    Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e8
    
    
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
    Cc: Tang Chen <tangchen@cn.fujitsu.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: "Rafael J. Wysocki" <rafael@kernel.org>
    Cc: Keith Busch <keith.busch@intel.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
    Cc: Jani Nikula <jani.nikula@intel.com>
    Cc: Nayna Jain <nayna@linux.ibm.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>