Skip to content
  • Michal Hocko's avatar
    hugetlb, memory_hotplug: prefer to use reserved pages for migration · 4db9b2ef
    Michal Hocko authored
    new_node_page will try to use the origin's next NUMA node as the
    migration destination for hugetlb pages.  If such a node doesn't have
    any preallocated pool it falls back to __alloc_buddy_huge_page_no_mpol
    to allocate a surplus page instead.  This is quite subotpimal for any
    configuration when hugetlb pages are no distributed to all NUMA nodes
    evenly.  Say we have a hotplugable node 4 and spare hugetlb pages are
    node 0
    
      /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:10000
      /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:0
      /sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages:0
      /sys/devices/system/node/node3/hugepages/hugepages-2048kB/nr_hugepages:0
      /sys/devices/system/node/node4/hugepages/hugepages-2048kB/nr_hugepages:10000
      /sys/devices/system/node/node5/hugepages/hugepages-2048kB/nr_hugepages:0
      /sys/devices/system/node/node6/hugepages/hugepages-2048kB/nr_hugepages:0
      /sys/devices/system/node/node7/hugepages/hugepages-2048kB/nr_hugepages:0
    
    Now we consume the whole pool on node 4 and try to offline this node.
    All the allocated pages should be moved to node0 which has enough
    preallocated pages to hold them.  With the current implementation
    offlining very likely fails because hugetlb allocations during runtime
    are much less reliable.
    
    Fix this by reusing the nodemask which excludes migration source and try
    to find a first node which has a page in the preallocated pool first and
    fall back to __alloc_buddy_huge_page_no_mpol only when the whole pool is
    consumed.
    
    [akpm@linux-foundation.org: remove bogus arg from alloc_huge_page_nodemask() stub]
    Link: http://lkml.kernel.org/r/20170608074553.22152-3-mhocko@kernel.org
    
    
    Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
    Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
    Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: Xishi Qiu <qiuxishi@huawei.com>
    Cc: zhong jiang <zhongjiang@huawei.com>
    Cc: Joonsoo Kim <js1304@gmail.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    4db9b2ef