• Michal Hocko's avatar
    mm, hugetlb: do not rely on overcommit limit during migration · ab5ac90a
    Michal Hocko authored
    hugepage migration relies on __alloc_buddy_huge_page to get a new page.
    This has 2 main disadvantages.
    
    1) it doesn't allow to migrate any huge page if the pool is used
       completely which is not an exceptional case as the pool is static and
       unused memory is just wasted.
    
    2) it leads to a weird semantic when migration between two numa nodes
       might increase the pool size of the destination NUMA node while the
       page is in use.  The issue is caused by per NUMA node surplus pages
       tracking (see free_huge_page).
    
    Address both issues by changing the way how we allocate and account
    pages allocated for migration.  Those should temporal by definition.  So
    we mark them that way (we will abuse page flags in the 3rd page) and
    update free_huge_page to free such pages to the page allocator.  Page
    migration path then just transfers the temporal status from the new page
    to the old one which will be freed on the last reference.  The global
    surplus count will never change during this path but we still have to be
    careful when migrating a per-node suprlus page.  This is now handled in
    move_hugetlb_state which is called from the migration path and it copies
    the hugetlb specific page state and fixes up the accounting when needed
    
    Rename __alloc_buddy_huge_page to __alloc_surplus_huge_page to better
    reflect its purpose.  The new allocation routine for the migration path
    is __alloc_migrate_huge_page.
    
    The user visible effect of this patch is that migrated pages are really
    temporal and they travel between NUMA nodes as per the migration
    request:
    
    Before migration
      /sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0
      /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:1
      /sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0
      /sys/devices/system/node/node1/hugepages/hugepages-2048kB/free_hugepages:0
      /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:0
      /sys/devices/system/node/node1/hugepages/hugepages-2048kB/surplus_hugepages:0
    
    After
      /sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0
      /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages:0
      /sys/devices/system/node/node0/hugepages/hugepages-2048kB/surplus_hugepages:0
      /sys/devices/system/node/node1/hugepages/hugepages-2048kB/free_hugepages:0
      /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages:1
      /sys/devices/system/node/node1/hugepages/hugepages-2048kB/surplus_hugepages:0
    
    with the previous implementation, both nodes would have nr_hugepages:1
    until the page is freed.
    
    Link: http://lkml.kernel.org/r/20180103093213.26329-4-mhocko@kernel.org
    
    Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
    Reviewed-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
    Cc: Andrea Reale <ar@linux.vnet.ibm.com>
    Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Zi Yan <zi.yan@cs.rutgers.edu>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    ab5ac90a
hugetlb.h 16.9 KB