    mm, THP, swap: support to clear swap cache flag for THP swapped out · a3aea839
    Huang Ying authored
    Patch series "mm, THP, swap: Delay splitting THP after swapped out", v3.
    
    This is the second step of THP (Transparent Huge Page) swap
    optimization.  In the first step, splitting the huge page was delayed
    from almost the very beginning of swap out to after allocating the swap
    space for the THP and adding the THP into the swap cache.  In the second
    step, splitting is delayed further, until swapping out has finished.
    The plan is to delay splitting THP step by step, and finally to avoid
    splitting the THP during swap out altogether, swapping the THP out and
    in as a whole.
    
    In this patchset, more of the operations involved in reclaiming an
    anonymous THP, such as TLB flushing, writing the THP to the swap device,
    and removing the THP from the swap cache, are batched.  This improves
    the performance of anonymous THP swap out.
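
    To make the effect of batching concrete, here is a toy user-space model
    in C, not kernel code.  It assumes a 2M THP made of 512 base 4K pages
    (the x86-64 layout) and counts one TLB-flush round per flush call; the
    names MODEL_PAGES_PER_THP, struct model_stats, model_unmap_per_page()
    and model_unmap_batched() are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constant: a 2M THP covers 512 base 4K pages on x86-64. */
#define MODEL_PAGES_PER_THP 512

/* Toy counter standing in for TLB-flush rounds (IPIs); not a kernel API. */
struct model_stats {
    size_t flushes;
};

/* Unbatched: clear one mapping, flush, repeat for every 4K page. */
static void model_unmap_per_page(struct model_stats *s,
                                 unsigned char *present, size_t nr_pages)
{
    assert(s != NULL && present != NULL);
    for (size_t i = 0; i < nr_pages; i++) {
        present[i] = 0; /* drop the mapping for page i */
        s->flushes++;   /* flush after every single page */
    }
}

/* Batched: clear all mappings of the THP first, then flush once. */
static void model_unmap_batched(struct model_stats *s,
                                unsigned char *present, size_t nr_pages)
{
    assert(s != NULL && present != NULL);
    for (size_t i = 0; i < nr_pages; i++)
        present[i] = 0;
    if (nr_pages > 0)
        s->flushes++;   /* one flush covers the whole range */
}
```

    In this model the per-page path performs 512 flushes for one THP and the
    batched path performs one.  It only shows the direction of the saving;
    it does not reproduce the measured IPI reduction reported below.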
    
    During development, the following scenarios/code paths were
    checked:
    
     - swap out/in
     - swap off
     - write protect page fault
     - madvise_free
     - process exit
     - split huge page
    
    With the patchset, the swap out throughput improves 42% (from about
    5.81GB/s to about 8.25GB/s) in the vm-scalability swap-w-seq test case
    with 16 processes.  At the same time, the number of IPIs (which reflects
    TLB flushing) is reduced by about 78.9%.  The test was run on a Xeon E5
    v3 system.  The swap device used is a RAM-simulated PMEM (persistent
    memory) device.  To test sequential swap out, the test case creates 8
    processes, which sequentially allocate and write to anonymous pages
    until the RAM and part of the swap device are used up.
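
    As a quick arithmetic check of the figures quoted above, the relative
    throughput gain and the remaining share of IPIs work out as follows
    (using only the numbers stated in the text):

```latex
\[
\frac{8.25\,\text{GB/s} - 5.81\,\text{GB/s}}{5.81\,\text{GB/s}}
  = \frac{2.44}{5.81} \approx 0.42
\qquad
\text{IPIs}_{\text{after}} \approx (1 - 0.789)\,\text{IPIs}_{\text{before}}
  = 0.211\,\text{IPIs}_{\text{before}}
\]
```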
    
    Below is the part of the cover letter for the first-step THP swap
    optimization patchset that applies to all steps.
    
    =========================
    
    Recently, the performance of storage devices has improved so quickly
    that a single logical CPU cannot saturate the disk bandwidth when
    swapping out pages, even on a high-end server machine, because storage
    performance has improved faster than single-CPU performance.  This trend
    seems unlikely to change in the near future.  Meanwhile, THP is becoming
    more and more popular as memory sizes grow.  So it has become necessary
    to optimize THP swap performance.
    
    The advantages of the THP swap support include:
    
     - Batch the swap operations for the THP to reduce TLB flushing and lock
       acquiring/releasing, including allocating/freeing the swap space,
       adding/deleting to/from the swap cache, and writing/reading the swap
       space, etc. This will help improve THP swap performance.

     - THP swap space reads/writes will be 2M sequential IO. This is
       particularly helpful for swap reads, which are usually 4k random IO.
       This will improve THP swap performance too.

     - It will help reduce memory fragmentation, especially when THP is
       heavily used by applications. The 2M of contiguous pages will be
       freed after the THP is swapped out.

     - It will improve THP utilization on systems with swap turned on,
       because khugepaged collapses normal pages into THPs quite slowly.
       After a THP is split during swap out, it takes quite a long time for
       the normal pages to be collapsed back into a THP after being swapped
       in. High THP utilization also helps the efficiency of page-based
       memory management.
    
    There are some concerns regarding THP swap in, mainly because the
    enlarged read/write IO size (for swap in/out) may put more overhead on
    the storage device.  To deal with that, THP swap in should be turned
    on only when necessary.

    For example, it can be selected via "always/never/madvise" logic: turned
    on globally, turned off globally, or turned on only for VMAs with
    MADV_HUGEPAGE, etc.
    
    This patch (of 12):
    
    Previously, swapcache_free_cluster() was used only in the error path of
    shrink_page_list() to free the just-allocated swap cluster if the THP
    (Transparent Huge Page) failed to be split.  In this patch, it is
    enhanced to clear the swap cache flag (SWAP_HAS_CACHE) for the swap
    cluster that holds the contents of a swapped-out THP.

    This will be used to support delaying the THP split until after swap
    out.  Because swapping in a THP as a whole is not supported yet, the
    swap cluster backing the swapped-out THP will be split after the swap
    cache flag is cleared, so that the swap slots in the cluster can later
    be swapped in as normal pages.
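
    The behavior described above can be sketched with a simplified
    user-space model in C.  This is not the kernel's swapcache_free_cluster();
    it assumes a cluster of 512 slots (one 2M THP on x86-64), a per-slot map
    byte holding a reference count plus a cache flag, and a "huge" marker for
    a cluster backing a whole THP.  MODEL_HAS_CACHE only plays the role of
    SWAP_HAS_CACHE, and struct model_cluster and model_clear_cluster_cache()
    are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative layout, not the kernel's swap_map/cluster structures. */
#define MODEL_CLUSTER_SLOTS 512 /* one 2M THP = 512 x 4K slots on x86-64 */
#define MODEL_HAS_CACHE 0x40    /* stands in for SWAP_HAS_CACHE */

struct model_cluster {
    unsigned char map[MODEL_CLUSTER_SLOTS]; /* per slot: count | cache flag */
    bool huge;                              /* cluster backs one whole THP */
};

/*
 * Clear the cache flag on every slot of a THP-backed cluster, then split
 * the cluster. Returns how many slots became free (no references left).
 */
static unsigned int model_clear_cluster_cache(struct model_cluster *ci)
{
    unsigned int freed = 0;

    assert(ci->huge);
    for (unsigned int i = 0; i < MODEL_CLUSTER_SLOTS; i++) {
        assert(ci->map[i] & MODEL_HAS_CACHE); /* THP was in the swap cache */
        ci->map[i] &= (unsigned char)~MODEL_HAS_CACHE;
        if (ci->map[i] == 0)
            freed++;
    }
    /* No whole-THP swap in yet: split so slots are swapped in as 4K pages. */
    ci->huge = false;
    return freed;
}
```

    If every slot held only the cache flag (the just-allocated cluster in the
    old error path), all 512 slots become free; if each slot is still
    referenced by a swapped-out page table entry, the counts survive and only
    the cache flag and the huge marker are dropped.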
    
    Link: http://lkml.kernel.org/r/20170724051840.2309-2-ying.huang@intel.com
    
    
    Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
    Acked-by: Rik van Riel <riel@redhat.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Shaohua Li <shli@kernel.org>
    Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
    Cc: Vishal L Verma <vishal.l.verma@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>