• Andrea Arcangeli's avatar
    mm: thp: calculate the mapcount correctly for THP pages during WP faults · 6d0a07ed
    Andrea Arcangeli authored
    This will provide fully accuracy to the mapcount calculation in the
    write protect faults, so page pinning will not get broken by false
    positive copy-on-writes.
    
    total_mapcount() isn't the right calculation needed in
    reuse_swap_page(), so this introduces a page_trans_huge_mapcount()
    that is effectively the full accurate return value for page_mapcount()
    if dealing with Transparent Hugepages, however we only use the
    page_trans_huge_mapcount() during COW faults where it strictly needed,
    due to its higher runtime cost.
    
    This also provide at practical zero cost the total_mapcount
    information which is needed to know if we can still relocate the page
    anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we
    can reuse the page no matter if it's a pte or a pmd_trans_huge
    triggering the fault, but we can only relocate the page anon_vma to
    the local vma->anon_vma if we're sure it's only this "vma" mapping the
    whole THP physical range.
    
    Kirill A. Shutemov discovered the problem with moving the page
    anon_vma to the local vma->anon_vma in a previous version of this
    patch and another problem in the way page_move_anon_rmap() was called.
    
    Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a
    previous version, because reuse_swap_page must be a macro to call
    page_trans_huge_mapcount from swap.h, so this uses a macro again
    instead of an inline function. With this change at least it's a less
    dangerous usage than it was before, because "page" is used only once
    now, while with the previous code reuse_swap_page(page++) would have
    called page_mapcount on page+1 and it would have increased page twice
    instead of just once.
    
    Dean Luick noticed an uninitialized variable that could result in a
    rmap inefficiency for the non-THP case in a previous version.
    
    Mike Marciniszyn said:
    
    : Our RDMA tests are seeing an issue with memory locking that bisects to
    : commit 61f5d698 ("mm: re-enable THP")
    :
    : The test program registers two rather large MRs (512M) and RDMA
    : writes data to a passive peer using the first and RDMA reads it back
    : into the second MR and compares that data.  The sizes are chosen randomly
    : between 0 and 1024 bytes.
    :
    : The test will get through a few (<= 4 iterations) and then gets a
    : compare error.
    :
    : Tracing indicates the kernel logical addresses associated with the individual
    : pages at registration ARE correct , the data in the "RDMA read response only"
    : packets ARE correct.
    :
    : The "corruption" occurs when the packet crosse two pages that are not physically
    : contiguous.   The second page reads back as zero in the program.
    :
    : It looks like the user VA at the point of the compare error no longer points to
    : the same physical address as was registered.
    :
    : This patch totally resolves the issue!
    
    Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.comSigned-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
    Reviewed-by: default avatar"Kirill A. Shutemov" <kirill@shutemov.name>
    Reviewed-by: default avatarDean Luick <dean.luick@intel.com>
    Tested-by: default avatarAlex Williamson <alex.williamson@redhat.com>
    Tested-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
    Tested-by: default avatarJosh Collier <josh.d.collier@intel.com>
    Cc: Marc Haber <mh+linux-kernel@zugschlus.de>
    Cc: <stable@vger.kernel.org>	[4.5]
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    6d0a07ed
swap.h 18.2 KB