    mm/gup: split get_user_pages_remote() into two routines · 22bf29b6
    John Hubbard authored
    Patch series "mm/gup: track FOLL_PIN pages", v6.
    
    This activates tracking of FOLL_PIN pages.  This is in support of fixing
    the get_user_pages()+DMA problem described in [1]-[4].
    
    FOLL_PIN support is now in the main Linux tree.  However, the patch to use
    FOLL_PIN to track pages was *not* submitted, because Leon saw an RDMA test
    suite failure that involved (I think) page refcount overflows when huge
    pages were used.
    
    This patch definitively solves that kind of overflow problem by adding an
    exact pincount, for compound pages of order > 1, in the 3rd struct page
    of the compound page.  If available, that form of pin counting is used
    instead of the GUP_PIN_COUNTING_BIAS approach.  Thanks again to Jan Kara
    for that idea.
    
    Other interesting changes:
    
    * dump_page(): added one or two new things to report for compound
      pages: head refcount (for all compound pages) and map_pincount (for
      compound pages of order > 1).
    
    * Documentation/core-api/pin_user_pages.rst: removed the "TODO" for the
      huge page refcount upper limit problems, and added notes about how it
      works now.  Also added a note about the dump_page() enhancements.
    
    * Added some comments in gup.c and mm.h, to explain that there are two
      ways to count pinned pages: exact (for compound pages of order > 1) and
      fuzzy (GUP_PIN_COUNTING_BIAS: for all other pages).
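    
    As a rough illustration of the two counting schemes described above, here
    is a minimal user-space sketch.  struct model_page and the model_*()
    helpers are hypothetical stand-ins, not kernel code; only the
    GUP_PIN_COUNTING_BIAS name comes from this series, and the value used
    for it here is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, for illustration only. */
#define GUP_PIN_COUNTING_BIAS (1U << 10)

/* Hypothetical, simplified stand-in for struct page. */
struct model_page {
	unsigned int refcount;	/* ordinary references and pins */
	unsigned int pincount;	/* exact pin count, if has_pincount */
	bool has_pincount;	/* models a compound page of order > 1 */
};

/* Take one pin, using the exact counter when the page has one. */
static void model_pin(struct model_page *page)
{
	if (page->has_pincount) {
		page->refcount += 1;
		page->pincount += 1;
	} else {
		page->refcount += GUP_PIN_COUNTING_BIAS;
	}
}

/* Drop one pin previously taken by model_pin(). */
static void model_unpin(struct model_page *page)
{
	if (page->has_pincount) {
		assert(page->pincount > 0 && page->refcount > 0);
		page->refcount -= 1;
		page->pincount -= 1;
	} else {
		assert(page->refcount >= GUP_PIN_COUNTING_BIAS);
		page->refcount -= GUP_PIN_COUNTING_BIAS;
	}
}

/* Exact answer when a pincount exists; otherwise a fuzzy guess. */
static bool model_maybe_pinned(const struct model_page *page)
{
	if (page->has_pincount)
		return page->pincount > 0;
	return page->refcount >= GUP_PIN_COUNTING_BIAS;
}
```

    In this model, a page without a pincount that has accumulated
    GUP_PIN_COUNTING_BIAS ordinary references reports as pinned even though
    model_pin() was never called, which is the sense in which the bias
    approach is fuzzy.  With the exact counter, the same refcount reports as
    not pinned.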
    
    ============================================================
    General notes about the tracking patch:
    
    This is a prerequisite to solving the problem of proper interactions
    between file-backed pages and [R]DMA activities, as discussed in [1],
    [2], [3], [4] and in a remarkable number of email threads since about
    2017.  :)
    
    In contrast to earlier approaches, the page tracking can be incrementally
    applied to the kernel call sites that, until now, have been simply calling
    get_user_pages() ("gup").  In other words, opt-in by changing from this:
    
        get_user_pages() (sets FOLL_GET)
        put_page()
    
    to this:
    
        pin_user_pages() (sets FOLL_PIN)
        unpin_user_page()
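    
    The opt-in conversion above amounts to swapping both halves of an
    acquire/release pair at a call site.  The user-space sketch below models
    that.  The model_*() and call_site_*() functions are hypothetical and
    deliberately do not mirror the kernel signatures, and the bias value is
    assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value, for illustration only. */
#define GUP_PIN_COUNTING_BIAS (1U << 10)

/* Hypothetical stand-in for struct page: a bare reference count. */
struct model_page {
	unsigned int refcount;
};

/* Old pairing: a FOLL_GET-style reference, dropped by a plain put. */
static void model_get_user_pages(struct model_page **pages, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		pages[i]->refcount += 1;
}

static void model_put_page(struct model_page *page)
{
	assert(page->refcount >= 1);
	page->refcount -= 1;
}

/* New pairing: a FOLL_PIN-style pin, dropped only by the unpin call. */
static void model_pin_user_pages(struct model_page **pages, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		pages[i]->refcount += GUP_PIN_COUNTING_BIAS;
}

static void model_unpin_user_page(struct model_page *page)
{
	assert(page->refcount >= GUP_PIN_COUNTING_BIAS);
	page->refcount -= GUP_PIN_COUNTING_BIAS;
}

/* Call site before conversion; returns pages[0]'s refcount while held. */
static unsigned int call_site_before(struct model_page **pages, size_t nr)
{
	unsigned int held;

	assert(nr > 0);
	model_get_user_pages(pages, nr);
	held = pages[0]->refcount;	/* DMA would be set up here */
	for (size_t i = 0; i < nr; i++)
		model_put_page(pages[i]);
	return held;
}

/* The same call site after opting in to pin tracking. */
static unsigned int call_site_after(struct model_page **pages, size_t nr)
{
	unsigned int held;

	assert(nr > 0);
	model_pin_user_pages(pages, nr);
	held = pages[0]->refcount;	/* DMA would be set up here */
	for (size_t i = 0; i < nr; i++)
		model_unpin_user_page(pages[i]);
	return held;
}
```

    Both versions leave the refcount where it started.  The difference is
    that, while the pages are held, the converted call site is
    distinguishable from an ordinary reference holder.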
    
    ============================================================
    Future steps:
    
    * Convert more subsystems from get_user_pages() to pin_user_pages().
      The first probably needs to be bio/biovecs, because any filesystem
      testing is too difficult without those in place.
    
    * Change VFS and filesystems to respond appropriately when encountering
      dma-pinned pages.
    
    * Work with Ira and others to connect this all up with file system
      leases.
    
    [1] Some slow progress on get_user_pages() (Apr 2, 2019):
        https://lwn.net/Articles/784574/
    
    [2] DMA and get_user_pages() (LPC: Dec 12, 2018):
        https://lwn.net/Articles/774411/
    
    [3] The trouble with get_user_pages() (Apr 30, 2018):
        https://lwn.net/Articles/753027/
    
    [4] LWN kernel index: get_user_pages()
        https://lwn.net/Kernel/Index/#Memory_management-get_user_pages
    
    
    
    This patch (of 12):
    
    An upcoming patch requires reusing the implementation of
    get_user_pages_remote().  Split up get_user_pages_remote() into an outer
    routine that checks flags, and an implementation routine that will be
    reused.  This makes subsequent changes much easier to understand.
    
    There should be no change in behavior due to this patch.
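    
    A minimal sketch of that shape, using illustrative flag values and
    hypothetical model_*() names rather than the real gup.c code: the outer
    routine only validates flags, and the inner routine does the work so
    that a later wrapper can reuse it.  The second wrapper and the specific
    flag checks are assumptions about how the reuse might look.

```c
#include <errno.h>

/* Illustrative flag values; not the kernel's definitions. */
#define MODEL_FOLL_GET 0x1u
#define MODEL_FOLL_PIN 0x2u

/*
 * Inner implementation: applies no flag policy of its own, so that
 * several outer routines can share it.  The real page walk is replaced
 * by simply reporting every requested page as processed.
 */
static long model_gup_remote_impl(unsigned long nr_pages, unsigned int flags)
{
	(void)flags;
	return (long)nr_pages;
}

/* Outer routine: checks flags, then defers to the implementation. */
long model_get_user_pages_remote(unsigned long nr_pages, unsigned int flags)
{
	/* Assumed policy: callers may not request pinning directly. */
	if (flags & MODEL_FOLL_PIN)
		return -EINVAL;

	return model_gup_remote_impl(nr_pages, flags | MODEL_FOLL_GET);
}

/* A possible second outer routine reusing the same implementation. */
long model_pin_user_pages_remote(unsigned long nr_pages, unsigned int flags)
{
	/* Assumed policy: the pin variant rejects the get-style flag. */
	if (flags & MODEL_FOLL_GET)
		return -EINVAL;

	return model_gup_remote_impl(nr_pages, flags | MODEL_FOLL_PIN);
}
```

    Keeping the flag checks in the outer routines means each entry point
    can enforce its own rules while the shared implementation stays
    unchanged.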
    
    Signed-off-by: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Ira Weiny <ira.weiny@intel.com>
    Cc: Jérôme Glisse <jglisse@redhat.com>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Link: http://lkml.kernel.org/r/20200211001536.1027652-2-jhubbard@nvidia.com
    
    
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>