    mm: gup: add get_user_pages_locked and get_user_pages_unlocked · f0818f47
    Andrea Arcangeli authored
    FAULT_FLAG_ALLOW_RETRY allows the page fault to drop the mmap_sem held for
    reading, which reduces mmap_sem contention (for writers), for example
    while waiting for I/O completion.  The problem is that right now
    practically no get_user_pages call uses FAULT_FLAG_ALLOW_RETRY, so we're
    not leveraging that nifty feature.
    
    Andres fixed it for the KVM page fault.  However get_user_pages_fast
    remains uncovered, and 99% of other get_user_pages aren't using it either
    (the only exception being FOLL_NOWAIT in KVM which is really nonblocking
    and in fact it doesn't even release the mmap_sem).
    
    So this patchset extends the optimization Andres did in the KVM page
    fault to the whole kernel.  It makes the most important places (including
    gup_fast) use FAULT_FLAG_ALLOW_RETRY to reduce the mmap_sem hold times
    during I/O.
    
    The few places that remain uncovered are drivers like v4l and other
    exceptions that tend to work on their own memory rather than on random
    user memory (unlike, for example, O_DIRECT, which uses gup_fast and is
    fully covered by this patch).
    
    A follow-up patch should probably also add a printk_once warning to
    get_user_pages, which should become obsolete and be phased out eventually.
    The "vmas" parameter of get_user_pages makes it fundamentally incompatible
    with FAULT_FLAG_ALLOW_RETRY (the vmas array becomes meaningless the moment
    the mmap_sem is released).
    
    While this is just an optimization, it becomes an absolute requirement
    for the userfaultfd feature: http://lwn.net/Articles/615086/
    
    userfaultfd allows blocking the page fault, and in order to do so I
    need to drop the mmap_sem first.  So this patch also ensures that, for all
    memory where userfaultfd could be registered by KVM, the very first fault
    (no matter if it is a regular page fault or a get_user_pages) always has
    FAULT_FLAG_ALLOW_RETRY set.  Then the userfaultfd blocks and it is woken
    only when the pagetable is already mapped.  The second fault attempt after
    the wakeup doesn't need FAULT_FLAG_ALLOW_RETRY, so it's ok to retry
    without it.
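
    As a rough illustration of the two-attempt flow described above, here is
    a small userspace model in C.  It is only a sketch: every sim_* name, the
    struct layout and the flag value are made up for the example and are not
    kernel code.  The first attempt is allowed to drop the (modelled) lock
    and report "retry"; the second attempt runs without that permission and
    finds the page already mapped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the retry-allowed flag and the fault result. */
enum { SIM_ALLOW_RETRY = 1 << 0 };
enum sim_result { SIM_FAULT_DONE, SIM_FAULT_RETRY };

struct sim_mm {
    int  readers;      /* models read-side holders of mmap_sem */
    bool page_mapped;  /* true once the fault has been resolved */
    int  lock_drops;   /* times the fault handler released the lock */
};

static void sim_down_read(struct sim_mm *mm) { mm->readers++; }

static void sim_up_read(struct sim_mm *mm)
{
    assert(mm->readers > 0);
    mm->readers--;
}

/* Fault handler model: it may release the lock only if retry is allowed. */
static enum sim_result sim_handle_fault(struct sim_mm *mm, unsigned int flags)
{
    assert(mm->readers > 0);           /* caller must hold the lock */
    if (mm->page_mapped)
        return SIM_FAULT_DONE;
    if (flags & SIM_ALLOW_RETRY) {
        sim_up_read(mm);               /* drop the lock before "blocking" */
        mm->lock_drops++;
        mm->page_mapped = true;        /* pretend we slept until mapped */
        return SIM_FAULT_RETRY;        /* lock is NOT held on return */
    }
    mm->page_mapped = true;            /* resolve while holding the lock */
    return SIM_FAULT_DONE;
}

/* Caller model: first attempt allows retry, the second one does not. */
static int sim_fault_in(struct sim_mm *mm)
{
    unsigned int flags = SIM_ALLOW_RETRY;
    int attempts = 0;

    sim_down_read(mm);
    for (;;) {
        attempts++;
        if (sim_handle_fault(mm, flags) == SIM_FAULT_DONE)
            break;
        flags &= ~(unsigned int)SIM_ALLOW_RETRY;
        sim_down_read(mm);             /* retake the lock before retrying */
    }
    sim_up_read(mm);
    return attempts;
}
```

    In this model a cold fault takes two attempts and releases the lock
    exactly once in between, which is the window in which a writer could
    make progress.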
    
    This patch (of 5):
    
    We can leverage the VM_FAULT_RETRY functionality in the page fault paths
    better by using either get_user_pages_locked or get_user_pages_unlocked.
    
    The former allows conversion of get_user_pages invocations that will have
    to pass a "&locked" parameter to know if the mmap_sem was dropped during
    the call.  Example from:
    
        down_read(&mm->mmap_sem);
        do_something();
        get_user_pages(tsk, mm, ..., pages, NULL);
        up_read(&mm->mmap_sem);
    
    to:
    
        int locked = 1;
        down_read(&mm->mmap_sem);
        do_something();
        get_user_pages_locked(tsk, mm, ..., pages, &locked);
        if (locked)
            up_read(&mm->mmap_sem);
    
    The latter is suitable only as a drop-in replacement of the form:
    
        down_read(&mm->mmap_sem);
        get_user_pages(tsk, mm, ..., pages, NULL);
        up_read(&mm->mmap_sem);
    
    into:
    
        get_user_pages_unlocked(tsk, mm, ..., pages);
    
    Where tsk, mm, the intermediate "..." parameters and "pages" can be any
    value as before.  Just the last parameter of get_user_pages (vmas) must be
    NULL for get_user_pages_locked|unlocked to be usable (the latter original
    form wouldn't have been safe anyway if vmas wasn't NULL; for the former we
    just make it explicit by dropping the parameter).
    
    If vmas is not NULL these two methods cannot be used.
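
    The calling convention of the two variants can be modelled in userspace
    roughly as follows.  This is a hedged sketch, not the kernel
    implementation: the mock_* names, struct mock_mm and its will_block
    field are invented for the example, and only the contract stated above
    is modelled (on return, "locked" tells the caller whether it still
    holds the lock), not the internal fault and retry logic.

```c
#include <assert.h>

struct mock_mm {
    int readers;     /* models read-side holders of mmap_sem */
    int will_block;  /* nonzero: the lookup has to wait, so the lock is dropped */
};

static void mock_down_read(struct mock_mm *mm) { mm->readers++; }

static void mock_up_read(struct mock_mm *mm)
{
    assert(mm->readers > 0);
    mm->readers--;
}

/*
 * Mock of the "locked" variant: the caller enters holding the lock with
 * *locked == 1.  If the lock had to be released, *locked is set to 0 and
 * the caller must not release it again.
 */
static long mock_gup_locked(struct mock_mm *mm, long nr_pages, int *locked)
{
    assert(*locked == 1 && mm->readers > 0);
    if (mm->will_block) {
        mock_up_read(mm);
        *locked = 0;
    }
    return nr_pages;  /* pretend every requested page was pinned */
}

/*
 * Mock of the "unlocked" variant: it takes and releases the lock itself,
 * so it can replace a bare down_read / get_user_pages / up_read sequence.
 */
static long mock_gup_unlocked(struct mock_mm *mm, long nr_pages)
{
    int locked = 1;
    long ret;

    mock_down_read(mm);
    ret = mock_gup_locked(mm, nr_pages, &locked);
    if (locked)
        mock_up_read(mm);
    return ret;
}
```

    Whichever path is taken, the lock count is balanced on return, which is
    the property a caller of the unlocked form relies on.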
    
    Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
    Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com>
    Reviewed-by: Peter Feiner <pfeiner@google.com>
    Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>