    mmu-notifiers: add mm_take_all_locks() operation · 7906d00c
    Andrea Arcangeli authored
    
    
    mm_take_all_locks() holds off reclaim from an entire mm_struct.  This
    allows mmu notifiers to register with the mm at any time, with the
    guarantee that no mmu operation is in progress on the mm.
    
    This operation locks out the VM from every pte/vma/mm-related operation
    that could ever happen on a given mm.  This includes vmtruncate,
    try_to_unmap, and all page faults.
    
    The caller must take the mmap_sem in write mode before calling
    mm_take_all_locks().  The caller isn't allowed to release the mmap_sem
    until mm_drop_all_locks() returns.
    
    mmap_sem in write mode is required in order to block all operations that
    could modify pagetables and free pages without altering the vma layout
    (for example populate_range() with nonlinear vmas).  Write mode is also
    needed to prevent new anon_vmas from being associated with existing
    vmas.
    
    A task must not call mm_take_all_locks() a second time before calling
    mm_drop_all_locks(), or it will deadlock against itself.
    
    mm_take_all_locks() and mm_drop_all_locks() are expensive operations that
    may have to take thousands of locks.
    
    mm_take_all_locks() can fail if it is interrupted by a signal.
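
    One way an interruptible lock sweep can fail cleanly is to check for a
    pending signal before each acquisition and, on failure, release only the
    locks taken so far.  The sketch below is a hypothetical userspace model:
    model_signal_pending stands in for a real pending-signal check, the
    signal_at argument is a test hook, and returning -EINTR is this model's
    choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical userspace model of an interruptible "take all locks".
 * model_signal_pending stands in for a pending-signal check; signal_at is a
 * test hook that raises the "signal" just before lock number signal_at is
 * taken (pass -1 for no signal).
 */
#define MODEL_NR_LOCKS 8

static pthread_mutex_t model_locks[MODEL_NR_LOCKS];
static int model_signal_pending;

static void model_locks_init(void)
{
	for (int i = 0; i < MODEL_NR_LOCKS; i++)
		pthread_mutex_init(&model_locks[i], NULL);
}

static void model_drop_all_locks(void)
{
	for (int i = MODEL_NR_LOCKS - 1; i >= 0; i--)
		pthread_mutex_unlock(&model_locks[i]);
}

/* Returns 0 with every lock held, or -EINTR with none held. */
static int model_take_all_locks(int signal_at)
{
	int taken;

	model_signal_pending = 0;
	for (taken = 0; taken < MODEL_NR_LOCKS; taken++) {
		if (taken == signal_at)
			model_signal_pending = 1;
		if (model_signal_pending)
			goto unwind;
		pthread_mutex_lock(&model_locks[taken]);
	}
	return 0;

unwind:
	/* Release only the locks acquired so far, newest first. */
	while (taken-- > 0)
		pthread_mutex_unlock(&model_locks[taken]);
	return -EINTR;
}
```

    In this model a caller that sees -EINTR would release mmap_sem and
    return the error, leaving the mm exactly as it found it.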
    
    When mmu_notifier_register() returns, we must be sure that the driver is
    notified if some task is in the middle of a vmtruncate on the 'mm' where
    the mmu notifier was registered.  mmu_notifier_invalidate_range_start/end
    run around the truncation, but mmu_notifier_register() could otherwise
    run after mmu_notifier_invalidate_range_start() and before
    mmu_notifier_invalidate_range_end().  The rmap paths have the same
    problem.  We also have to remove page pinning to avoid replicating the
    tlb_gather logic inside KVM (and GRU doesn't work well with page pinning
    whether or not tlb_gather is needed).  So without mm_take_all_locks(),
    when vmtruncate frees the page, KVM would have no way to notice that it
    had mapped into its sptes a page that is going to the freelist, and no
    further mmu_notifier notification would arrive to tell it.
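
    The race can be reproduced in miniature.  The sketch below is a
    hypothetical userspace analogy, not KVM or kernel code: one mutex
    (model_i_mmap_lock) is assumed to be held across the start/end pair by
    the truncating thread, and model_register_thread() takes that same lock
    the way registration relies on mm_take_all_locks().  Because registration
    cannot slip in between the two callbacks, the driver sees either both or
    neither, never an "end" without a "start".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical model: one mutex stands in for a lock held by the truncating
 * thread between invalidate_range_start and invalidate_range_end.
 */
static pthread_mutex_t model_i_mmap_lock = PTHREAD_MUTEX_INITIALIZER;
static bool model_registered;        /* protected by model_i_mmap_lock */
static int model_starts, model_ends; /* callbacks seen by the "driver" */

static void *model_truncate(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&model_i_mmap_lock);
	if (model_registered)
		model_starts++;  /* invalidate_range_start */
	/* ... unmap ptes and free the page ... */
	if (model_registered)
		model_ends++;    /* invalidate_range_end */
	pthread_mutex_unlock(&model_i_mmap_lock);
	return NULL;
}

static void *model_register_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&model_i_mmap_lock);   /* stands in for mm_take_all_locks() */
	model_registered = true;
	pthread_mutex_unlock(&model_i_mmap_lock); /* stands in for mm_drop_all_locks() */
	return NULL;
}

/* Run truncation and registration concurrently; true if the callbacks balanced. */
static bool model_race_once(void)
{
	pthread_t t, r;

	model_registered = false;
	model_starts = model_ends = 0;
	pthread_create(&t, NULL, model_truncate, NULL);
	pthread_create(&r, NULL, model_register_thread, NULL);
	pthread_join(t, NULL);
	pthread_join(r, NULL);
	return model_registered && model_starts == model_ends;
}
```

    Dropping the lock calls from model_register_thread() reintroduces the bug
    described above: the truncating thread can then observe registration
    between its two checks and deliver an unmatched "end".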
    
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Andrea Arcangeli <andrea@qumranet.com>
    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Christoph Lameter <cl@linux-foundation.org>
    Cc: Jack Steiner <steiner@sgi.com>
    Cc: Robin Holt <holt@sgi.com>
    Cc: Nick Piggin <npiggin@suse.de>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Kanoj Sarcar <kanojsarcar@yahoo.com>
    Cc: Roland Dreier <rdreier@cisco.com>
    Cc: Steve Wise <swise@opengridcomputing.com>
    Cc: Avi Kivity <avi@qumranet.com>
    Cc: Hugh Dickins <hugh@veritas.com>
    Cc: Rusty Russell <rusty@rustcorp.com.au>
    Cc: Anthony Liguori <aliguori@us.ibm.com>
    Cc: Chris Wright <chrisw@redhat.com>
    Cc: Marcelo Tosatti <marcelo@kvack.org>
    Cc: Eric Dumazet <dada1@cosmosbay.com>
    Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
    Cc: Izik Eidus <izike@qumranet.com>
    Cc: Rik van Riel <riel@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>