    hugetlbfs: take read_lock on i_mmap for PMD sharing · 930668c3
    Waiman Long authored
    A customer with large SMP systems (up to 16 sockets) running an
    application that uses a large amount of static hugepages (~500-1500GB)
    is experiencing random multisecond delays.  These delays were caused by
    the long time it took to scan the VMA interval tree with mmap_sem held.
    
    The sharing of huge PMD does not require changes to the i_mmap at all.
    Therefore, we can just take the read lock and let other threads
    searching for the right VMA share it in parallel.  Once the right VMA is
    found, either the PMD lock (2M huge page for x86-64) or the
    mm->page_table_lock will be acquired to perform the actual PMD sharing.
    
    Lock contention, if present, will happen in the spinlock.  That is much
    better than contention in the rwsem, where the time needed to scan the
    interval tree is indeterminate.
    
    With this patch applied, the customer is seeing a significant
    performance improvement over the unpatched kernel.
    
    Link: http://lkml.kernel.org/r/20191107211809.9539-1-longman@redhat.com
    
    Signed-off-by: Waiman Long <longman@redhat.com>
    Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Will Deacon <will.deacon@arm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>