1. 05 Jul, 2019 1 commit
  2. 21 May, 2019 1 commit
  3. 14 May, 2019 2 commits
    • Mike Kravetz's avatar
      hugetlbfs: always use address space in inode for resv_map pointer · f27a5136
      Mike Kravetz authored
      Continuing discussion about 58b6e5e8 ("hugetlbfs: fix memory leak for
      resv_map") brought up the issue that inode->i_mapping may not point to the
      address space embedded within the inode at inode eviction time.  The
      hugetlbfs truncate routine handles this by explicitly using inode->i_data.
      However, code cleaning up the resv_map will still use the address space
      pointed to by inode->i_mapping.  Luckily, private_data is NULL for address
      spaces in all such cases today but, there is no guarantee this will
      Change all hugetlbfs code getting a resv_map pointer to explicitly get it
      from the address space embedded within the inode.  In addition, add more
      comments in the code to indicate why this is being done.
      Link: http://lkml.kernel.org/r/20190419204435.16984-1-mike.kravetz@oracle.comSigned-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Reported-by: default avatarYufen Yu <yuyufen@huawei.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Mike Kravetz's avatar
      hugetlb: use same fault hash key for shared and private mappings · 1b426bac
      Mike Kravetz authored
      hugetlb uses a fault mutex hash table to prevent page faults of the
      same pages concurrently.  The key for shared and private mappings is
      different.  Shared keys off address_space and file index.  Private keys
      off mm and virtual address.  Consider a private mappings of a populated
      hugetlbfs file.  A fault will map the page from the file and if needed
      do a COW to map a writable page.
      Hugetlbfs hole punch uses the fault mutex to prevent mappings of file
      pages.  It uses the address_space file index key.  However, private
      mappings will use a different key and could race with this code to map
      the file page.  This causes problems (BUG) for the page cache remove
      code as it expects the page to be unmapped.  A sample stack is:
      page dumped because: VM_BUG_ON_PAGE(page_mapped(page))
      kernel BUG at mm/filemap.c:169!
      RIP: 0010:unaccount_page_cache_page+0x1b8/0x200
      Call Trace:
      ? __add_to_page_cache_locked+0x162/0x380
      ? _cond_resched+0x15/0x30
      ? __inode_security_revalidate+0x5d/0x70
      ? selinux_file_permission+0x100/0x130
      There seems to be another potential COW issue/race with this approach
      of different private and shared keys as noted in commit 8382d914
      ("mm, hugetlb: improve page-fault scalability").
      Since every hugetlb mapping (even anon and private) is actually a file
      mapping, just use the address_space index key for all mappings.  This
      results in potentially more hash collisions.  However, this should not
      be the common case.
      Link: http://lkml.kernel.org/r/20190328234704.27083-3-mike.kravetz@oracle.com
      Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5
      Fixes: b5cec28d ("hugetlbfs: truncate_hugepages() takes a range of pages")
      Signed-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  4. 02 May, 2019 1 commit
  5. 06 Apr, 2019 1 commit
  6. 06 Mar, 2019 1 commit
    • Joel Fernandes (Google)'s avatar
      mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd · ab3948f5
      Joel Fernandes (Google) authored
      Android uses ashmem for sharing memory regions.  We are looking forward
      to migrating all usecases of ashmem to memfd so that we can possibly
      remove the ashmem driver in the future from staging while also
      benefiting from using memfd and contributing to it.  Note staging
      drivers are also not ABI and generally can be removed at anytime.
      One of the main usecases Android has is the ability to create a region
      and mmap it as writeable, then add protection against making any
      "future" writes while keeping the existing already mmap'ed
      writeable-region active.  This allows us to implement a usecase where
      receivers of the shared memory buffer can get a read-only view, while
      the sender continues to write to the buffer.  See CursorWindow
      documentation in Android for more details:
      This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
      To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
      which prevents any future mmap and write syscalls from succeeding while
      keeping the existing mmap active.
      A better way to do F_SEAL_FUTURE_WRITE seal was discussed [1] last week
      where we don't need to modify core VFS structures to get the same
      behavior of the seal.  This solves several side-effects pointed by Andy.
      self-tests are provided in later patch to verify the expected semantics.
      [1] https://lore.kernel.org/lkml/20181111173650.GA256781@google.com/
      Thanks a lot to Andy for suggestions to improve code.
      Link: http://lkml.kernel.org/r/20190112203816.85534-2-joel@joelfernandes.orgSigned-off-by: default avatarJoel Fernandes (Google) <joel@joelfernandes.org>
      Acked-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: Marc-Andr Lureau <marcandre.lureau@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  7. 01 Mar, 2019 1 commit
    • Mike Kravetz's avatar
      hugetlbfs: fix races and page leaks during migration · cb6acd01
      Mike Kravetz authored
      hugetlb pages should only be migrated if they are 'active'.  The
      routines set/clear_page_huge_active() modify the active state of hugetlb
      When a new hugetlb page is allocated at fault time, set_page_huge_active
      is called before the page is locked.  Therefore, another thread could
      race and migrate the page while it is being added to page table by the
      fault code.  This race is somewhat hard to trigger, but can be seen by
      strategically adding udelay to simulate worst case scheduling behavior.
      Depending on 'how' the code races, various BUG()s could be triggered.
      To address this issue, simply delay the set_page_huge_active call until
      after the page is successfully added to the page table.
      Hugetlb pages can also be leaked at migration time if the pages are
      associated with a file in an explicitly mounted hugetlbfs filesystem.
      For example, consider a two node system with 4GB worth of huge pages
      available.  A program mmaps a 2G file in a hugetlbfs filesystem.  It
      then migrates the pages associated with the file from one node to
      another.  When the program exits, huge page counts are as follows:
        1024    free_hugepages
        1024    nr_hugepages
        0       free_hugepages
        1024    nr_hugepages
        Filesystem                         Size  Used Avail Use% Mounted on
        nodev                              4.0G  2.0G  2.0G  50% /var/opt/hugepool
      That is as expected.  2G of huge pages are taken from the free_hugepages
      counts, and 2G is the size of the file in the explicitly mounted
      filesystem.  If the file is then removed, the counts become:
        1024    free_hugepages
        1024    nr_hugepages
        1024    free_hugepages
        1024    nr_hugepages
        Filesystem                         Size  Used Avail Use% Mounted on
        nodev                              4.0G  2.0G  2.0G  50% /var/opt/hugepool
      Note that the filesystem still shows 2G of pages used, while there
      actually are no huge pages in use.  The only way to 'fix' the filesystem
      accounting is to unmount the filesystem
      If a hugetlb page is associated with an explicitly mounted filesystem,
      this information in contained in the page_private field.  At migration
      time, this information is not preserved.  To fix, simply transfer
      page_private from old to new page at migration time if necessary.
      There is a related race with removing a huge page from a file and
      migration.  When a huge page is removed from the pagecache, the
      page_mapping() field is cleared, yet page_private remains set until the
      page is actually freed by free_huge_page().  A page could be migrated
      while in this state.  However, since page_mapping() is not set the
      hugetlbfs specific routine to transfer page_private is not called and we
      leak the page count in the filesystem.
      To fix that, check for this condition before migrating a huge page.  If
      the condition is detected, return EBUSY for the page.
      Link: http://lkml.kernel.org/r/74510272-7319-7372-9ea6-ec914734c179@oracle.com
      Link: http://lkml.kernel.org/r/20190212221400.3512-1-mike.kravetz@oracle.com
      Fixes: bcc54222 ("mm: hugetlb: introduce page_huge_active")
      Signed-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: <stable@vger.kernel.org>
      [mike.kravetz@oracle.com: v2]
        Link: http://lkml.kernel.org/r/7534d322-d782-8ac6-1c8d-a8dc380eb3ab@oracle.com
      [mike.kravetz@oracle.com: update comment and changelog]
        Link: http://lkml.kernel.org/r/420bcfd6-158b-38e4-98da-26d0cd85bd01@oracle.comSigned-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  8. 28 Feb, 2019 1 commit
  9. 09 Jan, 2019 1 commit
  10. 28 Dec, 2018 1 commit
    • Mike Kravetz's avatar
      hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race · c86aa7bb
      Mike Kravetz authored
      hugetlbfs page faults can race with truncate and hole punch operations.
      Current code in the page fault path attempts to handle this by 'backing
      out' operations if we encounter the race.  One obvious omission in the
      current code is removing a page newly added to the page cache.  This is
      pretty straight forward to address, but there is a more subtle and
      difficult issue of backing out hugetlb reservations.  To handle this
      correctly, the 'reservation state' before page allocation needs to be
      noted so that it can be properly backed out.  There are four distinct
      possibilities for reservation state: shared/reserved, shared/no-resv,
      private/reserved and private/no-resv.  Backing out a reservation may
      require memory allocation which could fail so that needs to be taken into
      account as well.
      Instead of writing the required complicated code for this rare occurrence,
      just eliminate the race.  i_mmap_rwsem is now held in read mode for the
      duration of page fault processing.  Hold i_mmap_rwsem longer in truncation
      and hold punch code to cover the call to remove_inode_hugepages.
      With this modification, code in remove_inode_hugepages checking for races
      becomes 'dead' as it can not longer happen.  Remove the dead code and
      expand comments to explain reasoning.  Similarly, checks for races with
      truncation in the page fault path can be simplified and removed.
      [mike.kravetz@oracle.com: incorporat suggestions from Kirill]
        Link: http://lkml.kernel.org/r/20181222223013.22193-3-mike.kravetz@oracle.com
      Link: http://lkml.kernel.org/r/20181218223557.5202-3-mike.kravetz@oracle.com
      Fixes: ebed4bfc ("hugetlb: fix absurd HugePages_Rsvd")
      Signed-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  11. 22 Aug, 2018 1 commit
  12. 27 Jul, 2018 1 commit
  13. 12 Jul, 2018 2 commits
  14. 06 Apr, 2018 1 commit
  15. 23 Mar, 2018 1 commit
  16. 01 Feb, 2018 2 commits
  17. 30 Nov, 2017 1 commit
    • Nadav Amit's avatar
      fs/hugetlbfs/inode.c: change put_page/unlock_page order in hugetlbfs_fallocate() · 72639e6d
      Nadav Amit authored
      hugetlfs_fallocate() currently performs put_page() before unlock_page().
      This scenario opens a small time window, from the time the page is added
      to the page cache, until it is unlocked, in which the page might be
      removed from the page-cache by another core.  If the page is removed
      during this time windows, it might cause a memory corruption, as the
      wrong page will be unlocked.
      It is arguable whether this scenario can happen in a real system, and
      there are several mitigating factors.  The issue was found by code
      inspection (actually grep), and not by actually triggering the flow.
      Yet, since putting the page before unlocking is incorrect it should be
      fixed, if only to prevent future breakage or someone copy-pasting this
      Mike said:
       "I am of the opinion that this does not need to be sent to stable.
        Although the ordering is current code is incorrect, there is no way
        for this to be a problem with current locking. In addition, I verified
        that the perhaps bigger issue with sys_fadvise64(POSIX_FADV_DONTNEED)
        for hugetlbfs and other filesystems is addressed in 3a77d214 ("mm:
        fadvise: avoid fadvise for fs without backing device")"
      Link: http://lkml.kernel.org/r/20170826191124.51642-1-namit@vmware.com
      Fixes: 70c3547e ("hugetlbfs: add hugetlbfs_fallocate()")
      Signed-off-by: default avatarNadav Amit <namit@vmware.com>
      Reviewed-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  18. 16 Nov, 2017 2 commits
  19. 03 Nov, 2017 1 commit
  20. 09 Sep, 2017 2 commits
    • Davidlohr Bueso's avatar
      lib/interval_tree: fast overlap detection · f808c13f
      Davidlohr Bueso authored
      Allow interval trees to quickly check for overlaps to avoid unnecesary
      tree lookups in interval_tree_iter_first().
      As of this patch, all interval tree flavors will require using a
      'rb_root_cached' such that we can have the leftmost node easily
      available.  While most users will make use of this feature, those with
      special functions (in addition to the generic insert, delete, search
      calls) will avoid using the cached option as they can do funky things
      with insertions -- for example, vma_interval_tree_insert_after().
      [jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()]
        Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com
      Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.netSigned-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Signed-off-by: default avatarJérôme Glisse <jglisse@redhat.com>
      Acked-by: default avatarChristian König <christian.koenig@amd.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarDoug Ledford <dledford@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Christian Benvenuti <benve@cisco.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    • Jérôme Glisse's avatar
      mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY · 2916ecc0
      Jérôme Glisse authored
      Introduce a new migration mode that allow to offload the copy to a device
      DMA engine.  This changes the workflow of migration and not all
      address_space migratepage callback can support this.
      This is intended to be use by migrate_vma() which itself is use for thing
      like HMM (see include/linux/hmm.h).
      No additional per-filesystem migratepage testing is needed.  I disables
      MIGRATE_SYNC_NO_COPY in all problematic migratepage() callback and i
      added comment in those to explain why (part of this patch).  The commit
      message is unclear it should say that any callback that wish to support
      this new mode need to be aware of the difference in the migration flow
      from other mode.
      Some of these callbacks do extra locking while copying (aio, zsmalloc,
      balloon, ...) and for DMA to be effective you want to copy multiple
      pages in one DMA operations.  But in the problematic case you can not
      easily hold the extra lock accross multiple call to this callback.
      Usual flow is:
      For each page {
       1 - lock page
       2 - call migratepage() callback
       3 - (extra locking in some migratepage() callback)
       4 - migrate page state (freeze refcount, update page cache, buffer
           head, ...)
       5 - copy page
       6 - (unlock any extra lock of migratepage() callback)
       7 - return from migratepage() callback
       8 - unlock page
      The new mode MIGRATE_SYNC_NO_COPY:
       1 - lock multiple pages
      For each page {
       2 - call migratepage() callback
       3 - abort in all problematic migratepage() callback
       4 - migrate page state (freeze refcount, update page cache, buffer
           head, ...)
      } // finished all calls to migratepage() callback
       5 - DMA copy multiple pages
       6 - unlock all the pages
      To support MIGRATE_SYNC_NO_COPY in the problematic case we would need a
      new callback migratepages() (for instance) that deals with multiple
      pages in one transaction.
      Because the problematic cases are not important for current usage I did
      not wanted to complexify this patchset even more for no good reason.
      Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.comSigned-off-by: default avatarJérôme Glisse <jglisse@redhat.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Nellans <dnellans@nvidia.com>
      Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mark Hairgrove <mhairgrove@nvidia.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Sherry Cheung <SCheung@nvidia.com>
      Cc: Subhash Gutti <sgutti@nvidia.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Bob Liu <liubo95@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  21. 07 Sep, 2017 3 commits
  22. 10 Jul, 2017 1 commit
  23. 06 Jul, 2017 1 commit
    • David Howells's avatar
      hugetlbfs: Implement show_options · 4a25220d
      David Howells authored
      Implement the show_options superblock op for hugetlbfs as part of a bid to
      get rid of s_options and generic_show_options() to make it easier to
      implement a context-based mount where the mount options can be passed
      individually over a file descriptor.
      Note that the uid and gid should possibly be displayed relative to the
      viewer's user namespace.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Nadia Yvette Chambers <nyc@holomorphy.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
  24. 19 Jun, 2017 1 commit
    • Hugh Dickins's avatar
      mm: larger stack guard gap, between vmas · 1be7107f
      Hugh Dickins authored
      Stack guard page is a useful feature to reduce a risk of stack smashing
      into a different mapping. We have been using a single page gap which
      is sufficient to prevent having stack adjacent to a different mapping.
      But this seems to be insufficient in the light of the stack usage in
      userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
      used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
      which is 256kB or stack strings with MAX_ARG_STRLEN.
      This will become especially dangerous for suid binaries and the default
      no limit for the stack size limit because those applications can be
      tricked to consume a large portion of the stack and a single glibc call
      could jump over the guard page. These attacks are not theoretical,
      Make those attacks less probable by increasing the stack guard gap
      to 1MB (on systems with 4k pages; but make it depend on the page size
      because systems with larger base pages might cap stack allocations in
      the PAGE_SIZE units) which should cover larger alloca() and VLA stack
      allocations. It is obviously not a full fix because the problem is
      somehow inherent, but it should reduce attack space a lot.
      One could argue that the gap size should be configurable from userspace,
      but that can be done later when somebody finds that the new 1MB is wrong
      for some special case applications.  For now, add a kernel command line
      option (stack_guard_gap) to specify the stack gap size (in page units).
      Implementation wise, first delete all the old code for stack guard page:
      because although we could get away with accounting one extra page in a
      stack vma, accounting a larger gap can break userspace - case in point,
      a program run with "ulimit -S -v 20000" failed when the 1MB gap was
      counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
      and strict non-overcommit mode.
      Instead of keeping gap inside the stack vma, maintain the stack guard
      gap as a gap between vmas: using vm_start_gap() in place of vm_start
      (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
      places which need to respect the gap - mainly arch_get_unmapped_area(),
      and and the vma tree's subtree_gap support for that.
      Original-patch-by: default avatarOleg Nesterov <oleg@redhat.com>
      Original-patch-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  25. 14 Apr, 2017 1 commit
    • Mike Kravetz's avatar
      hugetlbfs: fix offset overflow in hugetlbfs mmap · 045c7a3f
      Mike Kravetz authored
      If mmap() maps a file, it can be passed an offset into the file at which
      the mapping is to start.  Offset could be a negative value when
      represented as a loff_t.  The offset plus length will be used to update
      the file size (i_size) which is also a loff_t.
      Validate the value of offset and offset + length to make sure they do
      not overflow and appear as negative.
      Found by syzcaller with commit ff8c0c53 ("mm/hugetlb.c: don't call
      region_abort if region_chg fails") applied.  Prior to this commit, the
      overflow would still occur but we would luckily return ENOMEM.
      To reproduce:
         mmap(0, 0x2000, 0, 0x40021, 0xffffffffffffffffULL, 0x8000000000000000ULL);
      Resulted in,
        kernel BUG at mm/hugetlb.c:742!
        Call Trace:
      Fixes: ff8c0c53 ("mm/hugetlb.c: don't call region_abort if region_chg fails")
      Link: http://lkml.kernel.org/r/1491951118-30678-1-git-send-email-mike.kravetz@oracle.comReported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Acked-by: default avatarHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  26. 01 Apr, 2017 1 commit
    • Mike Kravetz's avatar
      hugetlbfs: initialize shared policy as part of inode allocation · 4742a35d
      Mike Kravetz authored
      Any time after inode allocation, destroy_inode can be called.  The
      hugetlbfs inode contains a shared_policy structure, and
      mpol_free_shared_policy is unconditionally called as part of
      hugetlbfs_destroy_inode.  Initialize the policy as part of inode
      allocation so that any quick (error path) calls to destroy_inode will be
      handed an initialized policy.
      syzkaller fuzzer found this bug, that resulted in the following:
          BUG: KASAN: user-memory-access in atomic_inc
          include/asm-generic/atomic-instrumented.h:87 [inline] at addr
          BUG: KASAN: user-memory-access in __lock_acquire+0x21a/0x3a80
          kernel/locking/lockdep.c:3239 at addr 000000131730bd7a
          Write of size 4 by task syz-executor6/14086
          CPU: 3 PID: 14086 Comm: syz-executor6 Not tainted 4.11.0-rc3+ #364
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Call Trace:
           atomic_inc include/asm-generic/atomic-instrumented.h:87 [inline]
           __lock_acquire+0x21a/0x3a80 kernel/locking/lockdep.c:3239
           lock_acquire+0x1ee/0x590 kernel/locking/lockdep.c:3762
           __raw_write_lock include/linux/rwlock_api_smp.h:210 [inline]
           _raw_write_lock+0x33/0x50 kernel/locking/spinlock.c:295
           mpol_free_shared_policy+0x43/0xb0 mm/mempolicy.c:2536
           hugetlbfs_destroy_inode+0xca/0x120 fs/hugetlbfs/inode.c:952
           alloc_inode+0x10d/0x180 fs/inode.c:216
           new_inode_pseudo+0x69/0x190 fs/inode.c:889
           new_inode+0x1c/0x40 fs/inode.c:918
           hugetlbfs_get_inode+0x40/0x420 fs/hugetlbfs/inode.c:734
           hugetlb_file_setup+0x329/0x9f0 fs/hugetlbfs/inode.c:1282
           newseg+0x422/0xd30 ipc/shm.c:575
           ipcget_new ipc/util.c:285 [inline]
           ipcget+0x21e/0x580 ipc/util.c:639
           SYSC_shmget ipc/shm.c:673 [inline]
           SyS_shmget+0x158/0x230 ipc/shm.c:657
      Analysis provided by Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Link: http://lkml.kernel.org/r/1490477850-7944-1-git-send-email-mike.kravetz@oracle.comSigned-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Acked-by: default avatarHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  27. 02 Mar, 2017 1 commit
  28. 24 Dec, 2016 1 commit
  29. 08 Oct, 2016 1 commit
  30. 28 Sep, 2016 1 commit
  31. 27 Sep, 2016 2 commits
  32. 22 Sep, 2016 1 commit