1. 06 Jun, 2014 2 commits
  2. 28 Jan, 2014 3 commits
  3. 22 Nov, 2013 2 commits
    • ipc,shm: correct error return value in shmctl (SHM_UNLOCK) · 3a72660b
      Jesper Nilsson authored
      Commit 2caacaa8 ("ipc,shm: shorten critical region for shmctl")
      restructured the ipc shm to shorten critical region, but introduced a
      path where the return value could be -EPERM, even if the operation
      actually was performed.
      
      Before the commit, the err return value was reset by the return value
      from security_shm_shmctl() after the if (!ns_capable(...)) statement.
      
      Now, we still exit the if statement with err set to -EPERM, and in the
      case of SHM_UNLOCK, it is not reset at all, and used as the return value
      from shmctl.
      
      To fix this, we only set err when errors occur, leaving the fallthrough
      case alone.
      Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>	[3.12.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
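
      The control-flow problem described above can be sketched in a small
      userspace model.  This is only an illustration, not the kernel code:
      struct caller and both function names are hypothetical, and the two
      booleans stand in for the ns_capable() test and the ownership test.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the checks made in shmctl(). */
struct caller {
    bool has_ipc_lock_cap;  /* models the ns_capable(...) result */
    bool owns_segment;      /* models the uid comparison */
};

/* Buggy shape: err is preset to -EPERM and never cleared on success. */
int shm_unlock_buggy(const struct caller *c)
{
    int err = 0;

    assert(c != NULL);
    if (!c->has_ipc_lock_cap) {
        err = -EPERM;
        if (!c->owns_segment)
            goto out;
    }
    /* ... the unlock would be performed here ... */
out:
    return err;
}

/* Fixed shape: err is set only on the path that really fails. */
int shm_unlock_fixed(const struct caller *c)
{
    int err = 0;

    assert(c != NULL);
    if (!c->has_ipc_lock_cap) {
        if (!c->owns_segment) {
            err = -EPERM;
            goto out;
        }
    }
    /* ... the unlock would be performed here ... */
out:
    return err;
}
```

      For an owner without the capability, shm_unlock_buggy() reports
      -EPERM although it reached the unlock step, while shm_unlock_fixed()
      returns 0.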
    • ipc,shm: fix shm_file deletion races · a399b29d
      Greg Thelen authored
      When IPC_RMID races with other shm operations there's potential for
      use-after-free of the shm object's associated file (shm_file).
      
      Here's the race before this patch:
      
        TASK 1                     TASK 2
        ------                     ------
        shm_rmid()
          ipc_lock_object()
                                   shmctl()
                                   shp = shm_obtain_object_check()
      
          shm_destroy()
            shm_unlock()
            fput(shp->shm_file)
                                   ipc_lock_object()
                                   shmem_lock(shp->shm_file)
                                   <OOPS>
      
      The oops is caused because shm_destroy() calls fput() after dropping the
      ipc_lock.  fput() clears the file's f_inode, f_path.dentry, and
      f_path.mnt, which causes various NULL pointer references in task 2.  I
      reliably see the oops in task 2 if with shmlock, shmu
      
      This patch fixes the races by:
      1) setting shm_file=NULL in shm_destroy() while holding the ipc object
         lock (taken with ipc_lock_object()).
      2) modifying at-risk operations to check shm_file while holding the
         same lock.
      
      Example workloads, which each trigger oops...
      
      Workload 1:
        while true; do
          id=$(shmget 1 4096)
          shm_rmid $id &
          shmlock $id &
          wait
        done
      
        The oops stack shows accessing NULL f_inode due to racing fput:
          _raw_spin_lock
          shmem_lock
          SyS_shmctl
      
      Workload 2:
        while true; do
          id=$(shmget 1 4096)
          shmat $id 4096 &
          shm_rmid $id &
          wait
        done
      
        The oops stack is similar to workload 1 due to NULL f_inode:
          touch_atime
          shmem_mmap
          shm_mmap
          mmap_region
          do_mmap_pgoff
          do_shmat
          SyS_shmat
      
      Workload 3:
        while true; do
          id=$(shmget 1 4096)
          shmlock $id
          shm_rmid $id &
          shmunlock $id &
          wait
        done
      
        The oops stack shows a second fput() tripping on a NULL f_inode.  The
        first fput() completed from shm_destroy(), but a racing thread did
        a get_file() and queued this fput():
          locks_remove_flock
          __fput
          ____fput
          task_work_run
          do_notify_resume
          int_signal
      
      Fixes: c2c737a0 ("ipc,shm: shorten critical region for shmat")
      Fixes: 2caacaa8 ("ipc,shm: shorten critical region for shmctl")
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>  # 3.10.17+ 3.11.6+
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
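
      The two-step fix (clear the pointer under the lock, check it under the
      same lock) can be sketched in userspace roughly as follows.  All names
      here are hypothetical; a C11 atomic_flag spinlock stands in for the ipc
      object lock, and returning -EIDRM for a segment being torn down is an
      assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct file_model { int locked; };

struct shm_model {
    atomic_flag lock;         /* plays the role of the ipc object lock */
    struct file_model *file;  /* plays the role of shp->shm_file */
};

static void model_lock(struct shm_model *shp)
{
    while (atomic_flag_test_and_set(&shp->lock))
        ;  /* spin until the flag was previously clear */
}

static void model_unlock(struct shm_model *shp)
{
    atomic_flag_clear(&shp->lock);
}

/* Removal path: detach the file pointer while the lock is held. */
struct file_model *shm_model_destroy(struct shm_model *shp)
{
    struct file_model *f;

    assert(shp != NULL);
    model_lock(shp);
    f = shp->file;
    shp->file = NULL;
    model_unlock(shp);
    return f;  /* the caller releases it, as fput() would */
}

/* At-risk operation: look at the pointer only under the same lock. */
int shm_model_lock_pages(struct shm_model *shp)
{
    int err = 0;

    assert(shp != NULL);
    model_lock(shp);
    if (shp->file == NULL)
        err = -EIDRM;  /* segment is being torn down */
    else
        shp->file->locked = 1;
    model_unlock(shp);
    return err;
}
```

      Once shm_model_destroy() has run, a later shm_model_lock_pages() sees
      NULL and backs out instead of touching a released file.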
  4. 24 Sep, 2013 1 commit
    • ipc: fix race with LSMs · 53dad6d3
      Davidlohr Bueso authored
      Currently, IPC mechanisms do security and auditing related checks under
      RCU.  However, since security modules can free the security structure,
      for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
      race if the structure is freed before other tasks are done with it,
      creating a use-after-free condition.  Manfred illustrates this nicely,
      for instance with shared mem and selinux:
      
       -> do_shmat calls rcu_read_lock()
       -> do_shmat calls shm_object_check().
           Checks that the object is still valid - but doesn't acquire any locks.
           Then it returns.
       -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
       -> selinux_shm_shmat calls ipc_has_perm()
       -> ipc_has_perm accesses ipc_perms->security
      
      shm_close()
       -> shm_close acquires rw_mutex & shm_lock
       -> shm_close calls shm_destroy
       -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
       -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
       -> ipc_free_security calls kfree(ipc_perms->security)
      
      This patch delays the freeing of the security structures until after all RCU
      readers are done.  Furthermore it aligns the security life cycle with
      that of the rest of IPC - freeing them based on the reference counter.
      For situations where we need not free security, the current behavior is
      kept.  Linus states:
      
       "... the old behavior was suspect for another reason too: having the
        security blob go away from under a user sounds like it could cause
        various other problems anyway, so I think the old code was at least
        _prone_ to bugs even if it didn't have catastrophic behavior."
      
      I have tested this patch with IPC testcases from LTP on both my
      quad-core laptop and on a 64 core NUMA server.  In both cases selinux is
      enabled, and tests pass for both voluntary and forced preemption models.
      While the mentioned races are theoretical (at least no one has reported
      them), I wanted to make sure that this new logic doesn't break anything
      we weren't aware of.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
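
      The idea of tying the security blob's lifetime to the reference counter
      can be sketched as below.  This is a userspace model with hypothetical
      names; it shows only the reference-counting part and does not model the
      RCU grace period that the kernel additionally waits for.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct perm_model {
    atomic_int refcount;
    void *security;  /* plays the role of ipc_perms->security */
};

/* Create an object holding one reference and a security blob. */
struct perm_model *perm_model_new(void)
{
    struct perm_model *p = malloc(sizeof *p);

    if (p == NULL)
        return NULL;
    p->security = malloc(16);
    if (p->security == NULL) {
        free(p);
        return NULL;
    }
    atomic_init(&p->refcount, 1);
    return p;
}

/* A reader takes a reference before using p->security. */
void perm_model_get(struct perm_model *p)
{
    assert(p != NULL);
    atomic_fetch_add(&p->refcount, 1);
}

/*
 * Drop a reference.  The blob is freed together with the object, and only
 * by whoever drops the last reference.  Returns 1 if this call freed it.
 */
int perm_model_put(struct perm_model *p)
{
    assert(p != NULL);
    if (atomic_fetch_sub(&p->refcount, 1) != 1)
        return 0;
    free(p->security);
    free(p);
    return 1;
}
```

      With this shape, a destroy path that drops its reference while a reader
      still holds one leaves the security blob alive until the reader's own
      perm_model_put().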
  5. 11 Sep, 2013 10 commits
  6. 09 Jul, 2013 4 commits
  7. 09 May, 2013 1 commit
  8. 08 May, 2013 1 commit
  9. 01 May, 2013 1 commit
    • ipc: sysv shared memory limited to 8TiB · d69f3bad
      Robin Holt authored
      While running an application that tried to place data in half of
      memory using shmget(), we found that a shmall value below 8EiB-8TiB
      prevented us from using anything more than 8TiB.  Setting
      kernel.shmall greater than 8EiB-8TiB made the job work.
      
      In the newseg() function, ns->shm_tot, which counts pages, reaches
      INT_MAX at 8TiB.
      
      ipc/shm.c:
        static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
        {
      ...
                int numpages = (size + PAGE_SIZE -1) >> PAGE_SHIFT;
      ...
                if (ns->shm_tot + numpages > ns->shm_ctlall)
                        return -ENOSPC;
      
      [akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int]
      Signed-off-by: Robin Holt <holt@sgi.com>
      Reported-by: Alex Thorlton <athorlton@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
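
      The 8TiB figure follows from the page arithmetic, which the sketch
      below works through.  It assumes 4KiB pages and a 32-bit int; the
      EX_ macros and function names are hypothetical and are not taken from
      the kernel.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12  /* assumption: 4KiB pages */
#define EX_PAGE_SIZE  (UINT64_C(1) << EX_PAGE_SHIFT)

/* Round a byte size up to whole pages, using a 64-bit type throughout. */
uint64_t ex_pages_for(uint64_t size)
{
    assert(size <= UINT64_MAX - (EX_PAGE_SIZE - 1));
    return (size + EX_PAGE_SIZE - 1) >> EX_PAGE_SHIFT;
}

/* Nonzero when a page count no longer fits in an int counter. */
int ex_exceeds_int(uint64_t pages)
{
    return pages > (uint64_t)INT_MAX;
}
```

      For 8TiB (2^43 bytes), ex_pages_for() gives 2^31 pages, one more than
      a 32-bit INT_MAX, so an int page counter cannot represent it.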
  10. 24 Feb, 2013 2 commits
  11. 23 Feb, 2013 2 commits
  12. 12 Dec, 2012 1 commit
    • mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB · 42d7395f
      Andi Kleen authored
      There was some desire in large applications using MAP_HUGETLB or
      SHM_HUGETLB to use 1GB huge pages on some mappings, and stay with 2MB on
      others.  This is useful together with NUMA policy: use 2MB interleaving
      on some mappings, but 1GB on local mappings.
      
      This patch extends the IPC/SHM syscall interfaces slightly to allow
      specifying the page size.
      
      It borrows some upper bits in the existing flag arguments and allows
      encoding the base-2 log of the desired page size in addition to the
      *_HUGETLB flag.  When 0 is specified the default size is used, which
      keeps the change fully compatible.
      
      Extending the internal hugetlb code to handle this is straightforward.
      Instead of a single mount it just keeps an array of them and selects the
      right mount based on the specified page size.  When no page size is
      specified it uses the mount of the default page size.
      
      The change is not visible in /proc/mounts because internal mounts don't
      appear there.  It also has very little overhead: the additional mounts
      just consume a super block, but not more memory when not used.
      
      I also exported the new flags to the user headers (they were previously
      under __KERNEL__).  Right now only symbols for x86 and some other
      architecture for 1GB and 2MB are defined.  The interface should already
      work for all other architectures though.  Only architectures that define
      multiple hugetlb sizes actually need it (that is currently x86, tile,
      powerpc).  However tile and powerpc have user configurable hugetlb
      sizes, so it's not easy to add defines.  A program on those
      architectures would need to query sysfs and use the appropriate log2.
      
      [akpm@linux-foundation.org: cleanups]
      [rientjes@google.com: fix build]
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
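
      The flag encoding described above can be sketched as follows.  The
      EX_ macros and helper names are hypothetical; the shift of 26 and the
      6-bit mask are meant to mirror the kernel's MAP_HUGE_SHIFT and
      MAP_HUGE_MASK as I understand them, so treat the exact values as an
      assumption and check your own headers.

```c
#include <assert.h>

#define EX_HUGE_SHIFT 26     /* assumed to mirror MAP_HUGE_SHIFT */
#define EX_HUGE_MASK  0x3fU  /* assumed to mirror MAP_HUGE_MASK */

/* Place log2(page size) in the upper flag bits; 0 means "default size". */
unsigned int ex_huge_encode(unsigned int log2_size)
{
    assert(log2_size <= EX_HUGE_MASK);
    return log2_size << EX_HUGE_SHIFT;
}

/* Recover log2(page size) from a flags word; other bits are ignored. */
unsigned int ex_huge_decode(unsigned int flags)
{
    return (flags >> EX_HUGE_SHIFT) & EX_HUGE_MASK;
}
```

      A caller would OR ex_huge_encode(21) into its flags for 2MB pages or
      ex_huge_encode(30) for 1GB pages; flags that carry no size bits decode
      to 0, which selects the default size.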
  13. 07 Sep, 2012 1 commit
  14. 31 Jul, 2012 1 commit
  15. 07 Jun, 2012 1 commit
  16. 01 Jun, 2012 2 commits
  17. 22 Mar, 2012 1 commit
  18. 23 Jan, 2012 2 commits
    • SHM_UNLOCK: fix Unevictable pages stranded after swap · 24513264
      Hugh Dickins authored
      Commit cc39c6a9 ("mm: account skipped entries to avoid looping in
      find_get_pages") correctly fixed an infinite loop; but left a problem
      that find_get_pages() on shmem would return 0 (appearing to callers to
      mean end of tree) when it meets a run of nr_pages swap entries.
      
      The only uses of find_get_pages() on shmem are via pagevec_lookup(),
      called from invalidate_mapping_pages(), and from shmctl SHM_UNLOCK's
      scan_mapping_unevictable_pages().  The first is already commented, and
      not worth worrying about; but the second can leave pages on the
      Unevictable list after an unusual sequence of swapping and locking.
      
      Fix that by using shmem_find_get_pages_and_swap() (then ignoring the
      swap) instead of pagevec_lookup().
      
      But I don't want to contaminate vmscan.c with shmem internals, nor
      shmem.c with LRU locking.  So move scan_mapping_unevictable_pages() into
      shmem.c, renaming it shmem_unlock_mapping(); and rename
      check_move_unevictable_page() to check_move_unevictable_pages(), looping
      down an array of pages, oftentimes under the same lock.
      
      Leave out the "rotate unevictable list" block: that's a leftover from
      when this was used for /proc/sys/vm/scan_unevictable_pages, whose flawed
      handling involved looking at pages at tail of LRU.
      
      Was there significance to the sequence first ClearPageUnevictable, then
      test page_evictable, then SetPageUnevictable here? I think not, we're
      under LRU lock, and have no barriers between those.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: <stable@vger.kernel.org> [back to 3.1 but will need respins]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
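
      Why a pages-only lookup can end a scan early is easier to see in a toy
      model.  The sketch below is not the kernel's radix-tree code: the slot
      array, the batch size EX_NR and all function names are hypothetical.
      It compares a lookup that reports only pages with one that reports
      swap entries too and lets the caller ignore them.

```c
#include <assert.h>
#include <stddef.h>

#define EX_NR 2  /* hypothetical number of entries examined per lookup */

enum ex_slot { EX_PAGE, EX_SWAP };

/* Pages-only lookup: examines up to EX_NR slots, reports pages only. */
static size_t ex_lookup_pages_only(const enum ex_slot *slots, size_t n,
                                   size_t *index, size_t *out)
{
    size_t found = 0, examined = 0;

    while (*index < n && examined < EX_NR) {
        if (slots[*index] == EX_PAGE)
            out[found++] = *index;
        examined++;
        (*index)++;
    }
    return found;  /* 0 for a run of EX_NR swap entries, not just at the end */
}

/* Lookup that reports swap entries too, so 0 really means "end". */
static size_t ex_lookup_with_swap(const enum ex_slot *slots, size_t n,
                                  size_t *index, size_t *out)
{
    size_t got = 0;

    (void)slots;
    while (*index < n && got < EX_NR)
        out[got++] = (*index)++;
    return got;
}

/* Count the pages a caller visits when it stops at the first 0 return. */
size_t ex_scan(const enum ex_slot *slots, size_t n, int with_swap)
{
    size_t index = 0, pages = 0, out[EX_NR], got, i;

    assert(slots != NULL || n == 0);
    for (;;) {
        got = with_swap ? ex_lookup_with_swap(slots, n, &index, out)
                        : ex_lookup_pages_only(slots, n, &index, out);
        if (got == 0)
            break;
        for (i = 0; i < got; i++)
            if (slots[out[i]] == EX_PAGE)  /* swap entries are ignored */
                pages++;
    }
    return pages;
}
```

      On a layout of one page, three swap entries, then two pages, the
      pages-only scan stops after the first page, while the scan that also
      sees swap entries reaches all three pages.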
    • SHM_UNLOCK: fix long unpreemptible section · 85046579
      Hugh Dickins authored
      scan_mapping_unevictable_pages() is used to make SysV SHM_LOCKed pages
      evictable again once the shared memory is unlocked.  It does this with
      pagevec_lookup()s across the whole object (which might occupy most of
      memory), and takes 300ms to unlock 7GB here.  A cond_resched() every
      PAGEVEC_SIZE pages would be good.
      
      However, KOSAKI-san points out that this is called under shmem.c's
      info->lock, and it's also under shm.c's shm_lock(), both spinlocks.
      There is no strong reason for that: we need to take these pages off the
      unevictable list soonish, but those locks are not required for it.
      
      So move the call to scan_mapping_unevictable_pages() from shmem.c's
      unlock handling up to shm.c's unlock handling.  Remove the recently
      added barrier, not needed now we have spin_unlock() before the scan.
      
      Use get_file(), with subsequent fput(), to make sure we have a reference
      to mapping throughout scan_mapping_unevictable_pages(): that's something
      that was previously guaranteed by the shm_lock().
      
      Remove shmctl's lru_add_drain_all(): we don't fault in pages at SHM_LOCK
      time, and we lazily discover them to be Unevictable later, so it serves
      no purpose for SHM_LOCK; and serves no purpose for SHM_UNLOCK, since
      pages still on pagevec are not marked Unevictable.
      
      The original code avoided redundant rescans by checking VM_LOCKED flag
      at its level: now avoid them by checking shp's SHM_LOCKED.
      
      The original code called scan_mapping_unevictable_pages() on a locked
      area at shm_destroy() time: perhaps we once had accounting cross-checks
      which required that, but not now, so skip the overhead and just let
      inode eviction deal with them.
      
      Put check_move_unevictable_page() and scan_mapping_unevictable_pages()
      under CONFIG_SHMEM (with stub for the TINY case when ramfs is used),
      more as comment than to save space; comment them used for SHM_UNLOCK.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michel Lespinasse <walken@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
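
      The "yield every batch" pattern that the message asks for can be
      sketched in userspace as follows.  The names are hypothetical, the
      batch size of 14 is an assumed stand-in for PAGEVEC_SIZE, and the
      callback plays the role of cond_resched(); the sketch assumes it is
      called with no spinlock held, which is the point of moving the scan.

```c
#include <assert.h>
#include <stddef.h>

#define EX_BATCH 14  /* assumed stand-in for PAGEVEC_SIZE */

typedef void (*ex_yield_fn)(void *ctx);

/* Example yield hook: counts how often the scan offered to reschedule. */
void ex_count_yield(void *ctx)
{
    ++*(size_t *)ctx;
}

/*
 * Clear the "unevictable" mark on every page, one batch at a time, and
 * call the yield hook after each batch.  Returns the number of batches.
 */
size_t ex_unlock_scan(unsigned char *unevictable, size_t npages,
                      ex_yield_fn yield, void *ctx)
{
    size_t batches = 0, start, end, i;

    assert(unevictable != NULL || npages == 0);
    for (start = 0; start < npages; start = end) {
        end = npages - start > EX_BATCH ? start + EX_BATCH : npages;
        for (i = start; i < end; i++)
            unevictable[i] = 0;
        batches++;
        if (yield != NULL)
            yield(ctx);  /* stands in for cond_resched() */
    }
    return batches;
}
```

      Scanning 30 pages this way takes three batches and offers three
      chances to reschedule, instead of one long uninterrupted pass.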
  19. 05 Aug, 2011 1 commit
  20. 04 Aug, 2011 1 commit