1. 08 Aug, 2014 1 commit
    • mm: allow drivers to prevent new writable mappings · 4bb5f5d9
      David Herrmann authored
      This patch (of 6):
      
      The i_mmap_writable field counts existing writable mappings of an
      address_space.  To allow drivers to prevent new writable mappings, make
      this counter signed and prevent new writable mappings if it is negative.
      This is modelled after i_writecount and DENYWRITE.
      
      This will be required by the shmem-sealing infrastructure to prevent any
      new writable mappings after the WRITE seal has been set.  In case there
      exists a writable mapping, this operation will fail with EBUSY.
      
      Note that we rely on the fact that, iff you already own a writable
      mapping, you can increase the counter without using the helpers.
      This is the same as we do for i_writecount.
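
      A minimal sketch of the counter helpers described above (simplified;
      names follow the kernel's mapping_* convention and the error codes
      follow the description, not necessarily the final patch):

          /* New writable mapping: fails if writes are currently denied. */
          static inline int mapping_map_writable(struct address_space *mapping)
          {
          	return atomic_inc_unless_negative(&mapping->i_mmap_writable) ?
          		0 : -EPERM;
          }

          static inline void mapping_unmap_writable(struct address_space *mapping)
          {
          	atomic_dec(&mapping->i_mmap_writable);
          }

          /* Deny new writable mappings: fails with -EBUSY if one exists. */
          static inline int mapping_deny_writable(struct address_space *mapping)
          {
          	return atomic_dec_unless_positive(&mapping->i_mmap_writable) ?
          		0 : -EBUSY;
          }

          static inline void mapping_allow_writable(struct address_space *mapping)
          {
          	atomic_inc(&mapping->i_mmap_writable);
          }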
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Ryan Lortie <desrt@desrt.ca>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Daniel Mack <zonque@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 16 Jul, 2014 1 commit
    • sched: Remove proliferation of wait_on_bit() action functions · 74316201
      NeilBrown authored
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{_lock} does not return any specific error code in the
       event of a signal, so the caller must check for non-zero and
       supply their own error code as appropriate (see the sketch below).
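
      A before/after sketch of a caller (the flag word, bit name and action
      function here are illustrative, not taken from the patch):

          unsigned long flags;
          #define MY_FLAG_BUSY 0

          /* Before: every caller passed an explicit action function. */
          err = wait_on_bit_action(&flags, MY_FLAG_BUSY, my_wait_action,
          			 TASK_INTERRUPTIBLE);

          /* After: the common cases need no action function at all. */
          err = wait_on_bit(&flags, MY_FLAG_BUSY, TASK_INTERRUPTIBLE);
          if (err)
          	err = -EINTR;	/* non-zero only says a signal arrived */

          /* I/O waits use the io_schedule()-based variant. */
          err = wait_on_bit_io(&flags, MY_FLAG_BUSY, TASK_UNINTERRUPTIBLE);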
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible".
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps' (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all; instead something higher in the stack
      will.  So the distinction will still be visible, only with
      different function names (gfs2_glock_wait versus
      gfs2_glock_dq_wait in the gfs2/glock.c case).
      
      Since the first version of this patch (against 3.15) two new action
      functions appeared, one in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer aware
      schedule call as NFS.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 10 Jun, 2014 1 commit
    • fs,userns: Change inode_capable to capable_wrt_inode_uidgid · 23adbe12
      Andy Lutomirski authored
      The kernel has no concept of capabilities with respect to inodes; inodes
      exist independently of namespaces.  For example, inode_capable(inode,
      CAP_LINUX_IMMUTABLE) would be nonsense.
      
      This patch changes inode_capable to check for uid and gid mappings and
      renames it to capable_wrt_inode_uidgid, which should make it more
      obvious what it does.
      
      Fixes CVE-2014-4014.
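
      A sketch of the renamed check as described (a capability in the
      caller's user namespace, plus uid and gid mappings into it):

          bool capable_wrt_inode_uidgid(const struct inode *inode, int cap)
          {
          	struct user_namespace *ns = current_user_ns();

          	return ns_capable(ns, cap) &&
          	       kuid_has_mapping(ns, inode->i_uid) &&
          	       kgid_has_mapping(ns, inode->i_gid);
          }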
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 06 Jun, 2014 1 commit
  5. 03 Apr, 2014 1 commit
    • mm + fs: store shadow entries in page cache · 91b0abe3
      Johannes Weiner authored
      Reclaim will be leaving shadow entries in the page cache radix tree upon
      evicting the real page.  As those pages are found from the LRU, an
      iput() can lead to the inode being freed concurrently.  At this point,
      reclaim must no longer install shadow pages because the inode freeing
      code needs to ensure the page tree is really empty.
      
      Add an address_space flag, AS_EXITING, that the inode freeing code sets
      under the tree lock before doing the final truncate.  Reclaim will check
      for this flag before installing shadow pages.
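
      A sketch of the flag helpers this implies (AS_EXITING being a new bit
      in mapping->flags; the reclaim-side check below is illustrative):

          static inline void mapping_set_exiting(struct address_space *mapping)
          {
          	set_bit(AS_EXITING, &mapping->flags);
          }

          static inline int mapping_exiting(struct address_space *mapping)
          {
          	return test_bit(AS_EXITING, &mapping->flags);
          }

          /* Reclaim: skip shadow installation once the inode is exiting. */
          if (!mapping_exiting(mapping))
          	install_shadow_entry(mapping, index, shadow);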
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 01 Apr, 2014 1 commit
  7. 24 Mar, 2014 1 commit
  8. 09 Nov, 2013 5 commits
  9. 10 Sep, 2013 5 commits
    • fs: convert inode and dentry shrinking to be node aware · 9b17c623
      Dave Chinner authored
      Now that the shrinker is passing a node in the scan control structure, we
      can pass this to the generic LRU list code to isolate reclaim to the
      lists on matching nodes.
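
      A sketch of what node-aware counting looks like on the superblock side
      (callback shape per the count/scan API from this series; simplified):

          static unsigned long super_cache_count(struct shrinker *shrink,
          				       struct shrink_control *sc)
          {
          	struct super_block *sb =
          		container_of(shrink, struct super_block, s_shrink);

          	/* Only count objects on the node being reclaimed. */
          	return list_lru_count_node(&sb->s_dentry_lru, sc->nid) +
          	       list_lru_count_node(&sb->s_inode_lru, sc->nid);
          }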
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • inode: move inode to a different list inside lock · d38fa698
      Glauber Costa authored
      Today, removing an element from the LRU is done after the lock is
      released. This is a clear mistake, although we are not sure whether the
      bugs we are seeing are related to it. All other list manipulations are
      done inside the lock, and so should this one be.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Tested-by: Michal Hocko <mhocko@suse.cz>
      Cc: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • inode: convert inode lru list to generic lru list code. · bc3b14cb
      Dave Chinner authored
      [glommer@openvz.org: adapted for new LRU return codes]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • shrinker: convert superblock shrinkers to new API · 0a234c6d
      Dave Chinner authored
      Convert superblock shrinker to use the new count/scan API, and propagate
      the API changes through to the filesystem callouts.  The filesystem
      callouts already use a count/scan API, so it's just changing counters to
      longs to match the VM API.
      
      This requires the dentry and inode shrinker callouts to be converted to
      the count/scan API.  This is mainly a mechanical change.
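
      A sketch of the new API shape (the old single ->shrink callback splits
      into a cheap count and a bounded scan; the names and helpers here are
      illustrative):

          static unsigned long example_count(struct shrinker *shrink,
          				   struct shrink_control *sc)
          {
          	/* Cheap, approximate: how many objects could be freed. */
          	return atomic_long_read(&nr_cached_objects);
          }

          static unsigned long example_scan(struct shrinker *shrink,
          				  struct shrink_control *sc)
          {
          	/* Free up to sc->nr_to_scan objects; report how many died. */
          	return prune_cached_objects(sc->nr_to_scan);
          }

          static struct shrinker example_shrinker = {
          	.count_objects	= example_count,
          	.scan_objects	= example_scan,
          	.seeks		= DEFAULT_SEEKS,
          };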
      
      [glommer@openvz.org: use mult_frac for fractional proportions, build fixes]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: bump inode and dentry counters to long · 3942c07c
      Glauber Costa authored
      This series reworks our current object cache shrinking infrastructure in
      two main ways:
      
       * Noticing that a lot of users copy and paste their own version of LRU
         lists for objects, we put some effort in providing a generic version.
         It is modeled after the filesystem users: dentries, inodes, and xfs
         (for various tasks), but we expect that other users could benefit in
         the near future with little or no modification.  Let us know if you
         have any issues.
      
       * The underlying list_lru being proposed automatically and
         transparently keeps the elements in per-node lists, and is able to
         manipulate the node lists individually.  Given this infrastructure, we
         are able to convert the up-to-now hammer called shrink_slab to do
         node-targeted reclaim instead of always scanning memory from all over
         as it has been doing.
      
      Per-node lru lists are also expected to lead to less contention in the lru
      locks on multi-node scans, since we are now no longer fighting for a
      global lock.  The locks usually disappear from the profilers with this
      change.
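
      A sketch of the list_lru pattern described above (the isolate callback
      and the counters here are illustrative):

          struct list_lru inode_lru;	/* keeps one list per NUMA node */

          list_lru_init(&inode_lru);

          /* Add/del return true only if the element actually moved. */
          if (list_lru_add(&inode_lru, &inode->i_lru))
          	nr_unused++;
          if (list_lru_del(&inode_lru, &inode->i_lru))
          	nr_unused--;

          /* Walk a single node's list, applying an isolate callback. */
          freed = list_lru_walk_node(&inode_lru, nid, inode_lru_isolate,
          			   NULL, &nr_to_walk);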
      
      Although we have no official benchmarks for this version - be our guest to
      independently evaluate this - earlier versions of this series were
      performance tested (details at
      http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no
      visible performance regressions while yielding a better qualitative
      behavior in NUMA machines.
      
      With this infrastructure in place, we can use the list_lru entry point to
      provide memcg isolation and per-memcg targeted reclaim.  Historically,
      those two pieces of work have been posted together.  This version presents
      only the infrastructure work, deferring the memcg work for a later time,
      so we can focus on getting this part tested.  You can see more about the
      history of such work at http://lwn.net/Articles/552769/
      
      Dave Chinner (18):
        dcache: convert dentry_stat.nr_unused to per-cpu counters
        dentry: move to per-sb LRU locks
        dcache: remove dentries from LRU before putting on dispose list
        mm: new shrinker API
        shrinker: convert superblock shrinkers to new API
        list: add a new LRU list type
        inode: convert inode lru list to generic lru list code.
        dcache: convert to use new lru list infrastructure
        list_lru: per-node list infrastructure
        shrinker: add node awareness
        fs: convert inode and dentry shrinking to be node aware
        xfs: convert buftarg LRU to generic code
        xfs: rework buffer dispose list tracking
        xfs: convert dquot cache lru to list_lru
        fs: convert fs shrinkers to new scan/count API
        drivers: convert shrinkers to new count/scan API
        shrinker: convert remaining shrinkers to count/scan API
        shrinker: Kill old ->shrink API.
      
      Glauber Costa (7):
        fs: bump inode and dentry counters to long
        super: fix calculation of shrinkable objects for small numbers
        list_lru: per-node API
        vmscan: per-node deferred work
        i915: bail out earlier when shrinker cannot acquire mutex
        hugepage: convert huge zero page shrinker to new shrinker API
        list_lru: dynamically adjust node arrays
      
      This patch:
      
      There are situations in very large machines in which we can have a large
      quantity of dirty inodes, unused dentries, etc.  This is particularly true
      when umounting a filesystem, where every live object will eventually be
      discarded.
      
      Dave Chinner reported a problem with this while experimenting with the
      shrinker revamp patchset, so we believe it is time for a change.  This
      patch just converts the counters from int to long.  Machines where it
      matters should have a 64-bit long anyway.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  10. 04 Sep, 2013 1 commit
  11. 29 Jun, 2013 1 commit
  12. 13 Apr, 2013 1 commit
  13. 09 Apr, 2013 1 commit
  14. 28 Feb, 2013 1 commit
    • hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin authored
      I'm not sure why, but the hlist for each entry iterators were conceived
      differently from the list ones.  While the list ones are:
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure.  Not only
      do they not really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
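
      A before/after sketch of a caller (the struct and hash table names
      are illustrative):

          struct connection {
          	struct hlist_node node;
          	int fd;
          };

          struct connection *conn;
          struct hlist_node *pos;	/* the now-redundant parameter */

          /* Before: two iteration variables to declare and thread through. */
          hlist_for_each_entry(conn, pos, &conn_hash[i], node)
          	handle(conn);

          /* After: reads exactly like list_for_each_entry(). */
          hlist_for_each_entry(conn, &conn_hash[i], node)
          	handle(conn);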
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small number of places were using the 'node' parameter; these
       were modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch, which is mostly the work of Peter Senna Tschudin, is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foundation.org: redo intrusive kvm changes]
      Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 23 Feb, 2013 1 commit
  16. 12 Dec, 2012 1 commit
    • mm: redefine address_space.assoc_mapping · 252aa6f5
      Rafael Aquini authored
      Overhaul struct address_space.assoc_mapping, renaming it to
      address_space.private_data and redefining its type to void *.  This
      approach consistently names the .private_* elements of struct
      address_space and allows address_space to be associated with other
      data structures through ->private_data.

      Also, all users of the old ->assoc_mapping element are converted to
      reflect its new name and type change (->private_data).
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 27 Nov, 2012 1 commit
    • writeback: put unused inodes to LRU after writeback completion · 4eff96dd
      Jan Kara authored
      Commit 169ebd90 ("writeback: Avoid iput() from flusher thread")
      removed the iget-iput pair from inode writeback.  As a side effect, inodes
      that are dirty during the iput_final() call won't ever be added to the inode
      LRU (iput_final() doesn't add dirty inodes to the LRU, and later, when the
      inode is cleaned, there's no one to add it there).  Thus inodes are
      effectively unreclaimable until someone looks them up again.
      
      The practical effect of this bug is limited by the fact that inodes are
      pinned by a dentry for long enough that the inode gets cleaned.  But
      still the bug can have nasty consequences leading up to OOM conditions
      under certain circumstances.  The following can easily reproduce the
      problem:
      
        for (( i = 0; i < 1000; i++ )); do
          mkdir $i
          for (( j = 0; j < 1000; j++ )); do
            touch $i/$j
            echo 2 > /proc/sys/vm/drop_caches
          done
        done
      
      then one needs to run 'sync; ls -lR' to make inodes reclaimable again.
      
      We fix the issue by inserting unused clean inodes into the LRU after
      writeback finishes in inode_sync_complete().
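
      A sketch of the fix as described (simplified; inode_add_lru() only
      moves the inode if it is actually clean and unused):

          static void inode_sync_complete(struct inode *inode)
          {
          	inode->i_state &= ~I_SYNC;
          	/* If the inode is clean and unused, put it back on the LRU. */
          	inode_add_lru(inode);
          	/* Waiters must see I_SYNC cleared before being woken up. */
          	smp_mb();
          	wake_up_bit(&inode->i_state, __I_SYNC);
          }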
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: <stable@vger.kernel.org>		[3.5+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 09 Oct, 2012 1 commit
    • mm: replace vma prio_tree with an interval tree · 6b2dbba8
      Michel Lespinasse authored
      Implement an interval tree as a replacement for the VMA prio_tree.  The
      algorithms are similar to lib/interval_tree.c; however that code can't be
      directly reused as the interval endpoints are not explicitly stored in the
      VMA.  So instead, the common algorithm is moved into a template and the
      details (node type, how to get interval endpoints from the node, etc) are
      filled in using the C preprocessor.
      
      Once the interval tree functions are available, using them as a
      replacement to the VMA prio tree is a relatively simple, mechanical job.
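
      A sketch of the template instantiation this describes (a sketch per
      the generic-template approach above; the node layout and accessor
      macros are the library's plain ones):

          struct interval_tree_node {
          	struct rb_node rb;
          	unsigned long start;		/* first address of interval */
          	unsigned long last;		/* last address, inclusive */
          	unsigned long __subtree_last;	/* rb augmentation */
          };

          #define START(node) ((node)->start)
          #define LAST(node)  ((node)->last)

          /* Generates interval_tree_{insert,remove,iter_first,iter_next}(). */
          INTERVAL_TREE_DEFINE(struct interval_tree_node, rb,
          		     unsigned long, __subtree_last,
          		     START, LAST, static, interval_tree)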
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 31 Jul, 2012 2 commits
  20. 23 Jul, 2012 1 commit
  21. 14 Jul, 2012 1 commit
  22. 01 Jun, 2012 1 commit
    • fs: introduce inode operation ->update_time · c3b2da31
      Josef Bacik authored
      Btrfs has to make sure we have space to allocate new blocks in order to modify
      the inode, so updating time can fail.  We've gotten around this by having our
      own file_update_time but this is kind of a pain, and Christoph has indicated he
      would like to make xfs do something different with atime updates.  So introduce
      ->update_time, where we will deal with i_version and a/m/c time updates and
      indicate which changes need to be made.  The normal version just does what it
      has always done, updates the time and marks the inode dirty, and then
      filesystems can choose to do something different.
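
      A sketch of a filesystem providing the new operation (a hypothetical
      "myfs"; the reservation helper is illustrative):

          static int myfs_update_time(struct inode *inode, struct timespec *now,
          			    int flags)
          {
          	/* Reserve metadata space first so the update cannot fail later. */
          	int ret = myfs_reserve_metadata(inode);
          	if (ret)
          		return ret;

          	if (flags & S_ATIME)
          		inode->i_atime = *now;
          	if (flags & S_MTIME)
          		inode->i_mtime = *now;
          	if (flags & S_CTIME)
          		inode->i_ctime = *now;
          	if (flags & S_VERSION)
          		inode_inc_iversion(inode);

          	mark_inode_dirty_sync(inode);
          	return 0;
          }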
      
      I've gone through all of the users of file_update_time and made them check
      for errors, with the exception of the fault code, since it's complicated
      and I wasn't quite sure what to do there.  Also, Jan is going to be pushing
      the file time updates into page_mkwrite for those who have it, so that
      should satisfy btrfs and make it not a big deal to check the
      file_update_time() return code in the generic fault path.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
  23. 31 May, 2012 2 commits
  24. 30 May, 2012 1 commit
  25. 24 May, 2012 1 commit
  26. 06 May, 2012 3 commits
    • writeback: Avoid iput() from flusher thread · 169ebd90
      Jan Kara authored
      Doing iput() from the flusher thread (writeback_sb_inodes()) can create
      problems because iput() can do a lot of work - for example truncate the
      inode if it's the last iput on an unlinked file. Some filesystems depend
      on the flusher thread progressing (e.g. because they need to flush
      delay-allocated blocks to reduce allocation uncertainty), and so a flusher
      thread doing truncate creates interesting dependencies and possibilities
      for deadlocks.
      
      We get rid of iput() in the flusher thread by using the fact that the
      I_SYNC inode flag effectively pins the inode in memory. So if we take care
      to either hold i_lock or have I_SYNC set, we can get away without taking
      an inode reference in writeback_sb_inodes().
      
      As a side effect of these changes, we also fix a possible use-after-free in
      wb_writeback() because the inode_wait_for_writeback() call could try to
      reacquire i_lock on an inode that was already freed.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
    • vfs: Rename end_writeback() to clear_inode() · dbd5768f
      Jan Kara authored
      After we moved inode_sync_wait() out of end_writeback(), it doesn't make
      sense to call the function end_writeback() anymore. Rename it to
      clear_inode(), which better describes what the function really does: set
      the I_CLEAR flag.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
    • vfs: Move waiting for inode writeback from end_writeback() to evict_inode() · 7994e6f7
      Jan Kara authored
      Currently, I_SYNC can never be set when evict_inode() (and thus
      end_writeback()) is called, because the flusher thread holds an inode
      reference while the inode is under writeback.  As a result, the
      inode_sync_wait() in those places currently does nothing.  However, that
      is going to change, and it unveils problems with calling
      inode_sync_wait() from end_writeback().  Several filesystems call
      end_writeback() after they have deleted the inode (btrfs, gfs2, ...) and
      other filesystems (ext3, ext4, reiserfs, ...) can deadlock when waiting
      for I_SYNC because they call end_writeback() from within a transaction.
      
      To avoid these issues, we move inode_sync_wait() into evict_inode() before
      calling ->evict_inode(). That way we preserve the current property that
      ->evict_inode() and writeback never run in parallel and all filesystems are
      safe.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
  27. 03 May, 2012 1 commit
  28. 08 Apr, 2012 1 commit
    • userns: Replace the hard to write inode_userns with inode_capable. · 1a48e2ac
      Eric W. Biederman authored
      This represents a change in strategy of how to handle user namespaces.
      Instead of tagging everything explicitly with a user namespace and bulking
      up all of the comparisons of uids and gids in the kernel, all uids and gids
      in use will have a mapping to flat kuid and kgid spaces respectively.  This
      allows much more of the existing logic to be preserved and in general
      allows for faster code.
      
      In this new and improved world we allow someone to utilize capabilities
      over an inode if the inode's owner maps into the capability holder's user
      namespace and the user has those capabilities in their user namespace.
      This is simple and efficient.
      
      Moving the fs uid comparisons to be comparisons in a flat kuid space
      follows in later patches, something that is only significant if you
      are using user namespaces.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>