  1. 23 Aug, 2017 1 commit
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig authored
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
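The partition remapping described in this commit can be pictured with a small userspace model. This is only a sketch under stated assumptions: `part_model`, `gendisk_model`, `bio_model` and `remap_to_whole_disk()` are hypothetical stand-ins invented for illustration, not the kernel's structures or functions. It shows a bio that carries a disk pointer plus a partition index, and how that index is enough to translate a partition-relative sector into a whole-disk sector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
typedef uint64_t sector_t;

struct part_model {
	sector_t start_sect;	/* first sector of the partition on the disk */
	sector_t nr_sects;	/* partition length in sectors */
};

struct gendisk_model {
	struct part_model parts[4];	/* index 0 models the whole disk */
	size_t nr_parts;
};

struct bio_model {
	struct gendisk_model *bi_disk;	/* takes the place of a block_device pointer */
	uint8_t bi_partno;		/* partition index used for remapping */
	sector_t bi_sector;		/* start sector, relative to bi_partno */
	sector_t nr_sectors;		/* length of the I/O in sectors */
};

/*
 * Translate a partition-relative bio into a whole-disk bio.
 * Returns 0 on success, -1 if the partition index or the range is invalid.
 */
static int remap_to_whole_disk(struct bio_model *bio)
{
	const struct part_model *p;

	if (bio->bi_partno == 0)
		return 0;	/* already relative to the whole disk */
	if (bio->bi_partno >= bio->bi_disk->nr_parts)
		return -1;

	p = &bio->bi_disk->parts[bio->bi_partno];
	if (bio->bi_sector + bio->nr_sectors > p->nr_sects)
		return -1;	/* I/O would run past the end of the partition */

	bio->bi_sector += p->start_sect;
	bio->bi_partno = 0;	/* the bio now addresses the whole disk */
	return 0;
}
```

In this model no per-open object is needed to submit I/O: the disk pointer and a small integer are sufficient, which is the point the commit message makes.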
  2. 01 Aug, 2017 1 commit
    • fs: convert a pile of fsync routines to errseq_t based reporting · 3b49c9a1
      Jeff Layton authored
      This patch converts most of the in-kernel filesystems that do writeback
      out of the pagecache to report errors using the errseq_t-based
      infrastructure that was recently added. This allows them to report
      errors once for each open file description.
      
      Most filesystems have a fairly straightforward fsync operation. They
      call filemap_write_and_wait_range to write back all of the data and
      wait on it, and then (sometimes) sync out the metadata.
      
      For those filesystems this is a straightforward conversion from calling
      filemap_write_and_wait_range in their fsync operation to calling
      file_write_and_wait_range.
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
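The idea of reporting a writeback error once for each open file description can be illustrated with a toy model. This is a sketch only: `mapping_model`, `file_model` and the three helpers are hypothetical names, and the real errseq_t packs its state into a single word rather than using two fields as assumed here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model; the kernel's errseq_t is laid out differently. */
struct mapping_model {
	unsigned int wb_seq;	/* bumped each time writeback records an error */
	int wb_err;		/* most recent writeback error (negative errno) */
};

struct file_model {
	struct mapping_model *mapping;
	unsigned int seen_seq;	/* value of wb_seq this open file last observed */
};

/* Writeback failed: record the error and advance the sequence. */
static void mapping_record_error(struct mapping_model *m, int err)
{
	m->wb_err = err;
	m->wb_seq++;
}

/* At open time, remember the current position in the error sequence. */
static void file_open_model(struct file_model *f, struct mapping_model *m)
{
	f->mapping = m;
	f->seen_seq = m->wb_seq;
}

/*
 * fsync-style check: return the error if one was recorded since this
 * file last looked, then advance so the same error is not reported twice.
 */
static int file_check_and_advance(struct file_model *f)
{
	if (f->seen_seq == f->mapping->wb_seq)
		return 0;
	f->seen_seq = f->mapping->wb_seq;
	return f->mapping->wb_err;
}
```

With two files open on the same mapping, each one sees a recorded error exactly once, independently of whether the other has already checked.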
  3. 05 Jul, 2017 1 commit
  4. 09 May, 2017 1 commit
  5. 20 Apr, 2017 1 commit
  6. 14 Jan, 2017 1 commit
    • locking/atomic, kref: Add kref_read() · 2c935bc5
      Peter Zijlstra authored
      Since we need to change the implementation, stop exposing internals.
      
      Provide kref_read() to read the current reference count; typically
      used for debug messages.
      
      Kills two anti-patterns:
      
      	atomic_read(&kref->refcount)
      	kref->refcount.counter
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
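A userspace analogue of the accessor pattern might look like the sketch below. It uses C11 atomics in place of the kernel's atomic_t, and the `_model` names are hypothetical. The point it illustrates is that callers read the count through one function, so the field behind it can change type later without touching them.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of a reference-counted object header. */
struct kref_model {
	atomic_uint refcount;	/* callers should not touch this directly */
};

static void kref_init_model(struct kref_model *k)
{
	atomic_init(&k->refcount, 1);
}

static void kref_get_model(struct kref_model *k)
{
	atomic_fetch_add(&k->refcount, 1);
}

/* The accessor: the only place that knows how the count is stored. */
static unsigned int kref_read_model(struct kref_model *k)
{
	return atomic_load(&k->refcount);
}
```

A debug message would then print `kref_read_model(&obj->ref)` instead of reaching into the structure.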
  7. 10 Dec, 2016 1 commit
  8. 28 Oct, 2016 1 commit
  9. 11 Oct, 2016 1 commit
  10. 28 Sep, 2016 1 commit
  11. 27 Sep, 2016 2 commits
  12. 22 Sep, 2016 1 commit
  13. 07 Jun, 2016 1 commit
  14. 09 May, 2016 1 commit
  15. 02 May, 2016 1 commit
    • make ext2_get_page() and friends work without external serialization · be5b82db
      Al Viro authored
      Right now ext2_get_page() (and its analogues in a bunch of other filesystems)
      relies upon the directory being locked - the way it sets and tests Checked and
      Error bits would be racy without that.  Switch to a slightly different scheme,
      _not_ setting Checked in case of failure.  That way the logic becomes
      	if Checked => OK
      	else if Error => fail
      	else if !validate => fail
      	else => OK
      with validation setting Checked or Error on success and failure
      respectively and returning which one had happened.  Equivalent to the
      current logic, but unlike the current logic not sensitive to the order of
      set_bit, test_bit getting reordered by CPU, etc.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
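The four-way decision in this commit message can be written out as a small userspace sketch. The names `page_model`, `validate_page()` and `page_usable()` are hypothetical, C11 atomics stand in for the kernel's page flag bit operations, and the `contents_valid` field is an assumption that replaces real directory-block validation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a directory page with Checked and Error flags. */
struct page_model {
	atomic_bool checked;
	atomic_bool error;
	bool contents_valid;	/* assumption: stands in for real validation */
};

static void page_model_init(struct page_model *p, bool contents_valid)
{
	atomic_init(&p->checked, false);
	atomic_init(&p->error, false);
	p->contents_valid = contents_valid;
}

/* Sets exactly one of Checked or Error and returns which one happened. */
static bool validate_page(struct page_model *p)
{
	if (p->contents_valid) {
		atomic_store(&p->checked, true);
		return true;
	}
	atomic_store(&p->error, true);	/* Checked is deliberately left clear */
	return false;
}

/* if Checked => OK; else if Error => fail; else the result of validation. */
static bool page_usable(struct page_model *p)
{
	if (atomic_load(&p->checked))
		return true;
	if (atomic_load(&p->error))
		return false;
	return validate_page(p);
}
```

Because a failed validation never sets Checked, a reader that observes Checked can trust the page regardless of the order in which it sees the two flags.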
  16. 01 May, 2016 1 commit
  17. 18 Apr, 2016 1 commit
  18. 10 Apr, 2016 1 commit
  19. 04 Apr, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And it is unlikely that it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files.  I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
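The first two rewrite rules in this commit drop a shift by (PAGE_CACHE_SHIFT - PAGE_SHIFT). A minimal sketch of why that is safe is given below, assuming 4 KiB pages (PAGE_SHIFT of 12) and assuming the old macro was a plain alias of PAGE_SHIFT. The two helper functions are hypothetical and exist only to compare the before and after forms.

```c
#include <assert.h>

#define PAGE_SHIFT		12			/* assumption: 4 KiB pages */
#define PAGE_SIZE		(1UL << PAGE_SHIFT)
#define PAGE_CACHE_SHIFT	PAGE_SHIFT		/* assumption: old alias */
#define PAGE_CACHE_SIZE		(1UL << PAGE_CACHE_SHIFT)

/* Form used before the conversion: shift by a difference that is zero. */
static unsigned long pages_from_index_old(unsigned long index)
{
	return index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* Form after the conversion: the shift is simply dropped. */
static unsigned long pages_from_index_new(unsigned long index)
{
	return index;
}
```

As long as the two shifts are equal, both forms return the same value for every input, so the mechanical rewrite does not change behaviour.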
  20. 22 Jan, 2016 1 commit
    • wrappers for ->i_mutex access · 5955102c
      Al Viro authored
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
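The wrapper pattern, inode_foo(inode) being mutex_foo(&inode->i_mutex), can be sketched in userspace as follows. The wrapper names follow the commit message, but `mutex_model` is a hypothetical toy spinlock built on C11 atomics, not the kernel mutex, and `inode_model` is likewise an illustrative stand-in. The nested variant is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock; a stand-in for the kernel's struct mutex. */
struct mutex_model {
	atomic_bool locked;
};

static void mutex_lock_model(struct mutex_model *m)
{
	while (atomic_exchange(&m->locked, true))
		;	/* spin until the previous holder releases the lock */
}

static void mutex_unlock_model(struct mutex_model *m)
{
	atomic_store(&m->locked, false);
}

static bool mutex_trylock_model(struct mutex_model *m)
{
	return !atomic_exchange(&m->locked, true);	/* true if acquired */
}

static bool mutex_is_locked_model(struct mutex_model *m)
{
	return atomic_load(&m->locked);
}

struct inode_model {
	struct mutex_model i_mutex;
};

/* Callers use these wrappers and never name ->i_mutex themselves. */
static void inode_lock(struct inode_model *inode)
{
	mutex_lock_model(&inode->i_mutex);
}

static void inode_unlock(struct inode_model *inode)
{
	mutex_unlock_model(&inode->i_mutex);
}

static bool inode_trylock(struct inode_model *inode)
{
	return mutex_trylock_model(&inode->i_mutex);
}

static bool inode_is_locked(struct inode_model *inode)
{
	return mutex_is_locked_model(&inode->i_mutex);
}
```

Once every caller goes through the wrappers, the lock type behind them can be swapped (for example to a reader/writer lock, as the message anticipates) by editing only these few functions.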
  21. 15 Jan, 2016 1 commit
    • kmemcg: account certain kmem allocations to memcg · 5d097056
      Vladimir Davydov authored
      Mark those kmem allocations that are known to be easily triggered from
      userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
      memcg.  For the list, see below:
      
       - threadinfo
       - task_struct
       - task_delay_info
       - pid
       - cred
       - mm_struct
       - vm_area_struct and vm_region (nommu)
       - anon_vma and anon_vma_chain
       - signal_struct
       - sighand_struct
       - fs_struct
       - files_struct
       - fdtable and fdtable->full_fds_bits
       - dentry and external_name
       - inode for all filesystems. This is the most tedious part, because
         most filesystems overwrite the alloc_inode method.
      
      The list is far from complete, so feel free to add more objects.
      Nevertheless, it should be close to "account everything" approach and
      keep most workloads within bounds.  Malevolent users will be able to
      breach the limit, but this was possible even with the former "account
      everything" approach (simply because it did not account everything in
      fact).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  22. 12 Dec, 2015 1 commit
    • osd fs: __r4w_get_page rely on PageUptodate for uptodate · 3066a967
      Hugh Dickins authored
      Commit 42cb14b1 ("mm: migrate dirty page without
      clear_page_dirty_for_io etc") simplified the migration of a PageDirty
      pagecache page: one stat needs moving from zone to zone and that's about
      all.
      
      It's convenient and safest for it to shift the PageDirty bit from old
      page to new, just before updating the zone stats: before copying data
      and marking the new PageUptodate.  This is all done while both pages are
      isolated and locked, just as before; and just as before, there's a
      moment when the new page is visible in the radix_tree, but not yet
      PageUptodate.  What's new is that it may now be briefly visible as
      PageDirty before it is PageUptodate.
      
      When I scoured the tree to see if this could cause a problem anywhere,
      the only places I found were in two similar functions __r4w_get_page():
      which look up a page with find_get_page() (not using page lock), then
      claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.
      
      I'm not sure whether that was right before, but now it might be wrong
      (on rare occasions): only claim the page is uptodate if PageUptodate.
      Or perhaps the page in question could never be migratable anyway?
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Tested-by: Boaz Harrosh <ooo@electrozaur.com>
      Cc: Benny Halevy <bhalevy@panasas.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
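The change in the uptodate test can be shown with a small model of the page flags. This is a sketch with hypothetical names (`page_flags_model` and the two predicates); it only contrasts the condition described as the old behaviour with the one the commit switches to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the three page flags the message discusses. */
struct page_flags_model {
	bool dirty;
	bool writeback;
	bool uptodate;
};

/* Old test: any of the three flags was taken to mean the data is valid. */
static bool r4w_uptodate_old(const struct page_flags_model *p)
{
	return p->dirty || p->writeback || p->uptodate;
}

/* New test: only PageUptodate counts. */
static bool r4w_uptodate_new(const struct page_flags_model *p)
{
	return p->uptodate;
}
```

For a page caught in the migration window described above (dirty already set, uptodate not yet set), the old predicate answers true while the new one answers false.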
  23. 09 Dec, 2015 1 commit
    • don't put symlink bodies in pagecache into highmem · 21fc61c7
      Al Viro authored
      kmap() in page_follow_link_light() needed to go - allowing an
      arbitrary number of kmaps to be held for a long time is a great way
      to deadlock the system.
      
      A new helper (inode_nohighmem(inode)) needs to be used for pagecache
      symlink inodes; done for all in-tree cases.  page_follow_link_light()
      is instrumented to yell about anything missed.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  24. 09 Nov, 2015 1 commit
  25. 23 Jun, 2015 1 commit
  26. 11 May, 2015 1 commit
  27. 15 Apr, 2015 1 commit
  28. 12 Apr, 2015 2 commits
  29. 17 Feb, 2015 1 commit
    • vfs: remove get_xip_mem · e748dcd0
      Matthew Wilcox authored
      All callers of get_xip_mem() are now gone.  Remove checks for it,
      initialisers of it, documentation of it and the only implementation of it.
      Also remove mm/filemap_xip.c as it is now empty.  Also remove
      documentation of the long-gone get_xip_page().
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Andreas Dilger <andreas.dilger@intel.com>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  30. 20 Jan, 2015 2 commits
  31. 19 Oct, 2014 1 commit
  32. 08 Aug, 2014 1 commit
  33. 12 Jun, 2014 1 commit
    • ->splice_write() via ->write_iter() · 8d020765
      Al Viro authored
      iter_file_splice_write() - a ->splice_write() instance that gathers the
      pipe buffers, builds a bio_vec-based iov_iter covering those and feeds
      it to ->write_iter().  A bunch of simple cases converted to that...
      
      [AV: fixed the braino spotted by Cyrill]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  34. 22 May, 2014 3 commits
    • ore: Support for raid 6 · ce5d36aa
      Boaz Harrosh authored
      This simple patch adds support for raid6 to the ORE.
      Most operations and calculations were already for the general
      case. Only things left:
      * call async_gen_syndrome() in the case of raid6
        (NOTE that the raid6 math is the one supported by the Linux Kernel
         see: crypto/async_tx/async_pq.c)
      * call _ore_add_parity_unit() twice with only last call generating
        the redundancy pages.
      
      * Fix a couple of bugs in the old code
        a. In reads when parity==2 it can happen that per_dev->length=0
           but per_dev->offset was set and adjusted by _ore_add_sg_seg().
           Don't let it be overwritten.
        b. The whole 'cur_comp > starting_dev' thing to determine if:
             "per_dev->offset is in the current stripe number or the
             next one."
           Was a complete raid5/4 accident. When parity==2 this is not
           at all true usually. All we need to do is increment si->ob_offset
           once we pass by the first parity device.
           (This also greatly simplifies the code, amen)
        c. Calculation of si->dev rotation can overflow when parity==2.
      
      * Then last enable raid6 in ore_verify_layout()
      
      I want to deeply thank Daniel Gryniewicz, who first found all the
      bugs in the old raid code and inspired these patches:
      	Inspired-by Daniel Gryniewicz <dang@linuxbox.com>
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • ore: Remove redundant dev_order(), more cleanups · 455682ce
      Boaz Harrosh authored
      Two cleanups:
      * si->cur_comp, si->cur_pg were always calculated after
        the call to ore_calc_stripe_info() with the help of
        _dev_order(...). But these are already calculated by
        ore_calc_stripe_info() and can be just set there.
        (This is left over from the time that si->cur_comp, si->cur_pg
         were only used by raid code, but now the main loop manages
         them anyway even though they are ultimately not used in
         non-raid code)
      
      * si->cur_comp - For the very last stripe case, was set inside
        _ore_add_parity_unit(). This is not clear and will be wrong
        for coming raid6 so move this to only caller. Now si->cur_comp
        is only manipulated within _prepare_for_striping(), always next
        to the manipulation of cur_dev.
        Which is much easier to understand and follow.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
    • ore: (trivial) reformat some code · 101a6427
      Boaz Harrosh authored
      Rearrange some source lines. No functional change.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
  35. 06 May, 2014 1 commit