1. 01 Feb, 2018 26 commits
  2. 31 Jan, 2018 1 commit
    • iversion: make inode_cmp_iversion{+raw} return bool instead of s64 · c0cef30e
      Jeff Layton authored
      As Linus points out:
          The inode_cmp_iversion{+raw}() functions are pure and utter crap.
          You say that they return 0/negative/positive, but they do so in a
          completely broken manner. They return that ternary value as the
          sequence number difference in a 's64', which means that if you
          actually care about that ternary value, and do the *sane* thing that
          the kernel-doc of the function implies is the right thing, you would do
              int cmp = inode_cmp_iversion(inode, old);
              if (cmp < 0 ...
          and as a result you get code that looks sane, but that doesn't
          actually *WORK* right.
      Since none of the callers actually care about the ternary value here,
      convert the inode_cmp_iversion{+raw} functions to just return a boolean
      value (false for matching, true for non-matching).
      This matches the existing use of these functions just fine, and makes it
      simple to convert them to return a ternary value in the future if we
      grow callers that need it.
      With this change we can also reimplement inode_cmp_iversion in a simpler
      way using inode_peek_iversion.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
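
The truncation problem Linus describes can be reproduced in user space. The following is a minimal sketch, not the kernel code: the function names are hypothetical, `s64`/`u64` are stand-in typedefs, and the narrowing conversion to `int` is implementation-defined in C, so the behaviour noted in the comments assumes a typical ABI with a 32-bit two's-complement `int`.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t s64;	/* stand-in for the kernel's s64 */
typedef uint64_t u64;	/* stand-in for the kernel's u64 */

/* Old style (hypothetical name): return the sequence-number difference. */
static s64 cmp_iversion_diff(u64 cur, u64 old)
{
	return (s64)(cur - old);
}

/*
 * What a caller gets when it stores the result in an int, as the old
 * kernel-doc invited. The upper 32 bits are dropped on common ABIs, so
 * the sign (or even the "is it zero" answer) can be wrong.
 */
static int cmp_as_int(u64 cur, u64 old)
{
	return (int)cmp_iversion_diff(cur, old);
}

/* New style (hypothetical name): false if matching, true if not. */
static bool cmp_iversion_changed(u64 cur, u64 old)
{
	return cur != old;
}
```

With a difference of exactly 2^32, `cmp_as_int()` typically yields 0 and the versions look identical; with a difference of 2^31 it typically yields a negative number even though `cur` is newer. The boolean form has no such failure mode because nothing about magnitude or sign is exposed.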
  3. 29 Jan, 2018 5 commits
    • dm mpath: delay the retry of a request if the target responded as busy · ac514ffc
      Mike Snitzer authored
      Add DM_ENDIO_DELAY_REQUEUE to allow request-based multipath's
      multipath_end_io() to instruct dm-rq.c:dm_done() to delay a requeue.
      This is beneficial to do if BLK_STS_RESOURCE is returned from the target
      (because target is busy).
      Relative to blk-mq: kick the hw queues via blk_mq_requeue_work(),
      indirectly from dm-rq.c:__dm_mq_kick_requeue_list(), after a delay.
      For old .request_fn: use blk_delay_queue().
      bio-based multipath doesn't have feature parity with request-based for
      retryable error requeues; that is something that'll need fixing in the
      future.
      Suggested-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Bart Van Assche <bart.vanassche@wdc.com>
      [as interpreted from Bart's "... patch looks fine to me."]
    • xfs: only grab shared inode locks for source file during reflink · 01c2e13d
      Darrick J. Wong authored
      Reflink and dedupe operations remap blocks from a source file into a
      destination file.  The destination file needs exclusive locks on all
      levels because we're updating its block map, but the source file isn't
      undergoing any block map changes so we can use a shared lock.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • fs: handle inode->i_version more efficiently · f02a9ad1
      Jeff Layton authored
      Since i_version is mostly treated as an opaque value, we can exploit that
      fact to avoid incrementing it when no one is watching. With that change,
      we can avoid incrementing the counter on writes, unless someone has
      queried for it since it was last incremented. If the a/c/mtime don't
      change, and the i_version hasn't changed, then there's no need to dirty
      the inode metadata on a write.
      Convert the i_version counter to an atomic64_t, and use the lowest order
      bit to hold a flag that will tell whether anyone has queried the value
      since it was last incremented.
      When we go to maybe increment it, we fetch the value and check the flag
      bit.  If it's clear then we don't need to do anything if the update
      isn't being forced.
      If we do need to update, then we increment the counter by 2, and clear
      the flag bit, and then use a CAS op to swap it into place. If that
      works, we return true. If it doesn't then do it again with the value
      that we fetch from the CAS operation.
      On the query side, if the flag is already set, then we just shift the
      value down by 1 bit and return it. Otherwise, we set the flag in our
      on-stack value and again use cmpxchg to swap it into place if it hasn't
      changed. If it has, then we use the value from the cmpxchg as the new
      "old" value and try again.
      This method allows us to avoid incrementing the counter on writes (and
      dirtying the metadata) under typical workloads. We only need to increment
      if it has been queried since it was last changed.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Tested-by: Krzysztof Kozlowski <krzk@kernel.org>
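
The flag-bit scheme in this commit message can be modelled in user space with C11 atomics. This is only a sketch under stated assumptions: `struct iv_counter`, `iv_maybe_inc()`, `iv_query()` and the `IV_*` constants are hypothetical names, `atomic_compare_exchange_weak()` stands in for the kernel's cmpxchg, and the memory barriers a real implementation would need are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define IV_QUERIED	1ULL	/* lowest bit: queried since last increment */
#define IV_INCREMENT	2ULL	/* the counter proper lives above the flag */

/* Hypothetical user-space stand-in for the inode's i_version field. */
struct iv_counter {
	_Atomic uint64_t raw;
};

/* Increment only if someone has queried the value, or if forced. */
static bool iv_maybe_inc(struct iv_counter *c, bool force)
{
	uint64_t cur = atomic_load(&c->raw);

	for (;;) {
		/* Flag clear and not forced: nobody is watching. */
		if (!force && !(cur & IV_QUERIED))
			return false;

		/* Clear the flag and add 2 so the flag bit is skipped. */
		uint64_t next = (cur & ~IV_QUERIED) + IV_INCREMENT;

		/* On failure, cur is refreshed with the value found. */
		if (atomic_compare_exchange_weak(&c->raw, &cur, next))
			return true;
	}
}

/* Return the counter value and mark it as queried. */
static uint64_t iv_query(struct iv_counter *c)
{
	uint64_t cur = atomic_load(&c->raw);

	while (!(cur & IV_QUERIED)) {
		uint64_t next = cur | IV_QUERIED;

		/* On failure, cur is refreshed and the flag is rechecked. */
		if (atomic_compare_exchange_weak(&c->raw, &cur, next))
			break;
	}
	/* Shift the flag bit away to recover the counter. */
	return cur >> 1;
}
```

In this model, a write that calls `iv_maybe_inc(&c, false)` on a never-queried counter returns false and leaves the value untouched, which is the case where the inode metadata would not need to be dirtied. Once `iv_query()` has set the flag, the next non-forced call does increment.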
    • fs: don't take the i_lock in inode_inc_iversion · 7594c461
      Jeff Layton authored
      The rationale for taking the i_lock when incrementing this value is
      lost in antiquity. The readers of the field don't take it (at least
      not universally), so my assumption is that it was only done here to
      serialize incrementors.
      If that is indeed the case, then we can drop the i_lock from this
      codepath and treat it as an atomic64_t for the purposes of
      incrementing it. This allows us to use inode_inc_iversion without
      any danger of lock inversion.
      Note that the read side is not fetched atomically with this change.
      The assumption here is that that is not a critical issue since the
      i_version is not fully synchronized with anything else anyway.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • fs: new API for handling inode->i_version · ae5e165d
      Jeff Layton authored
      Add a documentation blob that explains what the i_version field is, how
      it is expected to work, and how it is currently implemented by various
      filesystems.
      We already have inode_inc_iversion. Add several other functions for
      manipulating and accessing the i_version counter. For now, the
      implementation is trivial and basically works the way that all of the
      open-coded i_version accesses work today.
      Future patches will convert existing users of i_version to use the new
      API, and then convert the backend implementation to do things more
      efficiently.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
  4. 26 Jan, 2018 7 commits
  5. 25 Jan, 2018 1 commit