1. 13 Dec, 2013 1 commit
    • xfs: format log items write directly into the linear CIL buffer · bde7cff6
      Christoph Hellwig authored
      Instead of setting up pointers to memory locations in iop_format which then
      get copied into the CIL linear buffer after return, move the copy into
      the individual inode items.  This avoids the need to always have a memory
      block around in the exact same layout that gets written into the log, and
      allows the log items to be much more flexible in their in-memory layouts.
      
      The only caveat is that we need to properly align the data for each
      iovec so that we don't have structures misaligned in subsequent iovecs.
      
      Note that all log item format routines now need to be careful to modify
      the copy of the item that was placed into the CIL after calls to
      xlog_copy_iovec instead of the in-memory copy.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
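The alignment caveat in the commit above can be illustrated with a small userspace sketch. It is not the kernel's xlog_copy_iovec; every name here (`copy_region`, `struct region`, `REGION_ALIGN`) is hypothetical, and the 8-byte alignment is an assumption chosen for the example. Each region is copied into one linear buffer, the offset is rounded up before the next copy, and the caller receives a pointer to the copy so that later modifications touch the logged copy rather than the in-memory original.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical alignment for every region start (a power of two). */
#define REGION_ALIGN sizeof(uint64_t)

/* Records where one copied region landed inside the linear buffer. */
struct region {
    void   *addr;
    size_t  len;
};

/* Round n up to the next multiple of REGION_ALIGN. */
static size_t round_up_align(size_t n)
{
    return (n + REGION_ALIGN - 1) & ~(REGION_ALIGN - 1);
}

/*
 * Copy len bytes from src into buf at *offset, record the location in out,
 * and advance *offset to the next aligned position. Returns a pointer to
 * the copy inside buf.
 */
static void *copy_region(char *buf, size_t bufsize, size_t *offset,
                         struct region *out, const void *src, size_t len)
{
    assert(*offset % REGION_ALIGN == 0);   /* previous copy left us aligned */
    assert(*offset + len <= bufsize);      /* caller sized the buffer */

    void *dst = buf + *offset;
    memcpy(dst, src, len);
    out->addr = dst;
    out->len = len;
    *offset = round_up_align(*offset + len);
    return dst;
}
```

A caller that needs to adjust a field after formatting would write through the returned pointer, which leaves the original structure untouched.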
  2. 23 Oct, 2013 3 commits
    • xfs: decouple inode and bmap btree header files · a4fbe6ab
      Dave Chinner authored
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in its definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      This enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: decouple log and transaction headers · 239880ef
      Dave Chinner authored
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: remove unused transaction callback variables · d420e5c8
      Dave Chinner authored
      We don't do callbacks at transaction commit time, nor do we have any
      infrastructure to set up or run such callbacks, so remove the
      variables and typedefs for these operations. If we ever need to add
      callbacks, we can reintroduce the variables at that time.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  3. 30 Aug, 2013 1 commit
  4. 13 Aug, 2013 3 commits
    • xfs: avoid CIL allocation during insert · f5baac35
      Dave Chinner authored
      Now that we have the size of the log vector that has been allocated,
      we can determine if we need to allocate a new log vector for
      formatting and insertion. We only need to allocate a new vector if
      it won't fit into the existing buffer.
      
      However, we need to hold the CIL context lock while we do this so
      that we can't race with a push draining the currently queued log
      vectors. It is safe to do this as long as we do GFP_NOFS allocation
      to avoid memory allocation recursing into the filesystem.
      Hence we can safely overwrite the existing log vector on the CIL if
      it is large enough to hold all the dirty regions of the current
      item.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: Reduce allocations during CIL insertion · 7492c5b4
      Dave Chinner authored
      Now that we have the size of the object before the formatting pass
      is called, we can allocate the log vector and its buffer in a
      single allocation rather than two separate allocations.
      
      Store the size of the allocated buffer in the log vector so that
      we potentially avoid allocation for future modifications of the
      object.
      
      While touching this code, remove the IOP_FORMAT definition.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
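The single-allocation idea, together with the stored buffer size that allows later reuse, might look roughly like the userspace sketch below. `struct log_vec`, `log_vec_alloc` and `log_vec_get` are hypothetical names for illustration and are not the kernel's structures or functions; `malloc` stands in for the kernel allocator, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a log vector. */
struct log_vec {
    size_t buf_size;   /* capacity of buf, remembered for later reuse */
    size_t buf_used;   /* bytes currently formatted into buf */
    char  *buf;        /* points just past this header, same allocation */
};

/* Allocate the header and its data buffer as one block; NULL on failure. */
static struct log_vec *log_vec_alloc(size_t nbytes)
{
    assert(nbytes <= SIZE_MAX - sizeof(struct log_vec)); /* no overflow */

    struct log_vec *lv = malloc(sizeof(*lv) + nbytes);
    if (!lv)
        return NULL;
    lv->buf_size = nbytes;
    lv->buf_used = 0;
    lv->buf = (char *)(lv + 1);
    return lv;
}

/*
 * Return a vector able to hold nbytes. The old vector is reused when the
 * new contents fit; otherwise a fresh one is allocated and the caller
 * remains responsible for releasing the old one.
 */
static struct log_vec *log_vec_get(struct log_vec *old, size_t nbytes)
{
    if (old && nbytes <= old->buf_size) {
        old->buf_used = 0;   /* contents will be overwritten */
        return old;
    }
    return log_vec_alloc(nbytes);
}
```

Keeping the capacity in the header is what makes the reuse check a simple comparison, which is the point of storing the allocated size.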
    • xfs: return log item size in IOP_SIZE · 166d1368
      Dave Chinner authored
      To begin optimising the CIL commit process, we need to have IOP_SIZE
      return both the number of vectors and the size of the data pointed
      to by the vectors. This enables us to calculate the size of the
      memory allocation needed before the formatting step and reduces the
      number of memory allocations per item by one.
      
      While there, kill the IOP_SIZE macro.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
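As a rough illustration of a size callback that reports both the number of vectors and the bytes they cover, consider the sketch below. The types and names (`struct item`, `struct item_ops`, `item_alloc_size`) are invented for this example and do not mirror the kernel's item operations; the fixed per-vector header size is passed in as an assumed parameter.

```c
#include <assert.h>
#include <stddef.h>

struct item;

/* Hypothetical per-item operations: size reports vectors and bytes. */
struct item_ops {
    void (*size)(const struct item *it, int *nvecs, size_t *nbytes);
};

struct item {
    const struct item_ops *ops;
    size_t core_len;   /* always logged */
    size_t data_len;   /* 0 when no data region is dirty */
};

/* Example implementation: one vector for the core, one for dirty data. */
static void inode_like_size(const struct item *it, int *nvecs, size_t *nbytes)
{
    *nvecs += 1;
    *nbytes += it->core_len;
    if (it->data_len) {
        *nvecs += 1;
        *nbytes += it->data_len;
    }
}

static const struct item_ops inode_like_ops = { .size = inode_like_size };

/*
 * Ask the item for both numbers, then compute the size of one allocation
 * that holds nvecs vector headers followed by the data they point to.
 */
static size_t item_alloc_size(const struct item *it, size_t vec_hdr_size)
{
    int nvecs = 0;
    size_t nbytes = 0;

    it->ops->size(it, &nvecs, &nbytes);
    assert(nvecs > 0);
    return (size_t)nvecs * vec_hdr_size + nbytes;
}
```

Because both numbers are known before formatting starts, the caller can size the memory up front instead of allocating the data buffer separately afterwards.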
  5. 12 Aug, 2013 4 commits
  6. 27 Jun, 2013 2 commits
    • xfs: Inode create log items · 3ebe7d2d
      Dave Chinner authored
      Introduce the inode create log item type for logical inode create logging.
      Instead of logging the changes in buffers, pass the range to be
      initialised through the log by a new transaction type.  This reduces
      the amount of log space required to record initialisation during
      allocation from about 128 bytes per inode to a small fixed amount
      per inode extent to be initialised.
      
      This requires a new log item type to track it through the log
      and the AIL. This is a relatively simple item - most callbacks are
      noops as this item has the same life cycle as the transaction.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: Introduce an ordered buffer item · 5f6bed76
      Dave Chinner authored
      If we have a buffer that we have modified but we do not wish to
      physically log in a transaction (e.g. we've logged a logical
      change), we still need to ensure that transactional integrity is
      maintained. Hence we must not move the tail of the log past the
      transaction that the buffer is associated with before the buffer is
      written to disk.
      
      This means these special buffers still need to be included in the
      transaction and added to the AIL just like a normal buffer, but we
      do not want the modifications to the buffer written into the
      transaction. IOWs, what we want is an "ordered buffer" that
      maintains the same transactional life cycle as a physically logged
      buffer, just without the transcribing of the modifications to the
      log.
      
      Hence we need to flag the buffer as an "ordered buffer" to avoid
      including it in vector size calculations or formatting during the
      transaction. Once the transaction is committed, the buffer appears
      for all intents to be the same as a physically logged buffer as it
      transitions through the log and AIL.
      
      Relogging will also work just fine for such an ordered buffer - the
      logical transaction will be replayed before the subsequent
      modifications that relog the buffer, so everything will be
      reconstructed correctly by recovery.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  7. 19 Jun, 2013 1 commit
  8. 07 May, 2013 1 commit
    • xfs: introduce CONFIG_XFS_WARN · 742ae1e3
      Dave Chinner authored
      Running a CONFIG_XFS_DEBUG kernel in production environments is not
      the best idea as it introduces significant overhead, can change
      the behaviour of algorithms (such as allocation) to improve test
      coverage, and (most importantly) panic the machine on non-fatal
      errors.
      
      There are many cases where all we want to do is run a
      kernel with more bounds checking enabled, such as is provided by the
      ASSERT() statements throughout the code, but without all the
      potential overhead and drawbacks.
      
      This patch converts all the ASSERT statements to evaluate as
      WARN_ON(1) statements and hence if they fail dump a warning and a
      stack trace to the log. This has minimal overhead and does not
      change any algorithms, and will allow us to find strange "out of
      bounds" problems more easily on production machines.
      
      There are a few places where assert statements contain debug only
      code. These are converted to be debug-or-warn only code so that we
      still get all the assert checks in the code.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
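The three behaviours described above (fatal checks, warn-only checks, no checks) can be sketched in userspace as a compile-time switch. This is only an analogy: the `SKETCH_*` macros and `sketch_assfail` are hypothetical names, standard `assert()` stands in for a fatal debug check, and a message on stderr stands in for the kernel warning and stack trace.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_WARN 1    /* select the warn-only mode for this sketch */

static int sketch_warnings;   /* failed checks seen so far */

/* Report a failed check and keep running. */
static void sketch_assfail(const char *expr, const char *file, int line)
{
    sketch_warnings++;
    fprintf(stderr, "Assertion failed: %s, file: %s, line: %d\n",
            expr, file, line);
}

#if defined(SKETCH_DEBUG)
/* Debug mode: a failed check stops the program. */
#define ASSERT(expr) assert(expr)
#elif defined(SKETCH_WARN)
/* Warn mode: a failed check is reported, execution continues. */
#define ASSERT(expr) \
    ((expr) ? (void)0 : sketch_assfail(#expr, __FILE__, __LINE__))
#else
/* Production mode: checks compile away and expr is not evaluated. */
#define ASSERT(expr) ((void)0)
#endif
```

In warn mode the condition is still evaluated on every call, so the checks cost something, but a failure no longer takes the process down.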
  9. 27 Apr, 2013 2 commits
  10. 21 Apr, 2013 1 commit
    • xfs: add support for large btree blocks · ee1a47ab
      Christoph Hellwig authored
      Add support for larger btree blocks that contain a CRC32C checksum,
      a filesystem uuid and block number for detecting filesystem
      consistency and out of place writes.
      
      [dchinner@redhat.com] Also include an owner field to allow reverse
      mappings to be implemented for improved repairability and an LSN
      field so that log recovery can easily determine the last
      modification that made it to disk for each buffer.
      
      [dchinner@redhat.com] Add buffer log format flags to indicate the
      type of buffer to recovery so that we don't have to do blind magic
      number tests to determine what the buffer is.
      
      [dchinner@redhat.com] Modified to fit into the verifier structure.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  11. 01 Feb, 2013 7 commits
  12. 16 Nov, 2012 2 commits
    • xfs: convert buffer verifiers to an ops structure. · 1813dd64
      Dave Chinner authored
      To separate the verifiers from iodone functions and associate read
      and write verifiers at the same time, introduce a buffer verifier
      operations structure to the xfs_buf.
      
      This avoids the need for assigning the write verifier, clearing the
      iodone function and re-running ioend processing in the read
      verifier, and gets rid of the nasty "b_pre_io" name for the write
      verifier function pointer. If we ever need to, it will also be
      easier to add further content specific callbacks to a buffer with an
      ops structure in place.
      
      We also avoid needing to export verifier functions, instead we
      can simply export the ops structures for those that are needed
      outside the function they are defined in.
      
      This patch also fixes a directory block readahead verifier issue
      it exposed.
      
      This patch also adds ops callbacks to the inode/alloc btree blocks
      initialised by growfs. These will need more work before they will
      work with CRCs.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Phil White <pwhite@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
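A minimal sketch of the ops-structure idea follows: the read and write verifiers travel together in one constant table attached to the buffer, so only the table needs to be visible outside the file that defines the verifiers. All names (`struct vbuf`, `struct vbuf_ops`, `sketch_buf_ops`) are hypothetical, and the XOR "checksum" is a toy placeholder rather than CRC32C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VBUF_OK = 0, VBUF_CORRUPT = 1 };   /* placeholder error values */

struct vbuf;

/* Read and write verifiers are associated as a pair. */
struct vbuf_ops {
    void (*verify_read)(struct vbuf *bp);
    void (*verify_write)(struct vbuf *bp);
};

struct vbuf {
    const struct vbuf_ops *ops;
    int      error;
    uint32_t magic;
    uint32_t payload;
    uint32_t checksum;
};

#define SKETCH_MAGIC 0x58465342u

/* Toy checksum for illustration only. */
static uint32_t sketch_sum(const struct vbuf *bp)
{
    return bp->magic ^ bp->payload;
}

static void sketch_verify_read(struct vbuf *bp)
{
    if (bp->magic != SKETCH_MAGIC || bp->checksum != sketch_sum(bp))
        bp->error = VBUF_CORRUPT;
}

static void sketch_verify_write(struct vbuf *bp)
{
    if (bp->magic != SKETCH_MAGIC) {
        bp->error = VBUF_CORRUPT;
        return;
    }
    bp->checksum = sketch_sum(bp);   /* stamp the checksum before write */
}

/* Only this table needs to be exported, not the functions themselves. */
const struct vbuf_ops sketch_buf_ops = {
    .verify_read  = sketch_verify_read,
    .verify_write = sketch_verify_write,
};

/* I/O paths dispatch through the table attached to the buffer. */
static void vbuf_read_done(struct vbuf *bp)
{
    assert(bp != NULL);
    if (bp->ops)
        bp->ops->verify_read(bp);
}

static void vbuf_write_prepare(struct vbuf *bp)
{
    assert(bp != NULL);
    if (bp->ops)
        bp->ops->verify_write(bp);
}
```

Adding another content-specific callback later would mean adding one more member to the table, without touching the buffer structure's callers.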
    • xfs: make buffer read verification an IO completion function · c3f8fc73
      Dave Chinner authored
      Add a verifier function callback capability to the buffer read
      interfaces.  This will be used by the callers to supply a function
      that verifies the contents of the buffer when it is read from disk.
      This patch does not provide callback functions, but simply modifies
      the interfaces to allow them to be called.
      
      The reason for adding this to the read interfaces is that it is very
      difficult to tell from the outside if a buffer was just read from
      disk or whether we just pulled it out of cache. Supplying a callback
      allows the buffer cache to use its internal knowledge of the buffer
      to execute it only when the buffer is read from disk.
      
      It is intended that the verifier functions will mark the buffer with
      an EFSCORRUPTED error when verification fails. This allows the
      reading context to distinguish a verification error from an IO
      error, and potentially take further actions on the buffer (e.g.
      attempt repair) based on the error reported.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Phil White <pwhite@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
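The reasoning above, that only the cache knows whether a read actually went to disk, can be sketched as follows. The names (`struct cbuf`, `cbuf_read`, `check_magic`) and the error value are hypothetical, and the "disk read" is simulated by copying a value passed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_EFSCORRUPTED 1   /* placeholder error value for this sketch */

struct cbuf {
    bool     in_cache;   /* contents already valid in memory */
    unsigned magic;      /* stand-in for the on-disk contents */
    int      error;
};

typedef void (*cbuf_verify_fn)(struct cbuf *bp);

/* Example verifier: marks the buffer instead of returning a value. */
static void check_magic(struct cbuf *bp)
{
    if (bp->magic != 0xABCDu)
        bp->error = SKETCH_EFSCORRUPTED;
}

/*
 * Read a buffer. The verifier runs only when the contents really came
 * from disk; a cache hit returns without calling it again.
 */
static void cbuf_read(struct cbuf *bp, unsigned disk_magic,
                      cbuf_verify_fn verify)
{
    assert(bp != NULL);
    if (bp->in_cache)
        return;                 /* cache hit: verified on the first read */
    bp->magic = disk_magic;     /* stand-in for the actual disk read */
    bp->in_cache = true;
    if (verify)
        verify(bp);
}
```

The caller then inspects `error` and can tell a corruption report apart from an I/O failure, which would be set by the read path itself in a fuller version.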
  13. 31 Jul, 2012 1 commit
    • xfs: Convert to new freezing code · d9457dc0
      Jan Kara authored
      Generic code now blocks all writers from standard write paths. So we add
      blocking of all writers coming from ioctl (we get a protection of ioctl against
      racing remount read-only as a bonus) and convert xfs_file_aio_write() to a
      non-racy freeze protection. We also keep freeze protection on transaction
      start to block internal filesystem writes such as removal of preallocated
      blocks.
      
      CC: Ben Myers <bpm@sgi.com>
      CC: Alex Elder <elder@kernel.org>
      CC: xfs@oss.sgi.com
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  14. 01 Jul, 2012 1 commit
  15. 30 May, 2012 1 commit
  16. 14 May, 2012 1 commit
    • xfs: on-stack delayed write buffer lists · 43ff2122
      Christoph Hellwig authored
      Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
      and write back the buffers per-process instead of by waking up xfsbufd.
      
      This is now easily doable given that we have very few places left that write
      delwri buffers:
      
       - log recovery:
      	Only done at mount time, and already forcing out the buffers
      	synchronously using xfs_flush_buftarg
      
       - quotacheck:
      	Same story.
      
       - dquot reclaim:
      	Writes out dirty dquots on the LRU under memory pressure.  We might
      	want to look into doing more of this via xfsaild, but it's already
      	more optimal than the synchronous inode reclaim that writes each
      	buffer synchronously.
      
       - xfsaild:
      	This is the main beneficiary of the change.  By keeping a local list
      	of buffers to write we reduce latency of writing out buffers, and
      	more importantly we can remove all the delwri list promotions which
      	were hitting the buffer cache hard under sustained metadata loads.
      
      The implementation is very straightforward - xfs_buf_delwri_queue now gets
      a new list_head pointer that it adds the delwri buffers to, and all callers
      need to eventually submit the list using xfs_buf_delwri_submit or
      xfs_buf_delwri_submit_nowait.  Buffers that already are on a delwri list are
      skipped in xfs_buf_delwri_queue, assuming they already are on another delwri
      list.  The biggest change to pass down the buffer list was done to the AIL
      pushing. Now that we operate on buffers the trylock, push and pushbuf log
      item methods are merged into a single push routine, which tries to lock the
      item, and if possible add the buffer that needs writeback to the buffer list.
      This leads to much simpler code than the previous split but requires the
      individual IOP_PUSH instances to unlock and reacquire the AIL around calls
      to blocking routines.
      
      Given that xfsailds now also handle writing out buffers, the conditions for
      log forcing and the sleep times needed some small changes.  The most
      important one is that we consider an AIL busy as long as we still have buffers
      to push, and the other one is that we do increment the pushed LSN for
      buffers that are under flushing at this moment, but still count them towards
      the stuck items for restart purposes.  Without this we could hammer on stuck
      items without ever forcing the log and not make progress under heavy random
      delete workloads on fast flash storage devices.
      
      [ Dave Chinner:
      	- rebase on previous patches.
      	- improved comments for XBF_DELWRI_Q handling
      	- fix XBF_ASYNC handling in queue submission (test 106 failure)
      	- rename delwri submit function buffer list parameters for clarity
      	- xfs_efd_item_push() should return XFS_ITEM_PINNED ]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
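The queue-then-submit pattern described in this commit can be sketched in userspace with a caller-owned list. This is not the kernel implementation: `struct dw_buf`, `struct dw_list`, `sketch_delwri_queue` and `sketch_delwri_submit` are hypothetical, a singly linked list replaces `list_head`, and setting a flag stands in for issuing the write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dw_buf {
    int            id;
    bool           queued;    /* already on some delayed-write list */
    bool           written;   /* stand-in for "I/O was issued" */
    struct dw_buf *next;
};

/* A list the caller keeps on its own stack. */
struct dw_list {
    struct dw_buf *head;
    struct dw_buf *tail;
};

/*
 * Add bp to the caller's list. A buffer that is already queued somewhere
 * is skipped; the return value says whether it was added.
 */
static bool sketch_delwri_queue(struct dw_buf *bp, struct dw_list *list)
{
    assert(bp != NULL && list != NULL);
    if (bp->queued)
        return false;
    bp->queued = true;
    bp->next = NULL;
    if (list->tail)
        list->tail->next = bp;
    else
        list->head = bp;
    list->tail = bp;
    return true;
}

/* Write out everything on the list and empty it; returns the count. */
static int sketch_delwri_submit(struct dw_list *list)
{
    int n = 0;
    struct dw_buf *bp = list->head;

    while (bp) {
        struct dw_buf *next = bp->next;

        bp->written = true;    /* stand-in for submitting the buffer */
        bp->queued = false;
        bp->next = NULL;
        n++;
        bp = next;
    }
    list->head = list->tail = NULL;
    return n;
}
```

Since the list lives with the caller, whoever queued the buffers decides when they are written, with no separate daemon to wake.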
  17. 08 Dec, 2011 1 commit
  18. 08 Nov, 2011 1 commit
  19. 12 Oct, 2011 2 commits
    • xfs: simplify xfs_trans_ijoin* again · ddc3415a
      Christoph Hellwig authored
      There is no reason to keep a reference to the inode even if we unlock
      it during transaction commit because we never drop a reference between
      the ijoin and commit.  Also use this fact to merge xfs_trans_ijoin_ref
      back into xfs_trans_ijoin - the third argument decides if an unlock
      is needed now.
      
      I'm actually starting to wonder if allowing inodes to be unlocked
      at transaction commit really is worth the effort.  The only real
      benefit is that they can be unlocked earlier when committing a
      synchronous transaction, but that could be solved by doing the
      log force manually after the unlock, too.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
    • xfs: unlock the inode before log force in xfs_fsync · b1037058
      Christoph Hellwig authored
      Only read the LSN we need to push to with the ilock held, and then release
      it before we do the log force to improve concurrency.
      
      This also removes the only direct caller of _xfs_trans_commit, thus
      allowing it to be merged into the plain xfs_trans_commit again.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
  20. 11 Oct, 2011 1 commit
  21. 11 Jul, 2011 1 commit
  22. 08 Jul, 2011 1 commit
  23. 23 Feb, 2011 1 commit
    • xfs: more sensible inode refcounting for ialloc · ec3ba85f
      Christoph Hellwig authored
      Currently we return inodes from xfs_ialloc with just a single reference held.
      But we need two references, as one is dropped during transaction commit and
      the second needs to be transferred to the VFS.  Change xfs_ialloc to use
      xfs_iget plus xfs_trans_ijoin_ref to grab two references to the inode,
      and remove the now superfluous IHOLD calls from all callers.  This also
      greatly simplifies the error handling in xfs_create and allows us to remove
      xfs_trans_iget as no other callers are left.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>