    xfs: log vector rounding leaks log space · 110dc24a
    Dave Chinner authored
    
    
    The addition of direct formatting of log items into the CIL
    linear buffer added alignment restrictions that the start of each
    vector needed to be 64 bit aligned. Hence padding was added in
    xlog_finish_iovec() to round up the vector length to ensure the next
    vector started with the correct alignment.
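
As a rough illustration of the rounding described above, here is a minimal userspace sketch. The helper names `round_up_pow2()` and `vec_padding()` are made up for this example and are not kernel functions; the only fact taken from the text is that vector lengths are rounded up to a 64 bit boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Round len up to the next multiple of align.
 * align must be a non-zero power of two.
 * (Hypothetical helper for illustration, not a kernel function.)
 */
size_t round_up_pow2(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}

/*
 * Padding bytes that follow a vector of len bytes so that the next
 * vector starts on a 64 bit boundary.
 */
size_t vec_padding(size_t len)
{
	return round_up_pow2(len, sizeof(uint64_t)) - len;
}
```

With these helpers, a 28 byte vector occupies 32 bytes of buffer space, so `vec_padding(28)` is 4, while a vector whose length is already a multiple of 8 gets no padding.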
    
    This adds a small number of bytes to the size of
    the linear buffer that is otherwise unused. The issue is that we
    then use the linear buffer size to determine the log space used by
    the log item, and this includes the unused space. Hence when we
    account for space used by the log item, it's more than is actually
    written into the iclogs, and hence we slowly leak this space.
    
    This results in log hangs when reserving space, with threads getting
    stuck with these stack traces:
    
    Call Trace:
    [<ffffffff81d15989>] schedule+0x29/0x70
    [<ffffffff8150d3a2>] xlog_grant_head_wait+0xa2/0x1a0
    [<ffffffff8150d55d>] xlog_grant_head_check+0xbd/0x140
    [<ffffffff8150ee33>] xfs_log_reserve+0x103/0x220
    [<ffffffff814b7f05>] xfs_trans_reserve+0x2f5/0x310
    .....
    
    The size of the leak, 4 bytes, is significant. Brian Foster did all
    the hard work in tracking down a reproducible leak to inode chunk
    allocation (it went away with the ikeep mount option). His rough numbers were that
    creating 50,000 inodes leaked 11 log blocks. This turns out to be
    roughly 800 inode chunks or 1600 inode cluster buffers. That
    works out at roughly 4 bytes per cluster buffer logged, and at that
    I started looking for a 4 byte leak in the buffer logging code.
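
The estimate above can be reproduced with a small calculation. This is a back-of-envelope sketch only: it ASSUMES a log block is a 512 byte basic block, which the text does not state, and it takes two cluster buffers per inode chunk from the "800 chunks or 1600 cluster buffers" figures. The function names are invented for the example.

```c
#include <assert.h>

#define INODES_PER_CHUNK	64	/* XFS allocates inodes in chunks of 64 */
#define LOG_BLOCK_BYTES		512	/* ASSUMPTION: log block == 512 byte basic block */

/* Number of inode chunks needed to hold the given number of inodes. */
long chunks_for_inodes(long inodes)
{
	assert(inodes >= 0);
	return (inodes + INODES_PER_CHUNK - 1) / INODES_PER_CHUNK;
}

/* Average number of bytes leaked per logged cluster buffer. */
double leak_per_buffer(long leaked_blocks, long buffers)
{
	assert(buffers > 0);
	return (double)leaked_blocks * LOG_BLOCK_BYTES / (double)buffers;
}
```

Under those assumptions, 50,000 inodes need 782 chunks (roughly 800), and 11 leaked blocks spread over 1600 buffers come to about 3.5 bytes per buffer, which is consistent with a 4 byte leak per logged cluster buffer.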
    
    What I found was that a struct xfs_buf_log_format structure for an
    inode cluster buffer is 28 bytes in length. This gets rounded up to
    32 bytes, but the vector length remains 28 bytes. Hence the CIL
    ticket reservation is decremented by 32 bytes (via lv->lv_buf_len)
    for that vector rather than the 28 bytes that are actually written
    into the log.
    
    The fix for this problem is to separately track the bytes used by
    the log vectors in the item and use that instead of the buffer
    length when accounting for the log space that will be used by the
    formatted log item.
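
The idea behind the fix can be sketched in userspace as follows. This is a simplified illustration, not the kernel change itself: `struct sketch_log_vec` and its fields, `sketch_finish_iovec()` and `sketch_log_space_used()` are hypothetical names chosen for the example. The point it shows is keeping two counters, one for buffer space consumed (padding included) and one for bytes actually written, and charging only the latter against the reservation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a log vector; not the kernel struct. */
struct sketch_log_vec {
	size_t buf_len;	/* bytes consumed in the linear buffer, padding included */
	size_t bytes;	/* bytes that will actually be written into the log */
};

/* Record one formatted region of len bytes in the vector. */
void sketch_finish_iovec(struct sketch_log_vec *lv, size_t len)
{
	const size_t align = sizeof(uint64_t);

	assert(lv != NULL);
	/* Padded length keeps the next region 64 bit aligned. */
	lv->buf_len += (len + align - 1) & ~(align - 1);
	/* Unpadded length is what ends up in the log. */
	lv->bytes += len;
}

/* Log space to account for: the payload, not the buffer usage. */
size_t sketch_log_space_used(const struct sketch_log_vec *lv)
{
	assert(lv != NULL);
	return lv->bytes;
}
```

After formatting two 28 byte regions, `buf_len` is 64 while `sketch_log_space_used()` returns 56, so the 8 bytes of padding are no longer counted as consumed log space.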
    
    Again, thanks to Brian Foster for doing all the hard work and long
    hours to isolate this leak and make finding the bug relatively
    simple.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>
    