    fs: handle inode->i_version more efficiently · f02a9ad1
    Jeff Layton authored
    
    
    Since i_version is mostly treated as an opaque value, we can exploit that
    fact to avoid incrementing it when no one is watching. With that change,
    we can avoid incrementing the counter on writes, unless someone has
    queried for it since it was last incremented. If the a/c/mtime don't
    change, and the i_version hasn't changed, then there's no need to dirty
    the inode metadata on a write.
    
    Convert the i_version counter to an atomic64_t, and use the lowest order
    bit to hold a flag that will tell whether anyone has queried the value
    since it was last incremented.
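
    As a rough illustration of that layout, here is a minimal userspace
    sketch. It assumes C11 <stdatomic.h> as a stand-in for the kernel's
    atomic64_t, and every name in it (demo_inode, IVERSION_QUERIED,
    IVERSION_INCREMENT and the two helpers) is hypothetical rather than
    taken from this patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical names: a userspace model of the layout, not kernel code. */
#define IVERSION_QUERIED_SHIFT 1
/* Bit 0: set once someone has queried the value since the last increment. */
#define IVERSION_QUERIED       ((uint64_t)1)
/* One step of the real counter, which lives in bits 63..1. */
#define IVERSION_INCREMENT     ((uint64_t)1 << IVERSION_QUERIED_SHIFT)

struct demo_inode {
    _Atomic uint64_t i_version; /* stands in for an atomic64_t field */
};

/* Recover the counter by shifting the flag bit away. */
static inline uint64_t iversion_counter(uint64_t raw)
{
    return raw >> IVERSION_QUERIED_SHIFT;
}

/* Report whether the "queried" flag is set in a raw value. */
static inline int iversion_was_queried(uint64_t raw)
{
    return (raw & IVERSION_QUERIED) != 0;
}
```

    With this encoding a raw value of 7 means "counter 3, queried", and a
    raw value of 6 means "counter 3, not queried".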
    
    When we go to maybe increment it, we fetch the value and check the flag
    bit. If the flag is clear and the update isn't being forced, then we
    don't need to do anything.
    
    If we do need to update, then we increment the counter by 2, clear the
    flag bit, and use a CAS op to swap the new value into place. If that
    works, we return true. If it doesn't, then we try again with the value
    that the CAS operation returned.
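
    The increment path described above might look roughly like the
    following userspace sketch. It assumes C11 atomics in place of the
    kernel's atomic64_t and cmpxchg helpers, and the function and macro
    names are hypothetical, chosen for illustration only.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names: bit 0 is the "queried" flag, the counter steps by 2. */
#define IVERSION_QUERIED   ((uint64_t)1)
#define IVERSION_INCREMENT ((uint64_t)2)

/* Returns true if the stored value was changed, false if the bump was skipped. */
static bool maybe_inc_iversion(_Atomic uint64_t *ver, bool force)
{
    uint64_t cur = atomic_load(ver);
    uint64_t next;

    do {
        /* Nobody has queried since the last bump and no forced update: skip. */
        if (!force && !(cur & IVERSION_QUERIED))
            return false;

        /* Step the counter by 2, then clear the flag bit. */
        next = (cur + IVERSION_INCREMENT) & ~IVERSION_QUERIED;

        /* On failure, cur is refreshed with the value found in *ver. */
    } while (!atomic_compare_exchange_weak(ver, &cur, next));

    return true;
}
```

    The false return is what lets a caller skip dirtying the inode: an
    unqueried value of 4 stays at 4, while a queried value of 7 becomes 8.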
    
    On the query side, if the flag is already set, then we just shift the
    value down by 1 bit and return it. Otherwise, we set the flag in our
    on-stack value and again use cmpxchg to swap it into place if it hasn't
    changed. If it has, then we use the value from the cmpxchg as the new
    "old" value and try again.
    
    This method allows us to avoid incrementing the counter on writes (and
    dirtying the metadata) under typical workloads. We only need to increment
    if it has been queried since it was last changed.
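
    The query side, which is what arms the next increment, can be sketched
    in the same way. This is a userspace approximation that assumes C11
    atomics instead of the kernel's cmpxchg, and the names are again
    hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical name: bit 0 records that the value has been queried. */
#define IVERSION_QUERIED ((uint64_t)1)

/* Returns the counter value and makes sure the "queried" flag is set. */
static uint64_t query_iversion(_Atomic uint64_t *ver)
{
    uint64_t cur = atomic_load(ver);

    /* If the flag is already set there is nothing to publish. */
    while (!(cur & IVERSION_QUERIED)) {
        uint64_t next = cur | IVERSION_QUERIED;

        /* On failure, cur is refreshed from *ver and the flag is rechecked. */
        if (atomic_compare_exchange_weak(ver, &cur, next))
            break;
    }

    /* Shift the flag bit away to recover the counter. */
    return cur >> 1;
}
```

    A query on a raw value of 6 returns 3 and leaves 7 behind, so the next
    write sees the flag and increments. A second query returns 3 again
    without touching the stored value.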
    
    Signed-off-by: Jeff Layton <jlayton@redhat.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Acked-by: Dave Chinner <dchinner@redhat.com>
    Tested-by: Krzysztof Kozlowski <krzk@kernel.org>