Skip to content
  • Filipe David Borba Manana's avatar
    Btrfs: fix race between removing a dev and writing sbs · d7306801
    Filipe David Borba Manana authored
    
    
    This change fixes an issue when removing a device and writing
    all super blocks run simultaneously. Here's the steps necessary
    for the issue to happen:
    
    1) disk-io.c:write_all_supers() gets a number of N devices from the
       super_copy, so it will not panic if it fails to write super blocks
       for N - 1 devices;
    
    2) Then it tries to acquire the device_list_mutex, but blocks because
       volumes.c:btrfs_rm_device() got it first;
    
    3) btrfs_rm_device() removes the device from the list, then unlocks the
       mutex and after the unlock it updates the number of devices in
       super_copy to N - 1.
    
    4) write_all_supers() finally acquires the mutex, iterates over all the
       devices in the list and gets N - 1 errors, that is, it failed to write
       super blocks to all the devices;
    
    5) Because write_all_supers() thinks there are a total of N devices, it
       considers N - 1 errors to be ok, and therefore won't panic.
    
    So this change just makes sure that write_all_supers() reads the number
    of devices from super_copy after it acquires the device_list_mutex.
    Conversely, it changes btrfs_rm_device() to update the number of devices
    in super_copy before it releases the device list mutex.
    
    The code path to add a new device (volumes.c:btrfs_init_new_device),
    already has the right behaviour: it updates the number of devices in
    super_copy while holding the device_list_mutex.
    
    The only code path that doesn't lock the device list mutex
    before updating the number of devices in the super copy is
    disk-io.c:next_root_backup(), called by open_ctree() during
    mount time where concurrency issues can't happen.
    
    Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
    Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
    Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
    d7306801