  1. Apr 01, 2021
    • block: only update parent bi_status when bio fail · 3edf5346
      Yufen Yu authored
      
      For multiple split bios, if one of the bios fails, the whole I/O
      should return an error to the application. But we found a race
      between bio_integrity_verify_fn and bio completion which can return
      I/O success to the application even though one of the bios failed.
      The race is as follows:
      
      split bio(READ)          kworker
      
      nvme_complete_rq
      blk_update_request //split error=0
        bio_endio
          bio_integrity_endio
            queue_work(kintegrityd_wq, &bip->bip_work);
      
                               bio_integrity_verify_fn
                               bio_endio //split bio
                                __bio_chain_endio
                                   if (!parent->bi_status)
      
                                     <interrupt entry>
                                     nvme_irq
                                       blk_update_request //parent error=7
                                       req_bio_endio
                                          bio->bi_status = 7 //parent bio
                                     <interrupt exit>
      
                                     parent->bi_status = 0
                              parent->bi_end_io() // return bi_status=0
      
      The bio has been split in two: the split bio and the parent. When
      the split bio completes, it relies on a kworker to run its endio
      path, and bio_integrity_verify_fn can be interrupted by the parent
      bio's completion IRQ handler. The parent bio->bi_status that was
      set in the IRQ handler is then overwritten by the kworker.
      
      In fact, even without the above race, we also need to consider the
      concurrency between multiple split bios completing and updating the
      same parent bi_status. Normally, multiple split bios are issued to
      the same hctx and complete from the same irq vector. But if the
      queue map has been updated between the split bios, they may complete
      on different hw queues and different irq vectors, and the concurrent
      updates of the parent bi_status may leave the final status wrong.
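
      The fix is therefore to let a bio propagate its status to the parent
      only when it actually failed. A minimal sketch of that propagation
      logic in __bio_chain_endio (the exact upstream diff may differ
      slightly):

        static struct bio *__bio_chain_endio(struct bio *bio)
        {
                struct bio *parent = bio->bi_private;

                /* Only propagate errors; never let a child's "success"
                 * overwrite a parent error that was already set, e.g.
                 * from the IRQ path shown above. */
                if (bio->bi_status && !parent->bi_status)
                        parent->bi_status = bio->bi_status;
                bio_put(bio);
                return parent;
        }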
      
      Suggested-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20210331115359.1125679-1-yuyufen@huawei.com
      
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. Mar 23, 2021
    • block: recalculate segment count for multi-segment discards correctly · a958937f
      David Jeffery authored
      
      When a stacked block device inserts a request into another block device
      using blk_insert_cloned_request, the request's nr_phys_segments field gets
      recalculated by a call to blk_recalc_rq_segments in
      blk_cloned_rq_check_limits. But blk_recalc_rq_segments does not know how to
      handle multi-segment discards. For disk types which can handle
      multi-segment discards, like nvme, this results in discard requests which
      claim a single segment when they should report several, triggering a
      warning in nvme and causing nvme to fail the discard because of the
      invalid state.
      
       WARNING: CPU: 5 PID: 191 at drivers/nvme/host/core.c:700 nvme_setup_discard+0x170/0x1e0 [nvme_core]
       ...
       nvme_setup_cmd+0x217/0x270 [nvme_core]
       nvme_loop_queue_rq+0x51/0x1b0 [nvme_loop]
       __blk_mq_try_issue_directly+0xe7/0x1b0
       blk_mq_request_issue_directly+0x41/0x70
       ? blk_account_io_start+0x40/0x50
       dm_mq_queue_rq+0x200/0x3e0
       blk_mq_dispatch_rq_list+0x10a/0x7d0
       ? __sbitmap_queue_get+0x25/0x90
       ? elv_rb_del+0x1f/0x30
       ? deadline_remove_request+0x55/0xb0
       ? dd_dispatch_request+0x181/0x210
       __blk_mq_do_dispatch_sched+0x144/0x290
       ? bio_attempt_discard_merge+0x134/0x1f0
       __blk_mq_sched_dispatch_requests+0x129/0x180
       blk_mq_sched_dispatch_requests+0x30/0x60
       __blk_mq_run_hw_queue+0x47/0xe0
       __blk_mq_delay_run_hw_queue+0x15b/0x170
       blk_mq_sched_insert_requests+0x68/0xe0
       blk_mq_flush_plug_list+0xf0/0x170
       blk_finish_plug+0x36/0x50
       xlog_cil_committed+0x19f/0x290 [xfs]
       xlog_cil_process_committed+0x57/0x80 [xfs]
       xlog_state_do_callback+0x1e0/0x2a0 [xfs]
       xlog_ioend_work+0x2f/0x80 [xfs]
       process_one_work+0x1b6/0x350
       worker_thread+0x53/0x3e0
       ? process_one_work+0x350/0x350
       kthread+0x11b/0x140
       ? __kthread_bind_mask+0x60/0x60
       ret_from_fork+0x22/0x30
      
      This patch fixes blk_recalc_rq_segments to be aware of devices which can
      have multi-segment discards. It calculates the correct discard segment
      count by counting the number of bios, since each discard bio is considered
      its own segment.
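
      A minimal sketch of that discard-aware counting in blk_recalc_rq_segments,
      assuming the usual queue_max_discard_segments() and for_each_bio()
      helpers (the exact upstream code may differ slightly):

        /* Devices that support multi-segment discards: each discard bio
         * in the (merged) request counts as its own segment. */
        if (bio_op(rq->bio) == REQ_OP_DISCARD &&
            queue_max_discard_segments(rq->q) > 1) {
                struct bio *bio = rq->bio;
                unsigned int nr_phys_segs = 0;

                for_each_bio(bio)
                        nr_phys_segs++;
                return nr_phys_segs;
        }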
      
      Fixes: 1e739730 ("block: optionally merge discontiguous discard bios into a single request")
      Signed-off-by: David Jeffery <djeffery@redhat.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Laurence Oberman <loberman@redhat.com>
      Link: https://lore.kernel.org/r/20210211143807.GA115624@redhat
      
      
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. Mar 05, 2021
    • blk-cgroup: Fix the recursive blkg rwstat · 4f44657d
      Xunlei Pang authored
      
      The current blkio.throttle.io_service_bytes_recursive doesn't
      work correctly.
      
      As an example, for the following blkcg hierarchy:
       (Made 1GB READ in test1, 512MB READ in test2)
           test
          /    \
       test1   test2
      
      $ head -n 1 test/test1/blkio.throttle.io_service_bytes_recursive
      8:0 Read 1073684480
      $ head -n 1 test/test2/blkio.throttle.io_service_bytes_recursive
      8:0 Read 537448448
      $ head -n 1 test/blkio.throttle.io_service_bytes_recursive
      8:0 Read 537448448
      
      Clearly, the above data for "test" reflects only "test2", not
      "test1"+"test2".

      Do the summation correctly in blkg_rwstat_recursive_sum().
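
      The symptom above is what results when the recursive sum overwrites
      rather than accumulates. A minimal sketch of the intended accumulation in
      blkg_rwstat_recursive_sum(), with helper, field and variable names
      assumed from the blk-cgroup code of that era:

        blkg_for_each_descendant_pre(pos_blkg, pos_css, blkg) {
                struct blkg_rwstat *rwstat;
                unsigned int i;

                if (!pos_blkg->online)
                        continue;

                if (pol)
                        rwstat = (void *)blkg_to_pd(pos_blkg, pol) + off;
                else
                        rwstat = (void *)pos_blkg + off;

                /* Accumulate every descendant's counters ("+="), rather
                 * than overwriting the sum with the last child visited. */
                for (i = 0; i < BLKG_RWSTAT_NR; i++)
                        sum->cnt[i] += blkg_rwstat_read_counter(rwstat, i);
        }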
      
      Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. Feb 11, 2021
    • dm: support key eviction from keyslot managers of underlying devices · 9355a9eb
      Satya Tangirala authored
      
      Now that device mapper supports inline encryption, add the ability to
      evict keys from all underlying devices. When an upper layer requests
      a key eviction, we simply iterate through all underlying devices
      and evict that key from each device.
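
      A hedged sketch of that per-device eviction, using DM's iterate_devices
      callback shape; the function name here is illustrative, not necessarily
      the exact upstream helper:

        /* Illustrative callback: evict @data (a blk_crypto_key) from one
         * underlying device; meant to be invoked for every mapped device
         * via the target's ->iterate_devices(). */
        static int dm_evict_key_on_dev(struct dm_target *ti, struct dm_dev *dev,
                                       sector_t start, sector_t len, void *data)
        {
                const struct blk_crypto_key *key = data;

                return blk_crypto_evict_key(bdev_get_queue(dev->bdev), key);
        }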
      
      Co-developed-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Satya Tangirala <satyat@google.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • block/keyslot-manager: Introduce functions for device mapper support · d3b17a24
      Satya Tangirala authored
      
      Introduce blk_ksm_update_capabilities() to update the capabilities of
      a keyslot manager (ksm) in-place. The pointer to a ksm in a device's
      request queue may not be easily replaced, because upper layers like
      the filesystem might access it (e.g. for programming keys/checking
      capabilities) at the same time the device wants to replace that
      request queue's ksm (and free the old ksm's memory). This function
      allows the device to update the capabilities of the ksm in its request
      queue directly. Devices can safely update the ksm this way without any
      synchronization with upper layers *only* if the updated (new) ksm
      continues to support all the crypto capabilities that the old ksm did
      (see description below for blk_ksm_is_superset() for why this is so).
      
      Also introduce blk_ksm_is_superset() which checks whether one ksm's
      capabilities are a (not necessarily strict) superset of another ksm's.
      The blk-crypto framework requires that crypto capabilities that were
      advertised when a bio was created continue to be supported by the
      device until that bio is ended - in practice this probably means that
      a device's advertised crypto capabilities can *never* "shrink" (since
      there's no synchronization between bio creation and when a device may
      want to change its advertised capabilities) - so a previously
      advertised crypto capability must always continue to be supported.
      This function can be used to check that a new ksm is a valid
      replacement for an old ksm.
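
      A hedged usage sketch of these two helpers (variable names are
      illustrative):

        /* Swap in new capabilities only if nothing previously advertised
         * would be lost; otherwise in-flight bios could depend on a
         * capability that disappears. */
        if (blk_ksm_is_superset(new_ksm, q_ksm))
                blk_ksm_update_capabilities(q_ksm, new_ksm);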
      
      Signed-off-by: Satya Tangirala <satyat@google.com>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • block/keyslot-manager: Introduce passthrough keyslot manager · 7bdcc48f
      Satya Tangirala authored
      
      The device mapper may map over devices that have inline encryption
      capabilities, and to make use of those capabilities, the DM device must
      itself advertise those inline encryption capabilities. One way to do this
      would be to have the DM device set up a keyslot manager with a
      "sufficiently large" number of keyslots, but that would use a lot of
      memory. Also, the DM device itself has no "keyslots", and it doesn't make
      much sense to talk about "programming a key into a DM device's keyslot
      manager", so all that extra memory used to represent those keyslots is just
      wasted. All a DM device really needs to be able to do is advertise the
      crypto capabilities of the underlying devices in a coherent manner and
      expose a way to evict keys from the underlying devices.
      
      There are also devices with inline encryption hardware that do not
      have a limited number of keyslots. One can send a raw encryption key along
      with a bio to these devices (as opposed to typical inline encryption
      hardware that requires users to first program a raw encryption key into a
      keyslot, and send the index of that keyslot along with the bio). These
      devices also only need the same things from the keyslot manager that DM
      devices need - a way to advertise crypto capabilities and potentially a way
      to expose a function to evict keys from hardware.
      
      So we introduce a "passthrough" keyslot manager that provides a way to
      represent a keyslot manager that doesn't have a limited number of
      keyslots and doesn't require keys to be programmed into keyslots.
      DM devices can set up a passthrough keyslot manager in their request
      queues, and advertise appropriate crypto capabilities based on those of the
      underlying devices. Blk-crypto does not attempt to program keys into any
      keyslots in the passthrough keyslot manager. Instead, if/when the bio is
      resubmitted to the underlying device, blk-crypto will try to program the
      key into the underlying device's keyslot manager.
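
      A hedged sketch of how a DM-style device might set one up; the
      capability fields and values below are illustrative assumptions about
      the keyslot manager API of that era, not taken from the patch:

        /* ksm points to a zeroed struct blk_keyslot_manager owned by the
         * DM device (allocation elided). */

        /* No real keyslots: blk-crypto skips key programming here and
         * programs the key when the bio reaches the underlying device. */
        blk_ksm_init_passthrough(ksm);

        /* Advertise only what every underlying device supports
         * (example values): */
        ksm->max_dun_bytes_supported = 16;
        ksm->crypto_modes_supported[BLK_ENCRYPTION_MODE_AES_256_XTS] = 4096;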
      
      Signed-off-by: Satya Tangirala <satyat@google.com>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>