- Apr 02, 2021
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Apr 01, 2021
Yufen Yu authored
For multiple split bios, if one of the bios fails, the whole request should return an error to the application. But we found a race between bio_integrity_verify_fn and bio completion, which returns I/O success to the application after one of the bios fails. The race is as follows:

  split bio(READ)                     kworker

  nvme_complete_rq
    blk_update_request //split error=0
    bio_endio
      bio_integrity_endio
        queue_work(kintegrityd_wq, &bip->bip_work);

                                      bio_integrity_verify_fn
                                        bio_endio //split bio
                                          __bio_chain_endio
                                            if (!parent->bi_status)

                                      <interrupt entry>
                                        nvme_irq
                                          blk_update_request //parent error=7
                                            req_bio_endio
                                              bio->bi_status = 7 //parent bio
                                      <interrupt exit>

                                            parent->bi_status = 0
                                            parent->bi_end_io() // return bi_status=0

The bio has been split in two: split and parent. When the split bio completes, it depends on a kworker to do the endio, and bio_integrity_verify_fn is interrupted by the parent bio's completion IRQ handler. The parent bio->bi_status that was set in the IRQ handler is then overwritten by the kworker.

In fact, even without the above race, we also need to consider the concurrency between multiple split bio completions updating the same parent bi_status. Normally, multiple split bios are issued to the same hctx and complete from the same IRQ vector. But if the queue map is updated between multiple split bios, these bios may complete on different hw queues and different IRQ vectors. The concurrent updates of the parent bi_status may then produce a wrong final status.

Suggested-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210331115359.1125679-1-yuyufen@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
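A minimal sketch of the kind of fix this calls for, assuming the usual __bio_chain_endio() shape of this era (an illustration, not necessarily the exact patch): only propagate the child's status when the child actually failed, so a successful child completing late can no longer clobber an error already recorded in the parent.

    static struct bio *__bio_chain_endio(struct bio *bio)
    {
            struct bio *parent = bio->bi_private;

            /*
             * Propagate errors only; never overwrite an error already
             * stored in the parent with a success status.
             */
            if (bio->bi_status && !parent->bi_status)
                    parent->bi_status = bio->bi_status;
            bio_put(bio);
            return parent;
    }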
-
- Mar 27, 2021
Ming Lei authored
Commit a33df75c ("block: use an xarray for disk->part_tbl") dropped the check on the max supported number of partitions, allowing partitions with bigger partition numbers to be added. However, ->bd_partno is defined as u8, so the partition index in the xarray table may not match ->bd_partno. delete_partition() may then delete an unmatched partition, causing a use-after-free.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reported-by: <syzbot+8fede7e30c7cee0de139@syzkaller.appspotmail.com>
Fixes: a33df75c ("block: use an xarray for disk->part_tbl")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
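Since ->bd_partno is a u8, a hedged sketch of the kind of bounds check that restores the dropped validation in add_partition() (assuming the disk_max_parts() helper of this era; illustrative):

    /*
     * In add_partition(): reject partition numbers that cannot be
     * represented in ->bd_partno before inserting into the xarray.
     */
    if (partno >= disk_max_parts(disk))
            return ERR_PTR(-EINVAL);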
-
- Mar 24, 2021
Johannes Thumshirn authored
Christoph reported some time ago that we'll likely trigger the WARN_ON_ONCE() in bio_iov_iter_get_pages() checking that we're not submitting a bvec with REQ_OP_ZONE_APPEND when using zoned btrfs, but I couldn't reproduce it back then. Now Naohiro was able to trigger the bug as well, with xfstests generic/095 on zoned btrfs. There is nothing that prevents bvec submissions via REQ_OP_ZONE_APPEND as long as the hardware's zone append limit is met.

Reported-by: Naohiro Aota <naohiro.aota@wdc.com>
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/10bd414d9326c90cd69029077db63b363854eee5.1616600835.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Mar 23, 2021
David Jeffery authored
When a stacked block device inserts a request into another block device using blk_insert_cloned_request, the request's nr_phys_segments field gets recalculated by a call to blk_recalc_rq_segments in blk_cloned_rq_check_limits. But blk_recalc_rq_segments does not know how to handle multi-segment discards. For disk types which can handle multi-segment discards, like nvme, this results in discard requests which claim a single segment when they should report several, triggering a warning in nvme and causing nvme to fail the discard from the invalid state.

  WARNING: CPU: 5 PID: 191 at drivers/nvme/host/core.c:700 nvme_setup_discard+0x170/0x1e0 [nvme_core]
  ...
  nvme_setup_cmd+0x217/0x270 [nvme_core]
  nvme_loop_queue_rq+0x51/0x1b0 [nvme_loop]
  __blk_mq_try_issue_directly+0xe7/0x1b0
  blk_mq_request_issue_directly+0x41/0x70
  ? blk_account_io_start+0x40/0x50
  dm_mq_queue_rq+0x200/0x3e0
  blk_mq_dispatch_rq_list+0x10a/0x7d0
  ? __sbitmap_queue_get+0x25/0x90
  ? elv_rb_del+0x1f/0x30
  ? deadline_remove_request+0x55/0xb0
  ? dd_dispatch_request+0x181/0x210
  __blk_mq_do_dispatch_sched+0x144/0x290
  ? bio_attempt_discard_merge+0x134/0x1f0
  __blk_mq_sched_dispatch_requests+0x129/0x180
  blk_mq_sched_dispatch_requests+0x30/0x60
  __blk_mq_run_hw_queue+0x47/0xe0
  __blk_mq_delay_run_hw_queue+0x15b/0x170
  blk_mq_sched_insert_requests+0x68/0xe0
  blk_mq_flush_plug_list+0xf0/0x170
  blk_finish_plug+0x36/0x50
  xlog_cil_committed+0x19f/0x290 [xfs]
  xlog_cil_process_committed+0x57/0x80 [xfs]
  xlog_state_do_callback+0x1e0/0x2a0 [xfs]
  xlog_ioend_work+0x2f/0x80 [xfs]
  process_one_work+0x1b6/0x350
  worker_thread+0x53/0x3e0
  ? process_one_work+0x350/0x350
  kthread+0x11b/0x140
  ? __kthread_bind_mask+0x60/0x60
  ret_from_fork+0x22/0x30

This patch fixes blk_recalc_rq_segments to be aware of devices which can have multi-segment discards. It calculates the correct discard segment count by counting the number of bios, as each discard bio is considered its own segment.

Fixes: 1e739730 ("block: optionally merge discontiguous discard bios into a single request")
Signed-off-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Link: https://lore.kernel.org/r/20210211143807.GA115624@redhat
Signed-off-by: Jens Axboe <axboe@kernel.dk>
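A hedged sketch of the counting described above, assuming the for_each_bio() and queue_max_discard_segments() helpers (illustrative, not the verbatim patch):

    /* in blk_recalc_rq_segments() */
    switch (bio_op(rq->bio)) {
    case REQ_OP_DISCARD:
    case REQ_OP_SECURE_ERASE:
            if (queue_max_discard_segments(rq->q) > 1) {
                    struct bio *bio = rq->bio;

                    /* each discard bio in the chain is its own segment */
                    for_each_bio(bio)
                            nr_phys_segs++;
                    return nr_phys_segs;
            }
            return 1;
    default:
            break;
    }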
-
- Mar 11, 2021
Shin'ichiro Kawasaki authored
When a zone reset ioctl and a data read race for the same zone on a zoned block device, the data read leaves stale page cache even though the zone reset ioctl zero-clears all the zone data on the device. To avoid non-zero data reads from the stale page cache after zone reset, discard the page cache of reset target zones in blkdev_zone_mgmt_ioctl(). Introduce the helper function blkdev_truncate_zone_range() to discard the page cache. Ensure the page cache is discarded by calling the helper function before and after zone reset, in the same manner as fallocate does.

This patch can be applied back to the stable kernel version v5.10.y. Rework is needed for older stable kernels.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 3ed05a98 ("blk-zoned: implement ioctls")
Cc: <stable@vger.kernel.org> # 5.10+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210311072546.678999-1-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
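A hedged sketch of such a helper, assuming the truncate_bdev_range() helper that fallocate uses is available (illustrative):

    static int blkdev_truncate_zone_range(struct block_device *bdev,
                                          fmode_t mode,
                                          const struct blk_zone_range *zrange)
    {
            loff_t start, end;

            if (zrange->sector + zrange->nr_sectors <= zrange->sector ||
                zrange->sector + zrange->nr_sectors > get_capacity(bdev->bd_disk))
                    return -EINVAL;  /* out of range */

            start = zrange->sector << SECTOR_SHIFT;
            end = ((zrange->sector + zrange->nr_sectors) << SECTOR_SHIFT) - 1;

            /* drop the page cache over the zones being reset */
            return truncate_bdev_range(bdev, mode, start, end);
    }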
-
Daniel Wagner authored
register_disk() suppresses uevents for devices with GENHD_FL_HIDDEN but re-enables uevents at the end in order to announce the disk after possible partitions are created. When the device is removed, the uevents are still enabled and userland sees 'remove' messages for devices which were never 'add'ed to the system.

  KERNEL[95481.571887] remove /devices/virtual/nvme-fabrics/ctl/nvme5/nvme0c5n1 (block)

Let's suppress the uevents for GENHD_FL_HIDDEN by not enabling the uevents at all.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Link: https://lore.kernel.org/r/20210311151917.136091-1-dwagner@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
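A hedged sketch of the shape of this change at the end of register_disk() (simplified; dev_set_uevent_suppress() and kobject_uevent() are the standard driver-core calls):

    /* ... partition scan done ... */

    if (disk->flags & GENHD_FL_HIDDEN)
            return;  /* never announce hidden disks */

    /* announce the disk after the partitions have been created */
    dev_set_uevent_suppress(ddev, 0);
    kobject_uevent(&ddev->kobj, KOBJ_ADD);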
-
Christoph Hellwig authored
Ever since the addition of multipage bio_vecs, BIO_MAX_PAGES has been horribly, confusingly misnamed. Rename it to BIO_MAX_VECS to stop confusing users of the bio API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Mar 10, 2021
Damien Le Moal authored
Similarly to a single zone reset operation (REQ_OP_ZONE_RESET), execute REQ_OP_ZONE_RESET_ALL operations with REQ_SYNC set.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Mar 05, 2021
Xunlei Pang authored
The current blkio.throttle.io_service_bytes_recursive doesn't work correctly. As an example, for the following blkcg hierarchy (1GB READ done in test1, 512MB READ done in test2):

          test
         /    \
     test1    test2

  $ head -n 1 test/test1/blkio.throttle.io_service_bytes_recursive
  8:0 Read 1073684480
  $ head -n 1 test/test2/blkio.throttle.io_service_bytes_recursive
  8:0 Read 537448448
  $ head -n 1 test/blkio.throttle.io_service_bytes_recursive
  8:0 Read 537448448

Clearly, the above data for "test" reflects "test2", not "test1" + "test2". Do the correct summary in blkg_rwstat_recursive_sum().

Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
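A hedged sketch of the kind of fix implied (field names approximate): while walking the descendant blkgs, the per-blkg counters must be accumulated into the sum rather than assigned, otherwise the result only reflects the last descendant visited.

    /* in blkg_rwstat_recursive_sum(), inside the descendant walk */
    for (i = 0; i < BLKG_RWSTAT_NR; i++)
            sum->cnt[i] += blkg_rwstat_read_counter(rwstat, i);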
-
- Mar 02, 2021
Joseph Qi authored
Correct the comments, since bfq_fifo_expire[0] is for async requests while bfq_fifo_expire[1] is for sync requests. Also update the docs: according to the source code, the default fifo_expire_async is 250ms and fifo_expire_sync is 125ms.

Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
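For reference, a sketch of the corrected array comments, with the values derived from the defaults quoted above (presentation approximate):

    /* fifo expiry in ns: [0] is async (250 ms), [1] is sync (125 ms) */
    static const u64 bfq_fifo_expire[2] = {
            NSEC_PER_SEC / 4,
            NSEC_PER_SEC / 8,
    };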
-
- Mar 01, 2021
Jean Delvare authored
Commit a1ce35fa ("block: remove dead elevator code") removed all users of RQF_SORTED. However it is still defined, and there is one reference left to it (which in effect is dead code). Clear it all up.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Damien Le Moal authored
With the removal of the skd driver, IRQ-safe locking of the bdev bd_size_lock spinlock to protect the bdev inode size is no longer necessary, as there is no other known driver using this lock in an IRQ-disabled context (e.g. calling set_capacity() with IRQs disabled). Revert commit 0fe37724 ("block: fix bd_size_lock use"), which introduced the IRQ-safe change.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Feb 26, 2021
Matthew Wilcox (Oracle) authored
It's often inconvenient to use BIO_MAX_PAGES due to min() requiring the signs to be the same. Introduce bio_max_segs() and change BIO_MAX_PAGES to be unsigned to make it easier for the users.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
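A sketch of the helper as described, assuming the usual BIO_MAX_PAGES of 256 in this era:

    #define BIO_MAX_PAGES 256U

    static inline unsigned int bio_max_segs(unsigned int nr_segs)
    {
            /* both operands are unsigned, so min() no longer complains */
            return min(nr_segs, BIO_MAX_PAGES);
    }

Callers that previously needed min_t(unsigned int, nr_segs, BIO_MAX_PAGES) can now just write bio_max_segs(nr_segs).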
-
- Feb 24, 2021
Christoph Hellwig authored
The caller can't cope with a failure from bounce_clone_bio, so use __GFP_NOFAIL for the passthrough case. bio_alloc_bioset already won't fail due to the use of mempools. And yes, we need to get rid of this block layer bouncing code entirely sooner or later...

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The only caller always passes GFP_NOIO.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Now that bio_alloc_bioset does not fall back to kmalloc for a NULL bio_set, handle that case explicitly and simplify the calling conventions. Based on an earlier patch from Chaitanya Kulkarni.

Fixes: 3175199a ("block: split bio_kmalloc from bio_alloc_bioset")
Reported-by: Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
bio_split with a NULL bs argument used to fall back to kmalloc the bio, which does not guarantee forward progress and could lead to deadlocks. Now that the overloading of the NULL bs argument to bio_alloc_bioset has been removed, it crashes instead. Fix all that by using a specially crafted bioset.

Fixes: 3175199a ("block: split bio_kmalloc from bio_alloc_bioset")
Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
The local variable 'capacity' stores the previous disk capacity, and the 'size' variable records the latest disk capacity, so swap them to fix the logging of capacity changes.

Cc: Christoph Hellwig <hch@lst.de>
Fixes: a782483c ("block: remove the nr_sects field in struct hd_struct")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Mikulas Patocka authored
We get I/O errors when we run md-raid1 on top of dm-integrity on top of ramdisk.

  device-mapper: integrity: Bio not aligned on 8 sectors: 0xff00, 0xff
  device-mapper: integrity: Bio not aligned on 8 sectors: 0xff00, 0xff
  device-mapper: integrity: Bio not aligned on 8 sectors: 0xffff, 0x1
  device-mapper: integrity: Bio not aligned on 8 sectors: 0xffff, 0x1
  device-mapper: integrity: Bio not aligned on 8 sectors: 0x8048, 0xff
  device-mapper: integrity: Bio not aligned on 8 sectors: 0x8147, 0xff
  device-mapper: integrity: Bio not aligned on 8 sectors: 0x8246, 0xff
  device-mapper: integrity: Bio not aligned on 8 sectors: 0x8345, 0xbb

The ramdisk device has logical_block_size 512 and max_sectors 255. The dm-integrity device uses logical_block_size 4096 and doesn't affect the "max_sectors" value - thus, it inherits 255 from the ramdisk. So, we have a device with max_sectors not aligned on logical_block_size. The md-raid device sees that the underlying leg has max_sectors 255 and will split the bios on a 255-sector boundary, making the bios unaligned on logical_block_size. In order to fix the bug, we round down max_sectors to logical_block_size.

Cc: stable@vger.kernel.org
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
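A hedged sketch of the rounding described, placed where the hardware limit is recorded (e.g. in blk_queue_max_hw_sectors(); the exact spot may differ in the actual patch):

    /*
     * Keep max_sectors an integral multiple of the logical block
     * size so stacking drivers never split bios mid-block.
     */
    max_sectors = round_down(max_sectors,
                             limits->logical_block_size >> SECTOR_SHIFT);
    limits->max_sectors = max_sectors;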
-
Christoph Hellwig authored
Historically the BLKRRPART ioctl called into the now defunct ->revalidate method, which caused the sd driver to check if any media is present. When the ->revalidate method was removed this revalidation was lost, leading to lots of I/O errors when using the eject command. Fix this by reopening the device to rescan the partitions, and thus calling the revalidation logic in the sd driver.

Fixes: 471bd0af ("sd: use bdev_check_media_change")
Reported-by: Tom Seewald <tseewald@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Tom Seewald <tseewald@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Feb 23, 2021
Christoph Hellwig authored
Restore the previous behavior by using the correct flag for the whole device ("part0").

Fixes: 99dfc43e ("block: use ->bi_bdev for bio based I/O accounting")
Reported-by: John Stultz <john.stultz@linaro.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Feb 22, 2021
Yang Yang authored
A hang occurs when the user changes the scheduler queue depth by writing to the 'nr_requests' sysfs file of that device. The details of the environment where we found the problem are as follows:

  an eMMC block device
  total driver tags: 16
  default queue_depth: 32
  kqd->async_depth initialized in kyber_init_sched() with queue_depth=32

Then we change queue_depth to 256 by writing to the 'nr_requests' sysfs file. But kqd->async_depth is not updated after queue_depth changes. The value of async_depth is now too small for queue_depth=256, which may cause a hang.

This patch introduces kyber_depth_updated(), so that kyber can update the async depth when the queue depth changes.

Signed-off-by: Yang Yang <yang.yang@vivo.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
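A hedged sketch of such a hook, based on the kyber code of that era (treat field names and the percentage constant as approximate):

    static void kyber_depth_updated(struct blk_mq_hw_ctx *hctx)
    {
            struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data;
            struct blk_mq_tags *tags = hctx->sched_tags;
            unsigned int shift = tags->bitmap_tags->sb.shift;

            /* recompute the async depth from the current tag-space depth */
            kqd->async_depth = (1U << shift) * KYBER_ASYNC_PERCENT / 100U;

            sbitmap_queue_min_shallow_depth(tags->bitmap_tags, kqd->async_depth);
    }

Wired up as the elevator's depth_updated callback, it runs whenever nr_requests changes, so the async depth tracks the new queue depth.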
-
Jeffle Xu authored
The QUEUE_FLAG_POLL flag will be cleared when turning off 'io_poll', while at that moment there may be IOs stuck in the hw queue uncompleted. The following polling routine won't help reap these IOs, since blk_poll() returns immediately because of the cleared QUEUE_FLAG_POLL flag. Thus these IOs will hang until they finally time out. The hang can be observed by 'fio --engine=io_uring iodepth=1' while turning off 'io_poll' at the same time. To fix this, freeze and flush the request queue first when turning off 'io_poll'.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
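A hedged sketch of the store path with the freeze added, assuming the queue_poll_store() handler in blk-sysfs.c (illustrative):

    if (poll_on) {
            blk_queue_flag_set(QUEUE_FLAG_POLL, q);
    } else {
            /* drain in-flight polled IO before the flag goes away */
            blk_mq_freeze_queue(q);
            blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
            blk_mq_unfreeze_queue(q);
    }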
-
Chaitanya Kulkarni authored
Get rid of the wrapper for trace_block_rq_insert() and call the function directly.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
Commit a1ce35fa ("block: remove dead elevator code") removed the last callers of blk_pm_requeue_request(), blk_pm_add_request() and blk_pm_put_request(). Hence remove the definitions of these functions. Removing these functions removes all users of the struct request nr_pending member. Hence also remove 'nr_pending'. Note: 'nr_pending' is no longer used since commit 7cedffec ("block: Make blk_get_request() block for non-PM requests while suspended").

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Feb 12, 2021
Sebastian Andrzej Siewior authored
With llist_head it is possible to avoid the locking (the irq-off region) when items are added. This makes it possible to add items on a remote CPU without additional locking. llist_add() returns true if the list was previously empty. This can be used to invoke the SMP function call / raise the softirq only if the first item was added (otherwise it is already pending). This simplifies the code a little and reduces the IRQ-off regions. blk_mq_raise_softirq() needs a preempt-disable section to ensure the request is enqueued on the same CPU as the one where the softirq is raised. Some callers (USB-storage) invoke this path in preemptible context.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
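A hedged sketch of the 'only kick when first' pattern this describes (names approximate):

    /*
     * Queue the request on the remote CPU's completion list. Only the
     * thread that turns the list non-empty needs to send the IPI; any
     * later adder knows a completion pass is already pending.
     */
    if (llist_add(&rq->ipi_list, &per_cpu(blk_cpu_done, cpu)))
            smp_call_function_single_async(cpu, &rq->csd);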
-
Sebastian Andrzej Siewior authored
Controllers with multiple queues have their IRQ handlers pinned to a CPU. The core shouldn't need to complete the request on a remote CPU. Remove this case and always raise the softirq to complete the request.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Feb 11, 2021
Satya Tangirala authored
Now that device mapper supports inline encryption, add the ability to evict keys from all underlying devices. When an upper layer requests a key eviction, we simply iterate through all underlying devices and evict that key from each device.

Co-developed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
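A hedged sketch of the iteration described, in device mapper's usual iterate_devices callback style (the args struct is illustrative):

    struct dm_keyslot_evict_args {
            const struct blk_crypto_key *key;
            int err;
    };

    static int dm_keyslot_evict_callback(struct dm_target *ti,
                                         struct dm_dev *dev, sector_t start,
                                         sector_t len, void *data)
    {
            struct dm_keyslot_evict_args *args = data;
            int err;

            /* evict the key from this underlying device; keep first error */
            err = blk_crypto_evict_key(bdev_get_queue(dev->bdev), args->key);
            if (err && !args->err)
                    args->err = err;
            /* return 0 so the key is still evicted from the other devices */
            return 0;
    }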
-
Satya Tangirala authored
Introduce blk_ksm_update_capabilities() to update the capabilities of a keyslot manager (ksm) in-place. The pointer to a ksm in a device's request queue may not be easily replaced, because upper layers like the filesystem might access it (e.g. for programming keys/checking capabilities) at the same time the device wants to replace that request queue's ksm (and free the old ksm's memory). This function allows the device to update the capabilities of the ksm in its request queue directly. Devices can safely update the ksm this way without any synchronization with upper layers *only* if the updated (new) ksm continues to support all the crypto capabilities that the old ksm did (see the description of blk_ksm_is_superset() below for why this is so).

Also introduce blk_ksm_is_superset(), which checks whether one ksm's capabilities are a (not necessarily strict) superset of another ksm's. The blk-crypto framework requires that crypto capabilities that were advertised when a bio was created continue to be supported by the device until that bio is ended - in practice this probably means that a device's advertised crypto capabilities can *never* "shrink" (since there's no synchronization between bio creation and when a device may want to change its advertised capabilities) - so a previously advertised crypto capability must always continue to be supported. This function can be used to check that a new ksm is a valid replacement for an old ksm.

Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
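A hedged sketch of what a superset check like blk_ksm_is_superset() can boil down to (field names follow the keyslot manager of that era; treat as approximate):

    bool blk_ksm_is_superset(struct blk_keyslot_manager *ksm_superset,
                             struct blk_keyslot_manager *ksm_subset)
    {
            int i;

            if (!ksm_subset)
                    return true;
            if (!ksm_superset)
                    return false;

            /*
             * Every crypto mode/data-unit-size bit the subset advertises
             * must also be advertised by the superset.
             */
            for (i = 0; i < BLK_ENCRYPTION_MODE_MAX; i++) {
                    if (ksm_subset->crypto_modes_supported[i] &
                        ~ksm_superset->crypto_modes_supported[i])
                            return false;
            }

            if (ksm_superset->max_dun_bytes_supported <
                ksm_subset->max_dun_bytes_supported)
                    return false;

            return true;
    }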
-
Satya Tangirala authored
The device mapper may map over devices that have inline encryption capabilities, and to make use of those capabilities, the DM device must itself advertise those inline encryption capabilities. One way to do this would be to have the DM device set up a keyslot manager with a "sufficiently large" number of keyslots, but that would use a lot of memory. Also, the DM device itself has no "keyslots", and it doesn't make much sense to talk about "programming a key into a DM device's keyslot manager", so all that extra memory used to represent those keyslots is just wasted.

All a DM device really needs to be able to do is advertise the crypto capabilities of the underlying devices in a coherent manner and expose a way to evict keys from the underlying devices. There are also devices with inline encryption hardware that do not have a limited number of keyslots. One can send a raw encryption key along with a bio to these devices (as opposed to typical inline encryption hardware that requires users to first program a raw encryption key into a keyslot, and send the index of that keyslot along with the bio). These devices also only need the same things from the keyslot manager that DM devices need: a way to advertise crypto capabilities and potentially a way to expose a function to evict keys from hardware.

So we introduce a "passthrough" keyslot manager that provides a way to represent a keyslot manager that doesn't have a limited number of keyslots and does not require keys to be programmed into keyslots. DM devices can set up a passthrough keyslot manager in their request queues and advertise appropriate crypto capabilities based on those of the underlying devices. Blk-crypto does not attempt to program keys into any keyslots in the passthrough keyslot manager. Instead, if/when the bio is resubmitted to the underlying device, blk-crypto will try to program the key into the underlying device's keyslot manager.

Signed-off-by: Satya Tangirala <satyat@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-
- Feb 10, 2021
Damien Le Moal authored
Introduce the internal function blk_queue_clear_zone_settings() to clean up all limits and resources related to zoned block devices. This new function is called from blk_queue_set_zoned() when a disk zoned model is set to BLK_ZONED_NONE. This particular case can happen when a partition is created on a host-aware scsi disk.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
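A hedged sketch of what such a cleanup helper plausibly resets (request_queue field names of that era; approximate):

    void blk_queue_clear_zone_settings(struct request_queue *q)
    {
            blk_queue_free_zone_bitmaps(q);
            blk_queue_flag_clear(QUEUE_FLAG_ZONE_RESETALL, q);

            q->required_elevator_features &= ~ELEVATOR_F_ZBD_SEQ_WRITE;
            q->nr_zones = 0;
            q->max_open_zones = 0;
            q->max_active_zones = 0;
            q->limits.chunk_sectors = 0;
            q->limits.zone_write_granularity = 0;
            q->limits.max_zone_append_sectors = 0;
    }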
-
Damien Le Moal authored
Per ZBC and ZAC specifications, host-managed SMR hard-disks mandate that all writes into sequential write required zones be aligned to the device physical block size. However, NVMe ZNS does not have this constraint and allows write operations into sequential zones to be aligned to the device logical block size. This inconsistency does not help with software portability across device types. To solve this, introduce the zone_write_granularity queue limit to indicate the alignment constraint, in bytes, of write operations into zones of a zoned block device. This new limit is exported as a read-only sysfs queue attribute, and the helper blk_queue_zone_write_granularity() is introduced for drivers to set this limit. The function blk_queue_set_zoned() is modified to set this new limit to the device logical block size by default. NVMe ZNS devices as well as zoned nullb devices use this default value as is. The scsi disk driver is modified to use the blk_queue_zone_write_granularity() helper to set the zone write granularity of host-managed SMR disks to the disk physical block size. The accessor functions queue_zone_write_granularity() and bdev_zone_write_granularity() are also introduced.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
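A hedged sketch of the setter and one accessor described (sysfs plumbing omitted; illustrative):

    void blk_queue_zone_write_granularity(struct request_queue *q,
                                          unsigned int size)
    {
            if (WARN_ON_ONCE(!blk_queue_is_zoned(q)))
                    return;

            q->limits.zone_write_granularity = size;
    }

    static inline unsigned int
    queue_zone_write_granularity(const struct request_queue *q)
    {
            return q->limits.zone_write_granularity;
    }

Per the description, blk_queue_set_zoned() defaults this limit to the logical block size, and sd overrides it with the physical block size for host-managed SMR disks.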
-
Damien Le Moal authored
When changing the zoned model of host-aware zoned block devices, use blk_queue_set_zoned() instead of directly assigning the gendisk queue zoned limit.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Feb 08, 2021
Johannes Thumshirn authored
Add bio_add_zone_append_page(), a wrapper around bio_add_hw_page() which is intended to be used by file systems that directly add pages to a bio instead of using bio_iov_iter_get_pages().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
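A hedged sketch of such a wrapper over bio_add_hw_page(), guarding that it is only used for zone append bios on zoned queues (whether the queue is reached via bi_disk or bi_bdev depends on the exact kernel revision; illustrative):

    int bio_add_zone_append_page(struct bio *bio, struct page *page,
                                 unsigned int len, unsigned int offset)
    {
            struct request_queue *q = bio->bi_disk->queue;
            bool same_page = false;

            if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_ZONE_APPEND))
                    return 0;
            if (WARN_ON_ONCE(!blk_queue_is_zoned(q)))
                    return 0;

            /* cap the bio at the hardware zone append limit */
            return bio_add_hw_page(q, bio, page, len, offset,
                                   queue_max_zone_append_sectors(q),
                                   &same_page);
    }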
-
Christoph Hellwig authored
Instead of encoding the bvec pool using magic bio flags, just use a helper to find the pool based on the max_vecs value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
bio_iov_bvec_set clones the bio_vecs from the iter, and thus should be treated like a cloned bio in every respect. That also includes not touching bi_max_vecs, as that is a property of the bio allocation and not its current payload.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
bio_iov_bvec_set assigns the foreign bvec, so setting the NO_PAGE_REF flag directly there seems like the best fit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Remove a pointless layer of indentation after a return statement.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The bi_max_vecs and bi_vcnt fields are defined as unsigned short, so don't allow passing larger values in.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-