- Jun 09, 2015
-
-
Ming Lei authored
Now blk_cleanup_queue() can be called before calling del_gendisk()[1], inside which hctx->ctxs is touched from blk_mq_unregister_hctx(), but the variable has been freed by blk_cleanup_queue() at that time. So this patch moves freeing of hctx->ctxs into queue's release handler for fixing the oops reported by Stefan. [1], 6cd18e71 (block: destroy bdi before blockdev is unregistered) Reported-by:
Stefan Seyfried <stefan.seyfried@googlemail.com> Cc: NeilBrown <neilb@suse.de> Cc: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org (v4.0) Signed-off-by:
Ming Lei <tom.leiming@gmail.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- May 04, 2015
-
-
Shaohua Li authored
Normally if driver is busy to dispatch a request the logic is like below: block layer: driver: __blk_mq_run_hw_queue a. blk_mq_stop_hw_queue b. rq add to ctx->dispatch later: 1. blk_mq_start_hw_queue 2. __blk_mq_run_hw_queue But it's possible step 1-2 runs between a and b. And since rq isn't in ctx->dispatch yet, step 2 will not run rq. The rq might get lost if there are no subsequent requests kick in. Signed-off-by:
Shaohua Li <shli@fb.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Apr 23, 2015
-
-
Ming Lei authored
hctx->tags has to be set as NULL in case that it is to be unmapped no matter if set->tags[hctx->queue_num] is NULL or not in blk_mq_map_swqueue() because shared tags can be freed already from another request queue. The same situation has to be considered during handling CPU online too. Unmapped hw queue can be remapped after CPU topo is changed, so we need to allocate tags for the hw queue in blk_mq_map_swqueue(). Then tags allocation for hw queue can be removed in hctx cpu online notifier, and it is reasonable to do that after mapping is updated. Cc: <stable@vger.kernel.org> Reported-by:
Dongsu Park <dongsu.park@profitbricks.com> Tested-by:
Dongsu Park <dongsu.park@profitbricks.com> Signed-off-by:
Ming Lei <ming.lei@canonical.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
Ming Lei authored
Firstly during CPU hotplug, even queue is freezed, timeout handler still may come and access hctx->tags, which may cause use after free, so this patch deactivates timeout handler inside CPU hotplug notifier. Secondly, tags can be shared by more than one queues, so we have to check if the hctx has been unmapped, otherwise still use-after-free on tags can be triggered. Cc: <stable@vger.kernel.org> Reported-by:
Dongsu Park <dongsu.park@profitbricks.com> Tested-by:
Dongsu Park <dongsu.park@profitbricks.com> Signed-off-by:
Ming Lei <ming.lei@canonical.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Apr 17, 2015
-
-
Jens Axboe authored
Commit 889fa31f was a bit too eager in reducing the loop count, so we ended up missing queues in some configurations. Ensure that our division rounds up, so that's not the case. Reported-by:
Guenter Roeck <linux@roeck-us.net> Fixes: 889fa31f ("blk-mq: reduce unnecessary software queue looping") Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Apr 15, 2015
-
-
Chong Yuan authored
In flush_busy_ctxs() and blk_mq_hctx_has_pending(), regardless of how many ctxs assigned to one hctx, they will all loop hctx->ctx_map.map_size times. Here hctx->ctx_map.map_size is a const ALIGN(nr_cpu_ids, 8) / 8. Especially, flush_busy_ctxs() is in hot code path. And it's unnecessary. Change ->map_size to contain the actually mapped software queues, so we only loop for as many iterations as we have to. And remove cpumask setting and nr_ctx count in blk_mq_init_cpu_queues() since they are all re-done in blk_mq_map_swqueue(). blk_mq_map_swqueue(). Signed-off-by:
Chong Yuan <chong.yuan@memblaze.com> Reviewed-by:
Wenbo Wang <wenbo.wang@memblaze.com> Updated by me for formatting and commenting. Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Apr 11, 2015
-
-
Linus Torvalds authored
Jan Engelhardt reports a strange oops with an invalid ->sense_buffer pointer in scsi_init_cmd_errh() with the blk-mq code. The sense_buffer pointer should have been initialized by the call to scsi_init_request() from blk_mq_init_rq_map(), but there seems to be some non-repeatable memory corruptor. This patch makes sure we initialize the whole struct request allocation (and the associated 'struct scsi_cmnd' for the SCSI case) to zero, by using __GFP_ZERO in the allocation. The old code initialized a couple of individual fields, leaving the rest undefined (although many of them are then initialized in later phases, like blk_mq_rq_ctx_init() etc. It's not entirely clear why this matters, but it's the rigth thing to do regardless, and with 4.0 imminent this is the defensive "let's just make sure everything is initialized properly" patch. Tested-by:
Jan Engelhardt <jengelh@inai.de> Acked-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Mar 30, 2015
-
-
Wei Fang authored
Don't assign ->rq_timeout twice. Signed-off-by:
Wei Fang <fangwei1@huawei.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
Xiaoguang Wang authored
At the beginning of blk_mq_alloc_tag_set(), we have already checked whether 'set->nr_hw_queues' is zero, so here remove this redundant check. Signed-off-by:
Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Mar 13, 2015
-
-
Keith Busch authored
Return -EBUSY if we're unable to enter a queue immediately when allocating a blk-mq request without __GFP_WAIT. Signed-off-by:
Keith Busch <keith.busch@intel.com> Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
Mike Snitzer authored
Rename blk_mq_run_queues to blk_mq_run_hw_queues, add async argument, and export it. DM's suspend support must be able to run the queue without starting stopped hw queues. Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
Mike Snitzer authored
Add a variant of blk_mq_init_queue that allows a previously allocated queue to be initialized. blk_mq_init_allocated_queue models blk_init_allocated_queue -- which was also created for DM's use. DM's approach to device creation requires a placeholder request_queue be allocated for use with alloc_dev() but the decision about what type of request_queue will be ultimately created is deferred until all component devices referenced in the DM table are processed to determine the table type (request-based, blk-mq request-based, or bio-based). Also, because of DM's late finalization of the request_queue type the call to blk_mq_register_disk() doesn't happen during alloc_dev(). Must export blk_mq_register_disk() so that DM can backfill the 'mq' dir once the blk-mq queue is fully allocated. Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Reviewed-by:
Ming Lei <ming.lei@canonical.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
Mike Snitzer authored
If percpu_ref_init() fails the allocated q and hctxs must get cleaned up; using 'err_map' doesn't allow that to happen. Signed-off-by:
Mike Snitzer <snitzer@redhat.com> Reviewed-by:
Ming Lei <ming.lei@canonical.com> Cc: stable@kernel.org Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Feb 10, 2015
-
-
Jens Axboe authored
We no longer use it outside of blk-mq.c, so we can make it static and stop exporting it. Additionally, kill the 'async' argument, as there's only one used of it. Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Jan 29, 2015
-
-
Ming Lei authored
The kobject memory inside blk-mq hctx/ctx shouldn't have been freed before the kobject is released because driver core can access it freely before its release. We can't do that in all ctx/hctx/mq_kobj's release handler because it can be run before blk_cleanup_queue(). Given mq_kobj shouldn't have been introduced, this patch simply moves mq's release into blk_release_queue(). Reported-by:
Sasha Levin <sasha.levin@oracle.com> Signed-off-by:
Ming Lei <ming.lei@canonical.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
Ming Lei authored
This reverts commit 76d697d1. The commit 76d697d1 causes general protection fault reported from Bart Van Assche: https://lkml.org/lkml/2015/1/28/334 Reported-by:
Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by:
Ming Lei <ming.lei@canonical.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Jan 23, 2015
-
-
Shaohua Li authored
This is the blk-mq part to support tag allocation policy. The default allocation policy isn't changed (though it's not a strict FIFO). The new policy is round-robin for libata. But it's a try-best implementation. If multiple tasks are competing, the tags returned will be mixed (which is unavoidable even with !mq, as requests from different tasks can be mixed in queue) Cc: Jens Axboe <axboe@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by:
Shaohua Li <shli@fb.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Jan 20, 2015
-
-
Ming Lei authored
The kobject memory shouldn't have been freed before the kobject is released because driver core can access it freely before its release. This patch frees hctx in its release callback. For ctx, they share one single per-cpu variable which is associated with the request queue, so free ctx in q->mq_kobj's release handler. Signed-off-by:
Sasha Levin <sasha.levin@oracle.com> (fix ctx kobjects) Signed-off-by:
Ming Lei <ming.lei@canonical.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Jan 08, 2015
-
-
Keith Busch authored
Requests that haven't been started prior to a queue dying can be ended in error without waiting for them to start and time out. Signed-off-by:
Keith Busch <keith.busch@intel.com> Added code comment to explain why this is done. Signed-off-by:
Jens Axboe <axboe@fb.com>
-
Keith Busch authored
Some types of requests may be started that are not gauranteed to ever complete. This adds a request flag that a driver can use so mark the request as such. Signed-off-by:
Keith Busch <keith.busch@intel.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
Adds a helper function a driver can use to abort requeued requests in case any are pending when h/w queues are being removed. Signed-off-by:
Jens Axboe <axboe@fb.com>
-
Keith Busch authored
Kicking requeued requests will start h/w queues in a work_queue, which may alter the driver's requested state to temporarily stop them. This patch exports a method to cancel the q->requeue_work so a driver can be assured stopped h/w queues won't be started up before it is ready. Signed-off-by:
Keith Busch <keith.busch@intel.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
Keith Busch authored
Drivers can iterate over all allocated request tags, but their callback needs a way to know if the driver started the request in the first place. Signed-off-by:
Keith Busch <keith.busch@intel.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
Keith Busch authored
When the queue is set to dying, wake up tasks that are waiting on frozen queue so they realize it is dying and abandon their request. Signed-off-by:
Keith Busch <keith.busch@intel.com> Modified by me to add a code comment on the need for the wakeup. Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Jan 07, 2015
-
-
Jens Axboe authored
We store it in the tag set, we don't need it in the hardware queue. While removing cmd_size, place ->queue_num further down to avoid a hole on 64-bit archs. It's not used in any fast paths, so we can safely move it. Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Jan 02, 2015
-
-
Jens Axboe authored
Commit b4c6a028 exported the start and unfreeze, but we need the regular blk_mq_freeze_queue() for the loop conversion. Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Dec 31, 2014
-
-
Jens Axboe authored
If it's dying, we can't expect new request to complete and come in an wake up other tasks waiting for requests. So after we have marked it as dying, wake up everybody currently waiting for a request. Once they wake, they will retry their allocation and fail appropriately due to the state of the queue. Tested-by:
Keith Busch <keith.busch@intel.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Dec 20, 2014
-
-
Keith Busch authored
Let drivers prevent entering a queue that isn't available. Signed-off-by:
Keith Busch <keith.busch@intel.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
Keith Busch authored
Fixes usage counter when a request could not be allocated. Signed-off-by:
Keith Busch <keith.busch@intel.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Dec 09, 2014
-
-
Ming Lei authored
When one hardware queue has no mapped software queues, it shouldn't have been scheduled. Otherwise WARNING or OOPS can triggered. blk_mq_hw_queue_mapped() helper is introduce for fixing the problem. Signed-off-by:
Ming Lei <ming.lei@canonical.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Dec 01, 2014
-
-
Shaohua Li authored
We call blk_mq_alloc_tag_set() first then blk_mq_init_queue(). The requests are allocated in the former function. So the kdump check should be moved to there to really save memory. Signed-off-by:
Shaohua Li <shli@fb.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Nov 24, 2014
-
-
Christoph Hellwig authored
Don't duplicate the code to handle the not cpu bounce case in the caller, do it inside blk_mq_hctx_next_cpu instead. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Nov 17, 2014
-
-
Jens Axboe authored
It's silly to use blk_mq_free_request() which in turn maps the request to the hardware queue, for places where we already know what the hardware queue is. This saves us an extra mapping of a hardware queue on request completion, if the caller knows this information already. Signed-off-by:
Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
Drivers that know they are blk-mq should just use this function instead of calling through blk_put_request(). Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Nov 12, 2014
-
-
Bart Van Assche authored
The queuecommand() callback functions in SCSI low-level drivers need to know which hardware context has been selected by the block layer. Since this information is not available in the request structure, and since passing the hctx pointer directly to the queuecommand callback function would require modification of all SCSI LLDs, add a function to the block layer that allows to query the hardware context index. Signed-off-by:
Bart Van Assche <bvanassche@acm.org> Acked-by:
Jens Axboe <axboe@kernel.dk> Reviewed-by:
Sagi Grimberg <sagig@mellanox.com> Reviewed-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Christoph Hellwig <hch@lst.de>
-
- Nov 11, 2014
-
-
Paolo Bonzini authored
blk-mq is using preempt_disable/enable in order to ensure that the queue runners are placed on the right CPU. This does not work with the RT patches, because __blk_mq_run_hw_queue takes a non-raw spinlock with the preemption-disabled region. If there is contention on the lock, this violates the rules for preemption-disabled regions. While this should be easily fixable within the RT patches just by doing migrate_disable/enable, we can do better and document _why_ this particular region runs with disabled preemption. After the previous patch, it is trivial to switch it to get/put_cpu; the RT patches then can change it to get_cpu_light, which lets virtio-blk run under RT kernels. Cc: Jens Axboe <axboe@kernel.dk> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by:
Clark Williams <williams@redhat.com> Tested-by:
Clark Williams <williams@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
Paolo Bonzini authored
preempt_disable/enable surrounds every call to blk_mq_run_hw_queue, except the one in blk-flush.c. In fact that one is always asynchronous, and it does not need smp_processor_id(). We can do the same for all other calls, avoiding preempt_disable when async is true. This avoids peppering blk-mq.c with preemption-disabled regions. Cc: Jens Axboe <axboe@kernel.dk> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by:
Clark Williams <williams@redhat.com> Tested-by:
Clark Williams <williams@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Nov 04, 2014
-
-
Tejun Heo authored
q->mq_usage_counter is a percpu_ref which is killed and drained when the queue is frozen. On a CPU hotplug event, blk_mq_queue_reinit() which involves freezing the queue is invoked on all existing queues. Because percpu_ref killing and draining involve a RCU grace period, doing the above on one queue after another may take a long time if there are many queues on the system. This patch splits out initiation of freezing and waiting for its completion, and updates blk_mq_queue_reinit_notify() so that the queues are frozen in parallel instead of one after another. Note that freezing and unfreezing are moved from blk_mq_queue_reinit() to blk_mq_queue_reinit_notify(). Signed-off-by:
Tejun Heo <tj@kernel.org> Reported-by:
Christian Borntraeger <borntraeger@de.ibm.com> Tested-by:
Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
-
- Oct 29, 2014
-
-
Jens Axboe authored
Drivers can now tell blk-mq if they take advantage of the deferred issue through 'last' or not. If they do, don't do queue-direct for sync IO. This is a preparation patch for the nvme conversion. Signed-off-by:
Jens Axboe <axboe@fb.com>
-
Jens Axboe authored
Since we have the notion of a 'last' request in a chain, we can use this to have the hardware optimize the issuing of requests. Add a list_head parameter to queue_rq that the driver can use to temporarily store hw commands for issue when 'last' is true. If we are doing a chain of requests, pass in a NULL list for the first request to force issue of that immediately, then batch the remainder for deferred issue until the last request has been sent. Instead of adding yet another argument to the hot ->queue_rq path, encapsulate the passed arguments in a blk_mq_queue_data structure. This is passed as a constant, and has been tested as faster than passing 4 (or even 3) args through ->queue_rq. Update drivers for the new ->queue_rq() prototype. There are no functional changes in this patch for drivers - if they don't use the passed in list, then they will just queue requests individually like before. Signed-off-by:
Jens Axboe <axboe@fb.com>
-