    block, bfq: merge bursts of newly-created queues · 430a67f9
    Paolo Valente authored
    
    
    Many throughput-sensitive workloads are made of several parallel I/O
    flows, with all flows generated by the same application, or more
    generically by the same task (e.g., system boot). The most
    counterproductive action with these workloads is plugging I/O dispatch
    when one of the bfq_queues associated with these flows remains
    temporarily empty.
    
    To avoid this plugging, BFQ has been using a burst-handling mechanism
    for years now. This mechanism has proven effective for throughput, and
    not detrimental to service guarantees. This commit pushes this
    mechanism a little bit further, based on the following two facts.
    
    First, all the I/O flows of the same application or task contribute
    to the execution/completion of that common application or task. So the
    performance figures that matter are total throughput of the flows and
    task-wide I/O latency.  In particular, these flows do not need to be
    protected from each other, in terms of individual bandwidth or
    latency.
    
    Second, the above fact holds regardless of the number of flows.
    
    Putting these two facts together, this commit stably merges the
    bfq_queues associated with these I/O flows, i.e., with the processes
    that generate these I/O flows, regardless of how many processes are
    involved.
    
    To decide whether a set of bfq_queues is actually associated with the
    I/O flows of a common application or task, and to merge these queues
    stably, this commit operates as follows: given a bfq_queue, say Q2,
    currently being created, and the last bfq_queue, say Q1, created
    before Q2, Q2 is merged stably with Q1 if
    - very little time has elapsed since Q1 was created
    - Q2 has the same ioprio as Q1
    - Q2 belongs to the same group as Q1
    
    Merging bfq_queues also reduces scheduling overhead. A fio test with
    ten random readers on /dev/nullb shows a throughput boost of 40% on a
    quad-core machine. Since BFQ's execution time amounts to ~50% of the total
    per-request processing time, the above throughput boost implies that
    BFQ's overhead is reduced by more than 50%.
    
    Tested-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
    Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
    Link: https://lore.kernel.org/r/20210304174627.161-7-paolo.valente@linaro.org
    
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>