blob: df3a8a47f58c0ba2f0752d675219799395b516d9 [file] [log] [blame]
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -03001==========================
Paolo Valenteaee69d72017-04-19 08:29:02 -06002BFQ (Budget Fair Queueing)
3==========================
4
5BFQ is a proportional-share I/O scheduler, with some extra
6low-latency capabilities. In addition to cgroups support (blkio or io
7controllers), BFQ's main features are:
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -03008
Paolo Valenteaee69d72017-04-19 08:29:02 -06009- BFQ guarantees a high system and application responsiveness, and a
10 low latency for time-sensitive applications, such as audio or video
11 players;
12- BFQ distributes bandwidth, and not just time, among processes or
13 groups (switching back to time distribution when needed to keep
14 throughput high).
15
Paolo Valente43c1b3d2017-05-09 12:54:23 +020016In its default configuration, BFQ privileges latency over
17throughput. So, when needed for achieving a lower latency, BFQ builds
18schedules that may lead to a lower throughput. If your main or only
19goal, for a given device, is to achieve the maximum-possible
20throughput at all times, then do switch off all low-latency heuristics
Paolo Valente233f0bf2017-08-31 20:00:30 +020021for that device, by setting low_latency to 0. See Section 3 for
22details on how to configure BFQ for the desired tradeoff between
23latency and throughput, or on how to maximize throughput.
Paolo Valente43c1b3d2017-05-09 12:54:23 +020024
Paolo Valente4438cf52019-03-12 09:59:35 +010025As every I/O scheduler, BFQ adds some overhead to per-I/O-request
26processing. To give an idea of this overhead, the total,
27single-lock-protected, per-request processing time of BFQ---i.e., the
28sum of the execution times of the request insertion, dispatch and
29completion hooks---is, e.g., 1.9 us on an Intel Core i7-2760QM@2.40GHz
30(dated CPU for notebooks; time measured with simple code
31instrumentation, and using the throughput-sync.sh script of the S
32suite [1], in performance-profiling mode). To put this result into
33context, the total, single-lock-protected, per-request execution time
34of the lightest I/O scheduler available in blk-mq, mq-deadline, is 0.7
35us (mq-deadline is ~800 LOC, against ~10500 LOC for BFQ).
36
37Scheduling overhead further limits the maximum IOPS that a CPU can
38process (already limited by the execution of the rest of the I/O
39stack). To give an idea of the limits with BFQ, on slow or average
40CPUs, here are, first, the limits of BFQ for three different CPUs, on,
41respectively, an average laptop, an old desktop, and a cheap embedded
42system, in case full hierarchical support is enabled (i.e.,
Christoph Hellwig8060c472019-06-06 12:26:24 +020043CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_BFQ_CGROUP_DEBUG is not
Paolo Valente4438cf52019-03-12 09:59:35 +010044set (Section 4-2):
Luca Miccioa33801e2017-11-13 07:34:10 +010045- Intel i7-4850HQ: 400 KIOPS
46- AMD A8-3850: 250 KIOPS
47- ARM CortexTM-A53 Octa-core: 80 KIOPS
48
Christoph Hellwig8060c472019-06-06 12:26:24 +020049If CONFIG_BFQ_CGROUP_DEBUG is set (and of course full hierarchical
Luca Miccioa33801e2017-11-13 07:34:10 +010050support is enabled), then the sustainable throughput with BFQ
51decreases, because all blkio.bfq* statistics are created and updated
52(Section 4-2). For BFQ, this leads to the following maximum
53sustainable throughputs, on the same systems as above:
Paolo Valente24bfd192017-11-13 07:34:09 +010054- Intel i7-4850HQ: 310 KIOPS
55- AMD A8-3850: 200 KIOPS
56- ARM CortexTM-A53 Octa-core: 56 KIOPS
Paolo Valente68017e52017-11-13 07:34:07 +010057
58BFQ works for multi-queue devices too.
Paolo Valenteaee69d72017-04-19 08:29:02 -060059
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -030060.. The table of contents follow. Impatients can just jump to Section 3.
Paolo Valenteaee69d72017-04-19 08:29:02 -060061
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -030062.. CONTENTS
Paolo Valenteaee69d72017-04-19 08:29:02 -060063
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -030064 1. When may BFQ be useful?
65 1-1 Personal systems
66 1-2 Server systems
67 2. How does BFQ work?
68 3. What are BFQ's tunables and how to properly configure BFQ?
69 4. BFQ group scheduling
70 4-1 Service guarantees provided
71 4-2 Interface
Paolo Valenteaee69d72017-04-19 08:29:02 -060072
731. When may BFQ be useful?
74==========================
75
76BFQ provides the following benefits on personal and server systems.
77
781-1 Personal systems
79--------------------
80
81Low latency for interactive applications
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -030082^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Paolo Valenteaee69d72017-04-19 08:29:02 -060083
84Regardless of the actual background workload, BFQ guarantees that, for
85interactive tasks, the storage device is virtually as responsive as if
86it was idle. For example, even if one or more of the following
87background workloads are being executed:
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -030088
Paolo Valenteaee69d72017-04-19 08:29:02 -060089- one or more large files are being read, written or copied,
90- a tree of source files is being compiled,
91- one or more virtual machines are performing I/O,
92- a software update is in progress,
93- indexing daemons are scanning filesystems and updating their
94 databases,
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -030095
Paolo Valenteaee69d72017-04-19 08:29:02 -060096starting an application or loading a file from within an application
97takes about the same time as if the storage device was idle. As a
98comparison, with CFQ, NOOP or DEADLINE, and in the same conditions,
99applications experience high latencies, or even become unresponsive
100until the background workload terminates (also on SSDs).
101
102Low latency for soft real-time applications
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -0300103^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Paolo Valenteaee69d72017-04-19 08:29:02 -0600104Also soft real-time applications, such as audio and video
105players/streamers, enjoy a low latency and a low drop rate, regardless
106of the background I/O workload. As a consequence, these applications
107do not suffer from almost any glitch due to the background workload.
108
109Higher speed for code-development tasks
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -0300110^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Paolo Valenteaee69d72017-04-19 08:29:02 -0600111
112If some additional workload happens to be executed in parallel, then
113BFQ executes the I/O-related components of typical code-development
114tasks (compilation, checkout, merge, ...) much more quickly than CFQ,
115NOOP or DEADLINE.
116
117High throughput
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -0300118^^^^^^^^^^^^^^^
Paolo Valenteaee69d72017-04-19 08:29:02 -0600119
120On hard disks, BFQ achieves up to 30% higher throughput than CFQ, and
121up to 150% higher throughput than DEADLINE and NOOP, with all the
122sequential workloads considered in our tests. With random workloads,
123and with all the workloads on flash-based devices, BFQ achieves,
124instead, about the same throughput as the other schedulers.
125
126Strong fairness, bandwidth and delay guarantees
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -0300127^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Paolo Valenteaee69d72017-04-19 08:29:02 -0600128
129BFQ distributes the device throughput, and not just the device time,
130among I/O-bound applications in proportion their weights, with any
131workload and regardless of the device parameters. From these bandwidth
132guarantees, it is possible to compute tight per-I/O-request delay
133guarantees by a simple formula. If not configured for strict service
134guarantees, BFQ switches to time-based resource sharing (only) for
135applications that would otherwise cause a throughput loss.
136
1371-2 Server systems
138------------------
139
140Most benefits for server systems follow from the same service
141properties as above. In particular, regardless of whether additional,
142possibly heavy workloads are being served, BFQ guarantees:
143
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -0300144* audio and video-streaming with zero or very low jitter and drop
Paolo Valenteaee69d72017-04-19 08:29:02 -0600145 rate;
146
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -0300147* fast retrieval of WEB pages and embedded objects;
Paolo Valenteaee69d72017-04-19 08:29:02 -0600148
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -0300149* real-time recording of data in live-dumping applications (e.g.,
Paolo Valenteaee69d72017-04-19 08:29:02 -0600150 packet logging);
151
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -0300152* responsiveness in local and remote access to a server.
Paolo Valenteaee69d72017-04-19 08:29:02 -0600153
154
1552. How does BFQ work?
156=====================
157
158BFQ is a proportional-share I/O scheduler, whose general structure,
159plus a lot of code, are borrowed from CFQ.
160
161- Each process doing I/O on a device is associated with a weight and a
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -0300162 `(bfq_)queue`.
Paolo Valenteaee69d72017-04-19 08:29:02 -0600163
164- BFQ grants exclusive access to the device, for a while, to one queue
165 (process) at a time, and implements this service model by
166 associating every queue with a budget, measured in number of
167 sectors.
168
169 - After a queue is granted access to the device, the budget of the
170 queue is decremented, on each request dispatch, by the size of the
171 request.
172
173 - The in-service queue is expired, i.e., its service is suspended,
174 only if one of the following events occurs: 1) the queue finishes
175 its budget, 2) the queue empties, 3) a "budget timeout" fires.
176
177 - The budget timeout prevents processes doing random I/O from
178 holding the device for too long and dramatically reducing
179 throughput.
180
181 - Actually, as in CFQ, a queue associated with a process issuing
182 sync requests may not be expired immediately when it empties. In
183 contrast, BFQ may idle the device for a short time interval,
184 giving the process the chance to go on being served if it issues
185 a new request in time. Device idling typically boosts the
Paolo Valente2670cd12017-08-31 20:00:31 +0200186 throughput on rotational devices and on non-queueing flash-based
187 devices, if processes do synchronous and sequential I/O. In
188 addition, under BFQ, device idling is also instrumental in
189 guaranteeing the desired throughput fraction to processes
190 issuing sync requests (see the description of the slice_idle
191 tunable in this document, or [1, 2], for more details).
Paolo Valenteaee69d72017-04-19 08:29:02 -0600192
193 - With respect to idling for service guarantees, if several
194 processes are competing for the device at the same time, but
Paolo Valente233f0bf2017-08-31 20:00:30 +0200195 all processes and groups have the same weight, then BFQ
196 guarantees the expected throughput distribution without ever
197 idling the device. Throughput is thus as high as possible in
198 this common scenario.
Paolo Valenteaee69d72017-04-19 08:29:02 -0600199
Paolo Valente2670cd12017-08-31 20:00:31 +0200200 - On flash-based storage with internal queueing of commands
201 (typically NCQ), device idling happens to be always detrimental
202 for throughput. So, with these devices, BFQ performs idling
203 only when strictly needed for service guarantees, i.e., for
204 guaranteeing low latency or fairness. In these cases, overall
205 throughput may be sub-optimal. No solution currently exists to
206 provide both strong service guarantees and optimal throughput
207 on devices with internal queueing.
208
Paolo Valenteaee69d72017-04-19 08:29:02 -0600209 - If low-latency mode is enabled (default configuration), BFQ
210 executes some special heuristics to detect interactive and soft
211 real-time applications (e.g., video or audio players/streamers),
212 and to reduce their latency. The most important action taken to
213 achieve this goal is to give to the queues associated with these
214 applications more than their fair share of the device
215 throughput. For brevity, we call just "weight-raising" the whole
216 sets of actions taken by BFQ to privilege these queues. In
217 particular, BFQ provides a milder form of weight-raising for
218 interactive applications, and a stronger form for soft real-time
219 applications.
220
221 - BFQ automatically deactivates idling for queues born in a burst of
222 queue creations. In fact, these queues are usually associated with
223 the processes of applications and services that benefit mostly
224 from a high throughput. Examples are systemd during boot, or git
225 grep.
226
227 - As CFQ, BFQ merges queues performing interleaved I/O, i.e.,
228 performing random I/O that becomes mostly sequential if
229 merged. Differently from CFQ, BFQ achieves this goal with a more
230 reactive mechanism, called Early Queue Merge (EQM). EQM is so
231 responsive in detecting interleaved I/O (cooperating processes),
232 that it enables BFQ to achieve a high throughput, by queue
233 merging, even for queues for which CFQ needs a different
234 mechanism, preemption, to get a high throughput. As such EQM is a
235 unified mechanism to achieve a high throughput with interleaved
236 I/O.
237
238 - Queues are scheduled according to a variant of WF2Q+, named
239 B-WF2Q+, and implemented using an augmented rb-tree to preserve an
240 O(log N) overall complexity. See [2] for more details. B-WF2Q+ is
Paolo Valente233f0bf2017-08-31 20:00:30 +0200241 also ready for hierarchical scheduling, details in Section 4.
Paolo Valenteaee69d72017-04-19 08:29:02 -0600242
243 - B-WF2Q+ guarantees a tight deviation with respect to an ideal,
244 perfectly fair, and smooth service. In particular, B-WF2Q+
245 guarantees that each queue receives a fraction of the device
246 throughput proportional to its weight, even if the throughput
247 fluctuates, and regardless of: the device parameters, the current
248 workload and the budgets assigned to the queue.
249
250 - The last, budget-independence, property (although probably
251 counterintuitive in the first place) is definitely beneficial, for
252 the following reasons:
253
254 - First, with any proportional-share scheduler, the maximum
255 deviation with respect to an ideal service is proportional to
256 the maximum budget (slice) assigned to queues. As a consequence,
257 BFQ can keep this deviation tight not only because of the
258 accurate service of B-WF2Q+, but also because BFQ *does not*
259 need to assign a larger budget to a queue to let the queue
260 receive a higher fraction of the device throughput.
261
262 - Second, BFQ is free to choose, for every process (queue), the
263 budget that best fits the needs of the process, or best
264 leverages the I/O pattern of the process. In particular, BFQ
265 updates queue budgets with a simple feedback-loop algorithm that
266 allows a high throughput to be achieved, while still providing
267 tight latency guarantees to time-sensitive applications. When
268 the in-service queue expires, this algorithm computes the next
269 budget of the queue so as to:
270
271 - Let large budgets be eventually assigned to the queues
272 associated with I/O-bound applications performing sequential
273 I/O: in fact, the longer these applications are served once
274 got access to the device, the higher the throughput is.
275
276 - Let small budgets be eventually assigned to the queues
277 associated with time-sensitive applications (which typically
278 perform sporadic and short I/O), because, the smaller the
279 budget assigned to a queue waiting for service is, the sooner
280 B-WF2Q+ will serve that queue (Subsec 3.3 in [2]).
281
282- If several processes are competing for the device at the same time,
283 but all processes and groups have the same weight, then BFQ
284 guarantees the expected throughput distribution without ever idling
285 the device. It uses preemption instead. Throughput is then much
286 higher in this common scenario.
287
288- ioprio classes are served in strict priority order, i.e.,
289 lower-priority queues are not served as long as there are
290 higher-priority queues. Among queues in the same class, the
291 bandwidth is distributed in proportion to the weight of each
292 queue. A very thin extra bandwidth is however guaranteed to
293 the Idle class, to prevent it from starving.
294
295
Paolo Valente2670cd12017-08-31 20:00:31 +02002963. What are BFQ's tunables and how to properly configure BFQ?
297=============================================================
Paolo Valenteaee69d72017-04-19 08:29:02 -0600298
Paolo Valente2670cd12017-08-31 20:00:31 +0200299Most BFQ tunables affect service guarantees (basically latency and
300fairness) and throughput. For full details on how to choose the
301desired tradeoff between service guarantees and throughput, see the
302parameters slice_idle, strict_guarantees and low_latency. For details
303on how to maximise throughput, see slice_idle, timeout_sync and
304max_budget. The other performance-related parameters have been
305inherited from, and have been preserved mostly for compatibility with
306CFQ. So far, no performance improvement has been reported after
307changing the latter parameters in BFQ.
308
309In particular, the tunables back_seek-max, back_seek_penalty,
310fifo_expire_async and fifo_expire_sync below are the same as in
311CFQ. Their description is just copied from that for CFQ. Some
312considerations in the description of slice_idle are copied from CFQ
313too.
Paolo Valenteaee69d72017-04-19 08:29:02 -0600314
315per-process ioprio and weight
316-----------------------------
317
Arianna Avanzinie21b7a02017-04-12 18:23:08 +0200318Unless the cgroups interface is used (see "4. BFQ group scheduling"),
319weights can be assigned to processes only indirectly, through I/O
320priorities, and according to the relation:
321weight = (IOPRIO_BE_NR - ioprio) * 10.
322
323Beware that, if low-latency is set, then BFQ automatically raises the
324weight of the queues associated with interactive and soft real-time
325applications. Unset this tunable if you need/want to control weights.
Paolo Valenteaee69d72017-04-19 08:29:02 -0600326
327slice_idle
328----------
329
330This parameter specifies how long BFQ should idle for next I/O
331request, when certain sync BFQ queues become empty. By default
332slice_idle is a non-zero value. Idling has a double purpose: boosting
333throughput and making sure that the desired throughput distribution is
334respected (see the description of how BFQ works, and, if needed, the
335papers referred there).
336
337As for throughput, idling can be very helpful on highly seeky media
338like single spindle SATA/SAS disks where we can cut down on overall
339number of seeks and see improved throughput.
340
341Setting slice_idle to 0 will remove all the idling on queues and one
342should see an overall improved throughput on faster storage devices
Paolo Valente2670cd12017-08-31 20:00:31 +0200343like multiple SATA/SAS disks in hardware RAID configuration, as well
344as flash-based storage with internal command queueing (and
345parallelism).
Paolo Valenteaee69d72017-04-19 08:29:02 -0600346
347So depending on storage and workload, it might be useful to set
348slice_idle=0. In general for SATA/SAS disks and software RAID of
349SATA/SAS disks keeping slice_idle enabled should be useful. For any
350configurations where there are multiple spindles behind single LUN
Paolo Valente2670cd12017-08-31 20:00:31 +0200351(Host based hardware RAID controller or for storage arrays), or with
352flash-based fast storage, setting slice_idle=0 might end up in better
353throughput and acceptable latencies.
Paolo Valenteaee69d72017-04-19 08:29:02 -0600354
355Idling is however necessary to have service guarantees enforced in
356case of differentiated weights or differentiated I/O-request lengths.
357To see why, suppose that a given BFQ queue A must get several I/O
358requests served for each request served for another queue B. Idling
359ensures that, if A makes a new I/O request slightly after becoming
360empty, then no request of B is dispatched in the middle, and thus A
361does not lose the possibility to get more than one request dispatched
362before the next request of B is dispatched. Note that idling
363guarantees the desired differentiated treatment of queues only in
364terms of I/O-request dispatches. To guarantee that the actual service
365order then corresponds to the dispatch order, the strict_guarantees
366tunable must be set too.
367
368There is an important flipside for idling: apart from the above cases
369where it is beneficial also for throughput, idling can severely impact
370throughput. One important case is random workload. Because of this
371issue, BFQ tends to avoid idling as much as possible, when it is not
Paolo Valente2670cd12017-08-31 20:00:31 +0200372beneficial also for throughput (as detailed in Section 2). As a
373consequence of this behavior, and of further issues described for the
374strict_guarantees tunable, short-term service guarantees may be
375occasionally violated. And, in some cases, these guarantees may be
376more important than guaranteeing maximum throughput. For example, in
377video playing/streaming, a very low drop rate may be more important
378than maximum throughput. In these cases, consider setting the
379strict_guarantees parameter.
Paolo Valenteaee69d72017-04-19 08:29:02 -0600380
John Pittman47cb3932019-01-08 16:56:13 -0500381slice_idle_us
382-------------
383
384Controls the same tuning parameter as slice_idle, but in microseconds.
385Either tunable can be used to set idling behavior. Afterwards, the
386other tunable will reflect the newly set value in sysfs.
387
Paolo Valenteaee69d72017-04-19 08:29:02 -0600388strict_guarantees
389-----------------
390
391If this parameter is set (default: unset), then BFQ
392
393- always performs idling when the in-service queue becomes empty;
394
395- forces the device to serve one I/O request at a time, by dispatching a
396 new request only if there is no outstanding request.
397
398In the presence of differentiated weights or I/O-request sizes, both
399the above conditions are needed to guarantee that every BFQ queue
400receives its allotted share of the bandwidth. The first condition is
401needed for the reasons explained in the description of the slice_idle
402tunable. The second condition is needed because all modern storage
403devices reorder internally-queued requests, which may trivially break
404the service guarantees enforced by the I/O scheduler.
405
406Setting strict_guarantees may evidently affect throughput.
407
408back_seek_max
409-------------
410
411This specifies, given in Kbytes, the maximum "distance" for backward seeking.
412The distance is the amount of space from the current head location to the
413sectors that are backward in terms of distance.
414
415This parameter allows the scheduler to anticipate requests in the "backward"
416direction and consider them as being the "next" if they are within this
417distance from the current head location.
418
419back_seek_penalty
420-----------------
421
422This parameter is used to compute the cost of backward seeking. If the
423backward distance of request is just 1/back_seek_penalty from a "front"
424request, then the seeking cost of two requests is considered equivalent.
425
426So scheduler will not bias toward one or the other request (otherwise scheduler
427will bias toward front request). Default value of back_seek_penalty is 2.
428
429fifo_expire_async
430-----------------
431
432This parameter is used to set the timeout of asynchronous requests. Default
Joseph Qi4168a8d2021-02-23 09:55:28 +0800433value of this is 250ms.
Paolo Valenteaee69d72017-04-19 08:29:02 -0600434
435fifo_expire_sync
436----------------
437
438This parameter is used to set the timeout of synchronous requests. Default
Joseph Qi4168a8d2021-02-23 09:55:28 +0800439value of this is 125ms. In case to favor synchronous requests over asynchronous
Paolo Valenteaee69d72017-04-19 08:29:02 -0600440one, this value should be decreased relative to fifo_expire_async.
441
442low_latency
443-----------
444
445This parameter is used to enable/disable BFQ's low latency mode. By
446default, low latency mode is enabled. If enabled, interactive and soft
447real-time applications are privileged and experience a lower latency,
448as explained in more detail in the description of how BFQ works.
449
Paolo Valente43c1b3d2017-05-09 12:54:23 +0200450DISABLE this mode if you need full control on bandwidth
Paolo Valente44e44a12017-04-12 18:23:12 +0200451distribution. In fact, if it is enabled, then BFQ automatically
452increases the bandwidth share of privileged applications, as the main
453means to guarantee a lower latency to them.
454
Paolo Valente43c1b3d2017-05-09 12:54:23 +0200455In addition, as already highlighted at the beginning of this document,
456DISABLE this mode if your only goal is to achieve a high throughput.
457In fact, privileging the I/O of some application over the rest may
458entail a lower throughput. To achieve the highest-possible throughput
459on a non-rotational device, setting slice_idle to 0 may be needed too
460(at the cost of giving up any strong guarantee on fairness and low
461latency).
462
Paolo Valenteaee69d72017-04-19 08:29:02 -0600463timeout_sync
464------------
465
466Maximum amount of device time that can be given to a task (queue) once
467it has been selected for service. On devices with costly seeks,
468increasing this time usually increases maximum throughput. On the
469opposite end, increasing this time coarsens the granularity of the
470short-term bandwidth and latency guarantees, especially if the
471following parameter is set to zero.
472
473max_budget
474----------
475
476Maximum amount of service, measured in sectors, that can be provided
477to a BFQ queue once it is set in service (of course within the limits
478of the above timeout). According to what said in the description of
479the algorithm, larger values increase the throughput in proportion to
480the percentage of sequential I/O requests issued. The price of larger
481values is that they coarsen the granularity of short-term bandwidth
482and latency guarantees.
483
484The default value is 0, which enables auto-tuning: BFQ sets max_budget
485to the maximum number of sectors that can be served during
486timeout_sync, according to the estimated peak rate.
487
Paolo Valente2670cd12017-08-31 20:00:31 +0200488For specific devices, some users have occasionally reported to have
489reached a higher throughput by setting max_budget explicitly, i.e., by
490setting max_budget to a higher value than 0. In particular, they have
491set max_budget to higher values than those to which BFQ would have set
492it with auto-tuning. An alternative way to achieve this goal is to
493just increase the value of timeout_sync, leaving max_budget equal to 0.
494
Paolo Valenteaee69d72017-04-19 08:29:02 -06004954. Group scheduling with BFQ
496============================
497
Arianna Avanzinie21b7a02017-04-12 18:23:08 +0200498BFQ supports both cgroups-v1 and cgroups-v2 io controllers, namely
499blkio and io. In particular, BFQ supports weight-based proportional
500share. To activate cgroups support, set BFQ_GROUP_IOSCHED.
Paolo Valenteaee69d72017-04-19 08:29:02 -0600501
5024-1 Service guarantees provided
503-------------------------------
504
505With BFQ, proportional share means true proportional share of the
506device bandwidth, according to group weights. For example, a group
507with weight 200 gets twice the bandwidth, and not just twice the time,
508of a group with weight 100.
509
510BFQ supports hierarchies (group trees) of any depth. Bandwidth is
511distributed among groups and processes in the expected way: for each
512group, the children of the group share the whole bandwidth of the
513group in proportion to their weights. In particular, this implies
514that, for each leaf group, every process of the group receives the
515same share of the whole group bandwidth, unless the ioprio of the
516process is modified.
517
518The resource-sharing guarantee for a group may partially or totally
519switch from bandwidth to time, if providing bandwidth guarantees to
520the group lowers the throughput too much. This switch occurs on a
521per-process basis: if a process of a leaf group causes throughput loss
522if served in such a way to receive its share of the bandwidth, then
523BFQ switches back to just time-based proportional share for that
524process.
525
5264-2 Interface
527-------------
528
529To get proportional sharing of bandwidth with BFQ for a given device,
530BFQ must of course be the active scheduler for that device.
531
532Within each group directory, the names of the files associated with
533BFQ-specific cgroup parameters and stats begin with the "bfq."
534prefix. So, with cgroups-v1 or cgroups-v2, the full prefix for
535BFQ-specific files is "blkio.bfq." or "io.bfq." For example, the group
536parameter to set the weight of a group with BFQ is blkio.bfq.weight
537or io.bfq.weight.
538
Luca Miccioa33801e2017-11-13 07:34:10 +0100539As for cgroups-v1 (blkio controller), the exact set of stat files
540created, and kept up-to-date by bfq, depends on whether
Christoph Hellwig8060c472019-06-06 12:26:24 +0200541CONFIG_BFQ_CGROUP_DEBUG is set. If it is set, then bfq creates all
Luca Miccioa33801e2017-11-13 07:34:10 +0100542the stat files documented in
Mauro Carvalho Chehabda82c922019-06-27 13:08:35 -0300543Documentation/admin-guide/cgroup-v1/blkio-controller.rst. If, instead,
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -0300544CONFIG_BFQ_CGROUP_DEBUG is not set, then bfq creates only the files::
545
546 blkio.bfq.io_service_bytes
547 blkio.bfq.io_service_bytes_recursive
548 blkio.bfq.io_serviced
549 blkio.bfq.io_serviced_recursive
Luca Miccioa33801e2017-11-13 07:34:10 +0100550
Christoph Hellwig8060c472019-06-06 12:26:24 +0200551The value of CONFIG_BFQ_CGROUP_DEBUG greatly influences the maximum
Luca Miccioa33801e2017-11-13 07:34:10 +0100552throughput sustainable with bfq, because updating the blkio.bfq.*
553stats is rather costly, especially for some of the stats enabled by
Christoph Hellwig8060c472019-06-06 12:26:24 +0200554CONFIG_BFQ_CGROUP_DEBUG.
Luca Miccioa33801e2017-11-13 07:34:10 +0100555
Kir Kolyshkinfda0b5b2021-06-14 14:41:09 -0700556Parameters
557----------
Paolo Valenteaee69d72017-04-19 08:29:02 -0600558
Kir Kolyshkinfda0b5b2021-06-14 14:41:09 -0700559For each group, the following parameters can be set:
Paolo Valenteaee69d72017-04-19 08:29:02 -0600560
Kir Kolyshkinfda0b5b2021-06-14 14:41:09 -0700561 weight
562 This specifies the default weight for the cgroup inside its parent.
563 Available values: 1..1000 (default: 100).
Paolo Valenteaee69d72017-04-19 08:29:02 -0600564
Kir Kolyshkinfda0b5b2021-06-14 14:41:09 -0700565 For cgroup v1, it is set by writing the value to `blkio.bfq.weight`.
566
567 For cgroup v2, it is set by writing the value to `io.bfq.weight`.
568 (with an optional prefix of `default` and a space).
569
570 The linear mapping between ioprio and weights, described at the beginning
571 of the tunable section, is still valid, but all weights higher than
572 IOPRIO_BE_NR*10 are mapped to ioprio 0.
573
574 Recall that, if low-latency is set, then BFQ automatically raises the
575 weight of the queues associated with interactive and soft real-time
576 applications. Unset this tunable if you need/want to control weights.
577
578 weight_device
579 This specifies a per-device weight for the cgroup. The syntax is
580 `minor:major weight`. A weight of `0` may be used to reset to the default
581 weight.
582
583 For cgroup v1, it is set by writing the value to `blkio.bfq.weight_device`.
584
585 For cgroup v2, the file name is `io.bfq.weight`.
Paolo Valente44e44a12017-04-12 18:23:12 +0200586
Paolo Valenteaee69d72017-04-19 08:29:02 -0600587
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -0300588[1]
589 P. Valente, A. Avanzini, "Evolution of the BFQ Storage I/O
Paolo Valenteaee69d72017-04-19 08:29:02 -0600590 Scheduler", Proceedings of the First Workshop on Mobile System
591 Technologies (MST-2015), May 2015.
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -0300592
Paolo Valenteaee69d72017-04-19 08:29:02 -0600593 http://algogroup.unimore.it/people/paolo/disk_sched/mst-2015.pdf
594
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -0300595[2]
596 P. Valente and M. Andreolini, "Improving Application
Paolo Valenteaee69d72017-04-19 08:29:02 -0600597 Responsiveness with the BFQ Disk I/O Scheduler", Proceedings of
598 the 5th Annual International Systems and Storage Conference
599 (SYSTOR '12), June 2012.
Paolo Valente4438cf52019-03-12 09:59:35 +0100600
Mauro Carvalho Chehab898bd372019-04-18 19:45:00 -0300601 Slightly extended version:
602
603 http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite-results.pdf
604
605[3]
606 https://github.com/Algodev-github/S