==========================
BFQ (Budget Fair Queueing)
==========================

BFQ is a proportional-share I/O scheduler, with some extra
low-latency capabilities. In addition to cgroups support (blkio or io
controllers), BFQ's main features are:

- BFQ guarantees a high system and application responsiveness, and a
  low latency for time-sensitive applications, such as audio or video
  players;
- BFQ distributes bandwidth, and not just time, among processes or
  groups (switching back to time distribution when needed to keep
  throughput high).

In its default configuration, BFQ privileges latency over
throughput. So, when needed for achieving a lower latency, BFQ builds
schedules that may lead to a lower throughput. If your main or only
goal, for a given device, is to achieve the maximum-possible
throughput at all times, then do switch off all low-latency heuristics
for that device, by setting low_latency to 0. See Section 3 for
details on how to configure BFQ for the desired tradeoff between
latency and throughput, or on how to maximize throughput.
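
For instance, assuming that BFQ is the active scheduler for a
(hypothetical) device sda, the low-latency heuristics can be switched
off through sysfs; here is a minimal sketch, to be run as root::

  # Illustrative sketch: switch off BFQ's low-latency heuristics for one
  # device. The device name "sda" is only an example; adapt it.
  from pathlib import Path

  tunable = Path("/sys/block/sda/queue/iosched/low_latency")
  tunable.write_text("0")             # present only while BFQ is the active scheduler
  print(tunable.read_text().strip())  # expected to print 0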

Like every I/O scheduler, BFQ adds some overhead to per-I/O-request
processing. To give an idea of this overhead, the total,
single-lock-protected, per-request processing time of BFQ---i.e., the
sum of the execution times of the request insertion, dispatch and
completion hooks---is, e.g., 1.9 us on an Intel Core i7-2760QM@2.40GHz
(a dated notebook CPU; time measured with simple code
instrumentation, and using the throughput-sync.sh script of the S
suite [3], in performance-profiling mode). To put this result into
context, the total, single-lock-protected, per-request execution time
of the lightest I/O scheduler available in blk-mq, mq-deadline, is 0.7
us (mq-deadline is ~800 LOC, against ~10500 LOC for BFQ).

Scheduling overhead further limits the maximum IOPS that a CPU can
process (already limited by the execution of the rest of the I/O
stack). To give an idea of the limits with BFQ on slow or average
CPUs, here are, first, the limits of BFQ for three different CPUs, in,
respectively, an average laptop, an old desktop, and a cheap embedded
system, in case full hierarchical support is enabled (i.e.,
CONFIG_BFQ_GROUP_IOSCHED is set), but CONFIG_BFQ_CGROUP_DEBUG is not
set (Section 4-2):

- Intel i7-4850HQ: 400 KIOPS
- AMD A8-3850: 250 KIOPS
- ARM Cortex-A53 Octa-core: 80 KIOPS

If CONFIG_BFQ_CGROUP_DEBUG is set (and of course full hierarchical
support is enabled), then the sustainable throughput with BFQ
decreases, because all blkio.bfq* statistics are created and updated
(Section 4-2). For BFQ, this leads to the following maximum
sustainable throughputs, on the same systems as above:

- Intel i7-4850HQ: 310 KIOPS
- AMD A8-3850: 200 KIOPS
- ARM Cortex-A53 Octa-core: 56 KIOPS

BFQ works for multi-queue devices too.

.. The table of contents follows. Impatient readers can jump straight
   to Section 3.

.. CONTENTS

   1. When may BFQ be useful?
    1-1 Personal systems
    1-2 Server systems
   2. How does BFQ work?
   3. What are BFQ's tunables and how to properly configure BFQ?
   4. BFQ group scheduling
    4-1 Service guarantees provided
    4-2 Interface

1. When may BFQ be useful?
==========================

BFQ provides the following benefits on personal and server systems.

1-1 Personal systems
--------------------

Low latency for interactive applications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Regardless of the actual background workload, BFQ guarantees that, for
interactive tasks, the storage device is virtually as responsive as if
it were idle. For example, even if one or more of the following
background workloads are being executed:

- one or more large files are being read, written or copied,
- a tree of source files is being compiled,
- one or more virtual machines are performing I/O,
- a software update is in progress,
- indexing daemons are scanning filesystems and updating their
  databases,

starting an application or loading a file from within an application
takes about the same time as if the storage device were idle. As a
comparison, with CFQ, NOOP or DEADLINE, and in the same conditions,
applications experience high latencies, or even become unresponsive
until the background workload terminates (also on SSDs).

Low latency for soft real-time applications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Soft real-time applications too, such as audio and video
players/streamers, enjoy a low latency and a low drop rate, regardless
of the background I/O workload. As a consequence, these applications
suffer from almost no glitches due to the background workload.

Higher speed for code-development tasks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If some additional workload happens to be executed in parallel, then
BFQ executes the I/O-related components of typical code-development
tasks (compilation, checkout, merge, ...) much more quickly than CFQ,
NOOP or DEADLINE.

High throughput
^^^^^^^^^^^^^^^

On hard disks, BFQ achieves up to 30% higher throughput than CFQ, and
up to 150% higher throughput than DEADLINE and NOOP, with all the
sequential workloads considered in our tests. With random workloads,
and with all the workloads on flash-based devices, BFQ achieves,
instead, about the same throughput as the other schedulers.

Strong fairness, bandwidth and delay guarantees
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

BFQ distributes the device throughput, and not just the device time,
among I/O-bound applications in proportion to their weights, with any
workload and regardless of the device parameters. From these bandwidth
guarantees, it is possible to compute tight per-I/O-request delay
guarantees by a simple formula. If not configured for strict service
guarantees, BFQ switches to time-based resource sharing (only) for
applications that would otherwise cause a throughput loss.

1-2 Server systems
------------------

Most benefits for server systems follow from the same service
properties as above. In particular, regardless of whether additional,
possibly heavy workloads are being served, BFQ guarantees:

* audio and video streaming with zero or very low jitter and drop
  rate;

* fast retrieval of web pages and embedded objects;

* real-time recording of data in live-dumping applications (e.g.,
  packet logging);

* responsiveness in local and remote access to a server.


2. How does BFQ work?
=====================

BFQ is a proportional-share I/O scheduler, whose general structure,
plus a lot of code, are borrowed from CFQ.

- Each process doing I/O on a device is associated with a weight and a
  `(bfq_)queue`.

- BFQ grants exclusive access to the device, for a while, to one queue
  (process) at a time, and implements this service model by
  associating every queue with a budget, measured in number of
  sectors.

- After a queue is granted access to the device, the budget of the
  queue is decremented, on each request dispatch, by the size of the
  request.

- The in-service queue is expired, i.e., its service is suspended,
  only if one of the following events occurs: 1) the queue finishes
  its budget, 2) the queue empties, 3) a "budget timeout" fires (a toy
  sketch of this budget accounting is given at the end of this
  section).

  - The budget timeout prevents processes doing random I/O from
    holding the device for too long and dramatically reducing
    throughput.

  - Actually, as in CFQ, a queue associated with a process issuing
    sync requests may not be expired immediately when it empties. In
    contrast, BFQ may idle the device for a short time interval,
    giving the process the chance to go on being served if it issues
    a new request in time. Device idling typically boosts the
    throughput on rotational devices and on non-queueing flash-based
    devices, if processes do synchronous and sequential I/O. In
    addition, under BFQ, device idling is also instrumental in
    guaranteeing the desired throughput fraction to processes
    issuing sync requests (see the description of the slice_idle
    tunable in this document, or [1, 2], for more details).

    - With respect to idling for service guarantees, if several
      processes are competing for the device at the same time, but
      all processes and groups have the same weight, then BFQ
      guarantees the expected throughput distribution without ever
      idling the device. Throughput is thus as high as possible in
      this common scenario.

    - On flash-based storage with internal queueing of commands
      (typically NCQ), device idling is always detrimental to
      throughput. So, with these devices, BFQ performs idling
      only when strictly needed for service guarantees, i.e., for
      guaranteeing low latency or fairness. In these cases, overall
      throughput may be sub-optimal. No solution currently exists to
      provide both strong service guarantees and optimal throughput
      on devices with internal queueing.

- If low-latency mode is enabled (default configuration), BFQ
  executes some special heuristics to detect interactive and soft
  real-time applications (e.g., video or audio players/streamers),
  and to reduce their latency. The most important action taken to
  achieve this goal is to give to the queues associated with these
  applications more than their fair share of the device
  throughput. For brevity, we call "weight-raising" the whole set
  of actions taken by BFQ to privilege these queues. In
  particular, BFQ provides a milder form of weight-raising for
  interactive applications, and a stronger form for soft real-time
  applications.

- BFQ automatically deactivates idling for queues born in a burst of
  queue creations. In fact, these queues are usually associated with
  the processes of applications and services that benefit mostly
  from a high throughput. Examples are systemd during boot, or git
  grep.

- Like CFQ, BFQ merges queues performing interleaved I/O, i.e.,
  performing random I/O that becomes mostly sequential if
  merged. Differently from CFQ, BFQ achieves this goal with a more
  reactive mechanism, called Early Queue Merge (EQM). EQM is so
  responsive in detecting interleaved I/O (cooperating processes),
  that it enables BFQ to achieve a high throughput, by queue
  merging, even for queues for which CFQ needs a different
  mechanism, preemption, to get a high throughput. As such, EQM is a
  unified mechanism to achieve a high throughput with interleaved
  I/O.

- Queues are scheduled according to a variant of WF2Q+, named
  B-WF2Q+, and implemented using an augmented rb-tree to preserve an
  O(log N) overall complexity. See [2] for more details. B-WF2Q+ is
  also ready for hierarchical scheduling; see Section 4 for details.

- B-WF2Q+ guarantees a tight deviation with respect to an ideal,
  perfectly fair, and smooth service. In particular, B-WF2Q+
  guarantees that each queue receives a fraction of the device
  throughput proportional to its weight, even if the throughput
  fluctuates, and regardless of: the device parameters, the current
  workload and the budgets assigned to the queue.

- The last, budget-independence, property (although probably
  counterintuitive at first) is definitely beneficial, for the
  following reasons:

  - First, with any proportional-share scheduler, the maximum
    deviation with respect to an ideal service is proportional to
    the maximum budget (slice) assigned to queues. As a consequence,
    BFQ can keep this deviation tight not only because of the
    accurate service of B-WF2Q+, but also because BFQ *does not*
    need to assign a larger budget to a queue to let the queue
    receive a higher fraction of the device throughput.

  - Second, BFQ is free to choose, for every process (queue), the
    budget that best fits the needs of the process, or best
    leverages the I/O pattern of the process. In particular, BFQ
    updates queue budgets with a simple feedback-loop algorithm that
    allows a high throughput to be achieved, while still providing
    tight latency guarantees to time-sensitive applications. When
    the in-service queue expires, this algorithm computes the next
    budget of the queue so as to:

    - Let large budgets be eventually assigned to the queues
      associated with I/O-bound applications performing sequential
      I/O: in fact, the longer these applications are served once
      they have been granted access to the device, the higher the
      throughput is.

    - Let small budgets be eventually assigned to the queues
      associated with time-sensitive applications (which typically
      perform sporadic and short I/O), because, the smaller the
      budget assigned to a queue waiting for service is, the sooner
      B-WF2Q+ will serve that queue (Subsec 3.3 in [2]).

- If several processes are competing for the device at the same time,
  but all processes and groups have the same weight, then BFQ
  guarantees the expected throughput distribution without ever idling
  the device. It uses preemption instead. Throughput is then much
  higher in this common scenario.

- ioprio classes are served in strict priority order, i.e.,
  lower-priority queues are not served as long as there are
  higher-priority queues. Among queues in the same class, the
  bandwidth is distributed in proportion to the weight of each
  queue. A very thin extra bandwidth is however guaranteed to
  the Idle class, to prevent it from starving.
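
As announced above, here is a toy model of the budget accounting and
of the three expiration conditions (budget exhaustion, queue emptying,
budget timeout). It is only an illustrative sketch with made-up
request sizes and timeout, not BFQ code::

  # Toy model (illustrative only): a queue is served until its budget is
  # exhausted, it runs out of requests, or its budget timeout fires.
  import time

  class ToyQueue:
      def __init__(self, budget_sectors, requests):
          self.budget = budget_sectors    # remaining budget, in sectors
          self.requests = list(requests)  # pending request sizes, in sectors

      def serve(self, timeout_s=0.125):
          start = time.monotonic()
          while True:
              if not self.requests:
                  return "expired: queue empty"
              if self.budget <= 0:
                  return "expired: budget exhausted"
              if time.monotonic() - start > timeout_s:
                  return "expired: budget timeout"
              # Dispatch one request and charge its size to the budget.
              self.budget -= self.requests.pop(0)

  print(ToyQueue(budget_sectors=256, requests=[64] * 5).serve())
  # -> "expired: budget exhausted" (the fifth request no longer fits)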


3. What are BFQ's tunables and how to properly configure BFQ?
=============================================================

Most BFQ tunables affect service guarantees (basically latency and
fairness) and throughput. For full details on how to choose the
desired tradeoff between service guarantees and throughput, see the
parameters slice_idle, strict_guarantees and low_latency. For details
on how to maximize throughput, see slice_idle, timeout_sync and
max_budget. The other performance-related parameters have been
inherited from, and have been preserved mostly for compatibility with,
CFQ. So far, no performance improvement has been reported after
changing the latter parameters in BFQ.

In particular, the tunables back_seek_max, back_seek_penalty,
fifo_expire_async and fifo_expire_sync below are the same as in
CFQ. Their description is just copied from that for CFQ. Some
considerations in the description of slice_idle are copied from CFQ
too.

per-process ioprio and weight
-----------------------------

Unless the cgroups interface is used (see "4. BFQ group scheduling"),
weights can be assigned to processes only indirectly, through I/O
priorities, and according to the relation:
weight = (IOPRIO_BE_NR - ioprio) * 10.

Beware that, if low-latency is set, then BFQ automatically raises the
weight of the queues associated with interactive and soft real-time
applications. Unset this tunable if you need/want to control weights.
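
To make the mapping concrete, here is a small sketch of the relation
above (IOPRIO_BE_NR is 8 in current kernels, so best-effort I/O
priorities range over 0..7)::

  # Sketch of the ioprio -> BFQ weight mapping described above.
  IOPRIO_BE_NR = 8

  def ioprio_to_weight(ioprio: int) -> int:
      if not 0 <= ioprio < IOPRIO_BE_NR:
          raise ValueError("best-effort ioprio must be in 0..7")
      return (IOPRIO_BE_NR - ioprio) * 10

  for prio in range(IOPRIO_BE_NR):
      print(f"ioprio {prio} -> weight {ioprio_to_weight(prio)}")
  # ioprio 0 -> weight 80, ..., ioprio 7 -> weight 10; the default
  # ioprio, 4, therefore corresponds to weight 40.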

slice_idle
----------

This parameter specifies how long BFQ should idle for the next I/O
request, when certain sync BFQ queues become empty. By default
slice_idle is a non-zero value. Idling has a double purpose: boosting
throughput and making sure that the desired throughput distribution is
respected (see the description of how BFQ works, and, if needed, the
papers referred to there).

As for throughput, idling can be very helpful on highly seeky media
like single-spindle SATA/SAS disks, where it cuts down the overall
number of seeks and thus improves throughput.

Setting slice_idle to 0 will remove all the idling on queues and one
should see an overall improved throughput on faster storage devices
like multiple SATA/SAS disks in hardware RAID configuration, as well
as flash-based storage with internal command queueing (and
parallelism).

So depending on storage and workload, it might be useful to set
slice_idle=0. In general, for SATA/SAS disks and software RAID of
SATA/SAS disks, keeping slice_idle enabled should be useful. For any
configuration where there are multiple spindles behind a single LUN
(host-based hardware RAID controller or storage arrays), or with
flash-based fast storage, setting slice_idle=0 might end up in better
throughput and acceptable latencies.
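
A possible way to automate the flash-based part of this advice is
sketched below (illustrative only, to be run as root): clear
slice_idle just for devices that declare themselves non-rotational,
and leave it untouched on rotational disks. The device names are
examples::

  # Illustrative sketch: set slice_idle=0 only on non-rotational devices.
  # The iosched/ directory holds BFQ's tunables only while BFQ is active.
  from pathlib import Path

  for dev in ("sda", "sdb"):                  # example device names
      queue = Path(f"/sys/block/{dev}/queue")
      if queue.joinpath("rotational").read_text().strip() == "0":
          queue.joinpath("iosched/slice_idle").write_text("0")
          print(f"{dev}: slice_idle cleared")
      else:
          print(f"{dev}: rotational, slice_idle left enabled")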

Idling is however necessary to have service guarantees enforced in
case of differentiated weights or differentiated I/O-request lengths.
To see why, suppose that a given BFQ queue A must get several I/O
requests served for each request served for another queue B. Idling
ensures that, if A makes a new I/O request slightly after becoming
empty, then no request of B is dispatched in the middle, and thus A
does not lose the possibility to get more than one request dispatched
before the next request of B is dispatched. Note that idling
guarantees the desired differentiated treatment of queues only in
terms of I/O-request dispatches. To guarantee that the actual service
order then corresponds to the dispatch order, the strict_guarantees
tunable must be set too.

There is an important flipside to idling: apart from the above cases,
where it is beneficial also for throughput, idling can severely impact
throughput. One important case is random workload. Because of this
issue, BFQ tends to avoid idling as much as possible, when it is not
beneficial also for throughput (as detailed in Section 2). As a
consequence of this behavior, and of further issues described for the
strict_guarantees tunable, short-term service guarantees may be
occasionally violated. And, in some cases, these guarantees may be
more important than guaranteeing maximum throughput. For example, in
video playing/streaming, a very low drop rate may be more important
than maximum throughput. In these cases, consider setting the
strict_guarantees parameter.

slice_idle_us
-------------

Controls the same tuning parameter as slice_idle, but in microseconds.
Either tunable can be used to set idling behavior. Afterwards, the
other tunable will reflect the newly set value in sysfs.

strict_guarantees
-----------------

If this parameter is set (default: unset), then BFQ

- always performs idling when the in-service queue becomes empty;

- forces the device to serve one I/O request at a time, by dispatching a
  new request only if there is no outstanding request.

In the presence of differentiated weights or I/O-request sizes, both
the above conditions are needed to guarantee that every BFQ queue
receives its allotted share of the bandwidth. The first condition is
needed for the reasons explained in the description of the slice_idle
tunable. The second condition is needed because all modern storage
devices reorder internally-queued requests, which may trivially break
the service guarantees enforced by the I/O scheduler.

Setting strict_guarantees may evidently affect throughput.

back_seek_max
-------------

This specifies, given in Kbytes, the maximum "distance" for backward seeking.
The distance is the amount of space from the current head location to the
sectors that lie behind it.

This parameter allows the scheduler to anticipate requests in the "backward"
direction and consider them as being the "next" if they are within this
distance from the current head location.

back_seek_penalty
-----------------

This parameter is used to compute the cost of backward seeking. If the
backward distance of a request is just 1/back_seek_penalty of the distance
of a "front" request, then the seek costs of the two requests are considered
equivalent.

So the scheduler will not bias toward either of the two requests (otherwise
it biases toward the front request). The default value of back_seek_penalty
is 2.
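
One way to read this rule is the following informal sketch (not the
scheduler's actual code): the backward distance is weighted by
back_seek_penalty before being compared with the forward distance::

  # Informal sketch of the back_seek_penalty rule described above.
  BACK_SEEK_PENALTY = 2   # default value

  def prefer_backward(back_distance, front_distance,
                      penalty=BACK_SEEK_PENALTY):
      # Backward seeking is treated as 'penalty' times as expensive as
      # forward seeking over the same distance.
      return back_distance * penalty < front_distance

  # A backward request at 1/penalty of the forward distance costs the same:
  print(prefer_backward(back_distance=50, front_distance=100))  # False (tie)
  print(prefer_backward(back_distance=40, front_distance=100))  # True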

fifo_expire_async
-----------------

This parameter is used to set the timeout of asynchronous requests. The
default value is 248 ms.

fifo_expire_sync
----------------

This parameter is used to set the timeout of synchronous requests. The
default value is 124 ms. To favor synchronous requests over asynchronous
ones, decrease this value relative to fifo_expire_async.

low_latency
-----------

This parameter is used to enable/disable BFQ's low latency mode. By
default, low latency mode is enabled. If enabled, interactive and soft
real-time applications are privileged and experience a lower latency,
as explained in more detail in the description of how BFQ works.

DISABLE this mode if you need full control over bandwidth
distribution. In fact, if it is enabled, then BFQ automatically
increases the bandwidth share of privileged applications, as the main
means to guarantee a lower latency to them.

In addition, as already highlighted at the beginning of this document,
DISABLE this mode if your only goal is to achieve a high throughput.
In fact, privileging the I/O of some application over the rest may
entail a lower throughput. To achieve the highest-possible throughput
on a non-rotational device, setting slice_idle to 0 may be needed too
(at the cost of giving up any strong guarantee on fairness and low
latency).

timeout_sync
------------

Maximum amount of device time that can be given to a task (queue) once
it has been selected for service. On devices with costly seeks,
increasing this time usually increases maximum throughput. On the
opposite end, increasing this time coarsens the granularity of the
short-term bandwidth and latency guarantees, especially if the
following parameter is set to zero.

max_budget
----------

Maximum amount of service, measured in sectors, that can be provided
to a BFQ queue once it is set in service (of course within the limits
of the above timeout). As explained in the description of the
algorithm, larger values increase the throughput in proportion to
the percentage of sequential I/O requests issued. The price of larger
values is that they coarsen the granularity of short-term bandwidth
and latency guarantees.

The default value is 0, which enables auto-tuning: BFQ sets max_budget
to the maximum number of sectors that can be served during
timeout_sync, according to the estimated peak rate.

For specific devices, some users have occasionally reported reaching a
higher throughput by setting max_budget explicitly, i.e., by setting
max_budget to a higher value than 0. In particular, they have set
max_budget to higher values than those to which BFQ would have set it
with auto-tuning. An alternative way to achieve this goal is to just
increase the value of timeout_sync, leaving max_budget equal to 0.
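
As an illustration of this throughput-oriented tuning, the sketch
below reads the current values and then raises timeout_sync while
leaving max_budget at 0, so that auto-tuning recomputes the budget
over the longer timeout. The device name and the new value are
arbitrary examples; run as root::

  # Illustrative sketch: enlarge the per-queue device time, keep auto-tuning.
  from pathlib import Path

  iosched = Path("/sys/block/sda/queue/iosched")
  print("timeout_sync:", (iosched / "timeout_sync").read_text().strip())
  print("max_budget:  ", (iosched / "max_budget").read_text().strip())

  (iosched / "timeout_sync").write_text("250")  # arbitrary larger value
  # max_budget is left at 0: BFQ keeps deriving it from the peak rate.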

4. BFQ group scheduling
=======================

BFQ supports both cgroups-v1 and cgroups-v2 io controllers, namely
blkio and io. In particular, BFQ supports weight-based proportional
share. To activate cgroups support, set CONFIG_BFQ_GROUP_IOSCHED.

4-1 Service guarantees provided
-------------------------------

With BFQ, proportional share means true proportional share of the
device bandwidth, according to group weights. For example, a group
with weight 200 gets twice the bandwidth, and not just twice the time,
of a group with weight 100.

BFQ supports hierarchies (group trees) of any depth. Bandwidth is
distributed among groups and processes in the expected way: for each
group, the children of the group share the whole bandwidth of the
group in proportion to their weights. In particular, this implies
that, for each leaf group, every process of the group receives the
same share of the whole group bandwidth, unless the ioprio of the
process is modified.
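
To make the hierarchical rule concrete, here is a small sketch that
computes the bandwidth fraction of each process in a made-up two-level
hierarchy, by multiplying weight fractions level by level::

  # Sketch: expected bandwidth fractions in a made-up group hierarchy.
  def shares(weights):
      total = sum(weights.values())
      return {name: w / total for name, w in weights.items()}

  groups = shares({"A": 200, "B": 100})        # two groups under the root
  procs_in_A = shares({"p1": 100, "p2": 100})  # equal-weight processes in A

  for proc, frac in procs_in_A.items():
      print(f"{proc}: {groups['A'] * frac:.2%} of the device bandwidth")
  # p1 and p2 each get one third; group B gets the remaining third.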

The resource-sharing guarantee for a group may partially or totally
switch from bandwidth to time, if providing bandwidth guarantees to
the group lowers the throughput too much. This switch occurs on a
per-process basis: if a process of a leaf group causes a throughput
loss when served in such a way as to receive its share of the
bandwidth, then BFQ switches back to just time-based proportional
share for that process.

4-2 Interface
-------------

To get proportional sharing of bandwidth with BFQ for a given device,
BFQ must of course be the active scheduler for that device.

Within each group directory, the names of the files associated with
BFQ-specific cgroup parameters and stats begin with the "bfq."
prefix. So, with cgroups-v1 or cgroups-v2, the full prefix for
BFQ-specific files is "blkio.bfq." or "io.bfq.", respectively. For
example, the group parameter to set the weight of a group with BFQ is
blkio.bfq.weight or io.bfq.weight.

As for cgroups-v1 (blkio controller), the exact set of stat files
created, and kept up-to-date by bfq, depends on whether
CONFIG_BFQ_CGROUP_DEBUG is set. If it is set, then bfq creates all
the stat files documented in
Documentation/admin-guide/cgroup-v1/blkio-controller.rst. If, instead,
CONFIG_BFQ_CGROUP_DEBUG is not set, then bfq creates only the files::

  blkio.bfq.io_service_bytes
  blkio.bfq.io_service_bytes_recursive
  blkio.bfq.io_serviced
  blkio.bfq.io_serviced_recursive

The value of CONFIG_BFQ_CGROUP_DEBUG greatly influences the maximum
throughput sustainable with bfq, because updating the blkio.bfq.*
stats is rather costly, especially for some of the stats enabled by
CONFIG_BFQ_CGROUP_DEBUG.

Parameters to set
-----------------

For each group, there is only the following parameter to set.

weight (namely blkio.bfq.weight or io.bfq.weight): the weight of the
group inside its parent. Available values: 1..1000 (default 100). The
linear mapping between ioprio and weights, described at the beginning
of the tunable section, is still valid, but all weights higher than
IOPRIO_BE_NR*10 are mapped to ioprio 0.

Recall that, if low-latency is set, then BFQ automatically raises the
weight of the queues associated with interactive and soft real-time
applications. Unset this tunable if you need/want to control weights.
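
For example, with cgroups-v2 mounted at /sys/fs/cgroup and a
(hypothetical) group named "media", assuming the io controller is
enabled for that part of the hierarchy, the weight could be set as in
the following sketch, run as root::

  # Sketch: give the hypothetical cgroup "media" twice the default weight.
  from pathlib import Path

  cgroup = Path("/sys/fs/cgroup/media")
  cgroup.mkdir(exist_ok=True)                   # create the group if missing
  (cgroup / "io.bfq.weight").write_text("200")
  print((cgroup / "io.bfq.weight").read_text().strip())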


[1]
    P. Valente, A. Avanzini, "Evolution of the BFQ Storage I/O
    Scheduler", Proceedings of the First Workshop on Mobile System
    Technologies (MST-2015), May 2015.

    http://algogroup.unimore.it/people/paolo/disk_sched/mst-2015.pdf

[2]
    P. Valente and M. Andreolini, "Improving Application
    Responsiveness with the BFQ Disk I/O Scheduler", Proceedings of
    the 5th Annual International Systems and Storage Conference
    (SYSTOR '12), June 2012.

    Slightly extended version:

    http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite-results.pdf

[3]
    https://github.com/Algodev-github/S