Blame - Documentation/block/bfq-iosched.txt - SHIFTPHONES/mainline/linux

blob: 8d8d8f06cab29aab5fb07b96c9a5e452515fb819 [file] [log] [blame]

Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	1	BFQ (Budget Fair Queueing)
				2	==========================
				3
				4	BFQ is a proportional-share I/O scheduler, with some extra
				5	low-latency capabilities. In addition to cgroups support (blkio or io
				6	controllers), BFQ's main features are:
				7	- BFQ guarantees a high system and application responsiveness, and a
				8	low latency for time-sensitive applications, such as audio or video
				9	players;
				10	- BFQ distributes bandwidth, and not just time, among processes or
				11	groups (switching back to time distribution when needed to keep
				12	throughput high).
				13
Paolo Valente	43c1b3d	2017-05-09 12:54:23 +0200	[diff] [blame]	14	In its default configuration, BFQ privileges latency over
				15	throughput. So, when needed for achieving a lower latency, BFQ builds
				16	schedules that may lead to a lower throughput. If your main or only
				17	goal, for a given device, is to achieve the maximum-possible
				18	throughput at all times, then do switch off all low-latency heuristics
Paolo Valente	233f0bf	2017-08-31 20:00:30 +0200	[diff] [blame]	19	for that device, by setting low_latency to 0. See Section 3 for
				20	details on how to configure BFQ for the desired tradeoff between
				21	latency and throughput, or on how to maximize throughput.
Paolo Valente	43c1b3d	2017-05-09 12:54:23 +0200	[diff] [blame]	22
Luca Miccio	a33801e	2017-11-13 07:34:10 +0100	[diff] [blame]	23	BFQ has a non-null overhead, which limits the maximum IOPS that a CPU
				24	can process for a device scheduled with BFQ. To give an idea of the
				25	limits on slow or average CPUs, here are, first, the limits of BFQ for
				26	three different CPUs, on, respectively, an average laptop, an old
				27	desktop, and a cheap embedded system, in case full hierarchical
				28	support is enabled (i.e., CONFIG_BFQ_GROUP_IOSCHED is set), but
				29	CONFIG_DEBUG_BLK_CGROUP is not set (Section 4-2):
				30	- Intel i7-4850HQ: 400 KIOPS
				31	- AMD A8-3850: 250 KIOPS
				32	- ARM CortexTM-A53 Octa-core: 80 KIOPS
				33
				34	If CONFIG_DEBUG_BLK_CGROUP is set (and of course full hierarchical
				35	support is enabled), then the sustainable throughput with BFQ
				36	decreases, because all blkio.bfq* statistics are created and updated
				37	(Section 4-2). For BFQ, this leads to the following maximum
				38	sustainable throughputs, on the same systems as above:
Paolo Valente	24bfd19	2017-11-13 07:34:09 +0100	[diff] [blame]	39	- Intel i7-4850HQ: 310 KIOPS
				40	- AMD A8-3850: 200 KIOPS
				41	- ARM CortexTM-A53 Octa-core: 56 KIOPS
Paolo Valente	68017e5	2017-11-13 07:34:07 +0100	[diff] [blame]	42
				43	BFQ works for multi-queue devices too.
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	44
				45	The table of contents follow. Impatients can just jump to Section 3.
				46
				47	CONTENTS
				48
				49	1. When may BFQ be useful?
				50	1-1 Personal systems
				51	1-2 Server systems
				52	2. How does BFQ work?
Paolo Valente	2670cd1	2017-08-31 20:00:31 +0200	[diff] [blame]	53	3. What are BFQ's tunables and how to properly configure BFQ?
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	54	4. BFQ group scheduling
				55	4-1 Service guarantees provided
				56	4-2 Interface
				57
				58	1. When may BFQ be useful?
				59	==========================
				60
				61	BFQ provides the following benefits on personal and server systems.
				62
				63	1-1 Personal systems
				64	--------------------
				65
				66	Low latency for interactive applications
				67
				68	Regardless of the actual background workload, BFQ guarantees that, for
				69	interactive tasks, the storage device is virtually as responsive as if
				70	it was idle. For example, even if one or more of the following
				71	background workloads are being executed:
				72	- one or more large files are being read, written or copied,
				73	- a tree of source files is being compiled,
				74	- one or more virtual machines are performing I/O,
				75	- a software update is in progress,
				76	- indexing daemons are scanning filesystems and updating their
				77	databases,
				78	starting an application or loading a file from within an application
				79	takes about the same time as if the storage device was idle. As a
				80	comparison, with CFQ, NOOP or DEADLINE, and in the same conditions,
				81	applications experience high latencies, or even become unresponsive
				82	until the background workload terminates (also on SSDs).
				83
				84	Low latency for soft real-time applications
				85
				86	Also soft real-time applications, such as audio and video
				87	players/streamers, enjoy a low latency and a low drop rate, regardless
				88	of the background I/O workload. As a consequence, these applications
				89	do not suffer from almost any glitch due to the background workload.
				90
				91	Higher speed for code-development tasks
				92
				93	If some additional workload happens to be executed in parallel, then
				94	BFQ executes the I/O-related components of typical code-development
				95	tasks (compilation, checkout, merge, ...) much more quickly than CFQ,
				96	NOOP or DEADLINE.
				97
				98	High throughput
				99
				100	On hard disks, BFQ achieves up to 30% higher throughput than CFQ, and
				101	up to 150% higher throughput than DEADLINE and NOOP, with all the
				102	sequential workloads considered in our tests. With random workloads,
				103	and with all the workloads on flash-based devices, BFQ achieves,
				104	instead, about the same throughput as the other schedulers.
				105
				106	Strong fairness, bandwidth and delay guarantees
				107
				108	BFQ distributes the device throughput, and not just the device time,
				109	among I/O-bound applications in proportion their weights, with any
				110	workload and regardless of the device parameters. From these bandwidth
				111	guarantees, it is possible to compute tight per-I/O-request delay
				112	guarantees by a simple formula. If not configured for strict service
				113	guarantees, BFQ switches to time-based resource sharing (only) for
				114	applications that would otherwise cause a throughput loss.
				115
				116	1-2 Server systems
				117	------------------
				118
				119	Most benefits for server systems follow from the same service
				120	properties as above. In particular, regardless of whether additional,
				121	possibly heavy workloads are being served, BFQ guarantees:
				122
				123	. audio and video-streaming with zero or very low jitter and drop
				124	rate;
				125
				126	. fast retrieval of WEB pages and embedded objects;
				127
				128	. real-time recording of data in live-dumping applications (e.g.,
				129	packet logging);
				130
				131	. responsiveness in local and remote access to a server.
				132
				133
				134	2. How does BFQ work?
				135	=====================
				136
				137	BFQ is a proportional-share I/O scheduler, whose general structure,
				138	plus a lot of code, are borrowed from CFQ.
				139
				140	- Each process doing I/O on a device is associated with a weight and a
				141	(bfq_)queue.
				142
				143	- BFQ grants exclusive access to the device, for a while, to one queue
				144	(process) at a time, and implements this service model by
				145	associating every queue with a budget, measured in number of
				146	sectors.
				147
				148	- After a queue is granted access to the device, the budget of the
				149	queue is decremented, on each request dispatch, by the size of the
				150	request.
				151
				152	- The in-service queue is expired, i.e., its service is suspended,
				153	only if one of the following events occurs: 1) the queue finishes
				154	its budget, 2) the queue empties, 3) a "budget timeout" fires.
				155
				156	- The budget timeout prevents processes doing random I/O from
				157	holding the device for too long and dramatically reducing
				158	throughput.
				159
				160	- Actually, as in CFQ, a queue associated with a process issuing
				161	sync requests may not be expired immediately when it empties. In
				162	contrast, BFQ may idle the device for a short time interval,
				163	giving the process the chance to go on being served if it issues
				164	a new request in time. Device idling typically boosts the
Paolo Valente	2670cd1	2017-08-31 20:00:31 +0200	[diff] [blame]	165	throughput on rotational devices and on non-queueing flash-based
				166	devices, if processes do synchronous and sequential I/O. In
				167	addition, under BFQ, device idling is also instrumental in
				168	guaranteeing the desired throughput fraction to processes
				169	issuing sync requests (see the description of the slice_idle
				170	tunable in this document, or [1, 2], for more details).
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	171
				172	- With respect to idling for service guarantees, if several
				173	processes are competing for the device at the same time, but
Paolo Valente	233f0bf	2017-08-31 20:00:30 +0200	[diff] [blame]	174	all processes and groups have the same weight, then BFQ
				175	guarantees the expected throughput distribution without ever
				176	idling the device. Throughput is thus as high as possible in
				177	this common scenario.
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	178
Paolo Valente	2670cd1	2017-08-31 20:00:31 +0200	[diff] [blame]	179	- On flash-based storage with internal queueing of commands
				180	(typically NCQ), device idling happens to be always detrimental
				181	for throughput. So, with these devices, BFQ performs idling
				182	only when strictly needed for service guarantees, i.e., for
				183	guaranteeing low latency or fairness. In these cases, overall
				184	throughput may be sub-optimal. No solution currently exists to
				185	provide both strong service guarantees and optimal throughput
				186	on devices with internal queueing.
				187
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	188	- If low-latency mode is enabled (default configuration), BFQ
				189	executes some special heuristics to detect interactive and soft
				190	real-time applications (e.g., video or audio players/streamers),
				191	and to reduce their latency. The most important action taken to
				192	achieve this goal is to give to the queues associated with these
				193	applications more than their fair share of the device
				194	throughput. For brevity, we call just "weight-raising" the whole
				195	sets of actions taken by BFQ to privilege these queues. In
				196	particular, BFQ provides a milder form of weight-raising for
				197	interactive applications, and a stronger form for soft real-time
				198	applications.
				199
				200	- BFQ automatically deactivates idling for queues born in a burst of
				201	queue creations. In fact, these queues are usually associated with
				202	the processes of applications and services that benefit mostly
				203	from a high throughput. Examples are systemd during boot, or git
				204	grep.
				205
				206	- As CFQ, BFQ merges queues performing interleaved I/O, i.e.,
				207	performing random I/O that becomes mostly sequential if
				208	merged. Differently from CFQ, BFQ achieves this goal with a more
				209	reactive mechanism, called Early Queue Merge (EQM). EQM is so
				210	responsive in detecting interleaved I/O (cooperating processes),
				211	that it enables BFQ to achieve a high throughput, by queue
				212	merging, even for queues for which CFQ needs a different
				213	mechanism, preemption, to get a high throughput. As such EQM is a
				214	unified mechanism to achieve a high throughput with interleaved
				215	I/O.
				216
				217	- Queues are scheduled according to a variant of WF2Q+, named
				218	B-WF2Q+, and implemented using an augmented rb-tree to preserve an
				219	O(log N) overall complexity. See [2] for more details. B-WF2Q+ is
Paolo Valente	233f0bf	2017-08-31 20:00:30 +0200	[diff] [blame]	220	also ready for hierarchical scheduling, details in Section 4.
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	221
				222	- B-WF2Q+ guarantees a tight deviation with respect to an ideal,
				223	perfectly fair, and smooth service. In particular, B-WF2Q+
				224	guarantees that each queue receives a fraction of the device
				225	throughput proportional to its weight, even if the throughput
				226	fluctuates, and regardless of: the device parameters, the current
				227	workload and the budgets assigned to the queue.
				228
				229	- The last, budget-independence, property (although probably
				230	counterintuitive in the first place) is definitely beneficial, for
				231	the following reasons:
				232
				233	- First, with any proportional-share scheduler, the maximum
				234	deviation with respect to an ideal service is proportional to
				235	the maximum budget (slice) assigned to queues. As a consequence,
				236	BFQ can keep this deviation tight not only because of the
				237	accurate service of B-WF2Q+, but also because BFQ does not
				238	need to assign a larger budget to a queue to let the queue
				239	receive a higher fraction of the device throughput.
				240
				241	- Second, BFQ is free to choose, for every process (queue), the
				242	budget that best fits the needs of the process, or best
				243	leverages the I/O pattern of the process. In particular, BFQ
				244	updates queue budgets with a simple feedback-loop algorithm that
				245	allows a high throughput to be achieved, while still providing
				246	tight latency guarantees to time-sensitive applications. When
				247	the in-service queue expires, this algorithm computes the next
				248	budget of the queue so as to:
				249
				250	- Let large budgets be eventually assigned to the queues
				251	associated with I/O-bound applications performing sequential
				252	I/O: in fact, the longer these applications are served once
				253	got access to the device, the higher the throughput is.
				254
				255	- Let small budgets be eventually assigned to the queues
				256	associated with time-sensitive applications (which typically
				257	perform sporadic and short I/O), because, the smaller the
				258	budget assigned to a queue waiting for service is, the sooner
				259	B-WF2Q+ will serve that queue (Subsec 3.3 in [2]).
				260
				261	- If several processes are competing for the device at the same time,
				262	but all processes and groups have the same weight, then BFQ
				263	guarantees the expected throughput distribution without ever idling
				264	the device. It uses preemption instead. Throughput is then much
				265	higher in this common scenario.
				266
				267	- ioprio classes are served in strict priority order, i.e.,
				268	lower-priority queues are not served as long as there are
				269	higher-priority queues. Among queues in the same class, the
				270	bandwidth is distributed in proportion to the weight of each
				271	queue. A very thin extra bandwidth is however guaranteed to
				272	the Idle class, to prevent it from starving.
				273
				274
Paolo Valente	2670cd1	2017-08-31 20:00:31 +0200	[diff] [blame]	275	3. What are BFQ's tunables and how to properly configure BFQ?
				276	=============================================================
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	277
Paolo Valente	2670cd1	2017-08-31 20:00:31 +0200	[diff] [blame]	278	Most BFQ tunables affect service guarantees (basically latency and
				279	fairness) and throughput. For full details on how to choose the
				280	desired tradeoff between service guarantees and throughput, see the
				281	parameters slice_idle, strict_guarantees and low_latency. For details
				282	on how to maximise throughput, see slice_idle, timeout_sync and
				283	max_budget. The other performance-related parameters have been
				284	inherited from, and have been preserved mostly for compatibility with
				285	CFQ. So far, no performance improvement has been reported after
				286	changing the latter parameters in BFQ.
				287
				288	In particular, the tunables back_seek-max, back_seek_penalty,
				289	fifo_expire_async and fifo_expire_sync below are the same as in
				290	CFQ. Their description is just copied from that for CFQ. Some
				291	considerations in the description of slice_idle are copied from CFQ
				292	too.
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	293
				294	per-process ioprio and weight
				295	-----------------------------
				296
Arianna Avanzini	e21b7a0	2017-04-12 18:23:08 +0200	[diff] [blame]	297	Unless the cgroups interface is used (see "4. BFQ group scheduling"),
				298	weights can be assigned to processes only indirectly, through I/O
				299	priorities, and according to the relation:
				300	weight = (IOPRIO_BE_NR - ioprio) * 10.
				301
				302	Beware that, if low-latency is set, then BFQ automatically raises the
				303	weight of the queues associated with interactive and soft real-time
				304	applications. Unset this tunable if you need/want to control weights.
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	305
				306	slice_idle
				307	----------
				308
				309	This parameter specifies how long BFQ should idle for next I/O
				310	request, when certain sync BFQ queues become empty. By default
				311	slice_idle is a non-zero value. Idling has a double purpose: boosting
				312	throughput and making sure that the desired throughput distribution is
				313	respected (see the description of how BFQ works, and, if needed, the
				314	papers referred there).
				315
				316	As for throughput, idling can be very helpful on highly seeky media
				317	like single spindle SATA/SAS disks where we can cut down on overall
				318	number of seeks and see improved throughput.
				319
				320	Setting slice_idle to 0 will remove all the idling on queues and one
				321	should see an overall improved throughput on faster storage devices
Paolo Valente	2670cd1	2017-08-31 20:00:31 +0200	[diff] [blame]	322	like multiple SATA/SAS disks in hardware RAID configuration, as well
				323	as flash-based storage with internal command queueing (and
				324	parallelism).
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	325
				326	So depending on storage and workload, it might be useful to set
				327	slice_idle=0. In general for SATA/SAS disks and software RAID of
				328	SATA/SAS disks keeping slice_idle enabled should be useful. For any
				329	configurations where there are multiple spindles behind single LUN
Paolo Valente	2670cd1	2017-08-31 20:00:31 +0200	[diff] [blame]	330	(Host based hardware RAID controller or for storage arrays), or with
				331	flash-based fast storage, setting slice_idle=0 might end up in better
				332	throughput and acceptable latencies.
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	333
				334	Idling is however necessary to have service guarantees enforced in
				335	case of differentiated weights or differentiated I/O-request lengths.
				336	To see why, suppose that a given BFQ queue A must get several I/O
				337	requests served for each request served for another queue B. Idling
				338	ensures that, if A makes a new I/O request slightly after becoming
				339	empty, then no request of B is dispatched in the middle, and thus A
				340	does not lose the possibility to get more than one request dispatched
				341	before the next request of B is dispatched. Note that idling
				342	guarantees the desired differentiated treatment of queues only in
				343	terms of I/O-request dispatches. To guarantee that the actual service
				344	order then corresponds to the dispatch order, the strict_guarantees
				345	tunable must be set too.
				346
				347	There is an important flipside for idling: apart from the above cases
				348	where it is beneficial also for throughput, idling can severely impact
				349	throughput. One important case is random workload. Because of this
				350	issue, BFQ tends to avoid idling as much as possible, when it is not
Paolo Valente	2670cd1	2017-08-31 20:00:31 +0200	[diff] [blame]	351	beneficial also for throughput (as detailed in Section 2). As a
				352	consequence of this behavior, and of further issues described for the
				353	strict_guarantees tunable, short-term service guarantees may be
				354	occasionally violated. And, in some cases, these guarantees may be
				355	more important than guaranteeing maximum throughput. For example, in
				356	video playing/streaming, a very low drop rate may be more important
				357	than maximum throughput. In these cases, consider setting the
				358	strict_guarantees parameter.
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	359
				360	strict_guarantees
				361	-----------------
				362
				363	If this parameter is set (default: unset), then BFQ
				364
				365	- always performs idling when the in-service queue becomes empty;
				366
				367	- forces the device to serve one I/O request at a time, by dispatching a
				368	new request only if there is no outstanding request.
				369
				370	In the presence of differentiated weights or I/O-request sizes, both
				371	the above conditions are needed to guarantee that every BFQ queue
				372	receives its allotted share of the bandwidth. The first condition is
				373	needed for the reasons explained in the description of the slice_idle
				374	tunable. The second condition is needed because all modern storage
				375	devices reorder internally-queued requests, which may trivially break
				376	the service guarantees enforced by the I/O scheduler.
				377
				378	Setting strict_guarantees may evidently affect throughput.
				379
				380	back_seek_max
				381	-------------
				382
				383	This specifies, given in Kbytes, the maximum "distance" for backward seeking.
				384	The distance is the amount of space from the current head location to the
				385	sectors that are backward in terms of distance.
				386
				387	This parameter allows the scheduler to anticipate requests in the "backward"
				388	direction and consider them as being the "next" if they are within this
				389	distance from the current head location.
				390
				391	back_seek_penalty
				392	-----------------
				393
				394	This parameter is used to compute the cost of backward seeking. If the
				395	backward distance of request is just 1/back_seek_penalty from a "front"
				396	request, then the seeking cost of two requests is considered equivalent.
				397
				398	So scheduler will not bias toward one or the other request (otherwise scheduler
				399	will bias toward front request). Default value of back_seek_penalty is 2.
				400
				401	fifo_expire_async
				402	-----------------
				403
				404	This parameter is used to set the timeout of asynchronous requests. Default
				405	value of this is 248ms.
				406
				407	fifo_expire_sync
				408	----------------
				409
				410	This parameter is used to set the timeout of synchronous requests. Default
				411	value of this is 124ms. In case to favor synchronous requests over asynchronous
				412	one, this value should be decreased relative to fifo_expire_async.
				413
				414	low_latency
				415	-----------
				416
				417	This parameter is used to enable/disable BFQ's low latency mode. By
				418	default, low latency mode is enabled. If enabled, interactive and soft
				419	real-time applications are privileged and experience a lower latency,
				420	as explained in more detail in the description of how BFQ works.
				421
Paolo Valente	43c1b3d	2017-05-09 12:54:23 +0200	[diff] [blame]	422	DISABLE this mode if you need full control on bandwidth
Paolo Valente	44e44a1	2017-04-12 18:23:12 +0200	[diff] [blame]	423	distribution. In fact, if it is enabled, then BFQ automatically
				424	increases the bandwidth share of privileged applications, as the main
				425	means to guarantee a lower latency to them.
				426
Paolo Valente	43c1b3d	2017-05-09 12:54:23 +0200	[diff] [blame]	427	In addition, as already highlighted at the beginning of this document,
				428	DISABLE this mode if your only goal is to achieve a high throughput.
				429	In fact, privileging the I/O of some application over the rest may
				430	entail a lower throughput. To achieve the highest-possible throughput
				431	on a non-rotational device, setting slice_idle to 0 may be needed too
				432	(at the cost of giving up any strong guarantee on fairness and low
				433	latency).
				434
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	435	timeout_sync
				436	------------
				437
				438	Maximum amount of device time that can be given to a task (queue) once
				439	it has been selected for service. On devices with costly seeks,
				440	increasing this time usually increases maximum throughput. On the
				441	opposite end, increasing this time coarsens the granularity of the
				442	short-term bandwidth and latency guarantees, especially if the
				443	following parameter is set to zero.
				444
				445	max_budget
				446	----------
				447
				448	Maximum amount of service, measured in sectors, that can be provided
				449	to a BFQ queue once it is set in service (of course within the limits
				450	of the above timeout). According to what said in the description of
				451	the algorithm, larger values increase the throughput in proportion to
				452	the percentage of sequential I/O requests issued. The price of larger
				453	values is that they coarsen the granularity of short-term bandwidth
				454	and latency guarantees.
				455
				456	The default value is 0, which enables auto-tuning: BFQ sets max_budget
				457	to the maximum number of sectors that can be served during
				458	timeout_sync, according to the estimated peak rate.
				459
Paolo Valente	2670cd1	2017-08-31 20:00:31 +0200	[diff] [blame]	460	For specific devices, some users have occasionally reported to have
				461	reached a higher throughput by setting max_budget explicitly, i.e., by
				462	setting max_budget to a higher value than 0. In particular, they have
				463	set max_budget to higher values than those to which BFQ would have set
				464	it with auto-tuning. An alternative way to achieve this goal is to
				465	just increase the value of timeout_sync, leaving max_budget equal to 0.
				466
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	467	weights
				468	-------
				469
				470	Read-only parameter, used to show the weights of the currently active
				471	BFQ queues.
				472
				473
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	474	4. Group scheduling with BFQ
				475	============================
				476
Arianna Avanzini	e21b7a0	2017-04-12 18:23:08 +0200	[diff] [blame]	477	BFQ supports both cgroups-v1 and cgroups-v2 io controllers, namely
				478	blkio and io. In particular, BFQ supports weight-based proportional
				479	share. To activate cgroups support, set BFQ_GROUP_IOSCHED.
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	480
				481	4-1 Service guarantees provided
				482	-------------------------------
				483
				484	With BFQ, proportional share means true proportional share of the
				485	device bandwidth, according to group weights. For example, a group
				486	with weight 200 gets twice the bandwidth, and not just twice the time,
				487	of a group with weight 100.
				488
				489	BFQ supports hierarchies (group trees) of any depth. Bandwidth is
				490	distributed among groups and processes in the expected way: for each
				491	group, the children of the group share the whole bandwidth of the
				492	group in proportion to their weights. In particular, this implies
				493	that, for each leaf group, every process of the group receives the
				494	same share of the whole group bandwidth, unless the ioprio of the
				495	process is modified.
				496
				497	The resource-sharing guarantee for a group may partially or totally
				498	switch from bandwidth to time, if providing bandwidth guarantees to
				499	the group lowers the throughput too much. This switch occurs on a
				500	per-process basis: if a process of a leaf group causes throughput loss
				501	if served in such a way to receive its share of the bandwidth, then
				502	BFQ switches back to just time-based proportional share for that
				503	process.
				504
				505	4-2 Interface
				506	-------------
				507
				508	To get proportional sharing of bandwidth with BFQ for a given device,
				509	BFQ must of course be the active scheduler for that device.
				510
				511	Within each group directory, the names of the files associated with
				512	BFQ-specific cgroup parameters and stats begin with the "bfq."
				513	prefix. So, with cgroups-v1 or cgroups-v2, the full prefix for
				514	BFQ-specific files is "blkio.bfq." or "io.bfq." For example, the group
				515	parameter to set the weight of a group with BFQ is blkio.bfq.weight
				516	or io.bfq.weight.
				517
Luca Miccio	a33801e	2017-11-13 07:34:10 +0100	[diff] [blame]	518	As for cgroups-v1 (blkio controller), the exact set of stat files
				519	created, and kept up-to-date by bfq, depends on whether
				520	CONFIG_DEBUG_BLK_CGROUP is set. If it is set, then bfq creates all
				521	the stat files documented in
				522	Documentation/cgroup-v1/blkio-controller.txt. If, instead,
				523	CONFIG_DEBUG_BLK_CGROUP is not set, then bfq creates only the files
				524	blkio.bfq.io_service_bytes
				525	blkio.bfq.io_service_bytes_recursive
				526	blkio.bfq.io_serviced
				527	blkio.bfq.io_serviced_recursive
				528
				529	The value of CONFIG_DEBUG_BLK_CGROUP greatly influences the maximum
				530	throughput sustainable with bfq, because updating the blkio.bfq.*
				531	stats is rather costly, especially for some of the stats enabled by
				532	CONFIG_DEBUG_BLK_CGROUP.
				533
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	534	Parameters to set
				535	-----------------
				536
				537	For each group, there is only the following parameter to set.
				538
				539	weight (namely blkio.bfq.weight or io.bfq-weight): the weight of the
				540	group inside its parent. Available values: 1..10000 (default 100). The
				541	linear mapping between ioprio and weights, described at the beginning
				542	of the tunable section, is still valid, but all weights higher than
				543	IOPRIO_BE_NR*10 are mapped to ioprio 0.
				544
Paolo Valente	44e44a1	2017-04-12 18:23:12 +0200	[diff] [blame]	545	Recall that, if low-latency is set, then BFQ automatically raises the
				546	weight of the queues associated with interactive and soft real-time
				547	applications. Unset this tunable if you need/want to control weights.
				548
Paolo Valente	aee69d7	2017-04-19 08:29:02 -0600	[diff] [blame]	549
				550	[1] P. Valente, A. Avanzini, "Evolution of the BFQ Storage I/O
				551	Scheduler", Proceedings of the First Workshop on Mobile System
				552	Technologies (MST-2015), May 2015.
				553	http://algogroup.unimore.it/people/paolo/disk_sched/mst-2015.pdf
				554
				555	[2] P. Valente and M. Andreolini, "Improving Application
				556	Responsiveness with the BFQ Disk I/O Scheduler", Proceedings of
				557	the 5th Annual International Systems and Storage Conference
				558	(SYSTOR '12), June 2012.
				559	Slightly extended version:
				560	http://algogroup.unimore.it/people/paolo/disk_sched/bfq-v1-suite-
				561	results.pdf