Blame - Documentation/block/queue-sysfs.rst - SHIFTPHONES/mainline/linux

blob: 3f569d5324857355541a8d7487a114c14c52933d

Mauro Carvalho Chehab	898bd37	2019-04-18 19:45:00 -0300	[diff] [blame]	1	=================
Jens Axboe	cbb5901	2009-02-02 13:02:31 +0100	[diff] [blame]	2	Queue sysfs files
				3	=================
				4
				5	This text file will detail the queue files that are located in the sysfs tree
				6	for each block device. Note that stacked devices typically do not export
Damien Le Moal	9d824642	2021-10-27 11:22:23 +0900	[diff] [blame]	7	any settings, since their queue merely functions as a remapping target.
Jens Axboe	cbb5901	2009-02-02 13:02:31 +0100	[diff] [blame]	8	These files are the ones found in the /sys/block/xxx/queue/ directory.
				9
				10	Files denoted with an RO suffix are read-only; the RW suffix means
				11	read-write.
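
As an illustration of the layout described above, the following minimal Python sketch builds the path of a queue attribute file and reads its value. The helper names and the `sysfs_root` parameter are hypothetical and are not part of any kernel interface; the parameter exists only so that the helpers can be tried against a scratch directory.

```python
from pathlib import Path


def queue_attr_path(device: str, attr: str, sysfs_root: str = "/sys") -> Path:
    """Build the path of a queue attribute file for a block device."""
    return Path(sysfs_root) / "block" / device / "queue" / attr


def read_queue_attr(device: str, attr: str, sysfs_root: str = "/sys") -> str:
    """Return the attribute's contents with surrounding whitespace removed."""
    return queue_attr_path(device, attr, sysfs_root).read_text().strip()
```

For example, `read_queue_attr("sda", "rotational")` would read /sys/block/sda/queue/rotational, assuming a device named sda exists on the system.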
				12
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	13	add_random (RW)
Mauro Carvalho Chehab	898bd37	2019-04-18 19:45:00 -0300	[diff] [blame]	14	---------------
Arnd Hannemann	db4ced1	2014-08-26 12:33:20 +0200	[diff] [blame]	15	This file allows the disk entropy contribution to be turned off. The
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	16	default value of this file is '1' (on).
				17
Bart Van Assche	6728ac3	2019-06-28 13:07:43 -0700	[diff] [blame]	18	chunk_sectors (RO)
				19	------------------
				20	This has a different meaning depending on the type of the block device.
				21	For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
				22	of the RAID volume stripe segment. For a zoned block device, either host-aware
				23	or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
				24	of the device, with the possible exception of the last zone of the device which
				25	may be smaller.
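
The arithmetic for the zoned case can be sketched as follows. This is only an illustration: the function name is hypothetical, the device capacity in 512B sectors is assumed to be obtained separately by the caller, and the sketch assumes that every zone except possibly the last has the size given by chunk_sectors.

```python
SECTOR_SIZE = 512  # chunk_sectors is expressed in 512-byte sectors


def zone_layout(capacity_sectors: int, chunk_sectors: int) -> tuple[int, int, int]:
    """Return (number of zones, zone size in bytes, last zone size in bytes)."""
    if chunk_sectors <= 0 or capacity_sectors < 0:
        raise ValueError("chunk_sectors must be positive and capacity non-negative")
    full_zones, remainder = divmod(capacity_sectors, chunk_sectors)
    nr_zones = full_zones + (1 if remainder else 0)
    if remainder:
        last_zone_sectors = remainder      # the last zone is smaller
    elif full_zones:
        last_zone_sectors = chunk_sectors  # all zones have the same size
    else:
        last_zone_sectors = 0              # empty device, no zones
    return nr_zones, chunk_sectors * SECTOR_SIZE, last_zone_sectors * SECTOR_SIZE
```

With chunk_sectors equal to 524288, each full zone is 524288 * 512 bytes, that is, 256 MiB.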
				26
Joe Lawrence	005411e	2016-08-09 14:01:30 -0400	[diff] [blame]	27	dax (RO)
				28	--------
				29	This file indicates whether the device supports Direct Access (DAX),
				30	used by CPU-addressable storage to bypass the pagecache. It shows '1'
				31	if true, '0' if not.
				32
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	33	discard_granularity (RO)
Mauro Carvalho Chehab	898bd37	2019-04-18 19:45:00 -0300	[diff] [blame]	34	------------------------
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	35	This shows the size of internal allocation of the device in bytes, if
				36	reported by the device. A value of '0' means the device does not support
				37	the discard functionality.
				38
Jens Axboe	0034af0	2015-07-16 09:14:26 -0600	[diff] [blame]	39	discard_max_hw_bytes (RO)
Mauro Carvalho Chehab	898bd37	2019-04-18 19:45:00 -0300	[diff] [blame]	40	-------------------------
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	41	Devices that support discard functionality may have internal limits on
				42	the number of bytes that can be trimmed or unmapped in a single operation.
Stephen Kitt	f99b4fe	2021-09-10 12:51:42 +0200	[diff] [blame]	43	The `discard_max_hw_bytes` parameter is set by the device driver to the
				44	maximum number of bytes that can be discarded in a single operation.
				45	Discard requests issued to the device must not exceed this limit.
				46	A `discard_max_hw_bytes` value of 0 means that the device does not support
				47	discard functionality.
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	48
Jens Axboe	0034af0	2015-07-16 09:14:26 -0600	[diff] [blame]	49	discard_max_bytes (RW)
				50	----------------------
				51	While discard_max_hw_bytes is the hardware limit for the device, this
				52	setting is the software limit. Some devices exhibit large latencies when
				53	large discards are issued; setting this value lower will make Linux issue
				54	smaller discards and potentially help reduce latencies induced by large
				55	discard operations.
				56
Bart Van Assche	fbbe7c8	2019-06-28 13:07:45 -0700	[diff] [blame]	57	discard_zeroes_data (RO)
				58	------------------------
				59	Obsolete. Always zero.
				60
				61	fua (RO)
				62	--------
				63	Whether or not the block driver supports the FUA flag for write requests.
				64	FUA stands for Force Unit Access. If the FUA flag is set that means that
				65	write requests must bypass the volatile cache of the storage device.
				66
Jens Axboe	cbb5901	2009-02-02 13:02:31 +0100	[diff] [blame]	67	hw_sector_size (RO)
				68	-------------------
				69	This is the hardware sector size of the device, in bytes.
				70
Joe Lawrence	005411e	2016-08-09 14:01:30 -0400	[diff] [blame]	71	io_poll (RW)
				72	------------
Jeff Moyer	7158339	2017-01-03 17:51:33 -0500	[diff] [blame]	73	When read, this file shows whether polling is enabled (1) or disabled
				74	(0). Writing '0' to this file will disable polling for this device.
				75	Writing any non-zero value will enable this feature.
Joe Lawrence	005411e	2016-08-09 14:01:30 -0400	[diff] [blame]	76
Jens Axboe	10e6246	2016-11-17 22:23:02 -0700	[diff] [blame]	77	io_poll_delay (RW)
				78	------------------
				79	If polling is enabled, this controls what kind of polling will be
				80	performed. It defaults to -1, which is classic polling. In this mode,
				81	the CPU will repeatedly ask for completions without giving up any time.
				82	If set to 0, a hybrid polling mode is used, where the kernel will attempt
				83	to make an educated guess at when the IO will complete. Based on this
				84	guess, the kernel will put the process issuing IO to sleep for an amount
				85	of time, before entering a classic poll loop. This mode might be a
				86	little slower than pure classic polling, but it will be more efficient.
				87	If set to a value larger than 0, the kernel will put the process issuing
Damien Le Moal	f982495	2018-11-30 14:36:24 +0900	[diff] [blame]	88	IO to sleep for this amount of microseconds before entering classic
Jens Axboe	10e6246	2016-11-17 22:23:02 -0700	[diff] [blame]	89	polling.
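
The three cases described above can be summarized with a small Python sketch. The function name and the returned strings are hypothetical, and rejecting values below -1 is an assumption of this sketch, not something stated in the text.

```python
def describe_io_poll_delay(value: int) -> str:
    """Map an io_poll_delay value to the polling mode described in the text."""
    if value == -1:
        return "classic polling"
    if value == 0:
        return "hybrid polling"
    if value > 0:
        return f"sleep {value} us, then classic polling"
    # Assumption of this sketch: values below -1 are treated as invalid.
    raise ValueError(f"unexpected io_poll_delay value: {value}")
```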
				90
Weiping Zhang	bb351ab	2018-12-26 11:56:33 +0800	[diff] [blame]	91	io_timeout (RW)
				92	---------------
				93	io_timeout is the request timeout in milliseconds. If a request does not
				94	complete in this time then the block driver timeout handler is invoked.
				95	That timeout handler can decide to retry the request, to fail it or to start
				96	a device recovery strategy.
				97
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	98	iostats (RW)
				99	-------------
				100	This file is used to control (on/off) the iostats accounting of the
				101	disk.
				102
				103	logical_block_size (RO)
				104	-----------------------
Masanari Iida	141fd28	2016-06-29 05:10:57 +0900	[diff] [blame]	105	This is the logical block size of the device, in bytes.
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	106
Bart Van Assche	fbbe7c8	2019-06-28 13:07:45 -0700	[diff] [blame]	107	max_discard_segments (RO)
				108	-------------------------
				109	The maximum number of DMA scatter/gather entries in a discard request.
				110
Jens Axboe	cbb5901	2009-02-02 13:02:31 +0100	[diff] [blame]	111	max_hw_sectors_kb (RO)
				112	----------------------
				113	This is the maximum number of kilobytes supported in a single data transfer.
				114
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	115	max_integrity_segments (RO)
				116	---------------------------
Bart Van Assche	0c766e7	2019-06-28 13:07:44 -0700	[diff] [blame]	117	Maximum number of elements in a DMA scatter/gather list with integrity
				118	data that will be submitted by the block layer core to the associated
				119	block driver.
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	120
Niklas Cassel	659bf82	2020-07-14 23:18:24 +0200	[diff] [blame]	121	max_active_zones (RO)
				122	---------------------
				123	For zoned block devices (zoned attribute indicating "host-managed" or
				124	"host-aware"), the sum of zones belonging to any of the zone states:
				125	EXPLICIT OPEN, IMPLICIT OPEN or CLOSED, is limited by this value.
				126	If this value is 0, there is no limit.
				127
Keith Busch	3b481d9	2020-09-24 13:53:28 -0700	[diff] [blame]	128	If the host attempts to exceed this limit, the driver should report this error
				129	with BLK_STS_ZONE_ACTIVE_RESOURCE, which user space may see as the EOVERFLOW
				130	errno.
				131
Niklas Cassel	e15864f	2020-07-14 23:18:23 +0200	[diff] [blame]	132	max_open_zones (RO)
				133	-------------------
				134	For zoned block devices (zoned attribute indicating "host-managed" or
				135	"host-aware"), the sum of zones belonging to any of the zone states:
				136	EXPLICIT OPEN or IMPLICIT OPEN, is limited by this value.
				137	If this value is 0, there is no limit.
				138
Keith Busch	3b481d9	2020-09-24 13:53:28 -0700	[diff] [blame]	139	If the host attempts to exceed this limit, the driver should report this error
				140	with BLK_STS_ZONE_OPEN_RESOURCE, which user space may see as the ETOOMANYREFS
				141	errno.
				142
Jens Axboe	cbb5901	2009-02-02 13:02:31 +0100	[diff] [blame]	143	max_sectors_kb (RW)
				144	-------------------
				145	This is the maximum number of kilobytes that the block layer will allow
				146	for a filesystem request. Must be smaller than or equal to the maximum
				147	size allowed by the hardware.
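
A tool that adjusts this value might first limit the requested size to the hardware maximum. The sketch below assumes that "the maximum size allowed by the hardware" is the value reported by max_hw_sectors_kb; the function name is hypothetical.

```python
def clamp_max_sectors_kb(requested_kb: int, max_hw_sectors_kb: int) -> int:
    """Limit a requested max_sectors_kb value to the hardware maximum."""
    if requested_kb <= 0:
        raise ValueError("requested size must be a positive number of kilobytes")
    return min(requested_kb, max_hw_sectors_kb)
```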
				148
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	149	max_segments (RO)
				150	-----------------
Bart Van Assche	0c766e7	2019-06-28 13:07:44 -0700	[diff] [blame]	151	Maximum number of elements in a DMA scatter/gather list that is submitted
				152	to the associated block driver.
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	153
				154	max_segment_size (RO)
				155	---------------------
Bart Van Assche	0c766e7	2019-06-28 13:07:44 -0700	[diff] [blame]	156	Maximum size in bytes of a single element in a DMA scatter/gather list.
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	157
				158	minimum_io_size (RO)
				159	--------------------
Arnd Hannemann	db4ced1	2014-08-26 12:33:20 +0200	[diff] [blame]	160	This is the smallest preferred IO size reported by the device.
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	161
Jens Axboe	cbb5901	2009-02-02 13:02:31 +0100	[diff] [blame]	162	nomerges (RW)
				163	-------------
Alan D. Brunelle	488991e	2010-01-29 09:04:08 +0100	[diff] [blame]	164	This enables the user to disable the lookup logic involved with merging
				165	IO requests in the block layer. By default (0) all merges are
				166	enabled. When set to 1 only simple one-hit merges will be tried. When
				167	set to 2 no merge algorithms will be tried (including one-hit or more
				168	complex tree/hash lookups).
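
The three settings can be written down as a small lookup table. This is only a sketch; the names and description strings are hypothetical.

```python
# Meaning of each nomerges value, as described in the text above.
NOMERGES_MODES = {
    0: "all merges enabled",
    1: "only simple one-hit merges",
    2: "no merges",
}


def describe_nomerges(value: int) -> str:
    """Return a short description for a nomerges setting."""
    if value not in NOMERGES_MODES:
        raise ValueError(f"unexpected nomerges value: {value}")
    return NOMERGES_MODES[value]
```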
Jens Axboe	cbb5901	2009-02-02 13:02:31 +0100	[diff] [blame]	169
				170	nr_requests (RW)
				171	----------------
				172	This controls how many requests may be allocated in the block layer for
				173	read or write requests. Note that the total allocated number may be twice
				174	this amount, since it applies only to reads or writes (not the accumulated
				175	sum).
				176
Tejun Heo	a051661	2012-06-26 15:05:44 -0700	[diff] [blame]	177	To avoid priority inversion through request starvation, a request
				178	queue maintains a separate request pool for each cgroup when
				179	CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such
				180	per-block-cgroup request pool. IOW, if there are N block cgroups,
Anatol Pomozov	f884ab1	2013-05-08 16:56:16 -0700	[diff] [blame]	181	each request queue may have up to N request pools, each independently
Tejun Heo	a051661	2012-06-26 15:05:44 -0700	[diff] [blame]	182	regulated by nr_requests.
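
Read literally, the two paragraphs above give an upper bound on the number of requests that may be allocated for one queue. The sketch below works through that arithmetic; the function name is hypothetical, and the result is an estimate derived from the text, not a value reported by the kernel.

```python
def max_allocated_requests(nr_requests: int, request_pools: int = 1) -> int:
    """Upper bound on allocated requests for one request queue.

    Reads and writes are limited separately (hence the factor of two), and
    each request pool is regulated independently by nr_requests.
    """
    if nr_requests < 0 or request_pools < 1:
        raise ValueError("nr_requests must be >= 0 and request_pools >= 1")
    return 2 * nr_requests * request_pools
```

For example, with nr_requests set to 128 and three block cgroups, the bound is 2 * 128 * 3 = 768 requests.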
				183
Bart Van Assche	6728ac3	2019-06-28 13:07:43 -0700	[diff] [blame]	184	nr_zones (RO)
				185	-------------
				186	For zoned block devices (zoned attribute indicating "host-managed" or
				187	"host-aware"), this indicates the total number of zones of the device.
				188	This is always 0 for regular block devices.
				189
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	190	optimal_io_size (RO)
				191	--------------------
Arnd Hannemann	db4ced1	2014-08-26 12:33:20 +0200	[diff] [blame]	192	This is the optimal IO size reported by the device.
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	193
				194	physical_block_size (RO)
				195	------------------------
				196	This is the physical block size of the device, in bytes.
				197
Jens Axboe	cbb5901	2009-02-02 13:02:31 +0100	[diff] [blame]	198	read_ahead_kb (RW)
				199	------------------
				200	Maximum number of kilobytes to read-ahead for filesystems on this block
				201	device.
				202
Namjae Jeon	4004e90	2012-08-09 15:28:05 +0200	[diff] [blame]	203	rotational (RW)
				204	---------------
				205	This file is used to state whether the device is of rotational type or
				206	non-rotational type.
				207
Jens Axboe	cbb5901	2009-02-02 13:02:31 +0100	[diff] [blame]	208	rq_affinity (RW)
				209	----------------
Dan Williams	5757a6d	2011-07-23 20:44:25 +0200	[diff] [blame]	210	If this option is '1', the block layer will migrate request completions to the
				211	cpu "group" that originally submitted the request. For some workloads this
				212	provides a significant reduction in CPU cycles due to caching effects.
				213
				214	For storage configurations that need to maximize distribution of completion
				215	processing setting this option to '2' forces the completion to run on the
				216	requesting cpu (bypassing the "group" aggregation logic).
Jens Axboe	cbb5901	2009-02-02 13:02:31 +0100	[diff] [blame]	217
				218	scheduler (RW)
				219	--------------
				220	When read, this file will display the current and available IO schedulers
				221	for this block device. The currently active IO scheduler will be enclosed
				222	in [] brackets. Writing an IO scheduler name to this file will switch
				223	control of this block device to that new IO scheduler. Note that writing
				224	an IO scheduler name to this file will attempt to load that IO scheduler
				225	module, if it isn't already present in the system.
				226
Jens Axboe	93e9d8e	2016-04-12 12:32:46 -0600	[diff] [blame]	227	write_cache (RW)
				228	----------------
				229	When read, this file will display whether the device has write back
				230	caching enabled or not. It will return "write back" for the former
				231	case, and "write through" for the latter. Writing to this file can
				232	change the kernel's view of the device, but it doesn't alter the
				233	device state. This means that it might not be safe to toggle the
				234	setting from "write back" to "write through", since that will also
				235	eliminate cache flushes issued by the kernel.
Jens Axboe	cbb5901	2009-02-02 13:02:31 +0100	[diff] [blame]	236
Joe Lawrence	005411e	2016-08-09 14:01:30 -0400	[diff] [blame]	237	write_same_max_bytes (RO)
				238	-------------------------
				239	This is the number of bytes the device can write in a single write-same
				240	command. A value of '0' means write-same is not supported by this
				241	device.
				242
Bart Van Assche	152c777	2019-06-28 13:07:42 -0700	[diff] [blame]	243	wbt_lat_usec (RW)
				244	-----------------
Jens Axboe	87760e5	2016-11-09 12:38:14 -0700	[diff] [blame]	245	If the device is registered for writeback throttling, then this file shows
				246	the target minimum read latency. If this latency is exceeded in a given
				247	window of time (see wb_window_usec), then the writeback throttling will start
Jens Axboe	80e091d	2016-11-28 09:22:47 -0700	[diff] [blame]	248	scaling back writes. Writing a value of '0' to this file disables the
				249	feature. Writing a value of '-1' to this file resets the value to the
				250	default setting.
Jens Axboe	87760e5	2016-11-09 12:38:14 -0700	[diff] [blame]	251
Shaohua Li	297e3d8	2017-03-27 10:51:37 -0700	[diff] [blame]	252	throttle_sample_time (RW)
				253	-------------------------
				254	This is the time window over which blk-throttle samples data, in milliseconds.
				255	blk-throttle makes decisions based on the samples. A lower time means cgroups
				256	have smoother throughput, but higher CPU overhead. This exists only when
				257	CONFIG_BLK_DEV_THROTTLING_LOW is enabled.
Jens Axboe	cbb5901	2009-02-02 13:02:31 +0100	[diff] [blame]	258
Bart Van Assche	fbbe7c8	2019-06-28 13:07:45 -0700	[diff] [blame]	259	write_zeroes_max_bytes (RO)
				260	---------------------------
				261	For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of
				262	bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES
				263	is not supported.
				264
Damien Le Moal	f183642	2021-01-28 13:47:26 +0900	[diff] [blame]	265	zone_append_max_bytes (RO)
				266	--------------------------
				267	This is the maximum number of bytes that can be written to a sequential
				268	zone of a zoned block device using a zone append write operation
				269	(REQ_OP_ZONE_APPEND). This value is always 0 for regular block devices.
				270
Damien Le Moal	f982495	2018-11-30 14:36:24 +0900	[diff] [blame]	271	zoned (RO)
				272	----------
				273	This indicates if the device is a zoned block device and the zone model of the
				274	device if it is indeed zoned. The possible values indicated by zoned are
				275	"none" for regular block devices and "host-aware" or "host-managed" for zoned
				276	block devices. The characteristics of host-aware and host-managed zoned block
				277	devices are described in the ZBC (Zoned Block Commands) and ZAC
				278	(Zoned Device ATA Command Set) standards. These standards also define the
				279	"drive-managed" zone model. However, since drive-managed zoned block devices
				280	do not support zone commands, they will be treated as regular block devices
				281	and zoned will report "none".
				282
Damien Le Moal	a805a4f	2021-01-28 13:47:30 +0900	[diff] [blame]	283	zone_write_granularity (RO)
				284	---------------------------
				285	This indicates the alignment constraint, in bytes, for write operations in
				286	sequential zones of zoned block devices (devices with a zoned attribute
				287	that reports "host-managed" or "host-aware"). This value is always 0 for
				288	regular block devices.
				289
Damien Le Moal	6b3bae2	2021-10-27 11:22:22 +0900	[diff] [blame]	290	independent_access_ranges (RO)
				291	------------------------------
				292
				293	The presence of this sub-directory of the /sys/block/xxx/queue/ directory
				294	indicates that the device is capable of executing requests targeting
				295	different sector ranges in parallel. For instance, single LUN multi-actuator
				296	hard-disks will have an independent_access_ranges directory if the device
				297	correctly advertises the sector ranges of its actuators.
				298
				299	The independent_access_ranges directory contains one directory per access
				300	range, with each range described using the sector (RO) attribute file to
				301	indicate the first sector of the range and the nr_sectors (RO) attribute file
				302	to indicate the total number of sectors in the range starting from the first
				303	sector of the range. For example, a dual-actuator hard-disk will have the
				304	following independent_access_ranges entries::
				305
				306	    $ tree /sys/block/<device>/queue/independent_access_ranges/
				307	    /sys/block/<device>/queue/independent_access_ranges/
				308	    |-- 0
				309	    |   |-- nr_sectors
				310	    |   `-- sector
				311	    `-- 1
				312	        |-- nr_sectors
				313	        `-- sector
				314
				315	The sector and nr_sectors attributes use 512B sector units, regardless of
				316	the actual block size of the device. Independent access ranges do not
				317	overlap and include all sectors within the device capacity. The access
				318	ranges are numbered in increasing order of the range start sector,
				319	that is, the sector attribute of range 0 always has the value 0.
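
The properties stated above can be checked with a short Python sketch that reads every range directory and verifies that the ranges start at sector 0 and follow one another without gaps or overlaps. The function names are hypothetical, and the sketch does not compare the last range against the device capacity.

```python
from pathlib import Path


def read_access_ranges(ranges_dir: str) -> list[tuple[int, int, int]]:
    """Return (index, first sector, number of sectors) tuples, sorted by index."""
    ranges = []
    for entry in Path(ranges_dir).iterdir():
        if entry.is_dir() and entry.name.isdigit():
            start = int((entry / "sector").read_text())
            count = int((entry / "nr_sectors").read_text())
            ranges.append((int(entry.name), start, count))
    return sorted(ranges)


def ranges_are_contiguous(ranges: list[tuple[int, int, int]]) -> bool:
    """Check that range 0 starts at sector 0 and each range follows the previous one."""
    expected_start = 0
    for _, start, count in ranges:
        if start != expected_start:
            return False
        expected_start = start + count
    return True
```

All values are in 512B sectors, so multiplying a start or count by 512 gives the corresponding size in bytes.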
				320
Jens Axboe	cbb5901	2009-02-02 13:02:31 +0100	[diff] [blame]	321	Jens Axboe <jens.axboe@oracle.com>, February 2009