=================
Queue sysfs files
=================

This text file details the queue files that are located in the sysfs tree
for each block device. Note that stacked devices typically do not export
any settings, since their queue merely functions as a remapping target.
These files are the ones found in the /sys/block/xxx/queue/ directory.

Files denoted with a RO postfix are read-only and the RW postfix means
read-write.

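Because each attribute is a small text file, user space can read it with ordinary file I/O. The sketch below assumes the /sys/block/<device>/queue/ layout described above; the helper names and the sysfs_root parameter are hypothetical, added only for illustration (the parameter lets the functions be pointed at a scratch directory).

```python
from pathlib import Path


def queue_attr_path(device: str, name: str, sysfs_root: str = "/sys") -> Path:
    """Build /sys/block/<device>/queue/<name> (hypothetical helper)."""
    return Path(sysfs_root) / "block" / device / "queue" / name


def read_queue_attr(device: str, name: str, sysfs_root: str = "/sys") -> str:
    """Return the attribute text with surrounding whitespace removed."""
    return queue_attr_path(device, name, sysfs_root).read_text().strip()


def read_queue_int(device: str, name: str, sysfs_root: str = "/sys") -> int:
    """Read a numeric attribute such as logical_block_size or nr_requests."""
    return int(read_queue_attr(device, name, sysfs_root))
```

For example, read_queue_int("sda", "logical_block_size") would return the logical block size of a device named sda, if such a device exists. Writing the RW attributes normally requires root privileges.
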
add_random (RW)
---------------
This file allows turning off the disk entropy contribution. The default
value of this file is '1' (on).

chunk_sectors (RO)
------------------
This has a different meaning depending on the type of the block device.
For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors
of the RAID volume stripe segment. For a zoned block device, either host-aware
or host-managed, chunk_sectors indicates the size in 512B sectors of the zones
of the device, with the possible exception of the last zone of the device,
which may be smaller.

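Since chunk_sectors is expressed in 512B sectors regardless of the device block size, converting it to bytes, or locating the zone (or stripe segment) that holds a given sector, is simple arithmetic. A minimal sketch, with hypothetical function names:

```python
SECTOR_SIZE = 512  # chunk_sectors always counts 512-byte sectors


def chunk_size_bytes(chunk_sectors: int) -> int:
    """Size of one zone (zoned device) or stripe segment (RAID) in bytes."""
    return chunk_sectors * SECTOR_SIZE


def chunk_index(sector: int, chunk_sectors: int) -> int:
    """Index of the zone or stripe segment containing a 512B sector.

    Still valid for a smaller last zone, because that zone starts at a
    multiple of chunk_sectors like all the others.
    """
    if chunk_sectors <= 0:
        raise ValueError("chunk_sectors must be positive")
    return sector // chunk_sectors
```

For example, a chunk_sectors value of 524288 corresponds to 256 MiB zones.
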
dax (RO)
--------
This file indicates whether the device supports Direct Access (DAX),
used by CPU-addressable storage to bypass the pagecache. It shows '1'
if true, '0' if not.

discard_granularity (RO)
------------------------
This shows the size of the internal allocation of the device in bytes, if
reported by the device. A value of '0' means the device does not support
the discard functionality.

discard_max_hw_bytes (RO)
-------------------------
Devices that support discard functionality may have internal limits on
the number of bytes that can be trimmed or unmapped in a single operation.
The `discard_max_hw_bytes` parameter is set by the device driver to the
maximum number of bytes that can be discarded in a single operation.
Discard requests issued to the device must not exceed this limit.
A `discard_max_hw_bytes` value of 0 means that the device does not support
discard functionality.

discard_max_bytes (RW)
----------------------
While discard_max_hw_bytes is the hardware limit for the device, this
setting is the software limit. Some devices exhibit large latencies when
large discards are issued; setting this value lower will make Linux issue
smaller discards and potentially help reduce latencies induced by large
discard operations.

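To make the effect of a lower discard_max_bytes concrete, the sketch below splits one large discard range into pieces that respect the limit. It illustrates the arithmetic only and uses a hypothetical function name; the kernel's own splitting also takes other limits, such as discard_granularity, into account.

```python
from typing import List, Tuple


def split_discard(offset: int, length: int, max_bytes: int) -> List[Tuple[int, int]]:
    """Split one discard of `length` bytes starting at `offset` into
    (offset, length) pieces of at most `max_bytes` bytes each."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    pieces: List[Tuple[int, int]] = []
    end = offset + length
    while offset < end:
        size = min(max_bytes, end - offset)  # last piece may be shorter
        pieces.append((offset, size))
        offset += size
    return pieces
```

In this model, with the limit lowered to 4 MiB, a 10 MiB discard becomes three discards of 4, 4 and 2 MiB.
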
discard_zeroes_data (RO)
------------------------
Obsolete. Always zero.

fua (RO)
--------
Whether or not the block driver supports the FUA flag for write requests.
FUA stands for Force Unit Access. If the FUA flag is set, write requests
must bypass the volatile cache of the storage device.

hw_sector_size (RO)
-------------------
This is the hardware sector size of the device, in bytes.

io_poll (RW)
------------
When read, this file shows whether polling is enabled (1) or disabled
(0). Writing '0' to this file will disable polling for this device.
Writing any non-zero value will enable this feature.

io_poll_delay (RW)
------------------
If polling is enabled, this controls what kind of polling will be
performed. It defaults to -1, which is classic polling. In this mode,
the CPU will repeatedly ask for completions without giving up any time.
If set to 0, a hybrid polling mode is used, where the kernel will attempt
to make an educated guess at when the IO will complete. Based on this
guess, the kernel will put the process issuing IO to sleep for an amount
of time, before entering a classic poll loop. This mode might be a
little slower than pure classic polling, but it will be more CPU efficient.
If set to a value larger than 0, the kernel will put the process issuing
IO to sleep for this number of microseconds before entering classic
polling.

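The three cases above can be summarised as a small lookup. This is a sketch with a hypothetical function name; it treats values below -1 as invalid because they match none of the documented modes.

```python
def describe_poll_delay(value: int) -> str:
    """Map an io_poll_delay value to the polling mode described above."""
    if value == -1:
        return "classic"  # spin on completions without sleeping
    if value == 0:
        return "hybrid"  # sleep for a kernel-estimated time, then spin
    if value > 0:
        return f"fixed sleep of {value} us, then classic"
    raise ValueError(f"unexpected io_poll_delay value: {value}")
```
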
io_timeout (RW)
---------------
io_timeout is the request timeout in milliseconds. If a request does not
complete in this time then the block driver timeout handler is invoked.
That timeout handler can decide to retry the request, to fail it or to start
a device recovery strategy.

iostats (RW)
------------
This file is used to control (on/off) the iostats accounting of the
disk.

logical_block_size (RO)
-----------------------
This is the logical block size of the device, in bytes.

max_discard_segments (RO)
-------------------------
The maximum number of DMA scatter/gather entries in a discard request.

max_hw_sectors_kb (RO)
----------------------
This is the maximum number of kilobytes supported in a single data transfer.

max_integrity_segments (RO)
---------------------------
Maximum number of elements in a DMA scatter/gather list with integrity
data that will be submitted by the block layer core to the associated
block driver.

max_active_zones (RO)
---------------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), the number of zones in any of the zone states EXPLICIT OPEN,
IMPLICIT OPEN or CLOSED is limited by this value.
If this value is 0, there is no limit.

If the host attempts to exceed this limit, the driver should report this error
with BLK_STS_ZONE_ACTIVE_RESOURCE, which user space may see as the EOVERFLOW
errno.

max_open_zones (RO)
-------------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), the number of zones in either of the zone states EXPLICIT OPEN
or IMPLICIT OPEN is limited by this value.
If this value is 0, there is no limit.

If the host attempts to exceed this limit, the driver should report this error
with BLK_STS_ZONE_OPEN_RESOURCE, which user space may see as the ETOOMANYREFS
errno.

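The two zone limits combine as in the sketch below, which checks whether one more zone can be opened. It assumes the zone being opened is currently empty, so it adds one open zone and one active zone; reopening a CLOSED zone would not change the active count and is not modelled. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ZoneStateCounts:
    """Number of zones currently in each state counted by the limits."""
    explicit_open: int = 0
    implicit_open: int = 0
    closed: int = 0


def can_open_empty_zone(counts: ZoneStateCounts,
                        max_open_zones: int,
                        max_active_zones: int) -> bool:
    """True if opening one more EMPTY zone stays within both limits.

    A limit of 0 means "no limit", as for the sysfs attributes.
    """
    open_zones = counts.explicit_open + counts.implicit_open
    active_zones = open_zones + counts.closed
    if max_open_zones and open_zones + 1 > max_open_zones:
        return False  # would exceed max_open_zones
    if max_active_zones and active_zones + 1 > max_active_zones:
        return False  # would exceed max_active_zones
    return True
```

Exceeding either limit is reported with the errors listed above (EOVERFLOW or ETOOMANYREFS as seen from user space).
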
max_sectors_kb (RW)
-------------------
This is the maximum number of kilobytes that the block layer will allow
for a filesystem request. It must be smaller than or equal to the maximum
size allowed by the hardware (max_hw_sectors_kb).

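A user-space tool could check a new value against max_hw_sectors_kb before writing it. The following is a sketch under that assumption, with a hypothetical function name; the kernel still performs its own validation, so this pre-check is only a convenience.

```python
from pathlib import Path


def set_max_sectors_kb(queue_dir: str, requested_kb: int) -> None:
    """Write max_sectors_kb after checking it against max_hw_sectors_kb.

    queue_dir is a directory such as /sys/block/sda/queue; writing to a
    real sysfs file requires root.
    """
    queue = Path(queue_dir)
    hw_limit_kb = int((queue / "max_hw_sectors_kb").read_text())
    if not 0 < requested_kb <= hw_limit_kb:
        raise ValueError(
            f"max_sectors_kb must be between 1 and {hw_limit_kb}, got {requested_kb}"
        )
    (queue / "max_sectors_kb").write_text(f"{requested_kb}\n")
```
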
max_segments (RO)
-----------------
Maximum number of elements in a DMA scatter/gather list that is submitted
to the associated block driver.

max_segment_size (RO)
---------------------
Maximum size in bytes of a single element in a DMA scatter/gather list.

minimum_io_size (RO)
--------------------
This is the smallest preferred IO size reported by the device.

nomerges (RW)
-------------
This enables the user to disable the lookup logic involved with IO
merging requests in the block layer. By default (0) all merges are
enabled. When set to 1 only simple one-hit merges will be tried. When
set to 2 no merge algorithms will be tried (including one-hit or more
complex tree/hash lookups).

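The three settings can be decoded from the file contents as follows; this is a sketch, and the table and function names are hypothetical.

```python
NOMERGES_MODES = {
    0: "all merges enabled",          # default
    1: "simple one-hit merges only",
    2: "no merging",                  # no one-hit or tree/hash lookups
}


def nomerges_mode(text: str) -> str:
    """Describe the contents of a nomerges file, e.g. "0\\n"."""
    value = int(text)
    if value not in NOMERGES_MODES:
        raise ValueError(f"unexpected nomerges value: {value}")
    return NOMERGES_MODES[value]
```
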
nr_requests (RW)
----------------
This controls how many requests may be allocated in the block layer for
read or write requests. Note that the total allocated number may be twice
this amount, since it applies only to reads or writes (not the accumulated
sum).

To avoid priority inversion through request starvation, a request
queue maintains a separate request pool per cgroup when
CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such
per-block-cgroup request pool. In other words, if there are N block cgroups,
each request queue may have up to N request pools, each independently
regulated by nr_requests.

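The resulting upper bound is a simple product. This sketch only restates the rule given in this section (one limit each for reads and writes, one pool per block cgroup); it is not derived from kernel code, and the function name is hypothetical.

```python
def max_allocated_requests(nr_requests: int, n_block_cgroups: int = 1) -> int:
    """Upper bound on requests allocated for one request queue.

    Factor 2: nr_requests applies to reads and writes separately.
    n_block_cgroups: number of per-cgroup request pools (1 without
    CONFIG_BLK_CGROUP).
    """
    if nr_requests < 0 or n_block_cgroups < 1:
        raise ValueError("invalid arguments")
    return 2 * nr_requests * n_block_cgroups
```

For example, with nr_requests set to 128 and four block cgroups, the bound is 4 * 2 * 128 = 1024 requests.
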
nr_zones (RO)
-------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), this indicates the total number of zones of the device.
This is always 0 for regular block devices.

optimal_io_size (RO)
--------------------
This is the optimal IO size reported by the device.

physical_block_size (RO)
------------------------
This is the physical block size of the device, in bytes.

read_ahead_kb (RW)
------------------
Maximum number of kilobytes to read-ahead for filesystems on this block
device.

rotational (RW)
---------------
This file indicates whether the device is of rotational type (1) or
non-rotational type (0).

rq_affinity (RW)
----------------
If this option is '1', the block layer will migrate request completions to the
CPU "group" that originally submitted the request. For some workloads this
provides a significant reduction in CPU cycles due to caching effects.

For storage configurations that need to maximize distribution of completion
processing, setting this option to '2' forces the completion to run on the
requesting CPU (bypassing the "group" aggregation logic).

scheduler (RW)
--------------
When read, this file will display the current and available IO schedulers
for this block device. The currently active IO scheduler will be enclosed
in [] brackets. Writing an IO scheduler name to this file will switch
control of this block device to that new IO scheduler. Note that writing
an IO scheduler name to this file will attempt to load that IO scheduler
module, if it isn't already present in the system.

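The bracket convention makes the file easy to parse. A minimal sketch follows; the function name is hypothetical, and the scheduler names in the sample string are only an example, since the list depends on the kernel configuration.

```python
from typing import List, Tuple


def parse_scheduler(text: str) -> Tuple[str, List[str]]:
    """Split the scheduler file into (active, available).

    The active scheduler is the one enclosed in [] brackets.
    """
    active = None
    available: List[str] = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets around the active one
            active = token
        available.append(token)
    if active is None:
        raise ValueError("no active scheduler marked with []")
    return active, available
```

For a file containing "mq-deadline kyber [bfq] none", this returns "bfq" as the active scheduler and all four names as available. Switching schedulers is done by writing one of the available names back to the file as root.
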
write_cache (RW)
----------------
When read, this file will display whether the device has write back
caching enabled or not. It will return "write back" for the former
case, and "write through" for the latter. Writing to this file can
change the kernel's view of the device, but it doesn't alter the
device state. This means that it might not be safe to toggle the
setting from "write back" to "write through", since that will also
eliminate cache flushes issued by the kernel.

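A reader of this file only has to distinguish the two documented strings. The sketch below uses a hypothetical function name; note that the result reflects the kernel's view of the device, which may differ from the actual device state if the file has been written.

```python
def has_write_back_cache(text: str) -> bool:
    """True for "write back", False for "write through"."""
    value = text.strip()
    if value == "write back":
        return True
    if value == "write through":
        return False
    raise ValueError(f"unexpected write_cache value: {value!r}")
```
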
write_same_max_bytes (RO)
-------------------------
This is the number of bytes the device can write in a single write-same
command. A value of '0' means write-same is not supported by this
device.

wbt_lat_usec (RW)
-----------------
If the device is registered for writeback throttling, then this file shows
the target minimum read latency. If this latency is exceeded in a given
window of time (see wb_window_usec), then the writeback throttling will start
scaling back writes. Writing a value of '0' to this file disables the
feature. Writing a value of '-1' to this file resets the value to the
default setting.

throttle_sample_time (RW)
-------------------------
This is the time window, in milliseconds, over which blk-throttle samples
data. blk-throttle makes decisions based on these samples. A shorter window
gives cgroups smoother throughput, at the cost of higher CPU overhead. This
file exists only when CONFIG_BLK_DEV_THROTTLING_LOW is enabled.

write_zeroes_max_bytes (RO)
---------------------------
For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of
bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES
is not supported.

zone_append_max_bytes (RO)
--------------------------
This is the maximum number of bytes that can be written to a sequential
zone of a zoned block device using a zone append write operation
(REQ_OP_ZONE_APPEND). This value is always 0 for regular block devices.

zoned (RO)
----------
This indicates whether the device is a zoned block device and, if so, the
zone model of the device. The possible values indicated by zoned are
"none" for regular block devices and "host-aware" or "host-managed" for zoned
block devices. The characteristics of host-aware and host-managed zoned block
devices are described in the ZBC (Zoned Block Commands) and ZAC
(Zoned Device ATA Command Set) standards. These standards also define the
"drive-managed" zone model. However, since drive-managed zoned block devices
do not support zone commands, they will be treated as regular block devices
and zoned will report "none".

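Tools that interpret zone-related attributes such as chunk_sectors or nr_zones may first check this file. A minimal sketch, with hypothetical names:

```python
ZONE_MODELS = ("none", "host-aware", "host-managed")


def is_zoned(text: str) -> bool:
    """True for host-aware or host-managed devices, False for "none".

    Drive-managed devices report "none" and are treated as regular devices.
    """
    model = text.strip()
    if model not in ZONE_MODELS:
        raise ValueError(f"unexpected zoned value: {model!r}")
    return model != "none"
```
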
zone_write_granularity (RO)
---------------------------
This indicates the alignment constraint, in bytes, for write operations in
sequential zones of zoned block devices (devices with a zoned attribute
that reports "host-managed" or "host-aware"). This value is always 0 for
regular block devices.

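An application writing to sequential zones could check its writes against this value. The sketch below assumes that both the byte offset and the length of a write must be multiples of the granularity; the function name is hypothetical.

```python
def is_aligned_zone_write(offset: int, length: int, granularity: int) -> bool:
    """Check a write to a sequential zone against zone_write_granularity.

    A granularity of 0 (regular block device) imposes no zone constraint.
    """
    if granularity == 0:
        return True
    return offset % granularity == 0 and length % granularity == 0
```
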
independent_access_ranges (RO)
------------------------------

The presence of this sub-directory of the /sys/block/xxx/queue/ directory
indicates that the device is capable of executing requests targeting
different sector ranges in parallel. For instance, single LUN multi-actuator
hard-disks will have an independent_access_ranges directory if the device
correctly advertises the sector ranges of its actuators.

The independent_access_ranges directory contains one directory per access
range, with each range described using the sector (RO) attribute file to
indicate the first sector of the range and the nr_sectors (RO) attribute file
to indicate the total number of sectors in the range starting from the first
sector of the range. For example, a dual-actuator hard-disk will have the
following independent_access_ranges entries::

        $ tree /sys/block/<device>/queue/independent_access_ranges/
        /sys/block/<device>/queue/independent_access_ranges/
        |-- 0
        |   |-- nr_sectors
        |   `-- sector
        `-- 1
            |-- nr_sectors
            `-- sector

The sector and nr_sectors attributes use 512B sector units, regardless of
the actual block size of the device. Independent access ranges do not
overlap and include all sectors within the device capacity. The access
ranges are numbered in increasing order of the range start sector,
that is, the sector attribute of range 0 always has the value 0.

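The properties above (no overlap, full coverage, ordering by start sector) mean that consecutive ranges must be contiguous, starting at sector 0 and ending at the device capacity. The sketch below reads the ranges and checks this; the function names are hypothetical, and the capacity is assumed to be given in 512B sectors (for example, from /sys/block/<device>/size, which also uses 512B units).

```python
from pathlib import Path
from typing import List, Tuple


def read_access_ranges(ranges_dir: str) -> List[Tuple[int, int]]:
    """Return (first_sector, nr_sectors) pairs ordered by range number."""
    base = Path(ranges_dir)
    entries = [p for p in base.iterdir() if p.is_dir() and p.name.isdigit()]
    ranges = []
    for entry in sorted(entries, key=lambda p: int(p.name)):
        sector = int((entry / "sector").read_text())
        nr_sectors = int((entry / "nr_sectors").read_text())
        ranges.append((sector, nr_sectors))
    return ranges


def check_access_ranges(ranges: List[Tuple[int, int]], capacity_sectors: int) -> None:
    """Raise ValueError unless the ranges tile [0, capacity) without gaps."""
    expected_start = 0
    for sector, nr_sectors in ranges:
        if sector != expected_start:
            raise ValueError(f"range starts at {sector}, expected {expected_start}")
        expected_start = sector + nr_sectors
    if expected_start != capacity_sectors:
        raise ValueError(f"ranges end at {expected_start}, capacity is {capacity_sectors}")
```
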
Jens Axboe <jens.axboe@oracle.com>, February 2009