=====================
I/O statistics fields
=====================

Since 2.4.20 (and some versions before, with patches), and 2.5.45,
more extensive disk statistics have been introduced to help measure disk
activity. Tools such as ``sar`` and ``iostat`` typically interpret these and do
the work for you, but in case you are interested in creating your own
tools, the fields are explained here.

In 2.4, the information is found as additional fields in
``/proc/partitions``. In 2.6 and later, the same information is found in two
places: one is in the file ``/proc/diskstats``, and the other is within
the sysfs file system, which must be mounted in order to obtain
the information. Throughout this document we'll assume that sysfs
is mounted on ``/sys``, although of course it may be mounted anywhere.
Both ``/proc/diskstats`` and sysfs use the same source for the information
and so should not differ.

Here are examples of these different formats::

   2.4:
      3 0 39082680 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
      3 1 9221278 hda1 35486 0 35496 38030 0 0 0 0 0 38030 38030

   2.6+ sysfs:
      446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
      35486 38030 38030 38030

   2.6+ diskstats:
      3 0 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
      3 1 hda1 35486 38030 38030 38030

   4.18+ diskstats:
      3 0 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160 0 0 0 0

On 2.4 you might execute ``grep 'hda ' /proc/partitions``. On 2.6+, you have
a choice of ``cat /sys/block/hda/stat`` or ``grep 'hda ' /proc/diskstats``.

The advantage of one over the other is that the sysfs choice works well
if you are watching a known, small set of disks. ``/proc/diskstats`` may
be a better choice if you are watching a large number of disks because
you'll avoid the overhead of 50, 100, or 500 or more opens/closes with
each snapshot of your disk statistics.
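
A minimal Python sketch of the single-open approach, assuming the ``/proc/diskstats`` layout described in this document (major number, minor number, device name, then the statistics fields). ``parse_diskstats`` and ``read_diskstats`` are hypothetical helper names; the parser takes the file's text so it can be exercised without a live system.

```python
from typing import Dict, List, Tuple

# (major, minor, statistics fields) for one /proc/diskstats entry.
DiskEntry = Tuple[int, int, List[int]]


def parse_diskstats(text: str) -> Dict[str, DiskEntry]:
    """Parse /proc/diskstats text into {device name: (major, minor, fields)}.

    The number of statistics fields is not fixed: it depends on the
    kernel version and, on older 2.6 kernels, on whether the entry is
    a whole disk or a partition.
    """
    devices: Dict[str, DiskEntry] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or truncated lines
        major, minor, name = int(parts[0]), int(parts[1]), parts[2]
        devices[name] = (major, minor, [int(p) for p in parts[3:]])
    return devices


def read_diskstats(path: str = "/proc/diskstats") -> Dict[str, DiskEntry]:
    """Take one snapshot of every listed device with a single open/read."""
    with open(path) as f:
        return parse_diskstats(f.read())
```

Calling ``read_diskstats()`` once per sampling interval snapshots every device for the cost of one open, whereas the sysfs route needs one ``/sys/block/<dev>/stat`` open per device.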

In 2.4, the statistics fields are those after the device name. In
the above example, the first field of statistics would be 446216.
By contrast, in 2.6+ if you look at ``/sys/block/hda/stat``, you'll
find just the statistics fields, beginning with 446216. If you look at
``/proc/diskstats``, the same fields will be preceded by the major and
minor device numbers, and device name. Each of these formats provides
the same fields of statistics, each meaning exactly the same thing:
11 fields in the 2.6+ examples above, 15 in the 4.18+ example, and 17
on kernels that also report flush statistics (fields 16 and 17 below).
All fields except field 9 are cumulative since boot. Field 9 should
go to zero as I/Os complete; all others only increase (unless they
overflow and wrap). Wrapping might eventually occur on a very busy
or long-lived system, so applications should be prepared to deal with
it. Regarding wrapping, the types of the fields are either unsigned
int (32 bit) or unsigned long (32-bit or 64-bit, depending on your
machine) as noted per-field below. Unless your observations are very
spread in time, these fields should not wrap twice before you notice it.
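
When computing the change in a counter between two snapshots, a wrap shows up as the new value being smaller than the old one. Assuming the counter wrapped at most once between snapshots, modular arithmetic recovers the true delta. The bit width must match the field's type on the running kernel; ``counter_delta`` and its ``bits`` parameter are illustrative assumptions, not an existing API.

```python
def counter_delta(prev: int, cur: int, bits: int = 64) -> int:
    """Return cur - prev for an unsigned counter that may have wrapped once.

    bits is the width of the kernel-side type: 32 for unsigned int, and
    32 or 64 for unsigned long depending on the machine. The caller must
    supply it; it cannot be read from the statistics files themselves.
    """
    modulus = 1 << bits
    # Python's % returns a non-negative result for a positive modulus,
    # so a wrapped counter (cur < prev) still yields the correct delta.
    return (cur - prev) % modulus
```

For example, ``counter_delta(4294967290, 5, bits=32)`` gives 11. If the counter could have wrapped twice, no arithmetic on two samples can detect it, which is why sampling reasonably often matters.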

Each set of stats only applies to the indicated device; if you want
system-wide stats you'll have to find all the devices and sum them all up,
taking care not to count a partition in addition to the disk containing it.
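
A sketch of that summation, assuming a snapshot that maps each device name to its list of statistics fields (for example, built from ``/proc/diskstats``) and a caller-supplied list of whole-disk names, such as the entries of ``/sys/block``, which lists whole disks rather than partitions. ``sum_disks`` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def sum_disks(stats: Dict[str, List[int]], disks: Iterable[str]) -> List[int]:
    """Sum the statistics fields of the given whole-disk devices.

    Only whole disks are summed: partition I/O is also counted in the
    disk that contains it, so adding partitions too would count that
    I/O twice. Field 9 (I/Os in progress) is an instantaneous value,
    so its sum is the total number of I/Os currently in flight.
    """
    totals: List[int] = []
    for name in disks:
        fields = stats[name]
        if len(fields) > len(totals):
            # Pad so that disks reporting more fields still fit.
            totals.extend([0] * (len(fields) - len(totals)))
        for i, value in enumerate(fields):
            totals[i] += value
    return totals
```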

Field 1 -- # of reads completed (unsigned long)
    This is the total number of reads completed successfully.

Field 2 -- # of reads merged, field 6 -- # of writes merged (unsigned long)
    Reads and writes which are adjacent to each other may be merged for
    efficiency. Thus two 4K reads may become one 8K read before it is
    ultimately handed to the disk, and so it will be counted (and queued)
    as only one I/O. This field lets you know how often this was done.

Field 3 -- # of sectors read (unsigned long)
    This is the total number of sectors read successfully.

Field 4 -- # of milliseconds spent reading (unsigned int)
    This is the total number of milliseconds spent by all reads (as
    measured from __make_request() to end_that_request_last()).

Field 5 -- # of writes completed (unsigned long)
    This is the total number of writes completed successfully.

Field 6 -- # of writes merged (unsigned long)
    See the description of field 2.

Field 7 -- # of sectors written (unsigned long)
    This is the total number of sectors written successfully.

Field 8 -- # of milliseconds spent writing (unsigned int)
    This is the total number of milliseconds spent by all writes (as
    measured from __make_request() to end_that_request_last()).

Field 9 -- # of I/Os currently in progress (unsigned int)
    The only field that should go to zero. Incremented as requests are
    given to the appropriate struct request_queue and decremented as they
    finish.

Field 10 -- # of milliseconds spent doing I/Os (unsigned int)
    This field increases so long as field 9 is nonzero.

    Since 5.0, this field counts jiffies during which at least one request
    was started or completed. If a request runs for more than 2 jiffies,
    some I/O time might not be accounted for in the case of concurrent
    requests.

Field 11 -- weighted # of milliseconds spent doing I/Os (unsigned int)
    This field is incremented at each I/O start, I/O completion, I/O
    merge, or read of these stats by the number of I/Os in progress
    (field 9) times the number of milliseconds spent doing I/O since the
    last update of this field. This can provide an easy measure of both
    I/O completion time and the backlog that may be accumulating.

Field 12 -- # of discards completed (unsigned long)
    This is the total number of discards completed successfully.

Field 13 -- # of discards merged (unsigned long)
    See the description of field 2.

Field 14 -- # of sectors discarded (unsigned long)
    This is the total number of sectors discarded successfully.

Field 15 -- # of milliseconds spent discarding (unsigned int)
    This is the total number of milliseconds spent by all discards (as
    measured from __make_request() to end_that_request_last()).

Field 16 -- # of flush requests completed
    This is the total number of flush requests completed successfully.

    The block layer combines flush requests and executes at most one at a
    time. This field counts flush requests executed by the disk. It is not
    tracked for partitions.

Field 17 -- # of milliseconds spent flushing
    This is the total number of milliseconds spent by all flush requests.
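
These fields combine naturally into the kind of figures that tools such as ``iostat`` report. A hedged sketch, assuming two snapshots of the same whole disk taken ``interval_ms`` milliseconds apart, counters that did not wrap in between, and 512-byte sectors (the unit the block layer uses for these counters regardless of the device's own block size). Field numbers are the 1-based numbers used above; the function and key names are hypothetical.

```python
from typing import Dict, List

SECTOR_BYTES = 512  # sector unit of the sector-count fields (3, 7, 14)


def derived_metrics(prev: List[int], cur: List[int],
                    interval_ms: float) -> Dict[str, float]:
    """Compute per-interval figures from two snapshots of one disk.

    prev and cur hold fields 1..11 (or more) in order, so field N is at
    index N - 1. Assumes no counter wrapped between the snapshots.
    """
    def delta(field: int) -> int:
        return cur[field - 1] - prev[field - 1]

    reads, writes = delta(1), delta(5)
    return {
        # Share of the interval with at least one I/O in flight (field 10).
        "utilization": delta(10) / interval_ms,
        # Field 11 grows by (I/Os in flight) x (elapsed ms), so its rate
        # over the interval is the average number of I/Os in flight.
        "avg_in_flight": delta(11) / interval_ms,
        # Average time per completed read/write (fields 4 and 8).
        "avg_read_ms": delta(4) / reads if reads else 0.0,
        "avg_write_ms": delta(8) / writes if writes else 0.0,
        "read_bytes": delta(3) * SECTOR_BYTES,
        "written_bytes": delta(7) * SECTOR_BYTES,
    }
```

Since field 10 is itself measured in milliseconds, ``utilization`` is a fraction between 0 and roughly 1; small overshoots are possible because the kernel's accounting and the sampling interval are not perfectly aligned.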

To avoid introducing performance bottlenecks, no locks are held while
modifying these counters. This implies that minor inaccuracies may be
introduced when changes collide, so (for instance) adding up all the
read I/Os issued per partition should equal those made to the disks ...
but due to the lack of locking it may only be very close.

In 2.6+, there are counters for each CPU, which makes the lack of locking
almost a non-issue. When the statistics are read, the per-CPU counters
are summed (possibly overflowing the unsigned long variable they are
summed to) and the result given to the user. There is no convenient
user interface for accessing the per-CPU counters themselves.
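
A toy Python model of the per-CPU idea, not the kernel's implementation: ``PerCpuCounter`` is a hypothetical name, and CPUs are only simulated by slot indices. Each writer touches only its own slot, and a reader sums the slots into a fixed-width total that wraps just as the unsigned long accumulator can.

```python
class PerCpuCounter:
    """Illustrative per-CPU counter: lock-free writes, summed reads."""

    def __init__(self, ncpus: int, bits: int = 64) -> None:
        self.slots = [0] * ncpus
        self.mask = (1 << bits) - 1  # emulate a fixed-width unsigned type

    def add(self, cpu: int, amount: int = 1) -> None:
        # Only this CPU updates this slot, so writers never contend.
        self.slots[cpu] = (self.slots[cpu] + amount) & self.mask

    def read(self) -> int:
        # Sum every CPU's slot; the total wraps like a fixed-width integer.
        return sum(self.slots) & self.mask
```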

Since 4.19, request times are measured with nanosecond precision and
truncated to milliseconds before being shown in this interface.

Disks vs Partitions
-------------------

There were significant changes between 2.4 and 2.6+ in the I/O subsystem.
As a result, some statistics disappeared. The translation from
a disk address relative to a partition to the disk address relative to
the host disk happens much earlier. All merges and timings now happen
at the disk level rather than at both the disk and partition level as
in 2.4. Consequently, you'll see different statistics output on 2.6+ for
partitions than for disks. There are only *four* fields available
for partitions on 2.6+ machines. This is reflected in the examples above.

Field 1 -- # of reads issued
    This is the total number of reads issued to this partition.

Field 2 -- # of sectors read
    This is the total number of sectors requested to be read from this
    partition.

Field 3 -- # of writes issued
    This is the total number of writes issued to this partition.

Field 4 -- # of sectors written
    This is the total number of sectors requested to be written to
    this partition.
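
Because these partitions report only four fields while disks report the full set, a tool reading both has to tell the layouts apart. A minimal sketch, assuming the field count alone distinguishes them (exactly 4 for this partition layout, 11 or more for the full layout); ``name_fields`` and the key names are hypothetical.

```python
from typing import Dict, List

# Names follow the field lists in this document, in order.
PARTITION_FIELDS = ["reads_issued", "sectors_read",
                    "writes_issued", "sectors_written"]
FULL_FIELDS = ["reads_completed", "reads_merged", "sectors_read",
               "read_ms", "writes_completed", "writes_merged",
               "sectors_written", "write_ms", "ios_in_progress",
               "io_ms", "weighted_io_ms", "discards_completed",
               "discards_merged", "sectors_discarded", "discard_ms",
               "flushes_completed", "flush_ms"]


def name_fields(values: List[int]) -> Dict[str, int]:
    """Attach names to one device's statistics fields."""
    if len(values) == 4:
        # Four-field partition layout.
        return dict(zip(PARTITION_FIELDS, values))
    if len(values) >= 11:
        # Full layout: zip stops at the shorter list, so 11, 15 or 17
        # fields all work and any unknown extra fields are dropped.
        return dict(zip(FULL_FIELDS, values))
    raise ValueError(f"unexpected number of fields: {len(values)}")
```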

Note that since the address is translated to a disk-relative one, and no
record of the partition-relative address is kept, the subsequent success
or failure of the read cannot be attributed to the partition. In other
words, the number of reads for partitions is counted slightly before the
time of queuing, and at completion for whole disks. This is a subtle
distinction that is probably uninteresting for most cases.

More significant is the error induced by counting the numbers of
reads/writes before merges for partitions and after for disks. Since a
typical workload usually contains a lot of successive and adjacent requests,
the number of reads/writes issued can be several times higher than the
number of reads/writes completed.

In 2.6.25, the full statistic set is again available for partitions, and
disk and partition statistics are consistent again. Since we still don't
keep a record of the partition-relative address, an operation is attributed
to the partition which contains the first sector of the request after any
merges. As requests can be merged across partitions, this could lead to
some (probably insignificant) inaccuracy.
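
The attribution rule can be illustrated with a small lookup. This is only a sketch of the rule as described, not the kernel's code; the partition table is a hypothetical list of (name, first sector, number of sectors) tuples relative to the whole disk, and ``attribute_request`` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# (name, first sector, number of sectors), relative to the whole disk.
Partition = Tuple[str, int, int]


def attribute_request(first_sector: int,
                      partitions: List[Partition]) -> Optional[str]:
    """Return the partition a (possibly merged) request is charged to.

    Only the request's first sector is considered, so a merged request
    that runs past the end of that partition is still charged to it in
    full; this is the small inaccuracy described above.
    """
    for name, start, size in partitions:
        if start <= first_sector < start + size:
            return name
    return None  # the sector lies outside every partition
```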

Additional notes
----------------

In 2.6+, sysfs is not mounted by default. If your distribution of
Linux hasn't added it already, here's the line you'll want to add to
your ``/etc/fstab``::

   none /sys sysfs defaults 0 0

In 2.6+, all disk statistics were removed from ``/proc/stat``. In 2.4, they
appear in both ``/proc/partitions`` and ``/proc/stat``, although the ones in
``/proc/stat`` take a very different format from those in ``/proc/partitions``
(see proc(5), if your system has it.)

-- ricklind@us.ibm.com