block: make iolatency avg_lat exponentially decay
Currently, avg_lat is calculated by accumulating the mean of every
window in a long running cumulative average. As time goes on, the metric
becomes less and less useful due to the accumulated history.
This patch reuses the same calculation done in load averages to make the
avg_lat metric more lively. Unlike load averages, the avg only advances
when a window elapses (due to an io). Idle periods extend the most
recent window. Bucketing is used to limit the history of avg_lat by
binding it to the window size. So, the window range for 1/exp (decay
rate) is [1 min, 2.5 min) when windows elapse immediately.
The current sample window size is exposed in the debug info to enable
calculation of the window range.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 3afe10f..1746131 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1474,11 +1474,9 @@
Generally you do not want to set a value lower than the latency your device
supports. Experiment to find the value that works best for your workload.
Start at higher than the expected latency for your device and watch the
-total_lat_avg value in io.stat for your workload group to get an idea of the
-latency you see during normal operation. Use this value as a basis for your
-real setting, setting at 10-15% higher than the value in io.stat.
-Experimentation is key here because total_lat_avg is a running total, so is the
-"statistics" portion of "lies, damned lies, and statistics."
+avg_lat value in io.stat for your workload group to get an idea of the
+latency you see during normal operation. Use the avg_lat value as a basis for
+your real setting, setting at 10-15% higher than the value in io.stat.
How IO Latency Throttling Works
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1522,10 +1520,15 @@
This is the current queue depth for the group.
avg_lat
- The running average IO latency for this group in microseconds.
- Running average is generally flawed, but will give an
- administrator a general idea of the overall latency they can
- expect for their workload on the given disk.
+ This is an exponential moving average with a decay rate of 1/exp
+ bound by the sampling interval. The decay rate interval can be
+ calculated by multiplying the win value in io.stat by the
+ corresponding number of samples based on the win value.
+
+ win
+ The sampling window size in milliseconds. This is the minimum
+ duration of time between evaluation events. Windows only elapse
+ with IO activity. Idle periods extend the most recent window.
PID
---