Daniel Bristot de Oliveira | a955d7e | 2021-06-22 16:42:28 +0200 | [diff] [blame] | 1 | ############### |
| 2 | Timerlat tracer |
| 3 | ############### |
| 4 | |
| 5 | The timerlat tracer aims to help the preemptive kernel developers to |
Daniel Bristot de Oliveira | 4d4eac7 | 2021-10-15 17:07:49 +0200 | [diff] [blame] | 6 | find sources of wakeup latencies of real-time threads. Like cyclictest, |
Daniel Bristot de Oliveira | a955d7e | 2021-06-22 16:42:28 +0200 | [diff] [blame] | 7 | the tracer sets a periodic timer that wakes up a thread. The thread then |
| 8 | computes a *wakeup latency* value as the difference between the *current |
| 9 | time* and the *absolute time* that the timer was set to expire. The main |
| 10 | goal of timerlat is tracing in such a way to help kernel developers. |
| 11 | |
| 12 | Usage |
| 13 | ----- |
| 14 | |
| 15 | Write the ASCII text "timerlat" into the current_tracer file of the |
| 16 | tracing system (generally mounted at /sys/kernel/tracing). |
| 17 | |
| 18 | For example:: |
| 19 | |
| 20 | [root@f32 ~]# cd /sys/kernel/tracing/ |
| 21 | [root@f32 tracing]# echo timerlat > current_tracer |
| 22 | |
| 23 | It is possible to follow the trace by reading the trace trace file:: |
| 24 | |
| 25 | [root@f32 tracing]# cat trace |
| 26 | # tracer: timerlat |
| 27 | # |
| 28 | # _-----=> irqs-off |
| 29 | # / _----=> need-resched |
| 30 | # | / _---=> hardirq/softirq |
| 31 | # || / _--=> preempt-depth |
| 32 | # || / |
| 33 | # |||| ACTIVATION |
| 34 | # TASK-PID CPU# |||| TIMESTAMP ID CONTEXT LATENCY |
| 35 | # | | | |||| | | | | |
| 36 | <idle>-0 [000] d.h1 54.029328: #1 context irq timer_latency 932 ns |
| 37 | <...>-867 [000] .... 54.029339: #1 context thread timer_latency 11700 ns |
| 38 | <idle>-0 [001] dNh1 54.029346: #1 context irq timer_latency 2833 ns |
| 39 | <...>-868 [001] .... 54.029353: #1 context thread timer_latency 9820 ns |
| 40 | <idle>-0 [000] d.h1 54.030328: #2 context irq timer_latency 769 ns |
| 41 | <...>-867 [000] .... 54.030330: #2 context thread timer_latency 3070 ns |
| 42 | <idle>-0 [001] d.h1 54.030344: #2 context irq timer_latency 935 ns |
| 43 | <...>-868 [001] .... 54.030347: #2 context thread timer_latency 4351 ns |
| 44 | |
| 45 | |
| 46 | The tracer creates a per-cpu kernel thread with real-time priority that |
| 47 | prints two lines at every activation. The first is the *timer latency* |
| 48 | observed at the *hardirq* context before the activation of the thread. |
| 49 | The second is the *timer latency* observed by the thread. The ACTIVATION |
| 50 | ID field serves to relate the *irq* execution to its respective *thread* |
| 51 | execution. |
| 52 | |
Daniel Bristot de Oliveira | 4d4eac7 | 2021-10-15 17:07:49 +0200 | [diff] [blame] | 53 | The *irq*/*thread* splitting is important to clarify in which context |
Daniel Bristot de Oliveira | a955d7e | 2021-06-22 16:42:28 +0200 | [diff] [blame] | 54 | the unexpected high value is coming from. The *irq* context can be |
Daniel Bristot de Oliveira | 4d4eac7 | 2021-10-15 17:07:49 +0200 | [diff] [blame] | 55 | delayed by hardware-related actions, such as SMIs, NMIs, IRQs, |
| 56 | or by thread masking interrupts. Once the timer happens, the delay |
Daniel Bristot de Oliveira | a955d7e | 2021-06-22 16:42:28 +0200 | [diff] [blame] | 57 | can also be influenced by blocking caused by threads. For example, by |
Daniel Bristot de Oliveira | 4d4eac7 | 2021-10-15 17:07:49 +0200 | [diff] [blame] | 58 | postponing the scheduler execution via preempt_disable(), scheduler |
| 59 | execution, or masking interrupts. Threads can also be delayed by the |
| 60 | interference from other threads and IRQs. |
Daniel Bristot de Oliveira | a955d7e | 2021-06-22 16:42:28 +0200 | [diff] [blame] | 61 | |
| 62 | Tracer options |
| 63 | --------------------- |
| 64 | |
| 65 | The timerlat tracer is built on top of osnoise tracer. |
| 66 | So its configuration is also done in the osnoise/ config |
| 67 | directory. The timerlat configs are: |
| 68 | |
| 69 | - cpus: CPUs at which a timerlat thread will execute. |
| 70 | - timerlat_period_us: the period of the timerlat thread. |
Daniel Bristot de Oliveira | 4d4eac7 | 2021-10-15 17:07:49 +0200 | [diff] [blame] | 71 | - stop_tracing_us: stop the system tracing if a |
Daniel Bristot de Oliveira | a955d7e | 2021-06-22 16:42:28 +0200 | [diff] [blame] | 72 | timer latency at the *irq* context higher than the configured |
| 73 | value happens. Writing 0 disables this option. |
| 74 | - stop_tracing_total_us: stop the system tracing if a |
Daniel Bristot de Oliveira | 4d4eac7 | 2021-10-15 17:07:49 +0200 | [diff] [blame] | 75 | timer latency at the *thread* context is higher than the configured |
Daniel Bristot de Oliveira | a955d7e | 2021-06-22 16:42:28 +0200 | [diff] [blame] | 76 | value happens. Writing 0 disables this option. |
Daniel Bristot de Oliveira | 4d4eac7 | 2021-10-15 17:07:49 +0200 | [diff] [blame] | 77 | - print_stack: save the stack of the IRQ occurrence, and print |
| 78 | it after the *thread context* event". |
Daniel Bristot de Oliveira | a955d7e | 2021-06-22 16:42:28 +0200 | [diff] [blame] | 79 | |
| 80 | timerlat and osnoise |
| 81 | ---------------------------- |
| 82 | |
| 83 | The timerlat can also take advantage of the osnoise: traceevents. |
| 84 | For example:: |
| 85 | |
| 86 | [root@f32 ~]# cd /sys/kernel/tracing/ |
| 87 | [root@f32 tracing]# echo timerlat > current_tracer |
| 88 | [root@f32 tracing]# echo 1 > events/osnoise/enable |
| 89 | [root@f32 tracing]# echo 25 > osnoise/stop_tracing_total_us |
| 90 | [root@f32 tracing]# tail -10 trace |
| 91 | cc1-87882 [005] d..h... 548.771078: #402268 context irq timer_latency 13585 ns |
| 92 | cc1-87882 [005] dNLh1.. 548.771082: irq_noise: local_timer:236 start 548.771077442 duration 7597 ns |
| 93 | cc1-87882 [005] dNLh2.. 548.771099: irq_noise: qxl:21 start 548.771085017 duration 7139 ns |
| 94 | cc1-87882 [005] d...3.. 548.771102: thread_noise: cc1:87882 start 548.771078243 duration 9909 ns |
| 95 | timerlat/5-1035 [005] ....... 548.771104: #402268 context thread timer_latency 39960 ns |
| 96 | |
| 97 | In this case, the root cause of the timer latency does not point to a |
Daniel Bristot de Oliveira | 4d4eac7 | 2021-10-15 17:07:49 +0200 | [diff] [blame] | 98 | single cause but to multiple ones. Firstly, the timer IRQ was delayed |
Daniel Bristot de Oliveira | a955d7e | 2021-06-22 16:42:28 +0200 | [diff] [blame] | 99 | for 13 us, which may point to a long IRQ disabled section (see IRQ |
| 100 | stacktrace section). Then the timer interrupt that wakes up the timerlat |
| 101 | thread took 7597 ns, and the qxl:21 device IRQ took 7139 ns. Finally, |
| 102 | the cc1 thread noise took 9909 ns of time before the context switch. |
| 103 | Such pieces of evidence are useful for the developer to use other |
| 104 | tracing methods to figure out how to debug and optimize the system. |
| 105 | |
| 106 | It is worth mentioning that the *duration* values reported |
| 107 | by the osnoise: events are *net* values. For example, the |
| 108 | thread_noise does not include the duration of the overhead caused |
| 109 | by the IRQ execution (which indeed accounted for 12736 ns). But |
| 110 | the values reported by the timerlat tracer (timerlat_latency) |
| 111 | are *gross* values. |
| 112 | |
| 113 | The art below illustrates a CPU timeline and how the timerlat tracer |
| 114 | observes it at the top and the osnoise: events at the bottom. Each "-" |
| 115 | in the timelines means circa 1 us, and the time moves ==>:: |
| 116 | |
| 117 | External timer irq thread |
| 118 | clock latency latency |
| 119 | event 13585 ns 39960 ns |
| 120 | | ^ ^ |
| 121 | v | | |
| 122 | |-------------| | |
| 123 | |-------------+-------------------------| |
| 124 | ^ ^ |
| 125 | ======================================================================== |
| 126 | [tmr irq] [dev irq] |
| 127 | [another thread...^ v..^ v.......][timerlat/ thread] <-- CPU timeline |
| 128 | ========================================================================= |
| 129 | |-------| |-------| |
| 130 | |--^ v-------| |
| 131 | | | | |
| 132 | | | + thread_noise: 9909 ns |
| 133 | | +-> irq_noise: 6139 ns |
| 134 | +-> irq_noise: 7597 ns |
| 135 | |
| 136 | IRQ stacktrace |
| 137 | --------------------------- |
| 138 | |
| 139 | The osnoise/print_stack option is helpful for the cases in which a thread |
| 140 | noise causes the major factor for the timer latency, because of preempt or |
| 141 | irq disabled. For example:: |
| 142 | |
| 143 | [root@f32 tracing]# echo 500 > osnoise/stop_tracing_total_us |
| 144 | [root@f32 tracing]# echo 500 > osnoise/print_stack |
| 145 | [root@f32 tracing]# echo timerlat > current_tracer |
| 146 | [root@f32 tracing]# tail -21 per_cpu/cpu7/trace |
| 147 | insmod-1026 [007] dN.h1.. 200.201948: irq_noise: local_timer:236 start 200.201939376 duration 7872 ns |
| 148 | insmod-1026 [007] d..h1.. 200.202587: #29800 context irq timer_latency 1616 ns |
| 149 | insmod-1026 [007] dN.h2.. 200.202598: irq_noise: local_timer:236 start 200.202586162 duration 11855 ns |
| 150 | insmod-1026 [007] dN.h3.. 200.202947: irq_noise: local_timer:236 start 200.202939174 duration 7318 ns |
| 151 | insmod-1026 [007] d...3.. 200.203444: thread_noise: insmod:1026 start 200.202586933 duration 838681 ns |
| 152 | timerlat/7-1001 [007] ....... 200.203445: #29800 context thread timer_latency 859978 ns |
| 153 | timerlat/7-1001 [007] ....1.. 200.203446: <stack trace> |
| 154 | => timerlat_irq |
| 155 | => __hrtimer_run_queues |
| 156 | => hrtimer_interrupt |
| 157 | => __sysvec_apic_timer_interrupt |
| 158 | => asm_call_irq_on_stack |
| 159 | => sysvec_apic_timer_interrupt |
| 160 | => asm_sysvec_apic_timer_interrupt |
| 161 | => delay_tsc |
| 162 | => dummy_load_1ms_pd_init |
| 163 | => do_one_initcall |
| 164 | => do_init_module |
| 165 | => __do_sys_finit_module |
| 166 | => do_syscall_64 |
| 167 | => entry_SYSCALL_64_after_hwframe |
| 168 | |
| 169 | In this case, it is possible to see that the thread added the highest |
| 170 | contribution to the *timer latency* and the stack trace, saved during |
| 171 | the timerlat IRQ handler, points to a function named |
| 172 | dummy_load_1ms_pd_init, which had the following code (on purpose):: |
| 173 | |
| 174 | static int __init dummy_load_1ms_pd_init(void) |
| 175 | { |
| 176 | preempt_disable(); |
| 177 | mdelay(1); |
| 178 | preempt_enable(); |
| 179 | return 0; |
| 180 | |
| 181 | } |