=========================================================
Notes on Analysing Behaviour Using Events and Tracepoints
=========================================================
:Author: Mel Gorman (PCL information heavily based on email from Ingo Molnar)

1. Introduction
===============

Tracepoints (see Documentation/trace/tracepoints.rst) can be used without
creating custom kernel modules to register probe functions using the event
tracing infrastructure.

Simplistically, tracepoints represent important events that can be
taken in conjunction with other tracepoints to build a "Big Picture" of
what is going on within the system. There are a large number of methods for
gathering and interpreting these events. Lacking any current Best Practises,
this document describes some of the methods that can be used.

This document assumes that debugfs is mounted on /sys/kernel/debug and that
the appropriate tracing options have been configured into the kernel. It is
assumed that the PCL tool tools/perf has been installed and is in your path.

2. Listing Available Events
===========================

2.1 Standard Utilities
----------------------

All possible events are visible from /sys/kernel/debug/tracing/events. Simply
calling::

  $ find /sys/kernel/debug/tracing/events -type d

will give a fair indication of the number of events available.
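
As a rough sketch, the same directory tree can be summarised per subsystem.
The helper name ``count_events_by_subsystem`` is made up for illustration, and
the code assumes that each event is a directory holding an ``enable`` file at
``<events>/<subsystem>/<event>/enable``, which should be checked against the
running kernel. It also relies on the ``-mindepth`` and ``-maxdepth``
extensions of find, which are not part of POSIX.

```shell
#!/bin/sh
# Sketch only: count events per subsystem under a tracing events directory.
# Assumption: each event lives at <events>/<subsystem>/<event>/enable.
# The function name is illustrative, not part of any existing tool.
count_events_by_subsystem() {
    events_dir=${1:-/sys/kernel/debug/tracing/events}
    # Depth 3 skips the top-level and per-subsystem "enable" files.
    find "$events_dir" -mindepth 3 -maxdepth 3 -name enable 2>/dev/null |
        awk -F/ '{ count[$(NF-2)]++ }
                 END { for (s in count) printf "%-20s %d\n", s, count[s] }' |
        sort
}
```

Calling ``count_events_by_subsystem`` with no argument reads the default
debugfs location; a different events directory can be passed as the first
argument.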

2.2 PCL (Performance Counters for Linux)
----------------------------------------

Discovery and enumeration of all counters and events, including tracepoints,
are available with the perf tool. Getting a list of available events is a
simple case of::

  $ perf list 2>&1 | grep Tracepoint
  ext4:ext4_free_inode                     [Tracepoint event]
  ext4:ext4_request_inode                  [Tracepoint event]
  ext4:ext4_allocate_inode                 [Tracepoint event]
  ext4:ext4_write_begin                    [Tracepoint event]
  ext4:ext4_ordered_write_end              [Tracepoint event]
  [ .... remaining output snipped .... ]


3. Enabling Events
==================

3.1 System-Wide Event Enabling
------------------------------

See Documentation/trace/events.rst for a proper description of how events
can be enabled system-wide. A short example of enabling all events related
to page allocation would look something like::

  $ for i in `find /sys/kernel/debug/tracing/events -name "enable" | grep mm_`; do echo 1 > $i; done
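
The loop above can be wrapped in a small helper so that the same events can
be switched off again afterwards. This is only a sketch: the function name
``set_events`` and its argument order are invented here, and like the
one-liner it matches the pattern against the whole path of each ``enable``
file. Writing to these files normally requires a root shell.

```shell
#!/bin/sh
# Sketch only: write 0 or 1 to every per-event "enable" file whose path
# contains a pattern. Function name and argument order are illustrative.
set_events() {
    value=$1      # 1 to enable, 0 to disable
    pattern=$2    # substring to match against the path, e.g. mm_
    events_dir=${3:-/sys/kernel/debug/tracing/events}
    find "$events_dir" -name enable | grep -- "$pattern" |
    while IFS= read -r f; do
        # Report, but do not abort on, files that cannot be written.
        echo "$value" > "$f" || echo "could not write $f" >&2
    done
}
```

For example, ``set_events 1 mm_`` would enable the page allocation events
and ``set_events 0 mm_`` would disable them again once the measurement is
finished.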

3.2 System-Wide Event Enabling with SystemTap
---------------------------------------------

In SystemTap, tracepoints are accessible using the kernel.trace() function
call. The following is an example that reports every 5 seconds which processes
were allocating pages.
::

  global page_allocs

  probe kernel.trace("mm_page_alloc") {
      page_allocs[execname()]++
  }

  function print_count() {
      printf ("%-25s %-s\n", "#Pages Allocated", "Process Name")
      foreach (proc in page_allocs-)
          printf("%-25d %s\n", page_allocs[proc], proc)
      printf ("\n")
      delete page_allocs
  }

  probe timer.s(5) {
      print_count()
  }

3.3 System-Wide Event Enabling with PCL
---------------------------------------

By specifying the -a switch and analysing sleep, the system-wide events
for a duration of time can be examined.
::

  $ perf stat -a \
      -e kmem:mm_page_alloc -e kmem:mm_page_free \
      -e kmem:mm_page_free_batched \
      sleep 10
  Performance counter stats for 'sleep 10':

             9630  kmem:mm_page_alloc
             2143  kmem:mm_page_free
             7424  kmem:mm_page_free_batched

     10.002577764  seconds time elapsed

Similarly, one could execute a shell and exit it as desired to get a report
at that point.

3.4 Local Event Enabling
------------------------

Documentation/trace/ftrace.rst describes how to enable events on a per-thread
basis using set_ftrace_pid.

3.5 Local Event Enablement with PCL
-----------------------------------

Events can be activated and tracked for the duration of a process on a local
basis using PCL as follows.
::

  $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free \
      -e kmem:mm_page_free_batched ./hackbench 10
  Time: 0.909

  Performance counter stats for './hackbench 10':

            17803  kmem:mm_page_alloc
            12398  kmem:mm_page_free
             4827  kmem:mm_page_free_batched

      0.973913387  seconds time elapsed

4. Event Filtering
==================

Documentation/trace/ftrace.rst covers in depth how to filter events in
ftrace. Using grep and awk on trace_pipe is also an option, as is any
script that reads trace_pipe.
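
A minimal sketch of the awk approach follows. It counts how often one event
fires per task from trace_pipe-style text on stdin. The function name
``count_event_by_task`` is invented here, and the code assumes the default
ftrace text layout in which the first whitespace-separated field is
``comm-pid`` and the event name appears as a later field ending in a colon.
Task names that contain spaces would break that assumption.

```shell
#!/bin/sh
# Sketch only: count occurrences of one event per task.
# Assumptions: field 1 is "comm-pid" and the event name shows up as a
# later field of the form "<event>:". The function name is illustrative.
count_event_by_task() {
    event=$1
    awk -v ev="$event:" '
        { for (i = 2; i <= NF; i++) if ($i == ev) { n[$1]++; break } }
        END { for (t in n) printf "%d %s\n", n[t], t }' | sort -rn
}
```

Because the totals are printed only when the input ends, interrupting a
direct read of trace_pipe would lose them. One way around that is to bound
the read, for example ``timeout 10 cat /sys/kernel/debug/tracing/trace_pipe |
count_event_by_task mm_page_alloc``, or to run the helper on a saved copy of
the trace.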

5. Analysing Event Variances with PCL
=====================================

Any workload can exhibit variances between runs and it can be important
to know what the standard deviation is. By and large, this is left to the
performance analyst to do by hand. In the event that the discrete event
occurrences are useful to the performance analyst, then perf can be used.
::

  $ perf stat --repeat 5 -e kmem:mm_page_alloc -e kmem:mm_page_free \
      -e kmem:mm_page_free_batched ./hackbench 10
  Time: 0.890
  Time: 0.895
  Time: 0.915
  Time: 1.001
  Time: 0.899

  Performance counter stats for './hackbench 10' (5 runs):

            16630  kmem:mm_page_alloc         ( +-   3.542% )
            11486  kmem:mm_page_free          ( +-   4.771% )
             4730  kmem:mm_page_free_batched  ( +-   2.325% )

      0.982653002  seconds time elapsed   ( +-   1.448% )

In the event that some higher-level event is required that depends on some
aggregation of discrete events, then a script would need to be developed.
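
As an illustration of doing the variance calculation by hand, the ``Time:``
values printed by hackbench in the run above could be summarised with a small
awk helper. This is a sketch, not part of perf: the function name
``mean_stddev`` is made up, and it reports the sample standard deviation
(dividing by n - 1), which is a choice made here rather than a statement
about how perf computes its own +- figures.

```shell
#!/bin/sh
# Sketch only: mean and sample standard deviation of one number per line.
# The function name is illustrative.
mean_stddev() {
    awk '{ x[NR] = $1; sum += $1 }
         END {
             if (NR == 0) exit 1
             mean = sum / NR
             # Second pass over the stored values avoids cancellation.
             for (i = 1; i <= NR; i++) ss += (x[i] - mean) ^ 2
             sd = (NR > 1) ? sqrt(ss / (NR - 1)) : 0
             printf "mean=%.3f stddev=%.3f n=%d\n", mean, sd, NR
         }'
}

# The five hackbench times from the example above.
printf '%s\n' 0.890 0.895 0.915 1.001 0.899 | mean_stddev
# prints: mean=0.920 stddev=0.046 n=5
```

The same helper could be fed any per-run figure extracted from saved output,
for example with ``awk '/^Time:/ { print $2 }' saved.log | mean_stddev``,
where ``saved.log`` is a hypothetical file name.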

Using --repeat, it is also possible to view how events are fluctuating over
time on a system-wide basis using -a and sleep.
::

  $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free \
      -e kmem:mm_page_free_batched \
      -a --repeat 10 \
      sleep 1
  Performance counter stats for 'sleep 1' (10 runs):

             1066  kmem:mm_page_alloc         ( +-  26.148% )
              182  kmem:mm_page_free          ( +-   5.464% )
              890  kmem:mm_page_free_batched  ( +-  30.079% )

      1.002251757  seconds time elapsed   ( +-   0.005% )

6. Higher-Level Analysis with Helper Scripts
============================================

When events are enabled, the events that are triggering can be read from
/sys/kernel/debug/tracing/trace_pipe in human-readable format, although binary
options exist as well. By post-processing the output, further information can
be gathered on-line as appropriate. Examples of post-processing might include:

- Reading information from /proc for the PID that triggered the event
- Deriving a higher-level event from a series of lower-level events
- Calculating latencies between two events
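
The last item in the list above can be sketched with awk. The function name
``latency_between`` is invented for this example, and the code assumes the
default trace_pipe text layout: the first field is ``comm-pid`` and the
timestamp is the field immediately before the event name, written as
``seconds.microseconds:``. A start event is paired with the next end event
seen for the same task, which is a simplification.

```shell
#!/bin/sh
# Sketch only: print the time between a start event and the next end event
# for the same task. Assumptions: field 1 is "comm-pid" and the timestamp
# is the field right before "<event>:". The function name is illustrative.
latency_between() {
    start=$1
    end=$2
    awk -v s="$start:" -v e="$end:" '
        {
            for (i = 2; i <= NF; i++) {
                if ($i != s && $i != e) continue
                ts = $(i - 1) + 0      # "123.456789:" becomes 123.456789
                if ($i == s) {
                    begin[$1] = ts
                } else if ($1 in begin) {
                    printf "%s %.6f\n", $1, ts - begin[$1]
                    delete begin[$1]
                }
                break
            }
        }'
}
```

With the raw_syscalls events enabled, for example, something like
``latency_between sys_enter sys_exit < saved-trace.txt`` would print one line
per completed pair, where ``saved-trace.txt`` is a hypothetical copy of the
trace output.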

Documentation/trace/postprocess/trace-pagealloc-postprocess.pl is an example
script that can read trace_pipe from STDIN or a copy of a trace. When used
on-line, it can be interrupted once to generate a report without exiting
and twice to exit.

Simplistically, the script just reads STDIN and counts up events but it
can also do more, such as:

- Derive high-level events from many low-level events. If a number of pages
  are freed to the main allocator from the per-CPU lists, it recognises
  that as one per-CPU drain even though there is no specific tracepoint
  for that event
- It can aggregate based on process name or individual PID
- In the event memory is getting externally fragmented, it reports
  on whether the fragmentation event was severe or moderate.
- When receiving an event about a PID, it can record who the parent was so
  that if large numbers of events are coming from very short-lived
  processes, the parent process responsible for creating all the helpers
  can be identified

7. Lower-Level Analysis with PCL
================================

There may also be a requirement to identify what functions within a program
were generating events within the kernel. To begin this sort of analysis, the
data must be recorded. At the time of writing, this required root:
::

  $ perf record -c 1 \
      -e kmem:mm_page_alloc -e kmem:mm_page_free \
      -e kmem:mm_page_free_batched \
      ./hackbench 10
  Time: 0.894
  [ perf record: Captured and wrote 0.733 MB perf.data (~32010 samples) ]

Note the use of '-c 1' to set the sample period to one, so that every event
is recorded. The default sample period is quite high to minimise overhead but
the information collected can be very coarse as a result.

This produced a file called perf.data which can be analysed using
perf report.
::

  $ perf report
  # Samples: 30922
  #
  # Overhead    Command                     Shared Object
  # ........  .........  ................................
  #
      87.27%  hackbench  [vdso]
       6.85%  hackbench  /lib/i686/cmov/libc-2.9.so
       2.62%  hackbench  /lib/ld-2.9.so
       1.52%       perf  [vdso]
       1.22%  hackbench  ./hackbench
       0.48%  hackbench  [kernel]
       0.02%       perf  /lib/i686/cmov/libc-2.9.so
       0.01%       perf  /usr/bin/perf
       0.01%       perf  /lib/ld-2.9.so
       0.00%  hackbench  /lib/i686/cmov/libpthread-2.9.so
  #
  # (For more details, try: perf report --sort comm,dso,symbol)
  #

According to this, the vast majority of events were triggered within the
VDSO. With simple binaries, this will often be the case so let's take a
slightly different example. In the course of writing this, it was noticed
that X was generating an insane number of page allocations so let's look
at it:
::

  $ perf record -c 1 -f \
      -e kmem:mm_page_alloc -e kmem:mm_page_free \
      -e kmem:mm_page_free_batched \
      -p `pidof X`

This was interrupted after a few seconds and
::

  $ perf report
  # Samples: 27666
  #
  # Overhead  Command                            Shared Object
  # ........  .......  .......................................
  #
      51.95%     Xorg  [vdso]
      47.95%     Xorg  /opt/gfx-test/lib/libpixman-1.so.0.13.1
       0.09%     Xorg  /lib/i686/cmov/libc-2.9.so
       0.01%     Xorg  [kernel]
  #
  # (For more details, try: perf report --sort comm,dso,symbol)
  #

So, almost half of the events are occurring in a library. To get an idea of
which symbol:
::

  $ perf report --sort comm,dso,symbol
  # Samples: 27666
  #
  # Overhead  Command                            Shared Object  Symbol
  # ........  .......  .......................................  ......
  #
      51.95%     Xorg  [vdso]                                   [.] 0x000000ffffe424
      47.93%     Xorg  /opt/gfx-test/lib/libpixman-1.so.0.13.1  [.] pixmanFillsse2
       0.09%     Xorg  /lib/i686/cmov/libc-2.9.so               [.] _int_malloc
       0.01%     Xorg  /opt/gfx-test/lib/libpixman-1.so.0.13.1  [.] pixman_region32_copy_f
       0.01%     Xorg  [kernel]                                 [k] read_hpet
       0.01%     Xorg  /opt/gfx-test/lib/libpixman-1.so.0.13.1  [.] get_fast_path
       0.00%     Xorg  [kernel]                                 [k] ftrace_trace_userstack

To see where within the function pixmanFillsse2 things are going wrong:
::

  $ perf annotate pixmanFillsse2
  [ ... ]
    0.00 :         34eeb:       0f 18 08                prefetcht0 (%eax)
         :      }
         :
         :      extern __inline void __attribute__((__gnu_inline__, __always_inline__, _
         :      _mm_store_si128 (__m128i *__P, __m128i __B) :      {
         :        *__P = __B;
   12.40 :         34eee:       66 0f 7f 80 40 ff ff    movdqa %xmm0,-0xc0(%eax)
    0.00 :         34ef5:       ff
   12.40 :         34ef6:       66 0f 7f 80 50 ff ff    movdqa %xmm0,-0xb0(%eax)
    0.00 :         34efd:       ff
   12.39 :         34efe:       66 0f 7f 80 60 ff ff    movdqa %xmm0,-0xa0(%eax)
    0.00 :         34f05:       ff
   12.67 :         34f06:       66 0f 7f 80 70 ff ff    movdqa %xmm0,-0x90(%eax)
    0.00 :         34f0d:       ff
   12.58 :         34f0e:       66 0f 7f 40 80          movdqa %xmm0,-0x80(%eax)
   12.31 :         34f13:       66 0f 7f 40 90          movdqa %xmm0,-0x70(%eax)
   12.40 :         34f18:       66 0f 7f 40 a0          movdqa %xmm0,-0x60(%eax)
   12.31 :         34f1d:       66 0f 7f 40 b0          movdqa %xmm0,-0x50(%eax)

At a glance, it looks like the time is being spent copying pixmaps to
the card. Further investigation would be needed to determine why pixmaps
are being copied around so much but a starting point would be to take an
ancient build of libpixman out of the library path where it was totally
forgotten about from months ago!