perf-stat(1)
============

NAME
----
perf-stat - Run a command and gather performance counter statistics

SYNOPSIS
--------
[verse]
'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command>
'perf stat' [-e <EVENT> | --event=EVENT] [-a] \-- <command> [<options>]
'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] \-- <command> [<options>]
'perf stat' report [-i file]

DESCRIPTION
-----------
This command runs a command and gathers performance counter statistics
from it.


OPTIONS
-------
<command>...::
	Any command you can specify in a shell.

record::
	See STAT RECORD.

report::
	See STAT REPORT.

-e::
--event=::
	Select the PMU event. Selection can be:

	- a symbolic event name (use 'perf list' to list all events)

	- a raw PMU event in the form of rN where N is a hexadecimal value
	  that represents the raw register encoding with the layout of the
	  event control registers as described by entries in
	  /sys/bus/event_source/devices/cpu/format/*.

	- a symbolic or raw PMU event followed by an optional colon
	  and a list of event modifiers, e.g., cpu-cycles:p. See the
	  linkperf:perf-list[1] man page for details on event modifiers.

	- a symbolically formed event like 'pmu/param1=0x3,param2/' where
	  param1 and param2 are defined as formats for the PMU in
	  /sys/bus/event_source/devices/<pmu>/format/*

	  'percore' is an event qualifier that sums up the event counts for all
	  hardware threads in a core. For example:
	    perf stat -A -a -e cpu/event,percore=1/,otherevent ...

	- a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
	  where M, N, K are numbers (in decimal, hexadecimal, or octal format).
	  Acceptable values for each of 'config', 'config1' and 'config2'
	  parameters are defined by corresponding entries in
	  /sys/bus/event_source/devices/<pmu>/format/*
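
To illustrate how the format entries describe the rN layout, the sketch below packs field values into a raw event code. It assumes each format entry is a single contiguous bit range on 'config' (such as `config:0-7`) or a single bit (such as `config:18`). Real entries can be split across several ranges (e.g. `config:0-7,32-35`), and this sketch rejects those. The helper `pack_field`, the specs `config:0-7`/`config:8-15` and the field values are hypothetical examples. Read the real specs from /sys/bus/event_source/devices/cpu/format/* on the target machine.

```bash
#!/bin/bash
# Hypothetical helper: place VALUE into the bit range named by a sysfs
# format spec (e.g. "config:8-15") and print the shifted value in hex.
# Only single-range (or single-bit) specs on the 'config' field are handled.
pack_field() {
    local spec=$1 value=$2 range lo hi width
    case $spec in
        config:*) range=${spec#config:} ;;
        *) echo "unsupported spec: $spec" >&2; return 1 ;;
    esac
    case $range in
        *,*) echo "multi-range spec not handled: $spec" >&2; return 1 ;;
        *-*) lo=${range%-*}; hi=${range#*-} ;;
        *)   lo=$range; hi=$range ;;
    esac
    width=$(( hi - lo + 1 ))
    # Reject values that do not fit in the field.
    if (( value >> width )); then
        echo "value $value does not fit in $spec" >&2
        return 1
    fi
    printf '0x%x\n' $(( value << lo ))
}

# Assumed specs: event field at config:0-7, umask at config:8-15.
event=$(pack_field config:0-7 0xc2)
umask=$(pack_field config:8-15 0x01)
printf 'r%x\n' $(( event | umask ))   # raw code usable as 'perf stat -e r1c2'
```

Whether a raw code such as r1c2 counts anything meaningful depends entirely on the CPU. Consult the vendor's event list before using it with -e.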

	Note that the last two syntaxes support prefix and glob matching in
	the PMU name to simplify creation of events across multiple instances
	of the same type of PMU in large systems (e.g. memory controller PMUs).
	Multiple PMU instances are typical for uncore PMUs, so the prefix
	'uncore_' is also ignored when performing this match.
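
The PMU-name matching rule in the note above can be approximated in shell. This is a sketch of the documented behavior, not perf's parser. It assumes that "prefix matching" means the pattern is treated as a glob with an implicit trailing `*`, and that 'uncore_' is dropped from the PMU name only when the pattern does not itself start with it. The helper `pmu_matches` and the PMU names are hypothetical examples.

```bash
#!/bin/bash
# Hypothetical helper: succeed if PMU name $2 matches pattern $1 using
# prefix/glob matching, ignoring a leading "uncore_" on the PMU name.
pmu_matches() {
    local pattern=$1 name=$2
    # Drop "uncore_" from the PMU name unless the pattern spells it out.
    if [[ $name == uncore_* && $pattern != uncore_* ]]; then
        name=${name#uncore_}
    fi
    # Unquoted $pattern is treated as a glob; the trailing * makes it a prefix match.
    [[ $name == $pattern* ]]
}

for pmu in uncore_imc_0 uncore_imc_1 uncore_cbox_0; do
    if pmu_matches imc "$pmu"; then
        echo "imc matches $pmu"
    fi
done
```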


-i::
--no-inherit::
	child tasks do not inherit counters

-p::
--pid=<pid>::
	stat events on existing process id (comma separated list)

-t::
--tid=<tid>::
	stat events on existing thread id (comma separated list)

-b::
--bpf-prog::
	stat events on existing bpf program id (comma separated list),
	requiring root rights. bpftool-prog can be used to find the ids of
	all BPF programs in the system. For example:

	  # bpftool prog | head -n 1
	  17247: tracepoint name sys_enter tag 192d548b9d754067 gpl

	  # perf stat -e cycles,instructions --bpf-prog 17247 --timeout 1000

	   Performance counter stats for 'BPF program(s) 17247':

	             85,967      cycles
	             28,982      instructions              #    0.34  insn per cycle

	        1.102235068 seconds time elapsed

--bpf-counters::
	Use BPF programs to aggregate readings from perf_events. This
	allows multiple perf-stat sessions that are counting the same metric (cycles,
	instructions, etc.) to share hardware counters.
	To use BPF programs on common events by default, use
	"perf config stat.bpf-counter-events=<list_of_events>".

--bpf-attr-map::
	With option "--bpf-counters", different perf-stat sessions share
	information about shared BPF programs and maps via a pinned hashmap.
	Use "--bpf-attr-map" to specify the path of this pinned hashmap.
	The default path is /sys/fs/bpf/perf_attr_map.

ifdef::HAVE_LIBPFM[]
--pfm-events events::
	Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net)
	including support for event filters. For example '--pfm-events
	inst_retired:any_p:u:c=1:i'. More than one event can be passed to the
	option using the comma separator. Hardware events and generic hardware
	events cannot be mixed together. The latter must be used with the -e
	option. The -e option and this one can be mixed and matched. Events
	can be grouped using the {} notation.
endif::HAVE_LIBPFM[]

-a::
--all-cpus::
	system-wide collection from all CPUs (default if no target is specified)

--no-scale::
	Don't scale/normalize counter values

-d::
--detailed::
	print more detailed statistics, can be specified up to 3 times

	           -d:  detailed events, L1 and LLC data cache
	        -d -d:  more detailed events, dTLB and iTLB events
	     -d -d -d:  very detailed events, adding prefetch events

-r::
--repeat=<n>::
	repeat command and print average + stddev (max: 100). 0 means forever.

-B::
--big-num::
	print large numbers with thousands' separators according to locale.
	Enabled by default. Use "--no-big-num" to disable.
	Default setting can be changed with "perf config stat.big-num=false".

-C::
--cpu=::
	Count only on the list of CPUs provided. Multiple CPUs can be provided as a
	comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
	In per-thread mode, this option is ignored. The -a option is still necessary
	to activate system-wide monitoring. Default is to count on all CPUs.
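
The CPU list syntax accepted by -C combines single CPUs and ranges. The sketch below uses a hypothetical helper, `expand_cpu_list`, to expand such a list into individual CPU numbers. It only illustrates the syntax described above. It does not check which CPUs are online, and it does not reproduce perf's own parser.

```bash
#!/bin/bash
# Hypothetical helper: expand a -C style CPU list such as "0-2,5" into
# individual CPU numbers ("0 1 2 5"). Elements must be N or N-M, no spaces.
expand_cpu_list() {
    local item lo hi cpu items out=()
    IFS=, read -r -a items <<< "$1"
    for item in "${items[@]}"; do
        if [[ $item =~ ^([0-9]+)-([0-9]+)$ ]]; then
            lo=${BASH_REMATCH[1]} hi=${BASH_REMATCH[2]}
        elif [[ $item =~ ^[0-9]+$ ]]; then
            lo=$item hi=$item
        else
            echo "bad CPU list element: '$item'" >&2
            return 1
        fi
        # 10# forces base 10 so that e.g. "08" is not parsed as octal.
        if (( 10#$lo > 10#$hi )); then
            echo "empty range: '$item'" >&2
            return 1
        fi
        for (( cpu = 10#$lo; cpu <= 10#$hi; cpu++ )); do
            out+=("$cpu")
        done
    done
    echo "${out[*]}"
}

expand_cpu_list 0-2,5
```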

-A::
--no-aggr::
	Do not aggregate counts across all monitored CPUs.

-n::
--null::
	null run - Don't start any counters.

	This can be useful to measure just elapsed wall-clock time, or to assess the
	raw overhead of perf stat itself, without running any counters.

-v::
--verbose::
	be more verbose (show counter open errors, etc)

-x SEP::
--field-separator SEP::
	print counts using a CSV-style output to make it easy to import directly into
	spreadsheets. Columns are separated by the string specified in SEP.

--table:: Display time for each run (-r option), in a table format, e.g.:

  $ perf stat --null -r 5 --table perf bench sched pipe

   Performance counter stats for 'perf bench sched pipe' (5 runs):

             # Table of individual measurements:
             5.189 (-0.293) #
             5.189 (-0.294) #
             5.186 (-0.296) #
             5.663 (+0.181) ##
             6.186 (+0.703) ####

             # Final result:
             5.483 +- 0.198 seconds time elapsed ( +- 3.62% )

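
The final result in the table above can be reproduced from the five per-run times. The '+-' value is consistent with the standard error of the mean (sample standard deviation divided by the square root of the number of runs), and the percentage is that error relative to the mean. The sketch below recomputes both with awk. The function `run_summary` is hypothetical. Because the table shows times rounded to milliseconds, deviations recomputed from these inputs may differ from perf's in the last digit.

```bash
#!/bin/bash
# Hypothetical helper: read one elapsed time (seconds) per line and print
# the mean, the standard error of the mean, and that error as a percentage,
# in the style of the '# Final result:' line.
run_summary() {
    awk '
        { x[NR] = $1; sum += $1 }
        END {
            if (NR < 2) { print "need at least two runs" > "/dev/stderr"; exit 1 }
            mean = sum / NR
            for (i = 1; i <= NR; i++) ss += (x[i] - mean) ^ 2
            sem = sqrt(ss / (NR - 1) / NR)   # sample stddev / sqrt(n)
            printf "%.3f +- %.3f seconds time elapsed ( +- %.2f%% )\n",
                   mean, sem, 100 * sem / mean
        }'
}

printf '%s\n' 5.189 5.189 5.186 5.663 6.186 | run_summary
# prints: 5.483 +- 0.198 seconds time elapsed ( +- 3.62% )
```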
-G name::
--cgroup name::
	monitor only in the container (cgroup) called "name". This option is available only
	in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
	container "name" are monitored when they run on the monitored CPUs. Multiple cgroups
	can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup
	to first event, second cgroup to second event and so on. It is possible to provide
	an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
	corresponding events, i.e., they always refer to events defined earlier on the command
	line. If the user wants to track multiple events for a specific cgroup, the user can
	use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'.

	To monitor, say, 'cycles' both for a cgroup and system-wide, use this
	command line: 'perf stat -e cycles -G cgroup_name -a -e cycles'.

--for-each-cgroup name::
	Expand the event list for each cgroup in "name" (multiple cgroups can be
	separated by commas). Regex patterns are also supported to match multiple
	groups. This has the same effect as repeating the -e and -G options for each
	event x name combination. This option cannot be used with the -G/--cgroup option.

-o file::
--output file::
	Print the output into the designated file.

--append::
	Append to the output file designated with the -o option. Ignored if -o is not specified.

--log-fd::

	Log output to fd, instead of stderr. Complementary to --output, and mutually exclusive
	with it. --append may be used here. Examples:
	  3>results  perf stat --log-fd 3          \-- $cmd
	  3>>results perf stat --log-fd 3 --append \-- $cmd

--control=fifo:ctl-fifo[,ack-fifo]::
--control=fd:ctl-fd[,ack-fd]::
	ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows.
	perf stat listens on the ctl-fd descriptor for commands that control the
	measurement ('enable': enable events, 'disable': disable events).
	Measurements can be started with events disabled using the --delay=-1
	option. If ack-fd is given, perf stat sends a command completion message
	('ack\n') to it, which lets the controlling process synchronize with the
	measurement. Example of a bash shell script that enables and disables
	events during measurements:

	  #!/bin/bash

	  ctl_dir=/tmp/

	  ctl_fifo=${ctl_dir}perf_ctl.fifo
	  test -p ${ctl_fifo} && unlink ${ctl_fifo}
	  mkfifo ${ctl_fifo}
	  exec {ctl_fd}<>${ctl_fifo}

	  ctl_ack_fifo=${ctl_dir}perf_ctl_ack.fifo
	  test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
	  mkfifo ${ctl_ack_fifo}
	  exec {ctl_fd_ack}<>${ctl_ack_fifo}

	  perf stat -D -1 -e cpu-cycles -a -I 1000       \
	            --control fd:${ctl_fd},${ctl_fd_ack} \
	            \-- sleep 30 &
	  perf_pid=$!

	  sleep 5  && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
	  sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"

	  exec {ctl_fd_ack}>&-
	  unlink ${ctl_ack_fifo}

	  exec {ctl_fd}>&-
	  unlink ${ctl_fifo}

	  wait ${perf_pid}
	  exit $?

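The enable/ack handshake described above can be tried without perf or root rights. In the sketch below, a hypothetical `mock_perf` function stands in for perf stat. It reads commands from the control FIFO and answers each 'enable' or 'disable' with "ack". It only mimics the message exchange for these two commands and counts nothing. It also stops after 'disable' so that the example terminates. The FIFO names and the temporary directory are illustrative choices.

```bash
#!/bin/bash
# Hypothetical stand-in for 'perf stat --control fd:CTL,ACK': read commands
# from fd $1 and answer 'enable'/'disable' with "ack" on fd $2.
mock_perf() {
    local cmd
    while read -r -u "$1" cmd; do
        case $cmd in
            enable|disable) echo ack >&"$2" ;;
        esac
        [[ $cmd == disable ]] && break
    done
}

ctl_dir=$(mktemp -d)
mkfifo "$ctl_dir/ctl.fifo" "$ctl_dir/ack.fifo"
exec {ctl_fd}<>"$ctl_dir/ctl.fifo"
exec {ctl_fd_ack}<>"$ctl_dir/ack.fifo"

mock_perf "$ctl_fd" "$ctl_fd_ack" &
mock_pid=$!

echo enable >&"$ctl_fd" && read -r -u "$ctl_fd_ack" e1 && echo "enabled(${e1})"
echo disable >&"$ctl_fd" && read -r -u "$ctl_fd_ack" d1 && echo "disabled(${d1})"

wait "$mock_pid"
exec {ctl_fd}>&- {ctl_fd_ack}>&-
rm -r "$ctl_dir"
```

Replacing `mock_perf ... &` with the perf stat command line from the script above gives the real setup. The controlling side does not change.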

--pre::
--post::
	Pre and post measurement hooks, e.g.:

	  perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/ clean' \-- make -s -j64 O=defconfig-build/ bzImage

-I msecs::
--interval-print msecs::
	Print count deltas every N milliseconds (minimum: 1ms).
	The overhead percentage could be high in some cases, for instance with small, sub 100ms intervals. Use with caution.
	example: 'perf stat -I 1000 -e cycles -a sleep 5'

	If a metric is available, it is computed from the counts of that interval
	and printed after '#'.

--interval-count times::
	Print count deltas a fixed number of times.
	This option should be used together with the "-I" option.
	example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'

--interval-clear::
	Clear the screen before the next interval.

--timeout msecs::
	Stop the 'perf stat' session and print count deltas after N milliseconds (minimum: 10 ms).
	This option is not supported with the "-I" option.
	example: 'perf stat --timeout 2000 -e cycles -a'

--metric-only::
	Only print computed metrics. Print them in a single line.
	Don't show any raw values. Not supported with --per-thread.

--per-socket::
	Aggregate counts per processor socket for system-wide mode measurements. This
	is a useful mode to detect imbalance between sockets. To enable this mode,
	use --per-socket in addition to -a (system-wide). The output includes the
	socket number and the number of online processors on that socket. This is
	useful to gauge the amount of aggregation.
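
Conceptually, per-socket aggregation sums the per-CPU counts (what -A/--no-aggr would show) over the CPUs of each socket and reports how many CPUs contributed. The awk sketch below illustrates only that idea. The input format (`cpu socket count` per line), the helper name `sum_per_socket` and the sample counts are made up for illustration. The output layout is not perf's.

```bash
#!/bin/bash
# Hypothetical helper: read "cpu socket count" lines and print, per socket,
# the socket id, the number of CPUs seen on it, and the summed count.
sum_per_socket() {
    awk '
        { cpus[$2]++; total[$2] += $3 }
        END { for (s in total) printf "S%s %d %.0f\n", s, cpus[s], total[s] }
    ' | sort    # awk iterates arrays in unspecified order, so sort the result
}

# Made-up per-CPU counts: CPUs 0-1 on socket 0, CPUs 2-3 on socket 1.
printf '%s\n' '0 0 1000' '1 0 1200' '2 1 500' '3 1 300' | sum_per_socket
```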

--per-die::
	Aggregate counts per processor die for system-wide mode measurements. This
	is a useful mode to detect imbalance between dies. To enable this mode,
	use --per-die in addition to -a (system-wide). The output includes the
	die number and the number of online processors on that die. This is
	useful to gauge the amount of aggregation.

--per-core::
	Aggregate counts per physical processor for system-wide mode measurements. This
	is a useful mode to detect imbalance between physical cores. To enable this mode,
	use --per-core in addition to -a (system-wide). The output includes the
	core number and the number of online logical processors on that physical processor.

--per-thread::
	Aggregate counts per monitored thread, when monitoring threads (-t option)
	or processes (-p option).

--per-node::
	Aggregate counts per NUMA node for system-wide mode measurements. This
	is a useful mode to detect imbalance between NUMA nodes. To enable this
	mode, use --per-node in addition to -a (system-wide).

-D msecs::
--delay msecs::
	After starting the program, wait msecs before measuring (-1: start with events
	disabled). This is useful to filter out the startup phase of the program,
	which is often very different.

-T::
--transaction::

	Print statistics of transactional execution if supported.

--metric-no-group::
	By default, events to compute a metric are placed in weak groups. The
	group tries to enforce scheduling all or none of the events. The
	--metric-no-group option places events outside of groups and may
	increase the chance of each event being scheduled, leading to more
	accuracy. However, because the events may no longer be scheduled
	together, accuracy for metrics like instructions per cycle can be
	lower, as the events they combine may no longer be measured at the
	same time.

--metric-no-merge::
	By default, metric events in different weak groups can be shared if one
	group contains all the events needed by another. In such cases one
	group is eliminated, which reduces event multiplexing and makes certain
	groups of metrics sum to 100%. A downside to sharing a group is that the
	shared group may require multiplexing, which lowers the accuracy of a
	small group that would not otherwise need multiplexing. This option
	forbids the event merging logic from sharing events between groups and
	may be used to increase accuracy in this case.

--quiet::
	Don't print output. This is useful with perf stat record below to only
	write data to the perf.data file.

STAT RECORD
-----------
Stores stat data into perf data file.

-o file::
--output file::
	Output file name.

STAT REPORT
-----------
Reads and reports stat data from perf data file.

-i file::
--input file::
	Input file name.

--per-socket::
	Aggregate counts per processor socket for system-wide mode measurements.

--per-die::
	Aggregate counts per processor die for system-wide mode measurements.

--per-core::
	Aggregate counts per physical processor for system-wide mode measurements.

-M::
--metrics::
	Print metrics or metricgroups specified in a comma separated list.
	For a group all metrics from the group are added.
	The events from the metrics are automatically measured.
	See perf list output for the possible metrics and metricgroups.

-A::
--no-aggr::
	Do not aggregate counts across all monitored CPUs.

--topdown::
	Print complete top-down metrics supported by the CPU. This allows
	determining bottlenecks in the CPU pipeline for CPU-bound workloads,
	by breaking the cycles consumed down into frontend bound, backend bound,
	bad speculation and retiring.

	Frontend bound means that the CPU cannot fetch and decode instructions fast
	enough. Backend bound means that computation or memory access is the
	bottleneck. Bad Speculation means that the CPU wasted cycles due to branch
	mispredictions and similar issues. Retiring means that the CPU computed without
	an apparent bottleneck. The bottleneck is only the real bottleneck
	if the workload is actually bound by the CPU and not by something else.

	For best results it is usually a good idea to use it with interval
	mode like -I 1000, as the bottleneck of workloads can change often.

	This enables --metric-only, unless overridden with --no-metric-only.

	On newer CPUs (IceLake and later) TopDown can be collected for any thread.
	The following restrictions only apply to older Intel CPUs and Atom:

	The top down metrics are collected per core instead of per
	CPU thread. Per core mode is automatically enabled
	and -a (global monitoring) is needed, requiring root rights or
	kernel.perf_event_paranoid=-1.

	Topdown uses the full Performance Monitoring Unit and, for best results,
	needs the NMI watchdog to be disabled (as root):

	  echo 0 > /proc/sys/kernel/nmi_watchdog

	Otherwise the bottlenecks may be inconsistent on workloads with changing
	phases.

	To interpret the results it is usually necessary to know which
	CPUs the workload runs on. If needed, the CPUs can be forced using
	taskset.

--td-level::
	Print the top-down statistics at or below the given level.
	This lets users print only the top-down metric levels of interest
	instead of the complete top-down metrics.

	The availability of the top-down metrics level depends on the hardware. For
	example, Ice Lake only supports L1 top-down metrics. Sapphire Rapids
	supports both L1 and L2 top-down metrics.

	Default: 0 means the max level that the current hardware supports.
	Error out if the input is higher than the supported max level.
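
The level-selection rule above fits in a small function. `resolve_td_level` and its arguments (the requested level and the hardware's maximum supported level) are hypothetical. perf determines the maximum from the CPU itself.

```bash
#!/bin/bash
# Hypothetical helper mirroring the --td-level rule: 0 selects the highest
# level the hardware supports, and a level above that maximum is an error.
resolve_td_level() {
    local requested=$1 max_supported=$2
    if (( requested == 0 )); then
        echo "$max_supported"
    elif (( requested > max_supported )); then
        echo "td-level $requested exceeds supported max $max_supported" >&2
        return 1
    else
        echo "$requested"
    fi
}

resolve_td_level 0 2   # hardware with L1 and L2 metrics: prints 2
resolve_td_level 1 2   # prints 1
```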

--no-merge::
	Do not merge results from the same PMUs.

	When multiple events are created from a single event specification,
	stat will, by default, aggregate the event counts and show the result
	in a single row. This option disables that behavior and shows
	the individual events and counts.

	Multiple events are created from a single event specification when:
	1. Prefix or glob matching is used for the PMU name.
	2. Aliases, which are listed immediately after the Kernel PMU events
	   by perf list, are used.

--smi-cost::
	Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.

	During the measurement, /sys/devices/cpu/freeze_on_smi will be set to
	freeze core counters on SMI.
	The aperf counter will not be affected by the setting.
	The cost of SMI can be measured by (aperf - unhalted core cycles).

	In practice, the percentage of SMI cycles is very useful for performance
	oriented analysis. --metric-only will be applied by default.
	The output is SMI cycles%, which equals (aperf - unhalted core cycles) / aperf.

	Users who want to get the actual value can apply --no-metric-only.
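
As a worked example of the formula above, the snippet computes SMI cycles% from an aperf count and an unhalted core cycles count. The function name `smi_cycles_pct` and the input counts are made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: SMI cycles% = (aperf - unhalted core cycles) / aperf.
smi_cycles_pct() {
    awk -v aperf="$1" -v cycles="$2" 'BEGIN {
        if (aperf <= 0) { print "aperf must be positive" > "/dev/stderr"; exit 1 }
        printf "%.2f%%\n", 100 * (aperf - cycles) / aperf
    }'
}

# Made-up counts: 2,000,000 aperf cycles, 1,990,000 unhalted core cycles.
smi_cycles_pct 2000000 1990000   # prints 0.50%
```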

--all-kernel::
Configure all used events to run in kernel space.

--all-user::
Configure all used events to run in user space.

--percore-show-thread::
The event modifier "percore" sums up the event counts for all hardware
threads in a core and shows the counts per core.

With the event modifier "percore" enabled, this option also sums up the
event counts for all hardware threads in a core, but shows the summed
counts per hardware thread. This is essentially a replacement for the
any bit and is convenient for post processing.
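
The difference between plain "percore" output and --percore-show-thread can be sketched in Python. The topology and counts below are hypothetical: CPUs 0 and 4 share core 0, and CPUs 1 and 5 share core 1. perf itself reads the real topology from the system, and this is not its code.

```python
from collections import defaultdict


def percore_views(per_thread, core_of):
    """Return (per-core sums, per-thread view of those sums).

    per_thread maps CPU number -> event count for that hardware thread;
    core_of maps CPU number -> core id.
    """
    core_sum = defaultdict(int)
    for cpu, value in per_thread.items():
        core_sum[core_of[cpu]] += value
    # "percore" alone: one row per core.
    per_core = dict(core_sum)
    # With --percore-show-thread: one row per thread, each showing its core's sum.
    per_thread_view = {cpu: core_sum[core_of[cpu]] for cpu in sorted(per_thread)}
    return per_core, per_thread_view


counts = {0: 100, 4: 50, 1: 70, 5: 30}
topology = {0: 0, 4: 0, 1: 1, 5: 1}
per_core, per_thread_view = percore_views(counts, topology)
print(per_core)         # {0: 150, 1: 100}
print(per_thread_view)  # {0: 150, 1: 100, 4: 150, 5: 100}
```

Keeping one row per hardware thread means that scripts expecting per-CPU rows can consume the core sums without any extra mapping.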

--summary::
Print a summary for interval mode (-I).

--no-csv-summary::
Don't print 'summary' in the first column of CSV summary output.
This option must be used with -x and --summary.

This option can be enabled in perf config by setting the variable
'stat.no-csv-summary'.

$ perf config stat.no-csv-summary=true

--cputype::
On hybrid platforms, only enable events on CPUs of this type
(e.g. core or atom).

EXAMPLES
--------

$ perf stat \-- make

   Performance counter stats for 'make':

        83723.452481      task-clock:u (msec)       #    1.004 CPUs utilized
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
           3,228,188      page-faults:u             #    0.039 M/sec
     229,570,665,834      cycles:u                  #    2.742 GHz
     313,163,853,778      instructions:u            #    1.36  insn per cycle
      69,704,684,856      branches:u                #  832.559 M/sec
       2,078,861,393      branch-misses:u           #    2.98% of all branches

        83.409183620 seconds time elapsed

        74.684747000 seconds user
         8.739217000 seconds sys
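
The '#' column in the example is derived from the raw counts. The Python sketch below recomputes it from the numbers shown. The ratios used (for instance, GHz as cycles divided by task-clock time, and CPUs utilized as task-clock time divided by elapsed time) are inferred from the output rather than taken from perf's source, so treat them as an approximation.

```python
# Raw values copied from the example output above.
task_clock_ms = 83723.452481
elapsed_s = 83.409183620
page_faults = 3_228_188
cycles = 229_570_665_834
instructions = 313_163_853_778
branches = 69_704_684_856
branch_misses = 2_078_861_393

task_clock_s = task_clock_ms / 1000.0

# Each derived value is formatted with the precision used in the example.
metrics = {
    "CPUs utilized": f"{task_clock_s / elapsed_s:.3f}",
    "page-faults M/sec": f"{page_faults / task_clock_s / 1e6:.3f}",
    "GHz": f"{cycles / task_clock_s / 1e9:.3f}",
    "insn per cycle": f"{instructions / cycles:.2f}",
    "branches M/sec": f"{branches / task_clock_s / 1e6:.3f}",
    "% of all branches": f"{100.0 * branch_misses / branches:.2f}",
}

for name, value in metrics.items():
    print(f"{value:>10}  {name}")
```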

TIMINGS
-------
As the example above shows, perf stat can display three types of timings.
The time the counters were enabled is always displayed:

        83.409183620 seconds time elapsed

For workload sessions, the time the workload spent in user and system
mode is also displayed:

        74.684747000 seconds user
         8.739217000 seconds sys

These are the same times that the 'time' tool displays.
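
The user and sys figures come from resource-usage accounting of the child process. The Python sketch below reproduces the three timings for an arbitrary command, using a wall clock for the elapsed time and resource.getrusage(RUSAGE_CHILDREN) for user and system time. It assumes a Unix-like system and is only an analogy for what perf stat and 'time' report, not perf's code. The function name time_workload is hypothetical.

```python
import resource
import subprocess
import sys
import time


def time_workload(argv):
    """Run argv and return (elapsed, user, sys) in seconds, like time(1)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # waits for the child, so its rusage is counted
    elapsed = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN is cumulative, so subtract the earlier snapshot.
    return (elapsed,
            after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


if __name__ == "__main__":
    elapsed, user, sys_time = time_workload([sys.executable, "-c", "sum(range(10**6))"])
    print(f"{elapsed:15.9f} seconds time elapsed")
    print(f"{user:15.9f} seconds user")
    print(f"{sys_time:15.9f} seconds sys")
```

For a parallel workload such as 'make', user plus sys can exceed the elapsed time, as it does slightly in the example above.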

CSV FORMAT
----------

With -x, perf stat outputs a not-quite-CSV format: commas in the output
are not enclosed in "". To make the output easy to parse, it is
recommended to use a different separator character, such as -x \;

The fields are in this order:

- optional usec time stamp in fractions of second (with -I xxx)
- optional CPU, core, or socket identifier
- optional number of logical CPUs aggregated
- counter value
- unit of the counter value or empty
- event name
- run time of counter
- percentage of measurement time the counter was running
- optional variance if multiple values are collected with -r
- optional metric value
- optional unit of metric

Additional metrics may be printed with all earlier fields being empty.
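
The field order above can be turned into a minimal Python parser. This sketch assumes output produced with -x \; from a plain run: no -I time stamp, no CPU/core/socket aggregation columns and no -r variance column. The names CounterRow and parse_stat_line are hypothetical. The sample line is constructed from the example values for illustration, not captured from perf.

```python
from typing import NamedTuple, Optional


class CounterRow(NamedTuple):
    value: str
    unit: str
    event: str
    run_time: str
    percent_running: str
    metric_value: Optional[str]
    metric_unit: Optional[str]


def parse_stat_line(line, sep=";"):
    """Parse one 'perf stat -x' line, assuming no -I time stamp, no
    per-CPU/core/socket columns and no -r variance column."""
    fields = line.rstrip("\n").split(sep)
    # Pad so that the optional metric columns always exist.
    fields += [""] * (7 - len(fields))
    value, unit, event, run_time, pct, metric_value, metric_unit = fields[:7]
    return CounterRow(value, unit, event, run_time, pct,
                      metric_value or None, metric_unit or None)


# Illustrative line built from the example values (not captured output).
row = parse_stat_line("229570665834;;cycles:u;83723452481;100.00;2.742;GHz")
print(row.event, row.value, row.metric_value, row.metric_unit)
```

An additional-metric line, whose earlier fields are all empty, parses to a row with an empty value and event. A caller can use that to attach the metric to the preceding counter.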

include::intel-hybrid.txt[]

SEE ALSO
--------
linkperf:perf-top[1], linkperf:perf-list[1]