perf-stat(1)
============

NAME
----
perf-stat - Run a command and gather performance counter statistics

SYNOPSIS
--------
[verse]
'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command>
'perf stat' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]
'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] -- <command> [<options>]
'perf stat' report [-i file]

DESCRIPTION
-----------
This command runs the given command and gathers performance counter
statistics from it.


OPTIONS
-------
<command>...::
Any command you can specify in a shell.

record::
See STAT RECORD.

report::
See STAT REPORT.

-e::
--event=::
Select the PMU event. Selection can be:

- a symbolic event name (use 'perf list' to list all events)

- a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a
hexadecimal event descriptor.

- a symbolically formed event like 'pmu/param1=0x3,param2/' where
param1 and param2 are defined as formats for the PMU in
/sys/bus/event_source/devices/<pmu>/format/*

- a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
where M, N, K are numbers (in decimal, hex, or octal format).
Acceptable values for each of 'config', 'config1' and 'config2'
parameters are defined by corresponding entries in
/sys/bus/event_source/devices/<pmu>/format/*
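
To make the 'pmu/param1=0x3,param2/' form more concrete, the Python sketch below packs such terms into config values using format files. It assumes each file under format/ holds a single bit range such as `config:0-7` or `config1:23` (split ranges are rejected), treats a bare term like `param2` as the value 1, and parses numbers the way C's `strtoull(..., 0)` does. `make_fake_pmu` is a hypothetical helper that builds a throwaway directory, so nothing here touches /sys; this is an illustration, not perf's event parser.

```python
import tempfile
from pathlib import Path


def parse_c_int(text):
    """Parse like C strtoull(text, NULL, 0): 0x.. is hex, a leading 0 is octal, else decimal."""
    if text[:2].lower() == "0x":
        return int(text, 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text, 10)


def parse_terms(spec):
    """Split 'param1=0x3,param2' into {'param1': 3, 'param2': 1} (bare term assumed to mean 1)."""
    terms = {}
    for term in spec.split(","):
        name, _, value = term.partition("=")
        terms[name] = parse_c_int(value) if value else 1
    return terms


def read_formats(pmu_dir):
    """Map each file in <pmu_dir>/format/ to (field, low_bit, high_bit)."""
    formats = {}
    for entry in (Path(pmu_dir) / "format").iterdir():
        field, _, bits = entry.read_text().strip().partition(":")
        if "," in bits:
            raise ValueError(f"{entry}: split bit ranges are not handled by this sketch")
        low, _, high = bits.partition("-")
        formats[entry.name] = (field, int(low), int(high or low))
    return formats


def encode(formats, terms):
    """Shift each term value into its bit range and OR it into config/config1/config2."""
    fields = {}
    for name, value in terms.items():
        field, low, high = formats[name]  # KeyError for a term the PMU does not define
        width = high - low + 1
        if value >> width:
            raise ValueError(f"{name}={value:#x} does not fit in {width} bits")
        fields[field] = fields.get(field, 0) | (value << low)
    return fields


def make_fake_pmu(specs):
    """Hypothetical helper: create a temporary <pmu>/format/ tree from {name: spec}."""
    root = Path(tempfile.mkdtemp())
    (root / "format").mkdir()
    for name, spec in specs.items():
        (root / "format" / name).write_text(spec + "\n")
    return root
```

With made-up format files `event` = `config:0-7`, `umask` = `config:8-15` and `edge` = `config:18`, the terms `event=0x3c,umask=0x1,edge` pack into config 0x4013c.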

Note that the last two syntaxes support prefix and glob matching in
the PMU name to simplify creation of events across multiple instances
of the same type of PMU in large systems (e.g. memory controller PMUs).
Multiple PMU instances are typical for uncore PMUs, so the prefix
'uncore_' is also ignored when performing this match.
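
The matching rule in the note above can be approximated in a few lines of Python. This sketch assumes the requested name is treated as a prefix (a `*` is appended before glob matching) and that `uncore_` is stripped from candidate PMU names unless the user typed it; the PMU names used below are hypothetical, and the code is not taken from perf's source.

```python
from fnmatch import fnmatchcase


def match_pmus(requested, pmu_names):
    """Return the PMU names selected by 'requested' under prefix and glob matching."""
    pattern = requested + "*"  # prefix matching: 'imc' also selects 'imc_0', 'imc_1', ...
    selected = []
    for name in pmu_names:
        candidate = name
        # The 'uncore_' prefix is ignored unless the user spelled it out.
        if name.startswith("uncore_") and not requested.startswith("uncore_"):
            candidate = name[len("uncore_"):]
        if fnmatchcase(candidate, pattern):
            selected.append(name)
    return selected
```

For example, with PMUs named `uncore_imc_0` and `uncore_imc_1`, requesting `imc` selects both, while `imc_[1]` selects only the second.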


-i::
--no-inherit::
child tasks do not inherit counters

-p::
--pid=<pid>::
stat events on existing process id (comma-separated list)

-t::
--tid=<tid>::
stat events on existing thread id (comma-separated list)


-a::
--all-cpus::
system-wide collection from all CPUs (default if no target is specified)

-c::
--scale::
scale/normalize counter values

-d::
--detailed::
print more detailed statistics, can be specified up to 3 times

  -d:          detailed events, L1 and LLC data cache
  -d -d:       more detailed events, dTLB and iTLB events
  -d -d -d:    very detailed events, adding prefetch events

-r::
--repeat=<n>::
repeat command and print average + stddev (max: 100). 0 means forever.

-B::
--big-num::
print large numbers with thousands' separators according to locale

-C::
--cpu=::
Count only on the list of CPUs provided. Multiple CPUs can be provided as a
comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
In per-thread mode, this option is ignored. The -a option is still necessary
to activate system-wide monitoring. Default is to count on all CPUs.
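
The CPU list syntax described above (comma-separated, no spaces, `-` for ranges) can be expanded as in the following Python sketch. It models only the documented syntax; for instance, it does not check the numbers against the CPUs actually online.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '0,1' or '0-2,5' into a sorted list of CPU numbers."""
    cpus = set()
    for item in spec.split(","):
        if not item or any(ch.isspace() for ch in item):
            raise ValueError(f"bad item {item!r}: empty items and spaces are not allowed")
        first, dash, last = item.partition("-")
        low = int(first)
        high = int(last) if dash else low
        if high < low:
            raise ValueError(f"descending range {item!r}")
        cpus.update(range(low, high + 1))
    return sorted(cpus)
```

So `0-2,5,4` expands to CPUs 0, 1, 2, 4 and 5.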

-A::
--no-aggr::
Do not aggregate counts across all monitored CPUs.

-n::
--null::
null run - don't start any counters

-v::
--verbose::
be more verbose (show counter open errors, etc.)

-x SEP::
--field-separator SEP::
print counts using a CSV-style output to make it easy to import directly into
spreadsheets. Columns are separated by the string specified in SEP.

--table:: Display time for each run (-r option), in a table format, e.g.:

  $ perf stat --null -r 5 --table perf bench sched pipe

   Performance counter stats for 'perf bench sched pipe' (5 runs):

             # Table of individual measurements:
             5.189 (-0.293) #
             5.189 (-0.294) #
             5.186 (-0.296) #
             5.663 (+0.181) ##
             6.186 (+0.703) ####

             # Final result:
             5.483 +- 0.198 seconds time elapsed  ( +- 3.62% )
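
The "+-" figure in the final result is consistent with the standard error of the mean, i.e. the sample standard deviation divided by the square root of the number of runs. The Python sketch below recomputes the summary from the five rounded run times shown above; treat it as a cross-check under that assumption, not as perf's implementation. Because the inputs are rounded to milliseconds, recomputed per-run deviations can differ from the table in the last digit.

```python
import math
import statistics


def summarize_runs(times):
    """Return (mean, standard error of the mean, error as % of mean) for >= 2 run times."""
    mean = statistics.mean(times)
    # Sample standard deviation (n - 1 denominator) scaled down by sqrt(n).
    stderr = statistics.stdev(times) / math.sqrt(len(times))
    return mean, stderr, 100.0 * stderr / mean


def deviations(times):
    """Per-run difference from the mean, as in the '(+0.181)' column."""
    mean = statistics.mean(times)
    return [t - mean for t in times]
```

Feeding in 5.189, 5.189, 5.186, 5.663 and 6.186 gives 5.483 +- 0.198, i.e. +- 3.62%, matching the final line above.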

-G name::
--cgroup name::
monitor only in the container (cgroup) called "name". This option is available only
in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
container "name" are monitored when they run on the monitored CPUs. Multiple cgroups
can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup
to first event, second cgroup to second event and so on. It is possible to provide
an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
corresponding events, i.e., they always refer to events defined earlier on the command
line. If the user wants to track multiple events for a specific cgroup, the user can
use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'.

To monitor, say, 'cycles' both for a cgroup and system-wide, this
command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'.
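
The pairing rules above can be modeled as follows. This Python sketch takes the events defined before -G and the -G argument, and returns which cgroup each event is restricted to (None meaning "monitor all the time"). It encodes only the documented cases; how events beyond the end of a multi-entry list are handled is an assumption, marked in the code.

```python
def pair_cgroups(events, cgroup_arg):
    """Model of the documented -G pairing for the events given before -G.

    Returns (event, cgroup) pairs; None means the event is not restricted to a cgroup.
    """
    names = cgroup_arg.split(",")
    if len(names) > len(events):
        raise ValueError("each cgroup needs an event defined earlier on the command line")
    if len(names) == 1:
        # '-e e1 -e e2 -G foo' is documented to behave like '-G foo,foo'.
        names = names * len(events)
    pairs = []
    for i, event in enumerate(events):
        # Assumption: events beyond the end of the list get no cgroup.
        name = names[i] if i < len(names) else ""
        pairs.append((event, name or None))  # an empty entry means "monitor all the time"
    return pairs
```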

-o file::
--output file::
Print the output into the designated file.

--append::
Append to the output file designated with the -o option. Ignored if -o is not specified.

--log-fd::

Log output to fd, instead of stderr. Complementary to --output, and mutually exclusive
with it. --append may be used here. Examples:

  3>results  perf stat --log-fd 3          -- $cmd
  3>>results perf stat --log-fd 3 --append -- $cmd

--pre::
--post::
Pre and post measurement hooks, e.g.:

  perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' -- make -s -j64 O=defconfig-build/ bzImage

-I msecs::
--interval-print msecs::
Print count deltas every N milliseconds (minimum: 1ms).
The overhead percentage can be high in some cases, for instance with small intervals
of less than 100ms. Use with caution.
example: 'perf stat -I 1000 -e cycles -a sleep 5'

--interval-count times::
Print count deltas a fixed number of times.
This option should be used together with the "-I" option.
example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'

--interval-clear::
Clear the screen before the next interval.

--timeout msecs::
Stop the 'perf stat' session and print count deltas after N milliseconds (minimum: 10 ms).
This option is not supported with the "-I" option.
example: 'perf stat --timeout 2000 -e cycles -a'

--metric-only::
Only print computed metrics. Print them in a single line.
Don't show any raw values. Not supported with --per-thread.

--per-socket::
Aggregate counts per processor socket for system-wide mode measurements. This
is a useful mode to detect imbalance between sockets. To enable this mode,
use --per-socket in addition to -a (system-wide). The output includes the
socket number and the number of online processors on that socket. This is
useful to gauge the amount of aggregation.

--per-core::
Aggregate counts per physical processor for system-wide mode measurements. This
is a useful mode to detect imbalance between physical cores. To enable this mode,
use --per-core in addition to -a (system-wide). The output includes the
core number and the number of online logical processors on that physical processor.
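
What --per-socket and --per-core do with per-CPU counts can be sketched as a group-and-sum in Python. The topology map is a hypothetical input here; on Linux, the real values would come from files such as /sys/devices/system/cpu/cpuN/topology/physical_package_id and core_id. Since core ids may repeat across sockets, a core is keyed by both ids. The second element of each result mirrors the "number of online processors" column.

```python
from collections import defaultdict


def aggregate(per_cpu_counts, topology, level):
    """Sum per-CPU counts per socket or per physical core.

    per_cpu_counts: {cpu: count}; topology: {cpu: (socket_id, core_id)};
    level: "socket" or "core". Returns {key: (total_count, number_of_cpus)}.
    """
    if level not in ("socket", "core"):
        raise ValueError("level must be 'socket' or 'core'")
    totals = defaultdict(lambda: [0, 0])
    for cpu, count in per_cpu_counts.items():
        socket_id, core_id = topology[cpu]
        # Core ids may repeat across sockets, so a core is keyed by both ids.
        key = socket_id if level == "socket" else (socket_id, core_id)
        totals[key][0] += count
        totals[key][1] += 1
    return {key: tuple(value) for key, value in sorted(totals.items())}
```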

--per-thread::
Aggregate counts per monitored thread, when monitoring threads (-t option)
or processes (-p option).

-D msecs::
--delay msecs::
After starting the program, wait msecs before measuring. This is useful to
filter out the startup phase of the program, which is often very different.

-T::
--transaction::

Print statistics of transactional execution if supported.

STAT RECORD
-----------
Stores stat data into perf data file.

-o file::
--output file::
Output file name.

STAT REPORT
-----------
Reads and reports stat data from perf data file.

-i file::
--input file::
Input file name.

--per-socket::
Aggregate counts per processor socket for system-wide mode measurements.

--per-core::
Aggregate counts per physical processor for system-wide mode measurements.

-M::
--metrics::
Print metrics or metricgroups specified in a comma-separated list.
For a group, all metrics from the group are added.
The events from the metrics are automatically measured.
See perf list output for the possible metrics and metricgroups.

-A::
--no-aggr::
Do not aggregate counts across all monitored CPUs.

--topdown::
Print top down level 1 metrics if supported by the CPU. This helps
determine bottlenecks in the CPU pipeline for CPU bound workloads,
by breaking the cycles consumed down into frontend bound, backend bound,
bad speculation and retiring.

Frontend bound means that the CPU cannot fetch and decode instructions fast
enough. Backend bound means that computation or memory access is the
bottleneck. Bad speculation means that the CPU wasted cycles due to branch
mispredictions and similar issues. Retiring means that the CPU computed without
an apparent bottleneck. The bottleneck is only the real bottleneck
if the workload is actually bound by the CPU and not by something else.

For best results it is usually a good idea to use it with interval
mode like -I 1000, as the bottleneck of workloads can change often.

The top down metrics are collected per core instead of per
CPU thread. Per core mode is automatically enabled
and -a (global monitoring) is needed, requiring root rights or
kernel.perf_event_paranoid=-1.

Topdown uses the full Performance Monitoring Unit and, for best results,
needs the NMI watchdog to be disabled (as root):

  echo 0 > /proc/sys/kernel/nmi_watchdog

Otherwise the bottlenecks may be inconsistent on workloads with changing phases.

This enables --metric-only, unless overridden with --no-metric-only.

To interpret the results, it usually helps to know which CPUs the
workload runs on. If needed, the CPUs can be forced using
taskset.
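
The level-1 breakdown can be illustrated with the formulas commonly published for Intel's Top-Down method. In the Python sketch below, the argument names mirror the topdown-* events (topdown-total-slots, topdown-slots-issued, topdown-slots-retired, topdown-fetch-bubbles, topdown-recovery-bubbles) found on some Intel CPUs. Both the formulas and that mapping are assumptions not spelled out in this document, and the sketch does not read any counters.

```python
def topdown_level1(total_slots, slots_issued, slots_retired,
                   fetch_bubbles, recovery_bubbles):
    """Fractions of pipeline slots per level-1 category; they add up to 1."""
    frontend = fetch_bubbles / total_slots
    retiring = slots_retired / total_slots
    # Slots issued but never retired, plus slots lost while recovering from mispredictions.
    bad_speculation = (slots_issued - slots_retired + recovery_bubbles) / total_slots
    backend = 1.0 - (frontend + bad_speculation + retiring)  # the remainder
    return {
        "frontend bound": frontend,
        "bad speculation": bad_speculation,
        "retiring": retiring,
        "backend bound": backend,
    }
```

Backend bound is computed as the remainder, which is why the four fractions always sum to 1.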

--no-merge::
Do not merge results from the same PMUs.

When multiple events are created from a single event specification,
stat will, by default, aggregate the event counts and show the result
in a single row. This option disables that behavior and shows
the individual events and counts.

Multiple events are created from a single event specification when:

1. Prefix or glob matching is used for the PMU name.
2. Aliases, which are listed immediately after the Kernel PMU events
by perf list, are used.
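
The difference between the default behavior and --no-merge can be sketched in Python. The sketch assumes that merging means summing the counts of all instances created from one specification; the event name `imc/some_event/` and the `spec [pmu]` row labels are made up for illustration and do not reproduce perf's output format.

```python
def display_rows(instances, merge=True):
    """instances: (event_spec, pmu_instance, count) tuples in creation order.

    merge=True sums instances created from the same specification into one row;
    merge=False keeps one row per instance, like --no-merge.
    """
    if not merge:
        return [(f"{spec} [{pmu}]", count) for spec, pmu, count in instances]
    merged = {}
    for spec, _pmu, count in instances:
        merged[spec] = merged.get(spec, 0) + count  # dicts preserve insertion order
    return list(merged.items())
```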

--smi-cost::
Measure SMI cost if msr/aperf/ and msr/smi/ events are supported.

During the measurement, /sys/devices/cpu/freeze_on_smi will be set to
freeze core counters on SMI.
The aperf counter will not be affected by the setting.
The cost of SMI can be measured by (aperf - unhalted core cycles).

In practice, the percentage of SMI cycles is very useful for performance
oriented analysis. --metric-only will be applied by default.
The output is SMI cycles%, which equals (aperf - unhalted core cycles) / aperf.

Users who want to get the actual values can apply --no-metric-only.
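
The two SMI formulas above translate directly into Python. The inputs in the example are made-up counts, not measurements.

```python
def smi_cost(aperf, unhalted_core_cycles):
    """Return (SMI cycles, SMI cycles as a percentage of aperf)."""
    if aperf <= 0:
        raise ValueError("aperf must be positive")
    smi_cycles = aperf - unhalted_core_cycles
    return smi_cycles, 100.0 * smi_cycles / aperf
```

For instance, with aperf = 1,000,000 and 990,000 unhalted core cycles, the sketch reports 10,000 SMI cycles, i.e. 1.0%.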

EXAMPLES
--------

  $ perf stat -- make

   Performance counter stats for 'make':

        83723.452481      task-clock:u (msec)       #    1.004 CPUs utilized
                   0      context-switches:u        #    0.000 K/sec
                   0      cpu-migrations:u          #    0.000 K/sec
           3,228,188      page-faults:u             #    0.039 M/sec
     229,570,665,834      cycles:u                  #    2.742 GHz
     313,163,853,778      instructions:u            #    1.36  insn per cycle
      69,704,684,856      branches:u                #  832.559 M/sec
       2,078,861,393      branch-misses:u           #    2.98% of all branches

        83.409183620 seconds time elapsed

        74.684747000 seconds user
         8.739217000 seconds sys
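
The values after '#' are derived from the raw counts. The Python sketch below recomputes them from the numbers in the example above, under the assumption (consistent with those numbers) that per-second rates and GHz are computed against task-clock time rather than wall-clock time.

```python
def derived_metrics(task_clock_ms, elapsed_s, counts):
    """Recompute the '#' column of the example output from its raw counts.

    Assumption: per-second rates and GHz use task-clock time, not wall-clock time.
    """
    task_s = task_clock_ms / 1000.0
    return {
        "CPUs utilized": task_s / elapsed_s,
        "GHz": counts["cycles"] / task_s / 1e9,
        "insn per cycle": counts["instructions"] / counts["cycles"],
        "page-faults M/sec": counts["page-faults"] / task_s / 1e6,
        "branches M/sec": counts["branches"] / task_s / 1e6,
        "branch-misses %": 100.0 * counts["branch-misses"] / counts["branches"],
    }


example = derived_metrics(
    task_clock_ms=83723.452481,
    elapsed_s=83.409183620,
    counts={
        "cycles": 229_570_665_834,
        "instructions": 313_163_853_778,
        "page-faults": 3_228_188,
        "branches": 69_704_684_856,
        "branch-misses": 2_078_861_393,
    },
)
```

Rounded like the output above, this yields 1.004 CPUs utilized, 2.742 GHz, 1.36 insn per cycle, 0.039 M/sec page faults, 832.559 M/sec branches and 2.98% branch misses.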

TIMINGS
-------
As shown in the example above, perf stat can display three types of timings.
The time the counters were enabled/alive is always displayed:

        83.409183620 seconds time elapsed

For workload sessions, the time the workload spent in user and system
mode is also displayed:

        74.684747000 seconds user
         8.739217000 seconds sys

These are the same times as displayed by the 'time' tool.
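
One way to obtain comparable figures on a Unix-like system is sketched below in Python: wall-clock time around the child, plus the child's user and system CPU time from getrusage(RUSAGE_CHILDREN). This illustrates the kind of accounting 'time' reports; it is not taken from perf's source.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv; return (elapsed, user, sys) in seconds, in the spirit of time(1).

    Unix-only: user/sys come from getrusage(RUSAGE_CHILDREN), which accounts only
    for children that have already been waited for.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, check=True)  # waits for the child to exit
    elapsed = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (elapsed,
            after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


if __name__ == "__main__":
    elapsed, user, system = time_command([sys.executable, "-c", "sum(range(10**6))"])
    print(f"{elapsed:.9f} seconds time elapsed")
    print(f"{user:.9f} seconds user")
    print(f"{system:.9f} seconds sys")
```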

CSV FORMAT
----------

With -x, perf stat is able to output a not-quite-CSV format.
Commas in the output are not put into "". To make it easy to parse,
it is recommended to use a different character, like -x \;

The fields are in this order:

- optional usec time stamp in fractions of second (with -I xxx)
- optional CPU, core, or socket identifier
- optional number of logical CPUs aggregated
- counter value
- unit of the counter value or empty
- event name
- run time of counter
- percentage of measurement time the counter was running
- optional variance if multiple values are collected with -r
- optional metric value
- optional unit of metric

Additional metrics may be printed with all earlier fields being empty.
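
For scripting, the field order above can be turned into a small parser. The Python sketch below handles only the basic case without -I, without a CPU/core/socket identifier or CPU count, and without -r. It splits on ';' by default and leaves non-numeric values (such as empty fields) as strings. The field names are chosen for illustration.

```python
# Field order for the basic case: no -I time stamp, no CPU/core/socket
# identifier or CPU count, and no -r variance column.
FIELDS = ["value", "unit", "event", "run_time", "percent_running",
          "metric_value", "metric_unit"]
NUMERIC = {"value", "run_time", "percent_running", "metric_value"}


def _number(text):
    """Convert numeric-looking fields; keep empty or placeholder text unchanged."""
    try:
        return float(text)
    except ValueError:
        return text


def parse_stat_line(line, sep=";"):
    """Split one line of 'perf stat -x SEP' output into a dict of named fields."""
    parts = line.rstrip("\n").split(sep)
    record = {}
    for name, text in zip(FIELDS, parts):  # trailing optional fields may be missing
        record[name] = _number(text) if name in NUMERIC else text
    return record
```

For instance, a line shaped like `229570665834;;cycles:u;83723452481;100.00;2.742;GHz` (invented to follow the field order, not captured output) yields a value of 229570665834.0, an empty unit, the event `cycles:u` and a metric unit of `GHz`.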

SEE ALSO
--------
linkperf:perf-top[1], linkperf:perf-list[1]