blob: cc4c5fc313df304b48f425490c9a231e44af2688 [file] [log] [blame]
Changbin Du263ee772018-02-17 13:39:39 +08001==========================
2Kprobe-based Event Tracing
3==========================
Masami Hiramatsud8ec9182009-08-19 21:13:57 +02004
Changbin Du263ee772018-02-17 13:39:39 +08005:Author: Masami Hiramatsu
Masami Hiramatsud8ec9182009-08-19 21:13:57 +02006
7Overview
8--------
Masami Hiramatsu77b44d12009-11-03 19:12:47 -05009These events are similar to tracepoint based events. Instead of Tracepoint,
10this is based on kprobes (kprobe and kretprobe). So it can probe wherever
Naveen N. Raoc1ac0942017-03-08 22:34:14 +053011kprobes can probe (this means, all functions except those with
12__kprobes/nokprobe_inline annotation and those marked NOKPROBE_SYMBOL).
13Unlike the Tracepoint based event, this can be added and removed
Masami Hiramatsu77b44d12009-11-03 19:12:47 -050014dynamically, on the fly.
Masami Hiramatsud8ec9182009-08-19 21:13:57 +020015
Anton Blanchard6b0b7552017-02-16 17:00:50 +110016To enable this feature, build your kernel with CONFIG_KPROBE_EVENTS=y.
Masami Hiramatsud8ec9182009-08-19 21:13:57 +020017
Masami Hiramatsu77b44d12009-11-03 19:12:47 -050018Similar to the events tracer, this doesn't need to be activated via
19current_tracer. Instead of that, add probe points via
20/sys/kernel/debug/tracing/kprobe_events, and enable it via
Andreas Zieglere50891d2018-07-16 13:05:57 +020021/sys/kernel/debug/tracing/events/kprobes/<EVENT>/enable.
Masami Hiramatsud8ec9182009-08-19 21:13:57 +020022
Masami Hiramatsu6212dd22018-11-05 18:02:36 +090023You can also use /sys/kernel/debug/tracing/dynamic_events instead of
24kprobe_events. That interface will provide unified access to other
25dynamic events too.
Masami Hiramatsud8ec9182009-08-19 21:13:57 +020026
27Synopsis of kprobe_events
28-------------------------
Changbin Du263ee772018-02-17 13:39:39 +080029::
30
Masami Hiramatsu61424312011-06-27 16:26:56 +090031 p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS] : Set a probe
Alban Crequy696ced42017-04-03 12:36:22 +020032 r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS] : Set a return probe
Motohiro KOSAKIdf3ab702010-01-18 21:35:12 -050033 -:[GRP/]EVENT : Clear a probe
Masami Hiramatsud8ec9182009-08-19 21:13:57 +020034
Masami Hiramatsuf52487e2009-09-10 19:53:53 -040035 GRP : Group name. If omitted, use "kprobes" for it.
Masami Hiramatsu2fba0c82009-09-10 19:53:14 -040036 EVENT : Event name. If omitted, the event name is generated
Masami Hiramatsu61424312011-06-27 16:26:56 +090037 based on SYM+offs or MEMADDR.
38 MOD : Module name which has given SYM.
39 SYM[+offs] : Symbol+offset where the probe is inserted.
Masami Hiramatsu2fba0c82009-09-10 19:53:14 -040040 MEMADDR : Address where the probe is inserted.
Alban Crequy696ced42017-04-03 12:36:22 +020041 MAXACTIVE : Maximum number of instances of the specified function that
42 can be probed simultaneously, or 0 for the default value
43 as defined in Documentation/kprobes.txt section 1.3.1.
Masami Hiramatsud8ec9182009-08-19 21:13:57 +020044
Masami Hiramatsu2fba0c82009-09-10 19:53:14 -040045 FETCHARGS : Arguments. Each probe can have up to 128 args.
Masami Hiramatsu2e06ff62009-10-07 18:27:59 -040046 %REG : Fetch register REG
47 @ADDR : Fetch memory at ADDR (ADDR should be in kernel)
Masami Hiramatsud8ec9182009-08-19 21:13:57 +020048 @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
Masami Hiramatsu2e06ff62009-10-07 18:27:59 -040049 $stackN : Fetch Nth entry of stack (N >= 0)
50 $stack : Fetch stack address.
Masami Hiramatsua1303af2018-04-25 21:21:26 +090051 $argN : Fetch the Nth function argument. (N >= 1) (\*1)
52 $retval : Fetch return value.(\*2)
Omar Sandoval35abb672016-06-08 18:38:02 -070053 $comm : Fetch current task comm.
Masami Hiramatsue65f7ae2019-05-15 14:38:42 +090054 +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
Masami Hiramatsu6218bf92019-06-20 00:08:27 +090055 \IMM : Store an immediate value to the argument.
Masami Hiramatsu93ccae72010-04-12 13:17:08 -040056 NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
57 FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
Masami Hiramatsu17ce3dc2016-08-18 17:57:50 +090058 (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
Masami Hiramatsu88903c42019-05-15 14:38:30 +090059 (x8/x16/x32/x64), "string", "ustring" and bitfield
60 are supported.
Masami Hiramatsud8ec9182009-08-19 21:13:57 +020061
Masami Hiramatsua1303af2018-04-25 21:21:26 +090062 (\*1) only for the probe on function entry (offs == 0).
63 (\*2) only for return probe.
64 (\*3) this is useful for fetching a field of data structures.
Masami Hiramatsue65f7ae2019-05-15 14:38:42 +090065 (\*4) "u" means user-space dereference. See :ref:`user_mem_access`.
Masami Hiramatsud8ec9182009-08-19 21:13:57 +020066
Masami Hiramatsu1ff511e2011-02-04 21:52:05 +090067Types
68-----
69Several types are supported for fetch-args. Kprobe tracer will access memory
70by given type. Prefix 's' and 'u' means those types are signed and unsigned
Masami Hiramatsubdca79c22016-08-18 17:59:21 +090071respectively. 'x' prefix implies it is unsigned. Traced arguments are shown
72in decimal ('s' and 'u') or hexadecimal ('x'). Without type casting, 'x32'
73or 'x64' is used depends on the architecture (e.g. x86-32 uses x32, and
74x86-64 uses x64).
Masami Hiramatsu40b53b72018-04-25 21:21:55 +090075These value types can be an array. To record array data, you can add '[N]'
76(where N is a fixed number, less than 64) to the base type.
77E.g. 'x16[4]' means an array of x16 (2bytes hex) with 4 elements.
78Note that the array can be applied to memory type fetchargs, you can not
79apply it to registers/stack-entries etc. (for example, '$stack1:x8[8]' is
80wrong, but '+8($stack):x8[8]' is OK.)
Masami Hiramatsu1ff511e2011-02-04 21:52:05 +090081String type is a special type, which fetches a "null-terminated" string from
82kernel space. This means it will fail and store NULL if the string container
Masami Hiramatsu88903c42019-05-15 14:38:30 +090083has been paged out. "ustring" type is an alternative of string for user-space.
Masami Hiramatsue65f7ae2019-05-15 14:38:42 +090084See :ref:`user_mem_access` for more info..
Masami Hiramatsu40b53b72018-04-25 21:21:55 +090085The string array type is a bit different from other types. For other base
86types, <base-type>[1] is equal to <base-type> (e.g. +0(%di):x32[1] is same
87as +0(%di):x32.) But string[1] is not equal to string. The string type itself
88represents "char array", but string array type represents "char * array".
89So, for example, +0(%di):string[1] is equal to +0(+0(%di)):string.
Masami Hiramatsu1ff511e2011-02-04 21:52:05 +090090Bitfield is another special type, which takes 3 parameters, bit-width, bit-
Changbin Du263ee772018-02-17 13:39:39 +080091offset, and container-size (usually 32). The syntax is::
Masami Hiramatsu1ff511e2011-02-04 21:52:05 +090092
93 b<bit-width>@<bit-offset>/<container-size>
94
Masami Hiramatsu60c2e0c2018-04-25 21:20:28 +090095Symbol type('symbol') is an alias of u32 or u64 type (depends on BITS_PER_LONG)
96which shows given pointer in "symbol+offset" style.
Omar Sandoval35abb672016-06-08 18:38:02 -070097For $comm, the default type is "string"; any other type is invalid.
98
Masami Hiramatsue65f7ae2019-05-15 14:38:42 +090099.. _user_mem_access:
Sameer Rahmaniff1e81a2020-02-03 20:15:43 +0000100
Masami Hiramatsue65f7ae2019-05-15 14:38:42 +0900101User Memory Access
102------------------
103Kprobe events supports user-space memory access. For that purpose, you can use
104either user-space dereference syntax or 'ustring' type.
105
106The user-space dereference syntax allows you to access a field of a data
107structure in user-space. This is done by adding the "u" prefix to the
108dereference syntax. For example, +u4(%si) means it will read memory from the
109address in the register %si offset by 4, and the memory is expected to be in
110user-space. You can use this for strings too, e.g. +u0(%si):string will read
111a string from the address in the register %si that is expected to be in user-
112space. 'ustring' is a shortcut way of performing the same task. That is,
113+0(%si):ustring is equivalent to +u0(%si):string.
114
115Note that kprobe-event provides the user-memory access syntax but it doesn't
116use it transparently. This means if you use normal dereference or string type
117for user memory, it might fail, and may always fail on some archs. The user
118has to carefully check if the target data is in kernel or user space.
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200119
120Per-Probe Event Filtering
121-------------------------
Changbin Du263ee772018-02-17 13:39:39 +0800122Per-probe event filtering feature allows you to set different filter on each
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200123probe and gives you what arguments will be shown in trace buffer. If an event
Masami Hiramatsu77b44d12009-11-03 19:12:47 -0500124name is specified right after 'p:' or 'r:' in kprobe_events, it adds an event
125under tracing/events/kprobes/<EVENT>, at the directory you can see 'id',
Andreas Ziegler31130c82018-07-16 13:05:58 +0200126'enable', 'format', 'filter' and 'trigger'.
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200127
Andreas Zieglere50891d2018-07-16 13:05:57 +0200128enable:
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200129 You can enable/disable the probe by writing 1 or 0 on it.
130
131format:
Masami Hiramatsueca0d912009-09-10 19:53:38 -0400132 This shows the format of this probe event.
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200133
134filter:
Masami Hiramatsueca0d912009-09-10 19:53:38 -0400135 You can write filtering rules of this event.
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200136
Masami Hiramatsue08d1c62009-09-10 19:53:30 -0400137id:
138 This shows the id of this probe event.
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200139
Andreas Ziegler31130c82018-07-16 13:05:58 +0200140trigger:
141 This allows to install trigger commands which are executed when the event is
142 hit (for details, see Documentation/trace/events.rst, section 6).
Masami Hiramatsu77b44d12009-11-03 19:12:47 -0500143
Masami Hiramatsucd7e7bd2009-08-13 16:35:42 -0400144Event Profiling
145---------------
Changbin Du263ee772018-02-17 13:39:39 +0800146You can check the total number of probe hits and probe miss-hits via
Masami Hiramatsucd7e7bd2009-08-13 16:35:42 -0400147/sys/kernel/debug/tracing/kprobe_profile.
Changbin Du263ee772018-02-17 13:39:39 +0800148The first column is event name, the second is the number of probe hits,
Masami Hiramatsucd7e7bd2009-08-13 16:35:42 -0400149the third is the number of probe miss-hits.
150
Masami Hiramatsu970988e2019-05-22 17:32:35 +0900151Kernel Boot Parameter
152---------------------
153You can add and enable new kprobe events when booting up the kernel by
154"kprobe_event=" parameter. The parameter accepts a semicolon-delimited
155kprobe events, which format is similar to the kprobe_events.
156The difference is that the probe definition parameters are comma-delimited
157instead of space. For example, adding myprobe event on do_sys_open like below
158
159 p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)
160
161should be below for kernel boot parameter (just replace spaces with comma)
162
163 p:myprobe,do_sys_open,dfd=%ax,filename=%dx,flags=%cx,mode=+4($stack)
164
Masami Hiramatsucd7e7bd2009-08-13 16:35:42 -0400165
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200166Usage examples
167--------------
168To add a probe as a new event, write a new definition to kprobe_events
Changbin Du263ee772018-02-17 13:39:39 +0800169as below::
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200170
Motohiro KOSAKI580d9e02010-01-18 21:35:05 -0500171 echo 'p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)' > /sys/kernel/debug/tracing/kprobe_events
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200172
Changbin Du263ee772018-02-17 13:39:39 +0800173This sets a kprobe on the top of do_sys_open() function with recording
Masami Hiramatsu14640102010-01-05 17:46:48 -05001741st to 4th arguments as "myprobe" event. Note, which register/stack entry is
175assigned to each function argument depends on arch-specific ABI. If you unsure
176the ABI, please try to use probe subcommand of perf-tools (you can find it
177under tools/perf/).
178As this example shows, users can choose more familiar names for each arguments.
Changbin Du263ee772018-02-17 13:39:39 +0800179::
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200180
Motohiro KOSAKI580d9e02010-01-18 21:35:05 -0500181 echo 'r:myretprobe do_sys_open $retval' >> /sys/kernel/debug/tracing/kprobe_events
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200182
Changbin Du263ee772018-02-17 13:39:39 +0800183This sets a kretprobe on the return point of do_sys_open() function with
Masami Hiramatsu99329c42009-10-07 18:27:48 -0400184recording return value as "myretprobe" event.
Changbin Du263ee772018-02-17 13:39:39 +0800185You can see the format of these events via
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200186/sys/kernel/debug/tracing/events/kprobes/<EVENT>/format.
Changbin Du263ee772018-02-17 13:39:39 +0800187::
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200188
189 cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format
Changbin Du263ee772018-02-17 13:39:39 +0800190 name: myprobe
191 ID: 780
192 format:
193 field:unsigned short common_type; offset:0; size:2; signed:0;
194 field:unsigned char common_flags; offset:2; size:1; signed:0;
195 field:unsigned char common_preempt_count; offset:3; size:1;signed:0;
196 field:int common_pid; offset:4; size:4; signed:1;
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200197
Changbin Du263ee772018-02-17 13:39:39 +0800198 field:unsigned long __probe_ip; offset:12; size:4; signed:0;
199 field:int __probe_nargs; offset:16; size:4; signed:1;
200 field:unsigned long dfd; offset:20; size:4; signed:0;
201 field:unsigned long filename; offset:24; size:4; signed:0;
202 field:unsigned long flags; offset:28; size:4; signed:0;
203 field:unsigned long mode; offset:32; size:4; signed:0;
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200204
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200205
Changbin Du263ee772018-02-17 13:39:39 +0800206 print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->__probe_ip,
207 REC->dfd, REC->filename, REC->flags, REC->mode
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200208
Changbin Du263ee772018-02-17 13:39:39 +0800209You can see that the event has 4 arguments as in the expressions you specified.
210::
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200211
212 echo > /sys/kernel/debug/tracing/kprobe_events
213
Changbin Du263ee772018-02-17 13:39:39 +0800214This clears all probe points.
Masami Hiramatsu5a0d9052009-09-14 16:49:37 -0400215
Changbin Du263ee772018-02-17 13:39:39 +0800216Or,
217::
Motohiro KOSAKIdf3ab702010-01-18 21:35:12 -0500218
219 echo -:myprobe >> kprobe_events
220
Changbin Du263ee772018-02-17 13:39:39 +0800221This clears probe points selectively.
Motohiro KOSAKIdf3ab702010-01-18 21:35:12 -0500222
Changbin Du263ee772018-02-17 13:39:39 +0800223Right after definition, each event is disabled by default. For tracing these
Masami Hiramatsu5a0d9052009-09-14 16:49:37 -0400224events, you need to enable it.
Changbin Du263ee772018-02-17 13:39:39 +0800225::
Masami Hiramatsu5a0d9052009-09-14 16:49:37 -0400226
227 echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
228 echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable
229
Lecopzer Chen78a89462019-05-09 18:31:16 +0800230Use the following command to start tracing in an interval.
231::
Mauro Carvalho Chehab7e6294c2019-06-18 15:51:18 -0300232
Lecopzer Chen78a89462019-05-09 18:31:16 +0800233 # echo 1 > tracing_on
234 Open something...
235 # echo 0 > tracing_on
236
Changbin Du263ee772018-02-17 13:39:39 +0800237And you can see the traced information via /sys/kernel/debug/tracing/trace.
238::
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200239
240 cat /sys/kernel/debug/tracing/trace
Changbin Du263ee772018-02-17 13:39:39 +0800241 # tracer: nop
242 #
243 # TASK-PID CPU# TIMESTAMP FUNCTION
244 # | | | | |
245 <...>-1447 [001] 1038282.286875: myprobe: (do_sys_open+0x0/0xd6) dfd=3 filename=7fffd1ec4440 flags=8000 mode=0
246 <...>-1447 [001] 1038282.286878: myretprobe: (sys_openat+0xc/0xe <- do_sys_open) $retval=fffffffffffffffe
247 <...>-1447 [001] 1038282.286885: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=40413c flags=8000 mode=1b6
248 <...>-1447 [001] 1038282.286915: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3
249 <...>-1447 [001] 1038282.286969: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=4041c6 flags=98800 mode=10
250 <...>-1447 [001] 1038282.286976: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200251
252
Changbin Du263ee772018-02-17 13:39:39 +0800253Each line shows when the kernel hits an event, and <- SYMBOL means kernel
Masami Hiramatsud8ec9182009-08-19 21:13:57 +0200254returns from SYMBOL(e.g. "sys_open+0x1b/0x1d <- do_sys_open" means kernel
255returns from do_sys_open to sys_open+0x1b).