=================================
Using ftrace to hook to functions
=================================

.. Copyright 2017 VMware Inc.
.. Author: Steven Rostedt <srostedt@goodmis.org>
.. License: The GNU Free Documentation License, Version 1.2
..  (dual licensed under the GPL v2)

Written for: 4.14

Introduction
============

The ftrace infrastructure was originally created to attach callbacks to the
beginning of functions in order to record and trace the flow of the kernel.
But callbacks to the start of a function can have other use cases as well,
such as live kernel patching or security monitoring. This document describes
how to use ftrace to implement your own function callbacks.


The ftrace context
==================

.. warning::

   The ability to add a callback to almost any function within the
   kernel comes with risks. A callback can be called from any context
   (normal, softirq, irq, and NMI). Callbacks can also be called just before
   going to idle, during CPU bring up and takedown, or going to user space.
   This requires extra care about what can be done inside a callback. A
   callback can be called outside the protective scope of RCU.

There are helper functions to protect against recursion and to make sure
RCU is watching. These are explained below.


The ftrace_ops structure
========================

To register a function callback, a ftrace_ops is required. This structure
is used to tell ftrace which function should be called as the callback,
as well as what protections the callback will perform itself and not
require ftrace to handle.

There is only one field that is required to be set when registering
an ftrace_ops with ftrace:

.. code-block:: c

    struct ftrace_ops ops = {
        .func       = my_callback_func,
        .flags      = MY_FTRACE_FLAGS,
        .private    = any_private_data_structure,
    };

Both .flags and .private are optional. Only .func is required.

To enable tracing call::

    register_ftrace_function(&ops);

To disable tracing call::

    unregister_ftrace_function(&ops);

The above is defined by including the header::

    #include <linux/ftrace.h>

The registered callback will start being called some time after
register_ftrace_function() is called and before it returns. The exact time
that callbacks start being called is dependent upon architecture and scheduling
of services. The callback itself will have to handle any synchronization if it
must begin at an exact moment.

The unregister_ftrace_function() will guarantee that the callback is
no longer being called by functions after unregister_ftrace_function()
returns. Note that to provide this guarantee, unregister_ftrace_function()
may take some time to finish.
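
Putting the above together, a minimal module that registers a callback might
be sketched as follows. This is only a sketch: my_callback_func,
my_tracer_init, and my_tracer_exit are illustrative names, not part of the
ftrace API, and the empty callback does no work.

.. code-block:: c

    #include <linux/ftrace.h>
    #include <linux/module.h>

    /* Illustrative callback: called at the start of every traced function. */
    static void my_callback_func(unsigned long ip, unsigned long parent_ip,
                                 struct ftrace_ops *op, struct pt_regs *regs)
    {
    }

    static struct ftrace_ops ops = {
        .func = my_callback_func,
    };

    static int __init my_tracer_init(void)
    {
        return register_ftrace_function(&ops);
    }

    static void __exit my_tracer_exit(void)
    {
        unregister_ftrace_function(&ops);
    }

    module_init(my_tracer_init);
    module_exit(my_tracer_exit);
    MODULE_LICENSE("GPL");

Note that without a filter (see below), the callback is attached to nearly
every function in the kernel.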


The callback function
=====================

The prototype of the callback function is as follows (as of v4.14):

.. code-block:: c

    void callback_func(unsigned long ip, unsigned long parent_ip,
                       struct ftrace_ops *op, struct pt_regs *regs);

@ip
    This is the instruction pointer of the function that is being traced.
    (where the fentry or mcount is within the function)

@parent_ip
    This is the instruction pointer of the function that called the
    function being traced (where the call of the function occurred).

@op
    This is a pointer to ftrace_ops that was used to register the callback.
    This can be used to pass data to the callback via the private pointer.

@regs
    If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
    flags are set in the ftrace_ops structure, then this will be pointing
    to the pt_regs structure like it would be if a breakpoint was placed
    at the start of the function where ftrace was tracing. Otherwise it
    either contains garbage, or NULL.

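As an example of using @op, a callback can reach data handed to it through
the .private pointer of its ftrace_ops. The my_data structure below is
illustrative, not part of the ftrace API:

.. code-block:: c

    struct my_data {
        atomic_t hits;
    };

    static struct my_data my_data;

    /* Count how many times any traced function was entered. */
    static void my_callback_func(unsigned long ip, unsigned long parent_ip,
                                 struct ftrace_ops *op, struct pt_regs *regs)
    {
        struct my_data *data = op->private;

        atomic_inc(&data->hits);
    }

    static struct ftrace_ops ops = {
        .func       = my_callback_func,
        .private    = &my_data,
    };
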
Protect your callback
=====================

Because the callback can be called from almost anywhere, and because a
function called from within the callback may itself be traced and call that
same callback again, recursion protection must be used. There are two helper
functions that can help in this regard. If you start your code with:

.. code-block:: c

    int bit;

    bit = ftrace_test_recursion_trylock(ip, parent_ip);
    if (bit < 0)
        return;

and end it with:

.. code-block:: c

    ftrace_test_recursion_unlock(bit);

then the code in between will be safe to use, even if it ends up calling a
function that the callback is tracing. Note, on success,
ftrace_test_recursion_trylock() will disable preemption, and
ftrace_test_recursion_unlock() will enable it again (if it was previously
enabled). The instruction pointer (ip) and its parent (parent_ip) are passed
to ftrace_test_recursion_trylock() to record where the recursion happened
(if CONFIG_FTRACE_RECORD_RECURSION is set).

Alternatively, if the FTRACE_OPS_FL_RECURSION flag is set on the ftrace_ops
(as explained below), then a helper trampoline will be used to test
for recursion for the callback and no recursion test needs to be done.
But this comes at the expense of slightly more overhead from an extra
function call.

If your callback accesses any data or critical section that requires RCU
protection, it is best to make sure that RCU is "watching", otherwise
that data or critical section will not be protected as expected. In this
case add:

.. code-block:: c

    if (!rcu_is_watching())
        return;

Alternatively, if the FTRACE_OPS_FL_RCU flag is set on the ftrace_ops
(as explained below), then a helper trampoline will be used to test
rcu_is_watching for the callback and no other test needs to be done.
But this comes at the expense of slightly more overhead from an extra
function call.

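Putting both checks together, a callback that does its own protection might
be sketched as follows. The body is illustrative, and the ordering shown
(take the recursion lock first, then check RCU) is one reasonable choice,
not a requirement:

.. code-block:: c

    static void my_callback_func(unsigned long ip, unsigned long parent_ip,
                                 struct ftrace_ops *op, struct pt_regs *regs)
    {
        int bit;

        bit = ftrace_test_recursion_trylock(ip, parent_ip);
        if (bit < 0)
            return;

        if (!rcu_is_watching())
            goto out;

        /* Safe region: may call traced functions and use RCU here. */
    out:
        /* Must unlock whenever the trylock succeeded. */
        ftrace_test_recursion_unlock(bit);
    }
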

The ftrace FLAGS
================

The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
Some of the flags are used for the internal infrastructure of ftrace, but the
ones that users should be aware of are the following:

FTRACE_OPS_FL_SAVE_REGS
    If the callback requires reading or modifying the pt_regs
    passed to the callback, then it must set this flag. Registering
    a ftrace_ops with this flag set on an architecture that does not
    support passing of pt_regs to the callback will fail.

FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
    Similar to SAVE_REGS but the registering of a
    ftrace_ops on an architecture that does not support passing of regs
    will not fail with this flag set. But the callback must check if
    regs is NULL or not to determine if the architecture supports it.

FTRACE_OPS_FL_RECURSION
    By default, it is expected that the callback can handle recursion.
    But if the callback is not that worried about overhead, then
    setting this bit will add the recursion protection around the
    callback by calling a helper function that will do the recursion
    protection and only call the callback if it did not recurse.

    Note, if this flag is not set, and recursion does occur, it could
    cause the system to crash, and possibly reboot via a triple fault.

    Note, if this flag is set, then the callback will always be called
    with preemption disabled. If it is not set, then it is possible
    (but not guaranteed) that the callback will be called in
    preemptable context.

FTRACE_OPS_FL_IPMODIFY
    Requires FTRACE_OPS_FL_SAVE_REGS set. If the callback is to "hijack"
    the traced function (have another function called instead of the
    traced function), it requires setting this flag. This is what live
    kernel patching uses. Without this flag the pt_regs->ip cannot be
    modified.

    Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be
    registered to any given function at a time.

FTRACE_OPS_FL_RCU
    If this is set, then the callback will only be called by functions
    where RCU is "watching". This is required if the callback function
    performs any rcu_read_lock() operation.

    RCU stops watching when the system goes idle, when a CPU is taken
    down and comes back online, and when entering from kernel to user
    space and back to kernel space. During these transitions, a callback
    may be executed and RCU synchronization will not protect it.

FTRACE_OPS_FL_PERMANENT
    If this is set on any ftrace ops, then the tracing cannot be disabled
    by writing 0 to the proc sysctl ftrace_enabled. Equally, a callback
    with the flag set cannot be registered if ftrace_enabled is 0.

    Livepatch uses it to avoid losing the function redirection, so the
    system stays protected.

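As an example of how the SAVE_REGS flags interact with the callback, a
callback registered with FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED has to check
@regs before using it. A sketch (the callback body is illustrative):

.. code-block:: c

    static void my_callback_func(unsigned long ip, unsigned long parent_ip,
                                 struct ftrace_ops *op, struct pt_regs *regs)
    {
        if (!regs)
            return; /* This architecture did not supply pt_regs. */

        /* regs may be read here (modifying ip requires IPMODIFY). */
    }

    static struct ftrace_ops ops = {
        .func   = my_callback_func,
        .flags  = FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED,
    };
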

Filtering which functions to trace
==================================

If a callback is only to be called from specific functions, a filter must be
set up. The filters are added by name, or ip if it is known.

.. code-block:: c

    int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
                          int len, int reset);

@ops
    The ops to set the filter with

@buf
    The string that holds the function filter text.

@len
    The length of the string.

@reset
    Non-zero to reset all filters before applying this filter.

Filters denote which functions should be enabled when tracing is enabled.
If @buf is NULL and reset is set, all functions will be enabled for tracing.

The @buf can also be a glob expression to enable all functions that
match a specific pattern.

See Filter Commands in :file:`Documentation/trace/ftrace.rst`.

To just trace the schedule function:

.. code-block:: c

    ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0);

To add more functions, call ftrace_set_filter() more than once with the
@reset parameter set to zero. To remove the current filter set and replace it
with new functions defined by @buf, have @reset be non-zero.

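For instance, to have the callback called for both schedule and
try_to_wake_up, the second call keeps @reset at zero so the first filter is
preserved (error handling of ret is omitted for brevity):

.. code-block:: c

    ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);
    ret = ftrace_set_filter(&ops, "try_to_wake_up",
                            strlen("try_to_wake_up"), 0);
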
To remove all the filtered functions and trace all functions:

.. code-block:: c

    ret = ftrace_set_filter(&ops, NULL, 0, 1);


Sometimes more than one function has the same name. To trace just a specific
function in this case, ftrace_set_filter_ip() can be used.

.. code-block:: c

    ret = ftrace_set_filter_ip(&ops, ip, 0, 0);

Note that the ip must be the address where the call to fentry or mcount is
located in the function. This function is used by perf and kprobes, which
get the ip address from the user (usually using debug info from the kernel).

If a glob is used to set the filter, functions can be added to a "notrace"
list that will prevent those functions from calling the callback.
The "notrace" list takes precedence over the "filter" list. If the
two lists are non-empty and contain the same functions, the callback will not
be called by any function.

An empty "notrace" list means to allow all functions defined by the filter
to be traced.

.. code-block:: c

    int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
                           int len, int reset);

This takes the same parameters as ftrace_set_filter() but adds the functions
it finds to the "notrace" list, so they are not traced. This is a separate
list from the filter list, and this function does not modify the filter list.

A non-zero @reset will clear the "notrace" list before adding functions
that match @buf to it.

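For example, to trace a set of functions matching a glob except one of them,
a filter and a notrace entry can be combined (the patterns here are
illustrative):

.. code-block:: c

    /* Trace all functions starting with "wake_up" ... */
    ret = ftrace_set_filter(&ops, "wake_up*", strlen("wake_up*"), 1);

    /* ... except wake_up_klogd(), which goes on the "notrace" list. */
    ret = ftrace_set_notrace(&ops, "wake_up_klogd",
                             strlen("wake_up_klogd"), 1);
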
Clearing the "notrace" list is the same as clearing the filter list:

.. code-block:: c

    ret = ftrace_set_notrace(&ops, NULL, 0, 1);

The filter and notrace lists may be changed at any time. If only a set of
functions should call the callback, it is best to set the filters before
registering the callback. But the changes may also happen after the callback
has been registered.

If a filter is in place, and the @reset is non-zero, and @buf contains a
matching glob to functions, the switch will happen during the time of
the ftrace_set_filter() call. At no time will all functions call the callback.

.. code-block:: c

    ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

    register_ftrace_function(&ops);

    msleep(10);

    ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1);

is not the same as:

.. code-block:: c

    ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

    register_ftrace_function(&ops);

    msleep(10);

    ftrace_set_filter(&ops, NULL, 0, 1);

    ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0);

The latter has a short window, between the reset and the setting of the new
filter, in which all functions will call the callback.