=================================
Using ftrace to hook to functions
=================================

.. Copyright 2017 VMware Inc.
..   Author:   Steven Rostedt <srostedt@goodmis.org>
..  License:   The GNU Free Documentation License, Version 1.2
..               (dual licensed under the GPL v2)

Written for: 4.14

Introduction
============

The ftrace infrastructure was originally created to attach callbacks to the
beginning of functions in order to record and trace the flow of the kernel.
But callbacks to the start of a function have other use cases as well, such
as live kernel patching and security monitoring. This document describes
how to use ftrace to implement your own function callbacks.


The ftrace context
==================
.. warning::

  The ability to add a callback to almost any function within the
  kernel comes with risks. A callback can be called from any context
  (normal, softirq, irq, and NMI). Callbacks can also be called just before
  going to idle, during CPU bring up and takedown, or going to user space.
  This requires extra care about what can be done inside a callback. A
  callback can be called outside the protective scope of RCU.

The ftrace infrastructure has some protections against recursion and RCU,
but one must still be very careful about how the callbacks are used.


The ftrace_ops structure
========================

To register a function callback, an ftrace_ops is required. This structure
is used to tell ftrace what function should be called as the callback
as well as what protections the callback will perform and not require
ftrace to handle.

Only one field needs to be set when registering an ftrace_ops with ftrace:

.. code-block:: c

    struct ftrace_ops ops = {
          .func     = my_callback_func,
          .flags    = MY_FTRACE_FLAGS,
          .private  = any_private_data_structure,
    };

Both .flags and .private are optional. Only .func is required.

To enable tracing, call:

.. c:function::  register_ftrace_function(&ops);

To disable tracing, call:

.. c:function::  unregister_ftrace_function(&ops);

The above are declared in the header:

.. c:function:: #include <linux/ftrace.h>

The registered callback will start being called some time after
register_ftrace_function() is called and before it returns. The exact time
that callbacks start being called depends on the architecture and on the
scheduling of services. The callback itself will have to handle any
synchronization if it must begin at an exact moment.

unregister_ftrace_function() guarantees that the callback is no longer
being called by functions once it returns. Note that to provide this
guarantee, unregister_ftrace_function() may take some time to finish.


The callback function
=====================

The prototype of the callback function is as follows (as of v4.14):

.. code-block:: c

   void callback_func(unsigned long ip, unsigned long parent_ip,
                      struct ftrace_ops *op, struct pt_regs *regs);

@ip
    This is the instruction pointer of the function that is being traced
    (where the fentry or mcount is within the function).

@parent_ip
    This is the instruction pointer of the function that called the
    function being traced (where the call of the function occurred).

@op
    This is a pointer to the ftrace_ops that was used to register the
    callback. This can be used to pass data to the callback via the
    private pointer.

@regs
    If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
    flags are set in the ftrace_ops structure, then this will be pointing
    to the pt_regs structure like it would be if a breakpoint was placed
    at the start of the function where ftrace was tracing. Otherwise it
    either contains garbage, or NULL.


The ftrace FLAGS
================

The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
Some of the flags are used for internal infrastructure of ftrace, but the
ones that users should be aware of are the following:

FTRACE_OPS_FL_SAVE_REGS
    If the callback requires reading or modifying the pt_regs
    passed to the callback, then it must set this flag. Registering
    an ftrace_ops with this flag set on an architecture that does not
    support passing of pt_regs to the callback will fail.

FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
    Similar to SAVE_REGS, but registering an ftrace_ops on an
    architecture that does not support passing of regs will not fail
    with this flag set. The callback must then check whether regs is
    NULL to determine if the architecture supports it.

FTRACE_OPS_FL_RECURSION_SAFE
    By default, a wrapper is added around the callback to
    make sure that recursion of the function does not occur. That is,
    if a function that is called as a result of the callback's execution
    is also traced, ftrace will prevent the callback from being called
    again. But this wrapper adds some overhead, and if the callback is
    safe from recursion, it can set this flag to disable the ftrace
    protection.

    Note, if this flag is set, and recursion does occur, it could cause
    the system to crash, and possibly reboot via a triple fault.

    It is OK if another callback traces a function that is called by a
    callback that is marked recursion safe. Recursion safe callbacks
    must never trace any function that is called by the callback
    itself or any nested functions that those functions call.

    If this flag is set, it is possible that the callback will also
    be called with preemption enabled (when CONFIG_PREEMPT is set),
    but this is not guaranteed.

FTRACE_OPS_FL_IPMODIFY
    Requires FTRACE_OPS_FL_SAVE_REGS set. If the callback is to "hijack"
    the traced function (have another function called instead of the
    traced function), it requires setting this flag. This is what live
    kernel patching uses. Without this flag the pt_regs->ip can not be
    modified.

    Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be
    registered to any given function at a time.

FTRACE_OPS_FL_RCU
    If this is set, then the callback will only be called by functions
    where RCU is "watching". This is required if the callback function
    performs any rcu_read_lock() operation.

    RCU stops watching when the system goes idle, the time when a CPU
    is taken down and comes back online, and when entering from kernel
    to user space and back to kernel space. During these transitions,
    a callback may be executed and RCU synchronization will not protect
    it.
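
The recursion protection described under FTRACE_OPS_FL_RECURSION_SAFE can be
pictured with a small userspace model. This is only a sketch and not kernel
code: every name in it (``trace_entry``, ``call_with_recursion_guard``,
``run_model`` and so on) is hypothetical, and a single flag stands in for
whatever state ftrace really keeps to detect recursion.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical userspace model; none of these names are kernel APIs. */
    typedef void (*callback_t)(void);

    static bool in_callback;       /* stands in for ftrace's recursion state */
    static int callback_hits;      /* how many times the callback body ran */
    static callback_t registered;  /* the single registered callback */

    /* Wrapper placed around a callback that is not marked recursion safe. */
    static void call_with_recursion_guard(void)
    {
        assert(registered != NULL);
        if (in_callback)
            return;                /* nested call from the callback: drop it */
        in_callback = true;
        registered();
        in_callback = false;
    }

    /* Entry hook that every traced function runs first. */
    static void trace_entry(void)
    {
        if (registered)
            call_with_recursion_guard();
    }

    /* A traced function that the callback itself happens to call. */
    static void traced_helper(void)
    {
        trace_entry();
    }

    static void my_callback(void)
    {
        callback_hits++;
        traced_helper();           /* would recurse forever without the guard */
    }

    /* Makes one traced call and reports how often the callback body ran. */
    int run_model(void)
    {
        registered = my_callback;
        callback_hits = 0;
        traced_helper();
        return callback_hits;
    }

In this model a traced function called from inside the callback does not
trigger the callback a second time. Dropping the guard, which is what setting
the flag amounts to, would let ``my_callback`` recurse without bound.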


Filtering which functions to trace
==================================

If a callback is only to be called from specific functions, a filter must be
set up. The filters are added by name, or ip if it is known.

.. code-block:: c

   int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
                         int len, int reset);

@ops
    The ops to set the filter with.

@buf
    The string that holds the function filter text.

@len
    The length of the string.

@reset
    Non-zero to reset all filters before applying this filter.

Filters denote which functions should be enabled when tracing is enabled.
If @buf is NULL and @reset is set, all functions will be enabled for tracing.

The @buf can also be a glob expression to enable all functions that
match a specific pattern.

See Filter Commands in :file:`Documentation/trace/ftrace.rst`.

To just trace the schedule function:

.. code-block:: c

   ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0);

To add more functions, call ftrace_set_filter() more than once with the
@reset parameter set to zero. To remove the current filter set and replace it
with new functions defined by @buf, have @reset be non-zero.

To remove all the filtered functions and trace all functions:

.. code-block:: c

   ret = ftrace_set_filter(&ops, NULL, 0, 1);


Sometimes more than one function has the same name. To trace just a specific
function in this case, ftrace_set_filter_ip() can be used.

.. code-block:: c

   ret = ftrace_set_filter_ip(&ops, ip, 0, 0);

Note that the ip must be the address where the call to fentry or mcount is
located in the function. This function is used by perf and kprobes, which
get the ip address from the user (usually using debug info from the kernel).

If a glob is used to set the filter, functions can be added to a "notrace"
list that will prevent those functions from calling the callback.
The "notrace" list takes precedence over the "filter" list. If the
two lists are non-empty and contain the same functions, the callback will not
be called by any function.

An empty "notrace" list means to allow all functions defined by the filter
to be traced.

.. code-block:: c

   int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
                          int len, int reset);

This takes the same parameters as ftrace_set_filter() but will add the
functions it finds to not be traced. This is a separate list from the
filter list, and this function does not modify the filter list.

A non-zero @reset will clear the "notrace" list before adding functions
that match @buf to it.

Clearing the "notrace" list is the same as clearing the filter list:

.. code-block:: c

   ret = ftrace_set_notrace(&ops, NULL, 0, 1);
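
How the two lists combine can be summarized by a small userspace model. It is
a sketch under simplifying assumptions, not kernel code: ``struct name_list``
and ``would_call_callback()`` are hypothetical names, and only exact function
names are compared, whereas the real lists also accept globs.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical userspace model; exact-name matching only, no globs. */
    struct name_list {
        const char **names;
        size_t count;
    };

    static bool list_contains(const struct name_list *list, const char *name)
    {
        for (size_t i = 0; i < list->count; i++)
            if (strcmp(list->names[i], name) == 0)
                return true;
        return false;
    }

    /*
     * A function calls the callback when it passes the filter list
     * (an empty filter list passes everything) and is not on the
     * notrace list. The notrace check comes last, so it always wins.
     */
    bool would_call_callback(const struct name_list *filter,
                             const struct name_list *notrace,
                             const char *name)
    {
        assert(name != NULL);
        if (filter->count != 0 && !list_contains(filter, name))
            return false;
        return !list_contains(notrace, name);
    }

With a filter list holding schedule and try_to_wake_up and a notrace list
holding schedule, only try_to_wake_up would call the callback in this model.
With both lists empty, every function would.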

The filter and notrace lists may be changed at any time. If only a set of
functions should call the callback, it is best to set the filters before
registering the callback. But the changes may also happen after the callback
has been registered.

If a filter is in place, and the @reset is non-zero, and @buf contains a
matching glob to functions, the switch will happen during the time of
the ftrace_set_filter() call. At no time will all functions call the callback.

.. code-block:: c

   ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

   register_ftrace_function(&ops);

   msleep(10);

   ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1);

is not the same as:

.. code-block:: c

   ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

   register_ftrace_function(&ops);

   msleep(10);

   ftrace_set_filter(&ops, NULL, 0, 1);

   ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0);

The latter has a short window, between the reset and the new setting of the
filter, during which all functions will call the callback.
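
The difference between the two sequences can be traced through a small
userspace model of a single filter list. This is a sketch, not kernel code:
``struct filter_model`` and ``model_set_filter()`` are hypothetical, the list
has a fixed size, and the model only records the state that is visible after
each call returns.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define MAX_NAMES 8

    /* Hypothetical userspace model of the filter list of one ops. */
    struct filter_model {
        const char *names[MAX_NAMES];
        size_t count;
        bool traced_everything;    /* set if any call left the list empty */
    };

    /* Mimics the @reset semantics described above; returns 0 on success. */
    int model_set_filter(struct filter_model *m, const char *name, bool reset)
    {
        assert(m != NULL);
        if (reset)
            m->count = 0;          /* drop the old entries first */
        if (name) {
            if (m->count == MAX_NAMES)
                return -1;
            m->names[m->count++] = name;
        }
        /* An empty filter list means every function is traced. */
        if (m->count == 0)
            m->traced_everything = true;
        return 0;
    }

Replaying the first sequence (two calls, each with a name and a non-zero
reset) never leaves the modeled list empty. Replaying the second sequence
does, because the call with a NULL name returns with an empty list before
try_to_wake_up is added.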