=================================
Using ftrace to hook to functions
=================================

.. Copyright 2017 VMware Inc.
..   Author:   Steven Rostedt <srostedt@goodmis.org>
..  License:   The GNU Free Documentation License, Version 1.2
..               (dual licensed under the GPL v2)

Written for: 4.14

Introduction
============

The ftrace infrastructure was originally created to attach callbacks to the
beginning of functions in order to record and trace the flow of the kernel.
But callbacks to the start of a function have other uses as well, such as
live kernel patching or security monitoring. This document describes how to
use ftrace to implement your own function callbacks.



The ftrace context
==================

.. warning::

  The ability to add a callback to almost any function within the
  kernel comes with risks. A callback can be called from any context
  (normal, softirq, irq, and NMI). Callbacks can also be called just before
  going to idle, during CPU bring up and takedown, or when going to user
  space. This requires extra care in what a callback does. A callback
  can be called outside the protective scope of RCU.

The ftrace infrastructure has some protections against recursion and RCU,
but users must still take great care in how they use callbacks.



The ftrace_ops structure
========================

To register a function callback, an ftrace_ops is required. This structure
tells ftrace which function to call as the callback, as well as which
protections the callback performs itself so that ftrace does not need to
handle them.

Only one field needs to be set when registering an ftrace_ops with ftrace:

.. code-block:: c

	struct ftrace_ops ops = {
		.func		= my_callback_func,
		.flags		= MY_FTRACE_FLAGS,
		.private	= any_private_data_structure,
	};

Both .flags and .private are optional. Only .func is required.

To enable tracing, call:

.. code-block:: c

	register_ftrace_function(&ops);

To disable tracing, call:

.. code-block:: c

	unregister_ftrace_function(&ops);

The above functions are declared by including the header:

.. code-block:: c

	#include <linux/ftrace.h>

The registered callback will start being called some time after
register_ftrace_function() is called and before it returns. The exact time
that callbacks start being called depends on the architecture and the
scheduling of services. If the callback must begin at an exact moment, it
has to handle that synchronization itself.
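
One way to get an exact starting point is to have the callback check a flag that the registering code sets once it is ready. The block below is a userspace model under stated assumptions, not kernel code. C11 <stdatomic.h> operations stand in for kernel primitives such as smp_store_release() and smp_load_acquire(), and the names armed, hits, gated_callback() and arm_callback() are hypothetical. Only the two address arguments of the callback are modelled.

.. code-block:: c

	#include <assert.h>
	#include <stdatomic.h>
	#include <stdbool.h>

	/* Hypothetical gate: false until the registering code is ready. */
	static atomic_bool armed = false;
	static atomic_ulong hits = 0;

	/* Shaped like the ip/parent_ip part of an ftrace callback. */
	static void gated_callback(unsigned long ip, unsigned long parent_ip)
	{
		(void)ip;
		(void)parent_ip;

		/* Calls arriving between registration and arming are ignored. */
		if (!atomic_load_explicit(&armed, memory_order_acquire))
			return;
		atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
	}

	/* Called by the registering code once it is ready, for example
	 * after register_ftrace_function() has returned. */
	static void arm_callback(void)
	{
		bool was_armed = atomic_exchange_explicit(&armed, true,
							  memory_order_release);

		assert(!was_armed);	/* arming twice is a bug in this sketch */
	}

Calls that arrive after registration but before arm_callback() return early. As a result, the callback's real work starts at a point chosen by the caller rather than by ftrace.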

unregister_ftrace_function() guarantees that the callback is no longer
called by any function once unregister_ftrace_function() returns. To
provide this guarantee, unregister_ftrace_function() may take some time
to finish.


The callback function
=====================

The prototype of the callback function is as follows (as of v4.14):

.. code-block:: c

	void callback_func(unsigned long ip, unsigned long parent_ip,
	                   struct ftrace_ops *op, struct pt_regs *regs);

@ip
	This is the instruction pointer of the function that is being traced
	(where the fentry or mcount call is located within the function).

@parent_ip
	This is the instruction pointer of the function that called the
	function being traced (where the call of the function occurred).

@op
	This is a pointer to the ftrace_ops that was used to register the
	callback. It can be used to pass data to the callback via the
	private pointer.

@regs
	If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
	flags are set in the ftrace_ops structure, then this points to the
	pt_regs structure as it would be if a breakpoint were placed at the
	start of the function being traced. Otherwise, it contains either
	garbage or NULL.
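
The following userspace sketch shows how data attached to .private reaches the callback through @op. It also shows how a callback that may be handed a NULL @regs guards against it. struct mock_ftrace_ops, struct hit_counter, count_hits() and mock_fentry_hit() are hypothetical stand-ins. The real struct ftrace_ops in include/linux/ftrace.h has more fields, and the real call comes from the fentry/mcount hook in the traced function, not from a helper.

.. code-block:: c

	#include <assert.h>
	#include <stddef.h>

	/* Hypothetical stand-ins for the kernel types. */
	struct pt_regs;			/* opaque in this sketch */
	struct mock_ftrace_ops;

	typedef void (*mock_ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
					   struct mock_ftrace_ops *op,
					   struct pt_regs *regs);

	struct mock_ftrace_ops {
		mock_ftrace_func_t func;
		unsigned long flags;
		void *private;
	};

	/* Hypothetical per-callback state reached through op->private. */
	struct hit_counter {
		unsigned long hits;
		unsigned long last_ip;
		unsigned long last_parent_ip;
		unsigned long with_regs;
	};

	static void count_hits(unsigned long ip, unsigned long parent_ip,
			       struct mock_ftrace_ops *op, struct pt_regs *regs)
	{
		struct hit_counter *hc = op->private;

		hc->hits++;
		hc->last_ip = ip;
		hc->last_parent_ip = parent_ip;
		/* With SAVE_REGS_IF_SUPPORTED, regs may be NULL: check first. */
		if (regs)
			hc->with_regs++;
	}

	/* Stands in for the fentry/mcount hook firing in a traced function. */
	static void mock_fentry_hit(struct mock_ftrace_ops *op, unsigned long ip,
				    unsigned long parent_ip, struct pt_regs *regs)
	{
		assert(op != NULL && op->func != NULL);
		op->func(ip, parent_ip, op, regs);
	}

The sketch is single-threaded. A real callback may run on several CPUs and in NMI context at the same time, so a plain counter like hits would need per-CPU or atomic updates.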


The ftrace FLAGS
================

The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
Some of the flags are used internally by ftrace, but the ones that users
should be aware of are the following:

FTRACE_OPS_FL_SAVE_REGS
	If the callback requires reading or modifying the pt_regs
	passed to the callback, then it must set this flag. Registering
	an ftrace_ops with this flag set on an architecture that does not
	support passing pt_regs to the callback will fail.

FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
	Similar to SAVE_REGS, but registering an ftrace_ops with this flag
	set will not fail on an architecture that does not support passing
	regs. Instead, the callback must check whether regs is NULL to
	determine whether the architecture supports it.

FTRACE_OPS_FL_RECURSION_SAFE
	By default, a wrapper is added around the callback to make sure
	that recursion of the function does not occur. That is, if a
	function that is called as a result of the callback's execution
	is also traced, ftrace will prevent the callback from being called
	again. But this wrapper adds some overhead, and if the callback is
	safe from recursion, it can set this flag to disable the ftrace
	protection.

	Note: if this flag is set and recursion does occur, it could cause
	the system to crash, and possibly reboot via a triple fault.

	It is OK if another callback traces a function that is called by a
	callback that is marked recursion safe. Recursion safe callbacks
	must never trace any function that is called by the callback
	itself, or any nested function that those functions call.

	If this flag is set, it is possible that the callback will also
	be called with preemption enabled (when CONFIG_PREEMPT is set),
	but this is not guaranteed.

FTRACE_OPS_FL_IPMODIFY
	Requires FTRACE_OPS_FL_SAVE_REGS to be set. If the callback is to
	"hijack" the traced function (have another function called instead
	of the traced function), it requires setting this flag. This is
	what live kernel patching uses. Without this flag, pt_regs->ip
	cannot be modified.

	Note: only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be
	registered to any given function at a time.

FTRACE_OPS_FL_RCU
	If this is set, then the callback will only be called by functions
	where RCU is "watching". This is required if the callback function
	performs any rcu_read_lock() operation.

	RCU stops watching when the system goes idle, while a CPU is being
	taken down and brought back online, and while transitioning from
	kernel to user space and back to kernel space. During these
	transitions, a callback may be executed and RCU synchronization
	will not protect it.
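
The default protection described under FTRACE_OPS_FL_RECURSION_SAFE can be pictured with the following userspace sketch. It is a simplified model rather than ftrace's implementation. The kernel tracks recursion per CPU and per context (normal, softirq, irq, and NMI), while this single-threaded sketch uses one flag. guarded_call(), traced_helper() and my_cb() are hypothetical names.

.. code-block:: c

	#include <assert.h>
	#include <stdbool.h>

	typedef void (*cb_t)(unsigned long ip);

	/* One flag here; per CPU and per context in the kernel. */
	static bool in_callback;

	/* Sketch of the wrapper used when RECURSION_SAFE is not set.
	 * Returns false if the call was dropped because of recursion. */
	static bool guarded_call(cb_t cb, unsigned long ip)
	{
		assert(cb != NULL);
		if (in_callback)
			return false;
		in_callback = true;
		cb(ip);
		in_callback = false;
		return true;
	}

	static cb_t active_cb;		/* callback attached to traced_helper() */
	static int cb_runs, dropped;

	/* A traced function that the callback itself happens to call; its
	 * simulated fentry hook fires the callback again. */
	static void traced_helper(void)
	{
		if (!guarded_call(active_cb, 0x2000))
			dropped++;
	}

	static void my_cb(unsigned long ip)
	{
		(void)ip;
		cb_runs++;
		traced_helper();	/* would recurse forever without the guard */
	}

Without guarded_call(), my_cb() and traced_helper() would call each other until the stack overflowed. Setting RECURSION_SAFE on a callback that is not actually safe invites the same kind of unbounded recursion.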


Filtering which functions to trace
==================================

If a callback should only be called from specific functions, a filter must
be set up. Filters are added by function name, or by ip if it is known.

.. code-block:: c

	int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
	                      int len, int reset);

@ops
	The ops to set the filter with.

@buf
	The string that holds the function filter text.

@len
	The length of the string.

@reset
	Non-zero to reset all filters before applying this filter.

Filters denote which functions should be enabled when tracing is enabled.
If @buf is NULL and @reset is set, all functions will be enabled for tracing.

@buf can also be a glob expression to enable all functions that
match a specific pattern.

See Filter Commands in :file:`Documentation/trace/ftrace.rst`.

To trace only the schedule function:

.. code-block:: c

	ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0);

To add more functions, call ftrace_set_filter() more than once with the
@reset parameter set to zero. To remove the current filter set and replace
it with new functions defined by @buf, have @reset be non-zero.

To remove all the filtered functions and trace all functions:

.. code-block:: c

	ret = ftrace_set_filter(&ops, NULL, 0, 1);


Sometimes more than one function has the same name. To trace just a specific
function in this case, ftrace_set_filter_ip() can be used.

.. code-block:: c

	ret = ftrace_set_filter_ip(&ops, ip, 0, 0);

Note that the ip must be the address where the call to fentry or mcount is
located in the function. This function is used by perf and kprobes, which
get the ip address from the user (usually using debug info from the kernel).

If a glob is used to set the filter, functions can be added to a "notrace"
list that will prevent those functions from calling the callback.
The "notrace" list takes precedence over the "filter" list. If both lists
are non-empty and contain the same functions, the callback will not be
called by any function.

An empty "notrace" list means to allow all functions defined by the filter
to be traced.

.. code-block:: c

	int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
	                       int len, int reset);

This takes the same parameters as ftrace_set_filter(), but adds the
functions it finds to a list of functions not to be traced. This is a
separate list from the filter list, and this function does not modify the
filter list.

A non-zero @reset will clear the "notrace" list before adding functions
that match @buf to it.

Clearing the "notrace" list is the same as clearing the filter list:

.. code-block:: c

	ret = ftrace_set_notrace(&ops, NULL, 0, 1);

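The precedence rules for the two lists reduce to a small decision function, shown here as a userspace sketch. It uses fnmatch(3) as a stand-in for ftrace's own, more limited glob matching. It also matches at lookup time, whereas ftrace expands a glob into a set of functions when the filter is set. match_any() and model_calls_callback() are hypothetical names.

.. code-block:: c

	#include <assert.h>
	#include <fnmatch.h>
	#include <stdbool.h>
	#include <stddef.h>

	/* True if @func matches any of the @n glob patterns in @pats. */
	static bool match_any(const char *const *pats, size_t n, const char *func)
	{
		for (size_t i = 0; i < n; i++)
			if (fnmatch(pats[i], func, 0) == 0)
				return true;
		return false;
	}

	/* Hypothetical model of whether @func would call the callback. */
	static bool model_calls_callback(const char *func,
					 const char *const *filter, size_t nfilter,
					 const char *const *notrace, size_t nnotrace)
	{
		assert(func != NULL);
		/* The "notrace" list takes precedence over the "filter" list. */
		if (match_any(notrace, nnotrace, func))
			return false;
		/* An empty filter list lets every remaining function through. */
		return nfilter == 0 || match_any(filter, nfilter, func);
	}

If the same pattern list is passed as both filter and notrace, every call returns false. This matches the rule that when both lists contain the same functions, no function calls the callback.
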
The filter and notrace lists may be changed at any time. If only a set of
functions should call the callback, it is best to set the filters before
registering the callback. But the changes may also happen after the callback
has been registered.

If a filter is in place, @reset is non-zero, and @buf contains a glob that
matches functions, the switch happens during the ftrace_set_filter() call.
At no time will all functions call the callback.

.. code-block:: c

	ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

	register_ftrace_function(&ops);

	msleep(10);

	ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1);

is not the same as:

.. code-block:: c

	ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

	register_ftrace_function(&ops);

	msleep(10);

	ftrace_set_filter(&ops, NULL, 0, 1);

	ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0);

because the latter has a short window, between the reset and the new
filter being set, during which all functions call the callback.
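
The difference between the two sequences can be reproduced with a small userspace model. The sketch makes three simplifying assumptions. Functions are bits in a mask, and an empty mask stands for "no filter, trace everything". Each call to the hypothetical set_filter() is treated as one atomic update, mirroring how a single ftrace_set_filter() call with a non-zero @reset switches from the old set to the new one. The names set_filter(), fires(), leaked and the FN_* constants are invented for this illustration.

.. code-block:: c

	#include <assert.h>
	#include <stdbool.h>

	/* Hypothetical: three traceable functions, one bit each. */
	enum { FN_SCHEDULE = 1u << 0, FN_WAKE_UP = 1u << 1, FN_OTHER = 1u << 2 };

	/* Model of a filter: 0 (empty) means "call the callback everywhere". */
	static unsigned int filter;

	/* Set once an unrelated function could have reached the callback. */
	static bool leaked;

	static bool fires(unsigned int fn)
	{
		assert(fn != 0);
		return filter == 0 || (filter & fn) != 0;
	}

	/* Mimics ftrace_set_filter(): with @reset, the new set replaces the
	 * old one in one step. After each update, probe an unrelated function. */
	static void set_filter(unsigned int fns, int reset)
	{
		filter = reset ? fns : (filter | fns);
		if (fires(FN_OTHER))
			leaked = true;
	}

Replacing "schedule" with "try_to_wake_up" in a single reset call never lets FN_OTHER through. Resetting to an empty filter first and adding "try_to_wake_up" afterwards does, for exactly one intermediate state.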