=================================
Using ftrace to hook to functions
=================================

.. Copyright 2017 VMware Inc.
.. Author: Steven Rostedt <srostedt@goodmis.org>
.. License: The GNU Free Documentation License, Version 1.2
.. (dual licensed under the GPL v2)

Written for: 4.14

Introduction
============

The ftrace infrastructure was originally created to attach callbacks to the
beginning of functions in order to record and trace the flow of the kernel.
But callbacks to the start of a function can have other use cases, such as
live kernel patching or security monitoring. This document describes
how to use ftrace to implement your own function callbacks.


The ftrace context
==================

WARNING: The ability to add a callback to almost any function within the
kernel comes with risks. A callback can be called from any context
(normal, softirq, irq, and NMI). Callbacks can also be called just before
going to idle, during CPU bring up and takedown, or going to user space.
This requires extra care in what can be done inside a callback. A callback
can be called outside the protective scope of RCU.

The ftrace infrastructure has some protections against recursion and RCU,
but one must still be very careful how the callbacks are used.


The ftrace_ops structure
========================

To register a function callback, an ftrace_ops is required. This structure
is used to tell ftrace what function should be called as the callback,
as well as what protections the callback will perform itself and therefore
does not require ftrace to handle.

Only one field needs to be set when registering an ftrace_ops with
ftrace:

.. code-block:: c

    struct ftrace_ops ops = {
          .func       = my_callback_func,
          .flags      = MY_FTRACE_FLAGS,
          .private    = any_private_data_structure,
    };

Both .flags and .private are optional. Only .func is required.

To enable tracing, call:

.. code-block:: c

    register_ftrace_function(&ops);

To disable tracing, call:

.. code-block:: c

    unregister_ftrace_function(&ops);

Both functions are declared in the header:

.. code-block:: c

    #include <linux/ftrace.h>

The registered callback will start being called some time after
register_ftrace_function() is called and before it returns. The exact time
that callbacks start being called depends on the architecture and on the
scheduling of services. If the callback must begin at an exact moment, the
callback itself has to handle that synchronization.
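
As a sketch of that synchronization, assuming a callback that must ignore every call before a chosen moment, the callback can gate itself on a flag that is set only once tracing should really begin. This is a userspace model, not kernel code: my_callback_model(), arm_tracing(), tracing_armed and hits are hypothetical names, and C11 <stdatomic.h> stands in for whatever atomic primitives a real kernel callback would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical gate: the callback ignores calls until this is set. */
static atomic_bool tracing_armed = false;
/* Number of calls the callback actually acted on. */
static atomic_long hits = 0;

/* Models the ftrace callback; ip and parent_ip are unused in this sketch. */
static void my_callback_model(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;

	/* Calls that arrive before arming (e.g. during registration) are ignored. */
	if (!atomic_load_explicit(&tracing_armed, memory_order_acquire))
		return;

	atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}

/* Called at the exact moment the callback should start doing work. */
static void arm_tracing(void)
{
	atomic_store_explicit(&tracing_armed, true, memory_order_release);
}
```

Calls that arrive while registration is still in progress return early; only calls made after arm_tracing() are counted. The release store paired with the acquire load means that anything the arming code prepared before calling arm_tracing() is visible to a callback that sees the flag set.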

unregister_ftrace_function() guarantees that the callback is no longer
being called by any function once unregister_ftrace_function() returns.
Note that to provide this guarantee, unregister_ftrace_function() may take
some time to finish.


The callback function
=====================

The prototype of the callback function is as follows (as of v4.14):

.. code-block:: c

    void callback_func(unsigned long ip, unsigned long parent_ip,
                       struct ftrace_ops *op, struct pt_regs *regs);

@ip
     This is the instruction pointer of the function that is being traced
     (where the fentry or mcount call is located within the function).

@parent_ip
     This is the instruction pointer of the function that called the
     function being traced (where the call of the function occurred).

@op
     This is a pointer to the ftrace_ops that was used to register the
     callback. It can be used to pass data to the callback via the private
     pointer.

@regs
     If the FTRACE_OPS_FL_SAVE_REGS or FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
     flags are set in the ftrace_ops structure, then this will point to the
     pt_regs structure as it would be if a breakpoint was placed at the
     start of the function where ftrace was tracing. Otherwise it either
     contains garbage or is NULL.
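
To make the parameters concrete, here is a userspace model of a callback that reaches its own state through op->private and only looks at regs when a save-regs flag was requested. The types struct ftrace_ops_model and struct pt_regs_model, the bit MODEL_FL_SAVE_REGS_IF_SUPPORTED and struct my_stats are hypothetical stand-ins; a real callback uses struct ftrace_ops, struct pt_regs and the flag values from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit; the real value is defined in include/linux/ftrace.h. */
#define MODEL_FL_SAVE_REGS_IF_SUPPORTED (1UL << 0)

/* Stand-in for struct pt_regs. */
struct pt_regs_model {
	unsigned long ip;
};

struct ftrace_ops_model;

/* Same shape as the v4.14 callback prototype, using the model types. */
typedef void (*ftrace_func_model_t)(unsigned long ip, unsigned long parent_ip,
				    struct ftrace_ops_model *op,
				    struct pt_regs_model *regs);

/* Stand-in for struct ftrace_ops with the three fields discussed above. */
struct ftrace_ops_model {
	ftrace_func_model_t func;
	unsigned long flags;
	void *private;
};

/* Hypothetical per-callback state, reached through op->private. */
struct my_stats {
	unsigned long calls;
	unsigned long last_ip;
	unsigned long last_parent_ip;
	int have_regs;
};

static void my_callback_func(unsigned long ip, unsigned long parent_ip,
			     struct ftrace_ops_model *op,
			     struct pt_regs_model *regs)
{
	struct my_stats *stats = op->private;

	stats->calls++;
	stats->last_ip = ip;		/* where fentry/mcount sits in the traced function */
	stats->last_parent_ip = parent_ip;	/* call site in the caller */

	/* Without a save-regs flag, regs may be garbage, so it is not inspected. */
	stats->have_regs = (op->flags & MODEL_FL_SAVE_REGS_IF_SUPPORTED) &&
			   regs != NULL;
}
```

With the IF_SUPPORTED variant, a NULL regs pointer is how the callback learns that the architecture could not provide registers.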


The ftrace FLAGS
================

The ftrace_ops flags are all defined and documented in include/linux/ftrace.h.
Some of the flags are used for internal infrastructure of ftrace, but the
ones that users should be aware of are the following:

FTRACE_OPS_FL_SAVE_REGS
        If the callback requires reading or modifying the pt_regs
        passed to the callback, then it must set this flag. Registering
        an ftrace_ops with this flag set on an architecture that does not
        support passing of pt_regs to the callback will fail.

FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED
        Similar to SAVE_REGS, but registering an ftrace_ops with this flag
        set on an architecture that does not support passing of regs will
        not fail. However, the callback must check whether regs is NULL to
        determine whether the architecture supports it.

FTRACE_OPS_FL_RECURSION_SAFE
        By default, a wrapper is added around the callback to
        make sure that recursion of the function does not occur. That is,
        if a function that is called as a result of the callback's execution
        is also traced, ftrace will prevent the callback from being called
        again. But this wrapper adds some overhead, and if the callback is
        safe from recursion, it can set this flag to disable the ftrace
        protection.

        Note, if this flag is set, and recursion does occur, it could cause
        the system to crash, and possibly reboot via a triple fault.

        It is OK if another callback traces a function that is called by a
        callback that is marked recursion safe. Recursion safe callbacks
        must never trace any function that is called by the callback
        itself or any nested functions that those functions call.

        If this flag is set, it is possible that the callback will also
        be called with preemption enabled (when CONFIG_PREEMPT is set),
        but this is not guaranteed.

FTRACE_OPS_FL_IPMODIFY
        Requires FTRACE_OPS_FL_SAVE_REGS to be set. If the callback is to
        "hijack" the traced function (have another function called instead
        of the traced function), it requires setting this flag. This is
        what live kernel patching uses. Without this flag, pt_regs->ip
        cannot be modified.

        Note, only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be
        registered to any given function at a time.

FTRACE_OPS_FL_RCU
        If this is set, then the callback will only be called by functions
        where RCU is "watching". This is required if the callback function
        performs any rcu_read_lock() operation.

        RCU stops watching when the system goes idle, while a CPU is being
        taken down and brought back online, and on transitions from kernel
        space to user space and back to kernel space. During these
        transitions, a callback may be executed and RCU synchronization
        will not protect it.
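
The default recursion protection described under FTRACE_OPS_FL_RECURSION_SAFE can be illustrated with a small userspace model: a wrapper sets a flag while the callback runs and drops any call that arrives while the flag is set. This is a deliberately simplified sketch with one thread-local flag; the kernel's actual protection is more involved. guarded_call(), user_callback(), traced_helper(), in_callback and callback_runs are all hypothetical names.

```c
#include <assert.h>

/* Set while the callback is running on this thread. */
static _Thread_local int in_callback;
/* How many times the user callback body actually ran. */
static int callback_runs;

static void traced_helper(void);

/* The user's callback: it calls a helper that is itself "traced". */
static void user_callback(void)
{
	callback_runs++;
	traced_helper();
}

/* Models the protection wrapper placed around the callback by default. */
static void guarded_call(void)
{
	if (in_callback)
		return;		/* recursion detected: drop this call */
	in_callback = 1;
	user_callback();
	in_callback = 0;
}

/* A traced function: entering it invokes the (wrapped) callback. */
static void traced_helper(void)
{
	guarded_call();
}
```

If traced_helper() called user_callback() directly, without guarded_call(), the two functions would recurse until the stack overflowed. That is the failure mode the note on FTRACE_OPS_FL_RECURSION_SAFE warns about when the flag is set on a callback that is not actually recursion safe.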


Filtering which functions to trace
==================================

If a callback is only to be called from specific functions, a filter must be
set up. The filters are added by name, or by ip if it is known.

.. code-block:: c

    int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
                          int len, int reset);

@ops
     The ops to set the filter with.

@buf
     The string that holds the function filter text.

@len
     The length of the string.

@reset
     Non-zero to reset all filters before applying this filter.

Filters denote which functions should be enabled when tracing is enabled.
If @buf is NULL and @reset is non-zero, all functions will be enabled for
tracing.

The @buf can also be a glob expression to enable all functions that
match a specific pattern.

See Filter Commands in :file:`Documentation/trace/ftrace.txt`.
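
To get a feel for which functions a glob selects, the following userspace sketch approximates the matching with POSIX fnmatch(). This is an assumption made for illustration only: ftrace implements its own matching, and the glob forms it accepts are those documented in ftrace.txt. The names in model_funcs are just sample function names, and count_matches() is hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Sample kernel function names used to try out patterns. */
static const char *const model_funcs[] = {
	"schedule", "schedule_timeout", "preempt_schedule", "try_to_wake_up",
};

/* Count how many sample names match the glob (fnmatch() approximation). */
static size_t count_matches(const char *glob)
{
	size_t n = 0;

	for (size_t i = 0; i < sizeof(model_funcs) / sizeof(model_funcs[0]); i++)
		if (fnmatch(glob, model_funcs[i], 0) == 0)
			n++;
	return n;
}
```

With these names, "schedule" selects only schedule, "schedule*" also selects schedule_timeout, and "*schedule*" additionally selects preempt_schedule.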

To trace just the schedule function:

.. code-block:: c

    ret = ftrace_set_filter(&ops, "schedule", strlen("schedule"), 0);

To add more functions, call ftrace_set_filter() more than once with the
@reset parameter set to zero. To remove the current filter set and replace it
with new functions defined by @buf, set @reset to non-zero.

To remove all the filtered functions and trace all functions:

.. code-block:: c

    ret = ftrace_set_filter(&ops, NULL, 0, 1);
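
The add, replace and clear rules above can be modeled as a small list of names in userspace. In this sketch, struct filter_model, set_filter_model() and filter_allows() are hypothetical; the model handles exact names only (no globs), omits the @len argument, and has an arbitrary fixed capacity that the real filter list does not share.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_FILTERS 8	/* model-only limit */

/* Hypothetical model of one ftrace_ops filter list (exact names only). */
struct filter_model {
	const char *names[MAX_FILTERS];
	size_t count;
};

/* Models ftrace_set_filter(ops, buf, len, reset) for plain names. */
static int set_filter_model(struct filter_model *f, const char *buf, int reset)
{
	if (reset)
		f->count = 0;		/* drop the current filter set first */
	if (buf == NULL)
		return 0;		/* NULL with reset: list left empty */
	if (f->count == MAX_FILTERS)
		return -1;
	f->names[f->count++] = buf;	/* reset == 0: add to the existing set */
	return 0;
}

/* An empty filter list means every function calls the callback. */
static bool filter_allows(const struct filter_model *f, const char *func)
{
	if (f->count == 0)
		return true;
	for (size_t i = 0; i < f->count; i++)
		if (strcmp(f->names[i], func) == 0)
			return true;
	return false;
}
```

Calling set_filter_model() twice with reset set to zero accumulates names, a non-zero reset replaces the set, and a NULL buffer with reset set leaves the list empty, which admits every function.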

Sometimes more than one function has the same name. To trace just a specific
function in this case, ftrace_set_filter_ip() can be used.

.. code-block:: c

    ret = ftrace_set_filter_ip(&ops, ip, 0, 0);

Note that the ip must be the address where the call to fentry or mcount is
located in the function. This function is used by perf and kprobes, which
get the ip address from the user (usually using debug info from the kernel).

If a glob is used to set the filter, functions can be added to a "notrace"
list that will prevent those functions from calling the callback.
The "notrace" list takes precedence over the "filter" list. If both
lists are non-empty and contain the same functions, the callback will not
be called by any function.

An empty "notrace" list means to allow all functions defined by the filter
to be traced.

.. code-block:: c

    int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
                           int len, int reset);

This takes the same parameters as ftrace_set_filter() but will add the
functions it finds to the list of functions not to be traced. This is a
separate list from the filter list, and this function does not modify the
filter list.

A non-zero @reset will clear the "notrace" list before adding functions
that match @buf to it.

Clearing the "notrace" list is the same as clearing the filter list:

.. code-block:: c

    ret = ftrace_set_notrace(&ops, NULL, 0, 1);
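
The precedence rules can be summarized as a single decision: a function calls the callback if it is not on the "notrace" list and either the filter list is empty or the function is on it. The userspace sketch below encodes that rule with exact names only; in_list() and callback_called_for() are hypothetical helpers, not ftrace functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: is func one of the n names in list? */
static bool in_list(const char *const *list, size_t n, const char *func)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(list[i], func) == 0)
			return true;
	return false;
}

/*
 * Decision model: "notrace" wins over "filter", an empty filter list
 * admits every function, and an empty "notrace" list excludes nothing.
 */
static bool callback_called_for(const char *const *filter, size_t nfilter,
				const char *const *notrace, size_t nnotrace,
				const char *func)
{
	if (in_list(notrace, nnotrace, func))
		return false;
	return nfilter == 0 || in_list(filter, nfilter, func);
}
```

For example, with schedule and schedule_timeout in the filter list and schedule_timeout in the "notrace" list, only schedule calls the callback.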

The filter and notrace lists may be changed at any time. If only a set of
functions should call the callback, it is best to set the filters before
registering the callback. But the changes may also happen after the callback
has been registered.

If a filter is in place, @reset is non-zero, and @buf contains a glob that
matches functions, the switch will happen during the ftrace_set_filter()
call. At no time will all functions call the callback.

.. code-block:: c

    ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

    register_ftrace_function(&ops);

    msleep(10);

    ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 1);

The above is not the same as:

.. code-block:: c

    ftrace_set_filter(&ops, "schedule", strlen("schedule"), 1);

    register_ftrace_function(&ops);

    msleep(10);

    ftrace_set_filter(&ops, NULL, 0, 1);

    ftrace_set_filter(&ops, "try_to_wake_up", strlen("try_to_wake_up"), 0);

The latter has a short window, between the reset and the setting of the
new filter, during which all functions will call the callback.
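
The difference between the two sequences can be checked with a userspace model that treats each ftrace_set_filter() call as one atomic step and samples, after every step, whether an unrelated function would call the callback. struct filter_state, set_filter_step(), admitted(), count_trace_all_windows() and the name other_func are hypothetical; the model handles exact names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 4	/* model-only limit */

struct filter_state {
	const char *names[MAX_NAMES];
	size_t count;
};

static bool admitted(const struct filter_state *f, const char *func)
{
	if (f->count == 0)
		return true;		/* empty filter: every function is traced */
	for (size_t i = 0; i < f->count; i++)
		if (strcmp(f->names[i], func) == 0)
			return true;
	return false;
}

/*
 * Models one ftrace_set_filter() call as a single atomic step: observers
 * only ever see the state before or after the whole call.
 */
static void set_filter_step(struct filter_state *f, const char *buf, int reset)
{
	if (reset)
		f->count = 0;
	if (buf != NULL && f->count < MAX_NAMES)
		f->names[f->count++] = buf;
}

/* Applies the updates in order; counts states where "other_func" is traced. */
static int count_trace_all_windows(const char *const *bufs, const int *resets,
				   size_t steps)
{
	struct filter_state f = { .count = 0 };
	int windows = 0;

	set_filter_step(&f, "schedule", 1);	/* initial filter, as above */
	for (size_t i = 0; i < steps; i++) {
		set_filter_step(&f, bufs[i], resets[i]);
		if (admitted(&f, "other_func"))
			windows++;
	}
	return windows;
}
```

The single call with a non-zero @reset never exposes an empty filter, while the reset-then-add sequence passes through exactly one state in which every function is traced.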