Assembler Annotations
=====================

Copyright (c) 2017-2019 Jiri Slaby

This document describes the new macros for annotation of data and code in
assembly. In particular, it contains information about ``SYM_FUNC_START``,
``SYM_FUNC_END``, ``SYM_CODE_START``, and similar.

Rationale
---------
Some code like entries, trampolines, or boot code needs to be written in
assembly. As in C, such code is grouped into functions and accompanied by
data. Standard assemblers do not force users to precisely mark these pieces
as code or data, or even to specify their length. Nevertheless, assemblers
provide such annotations to help debuggers deal with assembly code. On top of
that, developers also want to mark some functions as *global* so that they
are visible outside of their translation units.

Over time, the Linux kernel has adopted macros from various projects (like
``binutils``) to facilitate such annotations. So for historic reasons,
developers have been using ``ENTRY``, ``END``, ``ENDPROC``, and other
annotations in assembly. Because these macros were never documented, they are
used in the wrong contexts in some places. Clearly, ``ENTRY`` was intended to
denote the beginning of global symbols (be it data or code). ``END`` used to
mark the end of data or the end of special functions with a *non-standard*
calling convention. In contrast, ``ENDPROC`` should annotate only the ends of
*standard* functions.

When these macros are used correctly, they help assemblers generate an object
file with both sizes and types set correctly. For example, this is the symbol
table of the object built from ``arch/x86/lib/putuser.S``::

   Num:      Value          Size Type    Bind   Vis      Ndx Name
    25: 0000000000000000    33 FUNC    GLOBAL DEFAULT    1 __put_user_1
    29: 0000000000000030    37 FUNC    GLOBAL DEFAULT    1 __put_user_2
    32: 0000000000000060    36 FUNC    GLOBAL DEFAULT    1 __put_user_4
    35: 0000000000000090    37 FUNC    GLOBAL DEFAULT    1 __put_user_8

This matters for more than debugging. Tools can be run on properly annotated
objects like this one to generate more useful information. In particular,
``objtool`` can be run on them to check the object and fix it if needed.
Currently, ``objtool`` can report missing frame pointer setup/destruction in
functions. It can also automatically generate annotations for the
:doc:`ORC unwinder <x86/orc-unwinder>` for most code. Both of these are
especially important for reliable stack traces, which are in turn necessary
for :doc:`Kernel live patching <livepatch/livepatch>`.

Caveat and Discussion
---------------------
As one might notice, there were previously only three macros. That is indeed
insufficient to cover all combinations of the following cases:

* standard/non-standard function
* code/data
* global/local symbol

After a discussion_, it was decided that brand new macros should be introduced
instead of extending the current ``ENTRY/END*`` macros::

    So how about using macro names that actually show the purpose, instead
    of importing all the crappy, historic, essentially randomly chosen
    debug symbol macro names from the binutils and older kernels?

.. _discussion: https://lore.kernel.org/r/20170217104757.28588-1-jslaby@suse.cz

Macros Description
------------------

The new macros carry the ``SYM_`` prefix and can be divided into three main
groups:

1. ``SYM_FUNC_*`` -- to annotate C-like functions, i.e. functions with
   standard C calling conventions. For example, on x86, this means that the
   stack contains a return address at the predefined place and a return from
   the function can happen in a standard way. When frame pointers are enabled,
   the frame pointer shall also be saved at the start of a function and
   restored at its end.

   Checking tools like ``objtool`` should ensure such marked functions conform
   to these rules. The tools can also easily annotate these functions with
   debugging information (like *ORC data*) automatically.

2. ``SYM_CODE_*`` -- special functions called with a special stack, such as
   interrupt handlers with special stack content, trampolines, or startup
   functions.

   Checking tools mostly skip these functions, but some debug information can
   still be generated automatically. For correct debug data, this code needs
   hints like ``UNWIND_HINT_REGS`` provided by developers.

3. ``SYM_DATA*`` -- data belonging to ``.data`` sections and not to
   ``.text``. Data do not contain instructions, so they have to be treated
   specially by the tools: they should neither treat the bytes as
   instructions nor assign any debug information to them.

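As a rough aid for choosing among the three groups, the C sketch below encodes the naming scheme as a lookup from group and linkage to the start and end annotations (the ``_LOCAL`` variants are described in the following sections). The helper names ``start_macro``, ``end_macro``, and ``is_matching_pair`` are hypothetical; they only restate the conventions of this document and are not part of any kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The three annotation groups described above. */
enum sym_group {
	GROUP_FUNC, /* standard C calling convention */
	GROUP_CODE, /* special stack or calling convention */
	GROUP_DATA, /* bytes that are never instructions */
};

/* Start annotation for a symbol of the given group and linkage. */
static const char *start_macro(enum sym_group group, bool global)
{
	switch (group) {
	case GROUP_FUNC:
		return global ? "SYM_FUNC_START" : "SYM_FUNC_START_LOCAL";
	case GROUP_CODE:
		return global ? "SYM_CODE_START" : "SYM_CODE_START_LOCAL";
	case GROUP_DATA:
		return global ? "SYM_DATA_START" : "SYM_DATA_START_LOCAL";
	}
	return NULL;
}

/* End annotation that every start annotation of the group pairs with. */
static const char *end_macro(enum sym_group group)
{
	switch (group) {
	case GROUP_FUNC:
		return "SYM_FUNC_END";
	case GROUP_CODE:
		return "SYM_CODE_END";
	case GROUP_DATA:
		return "SYM_DATA_END";
	}
	return NULL;
}

/* True if start and end share the group prefix ("SYM_FUNC_", ...). */
static bool is_matching_pair(const char *start, const char *end)
{
	assert(start != NULL && end != NULL);
	const size_t prefix_len = strlen("SYM_FUNC_");

	return strncmp(start, end, prefix_len) == 0;
}
```

For example, ``start_macro(GROUP_CODE, false)`` yields ``SYM_CODE_START_LOCAL``, whose partner is ``end_macro(GROUP_CODE)``, i.e. ``SYM_CODE_END``. ``is_matching_pair()`` compares only the group prefix, so it does not check finer rules such as pairing ``_ALIAS`` starts with ``_ALIAS`` ends.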

Instruction Macros
~~~~~~~~~~~~~~~~~~
This section covers ``SYM_FUNC_*`` and ``SYM_CODE_*`` enumerated above.

``objtool`` requires all code to be contained in an ELF symbol. Symbol names
with a ``.L`` prefix do not emit symbol table entries. ``.L``-prefixed symbols
can be used within a code region, but should not be used to denote a range of
code via ``SYM_*_START/END`` annotations.

* ``SYM_FUNC_START`` and ``SYM_FUNC_START_LOCAL`` are supposed to be **the
  most frequent markings**. They are used for functions with standard calling
  conventions -- global and local. Like in C, they both align the functions to
  architecture-specific ``__ALIGN`` bytes. There are also ``_NOALIGN`` variants
  for special cases where developers do not want this implicit alignment.

  ``SYM_FUNC_START_WEAK`` and ``SYM_FUNC_START_WEAK_NOALIGN`` markings are
  also offered as an assembler counterpart to the *weak* attribute known from
  C.

  All of these **shall** be coupled with ``SYM_FUNC_END``. First, it marks
  the sequence of instructions as a function and records its size in the
  generated object file. Second, it eases checking and processing of such
  object files, as the tools can trivially find exact function boundaries.

  So in most cases, developers should write something like the following
  example, with some asm instructions between the macros, of course::

    SYM_FUNC_START(memset)
        ... asm insns ...
    SYM_FUNC_END(memset)

  In fact, this kind of annotation corresponds to the now-deprecated ``ENTRY``
  and ``ENDPROC`` macros.

* ``SYM_FUNC_START_ALIAS`` and ``SYM_FUNC_START_LOCAL_ALIAS`` are for those
  who decide to have two or more names for one function. The typical use is::

    SYM_FUNC_START_ALIAS(__memset)
    SYM_FUNC_START(memset)
        ... asm insns ...
    SYM_FUNC_END(memset)
    SYM_FUNC_END_ALIAS(__memset)

  In this example, one can call ``__memset`` or ``memset`` with the same
  result, except that the debug information for the instructions is emitted
  into the object file only once -- for the non-``ALIAS`` case.

* ``SYM_CODE_START`` and ``SYM_CODE_START_LOCAL`` should be used only in
  special cases -- if you know what you are doing. They are used exclusively
  for interrupt handlers and similar code where the calling convention is not
  the C one. ``_NOALIGN`` variants exist too. The use is the same as for the
  ``FUNC`` category above::

    SYM_CODE_START_LOCAL(bad_put_user)
        ... asm insns ...
    SYM_CODE_END(bad_put_user)

  Again, every ``SYM_CODE_START*`` **shall** be coupled with ``SYM_CODE_END``.

  To some extent, this category corresponds to the deprecated ``ENTRY`` and
  ``END``, except that ``END`` had several other meanings too.

* ``SYM_INNER_LABEL*`` is used to denote a label between some
  ``SYM_{CODE,FUNC}_START`` and ``SYM_{CODE,FUNC}_END``. Such labels are very
  similar to C labels, except that they can be made global. An example of
  use::

    SYM_CODE_START(ftrace_caller)
        /* save_mcount_regs fills in first two parameters */
        ...

    SYM_INNER_LABEL(ftrace_caller_op_ptr, SYM_L_GLOBAL)
        /* Load the ftrace_ops into the 3rd parameter */
        ...

    SYM_INNER_LABEL(ftrace_call, SYM_L_GLOBAL)
        call ftrace_stub
        ...
        retq
    SYM_CODE_END(ftrace_caller)

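To show what the instruction macros boil down to, the following C sketch re-creates a simplified subset of them and uses the preprocessor's ``#`` operator to capture each expansion as a string. The definitions are assumptions modeled on the generic ``include/linux/linkage.h`` rather than copies of it: for instance, ``.p2align 4`` stands in for the architecture's ``__ALIGN``, ``SYM_INNER_LABEL`` is assumed to mark its label ``STT_NOTYPE``, and ``EXPAND_STR`` is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified stand-ins for the generic macros (assumed, not verbatim);
 * the real ones differ per architecture and kernel version.
 */
#define ASM_NL             ;
#define SYM_A_ALIGN        .p2align 4 /* assumed; stands in for __ALIGN */
#define SYM_A_NONE         /* no alignment */
#define SYM_L_GLOBAL(name) .globl name
#define SYM_L_LOCAL(name)  /* local symbols need no directive */

#define SYM_ENTRY(name, linkage, align) linkage(name) ASM_NL align ASM_NL name:
#define SYM_START(name, linkage, align) SYM_ENTRY(name, linkage, align)
#define SYM_END(name, sym_type) .type name sym_type ASM_NL .size name, .-name

#define SYM_FUNC_START(name)       SYM_START(name, SYM_L_GLOBAL, SYM_A_ALIGN)
#define SYM_FUNC_START_LOCAL(name) SYM_START(name, SYM_L_LOCAL, SYM_A_ALIGN)
#define SYM_FUNC_END(name)         SYM_END(name, STT_FUNC)
#define SYM_CODE_END(name)         SYM_END(name, STT_NOTYPE)
#define SYM_INNER_LABEL(name, linkage) \
	.type name STT_NOTYPE ASM_NL SYM_ENTRY(name, linkage, SYM_A_NONE)

/* Expand a macro fully, then turn the result into a string literal. */
#define EXPAND_STR_(...) #__VA_ARGS__
#define EXPAND_STR(...)  EXPAND_STR_(__VA_ARGS__)

/* Directives produced for the __put_user_1 symbol shown earlier. */
static const char *const func_start = EXPAND_STR(SYM_FUNC_START(__put_user_1));
static const char *const func_end = EXPAND_STR(SYM_FUNC_END(__put_user_1));
```

Printing ``func_start`` shows a ``.globl`` directive, the alignment, and the label itself, whereas ``func_end`` contains the ``.type``/``.size`` pair from which the ``FUNC`` type and the size in the ``readelf`` output above originate. With ``SYM_FUNC_START_LOCAL``, the linkage macro expands to nothing, so no ``.globl`` is emitted.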

Data Macros
~~~~~~~~~~~
Similar to instructions, there are a couple of macros to describe data in
assembly.

* ``SYM_DATA_START`` and ``SYM_DATA_START_LOCAL`` mark the start of some data
  and shall be used in conjunction with either ``SYM_DATA_END`` or
  ``SYM_DATA_END_LABEL``. The latter also adds a label at the end, so that
  one can use ``lstack`` and (local) ``lstack_end`` in the following
  example::

    SYM_DATA_START_LOCAL(lstack)
        .skip 4096
    SYM_DATA_END_LABEL(lstack, SYM_L_LOCAL, lstack_end)

* ``SYM_DATA`` and ``SYM_DATA_LOCAL`` are variants for simple, mostly one-line
  data::

    SYM_DATA(HEAP, .long rm_heap)
    SYM_DATA(heap_end, .long rm_stack)

  In the end, they expand to ``SYM_DATA_START`` with ``SYM_DATA_END``
  internally.

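The same simplified approach applies to the data macros. In the C sketch below (again an approximation of the generic definitions, not a copy, with the hypothetical ``EXPAND_STR`` helper), ``SYM_DATA`` is literally ``SYM_DATA_START``, the data, and ``SYM_DATA_END``, while ``SYM_DATA_END_LABEL`` emits the extra end label with the requested linkage before closing the object.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-ins modeled on the generic linkage.h (not verbatim). */
#define ASM_NL             ;
#define SYM_A_NONE         /* data gets no implicit alignment */
#define SYM_L_GLOBAL(name) .globl name
#define SYM_L_LOCAL(name)  /* nothing */

#define SYM_ENTRY(name, linkage, align) linkage(name) ASM_NL align ASM_NL name:
#define SYM_START(name, linkage, align) SYM_ENTRY(name, linkage, align)
#define SYM_END(name, sym_type) .type name sym_type ASM_NL .size name, .-name

#define SYM_DATA_START(name)       SYM_START(name, SYM_L_GLOBAL, SYM_A_NONE)
#define SYM_DATA_START_LOCAL(name) SYM_START(name, SYM_L_LOCAL, SYM_A_NONE)
#define SYM_DATA_END(name)         SYM_END(name, STT_OBJECT)

/* Extra end label with the requested linkage, then the end of the data. */
#define SYM_DATA_END_LABEL(name, linkage, label) \
	linkage(label) ASM_NL .type label STT_OBJECT ASM_NL label: \
	SYM_END(name, STT_OBJECT)

/* One-liner: start, the data itself, end. */
#define SYM_DATA(name, ...) \
	SYM_DATA_START(name) ASM_NL __VA_ARGS__ ASM_NL SYM_DATA_END(name)

#define EXPAND_STR_(...) #__VA_ARGS__
#define EXPAND_STR(...)  EXPAND_STR_(__VA_ARGS__)

static const char *const heap = EXPAND_STR(SYM_DATA(HEAP, .long rm_heap));
static const char *const lstack_tail =
	EXPAND_STR(SYM_DATA_END_LABEL(lstack, SYM_L_LOCAL, lstack_end));
```

Because ``SYM_L_LOCAL`` expands to nothing, ``lstack_tail`` contains no ``.globl`` for ``lstack_end``; passing ``SYM_L_GLOBAL`` instead would export the end label as well. Both expansions mark their symbols as ``STT_OBJECT``, the object type that lets tools tell these bytes apart from instructions.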

Support Macros
~~~~~~~~~~~~~~
All of the above ultimately reduce to some invocation of ``SYM_START``,
``SYM_END``, or ``SYM_ENTRY``. Normally, developers should avoid using these
directly.

Further, the above examples use ``SYM_L_LOCAL``. There are also
``SYM_L_GLOBAL`` and ``SYM_L_WEAK``. All of them denote the linkage of the
symbol they are applied to. They are used either in the ``_LABEL`` variants of
the earlier macros or in ``SYM_START``.

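The linkage macros are ordinary function-like macros that ``SYM_ENTRY`` invokes on the symbol name. The minimal C sketch below (simplified, not the kernel's exact definitions; ``.p2align 4`` is an assumed alignment, ``EXPAND_STR`` a hypothetical helper, and ``my_sym`` a placeholder name) passes each of them to ``SYM_START`` and also derives ``SYM_FUNC_START_WEAK`` from ``SYM_L_WEAK``.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-ins for the generic macros (assumed, not verbatim). */
#define ASM_NL             ;
#define SYM_A_ALIGN        .p2align 4 /* assumed alignment directive */
#define SYM_A_NONE         /* no alignment */

/* Linkage macros take the symbol name and emit the matching directive. */
#define SYM_L_GLOBAL(name) .globl name
#define SYM_L_WEAK(name)   .weak name
#define SYM_L_LOCAL(name)  /* local: no directive needed */

/* The linkage macro is passed by name and only invoked inside SYM_ENTRY. */
#define SYM_ENTRY(name, linkage, align) linkage(name) ASM_NL align ASM_NL name:
#define SYM_START(name, linkage, align) SYM_ENTRY(name, linkage, align)
#define SYM_FUNC_START_WEAK(name)       SYM_START(name, SYM_L_WEAK, SYM_A_ALIGN)

#define EXPAND_STR_(...) #__VA_ARGS__
#define EXPAND_STR(...)  EXPAND_STR_(__VA_ARGS__)

static const char *const weak_start = EXPAND_STR(SYM_FUNC_START_WEAK(my_sym));
```

``EXPAND_STR(SYM_START(my_sym, SYM_L_WEAK, SYM_A_NONE))`` contains a ``.weak`` directive for ``my_sym``, the ``SYM_L_GLOBAL`` variant a ``.globl`` one, and the ``SYM_L_LOCAL`` variant neither. Passing the macro name rather than a ready-made directive keeps each start macro a one-line choice of linkage and alignment.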

Overriding Macros
~~~~~~~~~~~~~~~~~
Architectures can also override any of the macros in their own
``asm/linkage.h``, including the macros specifying the type of a symbol
(``SYM_T_FUNC``, ``SYM_T_OBJECT``, and ``SYM_T_NONE``). As every generic
definition of the macros described in this document is wrapped in
``#ifndef`` + ``#endif``, it is enough to define the macros differently in the
aforementioned architecture-dependent header.
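
The override mechanism relies only on the C preprocessor: the generic header includes ``asm/linkage.h`` before its own definitions, and each generic definition is skipped if the macro already exists. The C sketch below imitates that order in a single file; ``HYPOTHETICAL_FUNC_TYPE`` is a made-up value standing in for whatever an architecture might choose, and ``EXPAND_STR`` is again a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

#define EXPAND_STR_(...) #__VA_ARGS__
#define EXPAND_STR(...)  EXPAND_STR_(__VA_ARGS__)

/*
 * Stand-in for a hypothetical architecture's asm/linkage.h, which comes
 * first. HYPOTHETICAL_FUNC_TYPE is made up for illustration.
 */
#define SYM_T_FUNC HYPOTHETICAL_FUNC_TYPE

/* Generic fallbacks: each one applies only if nothing defined it earlier. */
#ifndef SYM_T_FUNC
#define SYM_T_FUNC STT_FUNC
#endif

#ifndef SYM_T_OBJECT
#define SYM_T_OBJECT STT_OBJECT
#endif

#ifndef SYM_T_NONE
#define SYM_T_NONE STT_NOTYPE
#endif

static const char *const func_type = EXPAND_STR(SYM_T_FUNC);     /* overridden */
static const char *const object_type = EXPAND_STR(SYM_T_OBJECT); /* generic */
```

Here ``func_type`` is ``"HYPOTHETICAL_FUNC_TYPE"`` because the architecture's definition came first, while ``object_type`` falls back to the generic ``"STT_OBJECT"``.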