=====================
BPF Type Format (BTF)
=====================

1. Introduction
===============

BTF (BPF Type Format) is the metadata format which encodes the debug info
related to BPF programs and maps. The name BTF was used initially to describe
data types. BTF was later extended to include function info for defined
subroutines, and line info for source/line information.

The debug info is used for map pretty print, function signature, etc. The
function signature enables better kernel symbols for BPF programs and
functions. The line info helps generate source-annotated translated bytecode,
jited code and verifier log.

The BTF specification contains two parts,
  * BTF kernel API
  * BTF ELF file format

The kernel API is the contract between user space and kernel. The kernel
verifies the BTF info before using it. The ELF file format is a user space
contract between ELF file and libbpf loader.

The type and string sections are part of the BTF kernel API, describing the
debug info (mostly types related) referenced by the bpf program. These two
sections are discussed in detail in :ref:`BTF_Type_String`.

.. _BTF_Type_String:

2. BTF Type and String Encoding
===============================

The file ``include/uapi/linux/btf.h`` provides a high-level definition of how
types/strings are encoded.

The data blob must begin with::

    struct btf_header {
        __u16   magic;
        __u8    version;
        __u8    flags;
        __u32   hdr_len;

        /* All offsets are in bytes relative to the end of this header */
        __u32   type_off;       /* offset of type section       */
        __u32   type_len;       /* length of type section       */
        __u32   str_off;        /* offset of string section     */
        __u32   str_len;        /* length of string section     */
    };

The magic is ``0xeB9F``, which has different encoding for big and little
endian systems, and can be used to test whether BTF is generated for a big- or
little-endian target. The ``btf_header`` is designed to be extensible with
``hdr_len`` equal to ``sizeof(struct btf_header)`` when a data blob is
generated.
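
The endianness test on the magic can be sketched as follows. This is only an
illustration: ``enum btf_endian``, ``btf_blob_endianness()`` and
``btf_section_start()`` are hypothetical names chosen for this example, not
kernel or libbpf API.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    #define BTF_MAGIC 0xeB9F

    /* Hypothetical result codes used only by this sketch. */
    enum btf_endian {
        BTF_ENDIAN_INVALID = 0,
        BTF_ENDIAN_LITTLE,
        BTF_ENDIAN_BIG,
    };

    /* Inspect the first two bytes of a blob to learn the target endianness. */
    static enum btf_endian btf_blob_endianness(const uint8_t *blob, size_t len)
    {
        if (len < 2)
            return BTF_ENDIAN_INVALID;
        /* A little-endian producer stores the low byte (0x9f) first. */
        if (blob[0] == (BTF_MAGIC & 0xff) && blob[1] == (BTF_MAGIC >> 8))
            return BTF_ENDIAN_LITTLE;
        if (blob[0] == (BTF_MAGIC >> 8) && blob[1] == (BTF_MAGIC & 0xff))
            return BTF_ENDIAN_BIG;
        return BTF_ENDIAN_INVALID;
    }

    /* Section offsets are relative to the end of the header (hdr_len bytes). */
    static size_t btf_section_start(uint32_t hdr_len, uint32_t section_off)
    {
        return (size_t)hdr_len + section_off;
    }

Using ``hdr_len`` rather than ``sizeof(struct btf_header)`` to locate the
sections is what allows a reader to cope with a header that has grown.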

2.1 String Encoding
-------------------

The first string in the string section must be a null string. The rest of
the string table is a concatenation of other null-terminated strings.
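
A lookup into such a string table might look like the sketch below.
``btf_str_at()`` is a hypothetical helper written for this example; it only
assumes that the string section is available in memory together with its
length.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /*
     * Return the string stored at name_off, or NULL if the offset is out of
     * range or the string is not terminated inside the section.
     */
    static const char *btf_str_at(const char *strs, uint32_t str_len,
                                  uint32_t name_off)
    {
        if (name_off >= str_len)
            return NULL;
        if (!memchr(strs + name_off, '\0', str_len - name_off))
            return NULL;
        return strs + name_off;
    }

Because the first string is the null string, offset ``0`` always yields an
empty name.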

2.2 Type Encoding
-----------------

The type id ``0`` is reserved for ``void`` type. The type section is parsed
sequentially and type id is assigned to each recognized type starting from id
``1``. Currently, the following types are supported::

    #define BTF_KIND_INT            1       /* Integer      */
    #define BTF_KIND_PTR            2       /* Pointer      */
    #define BTF_KIND_ARRAY          3       /* Array        */
    #define BTF_KIND_STRUCT         4       /* Struct       */
    #define BTF_KIND_UNION          5       /* Union        */
    #define BTF_KIND_ENUM           6       /* Enumeration  */
    #define BTF_KIND_FWD            7       /* Forward      */
    #define BTF_KIND_TYPEDEF        8       /* Typedef      */
    #define BTF_KIND_VOLATILE       9       /* Volatile     */
    #define BTF_KIND_CONST          10      /* Const        */
    #define BTF_KIND_RESTRICT       11      /* Restrict     */
    #define BTF_KIND_FUNC           12      /* Function     */
    #define BTF_KIND_FUNC_PROTO     13      /* Function Proto       */
    #define BTF_KIND_VAR            14      /* Variable     */
    #define BTF_KIND_DATASEC        15      /* Section      */
    #define BTF_KIND_FLOAT          16      /* Floating point       */
    #define BTF_KIND_DECL_TAG       17      /* Decl Tag     */
    #define BTF_KIND_TYPE_TAG       18      /* Type Tag     */

Note that the type section encodes debug info, not just pure types.
``BTF_KIND_FUNC`` is not a type, and it represents a defined subprogram.

Each type contains the following common data::

    struct btf_type {
        __u32 name_off;
        /* "info" bits arrangement
         * bits  0-15: vlen (e.g. # of struct's members)
         * bits 16-23: unused
         * bits 24-28: kind (e.g. int, ptr, array...etc)
         * bits 29-30: unused
         * bit     31: kind_flag, currently used by
         *             struct, union and fwd
         */
        __u32 info;
        /* "size" is used by INT, ENUM, STRUCT and UNION.
         * "size" tells the size of the type it is describing.
         *
         * "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
         * FUNC, FUNC_PROTO, DECL_TAG and TYPE_TAG.
         * "type" is a type_id referring to another type.
         */
        union {
                __u32 size;
                __u32 type;
        };
    };

For certain kinds, the common data are followed by kind-specific data. The
``name_off`` in ``struct btf_type`` specifies the offset in the string table.
The following sections detail encoding of each kind.
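
The ``info`` bit arrangement can be unpacked with a few shifts and masks. The
helpers below are a minimal sketch with hypothetical names; they only restate
the bit positions given in the comment of ``struct btf_type``.

.. code-block:: c

    #include <stdint.h>

    /* bits 0-15: vlen */
    static inline uint32_t btf_info_vlen(uint32_t info)
    {
        return info & 0xffff;
    }

    /* bits 24-28: kind */
    static inline uint32_t btf_info_kind(uint32_t info)
    {
        return (info >> 24) & 0x1f;
    }

    /* bit 31: kind_flag */
    static inline uint32_t btf_info_kflag(uint32_t info)
    {
        return info >> 31;
    }

    /* Pack the three fields back into an "info" word; unused bits stay 0. */
    static inline uint32_t btf_info_pack(uint32_t kind, uint32_t vlen,
                                         uint32_t kflag)
    {
        return ((kflag & 1) << 31) | ((kind & 0x1f) << 24) | (vlen & 0xffff);
    }

For example, a struct (kind ``4``) with three members and ``kind_flag`` set
packs to ``0x84000003``.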

2.2.1 BTF_KIND_INT
~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: any valid offset
  * ``info.kind_flag``: 0
  * ``info.kind``: BTF_KIND_INT
  * ``info.vlen``: 0
  * ``size``: the size of the int type in bytes.

``btf_type`` is followed by a ``u32`` with the following bits arrangement::

    #define BTF_INT_ENCODING(VAL)   (((VAL) & 0x0f000000) >> 24)
    #define BTF_INT_OFFSET(VAL)     (((VAL) & 0x00ff0000) >> 16)
    #define BTF_INT_BITS(VAL)       ((VAL)  & 0x000000ff)

The ``BTF_INT_ENCODING`` has the following attributes::

    #define BTF_INT_SIGNED  (1 << 0)
    #define BTF_INT_CHAR    (1 << 1)
    #define BTF_INT_BOOL    (1 << 2)

The ``BTF_INT_ENCODING()`` provides extra information: signedness, char, or
bool, for the int type. The char and bool encodings are mostly useful for
pretty print. At most one encoding can be specified for the int type.

The ``BTF_INT_BITS()`` specifies the number of actual bits held by this int
type. For example, a 4-bit bitfield has ``BTF_INT_BITS()`` equal to 4.
The ``btf_type.size * 8`` must be equal to or greater than ``BTF_INT_BITS()``
for the type. The maximum value of ``BTF_INT_BITS()`` is 128.

The ``BTF_INT_OFFSET()`` specifies the starting bit offset to calculate values
for this int. For example, a bitfield struct member has:

  * btf member bit offset 100 from the start of the structure,
  * btf member pointing to an int type,
  * the int type has ``BTF_INT_OFFSET() = 2`` and ``BTF_INT_BITS() = 4``

Then in the struct memory layout, this member will occupy ``4`` bits starting
from bit ``100 + 2 = 102``.

Alternatively, the bitfield struct member can be the following to access the
same bits as the above:

  * btf member bit offset 102,
  * btf member pointing to an int type,
  * the int type has ``BTF_INT_OFFSET() = 0`` and ``BTF_INT_BITS() = 4``

The original intention of ``BTF_INT_OFFSET()`` is to provide flexibility of
bitfield encoding. Currently, both llvm and pahole generate
``BTF_INT_OFFSET() = 0`` for all int types.
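
The bitfield arithmetic and the constraints listed in this section can be
sketched as below. ``member_first_bit()`` and ``btf_int_is_valid()`` are
hypothetical helpers; the second one checks only the rules stated in this
section and is not meant to mirror everything the kernel verifies.

.. code-block:: c

    #include <stdint.h>

    #define BTF_INT_ENCODING(VAL)   (((VAL) & 0x0f000000) >> 24)
    #define BTF_INT_OFFSET(VAL)     (((VAL) & 0x00ff0000) >> 16)
    #define BTF_INT_BITS(VAL)       ((VAL)  & 0x000000ff)

    #define BTF_INT_SIGNED  (1 << 0)
    #define BTF_INT_CHAR    (1 << 1)
    #define BTF_INT_BOOL    (1 << 2)

    /* First bit occupied by a member: member bit offset plus int offset. */
    static uint32_t member_first_bit(uint32_t member_bit_off, uint32_t int_data)
    {
        return member_bit_off + BTF_INT_OFFSET(int_data);
    }

    /* size is btf_type.size in bytes; int_data is the u32 that follows. */
    static int btf_int_is_valid(uint32_t size, uint32_t int_data)
    {
        uint32_t enc = BTF_INT_ENCODING(int_data);
        uint32_t bits = BTF_INT_BITS(int_data);

        if (bits > 128 || bits > (uint64_t)size * 8)
            return 0;
        /* Only the three defined encoding bits are accepted. */
        if (enc & ~(uint32_t)(BTF_INT_SIGNED | BTF_INT_CHAR | BTF_INT_BOOL))
            return 0;
        /* At most one encoding bit may be set. */
        return (enc & (enc - 1)) == 0;
    }

With a member bit offset of 100 and int data ``0x00020004`` (offset 2, 4
bits), ``member_first_bit()`` gives 102, the same position as a member bit
offset of 102 combined with int data ``0x00000004``.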

2.2.2 BTF_KIND_PTR
~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: 0
  * ``info.kind_flag``: 0
  * ``info.kind``: BTF_KIND_PTR
  * ``info.vlen``: 0
  * ``type``: the pointee type of the pointer

No additional type data follow ``btf_type``.

2.2.3 BTF_KIND_ARRAY
~~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: 0
  * ``info.kind_flag``: 0
  * ``info.kind``: BTF_KIND_ARRAY
  * ``info.vlen``: 0
  * ``size/type``: 0, not used

``btf_type`` is followed by one ``struct btf_array``::

    struct btf_array {
        __u32   type;
        __u32   index_type;
        __u32   nelems;
    };

The ``struct btf_array`` encoding:
  * ``type``: the element type
  * ``index_type``: the index type
  * ``nelems``: the number of elements for this array (``0`` is also allowed).

The ``index_type`` can be any regular int type (``u8``, ``u16``, ``u32``,
``u64``, ``unsigned __int128``). The original design of including
``index_type`` follows DWARF, which has an ``index_type`` for its array type.
Currently in BTF, beyond type verification, the ``index_type`` is not used.

The ``struct btf_array`` allows chaining through element type to represent
multidimensional arrays. For example, for ``int a[5][6]``, the following type
information illustrates the chaining:

  * [1]: int
  * [2]: array, ``btf_array.type = [1]``, ``btf_array.nelems = 6``
  * [3]: array, ``btf_array.type = [2]``, ``btf_array.nelems = 5``

Currently, both pahole and llvm collapse multidimensional array into
one-dimensional array, e.g., for ``a[5][6]``, the ``btf_array.nelems`` is
equal to ``30``. This is because the original use case is map pretty print
where the whole array is dumped out so one-dimensional array is enough. As
more BTF usage is explored, pahole and llvm can be changed to generate proper
chained representation for multidimensional arrays.
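
The relation between the chained and the collapsed form can be sketched by
multiplying ``nelems`` along the element-type chain. ``struct demo_type`` and
``total_elems()`` are hypothetical and exist only for this example; a real
consumer would look types up in the parsed type section instead.

.. code-block:: c

    #include <stdint.h>

    /* Local mirror of the layout shown above. */
    struct btf_array {
        uint32_t type;
        uint32_t index_type;
        uint32_t nelems;
    };

    /* Hypothetical, simplified type table entry used only by this sketch. */
    struct demo_type {
        int is_array;               /* non-zero for BTF_KIND_ARRAY */
        struct btf_array array;     /* valid when is_array is set */
    };

    /* Multiply nelems along the chain until a non-array type is reached. */
    static uint64_t total_elems(const struct demo_type *types, uint32_t n_types,
                                uint32_t id)
    {
        uint64_t total = 1;
        uint32_t steps;

        /* The step limit keeps a malformed, cyclic chain from looping. */
        for (steps = 0; steps < n_types; steps++) {
            if (id >= n_types || !types[id].is_array)
                break;
            total *= types[id].array.nelems;
            id = types[id].array.type;
        }
        return total;
    }

For the ``int a[5][6]`` chain above, starting at type ``[3]`` yields 30, which
is the ``nelems`` value of the collapsed one-dimensional encoding.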

2.2.4 BTF_KIND_STRUCT
~~~~~~~~~~~~~~~~~~~~~
2.2.5 BTF_KIND_UNION
~~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: 0 or offset to a valid C identifier
  * ``info.kind_flag``: 0 or 1
  * ``info.kind``: BTF_KIND_STRUCT or BTF_KIND_UNION
  * ``info.vlen``: the number of struct/union members
  * ``size``: the size of the struct/union in bytes

``btf_type`` is followed by ``info.vlen`` number of ``struct btf_member``.::

    struct btf_member {
        __u32   name_off;
        __u32   type;
        __u32   offset;
    };

``struct btf_member`` encoding:
  * ``name_off``: offset to a valid C identifier
  * ``type``: the member type
  * ``offset``: <see below>

If the type info ``kind_flag`` is not set, the offset contains only the bit
offset of the member. Note that the base type of the bitfield can only be int
or enum type. If the bitfield size is 32, the base type can be either int or
enum type. If the bitfield size is not 32, the base type must be int, and int
type ``BTF_INT_BITS()`` encodes the bitfield size.

If the ``kind_flag`` is set, the ``btf_member.offset`` contains both member
bitfield size and bit offset. The bitfield size and bit offset are calculated
as below.::

    #define BTF_MEMBER_BITFIELD_SIZE(val)   ((val) >> 24)
    #define BTF_MEMBER_BIT_OFFSET(val)      ((val) & 0xffffff)

In this case, if the base type is an int type, it must be a regular int type:

  * ``BTF_INT_OFFSET()`` must be 0.
  * ``BTF_INT_BITS()`` must be equal to ``{1,2,4,8,16} * 8``.

The following kernel patch introduced ``kind_flag`` and explained why both
modes exist:

  https://github.com/torvalds/linux/commit/9d5f9f701b1891466fb3dbb1806ad97716f95cc3#diff-fa650a64fdd3968396883d2fe8215ff3
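
The two interpretations of ``btf_member.offset`` described in this section can
be sketched as follows. ``member_bit_offset()`` and ``member_bitfield_size()``
are hypothetical helper names; ``kind_flag`` is the flag of the enclosing
struct or union.

.. code-block:: c

    #include <stdint.h>

    #define BTF_MEMBER_BITFIELD_SIZE(val)   ((val) >> 24)
    #define BTF_MEMBER_BIT_OFFSET(val)      ((val) & 0xffffff)

    /* Bit offset of a member for either value of the parent's kind_flag. */
    static uint32_t member_bit_offset(uint32_t kind_flag, uint32_t offset)
    {
        return kind_flag ? BTF_MEMBER_BIT_OFFSET(offset) : offset;
    }

    /*
     * Bitfield size carried by the member itself. When kind_flag is not set
     * this sketch returns 0, because the size is then described by the
     * member's base type (BTF_INT_BITS() of the int type).
     */
    static uint32_t member_bitfield_size(uint32_t kind_flag, uint32_t offset)
    {
        return kind_flag ? BTF_MEMBER_BITFIELD_SIZE(offset) : 0;
    }

With ``kind_flag`` set, an ``offset`` of ``0x04000066`` describes a 4-bit
bitfield starting at bit 102.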

2.2.6 BTF_KIND_ENUM
~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: 0 or offset to a valid C identifier
  * ``info.kind_flag``: 0
  * ``info.kind``: BTF_KIND_ENUM
  * ``info.vlen``: number of enum values
  * ``size``: 4

``btf_type`` is followed by ``info.vlen`` number of ``struct btf_enum``.::

    struct btf_enum {
        __u32   name_off;
        __s32   val;
    };

The ``btf_enum`` encoding:
  * ``name_off``: offset to a valid C identifier
  * ``val``: any value

2.2.7 BTF_KIND_FWD
~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: offset to a valid C identifier
  * ``info.kind_flag``: 0 for struct, 1 for union
  * ``info.kind``: BTF_KIND_FWD
  * ``info.vlen``: 0
  * ``type``: 0

No additional type data follow ``btf_type``.

2.2.8 BTF_KIND_TYPEDEF
~~~~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: offset to a valid C identifier
  * ``info.kind_flag``: 0
  * ``info.kind``: BTF_KIND_TYPEDEF
  * ``info.vlen``: 0
  * ``type``: the type which can be referred by name at ``name_off``

No additional type data follow ``btf_type``.

2.2.9 BTF_KIND_VOLATILE
~~~~~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: 0
  * ``info.kind_flag``: 0
  * ``info.kind``: BTF_KIND_VOLATILE
  * ``info.vlen``: 0
  * ``type``: the type with ``volatile`` qualifier

No additional type data follow ``btf_type``.

2.2.10 BTF_KIND_CONST
~~~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: 0
  * ``info.kind_flag``: 0
  * ``info.kind``: BTF_KIND_CONST
  * ``info.vlen``: 0
  * ``type``: the type with ``const`` qualifier

No additional type data follow ``btf_type``.

2.2.11 BTF_KIND_RESTRICT
~~~~~~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: 0
  * ``info.kind_flag``: 0
  * ``info.kind``: BTF_KIND_RESTRICT
  * ``info.vlen``: 0
  * ``type``: the type with ``restrict`` qualifier

No additional type data follow ``btf_type``.

2.2.12 BTF_KIND_FUNC
~~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: offset to a valid C identifier
  * ``info.kind_flag``: 0
  * ``info.kind``: BTF_KIND_FUNC
  * ``info.vlen``: 0
  * ``type``: a BTF_KIND_FUNC_PROTO type

No additional type data follow ``btf_type``.

A BTF_KIND_FUNC defines not a type, but a subprogram (function) whose
signature is defined by ``type``. The subprogram is thus an instance of that
type. The BTF_KIND_FUNC may in turn be referenced by a func_info in the
:ref:`BTF_Ext_Section` (ELF) or in the arguments to :ref:`BPF_Prog_Load`
(ABI).

2.2.13 BTF_KIND_FUNC_PROTO
~~~~~~~~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: 0
  * ``info.kind_flag``: 0
  * ``info.kind``: BTF_KIND_FUNC_PROTO
  * ``info.vlen``: # of parameters
  * ``type``: the return type

``btf_type`` is followed by ``info.vlen`` number of ``struct btf_param``.::

    struct btf_param {
        __u32   name_off;
        __u32   type;
    };

If a BTF_KIND_FUNC_PROTO type is referred to by a BTF_KIND_FUNC type, then
``btf_param.name_off`` must point to a valid C identifier except for the
possible last argument representing the variable argument. The
``btf_param.type`` refers to the parameter type.

If the function has variable arguments, the last parameter is encoded with
``name_off = 0`` and ``type = 0``.
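
The variable-argument rule can be checked as in the sketch below.
``func_proto_is_variadic()`` is a hypothetical helper; ``params`` points to
the ``info.vlen`` parameter records that follow the ``btf_type``.

.. code-block:: c

    #include <stdint.h>

    /* Local mirror of the layout shown above. */
    struct btf_param {
        uint32_t name_off;
        uint32_t type;
    };

    /* A prototype is variadic when its last parameter is all zeroes. */
    static int func_proto_is_variadic(const struct btf_param *params,
                                      uint32_t vlen)
    {
        if (vlen == 0)
            return 0;
        return params[vlen - 1].name_off == 0 && params[vlen - 1].type == 0;
    }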

2.2.14 BTF_KIND_VAR
~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: offset to a valid C identifier
  * ``info.kind_flag``: 0
  * ``info.kind``: BTF_KIND_VAR
  * ``info.vlen``: 0
  * ``type``: the type of the variable

``btf_type`` is followed by a single ``struct btf_var`` with the
following data::

    struct btf_var {
        __u32   linkage;
    };

``struct btf_var`` encoding:
  * ``linkage``: currently either 0 (static variable) or 1 (globally
    allocated variable in ELF sections)

Not all types of global variables are supported by LLVM at this point.
The following is currently available:

  * static variables with or without section attributes
  * global variables with section attributes

The latter is for future extraction of map key/value type id's from a
map definition.

2.2.15 BTF_KIND_DATASEC
~~~~~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: offset to a valid name associated with a variable or
    one of .data/.bss/.rodata
  * ``info.kind_flag``: 0
  * ``info.kind``: BTF_KIND_DATASEC
  * ``info.vlen``: # of variables
  * ``size``: total section size in bytes (0 at compilation time, patched
    to actual size by BPF loaders such as libbpf)

``btf_type`` is followed by ``info.vlen`` number of ``struct btf_var_secinfo``.::

    struct btf_var_secinfo {
        __u32   type;
        __u32   offset;
        __u32   size;
    };

``struct btf_var_secinfo`` encoding:
  * ``type``: the type of the BTF_KIND_VAR variable
  * ``offset``: the in-section offset of the variable
  * ``size``: the size of the variable in bytes

2.2.16 BTF_KIND_FLOAT
~~~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: any valid offset
  * ``info.kind_flag``: 0
  * ``info.kind``: BTF_KIND_FLOAT
  * ``info.vlen``: 0
  * ``size``: the size of the float type in bytes: 2, 4, 8, 12 or 16.

No additional type data follow ``btf_type``.

2.2.17 BTF_KIND_DECL_TAG
~~~~~~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: offset to a non-empty string
  * ``info.kind_flag``: 0
  * ``info.kind``: BTF_KIND_DECL_TAG
  * ``info.vlen``: 0
  * ``type``: ``struct``, ``union``, ``func``, ``var`` or ``typedef``

``btf_type`` is followed by ``struct btf_decl_tag``.::

    struct btf_decl_tag {
        __u32   component_idx;
    };

The ``name_off`` encodes btf_decl_tag attribute string.
The ``type`` should be ``struct``, ``union``, ``func``, ``var`` or ``typedef``.
For ``var`` or ``typedef`` type, ``btf_decl_tag.component_idx`` must be ``-1``.
For the other three types, if the btf_decl_tag attribute is
applied to the ``struct``, ``union`` or ``func`` itself,
``btf_decl_tag.component_idx`` must be ``-1``. Otherwise,
the attribute is applied to a ``struct``/``union`` member or
a ``func`` argument, and ``btf_decl_tag.component_idx`` should be a
valid index (starting from 0) pointing to a member or an argument.
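
The ``component_idx`` rules can be summarized in code. The enum and
``classify_decl_tag()`` below are hypothetical names for this sketch;
``has_components`` is assumed to be non-zero when the tagged type is a
``struct``, ``union`` or ``func``, and ``n_components`` is its number of
members or arguments.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical classification used only by this sketch. */
    enum decl_tag_target {
        DECL_TAG_ON_TYPE,       /* component_idx is -1 */
        DECL_TAG_ON_COMPONENT,  /* valid member or argument index */
        DECL_TAG_INVALID,
    };

    static enum decl_tag_target classify_decl_tag(uint32_t component_idx,
                                                  int has_components,
                                                  uint32_t n_components)
    {
        /* -1 is stored in a __u32, so compare against the wrapped value. */
        if (component_idx == (uint32_t)-1)
            return DECL_TAG_ON_TYPE;
        /* var and typedef have no components; only -1 is allowed. */
        if (!has_components)
            return DECL_TAG_INVALID;
        return component_idx < n_components ? DECL_TAG_ON_COMPONENT
                                            : DECL_TAG_INVALID;
    }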

2.2.18 BTF_KIND_TYPE_TAG
~~~~~~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
  * ``name_off``: offset to a non-empty string
  * ``info.kind_flag``: 0
  * ``info.kind``: BTF_KIND_TYPE_TAG
  * ``info.vlen``: 0
  * ``type``: the type with ``btf_type_tag`` attribute
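
Since type ids are assigned by parsing the type section sequentially, a
reader has to know how much kind-specific data follows each ``btf_type``. The
sketch below derives those sizes from the structures in sections 2.2.1 to
2.2.17 and counts the types in a section. The function names are
hypothetical, the section is assumed to be in native byte order, and
``BTF_KIND_TYPE_TAG`` is assumed to carry no additional data.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define BTF_TYPE_HDR_SIZE 12    /* name_off + info + size/type */

    /* Bytes of kind-specific data after struct btf_type, or -1 if unknown. */
    static long btf_kind_extra_size(uint32_t kind, uint32_t vlen)
    {
        switch (kind) {
        case 1:                     /* INT: one u32 */
        case 14:                    /* VAR: struct btf_var */
        case 17:                    /* DECL_TAG: struct btf_decl_tag */
            return 4;
        case 3:                     /* ARRAY: struct btf_array */
            return 12;
        case 4:                     /* STRUCT: vlen * struct btf_member */
        case 5:                     /* UNION: vlen * struct btf_member */
        case 15:                    /* DATASEC: vlen * btf_var_secinfo */
            return 12L * (long)vlen;
        case 6:                     /* ENUM: vlen * struct btf_enum */
        case 13:                    /* FUNC_PROTO: vlen * struct btf_param */
            return 8L * (long)vlen;
        case 2: case 7: case 8: case 9: case 10:
        case 11: case 12: case 16: case 18:
            return 0;               /* no additional data */
        default:
            return -1;
        }
    }

    /* Count the types in a type section; returns -1 on malformed input. */
    static long btf_count_types(const uint8_t *sec, size_t len)
    {
        size_t off = 0;
        long count = 0;

        while (off < len) {
            uint32_t info;
            long extra;

            if (len - off < BTF_TYPE_HDR_SIZE)
                return -1;
            memcpy(&info, sec + off + 4, sizeof(info));
            extra = btf_kind_extra_size((info >> 24) & 0x1f, info & 0xffff);
            if (extra < 0 || (size_t)extra > len - off - BTF_TYPE_HDR_SIZE)
                return -1;
            off += BTF_TYPE_HDR_SIZE + (size_t)extra;
            count++;                /* this type has id == count */
        }
        return count;
    }

The returned count is also the highest valid type id, because id ``0`` is
reserved for ``void`` and does not appear in the section.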

3. BTF Kernel API
=================

The following bpf syscall commands involve BTF:
  * BPF_BTF_LOAD: load a blob of BTF data into kernel
  * BPF_MAP_CREATE: map creation with btf key and value type info.
  * BPF_PROG_LOAD: prog load with btf function and line info.
  * BPF_BTF_GET_FD_BY_ID: get a btf fd
  * BPF_OBJ_GET_INFO_BY_FD: btf, func_info, line_info
    and other btf related info are returned.

The workflow typically looks like:
::

  Application:
      BPF_BTF_LOAD
          |
          v
      BPF_MAP_CREATE and BPF_PROG_LOAD
          |
          V
      ......

  Introspection tool:
      ......
      BPF_{PROG,MAP}_GET_NEXT_ID (get prog/map id's)
          |
          V
      BPF_{PROG,MAP}_GET_FD_BY_ID (get a prog/map fd)
          |
          V
      BPF_OBJ_GET_INFO_BY_FD (get bpf_prog_info/bpf_map_info with btf_id)
          |                                     |
          V                                     |
      BPF_BTF_GET_FD_BY_ID (get btf_fd)         |
          |                                     |
          V                                     |
      BPF_OBJ_GET_INFO_BY_FD (get btf)          |
          |                                     |
          V                                     V
      pretty print types, dump func signatures and line info, etc.


3.1 BPF_BTF_LOAD
----------------

Load a blob of BTF data into kernel. A blob of data, described in
:ref:`BTF_Type_String`, can be directly loaded into the kernel. A ``btf_fd``
is returned to userspace.
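
To make the shape of such a blob concrete, the sketch below assembles a
minimal one in memory: a header, a type section with a single 32-bit signed
``int``, and a string section holding the null string and ``"int"``.
``build_min_btf()`` is a hypothetical helper, the version value ``1`` is an
assumption of this example, and the blob is written in native byte order.
Passing the result to the kernel with ``BPF_BTF_LOAD`` is not shown.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Local mirror of the header layout from section 2. */
    struct btf_header {
        uint16_t magic;
        uint8_t  version;
        uint8_t  flags;
        uint32_t hdr_len;
        uint32_t type_off;
        uint32_t type_len;
        uint32_t str_off;
        uint32_t str_len;
    };

    /* Write the blob into buf; return its size, or 0 if buf is too small. */
    static size_t build_min_btf(uint8_t *buf, size_t cap)
    {
        static const char strs[] = "\0int";     /* "", then "int" */
        const uint32_t type[4] = {
            1,                  /* name_off: "int" */
            1u << 24,           /* info: kind = BTF_KIND_INT, vlen = 0 */
            4,                  /* size in bytes */
            (1u << 24) | 32,    /* BTF_INT_SIGNED, offset 0, 32 bits */
        };
        struct btf_header hdr;
        size_t total = sizeof(hdr) + sizeof(type) + sizeof(strs);

        if (cap < total)
            return 0;
        memset(&hdr, 0, sizeof(hdr));
        hdr.magic = 0xeB9F;
        hdr.version = 1;                /* assumed version value */
        hdr.hdr_len = sizeof(hdr);
        hdr.type_off = 0;               /* relative to end of header */
        hdr.type_len = sizeof(type);
        hdr.str_off = sizeof(type);     /* strings follow the types */
        hdr.str_len = sizeof(strs);
        memcpy(buf, &hdr, sizeof(hdr));
        memcpy(buf + sizeof(hdr), type, sizeof(type));
        memcpy(buf + sizeof(hdr) + sizeof(type), strs, sizeof(strs));
        return total;
    }

The layout follows the rules above: both section offsets are relative to the
end of the header, and the string section starts with the null string.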

3.2 BPF_MAP_CREATE
------------------

A map can be created with ``btf_fd`` and specified key/value type id.::

    __u32   btf_fd;         /* fd pointing to a BTF type data */
    __u32   btf_key_type_id;        /* BTF type_id of the key */
    __u32   btf_value_type_id;      /* BTF type_id of the value */

In libbpf, the map can be defined with extra annotation like below:
::

    struct bpf_map_def SEC("maps") btf_map = {
        .type = BPF_MAP_TYPE_ARRAY,
        .key_size = sizeof(int),
        .value_size = sizeof(struct ipv_counts),
        .max_entries = 4,
    };
    BPF_ANNOTATE_KV_PAIR(btf_map, int, struct ipv_counts);

Here, the parameters for macro BPF_ANNOTATE_KV_PAIR are map name, key and
value types for the map. During ELF parsing, libbpf is able to extract
key/value type_id's and assign them to BPF_MAP_CREATE attributes
automatically.

.. _BPF_Prog_Load:

3.3 BPF_PROG_LOAD
-----------------

During prog_load, func_info and line_info can be passed to the kernel with
proper values for the following attributes::

    __u32           insn_cnt;
    __aligned_u64   insns;
    ......
    __u32           prog_btf_fd;            /* fd pointing to BTF type data */
    __u32           func_info_rec_size;     /* userspace bpf_func_info size */
    __aligned_u64   func_info;              /* func info */
    __u32           func_info_cnt;          /* number of bpf_func_info records */
    __u32           line_info_rec_size;     /* userspace bpf_line_info size */
    __aligned_u64   line_info;              /* line info */
    __u32           line_info_cnt;          /* number of bpf_line_info records */

func_info and line_info are arrays of the following structures,
respectively::

    struct bpf_func_info {
        __u32   insn_off;       /* [0, insn_cnt - 1] */
        __u32   type_id;        /* pointing to a BTF_KIND_FUNC type */
    };
    struct bpf_line_info {
        __u32   insn_off;       /* [0, insn_cnt - 1] */
        __u32   file_name_off;  /* offset to string table for the filename */
        __u32   line_off;       /* offset to string table for the source line */
        __u32   line_col;       /* line number and column number */
    };

func_info_rec_size is the size of each func_info record, and
line_info_rec_size is the size of each line_info record. Passing the record
size to the kernel makes it possible to extend the record itself in the future.
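
To illustrate why an explicit record size helps, here is a minimal sketch of a
reader that knows only the two ``bpf_func_info`` fields shown above but steps
through the array using the record size it was given. The struct name
``func_info_rec`` and the helper ``func_info_at`` are hypothetical names used
for illustration only; this is not the kernel's or libbpf's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the two known fields; real code would use
 * struct bpf_func_info from <linux/bpf.h>. */
struct func_info_rec {
    uint32_t insn_off;
    uint32_t type_id;
};

/* Hypothetical helper: fetch record idx from a buffer whose records are
 * rec_size bytes apart. Any bytes beyond the known fields are skipped. */
static struct func_info_rec func_info_at(const void *buf, uint32_t rec_size,
                                         uint32_t idx)
{
    struct func_info_rec rec;

    /* A record can grow, but it cannot be smaller than the known fields. */
    assert(rec_size >= sizeof(rec));
    memcpy(&rec, (const uint8_t *)buf + (size_t)idx * rec_size, sizeof(rec));
    return rec;
}
```

If a later record layout appended a third ``__u32``, a caller would pass
``rec_size = 12`` and this reader would still find ``insn_off`` and
``type_id`` of every record.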

Below are requirements for func_info:
  * func_info[0].insn_off must be 0.
  * the func_info insn_off is in strictly increasing order and matches
    bpf func boundaries.

Below are requirements for line_info:
  * the first insn in each func must have a line_info record pointing to it.
  * the line_info insn_off is in strictly increasing order.

For line_info, the line number and column number are defined as below::

    #define BPF_LINE_INFO_LINE_NUM(line_col)        ((line_col) >> 10)
    #define BPF_LINE_INFO_LINE_COL(line_col)        ((line_col) & 0x3ff)
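
As a worked example of this encoding, the sketch below packs a line and column
into ``line_col`` and decodes it again with the two macros. The helper
``line_col_pack`` is a hypothetical inverse written for illustration; it is
not part of the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

#define BPF_LINE_INFO_LINE_NUM(line_col)  ((line_col) >> 10)
#define BPF_LINE_INFO_LINE_COL(line_col)  ((line_col) & 0x3ff)

/* Hypothetical inverse of the macros above: the line number occupies the
 * upper 22 bits, the column number the lower 10 bits. */
static uint32_t line_col_pack(uint32_t line, uint32_t col)
{
    assert(line < (1u << 22) && col < (1u << 10));
    return (line << 10) | col;
}
```

For instance, line 7, column 14 packs to ``(7 << 10) | 14 = 7182``, which is
the value annotated as "Line 7 Col 14" in the clang-generated assembly shown in
section 6.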

3.4 BPF_{PROG,MAP}_GET_NEXT_ID
------------------------------

In the kernel, every loaded program, map, or btf has a unique id. The id does
not change during the lifetime of the program, map, or btf.

The bpf syscall commands BPF_{PROG,MAP}_GET_NEXT_ID return the ids of bpf
programs or maps, respectively, to user space, one id per call, so an
inspection tool can walk all programs and maps.

3.5 BPF_{PROG,MAP}_GET_FD_BY_ID
-------------------------------

An introspection tool cannot use an id directly to get details about a program
or map. A file descriptor needs to be obtained first for reference-counting
purposes.

3.6 BPF_OBJ_GET_INFO_BY_FD
--------------------------

Once a program/map fd is acquired, an introspection tool can get detailed
information about this fd from the kernel, some of which is BTF-related. For
example, ``bpf_map_info`` returns ``btf_id`` and the key/value type ids.
``bpf_prog_info`` returns ``btf_id``, func_info, and line info for translated
bpf byte codes, and jited_line_info.

3.7 BPF_BTF_GET_FD_BY_ID
------------------------

With the ``btf_id`` obtained from ``bpf_map_info`` or ``bpf_prog_info``, the
bpf syscall command BPF_BTF_GET_FD_BY_ID can retrieve a btf fd. Then, with the
command BPF_OBJ_GET_INFO_BY_FD, the btf blob, originally loaded into the
kernel with BPF_BTF_LOAD, can be retrieved.

With the btf blob, ``bpf_map_info``, and ``bpf_prog_info``, an introspection
tool has full btf knowledge and is able to pretty print map key/values, dump
func signatures and line info, along with byte/jit codes.

4. ELF File Format Interface
============================

4.1 .BTF section
----------------

The .BTF section contains type and string data. The format of this section is
the same as the one described in :ref:`BTF_Type_String`.

.. _BTF_Ext_Section:

4.2 .BTF.ext section
--------------------

The .BTF.ext section encodes func_info and line_info, which need loader
manipulation before being loaded into the kernel.

The specification for the .BTF.ext section is defined in
``tools/lib/bpf/btf.h`` and ``tools/lib/bpf/btf.c``.

The current header of the .BTF.ext section::

    struct btf_ext_header {
        __u16   magic;
        __u8    version;
        __u8    flags;
        __u32   hdr_len;

        /* All offsets are in bytes relative to the end of this header */
        __u32   func_info_off;
        __u32   func_info_len;
        __u32   line_info_off;
        __u32   line_info_len;
    };

The layout is very similar to that of the .BTF section. Instead of type/string
sections, it contains func_info and line_info sections. See
:ref:`BPF_Prog_Load` for details about the func_info and line_info record
format.
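
Because the offsets are relative to the end of the header rather than to the
start of the section, a parser has to add ``hdr_len`` before indexing into the
section data. The following is a minimal sketch of that arithmetic with a
simple bounds check. The helper names ``btf_ext_abs_off`` and
``btf_ext_layout_fits`` are hypothetical, and the struct is a local mirror of
the header above using ``<stdint.h>`` types; libbpf's actual parsing code may
differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the header shown above. */
struct btf_ext_header {
    uint16_t magic;
    uint8_t  version;
    uint8_t  flags;
    uint32_t hdr_len;
    uint32_t func_info_off;
    uint32_t func_info_len;
    uint32_t line_info_off;
    uint32_t line_info_len;
};

/* Hypothetical helper: convert a header-relative offset into an offset
 * from the start of the .BTF.ext section. */
static size_t btf_ext_abs_off(const struct btf_ext_header *hdr,
                              uint32_t rel_off)
{
    assert(hdr != NULL);
    return (size_t)hdr->hdr_len + rel_off;
}

/* Hypothetical bounds check: both sub-sections must end inside a section
 * of sec_size bytes. */
static bool btf_ext_layout_fits(const struct btf_ext_header *hdr,
                                size_t sec_size)
{
    size_t func_end = btf_ext_abs_off(hdr, hdr->func_info_off) +
                      hdr->func_info_len;
    size_t line_end = btf_ext_abs_off(hdr, hdr->line_info_off) +
                      hdr->line_info_len;

    return func_end <= sec_size && line_end <= sec_size;
}
```

With the values from the clang example in section 6 (``hdr_len`` 24,
func_info at offset 0 with length 28, line_info at offset 28 with length 44),
func_info starts at byte 24, line_info starts at byte 52, and the data ends at
byte 96, which matches the 0x60 section size reported by ``readelf``.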

The func_info section is organized as follows::

    func_info_rec_size
    btf_ext_info_sec for section #1 /* func_info for section #1 */
    btf_ext_info_sec for section #2 /* func_info for section #2 */
    ...

``func_info_rec_size`` specifies the size of the ``bpf_func_info`` structure
when .BTF.ext is generated. ``btf_ext_info_sec``, defined below, is a
collection of func_info records for a specific ELF section::

    struct btf_ext_info_sec {
        __u32   sec_name_off;   /* offset to section name */
        __u32   num_info;
        /* Followed by num_info * record_size number of bytes */
        __u8    data[0];
    };

Here, num_info must be greater than 0.

The line_info section is organized as follows::

    line_info_rec_size
    btf_ext_info_sec for section #1 /* line_info for section #1 */
    btf_ext_info_sec for section #2 /* line_info for section #2 */
    ...

``line_info_rec_size`` specifies the size of the ``bpf_line_info`` structure
when .BTF.ext is generated.

The interpretation of ``bpf_func_info->insn_off`` and
``bpf_line_info->insn_off`` differs between the kernel API and the ELF API.
For the kernel API, ``insn_off`` is the instruction offset in units of
``struct bpf_insn``. For the ELF API, ``insn_off`` is the byte offset from the
beginning of the section (``btf_ext_info_sec->sec_name_off``).
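
The unit difference can be sketched as a one-line conversion. The helper
``elf_off_to_insn_idx`` is a hypothetical name, and the sketch assumes only
that ``struct bpf_insn`` is 8 bytes long; any further adjustment (for example,
when several ELF sections are combined into one program) is up to the loader
and is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define BPF_INSN_SIZE 8u   /* sizeof(struct bpf_insn) */

/* Hypothetical helper: turn an ELF-side byte offset within a section into
 * an instruction index within that same section. */
static uint32_t elf_off_to_insn_idx(uint32_t byte_off)
{
    /* A valid offset always points at the start of an instruction. */
    assert(byte_off % BPF_INSN_SIZE == 0);
    return byte_off / BPF_INSN_SIZE;
}
```

For example, a record with an ELF-side ``insn_off`` of 16 refers to the third
instruction of its section, that is, instruction index 2.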

4.3 .BTF_ids section
--------------------

The .BTF_ids section encodes BTF ID values that are used within the kernel.

This section is created during the kernel compilation with the help of
macros defined in the ``include/linux/btf_ids.h`` header file. Kernel code can
use them to create lists and sets (sorted lists) of BTF ID values.

The ``BTF_ID_LIST`` and ``BTF_ID`` macros define an unsorted list of BTF ID
values, with the following syntax::

    BTF_ID_LIST(list)
    BTF_ID(type1, name1)
    BTF_ID(type2, name2)

resulting in the following layout in the .BTF_ids section::

    __BTF_ID__type1__name1__1:
    .zero 4
    __BTF_ID__type2__name2__2:
    .zero 4

The ``u32 list[];`` variable is defined to access the list.

The ``BTF_ID_UNUSED`` macro defines 4 zero bytes. It is used to define an
unused entry in a BTF_ID_LIST, like::

    BTF_ID_LIST(bpf_skb_output_btf_ids)
    BTF_ID(struct, sk_buff)
    BTF_ID_UNUSED
    BTF_ID(struct, task_struct)

The ``BTF_SET_START/END`` macro pair defines a sorted list of BTF ID values
and their count, with the following syntax::

    BTF_SET_START(set)
    BTF_ID(type1, name1)
    BTF_ID(type2, name2)
    BTF_SET_END(set)

resulting in the following layout in the .BTF_ids section::

    __BTF_ID__set__set:
    .zero 4
    __BTF_ID__type1__name1__3:
    .zero 4
    __BTF_ID__type2__name2__4:
    .zero 4

The ``struct btf_id_set set;`` variable is defined to access the set.

The ``typeX`` name can be one of the following::

    struct, union, typedef, func

and is used as a filter when resolving the BTF ID value.

All the BTF ID lists and sets are compiled into the .BTF_ids section and
resolved during the linking phase of the kernel build by the
``resolve_btfids`` tool.
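
Keeping a set sorted means that membership can be tested with a binary search
instead of a linear scan. The following user-space sketch shows the idea on a
plain sorted array of ids plus a count, mirroring the "count followed by ids"
layout above. The names ``cmp_u32`` and ``id_set_contains`` are hypothetical
stand-ins for illustration; this is not the kernel's own lookup code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two 32-bit ids, as required by bsearch(). */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;

    return (x > y) - (x < y);
}

/* Hypothetical helper: test whether id is in a sorted array of cnt ids. */
static bool id_set_contains(const uint32_t *ids, uint32_t cnt, uint32_t id)
{
    assert(ids != NULL || cnt == 0);
    if (cnt == 0)
        return false;
    return bsearch(&id, ids, cnt, sizeof(*ids), cmp_u32) != NULL;
}
```

The lookup is only correct if the ids are in ascending order, which is the
property that distinguishes a set from an unsorted ``BTF_ID_LIST``.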

5. Using BTF
============

5.1 bpftool map pretty print
----------------------------

With BTF, the map key/value can be printed based on fields rather than simply
raw bytes. This is especially valuable for large structures or if your data
structure has bitfields. For example, for the following map::

    enum A { A1, A2, A3, A4, A5 };
    typedef enum A ___A;
    struct tmp_t {
        char a1:4;
        int a2:4;
        int :4;
        __u32 a3:4;
        int b;
        ___A b1:4;
        enum A b2:4;
    };
    struct bpf_map_def SEC("maps") tmpmap = {
        .type = BPF_MAP_TYPE_ARRAY,
        .key_size = sizeof(__u32),
        .value_size = sizeof(struct tmp_t),
        .max_entries = 1,
    };
    BPF_ANNOTATE_KV_PAIR(tmpmap, int, struct tmp_t);

bpftool is able to pretty print it as follows::

    [{
        "key": 0,
        "value": {
            "a1": 0x2,
            "a2": 0x4,
            "a3": 0x6,
            "b": 7,
            "b1": 0x8,
            "b2": 0xa
        }
      }
    ]

5.2 bpftool prog dump
---------------------

The following example shows how func_info and line_info help prog dump produce
better kernel symbol names, function prototypes, and line information::

    $ bpftool prog dump jited pinned /sys/fs/bpf/test_btf_haskv
    [...]
    int test_long_fname_2(struct dummy_tracepoint_args * arg):
    bpf_prog_44a040bf25481309_test_long_fname_2:
    ; static int test_long_fname_2(struct dummy_tracepoint_args *arg)
       0:   push   %rbp
       1:   mov    %rsp,%rbp
       4:   sub    $0x30,%rsp
       b:   sub    $0x28,%rbp
       f:   mov    %rbx,0x0(%rbp)
      13:   mov    %r13,0x8(%rbp)
      17:   mov    %r14,0x10(%rbp)
      1b:   mov    %r15,0x18(%rbp)
      1f:   xor    %eax,%eax
      21:   mov    %rax,0x20(%rbp)
      25:   xor    %esi,%esi
    ; int key = 0;
      27:   mov    %esi,-0x4(%rbp)
    ; if (!arg->sock)
      2a:   mov    0x8(%rdi),%rdi
    ; if (!arg->sock)
      2e:   cmp    $0x0,%rdi
      32:   je     0x0000000000000070
      34:   mov    %rbp,%rsi
    ; counts = bpf_map_lookup_elem(&btf_map, &key);
    [...]

5.3 Verifier Log
----------------

The following example shows how line_info can help with debugging a
verification failure::

       /* The code at tools/testing/selftests/bpf/test_xdp_noinline.c
        * is modified as below.
        */
       data = (void *)(long)xdp->data;
       data_end = (void *)(long)xdp->data_end;
       /*
       if (data + 4 > data_end)
               return XDP_DROP;
       */
       *(u32 *)data = dst->dst;

    $ bpftool prog load ./test_xdp_noinline.o /sys/fs/bpf/test_xdp_noinline type xdp
        ; data = (void *)(long)xdp->data;
        224: (79) r2 = *(u64 *)(r10 -112)
        225: (61) r2 = *(u32 *)(r2 +0)
        ; *(u32 *)data = dst->dst;
        226: (63) *(u32 *)(r2 +0) = r1
        invalid access to packet, off=0 size=4, R2(id=0,off=0,r=0)
        R2 offset is outside of the packet

6. BTF Generation
=================

Generating BTF requires a recent pahole

  https://git.kernel.org/pub/scm/devel/pahole/pahole.git/

or llvm (8.0 or later). pahole acts as a dwarf2btf converter. It doesn't
support .BTF.ext and the BTF_KIND_FUNC type yet. For example::

      -bash-4.4$ cat t.c
      struct t {
        int a:2;
        int b:3;
        int c:2;
      } g;
      -bash-4.4$ gcc -c -O2 -g t.c
      -bash-4.4$ pahole -JV t.o
      File t.o:
      [1] STRUCT t kind_flag=1 size=4 vlen=3
              a type_id=2 bitfield_size=2 bits_offset=0
              b type_id=2 bitfield_size=3 bits_offset=2
              c type_id=2 bitfield_size=2 bits_offset=5
      [2] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED

llvm can generate .BTF and .BTF.ext directly with -g, for the bpf target only.
The assembly output (-S) shows the BTF encoding in assembly format::

    -bash-4.4$ cat t2.c
    typedef int __int32;
    struct t2 {
      int a2;
      int (*f2)(char q1, __int32 q2, ...);
      int (*f3)();
    } g2;
    int main() { return 0; }
    int test() { return 0; }
    -bash-4.4$ clang -c -g -O2 -target bpf t2.c
    -bash-4.4$ readelf -S t2.o
      ......
      [ 8] .BTF              PROGBITS         0000000000000000  00000247
           000000000000016e  0000000000000000           0     0     1
      [ 9] .BTF.ext          PROGBITS         0000000000000000  000003b5
           0000000000000060  0000000000000000           0     0     1
      [10] .rel.BTF.ext      REL              0000000000000000  000007e0
           0000000000000040  0000000000000010          16     9     8
      ......
    -bash-4.4$ clang -S -g -O2 -target bpf t2.c
    -bash-4.4$ cat t2.s
      ......
            .section        .BTF,"",@progbits
            .short  60319                   # 0xeb9f
            .byte   1
            .byte   0
            .long   24
            .long   0
            .long   220
            .long   220
            .long   122
            .long   0                       # BTF_KIND_FUNC_PROTO(id = 1)
            .long   218103808               # 0xd000000
            .long   2
            .long   83                      # BTF_KIND_INT(id = 2)
            .long   16777216                # 0x1000000
            .long   4
            .long   16777248                # 0x1000020
      ......
            .byte   0                       # string offset=0
            .ascii  ".text"                 # string offset=1
            .byte   0
            .ascii  "/home/yhs/tmp-pahole/t2.c" # string offset=7
            .byte   0
            .ascii  "int main() { return 0; }" # string offset=33
            .byte   0
            .ascii  "int test() { return 0; }" # string offset=58
            .byte   0
            .ascii  "int"                   # string offset=83
      ......
            .section        .BTF.ext,"",@progbits
            .short  60319                   # 0xeb9f
            .byte   1
            .byte   0
            .long   24
            .long   0
            .long   28
            .long   28
            .long   44
            .long   8                       # FuncInfo
            .long   1                       # FuncInfo section string offset=1
            .long   2
            .long   .Lfunc_begin0
            .long   3
            .long   .Lfunc_begin1
            .long   5
            .long   16                      # LineInfo
            .long   1                       # LineInfo section string offset=1
            .long   2
            .long   .Ltmp0
            .long   7
            .long   33
            .long   7182                    # Line 7 Col 14
            .long   .Ltmp3
            .long   7
            .long   58
            .long   8206                    # Line 8 Col 14
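
The ``info`` words in the .BTF dump above can be decoded by hand. The sketch
below mirrors the accessor macros from ``include/uapi/linux/btf.h`` and applies
them to the values in the dump. The helper ``btf_int_width`` is a hypothetical
name added for illustration, and the kind mask shown is the one from recent
headers (older headers used a narrower mask, which gives the same result for
these values).

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors of the accessor macros in include/uapi/linux/btf.h. */
#define BTF_INFO_KIND(info)     (((info) >> 24) & 0x1f)
#define BTF_INFO_VLEN(info)     ((info) & 0xffff)
#define BTF_INT_ENCODING(val)   (((val) & 0x0f000000) >> 24)
#define BTF_INT_OFFSET(val)     (((val) & 0x00ff0000) >> 16)
#define BTF_INT_BITS(val)       ((val) & 0x000000ff)

#define BTF_KIND_INT            1
#define BTF_KIND_FUNC_PROTO     13
#define BTF_INT_SIGNED          (1 << 0)

/* Hypothetical helper: width in bits of a BTF_KIND_INT type, given its
 * common info word and the INT-specific word that follows it. */
static uint32_t btf_int_width(uint32_t info, uint32_t int_word)
{
    assert(BTF_INFO_KIND(info) == BTF_KIND_INT);
    return BTF_INT_BITS(int_word);
}
```

Applied to the dump, ``0xd000000`` decodes to kind 13 (``BTF_KIND_FUNC_PROTO``)
with ``vlen`` 0, and the pair ``0x1000000`` / ``0x1000020`` decodes to a
signed 32-bit ``BTF_KIND_INT`` at bit offset 0.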

7. Testing
==========

The kernel bpf selftest ``test_btf.c`` provides an extensive set of
BTF-related tests.