.. Christoph Hellwig | 88691e9 | 2021-11-19 17:32:15 +0100

=============
eBPF verifier
=============

The safety of an eBPF program is determined in two steps.

The first step does a DAG check to disallow loops and performs other CFG
validation. In particular, it will detect programs that have unreachable
instructions (though the classic BPF checker allows them).
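
The DAG check can be pictured as a depth-first walk over the control-flow graph. The following is a minimal sketch under simplifying assumptions, not the code in kernel/bpf/verifier.c: ``struct insn``, the ``INSN_*`` kinds and ``check_cfg_sketch()`` are hypothetical. It follows fall-through and jump edges from insn 0, treats any back-edge as a forbidden loop, and then reports the first instruction that was never visited.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified instruction kinds (not the real BPF encoding). */
enum insn_kind { INSN_OTHER, INSN_JA, INSN_JCOND, INSN_EXIT };

struct insn {
	enum insn_kind kind;
	int off;	/* jump target is pc + 1 + off, as in BPF */
};

enum { UNSEEN, ON_STACK, DONE };
#define MAX_INSNS 64

/* Depth-first walk from pc; returns -1 on a loop or an out-of-range target. */
static int visit(const struct insn *prog, int n, int pc, int *state)
{
	int succ[2], nsucc = 0;

	if (pc < 0 || pc >= n)
		return -1;	/* jump out of range or falling off the end */
	if (state[pc] == ON_STACK)
		return -1;	/* back-edge: the program contains a loop */
	if (state[pc] == DONE)
		return 0;	/* already explored via another path */
	state[pc] = ON_STACK;

	switch (prog[pc].kind) {
	case INSN_EXIT:
		break;
	case INSN_JA:
		succ[nsucc++] = pc + 1 + prog[pc].off;
		break;
	case INSN_JCOND:
		succ[nsucc++] = pc + 1;
		succ[nsucc++] = pc + 1 + prog[pc].off;
		break;
	default:
		succ[nsucc++] = pc + 1;
		break;
	}
	for (int i = 0; i < nsucc; i++)
		if (visit(prog, n, succ[i], state) < 0)
			return -1;
	state[pc] = DONE;
	return 0;
}

/* Returns 0 if the CFG is a DAG in which every insn is reachable from insn 0. */
static int check_cfg_sketch(const struct insn *prog, int n)
{
	int state[MAX_INSNS] = { UNSEEN };

	if (n > MAX_INSNS || visit(prog, n, 0, state) < 0)
		return -1;
	for (int i = 0; i < n; i++) {
		if (state[i] != DONE) {
			printf("unreachable insn %d\n", i);
			return -1;
		}
	}
	return 0;
}
```

For a program made of two exit instructions, like the first example under 'Understanding eBPF verifier messages', the sketch prints ``unreachable insn 1``.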

The second step starts from the first insn and descends all possible paths.
It simulates execution of every insn and observes the state change of
registers and stack.

At the start of the program, the register R1 contains a pointer to context
and has type PTR_TO_CTX.
If the verifier sees an insn that does R2=R1, then R2 now has type
PTR_TO_CTX as well and can be used on the right-hand side of an expression.
If R1=PTR_TO_CTX and the insn is R2=R1+R1, then R2=SCALAR_VALUE,
since the addition of two valid pointers makes an invalid pointer.
(In 'secure' mode, the verifier will reject any type of pointer arithmetic to
make sure that kernel addresses don't leak to unprivileged users.)
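
The two typing rules above can be sketched with a toy register model. This is illustrative only: ``enum toy_type``, ``struct toy_reg``, ``toy_mov()`` and ``toy_add()`` are hypothetical, and the real verifier tracks much more state per register (see ``struct bpf_reg_state`` below). The ``secure`` flag stands in for the 'secure' mode mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy register types; the real verifier has many more. */
enum toy_type { NOT_INIT, SCALAR_VALUE, PTR_TO_CTX };

struct toy_reg {
	enum toy_type type;
};

static bool is_ptr(enum toy_type t)
{
	return t != NOT_INIT && t != SCALAR_VALUE;
}

/* dst = src: the destination inherits the source type. */
static int toy_mov(struct toy_reg *regs, int dst, int src)
{
	if (regs[src].type == NOT_INIT)
		return -1;	/* reading a register that was never written */
	regs[dst] = regs[src];
	return 0;
}

/* dst = a + b; returns -1 if the insn would be rejected. */
static int toy_add(struct toy_reg *regs, int dst, int a, int b, bool secure)
{
	enum toy_type ta = regs[a].type, tb = regs[b].type;

	if (ta == NOT_INIT || tb == NOT_INIT)
		return -1;
	if (secure && (is_ptr(ta) || is_ptr(tb)))
		return -1;	/* 'secure' mode: no pointer arithmetic at all */
	if (is_ptr(ta) && is_ptr(tb))
		regs[dst].type = SCALAR_VALUE;	/* ptr + ptr is not a valid pointer */
	else if (is_ptr(ta) || is_ptr(tb))
		regs[dst].type = is_ptr(ta) ? ta : tb;	/* toy rule: ptr + scalar stays a ptr */
	else
		regs[dst].type = SCALAR_VALUE;
	return 0;
}
```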

If a register was never written to, it's not readable, so the program::

  bpf_mov R0 = R2
  bpf_exit

will be rejected, since R2 is unreadable at the start of the program.

After a kernel function call, R1-R5 are reset to unreadable and
R0 has the return type of the function.

Since R6-R9 are callee saved, their state is preserved across the call.

::

  bpf_mov R6 = 1
  bpf_call foo
  bpf_mov R0 = R6
  bpf_exit

is a correct program. If there were R1 instead of R6, it would have
been rejected.
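
The call rule can be modelled in the same toy style. In this hypothetical sketch, ``toy_call()`` marks R1-R5 as NOT_INIT and gives R0 an assumed return type, R6-R9 are left untouched, and ``toy_exit()`` requires R0 to be readable. ``run_example()`` replays the program above with a chosen source register.

```c
#include <assert.h>

enum toy_type { NOT_INIT, SCALAR_VALUE, PTR_TO_CTX, PTR_TO_STACK };

#define TOY_NREGS 11	/* R0-R10 */

static void toy_init(enum toy_type *regs)
{
	for (int i = 0; i < TOY_NREGS; i++)
		regs[i] = NOT_INIT;
	regs[1] = PTR_TO_CTX;		/* R1 = context */
	regs[10] = PTR_TO_STACK;	/* R10 = frame pointer */
}

static void toy_mov_imm(enum toy_type *regs, int dst)
{
	regs[dst] = SCALAR_VALUE;
}

static int toy_mov_reg(enum toy_type *regs, int dst, int src)
{
	if (regs[src] == NOT_INIT)
		return -1;	/* source register is unreadable */
	regs[dst] = regs[src];
	return 0;
}

/* After a call, caller-saved R1-R5 become unreadable; R0 gets the return type. */
static void toy_call(enum toy_type *regs, enum toy_type ret_type)
{
	for (int i = 1; i <= 5; i++)
		regs[i] = NOT_INIT;
	regs[0] = ret_type;
}

static int toy_exit(const enum toy_type *regs)
{
	return regs[0] == NOT_INIT ? -1 : 0;
}

/* bpf_mov Rsrc = 1; bpf_call foo; bpf_mov R0 = Rsrc; bpf_exit */
static int run_example(int src)
{
	enum toy_type regs[TOY_NREGS];

	toy_init(regs);
	toy_mov_imm(regs, src);
	toy_call(regs, SCALAR_VALUE);	/* assume foo returns a scalar */
	if (toy_mov_reg(regs, 0, src) < 0)
		return -1;
	return toy_exit(regs);
}
```

``run_example(6)`` returns 0 (accepted) and ``run_example(1)`` returns -1 (rejected), matching the text.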

Load/store instructions are allowed only with registers of valid types, which
are PTR_TO_CTX, PTR_TO_MAP_VALUE, PTR_TO_STACK. They are bounds- and
alignment-checked. For example::

  bpf_mov R1 = 1
  bpf_mov R2 = 2
  bpf_xadd *(u32 *)(R1 + 3) += R2
  bpf_exit

will be rejected, since R1 doesn't have a valid pointer type at the time of
execution of instruction bpf_xadd.

At the start, R1's type is PTR_TO_CTX (a pointer to generic ``struct bpf_context``).
A callback is used to customize the verifier to restrict eBPF program access to
only certain fields within the ctx structure with specified size and alignment.

For example, the following insn::

  bpf_ld R0 = *(u32 *)(R6 + 8)

intends to load a word from address R6 + 8 and store it into R0.
If R6=PTR_TO_CTX, via the is_valid_access() callback the verifier will know
that offset 8 of size 4 bytes can be accessed for reading; otherwise
the verifier will reject the program.
If R6=PTR_TO_STACK, then the access should be aligned and be within
stack bounds, which are [-MAX_BPF_STACK, 0). In this example the offset is 8,
so it will fail verification, since it's out of bounds.
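
Both checks can be sketched as follows. The context policy is a made-up example in which only a 4-byte read at offset 8 is allowed. ``toy_ctx_is_valid_access()`` merely stands in for a program type's is_valid_access() callback and does not have its real signature. MAX_BPF_STACK is 512 in current kernels. The stack check requires the whole access to fall inside [-MAX_BPF_STACK, 0) and to be aligned to its size.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BPF_STACK 512

/* Hypothetical ctx policy: only a 4-byte read at offset 8 is allowed. */
static bool toy_ctx_is_valid_access(int off, int size, bool is_write)
{
	return !is_write && off == 8 && size == 4;
}

/* Stack access through R10 + off: the whole access must lie within
 * [-MAX_BPF_STACK, 0) and be aligned to its size (1, 2, 4 or 8). */
static bool toy_stack_access_ok(int off, int size)
{
	if (off < -MAX_BPF_STACK || off + size > 0)
		return false;
	return off % size == 0;
}
```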

The verifier will allow an eBPF program to read data from the stack only after
it has written into it.

The classic BPF verifier does a similar check with M[0-15] memory slots.
For example::

  bpf_ld R0 = *(u32 *)(R10 - 4)
  bpf_exit

is an invalid program.
Though R10 is a correct read-only register and has type PTR_TO_STACK,
and R10 - 4 is within stack bounds, there were no stores into that location.
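
A simplified way to picture this tracking is a per-byte 'written' map of the stack. The kernel keeps this state per 8-byte slot in ``struct bpf_stack_state``. The ``toy_stack`` type and its helpers below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_BPF_STACK 512

struct toy_stack {
	bool written[MAX_BPF_STACK];	/* written[i] covers byte R10 - MAX_BPF_STACK + i */
};

static void toy_stack_init(struct toy_stack *st)
{
	memset(st, 0, sizeof(*st));
}

static bool toy_in_bounds(int off, int size)
{
	return off >= -MAX_BPF_STACK && off + size <= 0;
}

static int toy_stack_write(struct toy_stack *st, int off, int size)
{
	if (!toy_in_bounds(off, size))
		return -1;
	for (int i = 0; i < size; i++)
		st->written[MAX_BPF_STACK + off + i] = true;
	return 0;
}

/* A read is accepted only if every byte it covers was written before. */
static int toy_stack_read(const struct toy_stack *st, int off, int size)
{
	if (!toy_in_bounds(off, size))
		return -1;
	for (int i = 0; i < size; i++)
		if (!st->written[MAX_BPF_STACK + off + i])
			return -1;
	return 0;
}
```

Reading 4 bytes at R10 - 4 fails until a store covers those bytes, which is the situation in the example above.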

Pointer register spill/fill is tracked as well, since four (R6-R9)
callee-saved registers may not be enough for some programs.

Allowed function calls are customized with bpf_verifier_ops->get_func_proto().
The eBPF verifier will check that registers match argument constraints.
After the call, register R0 will be set to the return type of the function.
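
The following hypothetical sketch shows how a prototype constrains the argument registers. ``struct toy_func_proto`` and the ``TOY_ARG_*`` values only loosely mirror the kernel's ``struct bpf_func_proto`` and argument types such as ARG_CONST_MAP_PTR and ARG_PTR_TO_MAP_KEY. The real checks are more detailed; for instance, the key bytes on the stack must also be initialized.

```c
#include <assert.h>

enum toy_type { NOT_INIT, SCALAR_VALUE, CONST_PTR_TO_MAP, PTR_TO_STACK,
		PTR_TO_MAP_VALUE_OR_NULL };

/* Hypothetical argument constraints; TOY_ARG_UNUSED means "not checked". */
enum toy_arg { TOY_ARG_UNUSED, TOY_ARG_MAP_PTR, TOY_ARG_STACK_PTR };

struct toy_func_proto {
	enum toy_arg arg[5];		/* constraints for R1-R5 */
	enum toy_type ret_type;		/* type of R0 after the call */
};

/* Hypothetical prototype resembling map_lookup_elem(map, key). */
static const struct toy_func_proto toy_lookup_proto = {
	.arg = { TOY_ARG_MAP_PTR, TOY_ARG_STACK_PTR },
	.ret_type = PTR_TO_MAP_VALUE_OR_NULL,
};

/* regs[0..10] hold R0-R10; returns -1 if an argument does not match. */
static int toy_check_call(enum toy_type *regs, const struct toy_func_proto *fn)
{
	for (int i = 0; i < 5; i++) {
		enum toy_type t = regs[i + 1];

		if (fn->arg[i] == TOY_ARG_MAP_PTR && t != CONST_PTR_TO_MAP)
			return -1;
		if (fn->arg[i] == TOY_ARG_STACK_PTR && t != PTR_TO_STACK)
			return -1;
	}
	for (int i = 1; i <= 5; i++)
		regs[i] = NOT_INIT;	/* caller-saved registers are clobbered */
	regs[0] = fn->ret_type;
	return 0;
}
```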

Function calls are the main mechanism for extending the functionality of eBPF
programs. Socket filters may let programs call one set of functions, whereas
tracing filters may allow a completely different set.

If a function is made accessible to eBPF programs, it needs to be thought
through from a safety point of view. The verifier will guarantee that the
function is called with valid arguments.

Seccomp and socket filters have different security restrictions for classic
BPF. Seccomp solves this with a two-stage verifier: the classic BPF verifier is
followed by the seccomp verifier. In the case of eBPF, one configurable verifier
is shared for all use cases.

See details of the eBPF verifier in kernel/bpf/verifier.c.

Register value tracking
=======================

In order to determine the safety of an eBPF program, the verifier must track
the range of possible values in each register and also in each stack slot.
This is done with ``struct bpf_reg_state``, defined in
include/linux/bpf_verifier.h, which unifies tracking of scalar and pointer
values. Each register state has a type, which is either NOT_INIT (the register
has not been written to), SCALAR_VALUE (some value which is not usable as a
pointer), or a pointer type. The types of pointers describe their base, as
follows:

PTR_TO_CTX
    Pointer to bpf_context.
CONST_PTR_TO_MAP
    Pointer to struct bpf_map. "Const" because arithmetic
    on these pointers is forbidden.
PTR_TO_MAP_VALUE
    Pointer to the value stored in a map element.
PTR_TO_MAP_VALUE_OR_NULL
    Either a pointer to a map value, or NULL; map lookups
    (see maps.rst) return this type, which becomes a
    PTR_TO_MAP_VALUE when checked != NULL. Arithmetic on
    these pointers is forbidden.
PTR_TO_STACK
    Frame pointer.
PTR_TO_PACKET
    skb->data.
PTR_TO_PACKET_END
    skb->data + headlen; arithmetic forbidden.
PTR_TO_SOCKET
    Pointer to struct bpf_sock, implicitly refcounted.
PTR_TO_SOCKET_OR_NULL
    Either a pointer to a socket, or NULL; socket lookup
    returns this type, which becomes a PTR_TO_SOCKET when
    checked != NULL. PTR_TO_SOCKET is reference-counted,
    so programs must release the reference through the
    socket release function before the end of the program.
    Arithmetic on these pointers is forbidden.

However, a pointer may be offset from this base (as a result of pointer
arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
offset'. The former is used when an exactly-known value (e.g. an immediate
operand) is added to a pointer, while the latter is used for values which are
not exactly known. The variable offset is also used in SCALAR_VALUEs, to track
the range of possible values in the register.

The verifier's knowledge about the variable offset consists of:

* minimum and maximum values as unsigned
* minimum and maximum values as signed
* knowledge of the values of individual bits, in the form of a 'tnum': a u64
  'mask' and a u64 'value'. 1s in the mask represent bits whose value is
  unknown; 1s in the value represent bits known to be 1. Bits known to be 0
  have 0 in both mask and value; no bit should ever be 1 in both. For example,
  if a byte is read into a register from memory, the register's top 56 bits
  are known zero, while the low 8 are unknown, which is represented as the
  tnum (0x0; 0xff), written as (value; mask). If we then OR this with 0x40, we
  get (0x40; 0xbf); then if we add 1, we get (0x0; 0x1ff), because of
  potential carries.
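
The tnum example above can be reproduced with the sketch below. It is a standalone userspace re-implementation with a local ``struct tnum``. ``tnum_or()`` and ``tnum_add()`` follow the algorithms used in kernel/bpf/tnum.c, but this is not the kernel's code. tnums are written (value; mask) as in the text.

```c
#include <assert.h>
#include <stdint.h>

struct tnum {
	uint64_t value;	/* bits known to be 1 */
	uint64_t mask;	/* bits whose value is unknown */
};

static struct tnum tnum_or(struct tnum a, struct tnum b)
{
	uint64_t v = a.value | b.value;
	uint64_t mu = a.mask | b.mask;

	/* a bit known to be 1 in either operand is known 1 in the result */
	return (struct tnum){ v, mu & ~v };
}

static struct tnum tnum_add(struct tnum a, struct tnum b)
{
	uint64_t sm = a.mask + b.mask;
	uint64_t sv = a.value + b.value;	/* sum if every unknown bit is 0 */
	uint64_t sigma = sm + sv;		/* sum if every unknown bit is 1 */
	uint64_t chi = sigma ^ sv;		/* bits that carries may change */
	uint64_t mu = chi | a.mask | b.mask;

	return (struct tnum){ sv & ~mu, mu };
}
```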

Besides arithmetic, the register state can also be updated by conditional
branches. For instance, if a SCALAR_VALUE is compared > 8, in the 'true' branch
it will have a umin_value (unsigned minimum value) of 9, whereas in the 'false'
branch it will have a umax_value of 8. A signed compare (with BPF_JSGT or
BPF_JSGE) would instead update the signed minimum/maximum values. Information
from the signed and unsigned bounds can be combined; for instance, if a value is
first tested < 8 and then tested s> 4, the verifier will conclude that the value
is also > 4 and s< 8, since the bounds prevent crossing the sign boundary.
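
The branch refinement and the signed/unsigned exchange can be sketched as follows. The field names follow ``struct bpf_reg_state``, but ``struct toy_bounds``, the branch helpers and ``toy_sync()`` are hypothetical simplifications of the kernel's bounds-deduction logic, and only one refinement pass is done.

```c
#include <assert.h>
#include <stdint.h>

struct toy_bounds {
	uint64_t umin_value, umax_value;	/* unsigned bounds */
	int64_t smin_value, smax_value;		/* signed bounds */
};

static struct toy_bounds toy_unknown(void)
{
	return (struct toy_bounds){ 0, UINT64_MAX, INT64_MIN, INT64_MAX };
}

/* Let signed and unsigned bounds refine each other whenever the range
 * cannot cross the sign boundary (one pass only). */
static void toy_sync(struct toy_bounds *b)
{
	if (b->umax_value <= (uint64_t)INT64_MAX) {
		/* every value is in [0, INT64_MAX], so both views agree */
		if ((int64_t)b->umin_value > b->smin_value)
			b->smin_value = (int64_t)b->umin_value;
		if ((int64_t)b->umax_value < b->smax_value)
			b->smax_value = (int64_t)b->umax_value;
	}
	if (b->smin_value >= 0) {
		/* every value is non-negative, so both views agree */
		if ((uint64_t)b->smin_value > b->umin_value)
			b->umin_value = (uint64_t)b->smin_value;
		if ((uint64_t)b->smax_value < b->umax_value)
			b->umax_value = (uint64_t)b->smax_value;
	}
}

/* "if r > k" (unsigned), taken branch: r >= k + 1 (k < UINT64_MAX assumed). */
static void toy_ugt_true(struct toy_bounds *b, uint64_t k)
{
	if (k + 1 > b->umin_value)
		b->umin_value = k + 1;
	toy_sync(b);
}

/* "if r > k" (unsigned), fall-through branch: r <= k. */
static void toy_ugt_false(struct toy_bounds *b, uint64_t k)
{
	if (k < b->umax_value)
		b->umax_value = k;
	toy_sync(b);
}

/* "if r < k" (unsigned), taken branch: r <= k - 1 (k > 0 assumed). */
static void toy_ult_true(struct toy_bounds *b, uint64_t k)
{
	if (k - 1 < b->umax_value)
		b->umax_value = k - 1;
	toy_sync(b);
}

/* "if r s> k", taken branch: r >= k + 1 (k < INT64_MAX assumed). */
static void toy_sgt_true(struct toy_bounds *b, int64_t k)
{
	if (k + 1 > b->smin_value)
		b->smin_value = k + 1;
	toy_sync(b);
}
```

Starting from an unknown value, ``toy_ult_true(&b, 8)`` followed by ``toy_sgt_true(&b, 4)`` leaves umin_value = 5 and smax_value = 7, i.e. the value is > 4 and s< 8.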

PTR_TO_PACKETs with a variable offset part have an 'id', which is common to all
pointers sharing that same variable offset. This is important for packet range
checks: after adding a variable to a packet pointer register A, if you then copy
it to another register B and then add a constant 4 to A, both registers will
share the same 'id' but A will have a fixed offset of +4. Then if A is
bounds-checked and found to be less than a PTR_TO_PACKET_END, register B is
now known to have a safe range of at least 4 bytes. See 'Direct packet access',
below, for more on PTR_TO_PACKET ranges.
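
The following hypothetical sketch shows the range propagation. On the fall-through path of ``if (A > pkt_end) goto ...``, every packet pointer sharing A's id gets a range of at least A's fixed offset. Here ``range`` is measured from the pointer's base (the packet pointer before any constant offset), so B, with a fixed offset of 0, may access [B, B + 4). The kernel's find_good_pkt_pointers() does something along these lines. ``struct toy_pkt_reg`` and ``toy_mark_pkt_range()`` are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_pkt_reg {
	bool is_pkt;	/* is this a PTR_TO_PACKET? */
	int id;		/* shared by pointers with the same variable offset */
	int off;	/* fixed (constant) offset from the base */
	int range;	/* bytes [base, base + range) are known to be in the packet */
};

/* Fall-through of "if (regs[a] > pkt_end) goto ...": everything below
 * regs[a] is inside the packet, for every pointer sharing its id. */
static void toy_mark_pkt_range(struct toy_pkt_reg *regs, int nregs, int a)
{
	int new_range = regs[a].off;

	for (int i = 0; i < nregs; i++)
		if (regs[i].is_pkt && regs[i].id == regs[a].id &&
		    regs[i].range < new_range)
			regs[i].range = new_range;
}
```

After the check, B may read bytes [B, B + 4), while A itself has 4 - 4 = 0 accessible bytes.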

The 'id' field is also used on PTR_TO_MAP_VALUE_OR_NULL, common to all copies of
the pointer returned from a map lookup. This means that when one copy is
checked and found to be non-NULL, all copies can become PTR_TO_MAP_VALUEs.

As well as range-checking, the tracked information is also used for enforcing
alignment of pointer accesses. For instance, on most systems the packet pointer
is 2 bytes after a 4-byte alignment. If a program adds 14 bytes to that to jump
over the Ethernet header, then reads IHL and adds (IHL * 4), the resulting
pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
that pointer are safe.
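
The alignment argument can be checked with tnums. This sketch assumes NET_IP_ALIGN is 2 (true on most, but not all, architectures) and models IHL as an unknown 4-bit value, so ``IHL * 4`` is the tnum (0x0; 0x3c). ``toy_tnum_is_aligned()`` follows the idea of the kernel's tnum_is_aligned(): an access of ``size`` bytes is aligned only if the low bits are known to be zero. The other helpers are local re-implementations, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tnum {
	uint64_t value;	/* bits known to be 1 */
	uint64_t mask;	/* bits whose value is unknown */
};

static struct tnum tnum_const(uint64_t v)
{
	return (struct tnum){ v, 0 };
}

static struct tnum tnum_lshift(struct tnum a, unsigned int shift)
{
	return (struct tnum){ a.value << shift, a.mask << shift };
}

static struct tnum tnum_add(struct tnum a, struct tnum b)
{
	uint64_t sm = a.mask + b.mask;
	uint64_t sv = a.value + b.value;
	uint64_t sigma = sm + sv;
	uint64_t chi = sigma ^ sv;		/* bits that carries may change */
	uint64_t mu = chi | a.mask | b.mask;

	return (struct tnum){ sv & ~mu, mu };
}

/* size must be a power of two; the low log2(size) bits must be known 0. */
static bool toy_tnum_is_aligned(struct tnum a, uint64_t size)
{
	return !((a.value | a.mask) & (size - 1));
}
```

After adding the Ethernet header, the offset is (0x2; 0x7c), i.e. 4n+2, which is not 4-byte aligned; adding the 2 bytes of NET_IP_ALIGN gives (0x0; 0xfc), which is.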

The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
to all copies of the pointer returned from a socket lookup. This has similar
behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
represents a reference to the corresponding ``struct sock``. To ensure that the
reference is not leaked, it is imperative to NULL-check the reference and, in
the non-NULL case, pass the valid reference to the socket release function.
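
A minimal sketch of the reference-tracking rule, assuming a hypothetical fixed-size table of held ids. A lookup acquires a reference with a fresh id, the release helper drops it, and reaching exit while any id is still held is an error. The error is printed in the same format as the 'Unreleased reference' messages shown later. The NULL branch, in which the verifier drops the reference because there is no socket to release, is not modelled. ``toy_acquire()`` and the other names are illustrative, not kernel functions.

```c
#include <assert.h>
#include <stdio.h>

#define TOY_MAX_REFS 8

struct toy_refs {
	int ids[TOY_MAX_REFS];
	int insns[TOY_MAX_REFS];	/* insn that acquired each reference */
	int cnt;
	int next_id;
};

/* Returns the new reference id, or -1 if the table is full. */
static int toy_acquire(struct toy_refs *r, int insn_idx)
{
	if (r->cnt == TOY_MAX_REFS)
		return -1;
	r->ids[r->cnt] = ++r->next_id;
	r->insns[r->cnt] = insn_idx;
	r->cnt++;
	return r->next_id;
}

/* Releasing an id that is not held is an error. */
static int toy_release(struct toy_refs *r, int id)
{
	for (int i = 0; i < r->cnt; i++) {
		if (r->ids[i] == id) {
			r->ids[i] = r->ids[r->cnt - 1];
			r->insns[i] = r->insns[r->cnt - 1];
			r->cnt--;
			return 0;
		}
	}
	return -1;
}

/* At program exit no reference may still be held. */
static int toy_check_exit(const struct toy_refs *r)
{
	if (r->cnt == 0)
		return 0;
	printf("Unreleased reference id=%d, alloc_insn=%d\n",
	       r->ids[0], r->insns[0]);
	return -1;
}
```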

Direct packet access
====================

In cls_bpf and act_bpf programs, the verifier allows direct access to the packet
data via skb->data and skb->data_end pointers.
For example::

  1: r4 = *(u32 *)(r1 +80)  /* load skb->data_end */
  2: r3 = *(u32 *)(r1 +76)  /* load skb->data */
  3: r5 = r3
  4: r5 += 14
  5: if r5 > r4 goto pc+16
  R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
  6: r0 = *(u16 *)(r3 +12) /* access bytes 12 and 13 of the packet */

This 2-byte load from the packet is safe to do, since the program author
did check ``if (skb->data + 14 > skb->data_end) goto err`` at insn #5, which
means that in the fall-through case the register R3 (which points to skb->data)
has at least 14 directly accessible bytes. The verifier marks it
as R3=pkt(id=0,off=0,r=14).
id=0 means that no additional variables were added to the register.
off=0 means that no additional constants were added.
r=14 is the range of safe access, which means that bytes [R3, R3 + 14) are ok.
Note that R5 is marked as R5=pkt(id=0,off=14,r=14). It also points
to the packet data, but constant 14 was added to the register, so
it now points to ``skb->data + 14`` and the accessible range is
[R5, R5 + 14 - 14), which is zero bytes.
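
The access rule applied to R3 and R5 can be written as a small check. In this hypothetical sketch ``off`` is the register's fixed offset and ``range`` is the r value from the log, so the safe bytes relative to the register are [0, range - off).

```c
#include <assert.h>
#include <stdbool.h>

struct toy_pkt_ptr {
	int off;	/* constant added to the packet pointer's base */
	int range;	/* r: bytes [base, base + range) are known to be present */
};

/* May the program load 'size' bytes at (reg + insn_off)? */
static bool toy_pkt_load_ok(struct toy_pkt_ptr reg, int insn_off, int size)
{
	int start = reg.off + insn_off;	/* offset from the base */

	return start >= 0 && start + size <= reg.range;
}
```

With R3 = (off 0, r 14) the 2-byte load at +12 is accepted, while any load through R5 = (off 14, r 14) is rejected.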

More complex packet access may look like::

  R0=inv1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
  6: r0 = *(u8 *)(r3 +7) /* load the byte at offset 7 of the packet */
  7: r4 = *(u8 *)(r3 +12)
  8: r4 *= 14
  9: r3 = *(u32 *)(r1 +76) /* load skb->data */
  10: r3 += r4
  11: r2 = r1
  12: r2 <<= 48
  13: r2 >>= 48
  14: r3 += r2
  15: r2 = r3
  16: r2 += 8
  17: r1 = *(u32 *)(r1 +80) /* load skb->data_end */
  18: if r2 > r1 goto pc+2
  R0=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)) R5=pkt(id=0,off=14,r=14) R10=fp
  19: r1 = *(u8 *)(r3 +4)

The state of the register R3 is R3=pkt(id=2,off=0,r=8).
id=2 identifies the variable offset that the ``r3 += rX`` instructions added,
so r3 points to some offset within a packet, and since the program author did
``if (r3 + 8 > r1) goto err`` at insn #18, the safe range is [R3, R3 + 8).
The verifier only allows 'add'/'sub' operations on packet registers. Any other
operation will set the register state to 'SCALAR_VALUE' and it won't be
available for direct packet access.

The operation ``r3 += rX`` may overflow and become less than the original
skb->data, so the verifier has to prevent that. When it sees an ``r3 += rX``
instruction and rX can be larger than a 16-bit value, any subsequent
bounds-check of r3 against skb->data_end will not give us 'range' information,
so attempts to read through the pointer will give an "invalid access to packet"
error.

For example, after insn ``r4 = *(u8 *)(r3 +12)`` (insn #7 above) the state of
r4 is R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff)), which means that the
upper 56 bits of the register are guaranteed to be zero, and nothing is known
about the lower 8 bits. After insn ``r4 *= 14`` the state becomes
R4=inv(id=0,umax_value=3570,var_off=(0x0; 0xfffe)), since multiplying an 8-bit
value by constant 14 will keep the upper 52 bits as zero, and the least
significant bit will also be zero since 14 is even. Similarly, ``r2 >>= 48``
will make R2=inv(id=0,umax_value=65535,var_off=(0x0; 0xffff)), since the shift
is not sign extending. This logic is implemented in the adjust_reg_min_max_vals()
function, which calls adjust_ptr_min_max_vals() for adding a pointer to a scalar
(or vice versa) and adjust_scalar_min_max_vals() for operations on two scalars.
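
The shift case can be reproduced with tnum shifts, which move value and mask alike. This is a standalone sketch with a local ``struct tnum``. ``tnum_lshift()`` and ``tnum_rshift()`` are named after, and behave like, the helpers in kernel/bpf/tnum.c. ``tnum_umax()`` is a hypothetical helper that sets every unknown bit to get the largest possible value.

```c
#include <assert.h>
#include <stdint.h>

struct tnum {
	uint64_t value;	/* bits known to be 1 */
	uint64_t mask;	/* bits whose value is unknown */
};

/* A completely unknown 64-bit scalar. */
static const struct tnum tnum_unknown = { 0, UINT64_MAX };

static struct tnum tnum_lshift(struct tnum a, unsigned int shift)
{
	return (struct tnum){ a.value << shift, a.mask << shift };
}

/* Logical (not sign-extending) right shift: known zeros enter from the top. */
static struct tnum tnum_rshift(struct tnum a, unsigned int shift)
{
	return (struct tnum){ a.value >> shift, a.mask >> shift };
}

/* Largest value consistent with the tnum: every unknown bit set. */
static uint64_t tnum_umax(struct tnum a)
{
	return a.value | a.mask;
}
```

Shifting an unknown 64-bit scalar left and then right by 48 leaves (0x0; 0xffff) with a maximum of 65535, matching the state of R2 above.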

The end result is that a BPF program author can access the packet directly
using normal C code as::

  void *data = (void *)(long)skb->data;
  void *data_end = (void *)(long)skb->data_end;
  struct ethhdr *eth = data;
  struct iphdr *iph = data + sizeof(*eth);
  struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph);

  if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
          return 0;
  if (eth->h_proto != htons(ETH_P_IP))
          return 0;
  if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
          return 0;
  /* port fields are in network byte order */
  if (udp->dest == htons(53) || udp->source == htons(9))
          ...;

which makes such programs easier to write compared to LD_ABS insns
and significantly faster.

Pruning
=======

The verifier does not actually walk all possible paths through the program. For
each new branch to analyse, the verifier looks at all the states it has
previously been in when at this instruction. If any of them contains the current
state as a subset, the branch is 'pruned'; that is, the fact that the previous
state was accepted implies the current state would be as well. For instance, if
in the previous state r1 held a packet pointer, and in the current state r1
holds a packet pointer with a range as long or longer and at least as strict an
alignment, then r1 is safe. Similarly, if r2 was NOT_INIT before, then it can't
have been used by any path from that point, so any value in r2 (including
another NOT_INIT) is safe. The implementation is in the function regsafe().
Pruning considers not only the registers but also the stack (and any spilled
registers it may hold). They must all be safe for the branch to be pruned.
This is implemented in states_equal().
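
The following hypothetical sketch shows the register comparison behind pruning, simplified to three cases. An old NOT_INIT register accepts anything. An old scalar range must contain the current one. An old packet pointer must have the same offset and a range no longer than the current one. Alignment, ids and the stack are ignored. ``toy_regsafe()`` and ``toy_states_equal()`` are illustrative stand-ins for regsafe() and states_equal(), not their real logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_type { NOT_INIT, SCALAR_VALUE, PTR_TO_PACKET };

struct toy_reg {
	enum toy_type type;
	uint64_t umin, umax;	/* scalar bounds */
	int off, range;		/* packet pointer offset and verified range */
};

/* Is 'cur' at least as safe as 'old', which was already verified? */
static bool toy_regsafe(const struct toy_reg *old, const struct toy_reg *cur)
{
	if (old->type == NOT_INIT)
		return true;	/* no path from here ever read this register */
	if (old->type != cur->type)
		return false;
	switch (old->type) {
	case SCALAR_VALUE:
		return old->umin <= cur->umin && cur->umax <= old->umax;
	case PTR_TO_PACKET:
		return old->off == cur->off && cur->range >= old->range;
	default:
		return false;
	}
}

/* A branch can be pruned only if every register is safe. */
static bool toy_states_equal(const struct toy_reg *old,
			     const struct toy_reg *cur, int n)
{
	for (int i = 0; i < n; i++)
		if (!toy_regsafe(&old[i], &cur[i]))
			return false;
	return true;
}
```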

Understanding eBPF verifier messages
====================================

The following are a few examples of invalid eBPF programs and verifier error
messages as seen in the log:

Program with unreachable instructions::

  static struct bpf_insn prog[] = {
    BPF_EXIT_INSN(),
    BPF_EXIT_INSN(),
  };

Error::

  unreachable insn 1

Program that reads an uninitialized register::

  BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
  BPF_EXIT_INSN(),

Error::

  0: (bf) r0 = r2
  R2 !read_ok

Program that doesn't initialize R0 before exiting::

  BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
  BPF_EXIT_INSN(),

Error::

  0: (bf) r2 = r1
  1: (95) exit
  R0 !read_ok

Program that accesses the stack out of bounds::

  BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
  BPF_EXIT_INSN(),

Error::

  0: (7a) *(u64 *)(r10 +8) = 0
  invalid stack off=8 size=8

Program that doesn't initialize the stack before passing its address into a
function::

  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
  BPF_EXIT_INSN(),

Error::

  0: (bf) r2 = r10
  1: (07) r2 += -8
  2: (b7) r1 = 0x0
  3: (85) call 1
  invalid indirect read from stack off -8+0 size 8

Program that uses an invalid map_fd=0 while calling the map_lookup_elem()
function::

  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
  BPF_EXIT_INSN(),

Error::

  0: (7a) *(u64 *)(r10 -8) = 0
  1: (bf) r2 = r10
  2: (07) r2 += -8
  3: (b7) r1 = 0x0
  4: (85) call 1
  fd 0 is not pointing to valid bpf_map

Program that doesn't check the return value of map_lookup_elem() before
accessing the map element::

  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
  BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
  BPF_EXIT_INSN(),

Error::

  0: (7a) *(u64 *)(r10 -8) = 0
  1: (bf) r2 = r10
  2: (07) r2 += -8
  3: (b7) r1 = 0x0
  4: (85) call 1
  5: (7a) *(u64 *)(r0 +0) = 0
  R0 invalid mem access 'map_value_or_null'

Program that correctly checks the map_lookup_elem() returned value for NULL, but
accesses the memory with incorrect alignment::

  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
  BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
  BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
  BPF_EXIT_INSN(),

Error::

  0: (7a) *(u64 *)(r10 -8) = 0
  1: (bf) r2 = r10
  2: (07) r2 += -8
  3: (b7) r1 = 1
  4: (85) call 1
  5: (15) if r0 == 0x0 goto pc+1
   R0=map_ptr R10=fp
  6: (7a) *(u64 *)(r0 +4) = 0
  misaligned access off 4 size 8

Program that correctly checks the map_lookup_elem() returned value for NULL and
accesses memory with correct alignment on one side of an 'if' branch, but fails
to do so on the other side of the 'if' branch::

  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
  BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
  BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
  BPF_EXIT_INSN(),
  BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
  BPF_EXIT_INSN(),

Error::

  0: (7a) *(u64 *)(r10 -8) = 0
  1: (bf) r2 = r10
  2: (07) r2 += -8
  3: (b7) r1 = 1
  4: (85) call 1
  5: (15) if r0 == 0x0 goto pc+2
   R0=map_ptr R10=fp
  6: (7a) *(u64 *)(r0 +0) = 0
  7: (95) exit

  from 5 to 8: R0=imm0 R10=fp
  8: (7a) *(u64 *)(r0 +0) = 1
  R0 invalid mem access 'imm'

Program that performs a socket lookup and then sets the pointer to NULL without
checking it::

  BPF_MOV64_IMM(BPF_REG_2, 0),
  BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_MOV64_IMM(BPF_REG_3, 4),
  BPF_MOV64_IMM(BPF_REG_4, 0),
  BPF_MOV64_IMM(BPF_REG_5, 0),
  BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
  BPF_MOV64_IMM(BPF_REG_0, 0),
  BPF_EXIT_INSN(),

Error::

  0: (b7) r2 = 0
  1: (63) *(u32 *)(r10 -8) = r2
  2: (bf) r2 = r10
  3: (07) r2 += -8
  4: (b7) r3 = 4
  5: (b7) r4 = 0
  6: (b7) r5 = 0
  7: (85) call bpf_sk_lookup_tcp#65
  8: (b7) r0 = 0
  9: (95) exit
  Unreleased reference id=1, alloc_insn=7

Program that performs a socket lookup but does not NULL-check the returned
value::

  BPF_MOV64_IMM(BPF_REG_2, 0),
  BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_MOV64_IMM(BPF_REG_3, 4),
  BPF_MOV64_IMM(BPF_REG_4, 0),
  BPF_MOV64_IMM(BPF_REG_5, 0),
  BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
  BPF_EXIT_INSN(),

Error::

  0: (b7) r2 = 0
  1: (63) *(u32 *)(r10 -8) = r2
  2: (bf) r2 = r10
  3: (07) r2 += -8
  4: (b7) r3 = 4
  5: (b7) r4 = 0
  6: (b7) r5 = 0
  7: (85) call bpf_sk_lookup_tcp#65
  8: (95) exit
  Unreleased reference id=1, alloc_insn=7