.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2020 Google LLC.

===========================
BPF_MAP_TYPE_CGROUP_STORAGE
===========================

The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents local, fixed-size
storage. It is only available with ``CONFIG_CGROUP_BPF``, and only to programs
that attach to cgroups; those program types are enabled by the same Kconfig
option. The storage is identified by the cgroup the program is attached to.

The map provides local storage at the cgroup that the BPF program is attached
to. It offers faster and simpler access than a general-purpose hash table,
which needs a hash table lookup and requires the user to track live cgroups on
their own.

This document describes the usage and semantics of the
``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors were changed in
Linux 5.9, and this document describes the differences.

Usage
=====

The map uses a key of either type ``__u64 cgroup_inode_id`` or
``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``::

    struct bpf_cgroup_storage_key {
            __u64 cgroup_inode_id;
            __u32 attach_type;
    };

``cgroup_inode_id`` is the inode id of the cgroup directory.
``attach_type`` is the program's attach type.
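
Userspace has to supply ``cgroup_inode_id`` when building a key. A minimal
sketch, assuming the value is simply the ``st_ino`` that stat(2) reports for
the cgroup directory (for example one under ``/sys/fs/cgroup``); the helper
name ``cgroup_inode_id_of`` is hypothetical, and the sketch has not been
checked against every kernel version and architecture::

    #include <errno.h>
    #include <stdint.h>
    #include <sys/stat.h>

    /*
     * Hypothetical helper: store the inode number of the directory at
     * cgroup_dir in *id. Returns 0 on success or a negative errno value.
     */
    static int cgroup_inode_id_of(const char *cgroup_dir, uint64_t *id)
    {
            struct stat st;

            if (stat(cgroup_dir, &st) < 0)
                    return -errno;
            if (!S_ISDIR(st.st_mode))
                    return -ENOTDIR;

            *id = (uint64_t)st.st_ino;
            return 0;
    }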

Linux 5.9 added support for ``__u64 cgroup_inode_id`` as the key type. When
this key type is used, all attach types of a particular cgroup and map share
the same storage. Otherwise, if the key type is
``struct bpf_cgroup_storage_key``, programs of different attach types are
isolated and see different storages.
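
The difference between the two key types comes down to which fields take part
in key comparison. The following plain C model illustrates the rule; it is a
sketch, not the kernel's code, and ``struct storage_key`` (a local mirror of
``struct bpf_cgroup_storage_key``) and ``storage_key_cmp`` are hypothetical
names::

    #include <stdbool.h>
    #include <stdint.h>

    /* Local mirror of struct bpf_cgroup_storage_key. */
    struct storage_key {
            uint64_t cgroup_inode_id;
            uint32_t attach_type;
    };

    /*
     * Order two keys like memcmp(): negative, zero or positive. With
     * isolated attach types (struct key) both fields count; with shared
     * ones (__u64 key) only the cgroup inode id does.
     */
    static int storage_key_cmp(const struct storage_key *a,
                               const struct storage_key *b, bool isolated)
    {
            if (a->cgroup_inode_id != b->cgroup_inode_id)
                    return a->cgroup_inode_id < b->cgroup_inode_id ? -1 : 1;
            if (isolated && a->attach_type != b->attach_type)
                    return a->attach_type < b->attach_type ? -1 : 1;
            return 0;
    }

With ``isolated`` set to false, an ingress and an egress attachment to the
same cgroup compare equal and therefore map to one storage.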

To access the storage in a program, use ``bpf_get_local_storage``::

    void *bpf_get_local_storage(void *map, u64 flags)

``flags`` is reserved for future use and must be 0.

There is no implicit synchronization. Storages of
``BPF_MAP_TYPE_CGROUP_STORAGE`` can be accessed by multiple programs across
different CPUs, and users must take care of synchronization themselves. The
BPF infrastructure provides ``struct bpf_spin_lock`` to synchronize the
storage; see ``tools/testing/selftests/bpf/progs/test_spin_lock.c`` for an
example.

Examples
========

Usage with ``struct bpf_cgroup_storage_key`` as the key type::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
            __type(key, struct bpf_cgroup_storage_key);
            __type(value, __u32);
    } cgroup_storage SEC(".maps");

    SEC("cgroup_skb/egress")
    int program(struct __sk_buff *skb)
    {
            /* Storage bound to the attachment this program runs for. */
            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);

            /* Other CPUs may update the same storage, so add atomically. */
            __sync_fetch_and_add(ptr, 1);

            return 1; /* cgroup_skb: 1 lets the packet through, 0 drops it */
    }

Userspace accessing the map declared above::

    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
    {
            struct bpf_cgroup_storage_key key = {
                    .cgroup_inode_id = cgrp,
                    .attach_type = type,
            };
            __u32 value;

            bpf_map_lookup_elem(bpf_map__fd(map), &key, &value);
            // error checking omitted
            return value;
    }

Alternatively, using just ``__u64 cgroup_inode_id`` as the key type::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
            __type(key, __u64);
            __type(value, __u32);
    } cgroup_storage SEC(".maps");

    SEC("cgroup_skb/egress")
    int program(struct __sk_buff *skb)
    {
            /* Shared by every attach type of this cgroup using this map. */
            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);

            /* Other CPUs may update the same storage, so add atomically. */
            __sync_fetch_and_add(ptr, 1);

            return 1; /* cgroup_skb: 1 lets the packet through, 0 drops it */
    }

And userspace::

    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    __u32 map_lookup(struct bpf_map *map, __u64 cgrp)
    {
            __u32 value;

            /* The key is just the cgroup inode id; attach type plays no part. */
            bpf_map_lookup_elem(bpf_map__fd(map), &cgrp, &value);
            // error checking omitted
            return value;
    }

Semantics
=========

``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. The
per-CPU variant has a separate memory region for each CPU for each storage,
whereas the non-per-CPU variant has a single memory region per storage.
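
When userspace reads a ``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` entry with
``bpf_map_lookup_elem()``, the kernel copies out one value per possible CPU,
each in a slot rounded up to a multiple of 8 bytes; libbpf's
``libbpf_num_possible_cpus()`` gives the number of slots. A minimal sketch that
sums a ``__u32`` counter from such a buffer, assuming the buffer was already
filled by the lookup (``sum_percpu_u32`` and ``PERCPU_SLOT`` are hypothetical
names)::

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Per-CPU values are copied out in slots of round_up(value_size, 8). */
    #define PERCPU_SLOT(size) (((size) + 7) & ~(size_t)7)

    /* Hypothetical helper: add up a __u32 counter across ncpus slots. */
    static uint64_t sum_percpu_u32(const void *buf, int ncpus)
    {
            const unsigned char *p = buf;
            size_t slot = PERCPU_SLOT(sizeof(uint32_t));
            uint64_t total = 0;

            for (int cpu = 0; cpu < ncpus; cpu++) {
                    uint32_t v;

                    /* memcpy avoids assumptions about slot alignment. */
                    memcpy(&v, p + (size_t)cpu * slot, sizeof(v));
                    total += v;
            }
            return total;
    }

The buffer must hold at least ``ncpus * PERCPU_SLOT(sizeof(__u32))`` bytes,
that is 8 bytes per CPU for a ``__u32`` value.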

Prior to Linux 5.9, the lifetime of a storage is precisely per-attachment, and
for a single ``CGROUP_STORAGE`` map, there can be at most one program loaded
that uses the map. A program may be attached to multiple cgroups or have
multiple attach types, and each attach creates a fresh zeroed storage. The
storage is freed upon detach.

There is a one-to-one association between the map of each type (per-CPU and
non-per-CPU) and the BPF program at load (verification) time. As a result,
each map can only be used by one BPF program, and each BPF program can only
use one storage map of each type. Because a map can only be used by one BPF
program, sharing this cgroup's storage with other BPF programs was impossible.

Since Linux 5.9, storage can be shared by multiple programs. When a program is
attached to a cgroup, the kernel creates a new storage only if the map does
not already contain an entry for the cgroup and attach type pair; otherwise,
the old storage is reused for the new attachment. If the map is attach type
shared, the attach type is simply ignored during comparison. Storage is freed
only when either the map or the cgroup it is attached to is freed. Detaching
does not directly free the storage, but it may cause the map's reference count
to reach zero, which indirectly frees all storage in the map.
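
The reuse rule can be modelled as a lookup-or-create on attach. The sketch
below is plain userspace C that mirrors the behavior described above, not
kernel code; ``struct storage_table``, ``attach_storage``, ``free_table`` and
the fixed capacity are hypothetical, and new entries start zeroed in this
model::

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define MAX_STORAGES 16 /* hypothetical fixed capacity */

    struct storage {
            uint64_t cgroup_inode_id;
            uint32_t attach_type;
            uint32_t value;         /* the map value; zeroed when created */
    };

    struct storage_table {
            bool isolated;          /* false: attach type shared (__u64 key) */
            size_t count;
            struct storage *slots[MAX_STORAGES];
    };

    /* Return the storage for (cgroup, attach type), creating it on a miss. */
    static struct storage *attach_storage(struct storage_table *t,
                                          uint64_t cgroup_inode_id,
                                          uint32_t attach_type)
    {
            struct storage *s;

            for (size_t i = 0; i < t->count; i++) {
                    s = t->slots[i];
                    if (s->cgroup_inode_id == cgroup_inode_id &&
                        (!t->isolated || s->attach_type == attach_type))
                            return s;       /* reuse the existing storage */
            }
            if (t->count == MAX_STORAGES)
                    return NULL;

            s = calloc(1, sizeof(*s));      /* fresh, zeroed storage */
            if (!s)
                    return NULL;
            s->cgroup_inode_id = cgroup_inode_id;
            s->attach_type = attach_type;
            t->slots[t->count++] = s;
            return s;
    }

    /* Storage goes away with the map (the table), never on detach. */
    static void free_table(struct storage_table *t)
    {
            for (size_t i = 0; i < t->count; i++)
                    free(t->slots[i]);
            t->count = 0;
    }

Detaching has no counterpart in the model: an entry lives until
``free_table()``, which stands in for the map itself being freed.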

Since Linux 5.9, the map is no longer associated with a single BPF program,
which makes sharing possible. However, a BPF program can still only associate
with one map of each type (per-CPU and non-per-CPU): a BPF program cannot use
more than one ``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one
``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``.

In all versions, userspace may use the attach parameters (the cgroup and
attach type pair) in ``struct bpf_cgroup_storage_key`` as the key to the BPF
map APIs to read or update the storage for a given attachment. For Linux 5.9
attach type shared storages, only the first member of the struct, the cgroup
inode id, is used during comparison, so userspace may just specify a ``__u64``
directly.

The storage is bound at attach time. Even if the program is attached to a
parent cgroup and triggers in a child cgroup, the storage still belongs to the
parent.

Userspace cannot create a new entry in the map or delete an existing entry.
Program test runs always use temporary storage.