.. SPDX-License-Identifier: GPL-2.0
.. Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
.. Copyright © 2019-2020 ANSSI
.. Copyright © 2021 Microsoft Corporation

=====================================
Landlock: unprivileged access control
=====================================

:Author: Mickaël Salaün
:Date: March 2021

The goal of Landlock is to restrict ambient rights (e.g. global filesystem
access) for a set of processes.  Because Landlock is a stackable LSM, it makes
it possible to create safe security sandboxes as new security layers in
addition to the existing system-wide access controls.  This kind of sandbox is
expected to help mitigate the security impact of bugs or unexpected/malicious
behaviors in user space applications.  Landlock empowers any process,
including unprivileged ones, to securely restrict itself.

Landlock rules
==============

A Landlock rule describes an action on an object.  An object is currently a
file hierarchy, and the related filesystem actions are defined with `access
rights`_.  A set of rules is aggregated in a ruleset, which can then restrict
the thread enforcing it, and its future children.

Defining and enforcing a security policy
----------------------------------------

We first need to create the ruleset that will contain our rules.  For this
example, the ruleset will contain rules that only allow read actions, while
write actions will be denied.  The ruleset then needs to handle both of these
kinds of actions.

.. code-block:: c

    int ruleset_fd;
    struct landlock_ruleset_attr ruleset_attr = {
        .handled_access_fs =
            LANDLOCK_ACCESS_FS_EXECUTE |
            LANDLOCK_ACCESS_FS_WRITE_FILE |
            LANDLOCK_ACCESS_FS_READ_FILE |
            LANDLOCK_ACCESS_FS_READ_DIR |
            LANDLOCK_ACCESS_FS_REMOVE_DIR |
            LANDLOCK_ACCESS_FS_REMOVE_FILE |
            LANDLOCK_ACCESS_FS_MAKE_CHAR |
            LANDLOCK_ACCESS_FS_MAKE_DIR |
            LANDLOCK_ACCESS_FS_MAKE_REG |
            LANDLOCK_ACCESS_FS_MAKE_SOCK |
            LANDLOCK_ACCESS_FS_MAKE_FIFO |
            LANDLOCK_ACCESS_FS_MAKE_BLOCK |
            LANDLOCK_ACCESS_FS_MAKE_SYM,
    };

    ruleset_fd = landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
    if (ruleset_fd < 0) {
        perror("Failed to create a ruleset");
        return 1;
    }

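The ``landlock_*()`` functions used in these examples are system calls, and the
C library may not provide wrappers for them.  A minimal sketch of such
wrappers, assuming that the installed kernel headers ship ``linux/landlock.h``
and define the ``__NR_landlock_*`` syscall numbers (the ``static inline``
helpers are illustrative and follow the approach of the sample sandboxer; they
are not an official library API):

.. code-block:: c

    #define _GNU_SOURCE
    #include <linux/landlock.h>
    #include <stddef.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Returns a new ruleset file descriptor, or -1 with errno set. */
    static inline int landlock_create_ruleset(
            const struct landlock_ruleset_attr *const attr,
            const size_t size, const __u32 flags)
    {
        return syscall(__NR_landlock_create_ruleset, attr, size, flags);
    }

    /* Returns 0 on success, or -1 with errno set. */
    static inline int landlock_add_rule(const int ruleset_fd,
            const enum landlock_rule_type rule_type,
            const void *const rule_attr, const __u32 flags)
    {
        return syscall(__NR_landlock_add_rule, ruleset_fd, rule_type,
                       rule_attr, flags);
    }

    /* Returns 0 on success, or -1 with errno set. */
    static inline int landlock_restrict_self(const int ruleset_fd,
            const __u32 flags)
    {
        return syscall(__NR_landlock_restrict_self, ruleset_fd, flags);
    }

On a kernel built without Landlock, or with Landlock disabled, these wrappers
simply return -1, so callers can decide whether to continue without a sandbox.
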
We can now add a new rule to this ruleset using the returned file descriptor,
which refers to this ruleset.  The rule will only allow reading the file
hierarchy ``/usr``.  Without another rule, write actions would then be denied
by the ruleset.  To add ``/usr`` to the ruleset, we open it with the ``O_PATH``
flag and fill the &struct landlock_path_beneath_attr with this file
descriptor.

.. code-block:: c

    int err;
    struct landlock_path_beneath_attr path_beneath = {
        .allowed_access =
            LANDLOCK_ACCESS_FS_EXECUTE |
            LANDLOCK_ACCESS_FS_READ_FILE |
            LANDLOCK_ACCESS_FS_READ_DIR,
    };

    path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
    if (path_beneath.parent_fd < 0) {
        perror("Failed to open file");
        close(ruleset_fd);
        return 1;
    }
    err = landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
                            &path_beneath, 0);
    close(path_beneath.parent_fd);
    if (err) {
        perror("Failed to update ruleset");
        close(ruleset_fd);
        return 1;
    }

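When several file hierarchies need the same access rights, the open/add/close
sequence can be factored out.  The following is only a sketch:
``allow_read_beneath()`` is a hypothetical helper name, it grants the same
three access rights as the rule for ``/usr``, and it issues the raw system call
under the assumption that the kernel headers provide ``linux/landlock.h`` and
``__NR_landlock_add_rule``.

.. code-block:: c

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <linux/landlock.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /*
     * Hypothetical helper: allows execute, read-file and read-directory
     * accesses beneath @path in the ruleset referred to by @ruleset_fd.
     * Returns 0 on success, or -1 with errno set.
     */
    static int allow_read_beneath(const int ruleset_fd, const char *const path)
    {
        struct landlock_path_beneath_attr path_beneath = {
            .allowed_access =
                LANDLOCK_ACCESS_FS_EXECUTE |
                LANDLOCK_ACCESS_FS_READ_FILE |
                LANDLOCK_ACCESS_FS_READ_DIR,
        };
        int err, saved_errno;

        path_beneath.parent_fd = open(path, O_PATH | O_CLOEXEC);
        if (path_beneath.parent_fd < 0)
            return -1;
        /* Raw syscall, since the C library may not provide a wrapper. */
        err = syscall(__NR_landlock_add_rule, ruleset_fd,
                      LANDLOCK_RULE_PATH_BENEATH, &path_beneath, 0);
        /* close() may overwrite errno, so keep the value set by the syscall. */
        saved_errno = errno;
        close(path_beneath.parent_fd);
        errno = saved_errno;
        return err ? -1 : 0;
    }

For instance, calling ``allow_read_beneath(ruleset_fd, "/usr")`` followed by
``allow_read_beneath(ruleset_fd, "/etc")`` would add two rules to the same
ruleset before it is enforced.
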
We now have a ruleset with one rule allowing read access to ``/usr`` while
denying all other handled accesses for the filesystem.  The next step is to
restrict the current thread from gaining more privileges (e.g. through a SUID
binary).

.. code-block:: c

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
        perror("Failed to restrict privileges");
        close(ruleset_fd);
        return 1;
    }

The current thread is now ready to sandbox itself with the ruleset.

.. code-block:: c

    if (landlock_restrict_self(ruleset_fd, 0)) {
        perror("Failed to enforce ruleset");
        close(ruleset_fd);
        return 1;
    }
    close(ruleset_fd);

If the `landlock_restrict_self` system call succeeds, the current thread is now
restricted and this policy will be enforced on all its subsequently created
children as well.  Once a thread is landlocked, there is no way to remove its
security policy; only adding more restrictions is allowed.  These threads are
now in a new Landlock domain, which is the merge of their parent's domain (if
any) with the new ruleset.

Full working code can be found in `samples/landlock/sandboxer.c`_.

Layers of file path access rights
---------------------------------

Each time a thread enforces a ruleset on itself, it updates its Landlock domain
with a new layer of policy.  This complementary policy is stacked on top of any
other rulesets already restricting this thread.  A sandboxed thread can then
safely add more constraints to itself with a new enforced ruleset.

One policy layer grants access to a file path if at least one of its rules
encountered on the path grants the access.  A sandboxed thread can only access
a file path if all its enforced policy layers grant the access, as well as all
the other system access controls (e.g. filesystem DAC, other LSM policies,
etc.).

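The decision described above can be modeled in a few lines of C.  This is a
simplified illustration and not the kernel implementation: the ``model_*``
types, the ``MODEL_*`` access bits and both functions are hypothetical names,
each request is for a single access right, and the sketch assumes that a layer
only restricts the accesses it handles.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical access bits standing in for LANDLOCK_ACCESS_FS_*. */
    enum {
        MODEL_READ = 1 << 0,
        MODEL_WRITE = 1 << 1,
    };

    /* A rule encountered while walking the file path. */
    struct model_rule {
        uint64_t allowed_access;
    };

    /* One policy layer: its handled accesses and its rules on the path. */
    struct model_layer {
        uint64_t handled_access;
        const struct model_rule *rules_on_path;
        size_t num_rules;
    };

    /* A layer grants @access if at least one rule on the path allows it. */
    static bool layer_grants(const struct model_layer *const layer,
                             const uint64_t access)
    {
        size_t i;

        /* Assumption: accesses not handled by a layer are not restricted. */
        if (!(layer->handled_access & access))
            return true;
        for (i = 0; i < layer->num_rules; i++) {
            if (layer->rules_on_path[i].allowed_access & access)
                return true;
        }
        return false;
    }

    /* The domain grants @access only if every stacked layer grants it. */
    static bool domain_grants(const struct model_layer *const layers,
                              const size_t num_layers, const uint64_t access)
    {
        size_t i;

        for (i = 0; i < num_layers; i++) {
            if (!layer_grants(&layers[i], access))
                return false;
        }
        return true;
    }

In this model, stacking a read-only layer on top of a read-write layer leaves
read access granted and write access denied.  Other system access controls are
evaluated separately and are not part of the sketch.
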
Bind mounts and OverlayFS
-------------------------

Landlock makes it possible to restrict access to file hierarchies, which means
that these access rights can be propagated with bind mounts (cf.
Documentation/filesystems/sharedsubtree.rst) but not with OverlayFS (cf.
Documentation/filesystems/overlayfs.rst).

A bind mount mirrors a source file hierarchy to a destination.  The destination
hierarchy is then composed of the exact same files, on which Landlock rules can
be tied, either via the source or the destination path.  These rules restrict
access when they are encountered on a path, which means that they can restrict
access to multiple file hierarchies at the same time, whether these hierarchies
are the result of bind mounts or not.

An OverlayFS mount point consists of upper and lower layers.  These layers are
combined in a merge directory, which is the result of the mount point.  This
merge hierarchy may include files from the upper and lower layers, but
modifications performed on the merge hierarchy are only reflected in the upper
layer.  From a Landlock policy point of view, each OverlayFS layer and merge
hierarchy is standalone and contains its own set of files and directories,
which is different from bind mounts.  A policy restricting an OverlayFS layer
will not restrict the resulting merged hierarchy, and vice versa.  Landlock
users should then only think about file hierarchies they want to allow access
to, regardless of the underlying filesystem.

Inheritance
-----------

Every new thread resulting from a :manpage:`clone(2)` inherits Landlock domain
restrictions from its parent.  This is similar to the seccomp inheritance (cf.
Documentation/userspace-api/seccomp_filter.rst) or any other LSM dealing with
a task's :manpage:`credentials(7)`.  For instance, one thread of a process may
apply Landlock rules to itself, but they will not be automatically applied to
its sibling threads (unlike POSIX thread credential changes, cf.
:manpage:`nptl(7)`).

When a thread sandboxes itself, we have the guarantee that the related security
policy will stay enforced on all this thread's descendants.  This allows
creating standalone and modular security policies per application, which will
automatically be composed between themselves according to their runtime parent
policies.

Ptrace restrictions
-------------------

A sandboxed process has fewer privileges than a non-sandboxed process and must
then be subject to additional restrictions when manipulating another process.
To be allowed to use :manpage:`ptrace(2)` and related syscalls on a target
process, a sandboxed process should have a subset of the target process rules,
which means the tracee must be in a sub-domain of the tracer.

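The sub-domain relation can be pictured as a walk up a chain of domains.  The
following model is only a sketch under stated assumptions: ``struct
model_domain`` and ``may_ptrace()`` are hypothetical names, ``NULL`` stands for
a process that is not sandboxed, and a tracee in the very same domain as the
tracer is assumed to qualify.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    /*
     * Hypothetical model of a Landlock domain: each domain points to the
     * domain it was stacked on, or to NULL if it is the first layer.
     */
    struct model_domain {
        const struct model_domain *parent;
    };

    /*
     * Returns true if, from the Landlock point of view only, @tracer may
     * trace @tracee: @tracee must be in @tracer's domain or in one of its
     * sub-domains.  NULL means "not sandboxed".
     */
    static bool may_ptrace(const struct model_domain *const tracer,
                           const struct model_domain *const tracee)
    {
        const struct model_domain *walker;

        /* A non-sandboxed tracer is not restricted by Landlock. */
        if (!tracer)
            return true;
        /* Walks up from the tracee's domain, looking for the tracer's one. */
        for (walker = tracee; walker; walker = walker->parent) {
            if (walker == tracer)
                return true;
        }
        return false;
    }

With this model, an application can trace a child that stacked one more
ruleset on itself, but that child cannot trace the application, and neither
can trace a process that is not sandboxed.  The usual ptrace checks still apply
on top of this one.
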
Kernel interface
================

Access rights
-------------

.. kernel-doc:: include/uapi/linux/landlock.h
    :identifiers: fs_access

Creating a new ruleset
----------------------

.. kernel-doc:: security/landlock/syscalls.c
    :identifiers: sys_landlock_create_ruleset

.. kernel-doc:: include/uapi/linux/landlock.h
    :identifiers: landlock_ruleset_attr

Extending a ruleset
-------------------

.. kernel-doc:: security/landlock/syscalls.c
    :identifiers: sys_landlock_add_rule

.. kernel-doc:: include/uapi/linux/landlock.h
    :identifiers: landlock_rule_type landlock_path_beneath_attr

Enforcing a ruleset
-------------------

.. kernel-doc:: security/landlock/syscalls.c
    :identifiers: sys_landlock_restrict_self

Current limitations
===================

File renaming and linking
-------------------------

Because Landlock targets unprivileged access controls, it needs to properly
handle composition of rules.  Such a property also implies rule nesting.
Properly handling multiple layers of rulesets, each one of them able to
restrict access to files, also implies inheriting the ruleset restrictions from
a parent to its hierarchy.  Because files are identified and restricted by
their hierarchy, moving or linking a file from one directory to another implies
propagating the hierarchy constraints.  To protect against privilege
escalations through renaming or linking, and for the sake of simplicity,
Landlock currently limits linking and renaming to the same directory.  Future
Landlock evolutions will enable more flexibility for renaming and linking, with
dedicated ruleset flags.

Filesystem topology modification
--------------------------------

As for file renaming and linking, a sandboxed thread cannot modify its
filesystem topology, whether via :manpage:`mount(2)` or
:manpage:`pivot_root(2)`.  However, :manpage:`chroot(2)` calls are not denied.

Special filesystems
-------------------

Access to regular files and directories can be restricted by Landlock,
according to the handled accesses of a ruleset.  However, files that do not
come from a user-visible filesystem (e.g. pipe, socket), but can still be
accessed through ``/proc/<pid>/fd/*``, cannot currently be explicitly
restricted.  Likewise, some special kernel filesystems such as nsfs, which can
be accessed through ``/proc/<pid>/ns/*``, cannot currently be explicitly
restricted.  However, thanks to the `ptrace restrictions`_, access to such
sensitive ``/proc`` files is automatically restricted according to domain
hierarchies.  Future Landlock evolutions could still make it possible to
explicitly restrict such paths with dedicated ruleset flags.

Ruleset layers
--------------

There is a limit of 64 layers of stacked rulesets.  This can be an issue for a
task willing to enforce a new ruleset in addition to its 64 inherited
rulesets.  Once this limit is reached, sys_landlock_restrict_self() returns
E2BIG.  It is then strongly suggested to carefully build rulesets once in the
life of a thread, especially for applications able to launch other applications
that may also want to sandbox themselves (e.g. shells, container managers,
etc.).

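An application that may run with many inherited layers can report this case
explicitly instead of printing a generic error.  The helper below is a sketch:
``restrict_self_hint()`` is a hypothetical name, and only the ``E2BIG`` and
``EPERM`` cases are singled out; every other error code falls back to
:manpage:`strerror(3)`.

.. code-block:: c

    #include <errno.h>
    #include <string.h>

    /*
     * Hypothetical helper: maps an errno value set by a failed
     * landlock_restrict_self() call to a human-readable hint.
     */
    static const char *restrict_self_hint(const int err)
    {
        switch (err) {
        case E2BIG:
            /* The thread already carries the maximum number of layers. */
            return "limit of 64 stacked rulesets reached";
        case EPERM:
            /* Typically a missing prctl(PR_SET_NO_NEW_PRIVS, 1, ...). */
            return "not allowed to enforce the ruleset, check no_new_privs";
        default:
            return strerror(err);
        }
    }

After a failed ``landlock_restrict_self()`` call, the hint can be printed with
``fprintf(stderr, "%s\n", restrict_self_hint(errno))``.
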
Memory usage
------------

Kernel memory allocated to create rulesets is accounted and can be restricted
by the memory resource controller (cf.
Documentation/admin-guide/cgroup-v1/memory.rst).

Questions and answers
=====================

What about user space sandbox managers?
---------------------------------------

Using a user space process to enforce restrictions on kernel resources can lead
to race conditions or inconsistent evaluations (i.e. `Incorrect mirroring of
the OS code and state
<https://www.ndss-symposium.org/ndss2003/traps-and-pitfalls-practical-problems-system-call-interposition-based-security-tools/>`_).

What about namespaces and containers?
-------------------------------------

Namespaces can help create sandboxes, but they are not designed for access
control and thus lack useful features for such a use case (e.g. no
fine-grained restrictions).  Moreover, their complexity can lead to security
issues, especially when untrusted processes can manipulate them (cf.
`Controlling access to user namespaces <https://lwn.net/Articles/673597/>`_).

Additional documentation
========================

* Documentation/security/landlock.rst
* https://landlock.io

.. Links
.. _samples/landlock/sandboxer.c:
   https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/samples/landlock/sandboxer.c