Eric Biggers | 6ff2deb | 2019-07-22 09:26:20 -0700 | [diff] [blame] | 1 | .. SPDX-License-Identifier: GPL-2.0 |
| 2 | |
| 3 | .. _fsverity: |
| 4 | |
| 5 | ======================================================= |
| 6 | fs-verity: read-only file-based authenticity protection |
| 7 | ======================================================= |
| 8 | |
| 9 | Introduction |
| 10 | ============ |
| 11 | |
| 12 | fs-verity (``fs/verity/``) is a support layer that filesystems can |
| 13 | hook into to support transparent integrity and authenticity protection |
| 14 | of read-only files. Currently, it is supported by the ext4 and f2fs |
| 15 | filesystems. Like fscrypt, not too much filesystem-specific code is |
| 16 | needed to support fs-verity. |
| 17 | |
| 18 | fs-verity is similar to `dm-verity |
| 19 | <https://www.kernel.org/doc/Documentation/device-mapper/verity.txt>`_ |
| 20 | but works on files rather than block devices. On regular files on |
| 21 | filesystems supporting fs-verity, userspace can execute an ioctl that |
| 22 | causes the filesystem to build a Merkle tree for the file and persist |
| 23 | it to a filesystem-specific location associated with the file. |
| 24 | |
| 25 | After this, the file is made readonly, and all reads from the file are |
| 26 | automatically verified against the file's Merkle tree. Reads of any |
| 27 | corrupted data, including mmap reads, will fail. |
| 28 | |
| 29 | Userspace can use another ioctl to retrieve the root hash (actually |
| 30 | the "file measurement", which is a hash that includes the root hash) |
| 31 | that fs-verity is enforcing for the file. This ioctl executes in |
| 32 | constant time, regardless of the file size. |
| 33 | |
| 34 | fs-verity is essentially a way to hash a file in constant time, |
| 35 | subject to the caveat that reads which would violate the hash will |
| 36 | fail at runtime. |
| 37 | |
| 38 | Use cases |
| 39 | ========= |
| 40 | |
| 41 | By itself, the base fs-verity feature only provides integrity |
| 42 | protection, i.e. detection of accidental (non-malicious) corruption. |
| 43 | |
| 44 | However, because fs-verity makes retrieving the file hash extremely |
| 45 | efficient, it's primarily meant to be used as a tool to support |
| 46 | authentication (detection of malicious modifications) or auditing |
| 47 | (logging file hashes before use). |
| 48 | |
| 49 | Trusted userspace code (e.g. operating system code running on a |
| 50 | read-only partition that is itself authenticated by dm-verity) can |
| 51 | authenticate the contents of an fs-verity file by using the |
| 52 | `FS_IOC_MEASURE_VERITY`_ ioctl to retrieve its hash, then verifying a |
| 53 | digital signature of it. |
| 54 | |
| 55 | A standard file hash could be used instead of fs-verity. However, |
| 56 | this is inefficient if the file is large and only a small portion may |
| 57 | be accessed. This is often the case for Android application package |
| 58 | (APK) files, for example. These typically contain many translations, |
| 59 | classes, and other resources that are infrequently or even never |
| 60 | accessed on a particular device. It would be slow and wasteful to |
| 61 | read and hash the entire file before starting the application. |
| 62 | |
| 63 | Unlike an ahead-of-time hash, fs-verity also re-verifies data each |
| 64 | time it's paged in. This ensures that malicious disk firmware can't |
| 65 | undetectably change the contents of the file at runtime. |
| 66 | |
| 67 | fs-verity does not replace or obsolete dm-verity. dm-verity should |
| 68 | still be used on read-only filesystems. fs-verity is for files that |
| 69 | must live on a read-write filesystem because they are independently |
| 70 | updated and potentially user-installed, so dm-verity cannot be used. |
| 71 | |
| 72 | The base fs-verity feature is a hashing mechanism only; actually |
| 73 | authenticating the files is up to userspace. However, to meet some |
| 74 | users' needs, fs-verity optionally supports a simple signature |
| 75 | verification mechanism where users can configure the kernel to require |
| 76 | that all fs-verity files be signed by a key loaded into a keyring; see |
| 77 | `Built-in signature verification`_. Support for fs-verity file hashes |
| 78 | in IMA (Integrity Measurement Architecture) policies is also planned. |
| 79 | |
| 80 | User API |
| 81 | ======== |
| 82 | |
| 83 | FS_IOC_ENABLE_VERITY |
| 84 | -------------------- |
| 85 | |
| 86 | The FS_IOC_ENABLE_VERITY ioctl enables fs-verity on a file. It takes |
| 87 | in a pointer to a struct fsverity_enable_arg, defined as |
| 88 | follows:: |
| 89 | |
| 90 | struct fsverity_enable_arg { |
| 91 | __u32 version; |
| 92 | __u32 hash_algorithm; |
| 93 | __u32 block_size; |
| 94 | __u32 salt_size; |
| 95 | __u64 salt_ptr; |
| 96 | __u32 sig_size; |
| 97 | __u32 __reserved1; |
| 98 | __u64 sig_ptr; |
| 99 | __u64 __reserved2[11]; |
| 100 | }; |
| 101 | |
| 102 | This structure contains the parameters of the Merkle tree to build for |
| 103 | the file, and optionally contains a signature. It must be initialized |
| 104 | as follows: |
| 105 | |
| 106 | - ``version`` must be 1. |
| 107 | - ``hash_algorithm`` must be the identifier for the hash algorithm to |
| 108 | use for the Merkle tree, such as FS_VERITY_HASH_ALG_SHA256. See |
| 109 | ``include/uapi/linux/fsverity.h`` for the list of possible values. |
| 110 | - ``block_size`` must be the Merkle tree block size. Currently, this |
| 111 | must be equal to the system page size, which is usually 4096 bytes. |
| 112 | Other sizes may be supported in the future. This value is not |
| 113 | necessarily the same as the filesystem block size. |
| 114 | - ``salt_size`` is the size of the salt in bytes, or 0 if no salt is |
| 115 | provided. The salt is a value that is prepended to every hashed |
| 116 | block; it can be used to personalize the hashing for a particular |
| 117 | file or device. Currently the maximum salt size is 32 bytes. |
| 118 | - ``salt_ptr`` is the pointer to the salt, or NULL if no salt is |
| 119 | provided. |
| 120 | - ``sig_size`` is the size of the signature in bytes, or 0 if no |
| 121 | signature is provided. Currently the signature is (somewhat |
| 122 | arbitrarily) limited to 16128 bytes. See `Built-in signature |
| 123 | verification`_ for more information. |
| 124 | - ``sig_ptr`` is the pointer to the signature, or NULL if no |
| 125 | signature is provided. |
| 126 | - All reserved fields must be zeroed. |
| 127 | |
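The initialization rules above can be sketched in C. This is only an illustration: ``enable_arg_sketch``, ``SKETCH_HASH_ALG_SHA256`` and ``init_enable_arg()`` are hypothetical names, and the struct is a local copy of the layout shown above so that the example compiles on its own. Real code should include ``<linux/fsverity.h>`` and use ``struct fsverity_enable_arg`` directly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the layout shown above (hypothetical name); real code
 * should use struct fsverity_enable_arg from <linux/fsverity.h>. */
struct enable_arg_sketch {
	uint32_t version;
	uint32_t hash_algorithm;
	uint32_t block_size;
	uint32_t salt_size;
	uint64_t salt_ptr;
	uint32_t sig_size;
	uint32_t reserved1;
	uint64_t sig_ptr;
	uint64_t reserved2[11];
};

/* The fields above add up to 128 bytes with no implicit padding. */
_Static_assert(sizeof(struct enable_arg_sketch) == 128, "unexpected layout");

/* Stand-in for FS_VERITY_HASH_ALG_SHA256 from the UAPI header. */
#define SKETCH_HASH_ALG_SHA256 1

/*
 * Fill in the argument for SHA-256 with an optional salt and no signature.
 * Returns 0 on success, or -1 if the salt exceeds the 32-byte limit.
 */
static int init_enable_arg(struct enable_arg_sketch *arg, uint32_t block_size,
			   const void *salt, uint32_t salt_size)
{
	if (salt_size > 32)
		return -1;
	memset(arg, 0, sizeof(*arg));	/* also zeroes every reserved field */
	arg->version = 1;
	arg->hash_algorithm = SKETCH_HASH_ALG_SHA256;
	arg->block_size = block_size;
	arg->salt_size = salt_size;
	arg->salt_ptr = salt_size ? (uint64_t)(uintptr_t)salt : 0;
	/* sig_size and sig_ptr stay 0: no signature is provided. */
	return 0;
}
```

With the real header, the filled-in structure would then be passed to ``ioctl(fd, FS_IOC_ENABLE_VERITY, &arg)`` on a file descriptor opened with O_RDONLY.
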
| 128 | FS_IOC_ENABLE_VERITY causes the filesystem to build a Merkle tree for |
| 129 | the file and persist it to a filesystem-specific location associated |
| 130 | with the file, then mark the file as a verity file. This ioctl may |
| 131 | take a long time to execute on large files, and it is interruptible by |
| 132 | fatal signals. |
| 133 | |
| 134 | FS_IOC_ENABLE_VERITY checks for write access to the inode. However, |
| 135 | it must be executed on an O_RDONLY file descriptor and no processes |
| 136 | can have the file open for writing. Attempts to open the file for |
| 137 | writing while this ioctl is executing will fail with ETXTBSY. (This |
| 138 | is necessary to guarantee that no writable file descriptors will exist |
| 139 | after verity is enabled, and to guarantee that the file's contents are |
| 140 | stable while the Merkle tree is being built over it.) |
| 141 | |
| 142 | On success, FS_IOC_ENABLE_VERITY returns 0, and the file becomes a |
| 143 | verity file. On failure (including the case of interruption by a |
| 144 | fatal signal), no changes are made to the file. |
| 145 | |
| 146 | FS_IOC_ENABLE_VERITY can fail with the following errors: |
| 147 | |
| 148 | - ``EACCES``: the process does not have write access to the file |
| 149 | - ``EBADMSG``: the signature is malformed |
| 150 | - ``EBUSY``: this ioctl is already running on the file |
| 151 | - ``EEXIST``: the file already has verity enabled |
| 152 | - ``EFAULT``: the caller provided inaccessible memory |
| 153 | - ``EINTR``: the operation was interrupted by a fatal signal |
| 154 | - ``EINVAL``: unsupported version, hash algorithm, or block size; or |
| 155 | reserved bits are set; or the file descriptor refers to neither a |
| 156 | regular file nor a directory. |
| 157 | - ``EISDIR``: the file descriptor refers to a directory |
| 158 | - ``EKEYREJECTED``: the signature doesn't match the file |
| 159 | - ``EMSGSIZE``: the salt or signature is too long |
| 160 | - ``ENOKEY``: the fs-verity keyring doesn't contain the certificate |
| 161 | needed to verify the signature |
| 162 | - ``ENOPKG``: fs-verity recognizes the hash algorithm, but it's not |
| 163 | available in the kernel's crypto API as currently configured (e.g. |
| 164 | for SHA-512, missing CONFIG_CRYPTO_SHA512). |
| 165 | - ``ENOTTY``: this type of filesystem does not implement fs-verity |
| 166 | - ``EOPNOTSUPP``: the kernel was not configured with fs-verity |
| 167 | support; or the filesystem superblock has not had the 'verity' |
| 168 | feature enabled on it; or the filesystem does not support fs-verity |
| 169 | on this file. (See `Filesystem support`_.) |
| 170 | - ``EPERM``: the file is append-only; or, a signature is required and |
| 171 | one was not provided. |
| 172 | - ``EROFS``: the filesystem is read-only |
| 173 | - ``ETXTBSY``: someone has the file open for writing. This can be the |
| 174 | caller's file descriptor, another open file descriptor, or the file |
| 175 | reference held by a writable memory map. |
| 176 | |
| 177 | FS_IOC_MEASURE_VERITY |
| 178 | --------------------- |
| 179 | |
| 180 | The FS_IOC_MEASURE_VERITY ioctl retrieves the measurement of a verity |
| 181 | file. The file measurement is a digest that cryptographically |
| 182 | identifies the file contents that are being enforced on reads. |
| 183 | |
| 184 | This ioctl takes in a pointer to a variable-length structure:: |
| 185 | |
| 186 | struct fsverity_digest { |
| 187 | __u16 digest_algorithm; |
| 188 | __u16 digest_size; /* input/output */ |
| 189 | __u8 digest[]; |
| 190 | }; |
| 191 | |
| 192 | ``digest_size`` is an input/output field. On input, it must be |
| 193 | initialized to the number of bytes allocated for the variable-length |
| 194 | ``digest`` field. |
| 195 | |
| 196 | On success, 0 is returned and the kernel fills in the structure as |
| 197 | follows: |
| 198 | |
| 199 | - ``digest_algorithm`` will be the hash algorithm used for the file |
| 200 | measurement. It will match ``fsverity_enable_arg::hash_algorithm``. |
| 201 | - ``digest_size`` will be the size of the digest in bytes, e.g. 32 |
| 202 | for SHA-256. (This can be redundant with ``digest_algorithm``.) |
| 203 | - ``digest`` will be the actual bytes of the digest. |
| 204 | |
| 205 | FS_IOC_MEASURE_VERITY is guaranteed to execute in constant time, |
| 206 | regardless of the size of the file. |
| 207 | |
| 208 | FS_IOC_MEASURE_VERITY can fail with the following errors: |
| 209 | |
| 210 | - ``EFAULT``: the caller provided inaccessible memory |
| 211 | - ``ENODATA``: the file is not a verity file |
| 212 | - ``ENOTTY``: this type of filesystem does not implement fs-verity |
| 213 | - ``EOPNOTSUPP``: the kernel was not configured with fs-verity |
| 214 | support, or the filesystem superblock has not had the 'verity' |
| 215 | feature enabled on it. (See `Filesystem support`_.) |
| 216 | - ``EOVERFLOW``: the digest is longer than the specified |
| 217 | ``digest_size`` bytes. Try providing a larger buffer. |
| 218 | |
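Because ``struct fsverity_digest`` is variable-length, the caller has to allocate room for the digest and record that capacity in ``digest_size`` before issuing the ioctl. The following is a minimal sketch of that preparation plus a hex formatter for the result. ``digest_sketch``, ``alloc_digest()`` and ``digest_to_hex()`` are hypothetical names, and the struct is a local copy of the layout above; real code would use ``struct fsverity_digest`` from ``<linux/fsverity.h>``.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of struct fsverity_digest (hypothetical name). */
struct digest_sketch {
	uint16_t digest_algorithm;
	uint16_t digest_size;	/* input/output */
	uint8_t digest[];
};

/*
 * Allocate a zeroed structure with room for 'capacity' digest bytes and
 * set digest_size to that capacity, as required on input.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static struct digest_sketch *alloc_digest(uint16_t capacity)
{
	struct digest_sketch *d = calloc(1, sizeof(*d) + capacity);

	if (d)
		d->digest_size = capacity;
	return d;
}

/*
 * Format the first d->digest_size bytes as lowercase hex.
 * 'out' must hold at least 2 * d->digest_size + 1 bytes.
 */
static void digest_to_hex(const struct digest_sketch *d, char *out)
{
	static const char hexdigits[] = "0123456789abcdef";

	for (uint16_t i = 0; i < d->digest_size; i++) {
		out[2 * i] = hexdigits[d->digest[i] >> 4];
		out[2 * i + 1] = hexdigits[d->digest[i] & 0xf];
	}
	out[2 * d->digest_size] = '\0';
}
```

A capacity of 64 bytes is enough for the SHA-256 and SHA-512 digests mentioned in this document. If the ioctl fails with EOVERFLOW, retry with a larger capacity.
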
| 219 | FS_IOC_GETFLAGS |
| 220 | --------------- |
| 221 | |
| 222 | The existing ioctl FS_IOC_GETFLAGS (which isn't specific to fs-verity) |
| 223 | can also be used to check whether a file has fs-verity enabled or not. |
| 224 | To do so, check for FS_VERITY_FL (0x00100000) in the returned flags. |
| 225 | |
| 226 | The verity flag is not settable via FS_IOC_SETFLAGS. You must use |
| 227 | FS_IOC_ENABLE_VERITY instead, since parameters must be provided. |
| 228 | |
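As a small illustration, the flag test can be written as a helper that takes the flags word returned by FS_IOC_GETFLAGS. ``SKETCH_FS_VERITY_FL`` and ``flags_have_verity()`` are hypothetical names used to keep the example self-contained; real code would take FS_VERITY_FL from ``<linux/fs.h>``.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for FS_VERITY_FL, using the value given above. */
#define SKETCH_FS_VERITY_FL 0x00100000UL

/* Return true if the verity bit is set in an FS_IOC_GETFLAGS result. */
static bool flags_have_verity(unsigned long flags)
{
	return (flags & SKETCH_FS_VERITY_FL) != 0;
}
```

Other bits in the flags word are ignored, so the helper gives the same answer whatever other inode flags are set.
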
| 229 | statx |
| 230 | ----- |
| 231 | |
| 232 | Since Linux v5.5, the statx() system call sets STATX_ATTR_VERITY if |
| 233 | the file has fs-verity enabled. This can perform better than |
| 234 | FS_IOC_GETFLAGS and FS_IOC_MEASURE_VERITY because it doesn't require |
| 235 | opening the file, and opening verity files can be expensive. |
| 236 | |
| 237 | Accessing verity files |
| 238 | ====================== |
| 239 | |
| 240 | Applications can transparently access a verity file just like a |
| 241 | non-verity one, with the following exceptions: |
| 242 | |
| 243 | - Verity files are readonly. They cannot be opened for writing or |
| 244 | truncate()d, even if the file mode bits allow it. Attempts to do |
| 245 | one of these things will fail with EPERM. However, changes to |
| 246 | metadata such as owner, mode, timestamps, and xattrs are still |
| 247 | allowed, since these are not measured by fs-verity. Verity files |
| 248 | can also still be renamed, deleted, and linked to. |
| 249 | |
| 250 | - Direct I/O is not supported on verity files. Attempts to use direct |
| 251 | I/O on such files will fall back to buffered I/O. |
| 252 | |
| 253 | - DAX (Direct Access) is not supported on verity files, because this |
| 254 | would circumvent the data verification. |
| 255 | |
| 256 | - Reads of data that doesn't match the verity Merkle tree will fail |
| 257 | with EIO (for read()) or SIGBUS (for mmap() reads). |
| 258 | |
| 259 | - If the sysctl "fs.verity.require_signatures" is set to 1 and the |
| 260 | file's verity measurement is not signed by a key in the fs-verity |
| 261 | keyring, then opening the file will fail. See `Built-in signature |
| 262 | verification`_. |
| 263 | |
| 264 | Direct access to the Merkle tree is not supported. Therefore, if a |
| 265 | verity file is copied, or is backed up and restored, then it will lose |
| 266 | its "verity"-ness. fs-verity is primarily meant for files like |
| 267 | executables that are managed by a package manager. |
| 268 | |
| 269 | File measurement computation |
| 270 | ============================ |
| 271 | |
| 272 | This section describes how fs-verity hashes the file contents using a |
| 273 | Merkle tree to produce the "file measurement" which cryptographically |
| 274 | identifies the file contents. This algorithm is the same for all |
| 275 | filesystems that support fs-verity. |
| 276 | |
| 277 | Userspace only needs to be aware of this algorithm if it needs to |
| 278 | compute the file measurement itself, e.g. in order to sign the file. |
| 279 | |
| 280 | .. _fsverity_merkle_tree: |
| 281 | |
| 282 | Merkle tree |
| 283 | ----------- |
| 284 | |
| 285 | The file contents are divided into blocks, where the block size is |
| 286 | configurable but is usually 4096 bytes. The end of the last block is |
| 287 | zero-padded if needed. Each block is then hashed, producing the first |
| 288 | level of hashes. Then, the hashes in this first level are grouped |
| 289 | into 'blocksize'-byte blocks (zero-padding the ends as needed) and |
| 290 | these blocks are hashed, producing the second level of hashes. This |
| 291 | proceeds up the tree until only a single block remains. The hash of |
| 292 | this block is the "Merkle tree root hash". |
| 293 | |
| 294 | If the file fits in one block and is nonempty, then the "Merkle tree |
| 295 | root hash" is simply the hash of the single data block. If the file |
| 296 | is empty, then the "Merkle tree root hash" is all zeroes. |
| 297 | |
| 298 | The "blocks" here are not necessarily the same as "filesystem blocks". |
| 299 | |
| 300 | If a salt was specified, then it's zero-padded up to the next multiple |
| 301 | of the input size of the hash algorithm's compression function, e.g. |
| 302 | 64 bytes for SHA-256 or 128 bytes for SHA-512. The padded salt is |
| 303 | prepended to every data or Merkle tree block that is hashed. |
| 304 | |
| 305 | The purpose of the block padding is to cause every hash to be taken |
| 306 | over the same amount of data, which simplifies the implementation and |
| 307 | keeps open more possibilities for hardware acceleration. The purpose |
| 308 | of the salt padding is to make the salting "free" when the salted hash |
| 309 | state is precomputed, then imported for each hash. |
| 310 | |
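The salt padding described above amounts to rounding the salt length up to a multiple of the compression function's input size. A minimal sketch, where ``pad_salt()`` is a hypothetical helper and the caller supplies the input size (64 for SHA-256, 128 for SHA-512):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy 'salt' into 'out', zero-padded up to a multiple of
 * 'compress_block' bytes. Returns the padded length, which is 0 when no
 * salt is given. 'out' must have room for the padded length.
 */
static size_t pad_salt(const unsigned char *salt, size_t salt_size,
		       size_t compress_block, unsigned char *out)
{
	size_t padded = (salt_size + compress_block - 1) /
			compress_block * compress_block;

	memset(out, 0, padded);
	if (salt_size)
		memcpy(out, salt, salt_size);
	return padded;
}
```

Since the salt is at most 32 bytes, a nonempty salt always pads to exactly one compression-function input block for these two algorithms.
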
| 311 | Example: in the recommended configuration of SHA-256 and 4K blocks, |
| 312 | 128 hash values fit in each block. Thus, each level of the Merkle |
| 313 | tree is approximately 128 times smaller than the previous, and for |
| 314 | large files the Merkle tree's size converges to approximately 1/127 of |
| 315 | the original file size. However, for small files, the padding is |
| 316 | significant, making the space overhead proportionally more. |
| 317 | |
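The size estimate above can be checked with a few lines of arithmetic. The sketch below counts the hash blocks in each level, following the construction described in this section; ``merkle_tree_blocks()`` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the total number of Merkle tree blocks for a file of
 * 'file_size' bytes, and optionally the number of levels. A file that
 * fits in one data block (or is empty) needs no tree blocks, because
 * its root hash is the hash of the single data block (or all zeroes).
 */
static uint64_t merkle_tree_blocks(uint64_t file_size, uint32_t block_size,
				   uint32_t digest_size,
				   unsigned int *num_levels)
{
	uint64_t hashes_per_block = block_size / digest_size;
	uint64_t blocks = (file_size + block_size - 1) / block_size;
	uint64_t total = 0;
	unsigned int levels = 0;

	/* Each pass hashes the previous level's blocks into a new level. */
	while (blocks > 1) {
		blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
		total += blocks;
		levels++;
	}
	if (num_levels)
		*num_levels = levels;
	return total;
}
```

For a 1 GiB file with SHA-256 and 4K blocks this gives 2048 + 16 + 1 = 2065 tree blocks for 262144 data blocks, a ratio of about 1/127.
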
| 318 | .. _fsverity_descriptor: |
| 319 | |
| 320 | fs-verity descriptor |
| 321 | -------------------- |
| 322 | |
| 323 | By itself, the Merkle tree root hash is ambiguous. For example, it |
| 324 | can't distinguish a large file from a small second file whose data |
| 325 | is exactly the top-level hash block of the first file. Ambiguities |
| 326 | also arise from the convention of padding to the next block boundary. |
| 327 | |
| 328 | To solve this problem, the verity file measurement is actually |
| 329 | computed as a hash of the following structure, which contains the |
| 330 | Merkle tree root hash as well as other fields such as the file size:: |
| 331 | |
| 332 | struct fsverity_descriptor { |
| 333 | __u8 version; /* must be 1 */ |
| 334 | __u8 hash_algorithm; /* Merkle tree hash algorithm */ |
| 335 | __u8 log_blocksize; /* log2 of size of data and tree blocks */ |
| 336 | __u8 salt_size; /* size of salt in bytes; 0 if none */ |
| 337 | __le32 sig_size; /* must be 0 */ |
| 338 | __le64 data_size; /* size of file the Merkle tree is built over */ |
| 339 | __u8 root_hash[64]; /* Merkle tree root hash */ |
| 340 | __u8 salt[32]; /* salt prepended to each hashed block */ |
| 341 | __u8 __reserved[144]; /* must be 0's */ |
| 342 | }; |
| 343 | |
| 344 | Note that the ``sig_size`` field must be set to 0 for the purpose of |
| 345 | computing the file measurement, even if a signature was provided (or |
| 346 | will be provided) to `FS_IOC_ENABLE_VERITY`_. |
| 347 | |
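Userspace that computes the file measurement itself has to produce the exact byte layout of this structure before hashing it. The sketch below serializes the fields into a 256-byte buffer at the offsets implied by the declaration above. ``build_descriptor()`` and ``DESC_SIZE`` are hypothetical names, and the sketch assumes that unused trailing bytes of ``root_hash`` and ``salt`` are zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 1 + 1 + 1 + 1 + 4 + 8 + 64 + 32 + 144 bytes. */
#define DESC_SIZE 256

/*
 * Serialize the descriptor fields into 'out'.
 * Returns 0 on success, or -1 if the root hash or salt is too long.
 */
static int build_descriptor(uint8_t out[DESC_SIZE], uint8_t hash_algorithm,
			    uint8_t log_blocksize, uint64_t data_size,
			    const uint8_t *root_hash, size_t root_hash_size,
			    const uint8_t *salt, uint8_t salt_size)
{
	if (root_hash_size > 64 || salt_size > 32)
		return -1;
	memset(out, 0, DESC_SIZE);	/* sig_size and __reserved stay 0 */
	out[0] = 1;			/* version */
	out[1] = hash_algorithm;
	out[2] = log_blocksize;
	out[3] = salt_size;
	/* Bytes 4..7 are sig_size, which must be 0 for the measurement. */
	for (int i = 0; i < 8; i++)	/* bytes 8..15: data_size, little-endian */
		out[8 + i] = (uint8_t)(data_size >> (8 * i));
	if (root_hash_size)
		memcpy(out + 16, root_hash, root_hash_size);	/* bytes 16..79 */
	if (salt_size)
		memcpy(out + 80, salt, salt_size);		/* bytes 80..111 */
	return 0;
}
```

Hashing the resulting buffer with the file's hash algorithm would then give the file measurement; the hash step is left out here because it needs a crypto library.
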
| 348 | Built-in signature verification |
| 349 | =============================== |
| 350 | |
| 351 | With CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y, fs-verity supports putting |
| 352 | a portion of an authentication policy (see `Use cases`_) in the |
| 353 | kernel. Specifically, it adds support for: |
| 354 | |
| 355 | 1. At fs-verity module initialization time, a keyring ".fs-verity" is |
| 356 | created. The root user can add trusted X.509 certificates to this |
| 357 | keyring using the add_key() system call, then (when done) |
| 358 | optionally use keyctl_restrict_keyring() to prevent additional |
| 359 | certificates from being added. |
| 360 | |
| 361 | 2. `FS_IOC_ENABLE_VERITY`_ accepts a pointer to a PKCS#7 formatted |
| 362 | detached signature in DER format of the file measurement. On |
| 363 | success, this signature is persisted alongside the Merkle tree. |
| 364 | Then, any time the file is opened, the kernel will verify the |
| 365 | file's actual measurement against this signature, using the |
| 366 | certificates in the ".fs-verity" keyring. |
| 367 | |
| 368 | 3. A new sysctl "fs.verity.require_signatures" is made available. |
| 369 | When set to 1, the kernel requires that all verity files have a |
| 370 | correctly signed file measurement as described in (2). |
| 371 | |
| 372 | File measurements must be signed in the following format, which is |
| 373 | similar to the structure used by `FS_IOC_MEASURE_VERITY`_:: |
| 374 | |
| 375 | struct fsverity_signed_digest { |
| 376 | char magic[8]; /* must be "FSVerity" */ |
| 377 | __le16 digest_algorithm; |
| 378 | __le16 digest_size; |
| 379 | __u8 digest[]; |
| 380 | }; |
| 381 | |
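A signing tool has to produce this structure byte for byte before passing it to the signer. The following sketch serializes it into a caller-provided buffer; ``build_signed_digest()`` is a hypothetical helper, and the two 16-bit fields are written little-endian as the ``__le16`` types require.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Serialize the to-be-signed structure into 'out'.
 * Returns the number of bytes written, or 0 if 'out' is too small.
 */
static size_t build_signed_digest(uint8_t *out, size_t out_size,
				  uint16_t digest_algorithm,
				  const uint8_t *digest, uint16_t digest_size)
{
	size_t total = 8 + 2 + 2 + (size_t)digest_size;

	if (out_size < total)
		return 0;
	memcpy(out, "FSVerity", 8);	/* magic, without a NUL terminator */
	out[8] = (uint8_t)(digest_algorithm & 0xff);
	out[9] = (uint8_t)(digest_algorithm >> 8);
	out[10] = (uint8_t)(digest_size & 0xff);
	out[11] = (uint8_t)(digest_size >> 8);
	if (digest_size)
		memcpy(out + 12, digest, digest_size);
	return total;
}
```

For a SHA-256 measurement the serialized structure is 44 bytes long: 12 bytes of header followed by the 32-byte digest.
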
| 382 | fs-verity's built-in signature verification support is meant as a |
| 383 | relatively simple mechanism that can be used to provide some level of |
| 384 | authenticity protection for verity files, as an alternative to doing |
| 385 | the signature verification in userspace or using IMA-appraisal. |
| 386 | However, with this mechanism, userspace programs still need to check |
| 387 | that the verity bit is set, and there is no protection against verity |
| 388 | files being swapped around. |
| 389 | |
| 390 | Filesystem support |
| 391 | ================== |
| 392 | |
| 393 | fs-verity is currently supported by the ext4 and f2fs filesystems. |
| 394 | The CONFIG_FS_VERITY kconfig option must be enabled to use fs-verity |
| 395 | on either filesystem. |
| 396 | |
| 397 | ``include/linux/fsverity.h`` declares the interface between the |
| 398 | ``fs/verity/`` support layer and filesystems. Briefly, filesystems |
| 399 | must provide an ``fsverity_operations`` structure that provides |
| 400 | methods to read and write the verity metadata to a filesystem-specific |
| 401 | location, including the Merkle tree blocks and |
| 402 | ``fsverity_descriptor``. Filesystems must also call functions in |
| 403 | ``fs/verity/`` at certain times, such as when a file is opened or when |
| 404 | pages have been read into the pagecache. (See `Verifying data`_.) |
| 405 | |
| 406 | ext4 |
| 407 | ---- |
| 408 | |
| 409 | ext4 supports fs-verity since Linux v5.4 and e2fsprogs v1.45.2. |
| 410 | |
| 411 | To create verity files on an ext4 filesystem, the filesystem must have |
| 412 | been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on |
| 413 | it. "verity" is an RO_COMPAT filesystem feature, so once set, old |
| 414 | kernels will only be able to mount the filesystem readonly, and old |
| 415 | versions of e2fsck will be unable to check the filesystem. Moreover, |
| 416 | currently ext4 only supports mounting a filesystem with the "verity" |
| 417 | feature when its block size is equal to PAGE_SIZE (often 4096 bytes). |
| 418 | |
| 419 | ext4 sets the EXT4_VERITY_FL on-disk inode flag on verity files. It |
| 420 | can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be cleared. |
| 421 | |
| 422 | ext4 also supports encryption, which can be used simultaneously with |
| 423 | fs-verity. In this case, the plaintext data is verified rather than |
| 424 | the ciphertext. This is necessary in order to make the file |
| 425 | measurement meaningful, since every file is encrypted differently. |
| 426 | |
| 427 | ext4 stores the verity metadata (Merkle tree and fsverity_descriptor) |
| 428 | past the end of the file, starting at the first 64K boundary beyond |
| 429 | i_size. This approach works because (a) verity files are readonly, |
| 430 | and (b) pages fully beyond i_size aren't visible to userspace but can |
| 431 | be read/written internally by ext4 with only some relatively small |
| 432 | changes to ext4. This approach avoids having to depend on the |
| 433 | EA_INODE feature and on rearchitecting ext4's xattr support to |
| 434 | support paging multi-gigabyte xattrs into memory, and to support |
| 435 | encrypting xattrs. Note that the verity metadata *must* be encrypted |
| 436 | when the file is, since it contains hashes of the plaintext data. |
| 437 | |
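The placement of the verity metadata can be illustrated by rounding i_size up to a 64K boundary. This is a sketch only: ``verity_metadata_pos()`` is a hypothetical helper, and it assumes that an i_size already lying on a 64K boundary is used as-is rather than advanced to the following boundary.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the byte offset at which the verity metadata would start for
 * a file of 'i_size' bytes: i_size rounded up to a multiple of 64K.
 */
static uint64_t verity_metadata_pos(uint64_t i_size)
{
	const uint64_t align = 65536;

	return (i_size + align - 1) / align * align;
}
```

For example, a 1-byte file and a 65536-byte file would both have their metadata start at offset 65536 under this assumption, while a 65537-byte file would have it start at offset 131072.
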
| 438 | Currently, ext4 verity only supports the case where the Merkle tree |
| 439 | block size, filesystem block size, and page size are all the same. It |
| 440 | also only supports extent-based files. |
| 441 | |
| 442 | f2fs |
| 443 | ---- |
| 444 | |
| 445 | f2fs supports fs-verity since Linux v5.4 and f2fs-tools v1.11.0. |
| 446 | |
| 447 | To create verity files on an f2fs filesystem, the filesystem must have |
| 448 | been formatted with ``-O verity``. |
| 449 | |
| 450 | f2fs sets the FADVISE_VERITY_BIT on-disk inode flag on verity files. |
| 451 | It can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be |
| 452 | cleared. |
| 453 | |
| 454 | Like ext4, f2fs stores the verity metadata (Merkle tree and |
| 455 | fsverity_descriptor) past the end of the file, starting at the first |
| 456 | 64K boundary beyond i_size. See explanation for ext4 above. |
| 457 | Moreover, f2fs supports at most 4096 bytes of xattr entries per inode |
| 458 | which wouldn't be enough for even a single Merkle tree block. |
| 459 | |
| 460 | Currently, f2fs verity only supports a Merkle tree block size of 4096. |
| 461 | Also, f2fs doesn't support enabling verity on files that currently |
| 462 | have atomic or volatile writes pending. |
| 463 | |
| 464 | Implementation details |
| 465 | ====================== |
| 466 | |
| 467 | Verifying data |
| 468 | -------------- |
| 469 | |
| 470 | fs-verity ensures that all reads of a verity file's data are verified, |
| 471 | regardless of which syscall is used to do the read (e.g. mmap(), |
| 472 | read(), pread()) and regardless of whether it's the first read or a |
| 473 | later read (unless the later read can return cached data that was |
| 474 | already verified). Below, we describe how filesystems implement this. |
| 475 | |
| 476 | Pagecache |
| 477 | ~~~~~~~~~ |
| 478 | |
| 479 | For filesystems using Linux's pagecache, the ``->readpage()`` and |
| 480 | ``->readpages()`` methods must be modified to verify pages before they |
| 481 | are marked Uptodate. Merely hooking ``->read_iter()`` would be |
| 482 | insufficient, since ``->read_iter()`` is not used for memory maps. |
| 483 | |
| 484 | Therefore, fs/verity/ provides a function fsverity_verify_page() which |
| 485 | verifies a page that has been read into the pagecache of a verity |
| 486 | inode, but is still locked and not Uptodate, so it's not yet readable |
| 487 | by userspace. As needed to do the verification, |
| 488 | fsverity_verify_page() will call back into the filesystem to read |
| 489 | Merkle tree pages via fsverity_operations::read_merkle_tree_page(). |
| 490 | |
| 491 | fsverity_verify_page() returns false if verification failed; in this |
| 492 | case, the filesystem must not set the page Uptodate. Following this, |
| 493 | as per the usual Linux pagecache behavior, attempts by userspace to |
| 494 | read() from the part of the file containing the page will fail with |
| 495 | EIO, and accesses to the page within a memory map will raise SIGBUS. |
| 496 | |
| 497 | fsverity_verify_page() currently only supports the case where the |
| 498 | Merkle tree block size is equal to PAGE_SIZE (often 4096 bytes). |
| 499 | |
| 500 | In principle, fsverity_verify_page() verifies the entire path in the |
| 501 | Merkle tree from the data page to the root hash. However, for |
| 502 | efficiency the filesystem may cache the hash pages. Therefore, |
| 503 | fsverity_verify_page() only ascends the tree reading hash pages until |
| 504 | an already-verified hash page is seen, as indicated by the PageChecked |
| 505 | bit being set. It then verifies the path to that page. |
| 506 | |
| 507 | This optimization, which is also used by dm-verity, results in |
| 508 | excellent sequential read performance. This is because usually (e.g. |
| 509 | 127 in 128 times for 4K blocks and SHA-256) the hash page from the |
| 510 | bottom level of the tree will already be cached and checked from |
| 511 | reading a previous data page. However, random reads perform worse. |
| 512 | |
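The "127 in 128" figure follows from how data blocks map onto hash blocks. The sketch below computes which hash block, and which slot within it, covers a given data block at a given tree level. ``locate_hash()`` and ``struct hash_location`` are hypothetical names, and numbering the bottom level as level 0 is a convention of this sketch only.

```c
#include <assert.h>
#include <stdint.h>

struct hash_location {
	uint64_t block_index;	/* which hash block within the level */
	uint32_t slot;		/* which hash within that block */
};

/*
 * Locate the hash covering 'data_block' at tree level 'level', where
 * level 0 holds the hashes of the data blocks themselves.
 */
static struct hash_location locate_hash(uint64_t data_block,
					unsigned int level,
					uint32_t hashes_per_block)
{
	uint64_t index = data_block;
	struct hash_location loc;

	/* Each level up covers hashes_per_block times more data per hash. */
	for (unsigned int i = 0; i < level; i++)
		index /= hashes_per_block;
	loc.block_index = index / hashes_per_block;
	loc.slot = (uint32_t)(index % hashes_per_block);
	return loc;
}
```

With 128 hashes per block, data blocks 0 through 127 all map to hash block 0 at level 0, so a sequential read needs a new bottom-level hash block only once every 128 data blocks.
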
| 513 | Block device based filesystems |
| 514 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 515 | |
| 516 | Block device based filesystems (e.g. ext4 and f2fs) in Linux also use |
| 517 | the pagecache, so the above subsection applies too. However, they |
| 518 | also usually read many pages from a file at once, grouped into a |
| 519 | structure called a "bio". To make it easier for these types of |
| 520 | filesystems to support fs-verity, fs/verity/ also provides a function |
| 521 | fsverity_verify_bio() which verifies all pages in a bio. |
| 522 | |
| 523 | ext4 and f2fs also support encryption. If a verity file is also |
| 524 | encrypted, the pages must be decrypted before being verified. To |
| 525 | support this, these filesystems allocate a "post-read context" for |
| 526 | each bio and store it in ``->bi_private``:: |
| 527 | |
| 528 | struct bio_post_read_ctx { |
| 529 | struct bio *bio; |
| 530 | struct work_struct work; |
| 531 | unsigned int cur_step; |
| 532 | unsigned int enabled_steps; |
| 533 | }; |
| 534 | |
| 535 | ``enabled_steps`` is a bitmask that specifies whether decryption, |
| 536 | verity, or both is enabled. After the bio completes, for each needed |
| 537 | postprocessing step the filesystem enqueues the bio_post_read_ctx on a |
| 538 | workqueue, and then the workqueue work does the decryption or |
| 539 | verification. Finally, pages where no decryption or verity error |
| 540 | occurred are marked Uptodate, and the pages are unlocked. |
| 541 | |
| 542 | Files on ext4 and f2fs may contain holes. Normally, ``->readpages()`` |
| 543 | simply zeroes holes and sets the corresponding pages Uptodate; no bios |
| 544 | are issued. To prevent this case from bypassing fs-verity, these |
| 545 | filesystems use fsverity_verify_page() to verify hole pages. |
| 546 | |
| 547 | ext4 and f2fs disable direct I/O on verity files, since otherwise |
| 548 | direct I/O would bypass fs-verity. (They also do the same for |
| 549 | encrypted files.) |
| 550 | |
| 551 | Userspace utility |
| 552 | ================= |
| 553 | |
| 554 | This document focuses on the kernel, but a userspace utility for |
| 555 | fs-verity can be found at: |
| 556 | |
| 557 | https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git |
| 558 | |
| 559 | See the README.md file in the fsverity-utils source tree for details, |
| 560 | including examples of setting up fs-verity protected files. |
| 561 | |
| 562 | Tests |
| 563 | ===== |
| 564 | |
| 565 | To test fs-verity, use xfstests. For example, using `kvm-xfstests |
| 566 | <https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md>`_:: |
| 567 | |
| 568 | kvm-xfstests -c ext4,f2fs -g verity |
| 569 | |
| 570 | FAQ |
| 571 | === |
| 572 | |
| 573 | This section answers frequently asked questions about fs-verity that |
| 574 | weren't already directly answered in other parts of this document. |
| 575 | |
| 576 | :Q: Why isn't fs-verity part of IMA? |
| 577 | :A: fs-verity and IMA (Integrity Measurement Architecture) have |
| 578 | different focuses. fs-verity is a filesystem-level mechanism for |
| 579 | hashing individual files using a Merkle tree. In contrast, IMA |
| 580 | specifies a system-wide policy that specifies which files are |
| 581 | hashed and what to do with those hashes, such as log them, |
| 582 | authenticate them, or add them to a measurement list. |
| 583 | |
| 584 | IMA is planned to support the fs-verity hashing mechanism as an |
| 585 | alternative to doing full file hashes, for people who want the |
| 586 | performance and security benefits of the Merkle tree based hash. |
| 587 | But it doesn't make sense to force all uses of fs-verity to be |
| 588 | through IMA. As a standalone filesystem feature, fs-verity |
| 589 | already meets many users' needs, and it's testable like other |
| 590 | filesystem features e.g. with xfstests. |
| 591 | |
| 592 | :Q: Isn't fs-verity useless because the attacker can just modify the |
| 593 | hashes in the Merkle tree, which is stored on-disk? |
| 594 | :A: To verify the authenticity of an fs-verity file you must verify |
| 595 | the authenticity of the "file measurement", which is a hash that |
| 596 | includes the root hash of the Merkle tree. See `Use cases`_. |
| 597 | |
| 598 | :Q: Isn't fs-verity useless because the attacker can just replace a |
| 599 | verity file with a non-verity one? |
| 600 | :A: See `Use cases`_. In the initial use case, it's really trusted |
| 601 | userspace code that authenticates the files; fs-verity is just a |
| 602 | tool to do this job efficiently and securely. The trusted |
| 603 | userspace code will consider non-verity files to be inauthentic. |
| 604 | |
| 605 | :Q: Why does the Merkle tree need to be stored on-disk? Couldn't you |
| 606 | store just the root hash? |
| 607 | :A: If the Merkle tree weren't stored on-disk, then you'd have to |
| 608 | compute the entire tree when the file is first accessed, even if |
| 609 | just one byte is being read. This is a fundamental consequence of |
| 610 | how Merkle tree hashing works. To verify a leaf node, you need to |
| 611 | verify the whole path to the root hash, including the root node |
| 612 | (the thing which the root hash is a hash of). But if the root |
| 613 | node isn't stored on-disk, you have to compute it by hashing its |
| 614 | children, and so on until you've actually hashed the entire file. |
| 615 | |
| 616 | That defeats most of the point of doing a Merkle tree-based hash, |
| 617 | since if you have to hash the whole file ahead of time anyway, |
| 618 | then you could simply do sha256(file) instead. That would be much |
| 619 | simpler, and a bit faster too. |
| 620 | |
| 621 | It's true that an in-memory Merkle tree could still provide the |
| 622 | advantage of verification on every read rather than just on the |
| 623 | first read. However, it would be inefficient because every time a |
| 624 | hash page gets evicted (you can't pin the entire Merkle tree into |
| 625 | memory, since it may be very large), in order to restore it you |
| 626 | again need to hash everything below it in the tree. This again |
| 627 | defeats most of the point of doing a Merkle tree-based hash, since |
| 628 | a single block read could trigger re-hashing gigabytes of data. |
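The argument above can be illustrated with a toy Merkle tree. The following is a minimal sketch, not the fs-verity on-disk format: the 64-byte block size, the helper names (``split``, ``build_tree``, ``verify_block``) and the absence of a salt are all assumptions made for illustration. It shows that when every level of the tree is stored, verifying one data block hashes only that block plus one hash block per level.

```python
import hashlib

BLOCK = 64                 # toy block size (assumption; fs-verity typically uses 4096)
DIGEST = 32                # SHA-256 digest size in bytes
FANOUT = BLOCK // DIGEST   # hashes per hash block (2 in this toy setup)

def h(data):
    return hashlib.sha256(data).digest()

def split(buf):
    """Split buf into BLOCK-sized pieces, zero-padding the last one."""
    return [buf[i:i + BLOCK].ljust(BLOCK, b"\0") for i in range(0, len(buf), BLOCK)]

def build_tree(data):
    """Return (levels, root_hash); levels[0] is the leaf level of hash blocks."""
    if not data:
        raise ValueError("this sketch requires non-empty data")
    blocks = split(data)
    levels = []
    while True:
        # Hash every block of the level below and pack the hashes into blocks.
        level = split(b"".join(h(b) for b in blocks))
        levels.append(level)
        if len(level) == 1:
            return levels, h(level[0])   # root hash = hash of the root block
        blocks = level

def verify_block(data_block, index, levels, root_hash):
    """Verify one data block using only one stored hash block per level."""
    want = h(data_block)
    for level in levels:
        hash_block = level[index // FANOUT]
        slot = index % FANOUT
        if hash_block[slot * DIGEST:(slot + 1) * DIGEST] != want:
            return False
        want = h(hash_block)   # this hash block must match its parent's entry
        index //= FANOUT
    return want == root_hash

data = bytes(range(256)) * 4             # 1024 bytes, i.e. 16 toy data blocks
levels, root = build_tree(data)
blocks = split(data)
ok = verify_block(blocks[5], 5, levels, root)
tampered_ok = verify_block(b"X" + blocks[5][1:], 5, levels, root)
print(len(levels), ok, tampered_ok)
```

Without the stored ``levels``, the only way to obtain the hash blocks that ``verify_block`` reads would be to call ``build_tree`` again, which hashes the entire file.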
| 629 | |
| 630 | :Q: But couldn't you store just the leaf nodes and compute the rest? |
| 631 | :A: See previous answer; this really just moves up one level, since |
| 632 | one could alternatively interpret the data blocks as being the |
| 633 | leaf nodes of the Merkle tree. It's true that the tree can be |
| 634 | computed much faster if the leaf level is stored rather than just |
| 635 | the data, but that's only because each level is less than 1% of the |
| 636 | size of the level below (1/128, assuming the recommended settings of |
| 637 | SHA-256 and 4K blocks). For the exact same reason, by storing |
| 638 | "just the leaf nodes" you'd already be storing over 99% of the |
| 639 | tree, so you might as well simply store the whole tree. |
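The size claim above can be checked with a short calculation. This is a rough sketch under the stated settings (SHA-256, i.e. 32-byte digests, and 4K blocks); the function name ``merkle_level_blocks`` is hypothetical, and the sketch ignores any filesystem-specific layout or padding of the stored tree.

```python
def merkle_level_blocks(file_size, block_size=4096, digest_size=32):
    """Return the number of hash blocks in each tree level, leaf level first."""
    hashes_per_block = block_size // digest_size          # 128 for SHA-256 and 4K
    blocks = (file_size + block_size - 1) // block_size   # number of data blocks
    levels = []
    while blocks > 1:
        # Each level needs one hash per block of the level below it.
        blocks = (blocks + hashes_per_block - 1) // hashes_per_block
        levels.append(blocks)
    return levels

levels = merkle_level_blocks(1 << 30)     # a 1 GiB file
leaf_fraction = levels[0] / sum(levels)   # share of the tree taken by the leaf level
print(levels)                             # [2048, 16, 1]
print(leaf_fraction > 0.99)
```

For a 1 GiB file the leaf level alone is 2048 of the 2065 hash blocks, so dropping the upper levels would save only 17 blocks.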
| 640 | |
| 641 | :Q: Can the Merkle tree be built ahead of time, e.g. distributed as |
| 642 | part of a package that is installed to many computers? |
| 643 | :A: This isn't currently supported. It was part of the original |
| 644 | design, but was removed to simplify the kernel UAPI and because it |
| 645 | wasn't a critical use case. Files are usually installed once and |
| 646 | used many times, and cryptographic hashing is fairly fast on |
| 647 | most modern processors. |
| 648 | |
| 649 | :Q: Why doesn't fs-verity support writes? |
| 650 | :A: Write support would be very difficult and would require a |
| 651 | completely different design, so it's well outside the scope of |
| 652 | fs-verity. Write support would require: |
| 653 | |
| 654 | - A way to maintain consistency between the data and hashes, |
| 655 | including all levels of hashes, since corruption after a crash |
| 656 | (potentially of the entire file!) is unacceptable. |
| 657 | The main options for solving this are data journalling, |
| 658 | copy-on-write, and log-structured volumes. But it's very hard to |
| 659 | retrofit existing filesystems with new consistency mechanisms. |
| 660 | Data journalling is available on ext4, but is very slow. |
| 661 | |
| 662 | - Rebuilding the Merkle tree after every write, which would be |
| 663 | extremely inefficient. Alternatively, a different authenticated |
| 664 | dictionary structure such as an "authenticated skiplist" could |
| 665 | be used. However, this would be far more complex. |
| 666 | |
| 667 | Compare it to dm-verity vs. dm-integrity. dm-verity is very |
| 668 | simple: the kernel just verifies read-only data against a |
| 669 | read-only Merkle tree. In contrast, dm-integrity supports writes |
| 670 | but is slow, is much more complex, and doesn't actually support |
| 671 | full-device authentication since it authenticates each sector |
| 672 | independently, i.e. there is no "root hash". It doesn't really |
| 673 | make sense for the same device-mapper target to support these two |
| 674 | very different cases; the same applies to fs-verity. |
| 675 | |
| 676 | :Q: Since verity files are immutable, why isn't the immutable bit set? |
| 677 | :A: The existing "immutable" bit (FS_IMMUTABLE_FL) already has a |
| 678 | specific set of semantics which not only make the file contents |
| 679 | read-only, but also prevent the file from being deleted, renamed, |
| 680 | linked to, or having its owner or mode changed. These extra |
| 681 | properties are unwanted for fs-verity, so reusing the immutable |
| 682 | bit isn't appropriate. |
| 683 | |
| 684 | :Q: Why does the API use ioctls instead of setxattr() and getxattr()? |
| 685 | :A: Abusing the xattr interface for basically arbitrary syscalls is |
| 686 | heavily frowned upon by most of the Linux filesystem developers. |
| 687 | An xattr should really just be an xattr on-disk, not an API to |
| 688 | e.g. magically trigger construction of a Merkle tree. |
| 689 | |
| 690 | :Q: Does fs-verity support remote filesystems? |
| 691 | :A: Only ext4 and f2fs support is implemented currently, but in |
| 692 | principle any filesystem that can store per-file verity metadata |
| 693 | can support fs-verity, regardless of whether it's local or remote. |
| 694 | Some filesystems may have fewer options of where to store the |
| 695 | verity metadata; one possibility is to store it past the end of |
| 696 | the file and "hide" it from userspace by manipulating i_size. The |
| 697 | data verification functions provided by ``fs/verity/`` also assume |
| 698 | that the filesystem uses the Linux pagecache, but both local and |
| 699 | remote filesystems normally do so. |
| 700 | |
| 701 | :Q: Why is anything filesystem-specific at all? Shouldn't fs-verity |
| 702 | be implemented entirely at the VFS level? |
| 703 | :A: There are many reasons why this is not possible or would be very |
| 704 | difficult, including the following: |
| 705 | |
| 706 | - To prevent bypassing verification, pages must not be marked |
| 707 | Uptodate until they've been verified. Currently, each |
| 708 | filesystem is responsible for marking pages Uptodate via |
| 709 | ``->readpages()``. Therefore, currently it's not possible for |
| 710 | the VFS to do the verification on its own. Changing this would |
| 711 | require significant changes to the VFS and all filesystems. |
| 712 | |
| 713 | - It would require defining a filesystem-independent way to store |
| 714 | the verity metadata. Extended attributes don't work for this |
| 715 | because (a) the Merkle tree may be gigabytes, but many |
| 716 | filesystems assume that all xattrs fit into a single 4K |
| 717 | filesystem block, and (b) ext4 and f2fs encryption doesn't |
| 718 | encrypt xattrs, yet the Merkle tree *must* be encrypted when the |
| 719 | file contents are, because it stores hashes of the plaintext |
| 720 | file contents. |
| 721 | |
| 722 | So the verity metadata would have to be stored in an actual |
| 723 | file. Using a separate file would be very ugly, since the |
| 724 | metadata is fundamentally part of the file to be protected, and |
| 725 | it could cause problems where users could delete the real file |
| 726 | but not the metadata file or vice versa. On the other hand, |
| 727 | having it be in the same file would break applications unless |
| 728 | filesystems' notion of i_size were divorced from the VFS's, |
| 729 | which would be complex and require changes to all filesystems. |
| 730 | |
| 731 | - It's desirable that FS_IOC_ENABLE_VERITY uses the filesystem's |
| 732 | transaction mechanism so that either the file ends up with |
| 733 | verity enabled, or no changes were made. Allowing intermediate |
| 734 | states to occur after a crash may cause problems. |