.. SPDX-License-Identifier: GPL-2.0

============================
XFS Self Describing Metadata
============================

Introduction
============

The largest scalability problem facing XFS is not one of algorithmic
scalability, but of verification of the filesystem structure. The scalability
of the on-disk structures and indexes and of the algorithms for iterating them
is adequate for supporting PB scale filesystems with billions of inodes;
however, it is this very scalability that causes the verification problem.

Almost all metadata on XFS is dynamically allocated. The only fixed location
metadata is the allocation group headers (SB, AGF, AGFL and AGI), while all
other metadata structures need to be discovered by walking the filesystem
structure in different ways. While this is already done by userspace tools for
validating and repairing the structure, there are limits to what they can
verify, and this in turn limits the supportable size of an XFS filesystem.

For example, it is entirely possible to manually use xfs_db and a bit of
scripting to analyse the structure of a 100TB filesystem when trying to
determine the root cause of a corruption problem, but it is still mainly a
manual task of verifying that things like single bit errors or misplaced writes
weren't the ultimate cause of a corruption event. It may take a few hours to a
few days to perform such forensic analysis, so at this scale root cause
analysis is still entirely possible.

However, if we scale the filesystem up to 1PB, we now have 10x as much metadata
to analyse and so that analysis blows out towards weeks/months of forensic
work. Most of the analysis work is slow and tedious, so as the amount of
analysis goes up, the more likely it is that the cause will be lost in the
noise. Hence the primary concern for supporting PB scale filesystems is
minimising the time and effort required for basic forensic analysis of the
filesystem structure.


Self Describing Metadata
========================

One of the problems with the current metadata format is that apart from the
magic number in the metadata block, we have no other way of identifying what it
is supposed to be. We can't even identify if it is in the right place. Put
simply, you can't look at a single metadata block in isolation and say "yes, it
is supposed to be there and the contents are valid".

Hence most of the time spent on forensic analysis is spent doing basic
verification of metadata values, looking for values that are in range (and
hence not detected by automated verification checks) but are not correct.
Finding and understanding how things like cross-linked block lists (e.g.
sibling pointers in a btree that end up with loops in them) came about is the
key to understanding what went wrong, but it is impossible to tell after the
fact what order the blocks were linked into each other or written to disk.

Hence we need to record more information into the metadata to allow us to
quickly determine if the metadata is intact and can be ignored for the purpose
of analysis. We can't protect against every possible type of error, but we can
ensure that common types of errors are easily detectable. Hence the concept of
self describing metadata.

The first, fundamental requirement of self describing metadata is that the
metadata object contains some form of unique identifier in a well known
location. This allows us to identify the expected contents of the block and
hence parse and verify the metadata object. If we can't independently identify
the type of metadata in the object, then the metadata doesn't describe itself
very well at all!

Luckily, almost all XFS metadata has magic numbers embedded already - only the
AGFL, remote symlinks and remote attribute blocks do not contain identifying
magic numbers. Hence we can change the on-disk format of all these objects to
add more identifying information and detect this simply by changing the magic
numbers in the metadata objects. That is, if it has the current magic number,
the metadata isn't self identifying. If it contains a new magic number, it is
self identifying and we can do much more expansive automated verification of
the metadata object at runtime, during forensic analysis or repair.

As a primary concern, self describing metadata needs some form of overall
integrity checking. We cannot trust the metadata if we cannot verify that it
has not been changed as a result of external influences. Hence we need some
form of integrity check, and this is done by adding CRC32c validation to the
metadata block. If we can verify the block contains the metadata it was
intended to contain, a large amount of the manual verification work can be
skipped.

CRC32c was selected as metadata cannot be more than 64k in length in XFS and
hence a 32 bit CRC is more than sufficient to detect multi-bit errors in
metadata blocks. CRC32c is also now hardware accelerated on common CPUs so it
is fast. So while CRC32c is not the strongest of possible integrity checks that
could be used, it is more than sufficient for our needs and has relatively
little overhead. Adding support for larger integrity fields and/or algorithms
does not really provide any extra value over CRC32c, but it does add a lot of
complexity and so there is no provision for changing the integrity checking
mechanism.
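
To illustrate the mechanics, the checksum is calculated over the whole metadata
block with the stored CRC field treated as zero, so the check can be recomputed
independently of the value already held in the block. The sketch below is
illustrative only - the real kernel helpers (xfs_verify_cksum() and
xfs_update_cksum(), used by the verifiers shown later in this document) differ
in details such as the seed constant and how the result is converted for
storage on disk::

    #include <linux/types.h>
    #include <linux/crc32c.h>

    /* Sketch only: compute the CRC with the stored crc field zeroed out. */
    static u32
    example_metadata_cksum(
            const char      *buffer,
            size_t          length,
            unsigned long   cksum_offset)
    {
            u32             zero = 0;
            u32             crc;

            /* everything before the crc field */
            crc = crc32c(~0U, buffer, cksum_offset);
            /* the crc field itself, treated as zero */
            crc = crc32c(crc, &zero, sizeof(zero));
            /* everything after the crc field */
            crc = crc32c(crc, buffer + cksum_offset + sizeof(zero),
                         length - cksum_offset - sizeof(zero));
            return ~crc;
    }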

Self describing metadata needs to contain enough information so that the
metadata block can be verified as being in the correct place without needing to
look at any other metadata. This means it needs to contain location
information. Just adding a block number to the metadata is not sufficient to
protect against mis-directed writes - a write might be misdirected to the wrong
LUN and so be written to the "correct block" of the wrong filesystem. Hence
location information must contain a filesystem identifier as well as a block
number.

Another key information point in forensic analysis is knowing who the metadata
block belongs to. We already know the type, the location, that it is valid
and/or corrupted, and how long ago it was last modified. Knowing the owner of
the block is important as it allows us to find other related metadata to
determine the scope of the corruption. For example, if we have an extent btree
object, we don't know what inode it belongs to and hence have to walk the
entire filesystem to find the owner of the block. Worse, the corruption could
mean that no owner can be found (i.e. it's an orphan block), and so without an
owner field in the metadata we have no idea of the scope of the corruption. If
we have an owner field in the metadata object, we can immediately do top down
validation to determine the scope of the problem.

Different types of metadata have different owner identifiers. For example,
directory, attribute and extent tree blocks are all owned by an inode, while
freespace btree blocks are owned by an allocation group. Hence the size and
contents of the owner field are determined by the type of metadata object we
are looking at. The owner information can also identify misplaced writes (e.g.
freespace btree block written to the wrong AG).

Self describing metadata also needs to contain some indication of when it was
written to the filesystem. One of the key information points when doing
forensic analysis is how recently the block was modified. Correlation of a set
of corrupted metadata blocks based on modification times is important as it can
indicate whether the corruptions are related, whether there have been multiple
corruption events that led to the eventual failure, and even whether there are
corruptions present that the run-time verification is not detecting.

For example, we can determine whether a metadata object is supposed to be free
space or still allocated, even if it is still referenced by its owner, by
looking at when the free space btree block that contains the block was last
written compared to when the metadata object itself was last written. If the
free space block is more recent than the object and the object's owner, then
there is a very good chance that the block should have been removed from the
owner.

To provide this "written timestamp", each metadata block gets the Log Sequence
Number (LSN) of the most recent transaction it was modified on written into it.
This number will always increase over the life of the filesystem, and the only
thing that resets it is running xfs_repair on the filesystem. Further, by use
of the LSN we can tell if the corrupted metadata all belonged to the same log
checkpoint and hence have some idea of how much modification occurred between
the first and last instance of corrupt metadata on disk and, further, how much
modification occurred between the corruption being written and when it was
detected.
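
As an illustration of the forensic use of this field, the relative write order
of two blocks can be established simply by comparing their stamped LSNs - the
larger value was written more recently. The helper below is a sketch only,
assuming the generic header layout shown later in the Structures section::

    /*
     * Sketch: decide which of two metadata headers read from disk was
     * stamped by a more recent transaction. LSNs only ever increase over
     * the life of the filesystem, so the larger value was written later.
     */
    static bool
    example_written_after(
            struct xfs_ondisk_hdr   *a,
            struct xfs_ondisk_hdr   *b)
    {
            return be64_to_cpu(a->lsn) > be64_to_cpu(b->lsn);
    }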

Runtime Validation
==================

Validation of self-describing metadata takes place at runtime in two places:

- immediately after a successful read from disk
- immediately prior to write IO submission

The verification is completely stateless - it is done independently of the
modification process, and seeks only to check that the metadata is what it says
it is and that the metadata fields are within bounds and internally consistent.
As such, we cannot catch all types of corruption that can occur within a block
as there may be certain limitations that operational state enforces on the
metadata, or there may be corruption of interblock relationships (e.g.
corrupted sibling pointer lists). Hence we still need stateful checking in the
main code body, but in general most of the per-field validation is handled by
the verifiers.

For read verification, the caller needs to specify the expected type of
metadata that it should see, and the IO completion process verifies that the
metadata object matches what was expected. If the verification process fails,
then it marks the object being read as EFSCORRUPTED. The caller needs to catch
this error (same as for IO errors), and if it needs to take special action due
to a verification error it can do so by catching the EFSCORRUPTED error value.
If we need more discrimination of error type at higher levels, we can define
new error numbers for different errors as necessary.

The first step in read verification is checking the magic number and
determining whether CRC validation is necessary. If it is, the CRC32c is
calculated and compared against the value stored in the object itself. Once
this is validated, further checks are made against the location information,
followed by extensive object specific metadata validation. If any of these
checks fail, then the buffer is considered corrupt and the EFSCORRUPTED error
is set appropriately.

Write verification is the opposite of the read verification - first the object
is extensively verified and if it is OK we then update the LSN from the last
modification made to the object. After this, we calculate the CRC and insert it
into the object. Once this is done the write IO is allowed to continue. If any
error occurs during this process, the buffer is again marked with an
EFSCORRUPTED error for the higher layers to catch.

Structures
==========

A typical on-disk structure needs to contain the following information::

    struct xfs_ondisk_hdr {
            __be32  magic;          /* magic number */
            __be32  crc;            /* CRC, not logged */
            uuid_t  uuid;           /* filesystem identifier */
            __be64  owner;          /* parent object */
            __be64  blkno;          /* location on disk */
            __be64  lsn;            /* last modification in log, not logged */
    };

Depending on the metadata, this information may be part of a header structure
separate to the metadata contents, or may be distributed through an existing
structure. The latter occurs with metadata that already contains some of this
information, such as the superblock and AG headers.

Other metadata may have different formats for the information, but the same
level of information is generally provided. For example:

- short btree blocks have a 32 bit owner (ag number) and a 32 bit block
  number for location. The two of these combined provide the same
  information as @owner and @blkno in the above structure, but using 8
  bytes less space on disk.

- directory/attribute node blocks have a 16 bit magic number, and the
  header that contains the magic number has other information in it as
  well. Hence the additional metadata headers change the overall format
  of the metadata.

A typical buffer read verifier is structured as follows::

    #define XFS_FOO_CRC_OFF         offsetof(struct xfs_ondisk_hdr, crc)

    static void
    xfs_foo_read_verify(
            struct xfs_buf  *bp)
    {
            struct xfs_mount *mp = bp->b_mount;

            if ((xfs_sb_version_hascrc(&mp->m_sb) &&
                 !xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length),
                                   XFS_FOO_CRC_OFF)) ||
                !xfs_foo_verify(bp)) {
                    XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
                    xfs_buf_ioerror(bp, EFSCORRUPTED);
            }
    }

The code ensures that the CRC is only checked if the filesystem has CRCs
enabled by checking the superblock for the feature bit, and then if the CRC
verifies OK (or is not needed) it verifies the actual contents of the block.

The verifier function will take a couple of different forms, depending on
whether the magic number can be used to determine the format of the block. In
the case where it can't, the code is structured as follows::

    static bool
    xfs_foo_verify(
            struct xfs_buf          *bp)
    {
            struct xfs_mount        *mp = bp->b_mount;
            struct xfs_ondisk_hdr   *hdr = bp->b_addr;

            if (hdr->magic != cpu_to_be32(XFS_FOO_MAGIC))
                    return false;

            if (xfs_sb_version_hascrc(&mp->m_sb)) {
                    if (!uuid_equal(&hdr->uuid, &mp->m_sb.sb_uuid))
                            return false;
                    if (bp->b_bn != be64_to_cpu(hdr->blkno))
                            return false;
                    if (hdr->owner == 0)
                            return false;
            }

            /* object specific verification checks here */

            return true;
    }

If there are different magic numbers for the different formats, the verifier
will look like::

    static bool
    xfs_foo_verify(
            struct xfs_buf          *bp)
    {
            struct xfs_mount        *mp = bp->b_mount;
            struct xfs_ondisk_hdr   *hdr = bp->b_addr;

            if (hdr->magic == cpu_to_be32(XFS_FOO_CRC_MAGIC)) {
                    if (!uuid_equal(&hdr->uuid, &mp->m_sb.sb_uuid))
                            return false;
                    if (bp->b_bn != be64_to_cpu(hdr->blkno))
                            return false;
                    if (hdr->owner == 0)
                            return false;
            } else if (hdr->magic != cpu_to_be32(XFS_FOO_MAGIC))
                    return false;

            /* object specific verification checks here */

            return true;
    }

Write verifiers are very similar to the read verifiers; they just do things in
the opposite order. A typical write verifier::

    static void
    xfs_foo_write_verify(
            struct xfs_buf          *bp)
    {
            struct xfs_mount        *mp = bp->b_mount;
            struct xfs_buf_log_item *bip = bp->b_fspriv;

            if (!xfs_foo_verify(bp)) {
                    XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
                    xfs_buf_ioerror(bp, EFSCORRUPTED);
                    return;
            }

            if (!xfs_sb_version_hascrc(&mp->m_sb))
                    return;

            if (bip) {
                    struct xfs_ondisk_hdr  *hdr = bp->b_addr;

                    hdr->lsn = cpu_to_be64(bip->bli_item.li_lsn);
            }
            xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length), XFS_FOO_CRC_OFF);
    }

This will verify the internal structure of the metadata before we go any
further, detecting corruptions that have occurred as the metadata has been
modified in memory. If the metadata verifies OK, and CRCs are enabled, we then
update the LSN field (when it was last modified) and calculate the CRC on the
metadata. Once this is done, we can issue the IO.
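
The verifiers themselves are not called directly; they are packaged up and
attached to the buffer when it is read, so that the read verifier runs at IO
completion and the write verifier runs when the buffer is later written back.
The sketch below shows the typical wiring - struct xfs_buf_ops and the trailing
ops argument to the buffer read functions exist in the kernel source, but the
"foo" names are placeholders carried over from the examples above::

    /* Sketch: package the verifiers so the buffer cache can call them. */
    const struct xfs_buf_ops xfs_foo_buf_ops = {
            .verify_read    = xfs_foo_read_verify,
            .verify_write   = xfs_foo_write_verify,
    };

    /*
     * Sketch of a caller: the ops are supplied when the buffer is read,
     * and travel with the buffer from then on.
     */
    static int
    xfs_foo_read_block(
            struct xfs_mount        *mp,
            struct xfs_trans        *tp,
            xfs_daddr_t             blkno,
            int                     numblks,
            struct xfs_buf          **bpp)
    {
            return xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, blkno,
                                      numblks, 0, bpp, &xfs_foo_buf_ops);
    }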

Inodes and Dquots
=================

Inodes and dquots are special snowflakes. They have per-object CRCs and
self-identifiers, but they are packed so that there are multiple objects per
buffer. Hence we do not use per-buffer verifiers to do the work of per-object
verification and CRC calculations. The per-buffer verifiers simply perform
basic identification of the buffer - that it contains inodes or dquots, and
that there are magic numbers in all the expected spots. All further CRC and
verification checks are done when each inode is read from or written back to
the buffer.
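
As a sketch of what that basic identification looks like for inode buffers
(illustrative only - this is not the kernel's actual inode buffer verifier, and
the helper name is made up), the per-buffer check just walks each inode slot
and confirms the magic number is where it is expected to be::

    /*
     * Sketch: per-buffer identification of an inode buffer. Only the
     * magic number of each inode slot is checked here; full per-inode
     * CRC and field verification happens when individual inodes are
     * read from or written back to the buffer.
     */
    static bool
    example_inode_buf_verify(
            struct xfs_buf          *bp)
    {
            struct xfs_mount        *mp = bp->b_mount;
            int                     ninodes;
            int                     i;

            ninodes = BBTOB(bp->b_length) >> mp->m_sb.sb_inodelog;
            for (i = 0; i < ninodes; i++) {
                    struct xfs_dinode *dip;

                    dip = bp->b_addr + (i << mp->m_sb.sb_inodelog);
                    if (dip->di_magic != cpu_to_be16(XFS_DINODE_MAGIC))
                            return false;
            }
            return true;
    }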

The structure of the verifiers and the identifier checks is very similar to the
buffer code described above. The only difference is where they are called. For
example, inode read verification is done in xfs_inode_from_disk() when the
inode is first read out of the buffer and the struct xfs_inode is instantiated.
The inode is already extensively verified during writeback in xfs_iflush_int,
so the only addition here is to add the LSN and CRC to the inode as it is
copied back into the buffer.

XXX: inode unlinked list modification doesn't recalculate the inode CRC! None
of the unlinked list modifications check or update CRCs, neither during unlink
nor log recovery. So, it's gone unnoticed until now. This won't matter
immediately - repair will probably complain about it - but it needs to be
fixed.