====================
Changes since 2.5.0:
====================

---

**recommended**

New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(),
sb_set_blocksize() and sb_min_blocksize().

Use them.

(sb_find_get_block() replaces 2.4's get_hash_table())

---

**recommended**

New methods: ->alloc_inode() and ->destroy_inode().

Remove inode->u.foo_inode_i.

Declare::

	struct foo_inode_info {
		/* fs-private stuff */
		struct inode vfs_inode;
	};
	static inline struct foo_inode_info *FOO_I(struct inode *inode)
	{
		return container_of(inode, struct foo_inode_info, vfs_inode);
	}

Use FOO_I(inode) instead of &inode->u.foo_inode_i.

Add foo_alloc_inode() and foo_destroy_inode() - the former should allocate
foo_inode_info and return the address of ->vfs_inode, the latter should free
FOO_I(inode) (see in-tree filesystems for examples).

Make them ->alloc_inode and ->destroy_inode in your super_operations.

Keep in mind that you now need explicit initialization of private data,
typically between calling iget_locked() and unlocking the inode.

At some point that will become mandatory.
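
A userspace sketch of the same layout may help show why the container, not the inode pointer, is what gets freed. It is not kernel code: the one-field ``struct inode``, the ``foo_private`` field and the argument-less foo_alloc_inode() are hypothetical simplifications (the real ->alloc_inode() receives the superblock), malloc()/free() stand in for whatever allocator a real filesystem uses, and container_of() is defined locally with offsetof().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical one-field stand-in for the kernel's struct inode. */
struct inode {
	unsigned long i_ino;
};

/* Local userspace definition; the kernel provides its own container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct foo_inode_info {
	int foo_private;		/* hypothetical fs-private field */
	struct inode vfs_inode;		/* embedded VFS inode */
};

static inline struct foo_inode_info *FOO_I(struct inode *inode)
{
	return container_of(inode, struct foo_inode_info, vfs_inode);
}

/* Allocate the whole container; hand the VFS the embedded inode. */
static struct inode *foo_alloc_inode(void)
{
	struct foo_inode_info *fi = malloc(sizeof(*fi));

	if (!fi)
		return NULL;
	fi->foo_private = 0;	/* private data needs explicit initialization */
	fi->vfs_inode.i_ino = 0;
	return &fi->vfs_inode;
}

/* Free the container, which starts at FOO_I(inode), not at inode. */
static void foo_destroy_inode(struct inode *inode)
{
	free(FOO_I(inode));
}
```

Because vfs_inode is deliberately not the first member here, &fi->vfs_inode differs from fi, so passing the inode pointer straight to free() would be a bug; container_of() recovers the right address regardless of member order.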

---

**mandatory**

Change of file_system_type method (->read_super to ->get_sb)

->read_super() is no more. Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV.

Turn your foo_read_super() into a function that returns 0 in case of
success and a negative number in case of error (-EINVAL unless you have a more
informative error value to report). Call it foo_fill_super(). Now declare::

	int foo_get_sb(struct file_system_type *fs_type,
		int flags, const char *dev_name, void *data, struct vfsmount *mnt)
	{
		return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
				   mnt);
	}

(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
filesystem).

Replace DECLARE_FSTYPE... with an explicit initializer and have ->get_sb set as
foo_get_sb.

---

**mandatory**

Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames.
Most likely there is no need to change anything, but if you relied on
global exclusion between renames for some internal purpose - you need to
change your internal locking. Otherwise exclusion guarantees remain the
same (i.e. parents and victim are locked, etc.).

---

**informational**

There is now exclusion between ->lookup() and directory removal (by
->rmdir() and ->rename()). If you used to need that exclusion and did
it by internal locking (most filesystems couldn't care less) - you
can relax your locking.

---

**mandatory**

->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(),
->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename()
and ->readdir() are called without BKL now. Grab it on entry, drop upon
return - that will guarantee the same locking you used to have. If your method
or its parts do not need BKL - better yet, now you can shift lock_kernel() and
unlock_kernel() so that they would protect exactly what needs to be
protected.

---

**mandatory**

BKL is also moved from around sb operations. BKL should have been shifted into
individual fs sb_op functions. If you don't need it, remove it.

---

**informational**

The check for ->link() target not being a directory is done by callers. Feel
free to drop it...

---

**informational**

->link() callers hold ->i_mutex on the object we are linking to. Some of your
problems might be over...

---

**mandatory**

New file_system_type method - kill_sb(superblock). If you are converting
an existing filesystem, set it according to ->fs_flags::

	FS_REQUIRES_DEV   -   kill_block_super
	FS_LITTER         -   kill_litter_super
	neither           -   kill_anon_super

FS_LITTER is gone - just remove it from fs_flags.

---

**mandatory**

FS_SINGLE is gone (actually, that had happened back when ->get_sb()
went in - and hadn't been documented ;-/). Just remove it from fs_flags
(and see ->get_sb() entry for other actions).

---

**mandatory**

->setattr() is called without BKL now. Caller _always_ holds ->i_mutex, so
watch for ->i_mutex-grabbing code that might be used by your ->setattr().
Callers of notify_change() need ->i_mutex now.

---

**recommended**

New super_block field ``struct export_operations *s_export_op`` for
explicit support for exporting, e.g. via NFS. The structure is fully
documented at its declaration in include/linux/fs.h, and in
Documentation/filesystems/nfs/exporting.rst.

Briefly it allows for the definition of decode_fh and encode_fh operations
to encode and decode filehandles, and allows the filesystem to use
a standard helper function for decode_fh, and provide filesystem-specific
support for this helper, particularly get_parent.

It is planned that this will be required for exporting once the code
settles down a bit.

---

**mandatory**

s_export_op is now required for exporting a filesystem.
isofs, ext2, ext3, reiserfs, fat
can be used as examples of very different filesystems.

---

**mandatory**

iget4() and the read_inode2 callback have been superseded by iget5_locked()
which has the following prototype::

	struct inode *iget5_locked(struct super_block *sb, unsigned long ino,
				   int (*test)(struct inode *, void *),
				   int (*set)(struct inode *, void *),
				   void *data);

'test' is an additional function that can be used when the inode
number is not sufficient to identify the actual file object. 'set'
should be a non-blocking function that initializes those parts of a
newly created inode to allow the test function to succeed. 'data' is
passed as an opaque value to both test and set functions.

When the inode has been created by iget5_locked(), it will be returned with the
I_NEW flag set and will still be locked. The filesystem then needs to finalize
the initialization. Once the inode is initialized it must be unlocked by
calling unlock_new_inode().

The filesystem is responsible for setting (and possibly testing) i_ino
when appropriate. There is also a simpler iget_locked() function that
just takes the superblock and inode number as arguments and does the
test and set for you.

e.g.::

	inode = iget_locked(sb, ino);
	if (!inode)
		return -ENOMEM;
	if (inode->i_state & I_NEW) {
		err = read_inode_from_disk(inode);
		if (err < 0) {
			iget_failed(inode);
			return err;
		}
		unlock_new_inode(inode);
	}

Note that if the process of setting up a new inode fails, then iget_failed()
should be called on the inode to render it dead, and an appropriate error
should be passed back to the caller.
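
To illustrate the division of work between 'test' and 'set', here is a toy userspace model. Everything in it (toy_iget5(), the fixed-size array cache, the ``generation`` field, the ``foo_key`` struct and ``is_new`` standing in for I_NEW) is hypothetical, and it has none of the locking, hashing or reference counting the real iget5_locked() provides. It only shows that identity is decided by test(), that set() merely fills in enough for test() to match, and that the caller finishes initializing a new inode before clearing its "new" state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified inode; not the kernel's structure. */
struct inode {
	unsigned long i_ino;
	unsigned int generation;	/* extra identity beyond the number */
	bool in_use;
	bool is_new;			/* plays the role of I_NEW */
};

#define TOY_CACHE_SIZE 8
static struct inode toy_cache[TOY_CACHE_SIZE];

/* Hypothetical key passed through the opaque 'data' pointer. */
struct foo_key {
	unsigned long ino;
	unsigned int generation;
};

/* test(): does this cached inode describe the object named by the key? */
static int foo_test(struct inode *inode, void *data)
{
	struct foo_key *key = data;

	return inode->i_ino == key->ino && inode->generation == key->generation;
}

/* set(): non-blocking; fills in just enough for foo_test() to succeed. */
static int foo_set(struct inode *inode, void *data)
{
	struct foo_key *key = data;

	inode->i_ino = key->ino;	/* the filesystem sets i_ino itself */
	inode->generation = key->generation;
	return 0;
}

/* Toy lookup: 'ino' is only a hash hint, identity is decided by test(). */
static struct inode *toy_iget5(unsigned long ino,
			       int (*test)(struct inode *, void *),
			       int (*set)(struct inode *, void *),
			       void *data)
{
	size_t i;

	(void)ino;	/* a real cache would use it to choose a hash chain */
	for (i = 0; i < TOY_CACHE_SIZE; i++)
		if (toy_cache[i].in_use && test(&toy_cache[i], data))
			return &toy_cache[i];
	for (i = 0; i < TOY_CACHE_SIZE; i++) {
		if (toy_cache[i].in_use)
			continue;
		if (set(&toy_cache[i], data))
			return NULL;
		toy_cache[i].in_use = true;
		toy_cache[i].is_new = true;	/* caller must finish setup */
		return &toy_cache[i];
	}
	return NULL;	/* cache full */
}
```

In this model two lookups with the same inode number but different generations yield two distinct inodes, which is exactly the situation where iget_locked() alone would not be sufficient.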

---

**recommended**

->getattr() is finally being used. See instances in nfs, minix, etc.

---

**mandatory**

->revalidate() is gone. If your filesystem had it - provide ->getattr()
and let it call whatever you had as ->revalidate() + (for symlinks that
had ->revalidate()) add calls in ->follow_link()/->readlink().

---

**mandatory**

->d_parent changes are not protected by BKL anymore. Read access is safe
if at least one of the following is true:

* filesystem has no cross-directory rename()
* we know that parent had been locked (e.g. we are looking at
  ->d_parent of ->lookup() argument).
* we are called from ->rename().
* the child's ->d_lock is held

Audit your code and add locking if needed. Notice that any place that is
not protected by the conditions above is risky even in the old tree - you
had been relying on BKL and that's prone to screwups. The old tree had quite
a few holes of that kind - unprotected access to ->d_parent leading to
anything from oops to silent memory corruption.

---

**mandatory**

FS_NOMOUNT is gone. If you use it - just set SB_NOUSER in flags
(see rootfs for one kind of solution and bdev/socket/pipe for another).

---

**recommended**

Use bdev_read_only(bdev) instead of is_read_only(kdev). The latter
is still alive, but only because of the mess in drivers/s390/block/dasd.c.
As soon as it gets fixed is_read_only() will die.

---

**mandatory**

->permission() is called without BKL now. Grab it on entry, drop upon
return - that will guarantee the same locking you used to have. If
your method or its parts do not need BKL - better yet, now you can
shift lock_kernel() and unlock_kernel() so that they would protect
exactly what needs to be protected.

---

**mandatory**

->statfs() is now called without BKL held. BKL should have been
shifted into individual fs sb_op functions where it's not clear that
it's safe to remove it. If you don't need it, remove it.

---

**mandatory**

is_read_only() is gone; use bdev_read_only() instead.

---

**mandatory**

destroy_buffers() is gone; use invalidate_bdev().

---

**mandatory**

fsync_dev() is gone; use fsync_bdev(). NOTE: lvm breakage is
deliberate; as soon as struct block_device * is propagated in a reasonable
way by that code, fixing will become trivial; until then nothing can be
done.

---

**mandatory**

Block truncation on error exit from ->write_begin, and ->direct_IO
moved from generic methods (block_write_begin, cont_write_begin,
nobh_write_begin, blockdev_direct_IO*) to callers. Take a look at
ext2_write_failed and callers for an example.

---

**mandatory**

->truncate is gone. The whole truncate sequence needs to be
implemented in ->setattr, which is now mandatory for filesystems
implementing on-disk size changes. Start with a copy of the old inode_setattr
and vmtruncate, and then reorder the vmtruncate + foofs_vmtruncate sequence
into this order: zeroing blocks using block_truncate_page or similar helpers,
updating the size, and finally the on-disk truncation, which should not fail.
setattr_prepare (which used to be inode_change_ok) now includes the size checks
for ATTR_SIZE and must be called at the beginning of ->setattr unconditionally.

---

**mandatory**

->clear_inode() and ->delete_inode() are gone; ->evict_inode() should
be used instead. It gets called whenever the inode is evicted, whether it has
remaining links or not. Caller does *not* evict the pagecache or inode-associated
metadata buffers; the method has to use truncate_inode_pages_final() to get rid
of those. Caller makes sure async writeback cannot be running for the inode while
(or after) ->evict_inode() is called.

->drop_inode() returns int now; it's called on final iput() with
inode->i_lock held and it returns true if the filesystem wants the inode to be
dropped. As before, generic_drop_inode() is still the default and it's been
updated appropriately. generic_delete_inode() is also alive and it consists
simply of ``return 1``. Note that all actual eviction work is done by caller
after ->drop_inode() returns.

As before, clear_inode() must be called exactly once on each call of
->evict_inode() (as it used to be for each call of ->delete_inode()). Unlike
before, if you are using inode-associated metadata buffers (i.e.
mark_buffer_dirty_inode()), it's your responsibility to call
invalidate_inode_buffers() before clear_inode().

NOTE: checking i_nlink at the beginning of ->write_inode() and bailing out
if it's zero is not *and* *never* *had* *been* enough. Final unlink() and iput()
may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly
free the on-disk inode, you may end up doing that while ->write_inode() is writing
to it.

---

**mandatory**

.d_delete() now only advises the dcache as to whether or not to cache
unreferenced dentries, and is now only called when the dentry refcount goes
to 0. Even on 0 refcount transition, it must be able to tolerate being called
0, 1, or more times (i.e. constant, idempotent).

---

**mandatory**

.d_compare() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
look at examples of other filesystems) for guidance.

---

**mandatory**

.d_hash() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
look at examples of other filesystems) for guidance.

---

**mandatory**

dcache_lock is gone, replaced by fine-grained locks. See fs/dcache.c
for details of what locks to replace dcache_lock with in order to protect
particular things. Most of the time, a filesystem only needs ->d_lock, which
protects *all* the dcache state of a given dentry.

---

**mandatory**

Filesystems must RCU-free their inodes, if they can have been accessed
via rcu-walk path walk (basically, if the file can have had a path name in the
vfs namespace).

Even though i_dentry and i_rcu share storage in a union, we will
initialize the former in inode_init_always(), so just leave it alone in
the callback. It used to be necessary to clean it there, but not anymore
(starting at 3.2).

---

**recommended**

vfs now tries to do path walking in "rcu-walk mode", which avoids
atomic operations and scalability hazards on dentries and inodes (see
Documentation/filesystems/path-lookup.txt). d_hash and d_compare changes
(above) are examples of the changes required to support this. For more complex
filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so
no changes are required to the filesystem. However, this is costly and loses
the benefits of rcu-walk mode. We will begin to add filesystem callbacks that
are rcu-walk aware, shown below. Filesystems should take advantage of this
where possible.

---

**mandatory**

->d_revalidate() is a callback that is made on every path element (if
the filesystem provides it), which requires dropping out of rcu-walk mode. This
may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD should be
returned if the filesystem cannot handle rcu-walk. See
Documentation/filesystems/vfs.rst for more details.

->permission() is an inode permission check that is called on many or all
directory inodes on the way down a path walk (to check for exec permission). It
must now be rcu-walk aware (mask & MAY_NOT_BLOCK). See
Documentation/filesystems/vfs.rst for more details.
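
A minimal sketch of the rcu-walk contract, written as a userspace model rather than kernel code: ``struct toy_dentry``, its two fields and ``TOY_LOOKUP_RCU`` are hypothetical stand-ins (the real flag is LOOKUP_RCU, which later ->d_revalidate() instances receive in their flags argument). The point is that a check which would need to sleep is not attempted in rcu-walk mode; returning -ECHILD asks the VFS to retry in ref-walk mode, where blocking is allowed.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag value standing in for the kernel's LOOKUP_RCU. */
#define TOY_LOOKUP_RCU	0x1

/* Hypothetical per-dentry state for a filesystem that may need to block. */
struct toy_dentry {
	bool cached_valid;		/* answer we can give without sleeping */
	bool needs_blocking_check;	/* e.g. would have to ask a server */
};

/* Returns 1 (valid), 0 (invalid) or -ECHILD (retry in ref-walk mode). */
static int toy_d_revalidate(struct toy_dentry *d, unsigned int flags)
{
	if (d->needs_blocking_check) {
		if (flags & TOY_LOOKUP_RCU)
			return -ECHILD;	/* must not sleep in rcu-walk */
		/*
		 * Ref-walk: a real filesystem would do the blocking check
		 * here; this toy simply assumes it succeeds.
		 */
		d->needs_blocking_check = false;
		d->cached_valid = true;
	}
	return d->cached_valid ? 1 : 0;
}
```

->permission() follows the same shape: when MAY_NOT_BLOCK is set in mask and the answer cannot be produced without sleeping, it returns -ECHILD instead.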

---

**mandatory**

In ->fallocate() you must check the mode option passed in. If your
filesystem does not support hole punching (deallocating space in the middle of a
file) you must return -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set in mode.
Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set,
so the i_size should not change when hole punching, even when punching the end
of a file off.
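
The mode check can be sketched as a small helper. The function name and the ``can_punch_holes`` parameter are hypothetical; the flag values match the kernel's uapi <linux/falloc.h> and are redefined locally only so the sketch builds on its own. Refusing unknown bits as well is a conservative assumption of this sketch, not a requirement stated above.

```c
#include <errno.h>
#include <stdbool.h>

/* Values as in the kernel's uapi <linux/falloc.h>. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE	0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE	0x02
#endif

/* Hypothetical helper: returns 0 if the request may proceed. */
static int foofs_check_fallocate_mode(int mode, bool can_punch_holes)
{
	/* Conservatively refuse any bit this sketch does not know about. */
	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
		return -EOPNOTSUPP;

	if (mode & FALLOC_FL_PUNCH_HOLE) {
		/* Unsupported hole punching must be reported, not ignored. */
		if (!can_punch_holes)
			return -EOPNOTSUPP;
		/* PUNCH_HOLE only comes with KEEP_SIZE: i_size never changes. */
		if (!(mode & FALLOC_FL_KEEP_SIZE))
			return -EOPNOTSUPP;
	}
	return 0;
}
```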

---

**mandatory**

->get_sb() is gone. Switch to use of ->mount(). Typically it's just
a matter of switching from calling ``get_sb_``... to ``mount_``... and changing
the function type. If you were doing it manually, just switch from setting
->mnt_root to some pointer to returning that pointer. On errors return
ERR_PTR(...).

---

**mandatory**

->permission() and generic_permission() have lost the flags
argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask.

generic_permission() has also lost the check_acl argument; ACL checking
has been taken to VFS and filesystems need to provide a non-NULL ->i_op->get_acl
to read an ACL from disk.

---

**mandatory**

If you implement your own ->llseek() you must handle SEEK_HOLE and
SEEK_DATA. You can handle this by returning -EINVAL, but it would be nicer to
support it in some way. The generic handler assumes that the entire file is
data and there is a virtual hole at the end of the file. So if the provided
offset is less than i_size and SEEK_DATA is specified, return the same offset.
If the above is true for the offset and you are given SEEK_HOLE, return the end
of the file. If the offset is i_size or greater return -ENXIO in either case.
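
The policy just described (the whole file is data, with one virtual hole at EOF) can be sketched as a pure function. It is a userspace model, not the kernel's generic handler: the function name is hypothetical, int64_t stands in for loff_t, i_size is passed in rather than read from an inode, and only SEEK_DATA/SEEK_HOLE are handled. SEEK_DATA and SEEK_HOLE get their Linux values (3 and 4) if the C library has not already defined them.

```c
#include <errno.h>
#include <stdint.h>

#ifndef SEEK_DATA
#define SEEK_DATA	3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE	4
#endif

/* Hypothetical helper: returns the new offset or a negative errno. */
static int64_t foofs_seek_data_hole(int64_t offset, int whence, int64_t i_size)
{
	/* SEEK_SET/SEEK_CUR/SEEK_END would be handled by the caller. */
	if (whence != SEEK_DATA && whence != SEEK_HOLE)
		return -EINVAL;

	/* At or past EOF there is neither data nor a hole left to find. */
	if (offset < 0 || offset >= i_size)
		return -ENXIO;

	/* Everything before EOF is data; the only hole starts at EOF. */
	return whence == SEEK_DATA ? offset : i_size;
}
```

This sketch treats a negative offset the same way as one at or past EOF.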

---

**mandatory**

If you have your own ->fsync() you must make sure to call
filemap_write_and_wait_range() so that all dirty pages are synced out properly.
You must also keep in mind that ->fsync() is not called with i_mutex held
anymore, so if you require i_mutex locking you must make sure to take it and
release it yourself.

---

**mandatory**

d_alloc_root() is gone, along with a lot of bugs caused by code
misusing it. Replacement: d_make_root(inode). On success d_make_root(inode)
allocates and returns a new dentry instantiated with the passed in inode.
On failure NULL is returned and the passed in inode is dropped so the reference
to inode is consumed in all cases and failure handling need not do any cleanup
for the inode. If d_make_root(inode) is passed a NULL inode it returns NULL
and also requires no further error handling. Typical usage is::

	inode = foofs_new_inode(....);
	s->s_root = d_make_root(inode);
	if (!s->s_root)
		/* Nothing needed for the inode cleanup */
		return -ENOMEM;
	...

---

**mandatory**

The witch is dead! Well, 2/3 of it, anyway. ->d_revalidate() and
->lookup() do *not* take struct nameidata anymore; just the flags.

---

**mandatory**

->create() doesn't take ``struct nameidata *``; unlike the previous
two, it gets "is it an O_EXCL or equivalent?" boolean argument. Note that
local filesystems can ignore that argument - they are guaranteed that the
object doesn't exist. It's remote/distributed ones that might care...

---

**mandatory**

FS_REVAL_DOT is gone; if you used to have it, add ->d_weak_revalidate()
in your dentry operations instead.

---

**mandatory**

vfs_readdir() is gone; switch to iterate_dir() instead.

---

**mandatory**

->readdir() is gone now; switch to ->iterate().

---

**mandatory**

vfs_follow_link has been removed. Filesystems must use nd_set_link
from ->follow_link for normal symlinks, or nd_jump_link for magic
/proc/<pid> style links.

---

**mandatory**

iget5_locked()/ilookup5()/ilookup5_nowait() test() callback used to be
called with both ->i_lock and inode_hash_lock held; the former is *not*
taken anymore, so verify that your callbacks do not rely on it (none
of the in-tree instances did). inode_hash_lock is still held,
of course, so they are still serialized wrt removal from inode hash,
as well as wrt set() callback of iget5_locked().

---

**mandatory**

d_materialise_unique() is gone; d_splice_alias() does everything you
need now. Remember that they have opposite orders of arguments ;-/

---

**mandatory**

f_dentry is gone; use f_path.dentry, or, better yet, see if you can avoid
it entirely.

---

**mandatory**

Never call ->read() and ->write() directly; use __vfs_{read,write} or
wrappers; instead of checking for ->write or ->read being NULL, look for
FMODE_CAN_{WRITE,READ} in file->f_mode.

---

**mandatory**

Do _not_ use new_sync_{read,write} for ->read/->write; leave it NULL
instead.

---

**mandatory**

->aio_read/->aio_write are gone. Use ->read_iter/->write_iter.

---

**recommended**

For embedded ("fast") symlinks just set inode->i_link to wherever the
symlink body is and use simple_follow_link() as ->follow_link().

---

**mandatory**

Calling conventions for ->follow_link() have changed. Instead of returning
cookie and using nd_set_link() to store the body to traverse, we return
the body to traverse and store the cookie using an explicit ``void **`` argument.
nameidata isn't passed at all - nd_jump_link() doesn't need it and
nd_[gs]et_link() is gone.

---

**mandatory**

Calling conventions for ->put_link() have changed. It gets inode instead of
dentry, it does not get nameidata at all and it gets called only when cookie
is non-NULL. Note that link body isn't available anymore, so if you need it,
store it as cookie.

---

**mandatory**

Any symlink that might use page_follow_link_light/page_put_link() must
have inode_nohighmem(inode) called before anything might start playing with
its pagecache. No highmem pages should end up in the pagecache of such
symlinks. That includes any preseeding that might be done during symlink
creation. __page_symlink() will honour the mapping gfp flags, so once
you've done inode_nohighmem() it's safe to use, but if you allocate and
insert the page manually, make sure to use the right gfp flags.

---

**mandatory**

->follow_link() is replaced with ->get_link(); same API, except that

* ->get_link() gets inode as a separate argument
* ->get_link() may be called in RCU mode - in that case NULL
  dentry is passed

---

**mandatory**

->get_link() gets struct delayed_call ``*done`` now, and should do
set_delayed_call() where it used to set ``*cookie``.

->put_link() is gone - just give the destructor to set_delayed_call()
in ->get_link().

---

**mandatory**

->getxattr() and xattr_handler.get() get dentry and inode passed separately.
dentry might not yet be attached to inode, so do _not_ use its ->d_inode
in the instances. Rationale: !@#!@# security_d_instantiate() needs to be
called before we attach dentry to inode.

---

**mandatory**

Symlinks are no longer the only inodes that do *not* have i_bdev/i_cdev/
i_pipe/i_link union zeroed out at inode eviction. As a result, you can't
assume that a non-NULL value in ->i_link at ->destroy_inode() implies that
it's a symlink. Checking ->i_mode is really needed now. In-tree we had
to fix shmem_destroy_callback() that used to take that kind of shortcut;
watch out, since that shortcut is no longer valid.

---

**mandatory**

->i_mutex is replaced with ->i_rwsem now. inode_lock() et al. work as
they used to - they just take it exclusive. However, ->lookup() may be
called with parent locked shared. Its instances must not

* use d_instantiate() and d_rehash() separately - use d_add() or
  d_splice_alias() instead.
* use d_rehash() alone - call d_add(new_dentry, NULL) instead.
* in the unlikely case when (read-only) access to filesystem
  data structures needs exclusion for some reason, arrange it
  yourself. None of the in-tree filesystems needed that.
* rely on ->d_parent and ->d_name not changing after dentry has
  been fed to d_add() or d_splice_alias(). Again, none of the
  in-tree instances relied upon that.

We are guaranteed that lookups of the same name in the same directory
will not happen in parallel ("same" in the sense of your ->d_compare()).
Lookups on different names in the same directory can and do happen in
parallel now.

---

**recommended**

->iterate_shared() is added; it's a parallel variant of ->iterate().
Exclusion on struct file level is still provided (as well as that
between it and lseek on the same struct file), but if your directory
has been opened several times, you can get these called in parallel.
Exclusion between that method and all directory-modifying ones is
still provided, of course.

Often enough ->iterate() can serve as ->iterate_shared() without any
changes - it is a read-only operation, after all. If you have any
per-inode or per-dentry in-core data structures modified by ->iterate(),
you might need something to serialize the access to them. If you
do dcache pre-seeding, you'll need to switch to d_alloc_parallel() for
that; look for in-tree examples.

Old method is only used if the new one is absent; eventually it will
be removed. Switch while you still can; the old one won't stay.

---

**mandatory**

->atomic_open() calls without O_CREAT may happen in parallel.

---

**mandatory**

->setxattr() and xattr_handler.set() get dentry and inode passed separately.
The xattr_handler.set() gets passed the user namespace of the mount the inode
is seen from so filesystems can idmap the i_uid and i_gid accordingly.
dentry might not yet be attached to inode, so do _not_ use its ->d_inode
in the instances. Rationale: !@#!@# security_d_instantiate() needs to be
called before we attach dentry to inode and !@#!@##!@$!$#!@#$!@$!@$ smack
->d_instantiate() uses not just ->getxattr() but ->setxattr() as well.

---

**mandatory**

->d_compare() doesn't get parent as a separate argument anymore. If you
used it for finding the struct super_block involved, dentry->d_sb will
work just as well; if it's something more complicated, use dentry->d_parent.
Just be careful not to assume that fetching it more than once will yield
the same value - in RCU mode it could change under you.

---

**mandatory**

->rename() has an added flags argument. Any flags not handled by the
filesystem should result in -EINVAL being returned.
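
For example, a filesystem that implements only RENAME_NOREPLACE could start its ->rename() with a mask check like the one below. The helper name and the FOOFS_RENAME_SUPPORTED mask are hypothetical; the RENAME_* values follow the kernel's uapi headers and are redefined locally only so the sketch is self-contained.

```c
#include <errno.h>

/* Values as in the kernel's uapi headers. */
#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE	(1 << 0)
#endif
#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE		(1 << 1)
#endif
#ifndef RENAME_WHITEOUT
#define RENAME_WHITEOUT		(1 << 2)
#endif

/* Hypothetical: the only flag this filesystem knows how to honour. */
#define FOOFS_RENAME_SUPPORTED	RENAME_NOREPLACE

static int foofs_rename_check_flags(unsigned int flags)
{
	/* Any flag we do not handle must be refused, never ignored. */
	if (flags & ~(unsigned int)FOOFS_RENAME_SUPPORTED)
		return -EINVAL;
	return 0;
}
```

Checking against a whitelist rather than a list of known-bad flags means a flag added later is refused by default instead of being silently dropped.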

---

**recommended**

->readlink is optional for symlinks. Don't set it unless the filesystem
needs to fake something for readlink(2).

---

**mandatory**

->getattr() is now passed a struct path rather than a vfsmount and
dentry separately, and it now has request_mask and query_flags arguments
to specify the fields and sync type requested by statx. Filesystems not
supporting any statx-specific features may ignore the new arguments.

---

**mandatory**

->atomic_open() calling conventions have changed. Gone is ``int *opened``,
along with FILE_OPENED/FILE_CREATED. In their place we have
FMODE_OPENED/FMODE_CREATED, set in file->f_mode. Additionally, the return
value for the 'called finish_no_open(), open it yourself' case has become
0, not 1. Since finish_no_open() itself now returns 0, that part
does not need any changes in ->atomic_open() instances.
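
The new conventions can be sketched with a small userspace model. Everything in it is a stand-in: struct file carries only f_mode, the FMODE_* bit values are local to the model rather than the kernel's, model_finish_open() and model_finish_no_open() only mimic the documented return values, and the can_open/created/err parameters of the hypothetical foo_atomic_open() merely select which path is taken. The caller tells the cases apart by testing FMODE_OPENED in f_mode, not by a return value of 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Model-only bit values; the kernel defines its own FMODE_* constants. */
#define FMODE_OPENED  0x1u
#define FMODE_CREATED 0x2u

struct file { unsigned int f_mode; };	/* hypothetical stand-in */

/* Mimics finish_no_open(), which now returns 0. */
static int model_finish_no_open(struct file *file)
{
	(void)file;
	return 0;
}

/* Mimics finish_open() succeeding: the file is now marked opened. */
static int model_finish_open(struct file *file)
{
	file->f_mode |= FMODE_OPENED;
	return 0;
}

/* Hypothetical ->atomic_open(): no 'int *opened' out-parameter any more. */
static int foo_atomic_open(struct file *file, bool can_open, bool created,
			   int err)
{
	if (err)
		return err;		/* failure: no fput(), the caller cleans up */
	if (!can_open)
		return model_finish_no_open(file);	/* "open it yourself" */
	if (created)
		file->f_mode |= FMODE_CREATED;	/* was FILE_CREATED in *opened */
	return model_finish_open(file);
}
```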

---

**mandatory**

alloc_file() is now static; two wrappers are to be used instead.
alloc_file_pseudo(inode, vfsmount, name, flags, ops) is for the cases
when a dentry needs to be created; that's the majority of old alloc_file()
users. Calling conventions: on success, a reference to the new struct file
is returned and the caller's reference to the inode is subsumed by it. On
failure, ERR_PTR() is returned and none of the caller's references are
affected, so the caller needs to drop the inode reference it held.
alloc_file_clone(file, flags, ops) does not affect any of the caller's
references. On success you get a new struct file sharing the mount/dentry
with the original; on failure, ERR_PTR().
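
The reference-counting contract of alloc_file_pseudo() can be modelled in userspace as below. This is a sketch under stated assumptions: struct inode and struct file are reduced to a counter and a pointer, model_alloc_file_pseudo() returns NULL where the real function returns ERR_PTR(), and its fail parameter simply simulates an allocation failure. The hypothetical foo_open_pseudo() shows the caller's side: take a reference, hand it over, and drop it only if the call failed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins: only the reference count matters here. */
struct inode { int i_count; };
struct file { struct inode *f_inode; };

static void model_iput(struct inode *inode)
{
	assert(inode->i_count > 0);
	inode->i_count--;
}

/*
 * Model of the alloc_file_pseudo() contract: on success the caller's
 * inode reference now belongs to the new file; on failure nothing is
 * consumed.  Returns NULL where the kernel returns ERR_PTR(); 'fail'
 * simulates an allocation failure.
 */
static struct file *model_alloc_file_pseudo(struct inode *inode, bool fail)
{
	struct file *file;

	if (fail)
		return NULL;
	file = malloc(sizeof(*file));
	if (!file)
		return NULL;
	file->f_inode = inode;	/* takes over the caller's reference */
	return file;
}

/* Releasing the file drops the inode reference it took over. */
static void model_fput(struct file *file)
{
	model_iput(file->f_inode);
	free(file);
}

/* Hypothetical caller: take a reference, hand it over, drop it on failure. */
static struct file *foo_open_pseudo(struct inode *inode, bool fail)
{
	struct file *file;

	inode->i_count++;		/* our reference, as ihold() would take */
	file = model_alloc_file_pseudo(inode, fail);
	if (!file)
		model_iput(inode);	/* failure: the reference is still ours */
	return file;
}
```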

---

**mandatory**

->clone_file_range() and ->dedupe_file_range() have been replaced with
->remap_file_range(). See Documentation/filesystems/vfs.rst for more
information.

---

**recommended**

->lookup() instances doing an equivalent of::

	if (IS_ERR(inode))
		return ERR_CAST(inode);
	return d_splice_alias(inode, dentry);

don't need to bother with the check - d_splice_alias() will do the
right thing when given ERR_PTR(...) as inode. Moreover, passing a NULL
inode to d_splice_alias() will also do the right thing (equivalent of
d_add(dentry, NULL); return NULL;), so that kind of special case
doesn't need separate treatment either.
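
A userspace model of the simplified ->lookup() tail, assuming locally re-created ERR_PTR()/IS_ERR()/PTR_ERR() helpers (same idea as the kernel's, written out here) and a model_d_splice_alias() that implements only the two behaviours described above; the real d_splice_alias() also deals with existing aliases. foo_iget() is a hypothetical helper that returns an inode, NULL or ERR_PTR(-EIO).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local re-creations of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct inode { int i_ino; };			/* hypothetical stand-ins */
struct dentry { struct inode *d_inode; };

/*
 * Models only the two behaviours described above: an ERR_PTR() inode
 * is passed straight back, and a NULL inode leaves a negative dentry
 * (like d_add(dentry, NULL); return NULL;).  Alias handling is omitted.
 */
static struct dentry *model_d_splice_alias(struct inode *inode,
					   struct dentry *dentry)
{
	if (IS_ERR(inode))
		return (struct dentry *)inode;	/* what ERR_CAST() does */
	dentry->d_inode = inode;		/* NULL means negative */
	return NULL;
}

/* Hypothetical inode lookup: found (1), absent (0) or I/O error. */
static struct inode the_inode = { 42 };
static struct inode *foo_iget(int key)
{
	if (key == 1)
		return &the_inode;
	if (key == 0)
		return NULL;
	return ERR_PTR(-EIO);
}

/* ->lookup() tail with no IS_ERR() or NULL special-casing. */
static struct dentry *foo_lookup(struct dentry *dentry, int key)
{
	return model_d_splice_alias(foo_iget(key), dentry);
}
```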

---

**strongly recommended**

Move the RCU-delayed parts of ->destroy_inode() into a new method,
->free_inode(). If ->destroy_inode() becomes empty - all the better,
just get rid of it. Synchronous work (e.g. the stuff that can't
be done from an RCU callback, or any WARN_ON() where we want the
stack trace) *might* be movable to ->evict_inode(); however,
that goes only for the things that are not needed to balance something
done by ->alloc_inode(). IOW, if it's cleaning up the stuff that
might have accumulated over the life of the in-core inode, ->evict_inode()
might be a fit.

Rules for inode destruction:

* if ->destroy_inode() is non-NULL, it gets called
* if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
* combination of NULL ->destroy_inode and NULL ->free_inode is
  treated as NULL/free_inode_nonrcu, to preserve compatibility.
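
The three rules can be modelled in userspace as follows. In this sketch call_rcu() is replaced by a fixed-size queue that the hypothetical rcu_barrier_model() drains, the model inode points straight at its super_operations instead of going through ->i_sb, and foo_destroy_inode()/foo_free_inode() are hypothetical instances that count their calls. When ->destroy_inode() is set and ->free_inode() is not, the model does nothing further, leaving any freeing to ->destroy_inode().

```c
#include <assert.h>
#include <stdlib.h>

struct inode;

/* Hypothetical subset of super_operations: only the two destruction hooks. */
struct super_operations {
	void (*destroy_inode)(struct inode *);
	void (*free_inode)(struct inode *);
};

/* Model inode: s_op stands in for inode->i_sb->s_op. */
struct inode {
	const struct super_operations *s_op;
	void (*free_cb)(struct inode *);	/* picked at destroy time */
};

/* call_rcu() stand-in: a queue that rcu_barrier_model() drains later. */
#define MAX_PENDING 16
static struct inode *pending[MAX_PENDING];
static int npending;

static void free_inode_nonrcu_model(struct inode *inode)
{
	free(inode);
}

/* The deferred callback: ->free_inode() if set, else the default free. */
static void i_callback_model(struct inode *inode)
{
	if (inode->free_cb)
		inode->free_cb(inode);
	else
		free_inode_nonrcu_model(inode);
}

static void destroy_inode_model(struct inode *inode)
{
	const struct super_operations *ops = inode->s_op;

	if (ops->destroy_inode) {
		ops->destroy_inode(inode);	/* rule 1: called synchronously */
		if (!ops->free_inode)
			return;			/* freeing is ->destroy_inode()'s job */
	}
	inode->free_cb = ops->free_inode;	/* rules 2 and 3 (NULL: default) */
	assert(npending < MAX_PENDING);
	pending[npending++] = inode;		/* "call_rcu()" */
}

static void rcu_barrier_model(void)
{
	for (int i = 0; i < npending; i++)
		i_callback_model(pending[i]);
	npending = 0;
}

/* Hypothetical filesystem instances that count their calls. */
static int destroyed, freed;

static void foo_destroy_inode(struct inode *inode)
{
	(void)inode;	/* synchronous teardown would go here */
	destroyed++;
}

static void foo_free_inode(struct inode *inode)
{
	freed++;
	free(inode);	/* RCU-delayed part: just free the memory */
}
```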

Note that the callback (be it via ->free_inode() or an explicit call_rcu()
in ->destroy_inode()) is *NOT* ordered wrt superblock destruction;
as a matter of fact, the superblock and all associated structures
might already be gone. The filesystem driver is guaranteed to still be
there, but that's it. Freeing memory in the callback is fine; doing
more than that is possible, but requires a lot of care and is best
avoided.

---

**mandatory**

DCACHE_RCUACCESS is gone; having an RCU delay on dentry freeing is the
default. DCACHE_NORCU opts out, and only d_alloc_pseudo() has any
business doing so.

---

**mandatory**

d_alloc_pseudo() is internal-only; uses outside of alloc_file_pseudo() are
very suspect (and won't work in modules). Such uses are very likely to
be misspelled d_alloc_anon().

---

**mandatory**

[should've been added in 2016] stale comment in finish_open()
notwithstanding, failure exits in ->atomic_open() instances should *NOT*
fput() the file, no matter what. Everything is handled by the caller.

---

**mandatory**

clone_private_mount() now returns a longterm mount, so the proper destructor
of its result is kern_unmount() or kern_unmount_array().