Blame - Documentation/filesystems/porting.rst - SHIFTPHONES/mainline/linux

blob: 26c093969573639fc815803aa275d609057c7bfc

Mauro Carvalho Chehab	25b532c	2019-07-26 09:51:28 -0300	[diff] [blame]	1	====================
				2	Changes since 2.5.0:
				3	====================
				4
				5	---
				6
				7	recommended
				8
				9	New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(),
				10	sb_set_blocksize() and sb_min_blocksize().
				11
				12	Use them.
				13
				14	(sb_find_get_block() replaces 2.4's get_hash_table())
				15
				16	---
				17
				18	recommended
				19
				20	New methods: ->alloc_inode() and ->destroy_inode().
				21
				22	Remove inode->u.foo_inode_i
				23
				24	Declare::
				25
				26	struct foo_inode_info {
				27		/* fs-private stuff */
				28		struct inode vfs_inode;
				29	};
				30	static inline struct foo_inode_info *FOO_I(struct inode *inode)
				31	{
				32		return list_entry(inode, struct foo_inode_info, vfs_inode);
				33	}
				34
				35	Use FOO_I(inode) instead of &inode->u.foo_inode_i;
				36
				37	Add foo_alloc_inode() and foo_destroy_inode() - the former should allocate
				38	foo_inode_info and return the address of ->vfs_inode, the latter should free
				39	FOO_I(inode) (see in-tree filesystems for examples).
				40
				41	Make them ->alloc_inode and ->destroy_inode in your super_operations.
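
The embedding described above can be tried out in userspace. The following is a minimal sketch, not kernel code: ``struct inode`` is a one-field stand-in, ``container_of()`` is redefined locally (the kernel's list_entry() does the same pointer arithmetic), and foo_alloc_inode() takes no superblock argument, unlike the real method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the VFS inode; the real struct inode is far larger. */
struct inode {
	unsigned long i_ino;
};

/* Filesystem-private inode with the generic inode embedded in it. */
struct foo_inode_info {
	int foo_private;		/* fs-private stuff */
	struct inode vfs_inode;
};

/* Local copy of the usual pointer arithmetic: member address -> container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static inline struct foo_inode_info *FOO_I(struct inode *inode)
{
	return container_of(inode, struct foo_inode_info, vfs_inode);
}

/* Allocate the containing object and hand back the embedded inode. */
static struct inode *foo_alloc_inode(void)
{
	struct foo_inode_info *fi = calloc(1, sizeof(*fi));

	return fi ? &fi->vfs_inode : NULL;
}

/* Free the containing object, not the embedded member. */
static void foo_destroy_inode(struct inode *inode)
{
	assert(inode != NULL);
	free(FOO_I(inode));
}
```

The point of the layout is that the VFS only ever sees ``&fi->vfs_inode``, while the filesystem gets back to its private data with FOO_I() at no cost beyond a constant offset.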
				42
				43	Keep in mind that now you need explicit initialization of private data
				44	typically between calling iget_locked() and unlocking the inode.
				45
				46	At some point that will become mandatory.
				47
				48	---
				49
				50	mandatory
				51
				52	Change of file_system_type method (->read_super to ->get_sb)
				53
				54	->read_super() is no more. Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV.
				55
				56	Turn your foo_read_super() into a function that would return 0 in case of
				57	success and negative number in case of error (-EINVAL unless you have more
				58	informative error value to report). Call it foo_fill_super(). Now declare::
				59
				60	int foo_get_sb(struct file_system_type *fs_type,
				61		int flags, const char *dev_name, void *data, struct vfsmount *mnt)
				62	{
				63		return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
				64				   mnt);
				65	}
				66
				67	(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
				68	filesystem).
				69
				70	Replace DECLARE_FSTYPE... with explicit initializer and have ->get_sb set as
				71	foo_get_sb.
				72
				73	---
				74
				75	mandatory
				76
				77	Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames.
				78	Most likely there is no need to change anything, but if you relied on
				79	global exclusion between renames for some internal purpose - you need to
				80	change your internal locking. Otherwise exclusion warranties remain the
				81	same (i.e. parents and victim are locked, etc.).
				82
				83	---
				84
				85	informational
				86
				87	Now we have the exclusion between ->lookup() and directory removal (by
				88	->rmdir() and ->rename()). If you used to need that exclusion and do
				89	it by internal locking (most of filesystems couldn't care less) - you
				90	can relax your locking.
				91
				92	---
				93
				94	mandatory
				95
				96	->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(),
				97	->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename()
				98	and ->readdir() are called without BKL now. Grab it on entry, drop upon return
				99	- that will guarantee the same locking you used to have. If your method or its
				100	parts do not need BKL - better yet, now you can shift lock_kernel() and
				101	unlock_kernel() so that they would protect exactly what needs to be
				102	protected.
				103
				104	---
				105
				106	mandatory
				107
				108	BKL is also moved from around sb operations. BKL should have been shifted into
				109	individual fs sb_op functions. If you don't need it, remove it.
				110
				111	---
				112
				113	informational
				114
				115	check for ->link() target not being a directory is done by callers. Feel
				116	free to drop it...
				117
				118	---
				119
				120	informational
				121
				122	->link() callers hold ->i_mutex on the object we are linking to. Some of your
				123	problems might be over...
				124
				125	---
				126
				127	mandatory
				128
				129	new file_system_type method - kill_sb(superblock). If you are converting
				130	an existing filesystem, set it according to ->fs_flags::
				131
				132	FS_REQUIRES_DEV - kill_block_super
				133	FS_LITTER - kill_litter_super
				134	neither - kill_anon_super
				135
				136	FS_LITTER is gone - just remove it from fs_flags.
				137
				138	---
				139
				140	mandatory
				141
				142	FS_SINGLE is gone (actually, that had happened back when ->get_sb()
				143	went in - and hadn't been documented ;-/). Just remove it from fs_flags
				144	(and see ->get_sb() entry for other actions).
				145
				146	---
				147
				148	mandatory
				149
				150	->setattr() is called without BKL now. Caller _always_ holds ->i_mutex, so
				151	watch for ->i_mutex-grabbing code that might be used by your ->setattr().
				152	Callers of notify_change() need ->i_mutex now.
				153
				154	---
				155
				156	recommended
				157
				158	New super_block field ``struct export_operations *s_export_op`` for
				159	explicit support for exporting, e.g. via NFS. The structure is fully
				160	documented at its declaration in include/linux/fs.h, and in
Mauro Carvalho Chehab	9195c3e8	2019-07-31 17:27:56 -0300	[diff] [blame]	161	Documentation/filesystems/nfs/exporting.rst.
Mauro Carvalho Chehab	25b532c	2019-07-26 09:51:28 -0300	[diff] [blame]	162
				163	Briefly it allows for the definition of decode_fh and encode_fh operations
				164	to encode and decode filehandles, and allows the filesystem to use
				165	a standard helper function for decode_fh, and provide file-system specific
				166	support for this helper, particularly get_parent.
				167
				168	It is planned that this will be required for exporting once the code
				169	settles down a bit.
				170
				171	mandatory
				172
				173	s_export_op is now required for exporting a filesystem.
				174	isofs, ext2, ext3, reiserfs, fat
				175	can be used as examples of very different filesystems.
				176
				177	---
				178
				179	mandatory
				180
				181	iget4() and the read_inode2 callback have been superseded by iget5_locked()
				182	which has the following prototype::
				183
				184	struct inode *iget5_locked(struct super_block *sb, unsigned long ino,
				185				   int (*test)(struct inode *, void *),
				186				   int (*set)(struct inode *, void *),
				187				   void *data);
				188
				189	'test' is an additional function that can be used when the inode
				190	number is not sufficient to identify the actual file object. 'set'
				191	should be a non-blocking function that initializes those parts of a
				192	newly created inode to allow the test function to succeed. 'data' is
				193	passed as an opaque value to both test and set functions.
				194
				195	When the inode has been created by iget5_locked(), it will be returned with the
				196	I_NEW flag set and will still be locked. The filesystem then needs to finalize
				197	the initialization. Once the inode is initialized it must be unlocked by
				198	calling unlock_new_inode().
				199
				200	The filesystem is responsible for setting (and possibly testing) i_ino
				201	when appropriate. There is also a simpler iget_locked function that
				202	just takes the superblock and inode number as arguments and does the
				203	test and set for you.
				204
				205	e.g.::
				206
				207	inode = iget_locked(sb, ino);
				208	if (inode->i_state & I_NEW) {
				209		err = read_inode_from_disk(inode);
				210		if (err < 0) {
				211			iget_failed(inode);
				212			return err;
				213		}
				214		unlock_new_inode(inode);
				215	}
				216
				217	Note that if the process of setting up a new inode fails, then iget_failed()
				218	should be called on the inode to render it dead, and an appropriate error
				219	should be passed back to the caller.
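
To make the roles of 'test', 'set' and 'data' concrete, here is a small userspace model of the lookup-or-create logic. It is only a sketch under loud assumptions: the fixed-size array stands in for the inode hash, ``struct model_inode``, ``struct foo_key`` and the ``generation`` field are hypothetical, MODEL_I_NEW is an illustrative bit rather than the kernel's value, and no locking or waiting is modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_I_NEW	1u	/* illustrative bit, not the kernel's value */
#define MODEL_SLOTS	8

struct model_inode {
	unsigned long hashval;		/* key the inode was hashed under */
	unsigned long i_ino;
	unsigned int generation;	/* hypothetical extra identity */
	unsigned int i_state;
	int in_use;
};

struct foo_key {
	unsigned long ino;
	unsigned int generation;
};

static struct model_inode model_cache[MODEL_SLOTS];

/* 'test': does this cached inode match the key? */
static int foo_test(struct model_inode *inode, void *data)
{
	const struct foo_key *key = data;

	return inode->i_ino == key->ino && inode->generation == key->generation;
}

/* 'set': initialize just enough of a new inode for foo_test() to succeed. */
static int foo_set(struct model_inode *inode, void *data)
{
	const struct foo_key *key = data;

	inode->i_ino = key->ino;
	inode->generation = key->generation;
	return 0;
}

static struct model_inode *model_iget5_locked(unsigned long hashval,
		int (*test)(struct model_inode *, void *),
		int (*set)(struct model_inode *, void *),
		void *data)
{
	struct model_inode *unused = NULL;
	size_t i;

	assert(test != NULL && set != NULL);
	for (i = 0; i < MODEL_SLOTS; i++) {
		struct model_inode *inode = &model_cache[i];

		if (!inode->in_use) {
			if (!unused)
				unused = inode;
		} else if (inode->hashval == hashval && test(inode, data)) {
			return inode;		/* hit: existing inode returned as is */
		}
	}
	if (!unused || set(unused, data))
		return NULL;			/* cache full or set() failed */
	unused->hashval = hashval;
	unused->in_use = 1;
	unused->i_state = MODEL_I_NEW;		/* caller must finish the init */
	return unused;
}

/* Model of unlock_new_inode(): initialization is complete. */
static void model_unlock_new_inode(struct model_inode *inode)
{
	inode->i_state &= ~MODEL_I_NEW;
}
```

Two objects sharing an inode number but differing in ``generation`` end up as two distinct inodes in this model, which is the situation 'test' exists for.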
				220
				221	---
				222
				223	recommended
				224
				225	->getattr() finally getting used. See instances in nfs, minix, etc.
				226
				227	---
				228
				229	mandatory
				230
				231	->revalidate() is gone. If your filesystem had it - provide ->getattr()
				232	and let it call whatever you had as ->revalidate() + (for symlinks that
				233	had ->revalidate()) add calls in ->follow_link()/->readlink().
				234
				235	---
				236
				237	mandatory
				238
				239	->d_parent changes are not protected by BKL anymore. Read access is safe
				240	if at least one of the following is true:
				241
				242	* filesystem has no cross-directory rename()
				243	* we know that parent had been locked (e.g. we are looking at
				244	->d_parent of ->lookup() argument).
				245	* we are called from ->rename().
				246	* the child's ->d_lock is held
				247
				248	Audit your code and add locking if needed. Notice that any place that is
				249	not protected by the conditions above is risky even in the old tree - you
				250	had been relying on BKL and that's prone to screwups. Old tree had quite
				251	a few holes of that kind - unprotected access to ->d_parent leading to
				252	anything from oops to silent memory corruption.
				253
				254	---
				255
				256	mandatory
				257
				258	FS_NOMOUNT is gone. If you use it - just set SB_NOUSER in flags
				259	(see rootfs for one kind of solution and bdev/socket/pipe for another).
				260
				261	---
				262
				263	recommended
				264
				265	Use bdev_read_only(bdev) instead of is_read_only(kdev). The latter
				266	is still alive, but only because of the mess in drivers/s390/block/dasd.c.
				267	As soon as it gets fixed is_read_only() will die.
				268
				269	---
				270
				271	mandatory
				272
				273	->permission() is called without BKL now. Grab it on entry, drop upon
				274	return - that will guarantee the same locking you used to have. If
				275	your method or its parts do not need BKL - better yet, now you can
				276	shift lock_kernel() and unlock_kernel() so that they would protect
				277	exactly what needs to be protected.
				278
				279	---
				280
				281	mandatory
				282
				283	->statfs() is now called without BKL held. BKL should have been
				284	shifted into individual fs sb_op functions where it's not clear that
				285	it's safe to remove it. If you don't need it, remove it.
				286
				287	---
				288
				289	mandatory
				290
				291	is_read_only() is gone; use bdev_read_only() instead.
				292
				293	---
				294
				295	mandatory
				296
				297	destroy_buffers() is gone; use invalidate_bdev().
				298
				299	---
				300
				301	mandatory
				302
				303	fsync_dev() is gone; use fsync_bdev(). NOTE: lvm breakage is
				304	deliberate; as soon as struct block_device * is propagated in a reasonable
				305	way by that code fixing will become trivial; until then nothing can be
				306	done.
				307
				308	mandatory
				309
				310	block truncation on error exit from ->write_begin, and ->direct_IO
				311	moved from generic methods (block_write_begin, cont_write_begin,
				312	nobh_write_begin, blockdev_direct_IO*) to callers. Take a look at
				313	ext2_write_failed and callers for an example.
				314
				315	mandatory
				316
				317	->truncate is gone. The whole truncate sequence needs to be
				318	implemented in ->setattr, which is now mandatory for filesystems
				319	implementing on-disk size changes. Start with a copy of the old inode_setattr
				320	and vmtruncate, then reorder the vmtruncate + foofs_vmtruncate sequence into
				321	this order: zeroing blocks using block_truncate_page or similar helpers,
				322	size update, and finally on-disk truncation, which should not fail.
				323	setattr_prepare (which used to be inode_change_ok) now includes the size checks
				324	for ATTR_SIZE and must be called in the beginning of ->setattr unconditionally.
				325
				326	mandatory
				327
				328	->clear_inode() and ->delete_inode() are gone; ->evict_inode() should
				329	be used instead. It gets called whenever the inode is evicted, whether it has
				330	remaining links or not. Caller does not evict the pagecache or inode-associated
				331	metadata buffers; the method has to use truncate_inode_pages_final() to get rid
				332	of those. Caller makes sure async writeback cannot be running for the inode while
				333	(or after) ->evict_inode() is called.
				334
				335	->drop_inode() returns int now; it's called on final iput() with
				336	inode->i_lock held and it returns true if the filesystem wants the inode to be
				337	dropped. As before, generic_drop_inode() is still the default and it's been
				338	updated appropriately. generic_delete_inode() is also alive and it consists
				339	simply of return 1. Note that all actual eviction work is done by caller after
				340	->drop_inode() returns.
				341
				342	As before, clear_inode() must be called exactly once on each call of
				343	->evict_inode() (as it used to be for each call of ->delete_inode()). Unlike
				344	before, if you are using inode-associated metadata buffers (i.e.
				345	mark_buffer_dirty_inode()), it's your responsibility to call
				346	invalidate_inode_buffers() before clear_inode().
				347
				348	NOTE: checking i_nlink in the beginning of ->write_inode() and bailing out
				349	if it's zero is not and never had been enough. Final unlink() and iput()
				350	may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly
				351	free the on-disk inode, you may end up doing that while ->write_inode() is writing
				352	to it.
				353
				354	---
				355
				356	mandatory
				357
				358	.d_delete() now only advises the dcache as to whether or not to cache
				359	unreferenced dentries, and is now only called when the dentry refcount goes to
				360	0. Even on 0 refcount transition, it must be able to tolerate being called 0,
				361	1, or more times (e.g. constant, idempotent).
				362
				363	---
				364
				365	mandatory
				366
				367	.d_compare() calling convention and locking rules are significantly
				368	changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
				369	look at examples of other filesystems) for guidance.
				370
				371	---
				372
				373	mandatory
				374
				375	.d_hash() calling convention and locking rules are significantly
				376	changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
				377	look at examples of other filesystems) for guidance.
				378
				379	---
				380
				381	mandatory
				382
				383	dcache_lock is gone, replaced by fine grained locks. See fs/dcache.c
				384	for details of what locks to replace dcache_lock with in order to protect
				385	particular things. Most of the time, a filesystem only needs ->d_lock, which
				386	protects all the dcache state of a given dentry.
				387
				388	---
				389
				390	mandatory
				391
				392	Filesystems must RCU-free their inodes, if they can have been accessed
				393	via rcu-walk path walk (basically, if the file can have had a path name in the
				394	vfs namespace).
				395
				396	Even though i_dentry and i_rcu share storage in a union, we will
				397	initialize the former in inode_init_always(), so just leave it alone in
				398	the callback. It used to be necessary to clean it there, but not anymore
				399	(starting at 3.2).
				400
				401	---
				402
				403	recommended
				404
				405	vfs now tries to do path walking in "rcu-walk mode", which avoids
				406	atomic operations and scalability hazards on dentries and inodes (see
				407	Documentation/filesystems/path-lookup.txt). d_hash and d_compare changes
				408	(above) are examples of the changes required to support this. For more complex
				409	filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so
				410	no changes are required to the filesystem. However, this is costly and loses
				411	the benefits of rcu-walk mode. We will begin to add filesystem callbacks that
				412	are rcu-walk aware, shown below. Filesystems should take advantage of this
				413	where possible.
				414
				415	---
				416
				417	mandatory
				418
				419	d_revalidate is a callback that is made on every path element (if
				420	the filesystem provides it), which requires dropping out of rcu-walk mode. This
				421	may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD should be
				422	returned if the filesystem cannot handle rcu-walk. See
				423	Documentation/filesystems/vfs.rst for more details.
				424
				425	permission is an inode permission check that is called on many or all
				426	directory inodes on the way down a path walk (to check for exec permission). It
				427	must now be rcu-walk aware (mask & MAY_NOT_BLOCK). See
				428	Documentation/filesystems/vfs.rst for more details.
				429
				430	---
				431
				432	mandatory
				433
				434	In ->fallocate() you must check the mode option passed in. If your
				435	filesystem does not support hole punching (deallocating space in the middle of a
				436	file) you must return -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set in mode.
				437	Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set,
				438	so the i_size should not change when hole punching, even when punching the end of
				439	a file off.
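
A minimal userspace sketch of the mode check follows. The two constants are defined locally so the fragment builds anywhere; their values mirror the uapi flags of the same names. foo_fallocate_check_mode() and bar_size_after_punch() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <errno.h>

/* Local copies of the uapi flag values, named MODEL_* to avoid clashes. */
#define MODEL_FALLOC_FL_KEEP_SIZE	0x01
#define MODEL_FALLOC_FL_PUNCH_HOLE	0x02

/* Mode check for a hypothetical filesystem that cannot punch holes. */
static int foo_fallocate_check_mode(int mode)
{
	if (mode & MODEL_FALLOC_FL_PUNCH_HOLE)
		return -EOPNOTSUPP;
	return 0;
}

/*
 * For a hypothetical filesystem that does punch holes: i_size stays
 * unchanged, even when the hole covers the tail of the file.
 */
static long long bar_size_after_punch(int mode, long long i_size,
				      long long offset, long long len)
{
	assert(mode & MODEL_FALLOC_FL_PUNCH_HOLE);
	assert(mode & MODEL_FALLOC_FL_KEEP_SIZE);
	(void)offset;
	(void)len;
	return i_size;
}
```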
				440
				441	---
				442
				443	mandatory
				444
				445	->get_sb() is gone. Switch to use of ->mount(). Typically it's just
				446	a matter of switching from calling ``get_sb_``... to ``mount_``... and changing
				447	the function type. If you were doing it manually, just switch from setting
				448	->mnt_root to some pointer to returning that pointer. On errors return
				449	ERR_PTR(...).
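
The "return the pointer, or ERR_PTR() on error" convention can be modelled in userspace as below. This is a sketch, not the kernel's implementation: the model_* helpers are local re-creations of the well-known ERR_PTR()/PTR_ERR()/IS_ERR() idea, and foo_mount() with its ``device_ok`` argument is purely hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO	4095

/* Encode a negative errno in the top page of the address space. */
static inline void *model_ERR_PTR(long error)
{
	assert(error < 0 && error >= -MODEL_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

static inline long model_PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int model_IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_dentry {
	int dummy;
};

static struct model_dentry foo_root;

/*
 * Old style: store the root somewhere and return an int.
 * New style: return the root itself, or an encoded error.
 */
static struct model_dentry *foo_mount(int device_ok)
{
	if (!device_ok)
		return model_ERR_PTR(-EINVAL);
	return &foo_root;
}
```

A caller checks the result with model_IS_ERR() and recovers the errno with model_PTR_ERR(); NULL is not an error value in this scheme.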
				450
				451	---
				452
				453	mandatory
				454
				455	->permission() and generic_permission() have lost the flags
				456	argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask.
				457
				458	generic_permission() has also lost the check_acl argument; ACL checking
				459	has been taken to VFS and filesystems need to provide a non-NULL ->i_op->get_acl
				460	to read an ACL from disk.
				461
				462	---
				463
				464	mandatory
				465
				466	If you implement your own ->llseek() you must handle SEEK_HOLE and
				467	SEEK_DATA. You can handle this by returning -EINVAL, but it would be nicer to
				468	support it in some way. The generic handler assumes that the entire file is
				469	data and there is a virtual hole at the end of the file. So if the provided
				470	offset is less than i_size and SEEK_DATA is specified, return the same offset.
				471	If the above is true for the offset and you are given SEEK_HOLE, return the end
				472	of the file. If the offset is i_size or greater return -ENXIO in either case.
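
The generic behaviour described above fits in a few lines. The following userspace sketch restates it; MODEL_SEEK_DATA/MODEL_SEEK_HOLE are local stand-ins for the real whence values and model_seek_hole_data() is a hypothetical name, not the kernel helper.

```c
#include <assert.h>
#include <errno.h>

/* Local stand-ins for the whence values; only these two are modelled. */
enum { MODEL_SEEK_DATA, MODEL_SEEK_HOLE };

/*
 * Whole file is data, with one virtual hole starting at i_size.
 * Returns the new offset or a negative errno.
 */
static long long model_seek_hole_data(long long offset, long long i_size,
				      int whence)
{
	assert(whence == MODEL_SEEK_DATA || whence == MODEL_SEEK_HOLE);
	if (offset < 0 || offset >= i_size)
		return -ENXIO;
	return whence == MODEL_SEEK_DATA ? offset : i_size;
}
```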
				473
				474	mandatory
				475
				476	If you have your own ->fsync() you must make sure to call
				477	filemap_write_and_wait_range() so that all dirty pages are synced out properly.
				478	You must also keep in mind that ->fsync() is not called with i_mutex held
				479	anymore, so if you require i_mutex locking you must make sure to take it and
				480	release it yourself.
				481
				482	---
				483
				484	mandatory
				485
				486	d_alloc_root() is gone, along with a lot of bugs caused by code
				487	misusing it. Replacement: d_make_root(inode). On success d_make_root(inode)
				488	allocates and returns a new dentry instantiated with the passed in inode.
				489	On failure NULL is returned and the passed in inode is dropped so the reference
				490	to inode is consumed in all cases and failure handling need not do any cleanup
				491	for the inode. If d_make_root(inode) is passed a NULL inode it returns NULL
				492	and also requires no further error handling. Typical usage is::
				493
				494	inode = foofs_new_inode(....);
				495	s->s_root = d_make_root(inode);
				496	if (!s->s_root)
				497		/* Nothing needed for the inode cleanup */
				498		return -ENOMEM;
				499	...
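
The "reference is consumed in all cases" rule can be checked against a toy reference count. The sketch below is a userspace model only: ``struct model_inode``, ``struct model_dentry`` and model_iput() are stand-ins, and the ``storage`` and ``alloc_fails`` parameters are hypothetical knobs replacing real dentry allocation.

```c
#include <assert.h>
#include <stddef.h>

struct model_inode {
	int refcount;
};

struct model_dentry {
	struct model_inode *d_inode;
};

/* Model of iput(): drop one reference. */
static void model_iput(struct model_inode *inode)
{
	assert(inode->refcount > 0);
	inode->refcount--;
}

/*
 * Model of d_make_root().  'alloc_fails' stands in for dentry
 * allocation failure; 'storage' stands in for the allocated dentry.
 */
static struct model_dentry *model_d_make_root(struct model_inode *inode,
					       struct model_dentry *storage,
					       int alloc_fails)
{
	if (!inode)
		return NULL;		/* nothing to clean up */
	if (alloc_fails) {
		model_iput(inode);	/* reference consumed on failure */
		return NULL;
	}
	storage->d_inode = inode;	/* reference now owned by the dentry */
	return storage;
}
```

In every outcome the caller is left with nothing to drop, which is why the error path in the usage above is a bare ``return -ENOMEM``.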
				500
				501	---
				502
				503	mandatory
				504
				505	The witch is dead! Well, 2/3 of it, anyway. ->d_revalidate() and
				506	->lookup() do not take struct nameidata anymore; just the flags.
				507
				508	---
				509
				510	mandatory
				511
				512	->create() doesn't take ``struct nameidata *``; unlike the previous
				513	two, it gets "is it an O_EXCL or equivalent?" boolean argument. Note that
				514	local filesystems can ignore that argument - they are guaranteed that the
				515	object doesn't exist. It's remote/distributed ones that might care...
				516
				517	---
				518
				519	mandatory
				520
				521	FS_REVAL_DOT is gone; if you used to have it, add ->d_weak_revalidate()
				522	in your dentry operations instead.
				523
				524	---
				525
				526	mandatory
				527
				528	vfs_readdir() is gone; switch to iterate_dir() instead
				529
				530	---
				531
				532	mandatory
				533
				534	->readdir() is gone now; switch to ->iterate()
				535
				536	mandatory
				537
				538	vfs_follow_link has been removed. Filesystems must use nd_set_link
				539	from ->follow_link for normal symlinks, or nd_jump_link for magic
				540	/proc/<pid> style links.
				541
				542	---
				543
				544	mandatory
				545
				546	iget5_locked()/ilookup5()/ilookup5_nowait() test() callback used to be
				547	called with both ->i_lock and inode_hash_lock held; the former is not
				548	taken anymore, so verify that your callbacks do not rely on it (none
				549	of the in-tree instances did). inode_hash_lock is still held,
				550	of course, so they are still serialized wrt removal from inode hash,
				551	as well as wrt set() callback of iget5_locked().
				552
				553	---
				554
				555	mandatory
				556
				557	d_materialise_unique() is gone; d_splice_alias() does everything you
				558	need now. Remember that they have opposite orders of arguments ;-/
				559
				560	---
				561
				562	mandatory
				563
				564	f_dentry is gone; use f_path.dentry, or, better yet, see if you can avoid
				565	it entirely.
				566
				567	---
				568
				569	mandatory
				570
				571	never call ->read() and ->write() directly; use __vfs_{read,write} or
				572	wrappers; instead of checking for ->write or ->read being NULL, look for
				573	FMODE_CAN_{WRITE,READ} in file->f_mode.
				574
				575	---
				576
				577	mandatory
				578
				579	do _not_ use new_sync_{read,write} for ->read/->write; leave it NULL
				580	instead.
				581
				582	---
				583
				584	mandatory
				585	->aio_read/->aio_write are gone. Use ->read_iter/->write_iter.
				586
				587	---
				588
				589	recommended
				590
				591	for embedded ("fast") symlinks just set inode->i_link to wherever the
				592	symlink body is and use simple_follow_link() as ->follow_link().
				593
				594	---
				595
				596	mandatory
				597
				598	calling conventions for ->follow_link() have changed. Instead of returning
				599	cookie and using nd_set_link() to store the body to traverse, we return
				600	the body to traverse and store the cookie using explicit void ** argument.
				601	nameidata isn't passed at all - nd_jump_link() doesn't need it and
				602	nd_[gs]et_link() is gone.
				603
				604	---
				605
				606	mandatory
				607
				608	calling conventions for ->put_link() have changed. It gets inode instead of
				609	dentry, it does not get nameidata at all and it gets called only when cookie
				610	is non-NULL. Note that link body isn't available anymore, so if you need it,
				611	store it as cookie.
				612
				613	---
				614
				615	mandatory
				616
				617	any symlink that might use page_follow_link_light/page_put_link() must
				618	have inode_nohighmem(inode) called before anything might start playing with
				619	its pagecache. No highmem pages should end up in the pagecache of such
				620	symlinks. That includes any preseeding that might be done during symlink
				621	creation. __page_symlink() will honour the mapping gfp flags, so once
				622	you've done inode_nohighmem() it's safe to use, but if you allocate and
				623	insert the page manually, make sure to use the right gfp flags.
				624
				625	---
				626
				627	mandatory
				628
				629	->follow_link() is replaced with ->get_link(); same API, except that
				630
				631	* ->get_link() gets inode as a separate argument
				632	* ->get_link() may be called in RCU mode - in that case NULL
				633	dentry is passed
				634
				635	---
				636
				637	mandatory
				638
				639	->get_link() gets struct delayed_call ``*done`` now, and should do
				640	set_delayed_call() where it used to set ``*cookie``.
				641
				642	->put_link() is gone - just give the destructor to set_delayed_call()
				643	in ->get_link().
				644
				645	---
				646
				647	mandatory
				648
				649	->getxattr() and xattr_handler.get() get dentry and inode passed separately.
				650	dentry might be yet to be attached to inode, so do _not_ use its ->d_inode
				651	in the instances. Rationale: !@#!@# security_d_instantiate() needs to be
				652	called before we attach dentry to inode.
				653
				654	---
				655
				656	mandatory
				657
				658	symlinks are no longer the only inodes that do not have i_bdev/i_cdev/
				659	i_pipe/i_link union zeroed out at inode eviction. As a result, you can't
				660	assume that non-NULL value in ->i_link at ->destroy_inode() implies that
				661	it's a symlink. Checking ->i_mode is really needed now. In-tree we had
				662	to fix shmem_destroy_callback() that used to take that kind of shortcut;
				663	watch out, since that shortcut is no longer valid.
				664
				665	---
				666
				667	mandatory
				668
				669	->i_mutex is replaced with ->i_rwsem now. inode_lock() et al. work as
				670	they used to - they just take it exclusive. However, ->lookup() may be
				671	called with parent locked shared. Its instances must not
				672
				673	* use d_instantiate() and d_rehash() separately - use d_add() or
				674	d_splice_alias() instead.
				675	* use d_rehash() alone - call d_add(new_dentry, NULL) instead.
				676	* in the unlikely case when (read-only) access to filesystem
				677	data structures needs exclusion for some reason, arrange it
				678	yourself. None of the in-tree filesystems needed that.
				679	* rely on ->d_parent and ->d_name not changing after dentry has
				680	been fed to d_add() or d_splice_alias(). Again, none of the
				681	in-tree instances relied upon that.
				682
				683	We are guaranteed that lookups of the same name in the same directory
				684	will not happen in parallel ("same" in the sense of your ->d_compare()).
				685	Lookups on different names in the same directory can and do happen in
				686	parallel now.
				687
				688	---
				689
				690	recommended
				691
				692	->iterate_shared() is added; it's a parallel variant of ->iterate().
				693	Exclusion on struct file level is still provided (as well as that
				694	between it and lseek on the same struct file), but if your directory
				695	has been opened several times, you can get these called in parallel.
				696	Exclusion between that method and all directory-modifying ones is
				697	still provided, of course.
				698
				699	Often enough ->iterate() can serve as ->iterate_shared() without any
				700	changes - it is a read-only operation, after all. If you have any
				701	per-inode or per-dentry in-core data structures modified by ->iterate(),
				702	you might need something to serialize the access to them. If you
				703	do dcache pre-seeding, you'll need to switch to d_alloc_parallel() for
				704	that; look for in-tree examples.
				705
				706	Old method is only used if the new one is absent; eventually it will
				707	be removed. Switch while you still can; the old one won't stay.
				708
				709	---
				710
				711	mandatory
				712
				713	->atomic_open() calls without O_CREAT may happen in parallel.
				714
				715	---
				716
				717	mandatory
				718
				719	->setxattr() and xattr_handler.set() get dentry and inode passed separately.
				720	dentry might be yet to be attached to inode, so do _not_ use its ->d_inode
				721	in the instances. Rationale: !@#!@# security_d_instantiate() needs to be
				722	called before we attach dentry to inode and !@#!@##!@$!$#!@#$!@$!@$ smack
				723	->d_instantiate() uses not just ->getxattr() but ->setxattr() as well.
				724
				725	---
				726
				727	mandatory
				728
				729	->d_compare() doesn't get parent as a separate argument anymore. If you
				730	used it for finding the struct super_block involved, dentry->d_sb will
				731	work just as well; if it's something more complicated, use dentry->d_parent.
				732	Just be careful not to assume that fetching it more than once will yield
				733	the same value - in RCU mode it could change under you.
				734
				735	---
				736
				737	mandatory
				738
				739	->rename() has an added flags argument. Any flags not handled by the
				740	filesystem should result in EINVAL being returned.
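
One common way to satisfy this is to mask off the flags the filesystem understands and reject the rest. The sketch below is a userspace illustration: the MODEL_RENAME_* constants are local copies mirroring the uapi RENAME_* bit values, and the choice of supporting only the no-replace flag is an assumption about a hypothetical filesystem "foo".

```c
#include <assert.h>
#include <errno.h>

/* Local copies of the uapi rename flag bits, named MODEL_* to avoid clashes. */
#define MODEL_RENAME_NOREPLACE	(1u << 0)
#define MODEL_RENAME_EXCHANGE	(1u << 1)
#define MODEL_RENAME_WHITEOUT	(1u << 2)

/* Hypothetical filesystem "foo" only implements the no-replace variant. */
#define FOO_RENAME_SUPPORTED	MODEL_RENAME_NOREPLACE

/* Reject any flag bit outside the supported mask. */
static int foo_rename_check_flags(unsigned int flags)
{
	if (flags & ~FOO_RENAME_SUPPORTED)
		return -EINVAL;
	return 0;
}

/* Quick self-check of the sketch; not part of any kernel interface. */
static void foo_rename_selfcheck(void)
{
	assert(foo_rename_check_flags(0) == 0);
	assert(foo_rename_check_flags(MODEL_RENAME_NOREPLACE) == 0);
	assert(foo_rename_check_flags(MODEL_RENAME_EXCHANGE) == -EINVAL);
}
```

Checking against a mask of supported bits, rather than testing for specific unsupported ones, means flags added later are rejected by default.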
				741
				742	---
				743
				744
				745	recommended
				746
				747	->readlink is optional for symlinks. Don't set, unless filesystem needs
				748	to fake something for readlink(2).
				749
				750	---
				751
				752	mandatory
				753
				754	->getattr() is now passed a struct path rather than a vfsmount and
				755	dentry separately, and it now has request_mask and query_flags arguments
				756	to specify the fields and sync type requested by statx. Filesystems not
				757	supporting any statx-specific features may ignore the new arguments.
				758
				759	---
				760
				761	mandatory
				762
				763	->atomic_open() calling conventions have changed. Gone is ``int *opened``,
				764	along with FILE_OPENED/FILE_CREATED. In place of those we have
				765	FMODE_OPENED/FMODE_CREATED, set in file->f_mode. Additionally, return
				766	value for 'called finish_no_open(), open it yourself' case has become
				767	0, not 1. Since finish_no_open() itself is returning 0 now, that part
				768	does not need any changes in ->atomic_open() instances.
				769
				770	---
				771
				772	mandatory
				773
				774	alloc_file() has become static now; two wrappers are to be used instead.
				775	alloc_file_pseudo(inode, vfsmount, name, flags, ops) is for the cases
				776	when dentry needs to be created; that's the majority of old alloc_file()
				777	users. Calling conventions: on success a reference to new struct file
				778	is returned and the caller's reference to inode is subsumed by that. On
				779	failure, ERR_PTR() is returned and no caller's references are affected,
				780	so the caller needs to drop the inode reference it held.
				781	alloc_file_clone(file, flags, ops) does not affect any caller's references.
				782	On success you get a new struct file sharing the mount/dentry with the
				783	original, on failure - ERR_PTR().
				784
				785	---
				786
				787	mandatory
				788
				789	->clone_file_range() and ->dedupe_file_range have been replaced with
				790	->remap_file_range(). See Documentation/filesystems/vfs.rst for more
				791	information.
				792
				793	---
				794
				795	recommended
				796
				797	->lookup() instances doing an equivalent of::
				798
				799	if (IS_ERR(inode))
				800		return ERR_CAST(inode);
				801	return d_splice_alias(inode, dentry);
				802
				803	don't need to bother with the check - d_splice_alias() will do the
				804	right thing when given ERR_PTR(...) as inode. Moreover, passing NULL
				805	inode to d_splice_alias() will also do the right thing (equivalent of
				806	d_add(dentry, NULL); return NULL;), so that kind of special case
				807	doesn't need separate treatment either.
				808
				809	---
				810
				811	strongly recommended
				812
				813	take the RCU-delayed parts of ->destroy_inode() into a new method -
				814	->free_inode(). If ->destroy_inode() becomes empty - all the better,
				815	just get rid of it. Synchronous work (e.g. the stuff that can't
				816	be done from an RCU callback, or any WARN_ON() where we want the
				817	stack trace) might be movable to ->evict_inode(); however,
				818	that goes only for the things that are not needed to balance something
				819	done by ->alloc_inode(). IOW, if it's cleaning up the stuff that
				820	might have accumulated over the life of in-core inode, ->evict_inode()
				821	might be a fit.
				822
				823	Rules for inode destruction:
				824
				825	* if ->destroy_inode() is non-NULL, it gets called
				826	* if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
				827	* combination of NULL ->destroy_inode and NULL ->free_inode is
				828	treated as NULL/free_inode_nonrcu, to preserve the compatibility.
				829
				830	Note that the callback (be it via ->free_inode() or explicit call_rcu()
				831	in ->destroy_inode()) is NOT ordered wrt superblock destruction;
				832	as a matter of fact, the superblock and all associated structures
				833	might be already gone. The filesystem driver is guaranteed to be still
				834	there, but that's it. Freeing memory in the callback is fine; doing
				835	more than that is possible, but requires a lot of care and is best
				836	avoided.
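
The rules for inode destruction listed above can be written out as a small dispatch function. What follows is a userspace model, not the VFS code: all model_* names are hypothetical, the RCU grace period is replaced by an immediate call, and the counters exist only so the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>

struct model_inode;

struct model_super_operations {
	void (*destroy_inode)(struct model_inode *);
	void (*free_inode)(struct model_inode *);
};

struct model_inode {
	const struct model_super_operations *ops;
	int destroyed;		/* ->destroy_inode() ran */
	int rcu_freed;		/* ->free_inode() ran from the modelled callback */
	int default_freed;	/* default (non-fs) free path ran */
};

/* Stand-in for the default free path used when the fs provides nothing. */
static void model_free_inode_nonrcu(struct model_inode *inode)
{
	inode->default_freed++;
}

/* Stand-in for the call_rcu() callback; the model runs it immediately. */
static void model_rcu_callback(struct model_inode *inode)
{
	if (inode->ops->free_inode)
		inode->ops->free_inode(inode);
	else
		model_free_inode_nonrcu(inode);
}

static void model_destroy_inode(struct model_inode *inode)
{
	const struct model_super_operations *ops = inode->ops;

	assert(ops != NULL);
	if (ops->destroy_inode) {
		ops->destroy_inode(inode);
		if (!ops->free_inode)
			return;		/* fs handles freeing itself */
	}
	model_rcu_callback(inode);	/* really: scheduled via call_rcu() */
}

/* Hypothetical filesystem callbacks that just record being called. */
static void foo_destroy_inode(struct model_inode *inode)
{
	inode->destroyed++;
}

static void foo_free_inode(struct model_inode *inode)
{
	inode->rcu_freed++;
}
```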
				837
				838	---
				839
				840	mandatory
				841
				842	DCACHE_RCUACCESS is gone; having an RCU delay on dentry freeing is the
				843	default. DCACHE_NORCU opts out, and only d_alloc_pseudo() has any
				844	business doing so.
				845
				846	---
				847
				848	mandatory
				849
				850	d_alloc_pseudo() is internal-only; uses outside of alloc_file_pseudo() are
				851	very suspect (and won't work in modules). Such uses are very likely to
				852	be misspelled d_alloc_anon().
Al Viro	d9a9f48	2020-03-12 18:25:20 -0400	[diff] [blame^]	853
				854	---
				855
				856	mandatory
				857
				858	[should've been added in 2016] stale comment in finish_open() notwithstanding,
				859	failure exits in ->atomic_open() instances should NOT fput() the file,
				860	no matter what. Everything is handled by the caller.