Changes since 2.5.0:

---
[recommended]

New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(),
	sb_set_blocksize() and sb_min_blocksize().

Use them.

(sb_find_get_block() replaces 2.4's get_hash_table())

---
[recommended]

New methods: ->alloc_inode() and ->destroy_inode().

Remove inode->u.foo_inode_i
Declare
	struct foo_inode_info {
		/* fs-private stuff */
		struct inode vfs_inode;
	};
	static inline struct foo_inode_info *FOO_I(struct inode *inode)
	{
		return list_entry(inode, struct foo_inode_info, vfs_inode);
	}

Use FOO_I(inode) instead of &inode->u.foo_inode_i;

Add foo_alloc_inode() and foo_destroy_inode() - the former should allocate
foo_inode_info and return the address of ->vfs_inode, the latter should free
FOO_I(inode) (see in-tree filesystems for examples).

Make them ->alloc_inode and ->destroy_inode in your super_operations.

Keep in mind that now you need explicit initialization of private data
typically between calling iget_locked() and unlocking the inode.

At some point that will become mandatory.

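The embedding pattern above can be tried outside the kernel. What follows is
a minimal userspace sketch, not kernel code: struct inode is a stand-in, the
foo_flags field is hypothetical, calloc()/free() replace the slab allocator a
real filesystem would normally use, and container_of() is spelled out locally
(list_entry() in the kernel expands to the same thing).

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct inode; only here so the sketch compiles. */
struct inode {
	unsigned long i_ino;
};

/* Opaque stand-in; the sketch never looks inside the superblock. */
struct super_block;

/* Local copy of the idea behind the kernel's container_of()/list_entry(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct foo_inode_info {
	int foo_flags;			/* fs-private stuff (hypothetical field) */
	struct inode vfs_inode;		/* embedded VFS inode */
};

static inline struct foo_inode_info *FOO_I(struct inode *inode)
{
	return container_of(inode, struct foo_inode_info, vfs_inode);
}

/* Would become ->alloc_inode: allocate the container, return the embedded inode. */
static struct inode *foo_alloc_inode(struct super_block *sb)
{
	struct foo_inode_info *fi;

	(void)sb;			/* unused in this mock */
	fi = calloc(1, sizeof(*fi));	/* assumption: userspace allocator */
	if (!fi)
		return NULL;
	return &fi->vfs_inode;
}

/* Would become ->destroy_inode: free the whole container, not just the inode. */
static void foo_destroy_inode(struct inode *inode)
{
	free(FOO_I(inode));
}
```

The point of the layout is that the VFS only ever sees the struct inode
pointer, while FOO_I() recovers the private part from it without any extra
pointer stored in the inode.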
---
[mandatory]

Change of file_system_type method (->read_super to ->get_sb)

->read_super() is no more. Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV.

Turn your foo_read_super() into a function that would return 0 in case of
success and a negative number in case of error (-EINVAL unless you have a more
informative error value to report). Call it foo_fill_super(). Now declare

int foo_get_sb(struct file_system_type *fs_type,
	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
{
	return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
			   mnt);
}

(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
filesystem).

Replace DECLARE_FSTYPE... with an explicit initializer and have ->get_sb set as
foo_get_sb.

---
[mandatory]

Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames.
Most likely there is no need to change anything, but if you relied on
global exclusion between renames for some internal purpose - you need to
change your internal locking. Otherwise exclusion guarantees remain the
same (i.e. parents and victim are locked, etc.).

---
[informational]

Now we have the exclusion between ->lookup() and directory removal (by
->rmdir() and ->rename()). If you used to need that exclusion and provided
it by internal locking (most filesystems couldn't care less) - you
can relax your locking.

---
[mandatory]

->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(),
->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename()
and ->readdir() are called without BKL now. Grab it on entry, drop upon return
- that will guarantee the same locking you used to have. If your method or its
parts do not need BKL - better yet, now you can shift lock_kernel() and
unlock_kernel() so that they would protect exactly what needs to be
protected.

---
[mandatory]

BKL is also moved from around sb operations. ->write_super() is now called
without BKL held. BKL should have been shifted into individual fs sb_op
functions. If you don't need it, remove it.

---
[informational]

The check for ->link() target not being a directory is done by callers. Feel
free to drop it...

---
[informational]

->link() callers hold ->i_mutex on the object we are linking to. Some of your
problems might be over...

---
[mandatory]

new file_system_type method - kill_sb(superblock). If you are converting
an existing filesystem, set it according to ->fs_flags:
	FS_REQUIRES_DEV		-	kill_block_super
	FS_LITTER		-	kill_litter_super
	neither			-	kill_anon_super
FS_LITTER is gone - just remove it from fs_flags.

---
[mandatory]

	FS_SINGLE is gone (actually, that had happened back when ->get_sb()
went in - and hadn't been documented ;-/). Just remove it from fs_flags
(and see ->get_sb() entry for other actions).

---
[mandatory]

->setattr() is called without BKL now. Caller _always_ holds ->i_mutex, so
watch for ->i_mutex-grabbing code that might be used by your ->setattr().
Callers of notify_change() need ->i_mutex now.

---
[recommended]

New super_block field "struct export_operations *s_export_op" for
explicit support for exporting, e.g. via NFS. The structure is fully
documented at its declaration in include/linux/fs.h, and in
Documentation/filesystems/nfs/Exporting.

Briefly, it allows for the definition of decode_fh and encode_fh operations
to encode and decode filehandles, and allows the filesystem to use
a standard helper function for decode_fh, and provide file-system specific
support for this helper, particularly get_parent.

It is planned that this will be required for exporting once the code
settles down a bit.

---
[mandatory]

s_export_op is now required for exporting a filesystem.
isofs, ext2, ext3, reiserfs, fat
can be used as examples of very different filesystems.

---
[mandatory]

iget4() and the read_inode2 callback have been superseded by iget5_locked()
which has the following prototype:

    struct inode *iget5_locked(struct super_block *sb, unsigned long ino,
				int (*test)(struct inode *, void *),
				int (*set)(struct inode *, void *),
				void *data);

'test' is an additional function that can be used when the inode
number is not sufficient to identify the actual file object. 'set'
should be a non-blocking function that initializes those parts of a
newly created inode to allow the test function to succeed. 'data' is
passed as an opaque value to both test and set functions.

When the inode has been created by iget5_locked(), it will be returned with the
I_NEW flag set and will still be locked. The filesystem then needs to finalize
the initialization. Once the inode is initialized it must be unlocked by
calling unlock_new_inode().

The filesystem is responsible for setting (and possibly testing) i_ino
when appropriate. There is also a simpler iget_locked function that
just takes the superblock and inode number as arguments and does the
test and set for you.

e.g.
	inode = iget_locked(sb, ino);
	if (!inode)			/* allocation failed */
		return -ENOMEM;
	if (inode->i_state & I_NEW) {
		/* freshly allocated and still locked: fill it in */
		err = read_inode_from_disk(inode);
		if (err < 0) {
			iget_failed(inode);
			return err;
		}
		unlock_new_inode(inode);
	}

Note that if the process of setting up a new inode fails, then iget_failed()
should be called on the inode to render it dead, and an appropriate error
should be passed back to the caller.

---
[recommended]

->getattr() is finally getting used. See instances in nfs, minix, etc.

---
[mandatory]

->revalidate() is gone. If your filesystem had it - provide ->getattr()
and let it call whatever you had as ->revalidate() + (for symlinks that
had ->revalidate()) add calls in ->follow_link()/->readlink().

---
[mandatory]

->d_parent changes are not protected by BKL anymore. Read access is safe
if at least one of the following is true:
	* filesystem has no cross-directory rename()
	* we know that parent had been locked (e.g. we are looking at
	  ->d_parent of ->lookup() argument).
	* we are called from ->rename().
	* the child's ->d_lock is held
Audit your code and add locking if needed. Notice that any place that is
not protected by the conditions above is risky even in the old tree - you
had been relying on BKL and that's prone to screwups. Old tree had quite
a few holes of that kind - unprotected access to ->d_parent leading to
anything from oops to silent memory corruption.

---
[mandatory]

	FS_NOMOUNT is gone. If you use it - just set MS_NOUSER in flags
(see rootfs for one kind of solution and bdev/socket/pipe for another).

---
[recommended]

	Use bdev_read_only(bdev) instead of is_read_only(kdev). The latter
is still alive, but only because of the mess in drivers/s390/block/dasd.c.
As soon as it gets fixed is_read_only() will die.

---
[mandatory]

->permission() is called without BKL now. Grab it on entry, drop upon
return - that will guarantee the same locking you used to have. If
your method or its parts do not need BKL - better yet, now you can
shift lock_kernel() and unlock_kernel() so that they would protect
exactly what needs to be protected.

---
[mandatory]

->statfs() is now called without BKL held. BKL should have been
shifted into individual fs sb_op functions where it's not clear that
it's safe to remove it. If you don't need it, remove it.

---
[mandatory]

	is_read_only() is gone; use bdev_read_only() instead.

---
[mandatory]

	destroy_buffers() is gone; use invalidate_bdev().

---
[mandatory]

	fsync_dev() is gone; use fsync_bdev(). NOTE: lvm breakage is
deliberate; as soon as struct block_device * is propagated in a reasonable
way by that code fixing will become trivial; until then nothing can be
done.

---
[mandatory]

	block truncation on error exit from ->write_begin, and ->direct_IO
moved from generic methods (block_write_begin, cont_write_begin,
nobh_write_begin, blockdev_direct_IO*) to callers. Take a look at
ext2_write_failed and callers for an example.

---
[mandatory]

	->truncate is going away. The whole truncate sequence needs to be
implemented in ->setattr, which is now mandatory for filesystems
implementing on-disk size changes. Start with a copy of the old inode_setattr
and vmtruncate, and then reorder the vmtruncate + foofs_vmtruncate sequence to
be in order of zeroing blocks using block_truncate_page or similar helpers,
size update and finally on-disk truncation, which should not fail.
inode_change_ok now includes the size checks for ATTR_SIZE and must be called
in the beginning of ->setattr unconditionally.

---
[mandatory]

	->clear_inode() and ->delete_inode() are gone; ->evict_inode() should
be used instead. It gets called whenever the inode is evicted, whether it has
remaining links or not. Caller does *not* evict the pagecache or inode-associated
metadata buffers; getting rid of those is the responsibility of the method, as
it had been for ->delete_inode().
	->drop_inode() returns int now; it's called on final iput() with inode_lock
held and it returns true if the filesystem wants the inode to be dropped. As before,
generic_drop_inode() is still the default and it's been updated appropriately.
generic_delete_inode() is also alive and it consists simply of return 1. Note that
all actual eviction work is done by caller after ->drop_inode() returns.
	clear_inode() is gone; use end_writeback() instead. As before, it must
be called exactly once on each call of ->evict_inode() (as it used to be for
each call of ->delete_inode()). Unlike before, if you are using inode-associated
metadata buffers (i.e. mark_buffer_dirty_inode()), it's your responsibility to
call invalidate_inode_buffers() before end_writeback().
	No async writeback (and thus no calls of ->write_inode()) will happen
after end_writeback() returns, so actions that should not overlap with ->write_inode()
(e.g. freeing on-disk inode if i_nlink is 0) ought to be done after that call.

	NOTE: checking i_nlink in the beginning of ->write_inode() and bailing out
if it's zero is not *and* *never* *had* *been* enough. Final unlink() and iput()
may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly
free the on-disk inode, you may end up doing that while ->write_inode() is writing
to it.

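The ordering described above can be illustrated with a userspace mock. This
is only a sketch under stated assumptions: struct inode is a stand-in, the
helpers are stubs that merely record that they were called (their signatures
are simplified compared to the kernel's), and foo_free_on_disk_inode() is a
hypothetical name for whatever releases the on-disk inode in your filesystem.

```c
#include <string.h>

/* Minimal stand-in for the kernel's struct inode (only i_nlink is used). */
struct inode {
	unsigned int i_nlink;
};

/* Records the order of calls; exists only so the mock can be inspected. */
static char call_log[256];

static void log_call(const char *name)
{
	strcat(call_log, name);
	strcat(call_log, ";");
}

/* Stubs standing in for the kernel helpers named in the text. */
static void truncate_inode_pages(struct inode *inode)	/* signature simplified */
{
	(void)inode;
	log_call("truncate_inode_pages");
}

static void invalidate_inode_buffers(struct inode *inode)
{
	(void)inode;
	log_call("invalidate_inode_buffers");
}

static void end_writeback(struct inode *inode)
{
	(void)inode;
	log_call("end_writeback");
}

/* Hypothetical fs-specific helper that releases the on-disk inode. */
static void foo_free_on_disk_inode(struct inode *inode)
{
	(void)inode;
	log_call("foo_free_on_disk_inode");
}

/* Sketch of an ->evict_inode() method following the rules above. */
static void foo_evict_inode(struct inode *inode)
{
	int want_delete = (inode->i_nlink == 0);

	truncate_inode_pages(inode);		/* the caller no longer does this */
	invalidate_inode_buffers(inode);	/* needed with inode-associated buffers */
	end_writeback(inode);			/* exactly once per ->evict_inode() call */

	/* No ->write_inode() can overlap from here on, so freeing is safe. */
	if (want_delete)
		foo_free_on_disk_inode(inode);
}
```

Freeing the on-disk inode only after end_writeback() is what avoids the race
described in the NOTE above.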
---
[mandatory]

	.d_delete() now only advises the dcache as to whether or not to cache
unreferenced dentries, and is now only called when the dentry refcount goes to
0. Even on 0 refcount transition, it must be able to tolerate being called 0,
1, or more times (e.g. constant, idempotent).

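As a minimal sketch of what "constant, idempotent" can look like, assuming a
filesystem that never wants its unreferenced dentries cached (foo_d_delete is
a hypothetical name and struct dentry is left opaque so this compiles outside
the kernel):

```c
/* Opaque stand-in; the sketch never looks inside the dentry. */
struct dentry;

/*
 * Hypothetical ->d_delete(): a constant answer with no side effects, so it
 * does not matter whether the dcache calls it zero, one or many times.
 */
static int foo_d_delete(const struct dentry *dentry)
{
	(void)dentry;
	return 1;	/* nonzero: do not keep the unreferenced dentry cached */
}
```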
---
[mandatory]

	.d_compare() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.txt (and
look at examples of other filesystems) for guidance.

---
[mandatory]

	.d_hash() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.txt (and
look at examples of other filesystems) for guidance.

---
[mandatory]
	dcache_lock is gone, replaced by fine grained locks. See fs/dcache.c
for details of what locks to replace dcache_lock with in order to protect
particular things. Most of the time, a filesystem only needs ->d_lock, which
protects *all* the dcache state of a given dentry.

---
[mandatory]

	Filesystems must RCU-free their inodes, if they can have been accessed
via rcu-walk path walk (basically, if the file can have had a path name in the
vfs namespace).

	i_dentry and i_rcu share storage in a union, and the vfs expects
i_dentry to be reinitialized before it is freed, so an:

	INIT_LIST_HEAD(&inode->i_dentry);

must be done in the RCU callback.

---
[recommended]
	vfs now tries to do path walking in "rcu-walk mode", which avoids
atomic operations and scalability hazards on dentries and inodes (see
Documentation/filesystems/path-walk.txt). d_hash and d_compare changes (above)
are examples of the changes required to support this. For more complex
filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so
no changes are required to the filesystem. However, this is costly and loses
the benefits of rcu-walk mode. We will begin to add filesystem callbacks that
are rcu-walk aware, shown below. Filesystems should take advantage of this
where possible.

---
[mandatory]
	d_revalidate is a callback that is made on every path element (if
the filesystem provides it), which used to require dropping out of rcu-walk
mode. It may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD
should be returned if the filesystem cannot handle rcu-walk. See
Documentation/filesystems/vfs.txt for more details.

	permission and check_acl are inode permission checks that are called
on many or all directory inodes on the way down a path walk (to check for
exec permission). These must now be rcu-walk aware (flags & IPERM_RCU). See
Documentation/filesystems/vfs.txt for more details.

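The d_revalidate rule above can be sketched as a userspace mock. The types
are stand-ins reduced to what the example needs, the numeric value of
LOOKUP_RCU is an assumption made for the mock, and foo_d_revalidate is a
hypothetical method for a filesystem whose validity check may block and
therefore cannot run in rcu-walk mode.

```c
#include <errno.h>
#include <stddef.h>

/* Value assumed for this mock; in the kernel, use the real definition. */
#define LOOKUP_RCU	0x0040

/* Stand-ins for the kernel types, reduced to what the sketch needs. */
struct nameidata {
	unsigned int flags;
};

struct dentry {
	int unused;
};

/* Hypothetical ->d_revalidate() that cannot cope with rcu-walk mode. */
static int foo_d_revalidate(struct dentry *dentry, struct nameidata *nd)
{
	(void)dentry;

	/* rcu-walk: we must not block, so ask the VFS to retry in ref-walk. */
	if (nd != NULL && (nd->flags & LOOKUP_RCU))
		return -ECHILD;

	/* ref-walk: the (possibly blocking) validity check would go here. */
	return 1;	/* positive: the dentry is still valid */
}
```

Returning -ECHILD costs a retry of the walk, so a filesystem that can check
validity without blocking is better off handling rcu-walk mode directly.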
---
[mandatory]
	In ->fallocate() you must check the mode option passed in. If your
filesystem does not support hole punching (deallocating space in the middle of a
file) you must return -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set in mode.
Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set,
so the i_size should not change when hole punching, even when punching the end of
a file off.

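A minimal sketch of the mode check for a filesystem without hole punching
support follows. foo_fallocate_check_mode() is a hypothetical helper, not a
kernel interface; a real ->fallocate() would perform this test on its mode
argument before doing any work. The flag values are defined locally only so
the sketch compiles on its own.

```c
#include <errno.h>

/* Defined locally so the sketch is self-contained. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE	0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE	0x02
#endif

/* Hypothetical helper: reject hole punching, accept the other modes. */
static int foo_fallocate_check_mode(int mode)
{
	if (mode & FALLOC_FL_PUNCH_HOLE)
		return -EOPNOTSUPP;	/* this fs cannot deallocate ranges */
	return 0;
}
```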