.. SPDX-License-Identifier: GPL-2.0

===================================
File management in the Linux kernel
===================================

This document describes how locking works for files (struct file)
and for the file descriptor table (struct files_struct).

Up until 2.6.12, the file descriptor table was protected
with a lock (files->file_lock) and a reference count (files->count).
->file_lock protected accesses to all the file-related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with the CLONE_FILES flag. Typically
this would be the case for POSIX threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected using
a reference count (->f_count).
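
The sharing model described above can be sketched in userspace C11.
This is only an illustration, not kernel code: ``demo_files``,
``demo_get_files()``, ``demo_put_files()`` and ``demo_frees`` are
hypothetical names invented for this sketch, and C11 atomics stand in
for the kernel's atomic operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared descriptor table. */
struct demo_files {
	atomic_int count;	/* number of tasks sharing the table */
};

/* Counts how many tables were freed; exists only to make the sketch observable. */
static int demo_frees;

/* Allocate a table owned by one task (count starts at 1). */
static struct demo_files *demo_files_alloc(void)
{
	struct demo_files *files = malloc(sizeof(*files));

	if (files)
		atomic_init(&files->count, 1);
	return files;
}

/* A task that starts sharing the table takes a reference. */
static void demo_get_files(struct demo_files *files)
{
	atomic_fetch_add(&files->count, 1);
}

/* Drop a reference; the last user frees the table. */
static void demo_put_files(struct demo_files *files)
{
	/* atomic_fetch_sub() returns the value held before the decrement. */
	int prev = atomic_fetch_sub(&files->count, 1);

	assert(prev > 0);
	if (prev == 1) {
		free(files);
		demo_frees++;
	}
}
```

With two sharers, the first ``demo_put_files()`` only drops the count
and the second one frees the table.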

In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements: the fd sets (open_fds and close_on_exec), the
array of file pointers, the sizes of the sets and the array,
etc. In order for the updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are in a separate structure, struct fdtable.
files_struct contains a pointer to struct fdtable through
which the actual fd table is accessed. Initially the
fdtable is embedded in files_struct itself. On a subsequent
expansion of fdtable, a new fdtable structure is allocated
and files->fdt points to the new structure. The old fdtable
structure is freed with RCU, and lock-free readers see either
the old fdtable or the new fdtable, making the update
appear atomic. Here are the locking rules for
the fdtable structure:

1. All references to the fdtable must be done through
   the files_fdtable() macro::

        struct fdtable *fdt;

        rcu_read_lock();

        fdt = files_fdtable(files);
        ....
        if (n <= fdt->max_fds)
                ....
        ...
        rcu_read_unlock();

   files_fdtable() uses the rcu_dereference() macro, which takes care of
   the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either the fcheck() or fcheck_files() API. These
   take care of barrier requirements due to lock-free lookup.

   An example::

        struct file *file;

        rcu_read_lock();
        file = fcheck(fd);
        if (file) {
                ...
        }
        ....
        rcu_read_unlock();

5. Handling of the file structures is special. Since the look-up
   of the fd (fget()/fget_light()) is lock-free, it is possible
   that a look-up may race with the last put() operation on the
   file structure. This is avoided using atomic_long_inc_not_zero()
   on ->f_count::

        rcu_read_lock();
        file = fcheck_files(files, fd);
        if (file) {
                if (atomic_long_inc_not_zero(&file->f_count))
                        *fput_needed = 1;
                else
                        /* Didn't get the reference, someone's freed */
                        file = NULL;
        }
        rcu_read_unlock();
        ....
        return file;

   atomic_long_inc_not_zero() detects if the refcount is already zero or
   goes to zero during the increment. If it does, we fail
   fget()/fget_light().

6. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using the rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However, it is advisable to use files_fdtable()
   and fcheck()/fcheck_files(), which take care of these issues.

7. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread may expand the files, thereby creating a new
   fdtable and making the earlier fdtable pointer stale.

   For example::

        spin_lock(&files->file_lock);
        fd = locate_fd(files, file, start);
        if (fd >= 0) {
                /* locate_fd() may have expanded fdtable, load the ptr */
                fdt = files_fdtable(files);
                __set_open_fd(fd, fdt);
                __clear_close_on_exec(fd, fdt);
                spin_unlock(&files->file_lock);
        .....

   Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
   the fdtable pointer (fdt) must be loaded after locate_fd().
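
Rules 5 and 6 above can be illustrated with a minimal userspace
sketch. It is not the kernel implementation: ``demo_file``,
``demo_slot``, ``demo_install()``, ``demo_get()`` and
``demo_inc_not_zero()`` are hypothetical names, a C11 release store
and acquire load stand in for rcu_assign_pointer() and
rcu_dereference(), and the sketch does not model RCU grace periods
or freeing of the structures.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file; only the counter matters here. */
struct demo_file {
	atomic_long f_count;
};

/* Hypothetical one-slot "fd table": a pointer that readers load lock-free. */
static _Atomic(struct demo_file *) demo_slot;

/* Take a reference unless the count has already reached zero. */
static bool demo_inc_not_zero(atomic_long *count)
{
	long old = atomic_load(count);

	while (old != 0) {
		/* On failure, old is updated to the current value and we retry. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;
}

/* Writer side: publish with release ordering (stand-in for rcu_assign_pointer()). */
static void demo_install(struct demo_file *file)
{
	atomic_store_explicit(&demo_slot, file, memory_order_release);
}

/* Reader side: acquire load (stand-in for rcu_dereference()), then try to pin. */
static struct demo_file *demo_get(void)
{
	struct demo_file *file =
		atomic_load_explicit(&demo_slot, memory_order_acquire);

	if (file && !demo_inc_not_zero(&file->f_count))
		file = NULL;	/* lost the race with the final put */
	return file;
}
```

A reader that finds the pointer but sees a zero count gets NULL back,
which mirrors how the look-up in rule 5 fails instead of reviving a
file that is being released.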