.. SPDX-License-Identifier: GPL-2.0

===================================
File management in the Linux kernel
===================================

This document describes how locking for files (struct file)
and the file descriptor table (struct files_struct) works.

Up until 2.6.12, the file descriptor table was protected
with a lock (files->file_lock) and a reference count (files->count).
->file_lock protected accesses to all the file-related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with the CLONE_FILES flag. Typically
this would be the case for POSIX threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected using a
reference count (->f_count).
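
The sharing rule above (each task that shares the table through CLONE_FILES holds a reference, and the last put_files_struct() frees it) can be modelled in userspace C11. This is a minimal sketch, not kernel code: `struct files_model`, `files_model_alloc()`, `files_model_get()` and `files_model_put()` are hypothetical names, a C11 `atomic_int` stands in for files->count, and free() stands in for the kernel's release path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a shared fd table (files_struct). */
struct files_model {
	atomic_int count;	/* models files->count */
};

static struct files_model *files_model_alloc(void)
{
	struct files_model *files = malloc(sizeof(*files));

	if (files)
		atomic_init(&files->count, 1);	/* the creator's reference */
	return files;
}

/* Models a CLONE_FILES child starting to share the table. */
static void files_model_get(struct files_model *files)
{
	int old = atomic_fetch_add_explicit(&files->count, 1,
					    memory_order_relaxed);

	assert(old > 0);	/* caller must already hold a reference */
}

/* Models put_files_struct(): returns true if this put freed the table. */
static bool files_model_put(struct files_model *files)
{
	if (atomic_fetch_sub_explicit(&files->count, 1,
				      memory_order_acq_rel) == 1) {
		free(files);
		return true;
	}
	return false;
}
```

The acquire-release ordering on each decrement makes every earlier write to the table by the other sharers visible to the task that finally frees it.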

In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements: the fd sets (open_fds and close_on_exec), the
array of file pointers, the sizes of the sets and the array,
etc. In order for the updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are kept in a separate structure, struct fdtable.
files_struct contains a pointer to struct fdtable (files->fdt)
through which the actual fd table is accessed. Initially the
fdtable is embedded in files_struct itself (files->fdtab). On a
subsequent expansion of the fdtable, a new fdtable structure is
allocated and files->fdt is switched to point to the new
structure. The old fdtable structure is freed with RCU, so
lock-free readers see either the old fdtable or the new fdtable,
which makes the update appear atomic. Here are the locking rules
for the fdtable structure:

1. All references to the fdtable must be done through
   the files_fdtable() macro::

	struct fdtable *fdt;

	rcu_read_lock();

	fdt = files_fdtable(files);
	....
	if (n <= fdt->max_fds)
		....
	...
	rcu_read_unlock();

   files_fdtable() uses the rcu_dereference() macro, which takes care of
   the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either the lookup_fd_rcu() or the files_lookup_fd_rcu() API.
   These take care of the barrier requirements of lock-free lookup.

   An example::

	struct file *file;

	rcu_read_lock();
	file = lookup_fd_rcu(fd);
	if (file) {
		...
	}
	....
	rcu_read_unlock();

5. Handling of the file structures is special. Since the look-up
   of the fd (fget()/fget_light()) is lock-free, it is possible
   for that look-up to race with the last put() operation on the
   file structure. This is avoided using atomic_long_inc_not_zero()
   on ->f_count::

	rcu_read_lock();
	file = files_lookup_fd_rcu(files, fd);
	if (file) {
		if (atomic_long_inc_not_zero(&file->f_count))
			*fput_needed = 1;
		else
			/* Didn't get the reference, someone's freed */
			file = NULL;
	}
	rcu_read_unlock();
	....
	return file;

   atomic_long_inc_not_zero() detects whether the refcount is already
   zero or goes to zero during the increment. If it does,
   fget()/fget_light() fails.

6. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using the rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However, it is advisable to use files_fdtable()
   and lookup_fd_rcu()/files_lookup_fd_rcu(), which take care of
   these issues.

7. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread could expand the file descriptor table, creating
   a new fdtable and making the earlier fdtable pointer stale.

   For example::

	spin_lock(&files->file_lock);
	fd = locate_fd(files, file, start);
	if (fd >= 0) {
		/* locate_fd() may have expanded fdtable, load the ptr */
		fdt = files_fdtable(files);
		__set_open_fd(fd, fdt);
		__clear_close_on_exec(fd, fdt);
	}
	spin_unlock(&files->file_lock);
	.....

   Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
   the fdtable pointer (fdt) must be loaded after locate_fd().
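
The copy-and-publish scheme described in this document (lock-free readers see either the old or the new fdtable, while updaters work on the table only under ->file_lock and must stop using a pointer once the table has been replaced) can be sketched in userspace C11. This is a hedged model, not the kernel implementation: a pthread mutex stands in for files->file_lock, a release store and acquire load stand in for rcu_assign_pointer() and rcu_dereference(), open fds are tracked with one byte per fd instead of a bitmap, the table simply doubles when full, and the old table is leaked rather than freed after an RCU grace period. All `model_*` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical model of struct fdtable: all fields are replaced together. */
struct model_fdtable {
	unsigned int max_fds;
	_Atomic(void *) *fd;		/* file pointers, read lock-free */
	unsigned char *open_fds;	/* one flag per fd, used under file_lock */
};

struct model_files {
	pthread_mutex_t file_lock;		/* stands in for files->file_lock */
	_Atomic(struct model_fdtable *) fdt;	/* stands in for files->fdt */
};

/* Allocate a table of max_fds slots, copying the contents of src if given. */
static struct model_fdtable *model_alloc_fdtable(unsigned int max_fds,
						 const struct model_fdtable *src)
{
	struct model_fdtable *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	t->max_fds = max_fds;
	t->fd = malloc(max_fds * sizeof(*t->fd));
	t->open_fds = calloc(max_fds, 1);
	if (!t->fd || !t->open_fds) {
		free(t->fd);
		free(t->open_fds);
		free(t);
		return NULL;
	}
	for (unsigned int i = 0; i < max_fds; i++) {
		void *f = NULL;

		if (src && i < src->max_fds) {
			f = atomic_load_explicit(&src->fd[i], memory_order_relaxed);
			t->open_fds[i] = src->open_fds[i];
		}
		atomic_init(&t->fd[i], f);
	}
	return t;
}

static int model_files_init(struct model_files *files, unsigned int max_fds)
{
	struct model_fdtable *t;

	assert(max_fds > 0);	/* expansion doubles max_fds */
	t = model_alloc_fdtable(max_fds, NULL);
	if (!t)
		return -1;
	pthread_mutex_init(&files->file_lock, NULL);
	atomic_init(&files->fdt, t);
	return 0;
}

/* Lock-free reader: load the table pointer once, bounds-check against it. */
static void *model_lookup_fd(struct model_files *files, unsigned int fd)
{
	struct model_fdtable *fdt =
		atomic_load_explicit(&files->fdt, memory_order_acquire);

	if (fd >= fdt->max_fds)
		return NULL;
	return atomic_load_explicit(&fdt->fd[fd], memory_order_acquire);
}

/* Updater: find a free fd and install file, all under file_lock. */
static int model_install(struct model_files *files, void *file)
{
	struct model_fdtable *fdt, *bigger;
	unsigned int fd;

	pthread_mutex_lock(&files->file_lock);
	fdt = atomic_load_explicit(&files->fdt, memory_order_relaxed);
	for (fd = 0; fd < fdt->max_fds && fdt->open_fds[fd]; fd++)
		;
	if (fd == fdt->max_fds) {
		/* Full: build a copy twice the size, then publish it. */
		bigger = model_alloc_fdtable(fdt->max_fds * 2, fdt);
		if (!bigger) {
			pthread_mutex_unlock(&files->file_lock);
			return -1;
		}
		/* Release store: a reader that sees 'bigger' sees its contents. */
		atomic_store_explicit(&files->fdt, bigger, memory_order_release);
		/* The old fdt pointer is now stale; continue with the new table. */
		fdt = bigger;
	}
	fdt->open_fds[fd] = 1;
	atomic_store_explicit(&fdt->fd[fd], file, memory_order_release);
	pthread_mutex_unlock(&files->file_lock);
	return (int)fd;
}
```

Readers never take the mutex: they load the table pointer once and index only into that table, so a concurrent expansion can at worst make them see the older copy, never a half-built one. In the kernel the readers would also run inside rcu_read_lock(), which is what allows the old table to be freed safely; the model sidesteps that by never freeing it.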
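
The atomic_long_inc_not_zero() check in item 5 boils down to a compare-and-swap loop that refuses to move a count off zero. The userspace C11 sketch below models that behaviour; `get_unless_zero()` and `put_ref()` are hypothetical names, a C11 `atomic_long` stands in for ->f_count, and this is not the kernel's implementation of atomic_long_inc_not_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the object is still live (count > 0).
 * Returns false if the count was, or became, zero: the object is
 * being torn down and the caller must treat the lookup as failed.
 */
static bool get_unless_zero(atomic_long *count)
{
	long old = atomic_load_explicit(count, memory_order_relaxed);

	do {
		if (old == 0)
			return false;
		/* On failure, old is reloaded and the zero check repeats. */
	} while (!atomic_compare_exchange_weak_explicit(count, &old, old + 1,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool put_ref(atomic_long *count)
{
	long old = atomic_fetch_sub_explicit(count, 1, memory_order_acq_rel);

	assert(old > 0);
	return old == 1;
}
```

In the kernel example the surrounding rcu_read_lock() is what keeps the struct file memory valid while its count is examined; the model leaves that part out and only shows why a count that has already reached zero is never revived.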