.. SPDX-License-Identifier: GPL-2.0

=====================================================
Mandatory File Locking For The Linux Operating System
=====================================================

Andy Walker <andy@lysaker.kvaerner.no>

15 April 1996

(Updated September 2007)

0. Why you should avoid mandatory locking
-----------------------------------------

The Linux implementation is prey to a number of difficult-to-fix race
conditions which in practice make it not dependable:

- The write system call checks for a mandatory lock only once
  at its start. It is therefore possible for a lock request to
  be granted after this check but before the data is modified.
  A process may then see file data change even while a mandatory
  lock was held.
- Similarly, an exclusive lock may be granted on a file after
  the kernel has decided to proceed with a read, but before the
  read has actually completed, and the reading process may see
  the file data in a state which should not have been visible
  to it.
- Comparable races make the claimed mutual exclusion between locks
  and mmap() equally unreliable.

1. What is mandatory locking?
------------------------------

Mandatory locking is kernel-enforced file locking, as opposed to the more usual
cooperative file locking used to guarantee sequential access to files among
processes. File locks are applied using the flock() and fcntl() system calls
(and the lockf() library routine which is a wrapper around fcntl().) It is
normally a process's responsibility to check for locks on a file it wishes to
update, before applying its own lock, updating the file and unlocking it again.
The most commonly used example of this (and in the case of sendmail, the most
troublesome) is access to a user's mailbox. The mail user agent and the mail
transfer agent must guard against updating the mailbox at the same time, and
prevent reading the mailbox while it is being updated.

In a perfect world all processes would use and honour a cooperative, or
"advisory" locking scheme. However, the world isn't perfect, and there's
a lot of poorly written code out there.

In trying to address this problem, the designers of System V UNIX came up
with a "mandatory" locking scheme, whereby the operating system kernel would
block attempts by a process to write to a file that another process holds a
"read" -or- "shared" lock on, and block attempts to both read and write to a
file that a process holds a "write" -or- "exclusive" lock on.

The System V mandatory locking scheme was intended to have as little impact as
possible on existing user code. The scheme is based on marking individual files
as candidates for mandatory locking, and using the existing fcntl()/lockf()
interface for applying locks just as if they were normal, advisory locks.

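
Since the scheme reuses the ordinary fcntl() interface, taking a lock looks the
same whether or not the file is a mandatory lock candidate. The following is a
minimal sketch of such a call, not code from the kernel tree; the helper name
lock_region() is made up for illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Apply or release an fcntl() record lock on bytes [start, start + len).
 * type is F_RDLCK (shared), F_WRLCK (exclusive) or F_UNLCK (release).
 * len == 0 means "from start to the end of the file, however far it grows".
 * Returns 0 on success, -1 with errno set. F_SETLK never waits: if another
 * process holds a conflicting lock it fails with EAGAIN or EACCES.
 * lock_region() is a hypothetical helper name, not an existing API.
 */
static int lock_region(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;    /* l_start is an absolute file offset */
    fl.l_start = start;
    fl.l_len = len;

    return fcntl(fd, F_SETLK, &fl);
}
```

A caller that prefers to wait for a conflicting lock to go away would use
F_SETLKW in place of F_SETLK.
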
.. Note::

   1. In saying "file" in the paragraphs above I am actually not telling
      the whole truth. System V locking is based on fcntl(). The granularity of
      fcntl() is such that it allows the locking of byte ranges in files, in
      addition to entire files, so the mandatory locking rules also have byte
      level granularity.

   2. POSIX.1 does not specify any scheme for mandatory locking, despite
      borrowing the fcntl() locking scheme from System V. The mandatory locking
      scheme is defined by the System V Interface Definition (SVID) Version 3.

2. Marking a file for mandatory locking
---------------------------------------

A file is marked as a candidate for mandatory locking by setting the group-id
bit in its file mode but removing the group-execute bit. This is an otherwise
meaningless combination, and was chosen by the System V implementors so as not
to break existing user programs.

Note that the group-id bit is usually automatically cleared by the kernel when
a setgid file is written to. This is a security measure. The kernel has been
modified to recognize the special case of a mandatory lock candidate and to
refrain from clearing this bit. Similarly the kernel has been modified not
to run mandatory lock candidates with setgid privileges.

3. Available implementations
----------------------------

I have considered the implementations of mandatory locking available with
SunOS 4.1.x, Solaris 2.x and HP-UX 9.x.

Generally I have tried to make the most sense out of the behaviour exhibited
by these three reference systems. There are many anomalies.

All the reference systems reject all calls to open() for a file on which
another process has outstanding mandatory locks. This is in direct
contravention of SVID 3, which states that only calls to open() with the
O_TRUNC flag set should be rejected. The Linux implementation follows the SVID
definition, which is the "Right Thing", since only calls with O_TRUNC can
modify the contents of the file.

HP-UX even disallows open() with O_TRUNC for a file with advisory locks, not
just mandatory locks. That would appear to contravene POSIX.1.

mmap() is another interesting case. All the operating systems mentioned
prevent mandatory locks from being applied to an mmap()'ed file, but HP-UX
also disallows advisory locks for such a file. SVID actually specifies the
paranoid HP-UX behaviour.

In my opinion only MAP_SHARED mappings should be immune from locking, and then
only from mandatory locks - that is what is currently implemented.

SunOS is so hopeless that it doesn't even honour the O_NONBLOCK flag for
mandatory locks, so reads and writes to locked files always block when they
should return EAGAIN.

I'm afraid that this is such an esoteric area that the semantics described
below are just as valid as any others, so long as the main points seem to
agree.

4. Semantics
------------

1. Mandatory locks can only be applied via the fcntl()/lockf() locking
   interface - in other words the System V/POSIX interface. BSD style
   locks using flock() never result in a mandatory lock.

2. If a process has locked a region of a file with a mandatory read lock, then
   other processes are permitted to read from that region. If any of these
   processes attempts to write to the region it will block until the lock is
   released, unless the process has opened the file with the O_NONBLOCK
   flag in which case the system call will return immediately with the error
   status EAGAIN.

3. If a process has locked a region of a file with a mandatory write lock, all
   attempts to read or write to that region block until the lock is released,
   unless a process has opened the file with the O_NONBLOCK flag in which case
   the system call will return immediately with the error status EAGAIN.

4. Calls to open() with O_TRUNC, or to creat(), on an existing file that has
   any mandatory locks owned by other processes will be rejected with the
   error status EAGAIN.

5. Attempts to apply a mandatory lock to a file that is memory mapped and
   shared (via mmap() with MAP_SHARED) will be rejected with the error status
   EAGAIN.

6. Attempts to create a shared memory map of a file (via mmap() with MAP_SHARED)
   that has any mandatory locks in effect will be rejected with the error status
   EAGAIN.

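
Rules 2 and 3 above amount to a small decision table. The sketch below models
that table in user space to make the cases explicit; it is not how the kernel
implements the check, and the names io_kind, mandatory_conflict() and
io_outcome() are invented for this illustration.

```c
#include <errno.h>
#include <fcntl.h>

enum io_kind { IO_READ, IO_WRITE };

/*
 * Does an I/O request of the given kind conflict with a mandatory lock of
 * type lock_type that another process holds on the same region?
 */
static int mandatory_conflict(short lock_type, enum io_kind io)
{
    switch (lock_type) {
    case F_RDLCK:    /* shared: readers pass, writers conflict */
        return io == IO_WRITE;
    case F_WRLCK:    /* exclusive: readers and writers conflict */
        return 1;
    default:         /* F_UNLCK or unknown: nothing to conflict with */
        return 0;
    }
}

/*
 * What the caller would observe: 0 if the I/O proceeds, EAGAIN if it fails
 * immediately because the file was opened with O_NONBLOCK, or -1 as a
 * stand-in for "blocks until the lock is released".
 */
static int io_outcome(short lock_type, enum io_kind io, int open_flags)
{
    if (!mandatory_conflict(lock_type, io))
        return 0;
    return (open_flags & O_NONBLOCK) ? EAGAIN : -1;
}
```
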
5. Which system calls are affected?
-----------------------------------

Those which read or modify a file's contents, not just the inode. That gives
read(), write(), readv(), writev(), open(), creat(), mmap(), truncate() and
ftruncate(). truncate() and ftruncate() are considered to be "write" actions
for the purposes of mandatory locking.

The affected region is usually defined as stretching from the current position
for the total number of bytes read or written. For the truncate calls it is
defined as the bytes of a file removed or added (we must also consider bytes
added, as a lock can specify just "the whole file", rather than a specific
range of bytes.)

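
The affected-region rule above comes down to an interval overlap test. Here is
a sketch of that test under the fcntl() convention that a lock length of zero
means "to the end of the file and beyond"; the function name io_hits_lock() is
hypothetical and the kernel's own range check is not reproduced here.

```c
#include <sys/types.h>

/*
 * Does the byte range touched by an I/O call overlap a locked range?
 * Ranges are half-open: [start, start + len). io_start is the file
 * position at the time of the call and io_len the byte count requested.
 * lock_len == 0 means the lock is open-ended.
 */
static int io_hits_lock(off_t io_start, off_t io_len,
                        off_t lock_start, off_t lock_len)
{
    if (io_len <= 0)
        return 0;    /* nothing is transferred */
    if (io_start + io_len <= lock_start)
        return 0;    /* I/O ends before the lock begins */
    if (lock_len == 0)
        return 1;    /* open-ended lock covers everything after lock_start */
    return io_start < lock_start + lock_len;
}
```

The open-ended case is why bytes added by a truncate call matter: a range that
lies past the current end of the file still overlaps a "whole file" lock.
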
Note 3: I may have overlooked some system calls that need mandatory lock
checking in my eagerness to get this code out the door. Please let me know, or
better still fix the system calls yourself and submit a patch to me or Linus.

6. Warning!
-----------

Not even root can override a mandatory lock, so runaway processes can wreak
havoc if they lock crucial files. The way around it is to change the file
permissions (remove the setgid bit) before trying to read or write to it.
Of course, that might be a bit tricky if the system is hung :-(

7. The "mand" mount option
--------------------------

Mandatory locking is disabled on all filesystems by default, and must be
administratively enabled by mounting with "-o mand". That mount option
is only allowed if the mounting task has the CAP_SYS_ADMIN capability.

Since kernel v4.5, it is possible to disable mandatory locking
altogether by setting CONFIG_MANDATORY_FILE_LOCKING to "n". A kernel
with this disabled will reject attempts to mount filesystems with the
"mand" mount option with the error status EPERM.
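
A program can ask whether a filesystem was mounted with "mand" by inspecting
the mount flags that statvfs() reports. The sketch below assumes the
Linux-specific ST_MANDLOCK flag, which glibc only exposes when _GNU_SOURCE is
defined, so a fallback definition with the Linux value is included; the helper
name mounted_with_mand() is hypothetical.

```c
#include <sys/statvfs.h>

/* glibc only exposes ST_MANDLOCK under _GNU_SOURCE; 64 is the Linux value. */
#ifndef ST_MANDLOCK
#define ST_MANDLOCK 64
#endif

/*
 * Report whether the filesystem containing path is mounted with "mand".
 * Returns 1 if so, 0 if not, -1 with errno set if statvfs() fails.
 */
static int mounted_with_mand(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0)
        return -1;
    return (sv.f_flag & ST_MANDLOCK) != 0;
}
```
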