.. SPDX-License-Identifier: GPL-2.0

========================
ext4 General Information
========================

Ext4 is an advanced level of the ext3 filesystem which incorporates
scalability and reliability enhancements for supporting large filesystems
(64 bit) in keeping with increasing disk capacities and state-of-the-art
feature requirements.

Mailing list: linux-ext4@vger.kernel.org
Web site: http://ext4.wiki.kernel.org


Quick usage instructions
========================

Note: More extensive information for getting started with ext4 can be
found at the ext4 wiki site at the URL:
http://ext4.wiki.kernel.org/index.php/Ext4_Howto

  - The latest version of e2fsprogs can be found at:

    https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/

    or

    http://sourceforge.net/project/showfiles.php?group_id=2406

    or grab the latest git repository from:

    https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git

  - Create a new filesystem using the ext4 filesystem type:

        # mke2fs -t ext4 /dev/hda1

    Or to configure an existing ext3 filesystem to support extents:

        # tune2fs -O extents /dev/hda1

    If the filesystem was created with 128-byte inodes, it can be
    converted to use 256-byte inodes for greater efficiency via:

        # tune2fs -I 256 /dev/hda1

  - Mounting:

        # mount -t ext4 /dev/hda1 /wherever

  - When comparing performance with other filesystems, it's always
    important to try multiple workloads; very often a subtle change in a
    workload parameter can completely change the ranking of which
    filesystems do well compared to others. When comparing versus ext3,
    note that ext4 enables write barriers by default, while ext3 does
    not. So for a fair comparison it is useful to explicitly specify
    whether barriers are enabled or not via the '-o barrier=[0|1]' mount
    option for both ext3 and ext4 filesystems. When tuning ext3 for best
    benchmark numbers, it is often worthwhile to try changing the data
    journaling mode; '-o data=writeback' can be faster for some workloads.
    (Note however that running mounted with data=writeback can potentially
    leave stale data exposed in recently written files in case of an
    unclean shutdown, which could be a security exposure in some
    situations.) Configuring the filesystem with a large journal can also
    be helpful for metadata-intensive workloads.

Features
========

Currently Available
-------------------

* ability to use filesystems > 16TB (e2fsprogs support not available yet)
* extent format reduces metadata overhead (RAM, IO for access, transactions)
* extent format more robust in face of on-disk corruption due to magics and
  internal redundancy in tree
* improved file allocation (multi-block alloc)
* lift 32000 subdirectory limit imposed by i_links_count[1]
* nsec timestamps for mtime, atime, ctime, create time
* inode version field on disk (NFSv4, Lustre)
* reduced e2fsck time via uninit_bg feature
* journal checksumming for robustness, performance
* persistent file preallocation (e.g. for streaming media, databases)
* ability to pack bitmaps and inode tables into larger virtual groups via the
  flex_bg feature
* large file support
* inode allocation using large virtual block groups via flex_bg
* delayed allocation
* large block (up to pagesize) support
* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
  the ordering)
* case-insensitive file name lookups
* file-based encryption support (fscrypt)
* file-based verity support (fsverity)

[1] Filesystems with a block size of 1k may see a limit imposed by the
directory hash tree having a maximum depth of two.

Case-insensitive file name lookups
==================================

The case-insensitive file name lookup feature is supported on a
per-directory basis, allowing the user to mix case-insensitive and
case-sensitive directories in the same filesystem. It is enabled by
setting the +F inode attribute on an empty directory. The
case-insensitive string match operation is only defined when we know how
text is encoded in a byte sequence. For that reason, in order to enable
case-insensitive directories, the filesystem must have the
casefold feature, which stores the filesystem-wide encoding
model used. By default, the charset adopted is the latest version of
Unicode (12.1.0, at the time of this writing), encoded in the UTF-8
form. The comparison algorithm is implemented by normalizing the
strings to the Canonical Decomposition form (NFD), as defined by Unicode,
followed by a byte-per-byte comparison.
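
As a rough userspace approximation of the comparison described above, the
sketch below casefolds a name, normalizes it to NFD with Python's
``unicodedata`` module, and compares the UTF-8 encodings byte by byte. This is
an assumption-laden model, not the kernel's implementation: the kernel uses
its own UTF-8 tables pinned to a specific Unicode version, while Python follows
whatever Unicode version the interpreter ships, so edge cases may differ. The
function names are hypothetical.

```python
import unicodedata


def casefold_key(name: str) -> bytes:
    """Approximate the key used for case-insensitive matching.

    Assumption: casefold, then canonical decomposition (NFD), then encode
    as UTF-8 so the final comparison is byte by byte.
    """
    folded = unicodedata.normalize("NFD", name.casefold())
    return folded.encode("utf-8")


def names_match(a: str, b: str) -> bool:
    # Byte-per-byte comparison of the normalized forms.
    return casefold_key(a) == casefold_key(b)


if __name__ == "__main__":
    # Precomposed "é" (U+00E9) vs "E" followed by a combining acute (U+0301).
    print(names_match("caf\u00e9", "CAFE\u0301"))
    print(names_match("README", "readme"))
    print(names_match("a.txt", "b.txt"))
```

The NFD step is what makes the precomposed and decomposed spellings of "café"
compare equal; casefolding alone would not.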

The case-awareness is name-preserving on the disk, meaning that the file
name provided by userspace is a byte-per-byte match to what is actually
written to the disk. The Unicode normalization format used by the
kernel is thus an internal representation, and is exposed neither to
userspace nor to the disk, with the important exception of disk hashes,
used on large case-insensitive directories with the DX feature. On DX
directories, the hash must be calculated using the casefolded version of
the filename, meaning that the normalization format used actually has an
impact on where the directory entry is stored.
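
The interplay between name preservation and hashing can be illustrated with a
toy, in-memory directory. Everything here is hypothetical: ``ToyCasefoldDir``
and ``toy_dx_hash`` are invented for illustration, SHA-256 stands in for
ext4's real directory hash functions, and the casefold/NFD step uses Python's
Unicode tables rather than the kernel's. What the sketch does show is that the
stored name keeps the user's exact spelling, while the bucket (the analogue of
where a DX entry lands) is chosen from the casefolded form.

```python
import hashlib
import unicodedata
from typing import Dict, List, Optional


def casefold_key(name: str) -> bytes:
    # Approximation only: casefold + NFD via Python's Unicode tables.
    return unicodedata.normalize("NFD", name.casefold()).encode("utf-8")


def toy_dx_hash(name: str, nbuckets: int) -> int:
    # Hypothetical stand-in for ext4's directory hash. The real on-disk hash
    # differs; the point is that its input is the casefolded name.
    digest = hashlib.sha256(casefold_key(name)).digest()
    return int.from_bytes(digest[:4], "little") % nbuckets


class ToyCasefoldDir:
    """Hypothetical model of a case-insensitive, name-preserving directory."""

    def __init__(self, nbuckets: int = 8) -> None:
        # Each bucket maps casefolded key -> name exactly as the user gave it.
        self.buckets: List[Dict[bytes, str]] = [{} for _ in range(nbuckets)]

    def _bucket(self, name: str) -> Dict[bytes, str]:
        return self.buckets[toy_dx_hash(name, len(self.buckets))]

    def create(self, name: str) -> None:
        bucket = self._bucket(name)
        key = casefold_key(name)
        if key in bucket:
            raise FileExistsError(name)
        bucket[key] = name  # stored exactly as supplied (name-preserving)

    def lookup(self, name: str) -> Optional[str]:
        return self._bucket(name).get(casefold_key(name))


if __name__ == "__main__":
    d = ToyCasefoldDir()
    d.create("Makefile")
    print(d.lookup("MAKEFILE"))  # the original spelling, "Makefile"
```

Because "Makefile" and "MAKEFILE" casefold to the same key, they hash to the
same bucket, which is why changing the normalization would change where
entries are found.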

When we change from viewing filenames as opaque byte sequences to seeing
them as encoded strings, we need to address what happens when a program
tries to create a file with an invalid name. The Unicode subsystem
within the kernel leaves the decision of what to do in this case to the
filesystem, which selects its preferred behavior by enabling/disabling
the strict mode. When ext4 encounters one of those strings and the
filesystem did not require strict mode, it falls back to considering the
entire string as an opaque byte sequence, which still allows the user to
operate on that file, but case-insensitive lookups won't work for it.
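
The strict versus non-strict behavior can be modeled in a few lines. This is
a hypothetical sketch: ``lookup_key`` and ``names_match`` are invented names,
rejecting an invalid name with EINVAL in strict mode is an assumption about
the error code, and the casefold/NFD step uses Python's Unicode tables rather
than the kernel's.

```python
import errno
import unicodedata
from typing import Optional


def lookup_key(raw: bytes, strict: bool) -> Optional[bytes]:
    """Hypothetical model of how a name is keyed in a casefolded directory.

    Returns the casefolded key for valid UTF-8, or None when the name must
    be treated as an opaque byte sequence. In strict mode an invalid name
    is rejected instead (modeled here as EINVAL).
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        if strict:
            raise OSError(errno.EINVAL, "invalid UTF-8 file name")
        return None  # fallback: opaque bytes, no case-insensitive matching
    return unicodedata.normalize("NFD", text.casefold()).encode("utf-8")


def names_match(a: bytes, b: bytes, strict: bool = False) -> bool:
    ka, kb = lookup_key(a, strict), lookup_key(b, strict)
    if ka is None or kb is None:
        return a == b  # opaque names only match exactly
    return ka == kb
```

In the non-strict case an invalid name such as ``b"\xffABC"`` still works,
but only under its exact byte spelling.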

Options
=======

When mounting an ext4 filesystem, the following options are accepted:
(*) == default

  ro
        Mount filesystem read only. Note that ext4 will replay the journal (and
        thus write to the partition) even when mounted "read only". The mount
        options "ro,noload" can be used to prevent writes to the filesystem.

  journal_checksum
        Enable checksumming of the journal transactions. This will allow the
        recovery code in e2fsck and the kernel to detect corruption in the
        journal. It is a compatible change and will be ignored by older
        kernels.

  journal_async_commit
        Commit block can be written to disk without waiting for descriptor
        blocks. If enabled, older kernels cannot mount the device. This will
        enable 'journal_checksum' internally.

  journal_path=path, journal_dev=devnum
        When the external journal device's major/minor numbers have changed,
        these options allow the user to specify the new journal location. The
        journal device is identified through either its new major/minor numbers
        encoded in devnum, or via a path to the device.

  norecovery, noload
        Don't load the journal on mounting. Note that if the filesystem was
        not unmounted cleanly, skipping the journal replay will lead to the
        filesystem containing inconsistencies that can lead to any number of
        problems.

  data=journal
        All data are committed into the journal prior to being written into the
        main file system. Enabling this mode will disable delayed allocation
        and O_DIRECT support.

  data=ordered (*)
        All data are forced directly out to the main file system prior to its
        metadata being committed to the journal.

  data=writeback
        Data ordering is not preserved; data may be written into the main file
        system after its metadata has been committed to the journal.

  commit=nrsec (*)
        This setting limits the maximum age of the running transaction to
        'nrsec' seconds. The default value is 5 seconds. This means that if
        you lose power, you will lose as much as the latest 5 seconds of
        metadata changes (your filesystem will not be damaged though, thanks
        to the journaling). This default value (or any low value) will hurt
        performance, but it's good for data safety. Setting it to 0 will have
        the same effect as leaving it at the default (5 seconds). Setting it
        to very large values will improve performance. Note that due to
        delayed allocation even older data can be lost on power failure since
        writeback of those data begins only after the time set in
        /proc/sys/vm/dirty_expire_centisecs.

  barrier=<0|1(*)>, barrier(*), nobarrier
        This enables/disables the use of write barriers in the jbd2 code.
        barrier=0 disables, barrier=1 enables. This also requires an IO stack
        which can support barriers, and if jbd2 gets an error on a barrier
        write, it will disable barriers again with a warning. Write barriers
        enforce proper on-disk ordering of journal commits, making volatile
        disk write caches safe to use, at some performance penalty. If your
        disks are battery-backed in one way or another, disabling barriers may
        safely improve performance. The mount options "barrier" and
        "nobarrier" can also be used to enable or disable barriers, for
        consistency with other ext4 mount options.
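
When benchmarking, it helps to confirm which barrier and data mode settings
are actually in effect. The sketch below resolves them from a comma-separated
option string (such as the fourth field of a line in /proc/mounts). It is a
hypothetical helper, not part of any tool, and it assumes two things not
stated here: that defaults follow the (*) markers in this document, and that
when an option is repeated the last occurrence wins. Note also that
/proc/mounts may omit options that are at their default values.

```python
from typing import Dict, Union


def effective_ext4_opts(opts: str) -> Dict[str, Union[bool, str]]:
    """Resolve barrier and data= settings from a mount option string.

    Assumptions: defaults are barrier on and data=ordered, and a later
    occurrence of an option overrides an earlier one.
    """
    result: Dict[str, Union[bool, str]] = {"barrier": True, "data": "ordered"}
    for opt in filter(None, opts.split(",")):
        name, _, value = opt.partition("=")
        if name == "barrier":
            # "barrier" alone enables; "barrier=0" disables, "barrier=1" enables.
            result["barrier"] = value != "0"
        elif name == "nobarrier":
            result["barrier"] = False
        elif name == "data" and value in ("journal", "ordered", "writeback"):
            result["data"] = value
    return result


if __name__ == "__main__":
    print(effective_ext4_opts("rw,relatime,nobarrier,data=writeback"))
```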

  inode_readahead_blks=n
        This tuning parameter controls the maximum number of inode table blocks
        that ext4's inode table readahead algorithm will pre-read into the
        buffer cache. The default value is 32 blocks.

  nouser_xattr
        Disables Extended User Attributes. See the attr(5) manual page for
        more information about extended attributes.

  noacl
        This option disables POSIX Access Control List support. If ACL support
        is enabled in the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL
        is enabled by default on mount. See the acl(5) manual page for more
        information about ACLs.

  bsddf (*)
        Make 'df' act like BSD.

  minixdf
        Make 'df' act like Minix.

  debug
        Extra debugging information is sent to syslog.

  abort
        Simulate the effects of calling ext4_abort() for debugging purposes.
        This is normally used while remounting a filesystem which is already
        mounted.

  errors=remount-ro
        Remount the filesystem read-only on an error.

  errors=continue
        Keep going on a filesystem error.

  errors=panic
        Panic and halt the machine if an error occurs. (These mount options
        override the errors behavior specified in the superblock, which can be
        configured using tune2fs.)

  data_err=ignore(*)
        Just print an error message if an error occurs in a file data buffer in
        ordered mode.

  data_err=abort
        Abort the journal if an error occurs in a file data buffer in ordered
        mode.

  grpid | bsdgroups
        New objects have the group ID of their parent.

  nogrpid (*) | sysvgroups
        New objects have the group ID of their creator.

  resgid=n
        The group ID which may use the reserved blocks.

  resuid=n
        The user ID which may use the reserved blocks.

  sb=
        Use the alternate superblock at this location.

  quota, noquota, grpquota, usrquota
        These options are ignored by the filesystem. They are used only by
        quota tools to recognize volumes where quota should be turned on. See
        documentation in the quota-tools package for more details
        (http://sourceforge.net/projects/linuxquota).

  jqfmt=<quota type>, usrjquota=<file>, grpjquota=<file>
        These options tell the filesystem details about quota so that quota
        information can be properly updated during journal replay. They replace
        the above quota options. See documentation in the quota-tools package
        for more details (http://sourceforge.net/projects/linuxquota).

  stripe=n
        Number of filesystem blocks that mballoc will try to use for allocation
        size and alignment. For RAID5/6 systems this should be the number of
        data disks * RAID chunk size in file system blocks.
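
The stripe=n arithmetic above can be worked through with a small helper. For
example, a 6-disk RAID6 array has 4 data disks; with a 512 KiB chunk and 4 KiB
filesystem blocks, each chunk spans 128 blocks, giving stripe=512. The function
names are hypothetical, and the 4096-byte default block size is an assumption
(check the block size your filesystem was created with).

```python
def raid_data_disks(level: int, total_disks: int) -> int:
    # RAID5 spends one disk's worth of capacity on parity, RAID6 spends two.
    parity = {5: 1, 6: 2}[level]
    return total_disks - parity


def ext4_stripe(data_disks: int, chunk_bytes: int, block_bytes: int = 4096) -> int:
    """stripe=n: number of data disks * RAID chunk size in filesystem blocks."""
    if chunk_bytes % block_bytes:
        raise ValueError("chunk size should be a multiple of the block size")
    chunk_blocks = chunk_bytes // block_bytes
    return data_disks * chunk_blocks


if __name__ == "__main__":
    n = ext4_stripe(raid_data_disks(6, 6), 512 * 1024)
    print(f"stripe={n}")
```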

  delalloc (*)
        Defer block allocation until just before ext4 writes out the block(s)
        in question. This allows ext4 to make better allocation decisions more
        efficiently.

  nodelalloc
        Disable delayed allocation. Blocks are allocated when the data is
        copied from userspace to the page cache, either via the write(2) system
        call or when an mmap'ed page which was previously unallocated is
        written for the first time.

  max_batch_time=usec
        Maximum amount of time ext4 should wait for additional filesystem
        operations to be batched together with a synchronous write operation.
        Since a synchronous write operation is going to force a commit and then
        wait for the I/O to complete, briefly waiting for other operations to
        join costs little and can be a huge throughput win, so ext4 waits for
        a small amount of time to see if any other transactions can piggyback
        on the synchronous write. The algorithm used is designed to
        automatically tune for the speed of the disk, by measuring the amount
        of time (on average) that it takes to finish committing a transaction.
        Call this time the "commit time". If the time that the transaction has
        been running is less than the commit time, ext4 will try sleeping for
        the commit time to see if other operations will join the transaction.
        The commit time is capped by the max_batch_time, which defaults to
        15000us (15ms). This optimization can be turned off entirely by setting
        max_batch_time to 0.

  min_batch_time=usec
        This parameter sets the commit time (as described above) to be at least
        min_batch_time. It defaults to zero microseconds. Increasing this
        parameter may improve the throughput of multi-threaded, synchronous
        workloads on very fast disks, at the cost of increasing latency.
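
The batching rule described by max_batch_time and min_batch_time can be
summarized as a pure function. This is a simplified, hypothetical model of
the behavior as documented above (the function and parameter names are
invented); it ignores locking, timers, and the per-process details of the
real journaling code.

```python
def batch_sleep_us(avg_commit_us: int, running_us: int,
                   min_batch_us: int = 0, max_batch_us: int = 15000) -> int:
    """How long a synchronous writer would wait for other operations to join.

    The "commit time" is the average commit duration, raised to at least
    min_batch_time and capped at max_batch_time. If the transaction has been
    running for less than that, sleep for the commit time; otherwise don't.
    """
    if max_batch_us == 0:
        return 0  # batching turned off entirely
    commit_us = min(max(avg_commit_us, min_batch_us), max_batch_us)
    return commit_us if running_us < commit_us else 0


if __name__ == "__main__":
    # A slow disk whose average commit takes 20ms is capped at the 15ms default.
    print(batch_sleep_us(avg_commit_us=20000, running_us=2000))
```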

  journal_ioprio=prio
        The I/O priority (from 0 to 7, where 0 is the highest priority) which
        should be used for I/O operations submitted by kjournald2 during a
        commit operation. This defaults to 3, which is a slightly higher
        priority than the default I/O priority.

  auto_da_alloc(*), noauto_da_alloc
        Many broken applications don't use fsync() when replacing existing
        files via patterns such as fd = open("foo.new")/write(fd,..)/close(fd)/
        rename("foo.new", "foo"), or worse yet, fd = open("foo",
        O_TRUNC)/write(fd,..)/close(fd). If auto_da_alloc is enabled, ext4
        will detect the replace-via-rename and replace-via-truncate patterns
        and force any delayed allocation blocks to be allocated so that, at
        the next journal commit in the default data=ordered mode, the data
        blocks of the new file are forced to disk before the rename() operation
        is committed. This provides roughly the same level of guarantees as
        ext3, and avoids the "zero-length" problem that can happen when a
        system crashes before the delayed allocation blocks are forced to disk.
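
Applications do not need to rely on the auto_da_alloc heuristic if they call
fsync() themselves. The sketch below shows the replace-via-rename pattern done
durably with Python's ``os`` module: write a temporary file, fsync it, rename
it over the original, then fsync the containing directory. The ".new" suffix
and the function name are illustrative choices, and fsync on a directory file
descriptor is assumed to be supported (it is on Linux).

```python
import os


def replace_file_durably(path: str, data: bytes) -> None:
    """Replace `path` with `data` via write-to-temp, fsync, then rename."""
    tmp = path + ".new"  # illustrative temporary-name convention
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # os.write may write only part
            view = view[written:]
        os.fsync(fd)  # data blocks reach stable storage before the rename
    finally:
        os.close(fd)
    os.replace(tmp, path)  # atomic rename over the old file
    # Persist the directory entry change as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

After a crash, readers see either the complete old file or the complete new
one, never a zero-length file.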

  noinit_itable
        Do not initialize any uninitialized inode table blocks in the
        background. This feature may be used by installation CDs so that the
        install process can complete as quickly as possible; the inode table
        initialization process would then be deferred until the next time the
        file system is mounted without this option.

  init_itable=n
        The lazy itable init code will wait n times the number of milliseconds
        it took to zero out the previous block group's inode table. This
        minimizes the impact on system performance while the file system's
        inode table is being initialized.
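
A consequence of the init_itable=n rule can be worked out directly: if zeroing
one group's inode table takes t ms and the thread then waits n * t ms, each
cycle lasts (n + 1) * t ms, of which only t ms is spent zeroing. With n=10,
for instance, the thread would be busy about 1/11 (roughly 9%) of the time.
The helper below is a hypothetical illustration that ignores competing I/O and
scheduling effects.

```python
def lazy_init_duty_cycle(n: float) -> float:
    """Fraction of wall-clock time spent zeroing inode tables.

    Each cycle is t ms of work followed by n * t ms of waiting, so the work
    fraction is t / ((n + 1) * t) = 1 / (n + 1), independent of t.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 / (n + 1)


if __name__ == "__main__":
    for n in (0, 1, 10):
        print(n, f"{lazy_init_duty_cycle(n):.1%}")
```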

  discard, nodiscard(*)
        Controls whether ext4 should issue discard/TRIM commands to the
        underlying block device when blocks are freed. This is useful for SSD
        devices and sparse/thinly-provisioned LUNs, but it is off by default
        until sufficient testing has been done.

  nouid32
        Disables 32-bit UIDs and GIDs. This is for interoperability with
        older kernels which only store and expect 16-bit values.

  block_validity(*), noblock_validity
        These options enable or disable the in-kernel facility for tracking
        filesystem metadata blocks within internal data structures. This
        allows the multi-block allocator and other routines to notice bugs or
        corrupted allocation bitmaps which cause blocks to be allocated which
        overlap with filesystem metadata blocks.

  dioread_lock, dioread_nolock
        Controls whether or not ext4 should use the DIO read locking. If the
        dioread_nolock option is specified, ext4 will allocate an uninitialized
        extent before the buffer write and convert the extent to initialized
        after the IO completes. This approach allows ext4 code to avoid using
        the inode mutex, which improves scalability on high-speed storage.
        However, this does not work with data journaling, and the
        dioread_nolock option will be ignored with a kernel warning. Note that
        the dioread_nolock code path is only used for extent-based files.
        Because of these restrictions, it is off by default (i.e.
        dioread_lock).

  max_dir_size_kb=n
        This limits the size of directories so that any attempt to expand them
        beyond the specified limit in kilobytes will cause an ENOSPC error.
        This is useful in memory-constrained environments, where a very large
        directory can cause severe performance problems or even provoke the Out
        Of Memory killer. (For example, if there is only 512MB of memory
        available, a 176MB directory may seriously cramp the system's style.)
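
The max_dir_size_kb check can be modeled as a simple size comparison. This is
a hypothetical sketch of the documented behavior, not the kernel's code: the
function name is invented, and treating a limit of 0 as "no limit" is an
assumption that this document does not state.

```python
import errno


def check_dir_growth(current_bytes: int, grow_bytes: int,
                     max_dir_size_kb: int) -> None:
    """Raise ENOSPC if growing a directory would exceed max_dir_size_kb.

    Assumption: max_dir_size_kb == 0 means no limit is enforced.
    """
    if max_dir_size_kb == 0:
        return
    if current_bytes + grow_bytes > max_dir_size_kb * 1024:
        raise OSError(errno.ENOSPC, "directory size limit reached")
```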
| 389 | |
| 390 | i_version |
| 391 | Enable 64-bit inode version support. This option is off by default. |
| 392 | |
| 393 | dax |
| 394 | Use direct access (no page cache). See |
| 395 | Documentation/filesystems/dax.txt. Note that this option is |
| 396 | incompatible with data=journal. |
Ross Zwisler | 923ae0f | 2015-02-16 15:59:38 -0800 | [diff] [blame] | 397 | |
Eric Biggers | 4f74d15 | 2020-07-02 01:56:07 +0000 | [diff] [blame] | 398 | inlinecrypt |
| 399 | When possible, encrypt/decrypt the contents of encrypted files using the |
| 400 | blk-crypto framework rather than filesystem-layer encryption. This |
| 401 | allows the use of inline encryption hardware. The on-disk format is |
| 402 | unaffected. For more details, see |
| 403 | Documentation/block/inline-encryption.rst. |
| 404 | |
Dave Kleikamp | fc513a3 | 2006-10-11 01:21:25 -0700 | [diff] [blame] | 405 | Data Mode |
Jose R. Santos | 93e3270 | 2008-07-11 19:27:31 -0400 | [diff] [blame] | 406 | ========= |
Dave Kleikamp | fc513a3 | 2006-10-11 01:21:25 -0700 | [diff] [blame] | 407 | There are 3 different data modes: |
| 408 | |
| 409 | * writeback mode |
Darrick J. Wong | 489fcb9 | 2018-07-29 15:36:00 -0400 | [diff] [blame] | 410 | |
| 411 | In data=writeback mode, ext4 does not journal data at all. This mode provides |
| 412 | a similar level of journaling as that of XFS, JFS, and ReiserFS in its default |
| 413 | mode - metadata journaling. A crash+recovery can cause incorrect data to |
| 414 | appear in files which were written shortly before the crash. This mode will |
| 415 | typically provide the best ext4 performance. |
Dave Kleikamp | fc513a3 | 2006-10-11 01:21:25 -0700 | [diff] [blame] | 416 | |
* ordered mode

  In data=ordered mode, ext4 only officially journals metadata, but it logically
  groups metadata information related to data changes with the data blocks into
  a single unit called a transaction. When it's time to write the new metadata
  out to disk, the associated data blocks are written first. In general, this
  mode performs slightly slower than writeback but significantly faster than
  journal mode.

* journal mode

  data=journal mode provides full data and metadata journaling. All new data is
  written to the journal first, and then to its final location. In the event of
  a crash, the journal can be replayed, bringing both data and metadata into a
  consistent state. This mode is the slowest, except when data needs to be read
  from and written to disk at the same time, in which case it outperforms all
  other modes. Enabling this mode will disable delayed allocation and O_DIRECT
  support.
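
To check which of these modes a mounted filesystem uses, a program can look for a ``data=`` entry in its mount options string (for example the fourth field of a ``/proc/mounts`` line). The C sketch below assumes that a missing ``data=`` entry means the default ordered mode; that assumption can be wrong if a different default was recorded in the superblock. The enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

enum ext4_data_mode {
	EXT4_DM_WRITEBACK,
	EXT4_DM_ORDERED,
	EXT4_DM_JOURNAL,
	EXT4_DM_UNKNOWN,
};

/* Compare a length-delimited value against a NUL-terminated name. */
static bool value_is(const char *v, size_t vlen, const char *name)
{
	return strlen(name) == vlen && strncmp(v, name, vlen) == 0;
}

/* Return the mode named by the "data=" entry of a comma-separated
 * options string such as "rw,relatime,data=journal".
 * Assumption: no "data=" entry means the default (ordered) mode. */
enum ext4_data_mode parse_data_mode(const char *opts)
{
	const char *p = opts;

	while (p && *p) {
		size_t len = strcspn(p, ",");	/* length of this option */

		if (len >= 5 && strncmp(p, "data=", 5) == 0) {
			const char *v = p + 5;
			size_t vlen = len - 5;

			if (value_is(v, vlen, "writeback"))
				return EXT4_DM_WRITEBACK;
			if (value_is(v, vlen, "ordered"))
				return EXT4_DM_ORDERED;
			if (value_is(v, vlen, "journal"))
				return EXT4_DM_JOURNAL;
			return EXT4_DM_UNKNOWN;
		}
		p += len;
		if (*p == ',')
			p++;	/* skip the separator */
	}
	return EXT4_DM_ORDERED;
}

/* data=journal disables delayed allocation and O_DIRECT support. */
bool data_mode_supports_odirect(enum ext4_data_mode mode)
{
	return mode != EXT4_DM_JOURNAL;
}
```

Note that ``data_err=abort`` is a different option and is deliberately not matched by the ``data=`` prefix check.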

/proc entries
=============

Information about mounted ext4 filesystems can be found in
/proc/fs/ext4. Each mounted filesystem will have a directory in
/proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or
/proc/fs/ext4/dm-0). The files in each per-device directory are shown
in the table below.

Files in /proc/fs/ext4/<devname>

  mb_groups
        Details of the multiblock allocator's buddy cache of free blocks.
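
The per-device files are plain text, so ordinary stdio is enough to inspect them. A minimal C sketch that builds the path for one device and copies a file such as ``mb_groups`` to a stream; the helper names are hypothetical, and reading some of these files may require root.

```c
#include <stdio.h>
#include <string.h>

/* Build "/proc/fs/ext4/<dev>/<name>" in buf; -1 if it does not fit. */
int ext4_proc_path(char *buf, size_t size, const char *dev, const char *name)
{
	int n = snprintf(buf, size, "/proc/fs/ext4/%s/%s", dev, name);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Copy a text file such as mb_groups to out. Returns the number of
 * newline-terminated lines copied, or -1 if the file cannot be opened. */
long dump_text_file(const char *path, FILE *out)
{
	char chunk[512];
	long lines = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	/* fgets may split a long line into several chunks; only the chunk
	 * ending in '\n' completes a line. */
	while (fgets(chunk, sizeof(chunk), f)) {
		fputs(chunk, out);
		if (strchr(chunk, '\n'))
			lines++;
	}
	fclose(f);
	return lines;
}
```

For example, ``ext4_proc_path(buf, sizeof(buf), "dm-0", "mb_groups")`` followed by ``dump_text_file(buf, stdout)`` prints the buddy-cache details for ``dm-0``.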

/sys entries
============

Information about mounted ext4 filesystems can be found in
/sys/fs/ext4. Each mounted filesystem will have a directory in
/sys/fs/ext4 based on its device name (e.g., /sys/fs/ext4/hdc or
/sys/fs/ext4/dm-0). The files in each per-device directory are shown
in the table below.
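
The ``<devname>`` component is the kernel's name for the block device, so it can be derived from the ``st_dev`` of any file on the filesystem. A C sketch under the assumption that ``/sys/dev/block/<major>:<minor>`` is a symlink whose last component is that name (as it is for ordinary partitions and device-mapper devices); the helper names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the last component of a link target such as
 * "../../devices/virtual/block/dm-0" into out. Returns 0 or -1. */
int devname_from_link(const char *target, char *out, size_t size)
{
	const char *slash = strrchr(target, '/');
	const char *name = slash ? slash + 1 : target;

	if (*name == '\0' || strlen(name) >= size)
		return -1;
	strcpy(out, name);
	return 0;
}

/* Assumed lookup: find the block device name (e.g. "sda1" or "dm-0") of
 * the filesystem containing path via /sys/dev/block/<major>:<minor>. */
int ext4_devname_for_path(const char *path, char *out, size_t size)
{
	struct stat st;
	char link[64];
	char target[4096];
	ssize_t n;

	if (stat(path, &st) != 0)
		return -1;
	snprintf(link, sizeof(link), "/sys/dev/block/%u:%u",
		 major(st.st_dev), minor(st.st_dev));
	n = readlink(link, target, sizeof(target) - 1);
	if (n < 0)
		return -1;
	target[n] = '\0';	/* readlink() does not NUL-terminate */
	return devname_from_link(target, out, size);
}
```

For example, ``ext4_devname_for_path("/home", name, sizeof(name))`` might yield ``dm-0``, after which that filesystem's files live under ``/sys/fs/ext4/dm-0/``.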

Files in /sys/fs/ext4/<devname>:

(see also Documentation/ABI/testing/sysfs-fs-ext4)

  delayed_allocation_blocks
        This file is read-only and shows the number of blocks that are dirty in
        the page cache, but which do not have their location in the filesystem
        allocated yet.

  inode_goal
        Tuning parameter which (if non-zero) controls the goal inode used by
        the inode allocator in preference to all other allocation heuristics.
        This is intended for debugging use only, and should be 0 on production
        systems.

  inode_readahead_blks
        Tuning parameter which controls the maximum number of inode table
        blocks that ext4's inode table readahead algorithm will pre-read into
        the buffer cache.

  lifetime_write_kbytes
        This file is read-only and shows the number of kilobytes of data that
        have been written to this filesystem since it was created.

  max_writeback_mb_bump
        The maximum number of megabytes the writeback code will try to write
        out before moving on to another inode.

  mb_group_prealloc
        The multiblock allocator will round up allocation requests to a
        multiple of this tuning parameter if the stripe size is not set in the
        ext4 superblock.

  mb_max_inode_prealloc
        The maximum length of the per-inode ext4_prealloc_space list.

  mb_max_to_scan
        The maximum number of extents the multiblock allocator will search to
        find the best extent.

  mb_min_to_scan
        The minimum number of extents the multiblock allocator will search to
        find the best extent.

  mb_order2_req
        Tuning parameter which controls the minimum size for requests (as a
        power of 2) where the buddy cache is used.

  mb_stats
        Controls whether the multiblock allocator should collect statistics,
        which are shown when the filesystem is unmounted. 1 means to collect
        statistics, 0 means not to collect statistics.

  mb_stream_req
        Files which have fewer blocks than this tunable parameter will have
        their blocks allocated out of a block-group-specific preallocation
        pool, so that small files are packed closely together. Each large file
        will have its blocks allocated out of its own unique preallocation
        pool.

  session_write_kbytes
        This file is read-only and shows the number of kilobytes of data that
        have been written to this filesystem since it was mounted.

  reserved_clusters
        This is a read-write file containing the number of reserved clusters
        in the filesystem, which are used in specific situations to avoid
        costly zeroout, unexpected ENOSPC, or possible data loss. The default
        is 2% or 4096 clusters, whichever is smaller. The value can be
        changed, but it can never exceed the number of clusters in the
        filesystem. If there is not enough space for the reserved clusters
        when the filesystem is mounted, the mount will _not_ fail.
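
Each of the files above holds a single value in text form, so ordinary file I/O is enough to read or tune it. The C sketch below assumes decimal values and that writing a tunable requires root; it also restates the ``reserved_clusters`` default (2% of the clusters, capped at 4096) as arithmetic, ignoring any special cases the kernel may apply. All helper names are hypothetical.

```c
#include <stdio.h>

/* Read one unsigned decimal value, e.g. from
 * /sys/fs/ext4/<dev>/lifetime_write_kbytes. Returns 0 or -1. */
int sysfs_read_u64(const char *path, unsigned long long *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%llu", val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Write a value to a writable tunable such as inode_readahead_blks or
 * reserved_clusters. Returns 0 or -1. */
int sysfs_write_u64(const char *path, unsigned long long val)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%llu\n", val) > 0;
	/* The data is only written when the stream is flushed, so a value
	 * rejected by the kernel shows up as a failing fclose(). */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Default reserved_clusters: 2% of the clusters, capped at 4096. */
unsigned long long default_reserved_clusters(unsigned long long clusters)
{
	unsigned long long two_percent = clusters / 50;

	return two_percent < 4096 ? two_percent : 4096;
}
```

For example, a filesystem with 100000 clusters gets ``default_reserved_clusters(100000) == 2000``, while any filesystem above 204800 clusters hits the 4096 cap.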

Ioctls
======

Ext4 implements various ioctls which can be used by applications to access
ext4-specific functionality. An incomplete list of these ioctls is shown in the
table below. This list includes truly ext4-specific ioctls (``EXT4_IOC_*``) as
well as ioctls that may have been ext4-specific originally but are now supported
by some other filesystem(s) too (``FS_IOC_*``).
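
The ``FS_IOC_*`` ioctls are declared in ``<linux/fs.h>`` together with generic ``FS_*_FL`` flag names, which we assume correspond to the ext4 bits for the user-visible flags. A minimal C sketch of the flag and generation ioctls from the table below; the helper names are hypothetical.

```c
#include <linux/fs.h>
#include <stdbool.h>
#include <sys/ioctl.h>

/* Although FS_IOC_GETFLAGS is declared with a long argument, the
 * kernel reads and writes an int, so always pass an int. */
int get_inode_flags(int fd, int *flags)
{
	return ioctl(fd, FS_IOC_GETFLAGS, flags);
}

/* Read-modify-write: set or clear one flag, leaving the others intact. */
int set_inode_flag(int fd, int flag, bool on)
{
	int flags;

	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) != 0)
		return -1;
	flags = on ? (flags | flag) : (flags & ~flag);
	return ioctl(fd, FS_IOC_SETFLAGS, &flags);
}

/* True if the inode is extent-mapped rather than indirect-block-mapped. */
bool inode_uses_extents(int flags)
{
	return (flags & FS_EXTENT_FL) != 0;
}

/* FS_IOC_GETVERSION returns the inode's i_generation number. */
int get_inode_generation(int fd, int *gen)
{
	return ioctl(fd, FS_IOC_GETVERSION, gen);
}
```

For example, ``set_inode_flag(fd, FS_NOATIME_FL, true)`` stops atime updates for one file. Setting ``FS_IMMUTABLE_FL`` or ``FS_APPEND_FL`` this way additionally requires the ``CAP_LINUX_IMMUTABLE`` capability.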

Table of Ext4 ioctls

  FS_IOC_GETFLAGS
        Get additional attributes associated with the inode. The ioctl argument
        is an integer bitfield, with bit values described in ext4.h.

  FS_IOC_SETFLAGS
        Set additional attributes associated with the inode. The ioctl argument
        is an integer bitfield, with bit values described in ext4.h.

  EXT4_IOC_GETVERSION, EXT4_IOC_GETVERSION_OLD
        Get the i_generation number stored in the inode. The i_generation
        number is normally changed only when a new inode is created, and it is
        particularly useful for network filesystems. The '_OLD' version of this
        ioctl is an alias for FS_IOC_GETVERSION.

  EXT4_IOC_SETVERSION, EXT4_IOC_SETVERSION_OLD
        Set the i_generation number stored in the inode. The '_OLD' version of
        this ioctl is an alias for FS_IOC_SETVERSION.

  EXT4_IOC_GROUP_EXTEND
        This ioctl has the same purpose as the resize mount option. It allows
        the filesystem to be resized up to the end of the last existing block
        group; further resizing has to be done with resize2fs, either online or
        offline. The argument points to an unsigned long number representing
        the new block count of the filesystem.

  EXT4_IOC_MOVE_EXT
        Move the block extents from orig_fd (the file descriptor this ioctl is
        invoked on) to donor_fd (the one specified in the move_extent structure
        passed as an argument to this ioctl). Then, exchange inode metadata
        between orig_fd and donor_fd. This is especially useful for online
        defragmentation, because the allocator has the opportunity to allocate
        the moved blocks better, ideally into one contiguous extent.

  EXT4_IOC_GROUP_ADD
        Add a new group descriptor to an existing or new group descriptor
        block. The new group descriptor is described by the
        ext4_new_group_input structure, which is passed as an argument to this
        ioctl. This is especially useful in conjunction with
        EXT4_IOC_GROUP_EXTEND, which allows online resize of the filesystem to
        the end of the last existing block group. These two ioctls combined are
        used by userspace online resize tools (e.g. resize2fs).

  EXT4_IOC_MIGRATE
        This ioctl operates on the filesystem itself. It converts (migrates) an
        ext3 indirect-block-mapped inode to an ext4 extent-mapped inode by
        walking through the indirect block mapping of the original inode and
        converting contiguous block ranges into ext4 extents of a temporary
        inode. Then, the inodes are swapped. This ioctl might help when
        migrating from ext3 to ext4; however, the suggested approach is to
        create a fresh ext4 filesystem and copy the data from a backup. Note
        that the filesystem has to support extents for this ioctl to work.

  EXT4_IOC_ALLOC_DA_BLKS
        Force all of the delayed-allocation blocks to be allocated to preserve
        application-expected ext3 behaviour. Note that this will also start
        triggering a write of the data blocks, but this behaviour may change in
        the future as it is not necessary and has been done this way only for
        the sake of simplicity.

  EXT4_IOC_RESIZE_FS
        Resize the filesystem to a new size. The new number of blocks is
        passed in via a 64-bit integer argument. The kernel allocates the
        bitmaps and inode tables, so the userspace tool only needs to pass
        the new number of blocks.

  EXT4_IOC_SWAP_BOOT
        Swap i_blocks and associated attributes (like i_size, i_flags, ...)
        of the specified inode with those of inode EXT4_BOOT_LOADER_INO (#5).
        This is typically used to store a boot loader in a secure part of the
        filesystem, where it can't be changed by a normal user by accident.
        The data blocks of the previous boot loader will be associated with
        the given inode.
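
The ``EXT4_IOC_*`` numbers are not exported through ``<linux/fs.h>``, so userspace tools such as e4defrag carry their own copies. The C sketch below assumes the definitions of ``struct move_extent``, ``EXT4_IOC_MOVE_EXT`` and ``EXT4_IOC_RESIZE_FS`` from fs/ext4/ext4.h as shown; verify them against your kernel before relying on them. The wrapper names are hypothetical.

```c
#include <linux/ioctl.h>
#include <linux/types.h>
#include <string.h>
#include <sys/ioctl.h>

/* Assumed to mirror fs/ext4/ext4.h; check against your kernel. */
struct move_extent {
	__u32 reserved;		/* must be zero */
	__u32 donor_fd;		/* donor file descriptor */
	__u64 orig_start;	/* logical start block in orig_fd */
	__u64 donor_start;	/* logical start block in donor_fd */
	__u64 len;		/* number of blocks to move */
	__u64 moved_len;	/* out: number of blocks actually moved */
};

#define EXT4_IOC_MOVE_EXT	_IOWR('f', 15, struct move_extent)
#define EXT4_IOC_RESIZE_FS	_IOW('f', 16, __u64)

/* Move len blocks starting at logical block start from orig_fd into
 * donor_fd, as an online defragmenter would. *moved receives the
 * number of blocks the kernel reports as moved. */
int ext4_move_extents(int orig_fd, int donor_fd, __u64 start, __u64 len,
		      __u64 *moved)
{
	struct move_extent me;
	int ret;

	memset(&me, 0, sizeof(me));
	me.donor_fd = (__u32)donor_fd;
	me.orig_start = start;
	me.donor_start = start;
	me.len = len;
	ret = ioctl(orig_fd, EXT4_IOC_MOVE_EXT, &me);
	if (moved)
		*moved = me.moved_len;
	return ret;
}

/* Resize a mounted filesystem to new_blocks blocks. fd is any open file
 * on that filesystem, e.g. its mount point; this needs root privileges. */
int ext4_resize_fs(int fd, __u64 new_blocks)
{
	return ioctl(fd, EXT4_IOC_RESIZE_FS, &new_blocks);
}
```

An online defragmenter would typically preallocate a donor file of the same size (for example with ``fallocate(2)``), call ``ext4_move_extents(orig_fd, donor_fd, 0, nblocks, &moved)``, and compare ``moved`` with the requested length.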

References
==========

kernel source:  <file:fs/ext4/>
                <file:fs/jbd2/>

programs:       http://e2fsprogs.sourceforge.net/

useful links:   https://fedoraproject.org/wiki/ext3-devel
                http://www.bullopensource.org/ext4/
                http://ext4.wiki.kernel.org/index.php/Main_Page
                https://fedoraproject.org/wiki/Features/Ext4