.. SPDX-License-Identifier: GPL-2.0

========================
ext4 General Information
========================

Ext4 is an advanced version of the ext3 filesystem which incorporates
scalability and reliability enhancements for supporting large filesystems
(64 bit) in keeping with increasing disk capacities and state-of-the-art
feature requirements.

Mailing list: linux-ext4@vger.kernel.org
Web site: http://ext4.wiki.kernel.org


Quick usage instructions
========================

Note: More extensive information for getting started with ext4 can be
found at the ext4 wiki site at the URL:
http://ext4.wiki.kernel.org/index.php/Ext4_Howto

  - The latest version of e2fsprogs can be found at:

    https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/

    or

    http://sourceforge.net/project/showfiles.php?group_id=2406

    or grab the latest git repository from:

    https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git

  - Create a new filesystem using the ext4 filesystem type:

        # mke2fs -t ext4 /dev/hda1

    Or to configure an existing ext3 filesystem to support extents:

        # tune2fs -O extents /dev/hda1

    If the filesystem was created with 128 byte inodes, it can be
    converted to use 256 byte inodes for greater efficiency via:

        # tune2fs -I 256 /dev/hda1

  - Mounting:

        # mount -t ext4 /dev/hda1 /wherever

  - When comparing performance with other filesystems, it's always
    important to try multiple workloads; very often a subtle change in a
    workload parameter can completely change the ranking of which
    filesystems do well compared to others. When comparing versus ext3,
    note that ext4 enables write barriers by default, while ext3 does
    not. So it is useful to explicitly specify whether barriers are
    enabled via the '-o barrier=[0|1]' mount option for both ext3 and
    ext4 filesystems for a fair comparison. When tuning ext3 for best
    benchmark numbers, it is often worthwhile to try changing the data
    journaling mode; '-o data=writeback' can be faster for some
    workloads. (Note however that running mounted with data=writeback
    can potentially leave stale data exposed in recently written files
    in case of an unclean shutdown, which could be a security exposure
    in some situations.) Configuring the filesystem with a large journal
    can also be helpful for metadata-intensive workloads.

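As a minimal sketch of the fair-comparison advice above, the helper below
mounts a filesystem with an explicit barrier setting and data mode, so that
ext3 and ext4 runs use identical options. The helper names ``run`` and
``mount_for_bench``, the ``DRY_RUN`` variable, and the device names and mount
points in the comments are illustrative assumptions, not part of ext4.

```sh
#!/bin/sh
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# mount_for_bench FSTYPE DEVICE MOUNTPOINT BARRIER(0|1) DATA_MODE
# Mounts with barrier= and data= spelled out rather than left to defaults.
mount_for_bench() {
    run mount -t "$1" -o "barrier=$4,data=$5" "$2" "$3"
}

# Example (hypothetical devices; needs root unless DRY_RUN=1):
#   mount_for_bench ext3 /dev/sdb1 /mnt/ext3 1 ordered
#   mount_for_bench ext4 /dev/sdc1 /mnt/ext4 1 ordered
```

Running with ``DRY_RUN=1`` first is a cheap way to confirm that both
filesystems will be mounted with the same option string.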
Features
========

Currently Available
-------------------

* ability to use filesystems > 16TB (e2fsprogs support not available yet)
* extent format reduces metadata overhead (RAM, IO for access, transactions)
* extent format more robust in face of on-disk corruption due to magics
  and internal redundancy in tree
* improved file allocation (multi-block alloc)
* lift 32000 subdirectory limit imposed by i_links_count[1]
* nsec timestamps for mtime, atime, ctime, create time
* inode version field on disk (NFSv4, Lustre)
* reduced e2fsck time via uninit_bg feature
* journal checksumming for robustness, performance
* persistent file preallocation (e.g. for streaming media, databases)
* ability to pack bitmaps and inode tables into larger virtual groups via the
  flex_bg feature
* large file support
* inode allocation using large virtual block groups via flex_bg
* delayed allocation
* large block (up to pagesize) support
* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
  the ordering)
* case-insensitive file name lookups
* file-based encryption support (fscrypt)
* file-based verity support (fsverity)

[1] Filesystems with a block size of 1k may see a limit imposed by the
directory hash tree having a maximum depth of two.

Case-insensitive file name lookups
==================================

The case-insensitive file name lookup feature is supported on a
per-directory basis, allowing the user to mix case-insensitive and
case-sensitive directories in the same filesystem. It is enabled by
flipping the +F inode attribute of an empty directory. The
case-insensitive string match operation is only defined when we know how
text is encoded in a byte sequence. For that reason, in order to enable
case-insensitive directories, the filesystem must have the
casefold feature, which stores the filesystem-wide encoding
model used. By default, the charset adopted is the latest version of
Unicode (12.1.0, by the time of this writing), encoded in the UTF-8
form. The comparison algorithm is implemented by normalizing the
strings to the Canonical decomposition form, as defined by Unicode,
followed by a byte per byte comparison.

The case-awareness is name-preserving on the disk, meaning that the file
name provided by userspace is a byte-per-byte match to what is actually
written to the disk. The Unicode normalization format used by the
kernel is thus an internal representation, and not exposed to
userspace nor to the disk, with the important exception of disk hashes,
used on large case-insensitive directories with the DX feature. On DX
directories, the hash must be calculated using the casefolded version of
the filename, meaning that the normalization format used actually has an
impact on where the directory entry is stored.

When we change from viewing filenames as opaque byte sequences to seeing
them as encoded strings we need to address what happens when a program
tries to create a file with an invalid name. The Unicode subsystem
within the kernel leaves the decision of what to do in this case to the
filesystem, which selects its preferred behavior by enabling/disabling
the strict mode. When Ext4 encounters one of those strings and the
filesystem did not require strict mode, it falls back to considering the
entire string as an opaque byte sequence, which still allows the user to
operate on that file, but the case-insensitive lookups won't work.

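The steps described above (a filesystem with the casefold feature, then +F
on an empty directory) might be scripted roughly as follows. This is a sketch
under assumptions: it presumes an e2fsprogs release whose ``mke2fs`` and
``chattr`` know about casefold and a kernel built with Unicode support; the
helper names, ``DRY_RUN``, and the device and directory names are
illustrative only.

```sh
#!/bin/sh
# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# casefold_setup DEVICE MOUNTPOINT DIRNAME
# WARNING: mke2fs destroys any data on DEVICE.
casefold_setup() {
    # Create the filesystem with the casefold feature enabled.
    run mke2fs -t ext4 -O casefold "$1" &&
    run mount -t ext4 "$1" "$2" &&
    # The directory must still be empty when +F is set.
    run mkdir "$2/$3" &&
    run chattr +F "$2/$3"
}

# Example (hypothetical device; needs root unless DRY_RUN=1):
#   casefold_setup /dev/sdX1 /mnt/cf ci
```

Directories created without +F on the same filesystem stay case-sensitive,
which is the per-directory mixing the text describes.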
Options
=======

When mounting an ext4 filesystem, the following options are accepted:
(*) == default

  ro
        Mount filesystem read only. Note that ext4 will replay the journal (and
        thus write to the partition) even when mounted "read only". The mount
        options "ro,noload" can be used to prevent writes to the filesystem.

  journal_checksum
        Enable checksumming of the journal transactions. This will allow the
        recovery code in e2fsck and the kernel to detect corruption in the
        journal. It is a compatible change and will be ignored by older
        kernels.

  journal_async_commit
        Commit block can be written to disk without waiting for descriptor
        blocks. If enabled, older kernels cannot mount the device. This will
        enable 'journal_checksum' internally.

  journal_path=path, journal_dev=devnum
        When the external journal device's major/minor numbers have changed,
        these options allow the user to specify the new journal location. The
        journal device is identified through either its new major/minor numbers
        encoded in devnum, or via a path to the device.

  norecovery, noload
        Don't load the journal on mounting. Note that if the filesystem was
        not unmounted cleanly, skipping the journal replay will lead to the
        filesystem containing inconsistencies that can lead to any number of
        problems.

  data=journal
        All data are committed into the journal prior to being written into the
        main file system. Enabling this mode will disable delayed allocation
        and O_DIRECT support.

  data=ordered (*)
        All data are forced directly out to the main file system prior to its
        metadata being committed to the journal.

  data=writeback
        Data ordering is not preserved; data may be written into the main file
        system after its metadata has been committed to the journal.

  commit=nrsec (*)
        This setting limits the maximum age of the running transaction to
        'nrsec' seconds. The default value is 5 seconds. This means that if
        you lose your power, you will lose as much as the latest 5 seconds of
        metadata changes (your filesystem will not be damaged though, thanks
        to the journaling). This default value (or any low value) will hurt
        performance, but it's good for data-safety. Setting it to 0 will have
        the same effect as leaving it at the default (5 seconds). Setting it
        to very large values will improve performance. Note that due to
        delayed allocation even older data can be lost on power failure since
        writeback of those data begins only after time set in
        /proc/sys/vm/dirty_expire_centisecs.

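To see how long dirty data may sit in the page cache before writeback even
starts, the value in /proc/sys/vm/dirty_expire_centisecs can be converted to
seconds. The helper names below are hypothetical, and the optional file
argument exists only so the function can be tried against a sample file.

```sh
#!/bin/sh
# Convert a value in centiseconds to whole seconds.
centisecs_to_secs() {
    echo $(( $1 / 100 ))
}

# dirty_expire_secs [FILE]
# Prints the dirty-data expiry time in seconds; FILE defaults to the
# procfs entry named in the commit= description.
dirty_expire_secs() {
    centisecs_to_secs "$(cat "${1:-/proc/sys/vm/dirty_expire_centisecs}")"
}
```

The result is worth comparing with the commit= interval when estimating how
much recent data a power failure could cost.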
  barrier=<0|1(*)>, barrier(*), nobarrier
        This enables/disables the use of write barriers in the jbd code.
        barrier=0 disables, barrier=1 enables. This also requires an IO stack
        which can support barriers, and if jbd gets an error on a barrier
        write, it will disable barriers again with a warning. Write barriers
        enforce proper on-disk ordering of journal commits, making volatile
        disk write caches safe to use, at some performance penalty. If your
        disks are battery-backed in one way or another, disabling barriers may
        safely improve performance. The mount options "barrier" and
        "nobarrier" can also be used to enable or disable barriers, for
        consistency with other ext4 mount options.

  inode_readahead_blks=n
        This tuning parameter controls the maximum number of inode table blocks
        that ext4's inode table readahead algorithm will pre-read into the
        buffer cache. The default value is 32 blocks.

  nouser_xattr
        Disables Extended User Attributes. See the attr(5) manual page for
        more information about extended attributes.

  noacl
        This option disables POSIX Access Control List support. If ACL support
        is enabled in the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL
        is enabled by default on mount. See the acl(5) manual page for more
        information about acl.

  bsddf (*)
        Make 'df' act like BSD.

  minixdf
        Make 'df' act like Minix.

  debug
        Extra debugging information is sent to syslog.

  abort
        Simulate the effects of calling ext4_abort() for debugging purposes.
        This is normally used while remounting a filesystem which is already
        mounted.

  errors=remount-ro
        Remount the filesystem read-only on an error.

  errors=continue
        Keep going on a filesystem error.

  errors=panic
        Panic and halt the machine if an error occurs. (These mount options
        override the errors behavior specified in the superblock, which can be
        configured using tune2fs)

  data_err=ignore(*)
        Just print an error message if an error occurs in a file data buffer in
        ordered mode.

  data_err=abort
        Abort the journal if an error occurs in a file data buffer in ordered
        mode.

  grpid | bsdgroups
        New objects have the group ID of their parent.

  nogrpid (*) | sysvgroups
        New objects have the group ID of their creator.

  resgid=n
        The group ID which may use the reserved blocks.

  resuid=n
        The user ID which may use the reserved blocks.

  sb=
        Use alternate superblock at this location.

  quota, noquota, grpquota, usrquota
        These options are ignored by the filesystem. They are used only by
        quota tools to recognize volumes where quota should be turned on. See
        documentation in the quota-tools package for more details
        (http://sourceforge.net/projects/linuxquota).

  jqfmt=<quota type>, usrjquota=<file>, grpjquota=<file>
        These options tell the filesystem details about quota so that quota
        information can be properly updated during journal replay. They replace
        the above quota options. See documentation in the quota-tools package
        for more details (http://sourceforge.net/projects/linuxquota).

  stripe=n
        Number of filesystem blocks that mballoc will try to use for allocation
        size and alignment. For RAID5/6 systems this should be the number of
        data disks * RAID chunk size in file system blocks.

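The stripe= arithmetic above can be worked through with a small helper. This
is only an illustration of the formula; the function name and the sample
array geometry are assumptions.

```sh
#!/bin/sh
# stripe_blocks DATA_DISKS CHUNK_KIB BLOCK_BYTES
# Prints data disks * RAID chunk size, expressed in filesystem blocks.
stripe_blocks() {
    # Chunk size converted from KiB to filesystem blocks.
    chunk_blocks=$(( $2 * 1024 / $3 ))
    echo $(( $1 * chunk_blocks ))
}

# 4 data disks, 64 KiB chunks, 4096-byte blocks: 4 * 16 blocks
stripe_blocks 4 64 4096   # prints 64
```

Only data disks count: parity disks (one for RAID5, two for RAID6) are left
out of the first argument.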
  delalloc (*)
        Defer block allocation until just before ext4 writes out the block(s)
        in question. This allows ext4 to make better allocation decisions more
        efficiently.

  nodelalloc
        Disable delayed allocation. Blocks are allocated when the data is
        copied from userspace to the page cache, either via the write(2) system
        call or when an mmap'ed page which was previously unallocated is
        written for the first time.

  max_batch_time=usec
        Maximum amount of time ext4 should wait for additional filesystem
        operations to be batched together with a synchronous write operation.
        Since a synchronous write operation is going to force a commit and then
        a wait for the I/O to complete, waiting briefly costs little and can be
        a huge throughput win, so we wait for a small amount of time to see if
        any other transactions can piggyback on the synchronous write. The
        algorithm used is designed to automatically tune for the speed of the
        disk, by measuring the amount of time (on average) that it takes to
        finish committing a transaction. Call this time the "commit time". If
        the time that the transaction has been running is less than the commit
        time, ext4 will try sleeping for the commit time to see if other
        operations will join the transaction. The commit time is capped by
        the max_batch_time, which defaults to 15000us (15ms). This
        optimization can be turned off entirely by setting max_batch_time to 0.

  min_batch_time=usec
        This parameter sets the commit time (as described above) to be at least
        min_batch_time. It defaults to zero microseconds. Increasing this
        parameter may improve the throughput of multi-threaded, synchronous
        workloads on very fast disks, at the cost of increasing latency.

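The batching rule described for max_batch_time and min_batch_time can be
modelled in a few lines. This is a sketch of the description above, not the
kernel's code; the function name and its argument order are assumptions.

```sh
#!/bin/sh
# batch_wait_usec AVG_COMMIT RUNNING MIN_BATCH MAX_BATCH  (all in usec)
# Prints how long a synchronous writer would sleep, or 0 for no wait.
batch_wait_usec() {
    avg=$1; running=$2; min=$3; max=$4

    # max_batch_time=0 turns the optimization off entirely.
    if [ "$max" -eq 0 ]; then
        echo 0
        return 0
    fi

    # The commit time is at least min_batch_time, capped by max_batch_time.
    commit=$avg
    if [ "$commit" -lt "$min" ]; then commit=$min; fi
    if [ "$commit" -gt "$max" ]; then commit=$max; fi

    # Sleep only if the transaction is younger than the commit time.
    if [ "$running" -lt "$commit" ]; then
        echo "$commit"
    else
        echo 0
    fi
}

batch_wait_usec 8000 2000 0 15000   # prints 8000
```

With a slow disk averaging 20000us per commit the same call prints 15000,
showing the default max_batch_time cap in effect.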
  journal_ioprio=prio
        The I/O priority (from 0 to 7, where 0 is the highest priority) which
        should be used for I/O operations submitted by kjournald2 during a
        commit operation. This defaults to 3, which is a slightly higher
        priority than the default I/O priority.

  auto_da_alloc(*), noauto_da_alloc
        Many broken applications don't use fsync() when replacing existing
        files via patterns such as fd = open("foo.new")/write(fd,..)/close(fd)/
        rename("foo.new", "foo"), or worse yet, fd = open("foo",
        O_TRUNC)/write(fd,..)/close(fd). If auto_da_alloc is enabled, ext4
        will detect the replace-via-rename and replace-via-truncate patterns
        and force any delayed allocation blocks to be allocated so that at
        the next journal commit, in the default data=ordered mode, the data
        blocks of the new file are forced to disk before the rename() operation
        is committed. This provides roughly the same level of guarantees as
        ext3, and avoids the "zero-length" problem that can happen when a
        system crashes before the delayed allocation blocks are forced to disk.

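For contrast with the broken patterns above, a replace-via-rename that does
flush the new file first might look like this in shell. It is a sketch that
assumes a ``sync`` accepting file operands (GNU coreutils 8.24 or later) and
that the temporary file lives on the same filesystem as the target; the
function name is hypothetical.

```sh
#!/bin/sh
# replace_file TARGET
# Replaces TARGET with the data read from standard input.
replace_file() {
    target=$1
    tmp="$target.new"

    # Write the new contents next to the target.
    cat > "$tmp" || return 1
    # Flush the new file's data to disk before it takes the old name.
    sync "$tmp" || return 1
    # On the same filesystem, mv renames the file over the target.
    mv -f "$tmp" "$target"
}

# Example:
#   echo "new contents" | replace_file /path/to/foo
```

An application that follows this order does not depend on auto_da_alloc to
avoid zero-length files after a crash.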
  noinit_itable
        Do not initialize any uninitialized inode table blocks in the
        background. This feature may be used by installation CDs so that the
        install process can complete as quickly as possible; the inode table
        initialization process would then be deferred until the next time the
        file system is unmounted.

  init_itable=n
        The lazy itable init code will wait n times the number of milliseconds
        it took to zero out the previous block group's inode table. This
        minimizes the impact on system performance while the file system's
        inode table is being initialized.

  discard, nodiscard(*)
        Controls whether ext4 should issue discard/TRIM commands to the
        underlying block device when blocks are freed. This is useful for SSD
        devices and sparse/thinly-provisioned LUNs, but it is off by default
        until sufficient testing has been done.

  nouid32
        Disables 32-bit UIDs and GIDs. This is for interoperability with
        older kernels which only store and expect 16-bit values.

  block_validity(*), noblock_validity
        These options enable or disable the in-kernel facility for tracking
        filesystem metadata blocks within internal data structures. This
        allows the multi-block allocator and other routines to notice bugs or
        corrupted allocation bitmaps which cause blocks to be allocated which
        overlap with filesystem metadata blocks.

  dioread_lock, dioread_nolock
        Controls whether or not ext4 should use the DIO read locking. If the
        dioread_nolock option is specified, ext4 will allocate an uninitialized
        extent before the buffer write and convert the extent to initialized
        after IO completes. This approach allows ext4 code to avoid using the
        inode mutex, which improves scalability on high-speed storage. However,
        this does not work with data journaling, in which case the
        dioread_nolock option will be ignored with a kernel warning. Note that
        the dioread_nolock code path is only used for extent-based files.
        Because of the restrictions this option imposes, it is off by default
        (i.e. dioread_lock).

  max_dir_size_kb=n
        This limits the size of directories so that any attempt to expand them
        beyond the specified limit in kilobytes will cause an ENOSPC error.
        This is useful in memory constrained environments, where a very large
        directory can cause severe performance problems or even provoke the Out
        Of Memory killer. (For example, if there is only 512MB of memory
        available, a 176MB directory may seriously cramp the system's style.)

  i_version
        Enable 64-bit inode version support. This option is off by default.

  dax
        Use direct access (no page cache). See
        Documentation/filesystems/dax.rst. Note that this option is
        incompatible with data=journal.

  inlinecrypt
        When possible, encrypt/decrypt the contents of encrypted files using the
        blk-crypto framework rather than filesystem-layer encryption. This
        allows the use of inline encryption hardware. The on-disk format is
        unaffected. For more details, see
        Documentation/block/inline-encryption.rst.

Data Mode
=========

There are 3 different data modes:

* writeback mode

  In data=writeback mode, ext4 does not journal data at all. This mode provides
  a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
  mode - metadata journaling. A crash+recovery can cause incorrect data to
  appear in files which were written shortly before the crash. This mode will
  typically provide the best ext4 performance.

* ordered mode

  In data=ordered mode, ext4 only officially journals metadata, but it logically
  groups metadata information related to data changes with the data blocks into
  a single unit called a transaction. When it's time to write the new metadata
  out to disk, the associated data blocks are written first. In general, this
  mode performs slightly slower than writeback but significantly faster than
  journal mode.

* journal mode

  data=journal mode provides full data and metadata journaling. All new data is
  written to the journal first, and then to its final location. In the event of
  a crash, the journal can be replayed, bringing both data and metadata into a
  consistent state. This mode is the slowest except when data needs to be read
  from and written to disk at the same time, where it outperforms all other
  modes. Enabling this mode will disable delayed allocation and O_DIRECT
  support.

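When checking which of the three modes a mount is using, the data= setting
can be picked out of a comma-separated option string. The sketch below falls
back to ordered, the documented default, when no data= option is present; it
assumes the default has not been changed elsewhere (for example in the
superblock), and the function name is hypothetical.

```sh
#!/bin/sh
# data_mode_of OPTIONS
# OPTIONS is a comma-separated mount option string such as "rw,data=journal".
data_mode_of() {
    mode=ordered              # documented default when data= is absent
    old_ifs=$IFS
    IFS=,
    for opt in $1; do
        case $opt in
            data=*) mode=${opt#data=} ;;
        esac
    done
    IFS=$old_ifs
    echo "$mode"
}

data_mode_of "rw,relatime,data=writeback"   # prints writeback
```

Options such as data_err=abort are deliberately not matched, since only the
exact ``data=`` prefix selects the journaling mode.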
/proc entries
=============

Information about mounted ext4 file systems can be found in
/proc/fs/ext4. Each mounted filesystem will have a directory in
/proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or
/proc/fs/ext4/dm-0). The files in each per-device directory are shown
in the table below.

Files in /proc/fs/ext4/<devname>:

  mb_groups
        details of multiblock allocator buddy cache of free blocks

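A short loop is enough to list the per-device directories described above.
The function name is hypothetical, and the optional base-directory argument
exists only so the loop can be tried against a sample tree.

```sh
#!/bin/sh
# list_ext4_devs [BASEDIR]
# Prints one device name per line; BASEDIR defaults to /proc/fs/ext4.
list_ext4_devs() {
    base=${1:-/proc/fs/ext4}
    for d in "$base"/*/; do
        # Skip the unexpanded pattern when nothing is mounted.
        [ -d "$d" ] || continue
        basename "$d"
    done
}

list_ext4_devs
```

Each printed name can then be used to read the corresponding mb_groups file,
for example ``head /proc/fs/ext4/<devname>/mb_groups``.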
				450	/sys entries
				451	============
				452
				453	Information about mounted ext4 file systems can be found in
				454	/sys/fs/ext4. Each mounted filesystem will have a directory in
				455	/sys/fs/ext4 based on its device name (e.g., /sys/fs/ext4/hdc or
				456	/sys/fs/ext4/dm-0). The files in each per-device directory are shown
				457	in the table below.
				458
Darrick J. Wong	489fcb9	2018-07-29 15:36:00 -0400	[diff] [blame]	459	Files in /sys/fs/ext4/<devname>:
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	460
Darrick J. Wong	489fcb9	2018-07-29 15:36:00 -0400	[diff] [blame]	461	(see also Documentation/ABI/testing/sysfs-fs-ext4)
				462
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	463	delayed_allocation_blocks
				464	This file is read-only and shows the number of blocks that are dirty in
				465	the page cache, but which do not have their location in the filesystem
				466	allocated yet.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	467
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	468	inode_goal
				469	Tuning parameter which (if non-zero) controls the goal inode used by
				470	the inode allocator in preference to all other allocation heuristics.
				471	This is intended for debugging use only, and should be 0 on production
				472	systems.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	473
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	474	inode_readahead_blks
				475	Tuning parameter which controls the maximum number of inode table
				476	blocks that ext4's inode table readahead algorithm will pre-read into
				477	the buffer cache.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	478
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	479	lifetime_write_kbytes
				480	This file is read-only and shows the number of kilobytes of data that
				481	have been written to this filesystem since it was created.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	482
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	483	max_writeback_mb_bump
				484	The maximum number of megabytes the writeback code will try to write
				485	out before moving on to another inode.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	486
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	487	mb_group_prealloc
				488	The multiblock allocator will round up allocation requests to a
				489	multiple of this tuning parameter if the stripe size is not set in the
				490	ext4 superblock.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	491
brookxu	27bc446	2020-08-17 15:36:15 +0800	[diff] [blame]	492	mb_max_inode_prealloc
				493	The maximum length of per-inode ext4_prealloc_space list.
				494
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	495	mb_max_to_scan
				496	The maximum number of extents the multiblock allocator will search to
				497	find the best extent.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	498
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	499	mb_min_to_scan
				500	The minimum number of extents the multiblock allocator will search to
				501	find the best extent.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	502
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	503	mb_order2_req
				504	Tuning parameter which controls the minimum size for requests (as a
				505	power of 2) where the buddy cache is used.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	506
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	507	mb_stats
				508	Controls whether the multiblock allocator should collect statistics,
				509	which are shown during the unmount. 1 means to collect statistics, 0
				510	means not to collect statistics.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	511
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	512	mb_stream_req
				513	Files which have fewer blocks than this tunable parameter will have
				514	their blocks allocated out of a block group specific preallocation
				515	pool, so that small files are packed closely together. Each large file
				516	will have its blocks allocated out of its own unique preallocation
				517	pool.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	518
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	519	session_write_kbytes
				520	This file is read-only and shows the number of kilobytes of data that
				521	have been written to this filesystem since it was mounted.
Lukas Czerner	27dd438	2013-04-09 22:11:22 -0400	[diff] [blame]	522
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	523	reserved_clusters
				524	This file is read-write and contains the number of reserved clusters in the
				525	file system, which are used in specific situations to avoid costly
				526	zeroout, unexpected ENOSPC, or possible data loss. The default is 2% or
				527	4096 clusters, whichever is smaller. This can be changed, but it
				528	can never exceed the number of clusters in the file system. If there is not
				529	enough space for the reserved clusters when mounting the file system, the
				530	mount will _not_ fail.
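
The per-device files listed above are plain text attributes, so they can be inspected with ordinary file reads. The following is a minimal sketch rather than a supported tool: the helper names ``read_ext4_tunables`` and ``default_reserved_clusters`` are hypothetical, and the integer rounding used for the 2% default is an assumption made for illustration.

```python
from pathlib import Path


def read_ext4_tunables(devname, root="/sys/fs/ext4"):
    """Return {attribute name: text value} for one mounted ext4 device.

    `devname` is the per-device directory name, e.g. "dm-0".
    Attributes that cannot be read by the caller are skipped.
    """
    values = {}
    for entry in sorted(Path(root, devname).iterdir()):
        if not entry.is_file():
            continue
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            # Some attributes may be write-only or restricted to root.
            continue
    return values


def default_reserved_clusters(total_clusters):
    """Default described above: 2% of all clusters or 4096, whichever is smaller.

    Assumption: the 2% figure is rounded down to a whole cluster.
    """
    return min(total_clusters // 50, 4096)
```

For example, ``read_ext4_tunables("dm-0").get("session_write_kbytes")`` would return the kilobytes written since mount as a string, provided a filesystem with that device name is mounted. Writable tunables such as ``mb_stats`` can be changed by writing a new value to the same path as root.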
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	531
				532	Ioctls
				533	======
				534
Eric Biggers	cb29a02	2020-07-14 16:09:09 -0700	[diff] [blame]	535	Ext4 implements various ioctls which can be used by applications to access
				536	ext4-specific functionality. An incomplete list of these ioctls is shown in the
				537	table below. This list includes truly ext4-specific ioctls (``EXT4_IOC_*``) as
				538	well as ioctls that may have been ext4-specific originally but are now supported
				539	by some other filesystem(s) too (``FS_IOC_*``).
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	540
Eric Biggers	cb29a02	2020-07-14 16:09:09 -0700	[diff] [blame]	541	Table of Ext4 ioctls
Darrick J. Wong	489fcb9	2018-07-29 15:36:00 -0400	[diff] [blame]	542
Eric Biggers	cb29a02	2020-07-14 16:09:09 -0700	[diff] [blame]	543	FS_IOC_GETFLAGS
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	544	Get additional attributes associated with inode. The ioctl argument is
Eric Biggers	cb29a02	2020-07-14 16:09:09 -0700	[diff] [blame]	545	an integer bitfield, with bit values described in ext4.h.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	546
Eric Biggers	cb29a02	2020-07-14 16:09:09 -0700	[diff] [blame]	547	FS_IOC_SETFLAGS
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	548	Set additional attributes associated with inode. The ioctl argument is
Eric Biggers	cb29a02	2020-07-14 16:09:09 -0700	[diff] [blame]	549	an integer bitfield, with bit values described in ext4.h.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	550
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	551	EXT4_IOC_GETVERSION, EXT4_IOC_GETVERSION_OLD
				552	Get the inode i_generation number stored for each inode. The
				553	i_generation number is normally changed only when a new inode is created
				554	and it is particularly useful for network filesystems. The '_OLD'
				555	version of this ioctl is an alias for FS_IOC_GETVERSION.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	556
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	557	EXT4_IOC_SETVERSION, EXT4_IOC_SETVERSION_OLD
				558	Set the inode i_generation number stored for each inode. The '_OLD'
				559	version of this ioctl is an alias for FS_IOC_SETVERSION.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	560
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	561	EXT4_IOC_GROUP_EXTEND
				562	This ioctl has the same purpose as the resize mount option. It allows
				563	resizing the filesystem to the end of the last existing block group;
				564	further resizing has to be done with resize2fs, either online or
				565	offline. The argument points to the unsigned long number representing
				566	the new block count of the filesystem.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	567
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	568	EXT4_IOC_MOVE_EXT
				569	Move the block extents from orig_fd (the one this ioctl is pointing to)
				570	to the donor_fd (the one specified in move_extent structure passed as
				571	an argument to this ioctl). Then, exchange inode metadata between
				572	orig_fd and donor_fd. This is especially useful for online
				573	defragmentation, because the allocator has the opportunity to allocate
				574	moved blocks better, ideally into one contiguous extent.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	575
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	576	EXT4_IOC_GROUP_ADD
				577	Add a new group descriptor to an existing or new group descriptor
				578	block. The new group descriptor is described by the ext4_new_group_input
				579	structure, which is passed as an argument to this ioctl. This is
				580	especially useful in conjunction with EXT4_IOC_GROUP_EXTEND, which
				581	allows online resize of the filesystem to the end of the last existing
				582	block group. These two ioctls combined are used in userspace online
				583	resize tools (e.g. resize2fs).
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	584
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	585	EXT4_IOC_MIGRATE
				586	This ioctl operates on the filesystem itself. It converts (migrates)
				587	an ext3 indirect block mapped inode to an ext4 extent mapped inode by walking
				588	through the indirect block mapping of the original inode and converting
				589	contiguous block ranges into ext4 extents of a temporary inode. Then, the
				590	inodes are swapped. This ioctl might help when migrating from ext3 to
				591	ext4; however, the suggestion is to create a fresh ext4 filesystem
				592	and copy data from the backup. Note that the filesystem has to support
				593	extents for this ioctl to work.
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	594
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	595	EXT4_IOC_ALLOC_DA_BLKS
				596	Force all of the delay allocated blocks to be allocated to preserve
				597	application-expected ext3 behaviour. Note that this will also start
				598	triggering a write of the data blocks, but this behaviour may change in
				599	the future as it is not necessary and has been done this way only for
				600	the sake of simplicity.
Yongqiang Yang	19c5246	2012-01-04 17:09:44 -0500	[diff] [blame]	601
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	602	EXT4_IOC_RESIZE_FS
				603	Resize the filesystem to a new size. The number of blocks of the resized
				604	filesystem is passed in via a 64-bit integer argument. The kernel
				605	allocates the bitmaps and inode table, so the userspace tool just passes
				606	the new number of blocks.
Yongqiang Yang	19c5246	2012-01-04 17:09:44 -0500	[diff] [blame]	607
Darrick J. Wong	c0e3e04	2018-10-02 22:45:25 -0400	[diff] [blame]	608	EXT4_IOC_SWAP_BOOT
				609	Swap the data blocks and associated attributes (like i_blocks, i_size,
				610	i_flags, ...) from the specified inode with inode EXT4_BOOT_LOADER_INO
				611	(#5). This is typically used to store a boot loader in a secure part of
				612	the filesystem, where it can't be changed by a normal user by accident.
				613	The data blocks of the previous boot loader will be associated with the
				614	given inode.
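
As an illustration of how an application might call one of the ioctls in the table, the sketch below reads the inode flags of a file with ``FS_IOC_GETFLAGS``. It is a hedged example and not part of the ext4 interface itself: the helper names (``_ioc``, ``get_flags``, ``decode_flags``) are hypothetical, the request numbers are computed assuming the generic Linux ioctl encoding used on x86 and ARM (some architectures encode the direction bits differently), and only a handful of the flag bits from ext4.h are listed.

```python
import fcntl
import struct

# Assumption: generic Linux ioctl number layout (x86, arm, arm64, riscv):
# direction in bits 30-31, size in bits 16-29, type in bits 8-15, nr in bits 0-7.
_IOC_WRITE = 1
_IOC_READ = 2


def _ioc(direction, type_char, nr, size):
    """Build an ioctl request number from its four fields."""
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


# Both requests are declared with a C "long" argument in the kernel headers.
LONG_SIZE = struct.calcsize("l")
FS_IOC_GETFLAGS = _ioc(_IOC_READ, "f", 1, LONG_SIZE)
FS_IOC_SETFLAGS = _ioc(_IOC_WRITE, "f", 2, LONG_SIZE)

# A small, incomplete subset of the flag bits defined in ext4.h.
FLAG_NAMES = {
    0x00000010: "immutable",
    0x00000020: "append-only",
    0x00000040: "nodump",
    0x00000080: "noatime",
    0x00080000: "extents",
}


def decode_flags(flags):
    """Return the names of the known bits set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(FLAG_NAMES.items()) if flags & bit]


def get_flags(path):
    """Return the inode flag bitfield of `path` as an integer."""
    buf = bytearray(LONG_SIZE)
    with open(path, "rb") as f:
        # The kernel fills the buffer in place.
        fcntl.ioctl(f.fileno(), FS_IOC_GETFLAGS, buf)
    # The kernel stores a 32-bit value, so only the first four bytes are read.
    return struct.unpack_from("I", buf)[0]
```

Calling ``decode_flags(get_flags("/path/to/file"))`` on an extent-mapped ext4 file would be expected to include ``"extents"``; on filesystems that do not implement this ioctl the call raises ``OSError``.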
Lukas Czerner	6f9524e	2011-02-21 20:16:21 -0500	[diff] [blame]	615
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	616	References
				617	==========
				618
				619	kernel source: <file:fs/ext4/>
				620	<file:fs/jbd2/>
				621
				622	programs: http://e2fsprogs.sourceforge.net/
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	623
Alexander A. Klimov	6b2484e	2020-06-27 09:29:35 +0200	[diff] [blame]	624	useful links: https://fedoraproject.org/wiki/ext3-devel
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	625	http://www.bullopensource.org/ext4/
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	626	http://ext4.wiki.kernel.org/index.php/Main_Page
Alexander A. Klimov	6b2484e	2020-06-27 09:29:35 +0200	[diff] [blame]	627	https://fedoraproject.org/wiki/Features/Ext4