Blame - Documentation/filesystems/overlayfs.rst - SHIFTPHONES/mainline/linux

Amir Goldstein	35c6cb4	2019-11-25 11:51:25 +0200	[diff] [blame]	1	.. SPDX-License-Identifier: GPL-2.0
				2
NeilBrown	a907c90	2015-11-07 17:38:58 +1100	[diff] [blame]	3	Written by: Neil Brown
				4	Please see MAINTAINERS file for where to send questions.
Neil Brown	7c37fbd	2014-10-24 00:14:39 +0200	[diff] [blame]	5
				6	Overlay Filesystem
				7	==================
				8
				9	This document describes a prototype for a new approach to providing
				10	overlay-filesystem functionality in Linux (sometimes referred to as
				11	union-filesystems). An overlay-filesystem tries to present a
				12	filesystem which is the result of overlaying one filesystem on top
				13	of the other.
				14
Amir Goldstein	1614901	2018-03-29 16:36:56 +0300	[diff] [blame]	15
				16	Overlay objects
				17	---------------
				18
				19	The overlay filesystem approach is 'hybrid', because the objects that
				20	appear in the filesystem do not always appear to belong to that filesystem.
				21	In many cases, an object accessed in the union will be indistinguishable
Neil Brown	7c37fbd	2014-10-24 00:14:39 +0200	[diff] [blame]	22	from accessing the corresponding object from the original filesystem.
				23	This is most obvious from the 'st_dev' field returned by stat(2).
				24
				25	While directories will report an st_dev from the overlay-filesystem,
Amir Goldstein	65f2673	2017-04-25 17:28:31 +0300	[diff] [blame]	26	non-directory objects may report an st_dev from the lower filesystem or
Neil Brown	7c37fbd	2014-10-24 00:14:39 +0200	[diff] [blame]	27	upper filesystem that is providing the object. Similarly st_ino will
				28	only be unique when combined with st_dev, and both of these can change
				29	over the lifetime of a non-directory object. Many applications and
				30	tools ignore these values and will not be affected.
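
A minimal Python sketch of how a tool might observe these values, using only `os.stat`. The helper names `object_identity` and `same_device` are hypothetical and chosen for illustration. On an overlay mount whose layers are on different filesystems, comparing a directory with a regular file may report different `st_dev` values.

```python
import os

def object_identity(path):
    """Return the (st_dev, st_ino) pair reported by stat(2) for path."""
    st = os.stat(path, follow_symlinks=False)
    return (st.st_dev, st.st_ino)

def same_device(path_a, path_b):
    """True if both paths report the same st_dev."""
    return object_identity(path_a)[0] == object_identity(path_b)[0]
```

A scanner that relies on `(st_dev, st_ino)` to detect hard links or filesystem boundaries is the kind of tool that can be affected by the behavior described above.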
				31
Amir Goldstein	65f2673	2017-04-25 17:28:31 +0300	[diff] [blame]	32	In the special case of all overlay layers on the same underlying
				33	filesystem, all objects will report an st_dev from the overlay
				34	filesystem and st_ino from the underlying filesystem. This will
				35	make the overlay mount more compliant with filesystem scanners and
				36	overlay objects will be distinguishable from the corresponding
				37	objects in the original filesystem.
				38
Amir Goldstein	1614901	2018-03-29 16:36:56 +0300	[diff] [blame]	39	On 64bit systems, even if all overlay layers are not on the same
				40	underlying filesystem, the same compliant behavior could be achieved
				41	with the "xino" feature. The "xino" feature composes a unique object
				42	identifier from the real object st_ino and an underlying fsid index.
Amir Goldstein	b0e0f69	2021-03-09 18:26:54 +0200	[diff] [blame]	43	The "xino" feature uses the high inode number bits for fsid, because the
				44	underlying filesystems rarely use the high inode number bits. In case
Amir Goldstein	2eda9ea	2020-02-21 16:34:46 +0200	[diff] [blame]	45	the underlying inode number does overflow into the high xino bits, overlay
				46	filesystem will fall back to the non xino behavior for that inode.
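
The idea of packing an fsid into the high inode number bits can be sketched as follows. This is an illustration only: the number of reserved bits (`XINO_BITS`), the exact bit positions and the function name `compose_xino` are assumptions made for the example, not the kernel's actual layout.

```python
INO_BITS = 64
XINO_BITS = 2  # assumed number of high bits reserved for the fsid

def compose_xino(real_ino, fsid, xino_bits=XINO_BITS):
    """Pack fsid into the high bits of real_ino.

    Returns None when real_ino already uses the reserved high bits,
    which models the fallback to non-xino behavior for that inode.
    """
    if fsid >= (1 << xino_bits):
        raise ValueError("fsid does not fit in the reserved bits")
    low_bits = INO_BITS - xino_bits
    if real_ino >> low_bits:
        # Overflow into the reserved bits: no composed number is produced.
        return None
    return (fsid << low_bits) | real_ino
```

Two objects with the same real inode number on different underlying filesystems get different composed numbers because their fsid differs.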
				47
Amir Goldstein	b0e0f69	2021-03-09 18:26:54 +0200	[diff] [blame]	48	The "xino" feature can be enabled with the "-o xino=on" overlay mount option.
				49	If all underlying filesystems support NFS file handles, the value of st_ino
				50	for overlay filesystem objects is not only unique, but also persistent over
				51	the lifetime of the filesystem. The "-o xino=auto" overlay mount option
				52	enables the "xino" feature only if the persistent st_ino requirement is met.
				53
Amir Goldstein	2eda9ea	2020-02-21 16:34:46 +0200	[diff] [blame]	54	The following table summarizes what can be expected in different overlay
				55	configurations.
				56
				57	Inode properties
				58	````````````````
				59
				60	+--------------+------------+------------+-----------------+----------------+
				61	|Configuration | Persistent | Uniform    | st_ino == d_ino | d_ino == i_ino |
				62	|              | st_ino     | st_dev     |                 | [*]            |
				63	+==============+=====+======+=====+======+========+========+========+=======+
				64	|              | dir | !dir | dir | !dir |  dir   |  !dir  |  dir   | !dir  |
				65	+--------------+-----+------+-----+------+--------+--------+--------+-------+
				66	| All layers   |  Y  |  Y   |  Y  |  Y   |  Y     |   Y    |  Y     |  Y    |
				67	| on same fs   |     |      |     |      |        |        |        |       |
				68	+--------------+-----+------+-----+------+--------+--------+--------+-------+
Amir Goldstein	b0e0f69	2021-03-09 18:26:54 +0200	[diff] [blame]	69	| Layers not   |  N  |  N   |  Y  |  N   |  N     |   Y    |  N     |  Y    |
Amir Goldstein	2eda9ea	2020-02-21 16:34:46 +0200	[diff] [blame]	70	| on same fs,  |     |      |     |      |        |        |        |       |
				71	| xino=off     |     |      |     |      |        |        |        |       |
				72	+--------------+-----+------+-----+------+--------+--------+--------+-------+
				73	| xino=on/auto |  Y  |  Y   |  Y  |  Y   |  Y     |   Y    |  Y     |  Y    |
Amir Goldstein	2eda9ea	2020-02-21 16:34:46 +0200	[diff] [blame]	74	+--------------+-----+------+-----+------+--------+--------+--------+-------+
Amir Goldstein	b0e0f69	2021-03-09 18:26:54 +0200	[diff] [blame]	75	| xino=on/auto,|  N  |  N   |  Y  |  N   |  N     |   Y    |  N     |  Y    |
Amir Goldstein	2eda9ea	2020-02-21 16:34:46 +0200	[diff] [blame]	76	| ino overflow |     |      |     |      |        |        |        |       |
				77	+--------------+-----+------+-----+------+--------+--------+--------+-------+
				78
				79	[*] nfsd v3 readdirplus verifies d_ino == i_ino. i_ino is exposed via several
				80	/proc files, such as /proc/locks and /proc/self/fdinfo/<fd> of an inotify
				81	file descriptor.
Amir Goldstein	1614901	2018-03-29 16:36:56 +0300	[diff] [blame]	82
Neil Brown	7c37fbd	2014-10-24 00:14:39 +0200	[diff] [blame]	83	Upper and Lower
				84	---------------
				85
				86	An overlay filesystem combines two filesystems - an 'upper' filesystem
				87	and a 'lower' filesystem. When a name exists in both filesystems, the
				88	object in the 'upper' filesystem is visible while the object in the
				89	'lower' filesystem is either hidden or, in the case of directories,
				90	merged with the 'upper' object.
				91
				92	It would be more correct to refer to an upper and lower 'directory
				93	tree' rather than 'filesystem' as it is quite possible for both
				94	directory trees to be in the same filesystem and there is no
				95	requirement that the root of a filesystem be given for either upper or
				96	lower.
				97
Miklos Szeredi	58afaf5	2020-11-12 11:31:55 +0100	[diff] [blame]	98	A wide range of filesystems supported by Linux can be the lower filesystem,
				99	but not all filesystems that are mountable by Linux have the features
				100	needed for OverlayFS to work. The lower filesystem does not need to be
				101	writable. The lower filesystem can even be another overlayfs. The upper
				102	filesystem will normally be writable and if it is it must support the
Miklos Szeredi	2d2f2d7	2020-12-14 15:26:14 +0100	[diff] [blame]	103	creation of trusted.* and/or user.* extended attributes, and must provide
				104	valid d_type in readdir responses, so NFS is not suitable.
Neil Brown	7c37fbd	2014-10-24 00:14:39 +0200	[diff] [blame]	105
				106	A read-only overlay of two read-only filesystems may use any
				107	filesystem type.
				108
				109	Directories
				110	-----------
				111
				112	Overlaying mainly involves directories. If a given name appears in both
				113	upper and lower filesystems and refers to a non-directory in either,
				114	then the lower object is hidden - the name refers only to the upper
				115	object.
				116
				117	Where both upper and lower objects are directories, a merged directory
				118	is formed.
				119
				120	At mount time, the two directories given as mount options "lowerdir" and
				121	"upperdir" are combined into a merged directory:
				122
Miklos Szeredi	ef94b18	2014-11-20 16:39:59 +0100	[diff] [blame]	123	mount -t overlay overlay -olowerdir=/lower,upperdir=/upper,\
Amir Goldstein	c3c8699	2016-12-08 09:49:51 +0200	[diff] [blame]	124	workdir=/work /merged
Neil Brown	7c37fbd	2014-10-24 00:14:39 +0200	[diff] [blame]	125
				126	The "workdir" needs to be an empty directory on the same filesystem
				127	as upperdir.
				128
				129	Then whenever a lookup is requested in such a merged directory, the
				130	lookup is performed in each actual directory and the combined result
				131	is cached in the dentry belonging to the overlay filesystem. If both
				132	actual lookups find directories, both are stored and a merged
				133	directory is created, otherwise only one is stored: the upper if it
				134	exists, else the lower.
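
The lookup rule above can be modeled with a small Python sketch. It is a simplified model, not kernel code: the function name `overlay_lookup` and the string type tags are hypothetical, and whiteouts and opaque directories are deliberately left out.

```python
def overlay_lookup(upper, lower):
    """Decide which real objects back an overlay dentry.

    upper and lower are "dir", "file" or None (name absent in that layer).
    Returns a tuple naming the layers whose objects are stored.
    """
    if upper == "dir" and lower == "dir":
        return ("upper", "lower")  # both stored: merged directory
    if upper is not None:
        return ("upper",)          # upper hides whatever lower has
    if lower is not None:
        return ("lower",)
    return ()                      # name exists in neither layer
```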
				135
				136	Only the lists of names from directories are merged. Other content
				137	such as metadata and extended attributes are reported for the upper
				138	directory only. These attributes of the lower directory are hidden.
				139
				140	whiteouts and opaque directories
				141	--------------------------------
				142
				143	In order to support rm and rmdir without changing the lower
				144	filesystem, an overlay filesystem needs to record in the upper filesystem
				145	that files have been removed. This is done using whiteouts and opaque
				146	directories (non-directories are always opaque).
				147
				148	A whiteout is created as a character device with 0/0 device number.
				149	When a whiteout is found in the upper level of a merged directory, any
				150	matching name in the lower level is ignored, and the whiteout itself
				151	is also hidden.
				152
				153	A directory is made opaque by setting the xattr "trusted.overlay.opaque"
				154	to "y". Where the upper filesystem contains an opaque directory, any
				155	directory in the lower filesystem with the same name is ignored.
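
As an illustration, a userspace tool inspecting an upper directory could recognize these markers roughly as follows. The helper names are hypothetical. The checks only restate the two rules above (character device with device number 0/0, xattr value "y").

```python
import os
import stat

def is_whiteout(st_mode, st_rdev):
    """True for a character device whose device number is 0/0."""
    return (stat.S_ISCHR(st_mode)
            and os.major(st_rdev) == 0
            and os.minor(st_rdev) == 0)

def is_opaque_value(xattr_value):
    """True if the value of "trusted.overlay.opaque" marks a directory opaque."""
    return xattr_value == b"y"
```

The inputs would typically come from `os.lstat(path)` and, on Linux, `os.getxattr(path, "trusted.overlay.opaque")`. Reading "trusted." xattrs normally requires privileges.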
				156
				157	readdir
				158	-------
				159
				160	When a 'readdir' request is made on a merged directory, the upper and
				161	lower directories are each read and the name lists merged in the
				162	obvious way (upper is read first, then lower - entries that already
				163	exist are not re-added). This merged name list is cached in the
				164	'struct file' and so remains as long as the file is kept open. If the
				165	directory is opened and read by two processes at the same time, they
				166	will each have separate caches. A seekdir to the start of the
				167	directory (offset 0) followed by a readdir will cause the cache to be
				168	discarded and rebuilt.
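
A simplified Python model of the name-list merge described above. The function name `merge_names` and the explicit `whiteouts` argument are assumptions made for the example. The real implementation works on directory entries, not plain strings.

```python
def merge_names(upper_names, lower_names, whiteouts=()):
    """Merge two name lists: upper first, then lower, without duplicates.

    Names listed in whiteouts are hidden from the result entirely.
    """
    hidden = set(whiteouts)
    seen = set()
    merged = []
    for name in list(upper_names) + list(lower_names):
        if name in hidden or name in seen:
            continue
        seen.add(name)
        merged.append(name)
    return merged
```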
				169
				170	This means that changes to the merged directory do not appear while a
				171	directory is being read. This is unlikely to be noticed by many
				172	programs.
				173
				174	Seek offsets are assigned sequentially when the directories are read.
				175	Thus if the following steps are performed
Amir Goldstein	c3c8699	2016-12-08 09:49:51 +0200	[diff] [blame]	176
Neil Brown	7c37fbd	2014-10-24 00:14:39 +0200	[diff] [blame]	177	- read part of a directory
				178	- remember an offset, and close the directory
				179	- re-open the directory some time later
				180	- seek to the remembered offset
				181
				182	there may be little correlation between the old and new locations in
				183	the list of filenames, particularly if anything has changed in the
				184	directory.
				185
				186	Readdir on directories that are not merged is simply handled by the
				187	underlying directory (upper or lower).
				188
Miklos Szeredi	a6c6065	2016-12-16 11:02:56 +0100	[diff] [blame]	189	renaming directories
				190	--------------------
				191
				192	When renaming a directory that is on the lower layer or merged (i.e. the
				193	directory was not created on the upper layer to start with) overlayfs can
				194	handle it in two different ways:
				195
Amir Goldstein	c3c8699	2016-12-08 09:49:51 +0200	[diff] [blame]	196	1. return EXDEV error: this error is returned by rename(2) when trying to
Miklos Szeredi	a6c6065	2016-12-16 11:02:56 +0100	[diff] [blame]	197	move a file or directory across filesystem boundaries. Hence
				198	applications are usually prepared to handle this error (mv(1) for example
				199	recursively copies the directory tree). This is the default behavior.
				200
Amir Goldstein	c3c8699	2016-12-08 09:49:51 +0200	[diff] [blame]	201	2. If the "redirect_dir" feature is enabled, then the directory will be
Miklos Szeredi	a6c6065	2016-12-16 11:02:56 +0100	[diff] [blame]	202	copied up (but not the contents). Then the "trusted.overlay.redirect"
				203	extended attribute is set to the path of the original location from the
				204	root of the overlay. Finally the directory is moved to the new
				205	location.
Neil Brown	7c37fbd	2014-10-24 00:14:39 +0200	[diff] [blame]	206
Miklos Szeredi	438c84c	2017-12-11 11:28:10 +0100	[diff] [blame]	207	There are several ways to tune the "redirect_dir" feature.
				208
				209	Kernel config options:
				210
				211	- OVERLAY_FS_REDIRECT_DIR:
				212	If this is enabled, then redirect_dir is turned on by default.
				213	- OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW:
				214	If this is enabled, then redirects are always followed by default. Enabling
				215	this results in a less secure configuration. Enable this option only when
				216	worried about backward compatibility with kernels that have the redirect_dir
				217	feature and follow redirects even if turned off.
				218
Amir Goldstein	35c6cb4	2019-11-25 11:51:25 +0200	[diff] [blame]	219	Module options (can also be changed through /sys/module/overlay/parameters/):
Miklos Szeredi	438c84c	2017-12-11 11:28:10 +0100	[diff] [blame]	220
				221	- "redirect_dir=BOOL":
				222	See OVERLAY_FS_REDIRECT_DIR kernel config option above.
				223	- "redirect_always_follow=BOOL":
				224	See OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW kernel config option above.
				225	- "redirect_max=NUM":
				226	The maximum number of bytes in an absolute redirect (default is 256).
				227
				228	Mount options:
				229
				230	- "redirect_dir=on":
				231	Redirects are enabled.
				232	- "redirect_dir=follow":
				233	Redirects are not created, but followed.
				234	- "redirect_dir=off":
				235	Redirects are not created and only followed if "redirect_always_follow"
				236	feature is enabled in the kernel/module config.
				237	- "redirect_dir=nofollow":
				238	Redirects are not created and not followed (equivalent to "redirect_dir=off"
				239	if "redirect_always_follow" feature is not enabled).
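
The four mount option values can be summarized as a small lookup, sketched here in Python. The function name `redirect_behavior` is hypothetical, and "on" is read as "redirects are both created and followed".

```python
def redirect_behavior(mount_opt, always_follow=False):
    """Return (create, follow) for a redirect_dir= mount option value.

    always_follow models the "redirect_always_follow" kernel/module setting.
    """
    table = {
        "on": (True, True),
        "follow": (False, True),
        "off": (False, always_follow),
        "nofollow": (False, False),
    }
    return table[mount_opt]
```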
				240
Amir Goldstein	f168f10	2018-01-19 11:26:53 +0200	[diff] [blame]	241	When the NFS export feature is enabled, every copied up directory is
				242	indexed by the file handle of the lower inode and a file handle of the
				243	upper directory is stored in a "trusted.overlay.upper" extended attribute
				244	on the index entry. On lookup of a merged directory, if the upper
				245	directory does not match the file handle stored in the index, that is an
				246	indication that multiple upper directories may be redirected to the same
				247	lower directory. In that case, lookup returns an error and warns about
				248	a possible inconsistency.
				249
				250	Because lower layer redirects cannot be verified with the index, enabling
				251	NFS export support on an overlay filesystem with no upper layer requires
				252	turning off redirect follow (e.g. "redirect_dir=nofollow").
				253
Amir Goldstein	f168f10	2018-01-19 11:26:53 +0200	[diff] [blame]	254
Neil Brown	7c37fbd	2014-10-24 00:14:39 +0200	[diff] [blame]	255	Non-directories
				256	---------------
				257
				258	Objects that are not directories (files, symlinks, device-special
				259	files etc.) are presented either from the upper or lower filesystem as
				260	appropriate. When a file in the lower filesystem is accessed in a way
				261	that requires write-access, such as opening for write access, changing
				262	some metadata etc., the file is first copied from the lower filesystem
				263	to the upper filesystem (copy_up). Note that creating a hard-link
				264	also requires copy_up, though of course creation of a symlink does
				265	not.
				266
				267	The copy_up may turn out to be unnecessary, for example if the file is
				268	opened for read-write but the data is not modified.
				269
				270	The copy_up process first makes sure that the containing directory
				271	exists in the upper filesystem - creating it and any parents as
				272	necessary. It then creates the object with the same metadata (owner,
				273	mode, mtime, symlink-target etc.) and then if the object is a file, the
				274	data is copied from the lower to the upper filesystem. Finally any
				275	extended attributes are copied up.
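
A rough userspace analogy of these steps in Python, for illustration only. The function name `copy_up` and its arguments are hypothetical. Unlike the kernel, this sketch does not preserve ownership or the metadata of parent directories, and it relies on `shutil.copy2` to copy data, mode, timestamps and (where the platform allows) extended attributes.

```python
import os
import shutil

def copy_up(lower_root, upper_root, relpath):
    """Copy relpath from the lower tree to the same place in the upper tree."""
    src = os.path.join(lower_root, relpath)
    dst = os.path.join(upper_root, relpath)
    # Make sure the containing directory (and any parents) exists in upper.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    # Create the object and copy its data and metadata; a symlink is
    # recreated as a symlink rather than followed.
    shutil.copy2(src, dst, follow_symlinks=False)
    return dst
```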
				276
				277	Once the copy_up is complete, the overlay filesystem simply
				278	provides direct access to the newly created file in the upper
				279	filesystem - future operations on the file are barely noticed by the
				280	overlay filesystem (though an operation on the name of the file such as
				281	rename or unlink will of course be noticed and handled).
				282
				283
Miklos Szeredi	4c494bd	2020-03-17 15:04:22 +0100	[diff] [blame]	284	Permission model
				285	----------------
				286
				287	Permission checking in the overlay filesystem follows these principles:
				288
				289	1) permission check SHOULD return the same result before and after copy up
				290
				291	2) task creating the overlay mount MUST NOT gain additional privileges
				292
				293	3) non-mounting task MAY gain additional privileges through the overlay,
				294	compared to direct access on underlying lower or upper filesystems
				295
				296	This is achieved by performing two permission checks on each access
				297
				298	a) check if current task is allowed access based on local DAC (owner,
				299	group, mode and posix acl), as well as MAC checks
				300
				301	b) check if mounting task would be allowed real operation on lower or
				302	upper layer based on underlying filesystem permissions, again including
				303	MAC checks
				304
				305	Check (a) ensures consistency (1) since owner, group, mode and posix acls
				306	are copied up. On the other hand it can result in server enforced
				307	permissions (used by NFS, for example) being ignored (3).
				308
				309	Check (b) ensures that no task gains permissions to underlying layers that
				310	the mounting task does not have (2). This also means that it is possible
				311	to create setups where the consistency rule (1) does not hold; normally,
				312	however, the mounting task will have sufficient privileges to perform all
				313	operations.
				314
				315	Another way to demonstrate this model is drawing parallels between
				316
				317	mount -t overlay overlay -olowerdir=/lower,upperdir=/upper,... /merged
				318
				319	and
				320
				321	cp -a /lower /upper
				322	mount --bind /upper /merged
				323
				324	The resulting access permissions should be the same. The difference is in
				325	the time of copy (on-demand vs. up-front).
				326
				327
Miklos Szeredi	a78d9f0	2014-12-13 00:59:52 +0100	[diff] [blame]	328	Multiple lower layers
				329	---------------------
				330
Randy Dunlap	f7eb0de	2020-07-03 14:43:22 -0700	[diff] [blame]	331	Multiple lower layers can now be given using the colon (":") as a
Miklos Szeredi	a78d9f0	2014-12-13 00:59:52 +0100	[diff] [blame]	332	separator character between the directory names. For example:
				333
				334	mount -t overlay overlay -olowerdir=/lower1:/lower2:/lower3 /merged
				335
Miklos Szeredi	6d900f5a	2015-01-08 15:09:15 +0100	[diff] [blame]	336	As the example shows, "upperdir=" and "workdir=" may be omitted. In
				337	that case the overlay will be read-only.
				338
				339	The specified lower directories will be stacked beginning from the
				340	rightmost one and going left. In the above example lower1 will be the
				341	top, lower2 the middle and lower3 the bottom layer.
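
The stacking order can be illustrated with a short Python sketch. The helper names and the `contents` mapping are hypothetical, and the sketch does not handle any escaping of ":" inside directory names.

```python
def lower_layers(lowerdir):
    """Split a lowerdir= value into layers ordered from top to bottom."""
    return lowerdir.split(":")

def resolve(name, layers, contents):
    """Return the topmost layer that contains name, or None.

    contents maps a layer path to the set of names it holds.
    """
    for layer in layers:
        if name in contents.get(layer, ()):
            return layer
    return None
```

With `lowerdir=/lower1:/lower2:/lower3`, a name present in both /lower2 and /lower3 resolves to /lower2, because layers further left sit higher in the stack.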
Miklos Szeredi	a78d9f0	2014-12-13 00:59:52 +0100	[diff] [blame]	342
				343
Vivek Goyal	d579104	2018-05-11 11:49:27 -0400	[diff] [blame]	344	Metadata only copy up
Amir Goldstein	35c6cb4	2019-11-25 11:51:25 +0200	[diff] [blame]	345	---------------------
Vivek Goyal	d579104	2018-05-11 11:49:27 -0400	[diff] [blame]	346
				347	When the metadata only copy up feature is enabled, overlayfs will copy
				348	up only metadata (as opposed to the whole file) when a metadata-specific
				349	operation like chown/chmod is performed. The full file will be copied up
				350	later, when the file is opened for a WRITE operation.
				351
				352	In other words, this is a delayed data copy up operation: data is copied
				353	up only when there is a need to actually modify it.
				354
				355	There are multiple ways to enable/disable this feature. A config option
				356	CONFIG_OVERLAY_FS_METACOPY can be set/unset to enable/disable this feature
				357	by default. Or one can enable/disable it at module load time with module
				358	parameter metacopy=on/off. Lastly, there is also a per mount option
				359	metacopy=on/off to enable/disable this feature per mount.
				360
				361	Do not use metacopy=on with untrusted upper/lower directories. Otherwise
				362	an attacker could create a handcrafted file with appropriate REDIRECT and
				363	METACOPY xattrs, and gain access to the file on lower pointed to by REDIRECT.
				364	This should not be possible on a local system, as setting "trusted." xattrs
				365	requires CAP_SYS_ADMIN, but it is possible for untrusted layers, such as
				366	those from a pen drive.
				367
Amir Goldstein	b0def88	2020-04-09 18:58:34 +0300	[diff] [blame]	368	Note: redirect_dir={off\|nofollow\|follow[*]} and nfs_export=on mount options
				369	conflict with metacopy=on, and will result in an error.
Miklos Szeredi	d47748e	2018-11-01 21:31:39 +0100	[diff] [blame]	370
Amir Goldstein	35c6cb4	2019-11-25 11:51:25 +0200	[diff] [blame]	371	[*] redirect_dir=follow only conflicts with metacopy=on if upperdir=... is
Miklos Szeredi	d47748e	2018-11-01 21:31:39 +0100	[diff] [blame]	372	given.
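
The conflict rule in the note above can be expressed as a small check, sketched here in Python. The function name `metacopy_conflict` and the dictionary representation of mount options are assumptions for the example. Only options given explicitly are considered, so defaults from the kernel config or module parameters are not modeled.

```python
def metacopy_conflict(opts):
    """Return the option that conflicts with metacopy=on, or None.

    opts maps mount option names to their string values.
    """
    if opts.get("metacopy") != "on":
        return None
    if opts.get("nfs_export") == "on":
        return "nfs_export=on"
    redirect = opts.get("redirect_dir")
    if redirect in ("off", "nofollow"):
        return "redirect_dir=" + redirect
    # redirect_dir=follow only conflicts when an upperdir is given.
    if redirect == "follow" and "upperdir" in opts:
        return "redirect_dir=follow"
    return None
```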
				373
Amir Goldstein	9412812	2017-05-25 15:08:24 +0300	[diff] [blame]	374	Sharing and copying layers
				375	--------------------------
				376
				377	Lower layers may be shared among several overlay mounts and that is indeed
				378	a very common practice. An overlay mount may use the same lower layer
				379	path as another overlay mount and it may use a lower layer path that is
				380	beneath or above the path of another overlay lower layer path.
				381
				382	Using an upper layer path and/or a workdir path that are already used by
Amir Goldstein	85fdee1	2017-09-29 10:21:21 +0300	[diff] [blame]	383	another overlay mount is not allowed and may fail with EBUSY. Using
Amir Goldstein	0be0bfd	2019-07-12 15:24:34 +0300	[diff] [blame]	384	partially overlapping paths is not allowed and may fail with EBUSY.
Amir Goldstein	85fdee1	2017-09-29 10:21:21 +0300	[diff] [blame]	385	If files are accessed from two overlayfs mounts which share or overlap the
				386	upper layer and/or workdir path the behavior of the overlay is undefined,
				387	though it will not result in a crash or deadlock.
Amir Goldstein	9412812	2017-05-25 15:08:24 +0300	[diff] [blame]	388
				389	Mounting an overlay using an upper layer path, where the upper layer path
				390	was previously used by another mounted overlay in combination with a
				391	different lower layer path, is allowed, unless the "inodes index" feature
Vivek Goyal	d579104	2018-05-11 11:49:27 -0400	[diff] [blame]	392	or "metadata only copy up" feature is enabled.
Amir Goldstein	9412812	2017-05-25 15:08:24 +0300	[diff] [blame]	393
				394	With the "inodes index" feature, on the first time mount, an NFS file
				395	handle of the lower layer root directory, along with the UUID of the lower
				396	filesystem, are encoded and stored in the "trusted.overlay.origin" extended
				397	attribute on the upper layer root directory. On subsequent mount attempts,
				398	the lower root directory file handle and lower filesystem UUID are compared
				399	to the stored origin in upper root directory. On failure to verify the
				400	lower root origin, mount will fail with ESTALE. An overlayfs mount with
				401	"inodes index" enabled will fail with EOPNOTSUPP if the lower filesystem
				402	does not support NFS export, lower filesystem does not have a valid UUID or
				403	if the upper filesystem does not support extended attributes.
				404
Vivek Goyal	d579104	2018-05-11 11:49:27 -0400	[diff] [blame]	405	For "metadata only copy up" feature there is no verification mechanism at
				406	mount time. If the same upper is mounted with a different set of lower
				407	layers, the mount will probably succeed, but later behavior is unpredictable. Do not do this.
				408
Amir Goldstein	9412812	2017-05-25 15:08:24 +0300	[diff] [blame]	409	It is quite a common practice to copy overlay layers to a different
				410	directory tree on the same or different underlying filesystem, and even
				411	to a different machine. With the "inodes index" feature, trying to mount
				412	the copied layers will fail the verification of the lower root file handle.
				413
				414
Neil Brown	7c37fbd	2014-10-24 00:14:39 +0200	[diff] [blame]	415	Non-standard behavior
				416	---------------------
				417
Miklos Szeredi	5d3211b	2019-05-31 11:27:25 +0200	[diff] [blame]	418	The current version of overlayfs can act as a mostly POSIX compliant
				419	filesystem.
				420
				421	This is the list of cases that overlayfs doesn't currently handle:
				422
				423	a) POSIX mandates updating st_atime for reads. This is currently not
				424	done in the case when the file resides on a lower layer.
				425
				426	b) If a file residing on a lower layer is opened for read-only and then
				427	memory mapped with MAP_SHARED, then subsequent changes to the file are not
				428	reflected in the memory mapping.
				429
Chengguang Xu	b71759e	2021-04-24 22:03:15 +0800	[diff] [blame]	430	c) If a file residing on a lower layer is being executed, then opening that
				431	file for write or truncating the file will not be denied with ETXTBSY.
				432
Miklos Szeredi	5d3211b	2019-05-31 11:27:25 +0200	[diff] [blame]	433	The following options allow overlayfs to act more like a standards
				434	compliant filesystem:
Neil Brown	7c37fbd	2014-10-24 00:14:39 +0200	[diff] [blame]	435
Miklos Szeredi	0c31d67	2018-07-18 15:44:44 +0200	[diff] [blame]	436	1) "redirect_dir"
Amir Goldstein	1614901	2018-03-29 16:36:56 +0300	[diff] [blame]	437
Miklos Szeredi	0c31d67	2018-07-18 15:44:44 +0200	[diff] [blame]	438	Enabled with the mount option or module option: "redirect_dir=on" or with
				439	the kernel config option CONFIG_OVERLAY_FS_REDIRECT_DIR=y.
Neil Brown	7c37fbd	2014-10-24 00:14:39 +0200	[diff] [blame]	440
Miklos Szeredi	0c31d67	2018-07-18 15:44:44 +0200	[diff] [blame]	441	If this feature is disabled, then rename(2) on a lower or merged directory
				442	will fail with EXDEV ("Invalid cross-device link").
Neil Brown	7c37fbd	2014-10-24 00:14:39 +0200	[diff] [blame]	443
Miklos Szeredi	0c31d67	2018-07-18 15:44:44 +0200	[diff] [blame]	444	2) "inode index"
				445
				446	Enabled with the mount option or module option "index=on" or with the
				447	kernel config option CONFIG_OVERLAY_FS_INDEX=y.
				448
				449	If this feature is disabled and a file with multiple hard links is copied
				450	up, then this will "break" the link. Changes will not be propagated to
				451	other names referring to the same inode.
				452
				453	3) "xino"
				454
				455	Enabled with the mount option "xino=auto" or "xino=on", with the module
				456	option "xino_auto=on" or with the kernel config option
				457	CONFIG_OVERLAY_FS_XINO_AUTO=y. Also implicitly enabled by using the same
				458	underlying filesystem for all layers making up the overlay.
				459
				460	If this feature is disabled or the underlying filesystem doesn't have
				461	enough free bits in the inode number, then overlayfs will not be able to
				462	guarantee that the values of st_ino and st_dev returned by stat(2) and the
				463	value of d_ino returned by readdir(3) will act like on a normal filesystem.
				464	E.g. the value of st_dev may be different for two objects in the same
Amir Goldstein	b0e0f69	2021-03-09 18:26:54 +0200	[diff] [blame]	465	overlay filesystem and the value of st_ino for filesystem objects may not be
Amir Goldstein	2eda9ea	2020-02-21 16:34:46 +0200	[diff] [blame]	466	persistent and could change even while the overlay filesystem is mounted, as
				467	summarized in the `Inode properties`_ table above.
Miklos Szeredi	2d8f290	2016-12-16 11:02:54 +0100	[diff] [blame]	468
Amir Goldstein	1614901	2018-03-29 16:36:56 +0300	[diff] [blame]	469
Neil Brown	7c37fbd	2014-10-24 00:14:39 +0200	[diff] [blame]	470	Changes to underlying filesystems
				471	---------------------------------
				472
Neil Brown	7c37fbd	2014-10-24 00:14:39 +0200	[diff] [blame]	473	Changes to the underlying filesystems while part of a mounted overlay
				474	filesystem are not allowed. If the underlying filesystem is changed,
				475	the behavior of the overlay is undefined, though it will not result in
				476	a crash or deadlock.
Miklos Szeredi	2b7a8f36	2014-12-13 00:59:53 +0100	[diff] [blame]	477
Kevin Locke	13c6ad0	2020-08-22 20:22:57 -0600	[diff] [blame]	478	Offline changes, when the overlay is not mounted, are allowed to the
				479	upper tree. Offline changes to the lower tree are only allowed if the
Amir Goldstein	b0e0f69	2021-03-09 18:26:54 +0200	[diff] [blame]	480	"metadata only copy up", "inode index", "xino" and "redirect_dir" features
Kevin Locke	13c6ad0	2020-08-22 20:22:57 -0600	[diff] [blame]	481	have not been used. If the lower tree is modified and any of these
				482	features has been used, the behavior of the overlay is undefined,
				483	though it will not result in a crash or deadlock.
				484
Amir Goldstein	f168f10	2018-01-19 11:26:53 +0200	[diff] [blame]	485	When the overlay NFS export feature is enabled, the overlay filesystem's
				486	behavior on offline changes of the underlying lower layer differs from
				487	the behavior when NFS export is disabled.
				488
				489	On every copy_up, an NFS file handle of the lower inode, along with the
				490	UUID of the lower filesystem, are encoded and stored in an extended
				491	attribute "trusted.overlay.origin" on the upper inode.
				492
				493	When the NFS export feature is enabled, a lookup of a merged directory,
				494	that found a lower directory at the lookup path or at the path pointed
				495	to by the "trusted.overlay.redirect" extended attribute, will verify
				496	that the found lower directory file handle and lower filesystem UUID
				497	match the origin file handle that was stored at copy_up time. If a
				498	found lower directory does not match the stored origin, that directory
				499	will not be merged with the upper directory.
				500
				501
Amir Goldstein	a01f64b	2017-05-25 22:39:21 +0300	[diff] [blame]	502
				503	NFS export
				504	----------
				505
				506	When the underlying filesystems support NFS export and the "nfs_export"
				507	feature is enabled, an overlay filesystem may be exported to NFS.
				508
				509	With the "nfs_export" feature, on copy_up of any lower object, an index
				510	entry is created under the index directory. The index entry name is the
				511	hexadecimal representation of the copy up origin file handle. For a
				512	non-directory object, the index entry is a hard link to the upper inode.
				513	For a directory object, the index entry has an extended attribute
				514	"trusted.overlay.upper" with an encoded file handle of the upper
				515	directory inode.
				516
				517	When encoding a file handle from an overlay filesystem object, the
				518	following rules apply:
				519
				520	1. For a non-upper object, encode a lower file handle from lower inode
				521	2. For an indexed object, encode a lower file handle from copy_up origin
				522	3. For a pure-upper object and for an existing non-indexed upper object,
				523	encode an upper file handle from upper inode
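
The three encoding rules can be restated as a decision function. This is a Python sketch with hypothetical names (`handle_source`, `has_upper`, `indexed`); it only selects which underlying inode the handle is taken from and does not encode anything.

```python
def handle_source(has_upper, indexed):
    """Return (handle type, inode the handle is encoded from).

    has_upper: the overlay object has an upper inode.
    indexed:   the upper inode has an index entry.
    """
    if not has_upper:
        return ("lower", "lower inode")      # rule 1: non-upper object
    if indexed:
        return ("lower", "copy_up origin")   # rule 2: indexed object
    return ("upper", "upper inode")          # rule 3: pure or non-indexed upper
```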
				524
				525	The encoded overlay file handle includes:
				526	- Header including path type information (e.g. lower/upper)
				527	- UUID of the underlying filesystem
				528	- Underlying filesystem encoding of underlying inode
				529
				530	This encoding format is identical to the encoding format of file handles that
				531	are stored in extended attribute "trusted.overlay.origin".
				532
				533	When decoding an overlay file handle, the following steps are followed:
				534
				535	1. Find underlying layer by UUID and path type information.
				536	2. Decode the underlying filesystem file handle to underlying dentry.
				537	3. For a lower file handle, look up the handle in index directory by name.
				538	4. If a whiteout is found in index, return ESTALE. This represents an
				539	overlay object that was deleted after its file handle was encoded.
				540	5. For a non-directory, instantiate a disconnected overlay dentry from the
				541	decoded underlying dentry, the path type and index inode, if found.
				542	6. For a directory, use the connected underlying decoded dentry, path type
				543	and index, to look up a connected overlay dentry.
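
The decoding steps above can be walked through with a toy model. In the
Python sketch below everything is hypothetical scaffolding: layers are plain
dictionaries, the index is a dictionary keyed by the hex form of the
underlying handle, and returning ESTALE for an unknown layer or an
undecodable handle is an assumption, since the text only specifies ESTALE
for the whiteout case.

```python
import errno
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

WHITEOUT = object()  # marker for a whiteout found in the index


@dataclass(frozen=True)
class OverlayHandle:
    path_type: str    # "lower" or "upper", taken from the header
    fs_uuid: bytes    # UUID of the underlying filesystem
    fs_handle: bytes  # underlying filesystem's own handle


@dataclass(frozen=True)
class Decoded:
    kind: str                    # "dir" or "file"
    connected: bool              # directories yield connected dentries
    index_entry: Optional[object]


def decode_overlay_handle(
    handle: OverlayHandle,
    layers: Dict[Tuple[str, bytes], Dict[bytes, str]],
    index: Dict[str, object],
) -> Decoded:
    # Step 1: find the underlying layer by UUID and path type.
    layer = layers.get((handle.path_type, handle.fs_uuid))
    if layer is None:
        raise OSError(errno.ESTALE, "no matching layer")  # assumption
    # Step 2: decode the underlying handle (toy lookup: handle -> kind).
    kind = layer.get(handle.fs_handle)
    if kind is None:
        raise OSError(errno.ESTALE, "stale underlying handle")  # assumption
    # Step 3: a lower handle is looked up in the index by name.
    entry = None
    if handle.path_type == "lower":
        entry = index.get(handle.fs_handle.hex())
        # Step 4: a whiteout means the object was deleted after encoding.
        if entry is WHITEOUT:
            raise OSError(errno.ESTALE, "object deleted after encode")
    # Steps 5 and 6: non-directories may be disconnected, directories are
    # looked up as connected dentries.
    return Decoded(kind=kind, connected=(kind == "dir"), index_entry=entry)
```

The model deliberately skips the actual path reconstruction of step 6; it
only records that a directory result is connected.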
				544
				545	Decoding a non-directory file handle may return a disconnected dentry.
				546	copy_up of that disconnected dentry will create an upper index entry with
				547	no upper alias.
				548
				549	When an overlay filesystem has multiple lower layers, a middle layer
				550	directory may have a "redirect" to a lower directory. Because middle layer
				551	"redirects" are not indexed, a lower file handle that was encoded from the
				552	"redirect" origin directory, cannot be used to find the middle or upper
				553	layer directory. Similarly, a lower file handle that was encoded from a
				554	descendant of the "redirect" origin directory, cannot be used to
				555	reconstruct a connected overlay path. To mitigate the cases of
				556	directories that cannot be decoded from a lower file handle, these
				557	directories are copied up on encode and encoded as an upper file handle.
				558	On an overlay filesystem with no upper layer this mitigation cannot be
				559	used. NFS export in this setup requires turning off redirect follow (e.g.
				560	"redirect_dir=nofollow").
				561
				562	The overlay filesystem does not support non-directory connectable file
				563	handles, so exporting with the 'subtree_check' exportfs configuration will
				564	cause failures to look up files over NFS.
				565
				566	When the NFS export feature is enabled, all directory index entries are
				567	verified at mount time to check that upper file handles are not stale.
				568	This verification may cause significant overhead in some cases.
				569
Amir Goldstein	f0e1266e	2020-07-13 17:19:44 +0300	[diff] [blame]	570	Note: the mount options index=off,nfs_export=on are conflicting for a
				571	read-write mount and will result in an error.
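
A script that assembles overlay mount options could catch this conflict
before calling mount. The Python sketch below is a hypothetical pre-flight
check, not the kernel's logic: it splits the option string naively on commas
(escaped commas are not handled) and assumes that a mount is read-write
exactly when an "upperdir" option is present.

```python
def parse_options(options: str) -> dict:
    """Split a comma-separated mount option string into a dict."""
    parsed = {}
    for item in options.split(","):
        if not item:
            continue
        key, _, value = item.partition("=")
        parsed[key] = value
    return parsed


def has_nfs_export_conflict(options: str) -> bool:
    """Return True for index=off together with nfs_export=on on a rw mount."""
    opts = parse_options(options)
    read_write = "upperdir" in opts  # assumption, see the lead-in above
    return (read_write
            and opts.get("index") == "off"
            and opts.get("nfs_export") == "on")
```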
Amir Goldstein	b0def88	2020-04-09 18:58:34 +0300	[diff] [blame]	572
Pavel Tikhomirov	5830fb6	2020-10-13 17:59:54 +0300	[diff] [blame]	573	Note: the mount option uuid=off can be used to replace UUID of the underlying
				574	filesystem in file handles with null, and effectively disable UUID checks. This
				575	can be useful in case the underlying disk is copied and the UUID of this copy
				576	is changed. This is only applicable if all lower/upper/work directories are on
				577	the same filesystem, otherwise it will fall back to normal behaviour.
Amir Goldstein	a01f64b	2017-05-25 22:39:21 +0300	[diff] [blame]	578
Vivek Goyal	c86243b0	2020-08-31 14:15:29 -0400	[diff] [blame]	579	Volatile mount
				580	--------------
				581
				582	This is enabled with the "volatile" mount option. Volatile mounts are not
				583	guaranteed to survive a crash. It is strongly recommended that volatile
				584	mounts are only used if data written to the overlay can be recreated
				585	without significant effort.
				586
				587	The advantage of mounting with the "volatile" option is that all forms of
				588	sync calls to the upper filesystem are omitted.
				589
Sargun Dhillon	335d3fc	2021-01-07 16:10:43 -0800	[diff] [blame]	590	In order to avoid giving a false sense of safety, the syncfs (and fsync)
				591	semantics of volatile mounts are slightly different from those of the rest of
				592	VFS. If any writeback error occurs on the upperdir's filesystem after a
				593	volatile mount takes place, all sync functions will return an error. Once this
				594	condition is reached, the filesystem will not recover, and every subsequent sync
				595	call will return an error, even if the upperdir has not experienced a new error
				596	since the last sync call.
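
The "sticky" error behaviour can be pictured with a small state model. The
Python class below is purely illustrative: its name and methods are
hypothetical, and the use of EIO as the reported error code is an assumption
that the text above does not state.

```python
import errno


class VolatileSyncModel:
    """Toy model of sync semantics on a volatile overlay mount."""

    def __init__(self) -> None:
        self._failed = False  # set once, never cleared

    def record_writeback_error(self) -> None:
        """A writeback error occurred on the upperdir's filesystem."""
        self._failed = True

    def sync(self) -> int:
        """Model syncfs/fsync: succeed until the first writeback error."""
        if self._failed:
            raise OSError(errno.EIO, "volatile overlay saw a writeback error")
        return 0
```

After a single call to record_writeback_error(), every later sync() raises,
even if no further error is recorded.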
				597
Vivek Goyal	c86243b0	2020-08-31 14:15:29 -0400	[diff] [blame]	598	When overlay is mounted with the "volatile" option, the directory
				599	"$workdir/work/incompat/volatile" is created. During the next mount, overlay
				600	checks for this directory and refuses to mount if it is present. This is a strong
				601	indicator that the user should throw away upper and work directories and create
				602	fresh ones. In very limited cases where the user knows that the system has
				603	not crashed and contents of upperdir are intact, the "volatile" directory
				604	can be removed.
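
A provisioning script may want to detect this marker before attempting a
mount. The following Python sketch shows one way to do so; the function names
are hypothetical, and only the path layout "$workdir/work/incompat/volatile"
is taken from the text above.

```python
from pathlib import Path


def volatile_marker(workdir: Path) -> Path:
    """Path of the marker directory left behind by a volatile mount."""
    return workdir / "work" / "incompat" / "volatile"


def previous_mount_was_volatile(workdir: str) -> bool:
    """True if a remount of this workdir would be refused by overlay."""
    return volatile_marker(Path(workdir)).is_dir()
```

If the check returns True, the safe default is to discard the upper and work
directories rather than to remove the marker.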
				605
Miklos Szeredi	2d2f2d7	2020-12-14 15:26:14 +0100	[diff] [blame]	606
				607	User xattr
				608	----------
				609
				610	The "-o userxattr" mount option forces overlayfs to use the
				611	"user.overlay." xattr namespace instead of "trusted.overlay.". This is
				612	useful for unprivileged mounting of overlayfs.
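
The effect on attribute names can be summarised with a one-line mapping. The
Python helper below is a hypothetical illustration; only the two namespace
prefixes come from the text above.

```python
def overlay_xattr_name(suffix: str, userxattr: bool = False) -> str:
    """Return the full xattr name overlayfs would use for a given suffix."""
    prefix = "user.overlay." if userxattr else "trusted.overlay."
    return prefix + suffix
```

For example, the "origin" attribute would be named "trusted.overlay.origin"
by default and "user.overlay.origin" with "-o userxattr".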
				613
				614
Miklos Szeredi	2b7a8f36	2014-12-13 00:59:53 +0100	[diff] [blame]	615	Testsuite
				616	---------
				617
Amir Goldstein	05af4fe	2018-05-07 13:57:44 +0300	[diff] [blame]	618	There's a testsuite originally developed by David Howells and currently
				619	maintained by Amir Goldstein at:
Miklos Szeredi	2b7a8f36	2014-12-13 00:59:53 +0100	[diff] [blame]	620
Amir Goldstein	05af4fe	2018-05-07 13:57:44 +0300	[diff] [blame]	621	https://github.com/amir73il/unionmount-testsuite.git
Miklos Szeredi	2b7a8f36	2014-12-13 00:59:53 +0100	[diff] [blame]	622
				623	Run as root:
				624
				625	# cd unionmount-testsuite
Amir Goldstein	05af4fe	2018-05-07 13:57:44 +0300	[diff] [blame]	626	# ./run --ov --verify