.. SPDX-License-Identifier: GPL-2.0

==========================================
WHAT IS Flash-Friendly File System (F2FS)?
==========================================

NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
been equipped on a variety of systems ranging from mobile to server systems.
Since they are known to have different characteristics from the conventional
rotating disks, a file system, an upper layer to the storage device, should
adapt to the changes from scratch at the design level.

F2FS is a file system exploiting NAND flash memory-based storage devices, which
is based on Log-structured File System (LFS). The design has been focused on
addressing the fundamental issues in LFS, which are the snowball effect of the
wandering tree and high cleaning overhead.

Since a NAND flash memory-based storage device shows different characteristics
according to its internal geometry or flash memory management scheme, namely FTL,
F2FS and its tools support various parameters not only for configuring on-disk
layout, but also for selecting allocation and cleaning algorithms.

The following git tree provides the file system formatting tool (mkfs.f2fs),
a consistency checking tool (fsck.f2fs), and a debugging tool (dump.f2fs).

- git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git

For reporting bugs and sending patches, please use the following mailing list:

- linux-f2fs-devel@lists.sourceforge.net

Background and Design issues
============================

Log-structured File System (LFS)
--------------------------------
"A log-structured file system writes all modifications to disk sequentially in
a log-like structure, thereby speeding up both file writing and crash recovery.
The log is the only structure on disk; it contains indexing information so that
files can be read back from the log efficiently. In order to maintain large free
areas on disk for fast writing, we divide the log into segments and use a
segment cleaner to compress the live information from heavily fragmented
segments." from Rosenblum, M. and Ousterhout, J. K., 1992, "The design and
implementation of a log-structured file system", ACM Trans. Computer Systems
10, 1, 26–52.

Wandering Tree Problem
----------------------
In LFS, when file data is updated and written to the end of the log, its direct
pointer block is updated due to the changed location. Then the indirect pointer
block is also updated due to the direct pointer block update. In this manner,
the upper index structures such as inode, inode map, and checkpoint block are
also updated recursively. This problem is called the wandering tree problem [1],
and in order to enhance the performance, the file system should eliminate or
relax the update propagation as much as possible.

[1] Bityutskiy, A. 2005. JFFS3 design issues. http://www.linux-mtd.infradead.org/
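
The update propagation can be sketched in a few lines of Python. This is an
illustrative model only, not f2fs code: the ``Log`` class, the ``update_data``
helper, and the choice of three index levels are assumptions made for the
example.

```python
class Log:
    """Append-only log: writing a block returns its new address."""

    def __init__(self):
        self.blocks = []

    def append(self, payload):
        self.blocks.append(payload)
        return len(self.blocks) - 1


def update_data(log, data, levels):
    """Write new data, then rewrite every index block above it.

    Each index block stores the address of the block below it, so a
    moved child forces its parent to be rewritten at the log tail too.
    Returns the address of the topmost index block and the write count.
    """
    child_addr = log.append(("data", data))
    writes = 1
    for level in range(1, levels + 1):
        child_addr = log.append(("index", level, child_addr))
        writes += 1
    return child_addr, writes


log = Log()
# Hypothetical chain: direct pointer block, indirect pointer block, inode.
first_root, first_writes = update_data(log, b"v1", levels=3)
second_root, second_writes = update_data(log, b"v2", levels=3)
print(first_root, second_root, second_writes)  # 3 7 4
```

In this model every update of a single data block costs ``levels + 1`` writes,
and the topmost index block moves each time.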

Cleaning Overhead
-----------------
Since LFS is based on out-of-place writes, it produces many obsolete blocks
scattered across the whole storage. In order to serve new empty log space, it
needs to reclaim these obsolete blocks transparently to users. This job is
called the cleaning process.

The process consists of four operations as follows.

1. A victim segment is selected by referencing the segment usage table.
2. It loads parent index structures of all the data in the victim identified by
   segment summary blocks.
3. It checks the cross-reference between the data and its parent index structure.
4. It moves valid data selectively.

This cleaning job may cause unexpectedly long delays, so the most important goal
is to hide the latencies from users. It should also reduce the amount of valid
data to be moved, and move it quickly as well.
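
The steps above can be sketched as follows. This is a simplified model under
stated assumptions, not the f2fs implementation: the victim is chosen with a
greedy policy (fewest valid blocks), and the dictionaries standing in for the
segment usage table, the summary blocks, and the index structures are
hypothetical.

```python
def select_victim(valid_blocks_per_segment):
    """Step 1: greedy choice, the segment with the fewest valid blocks."""
    return min(valid_blocks_per_segment, key=valid_blocks_per_segment.get)


def clean(victim_blocks, summary, index, next_free_addr):
    """Steps 2-4: relocate the blocks that are still referenced.

    victim_blocks:  block addresses inside the victim segment
    summary:        block address -> owner key, as a summary block records it
    index:          owner key -> block address currently referenced
    next_free_addr: first free address at the log tail
    Returns a mapping old address -> new address for the moved blocks.
    """
    moved = {}
    for addr in sorted(victim_blocks):
        owner = summary[addr]            # step 2: locate the parent index
        if index.get(owner) != addr:     # step 3: cross-reference check
            continue                     # obsolete block, nothing to move
        index[owner] = next_free_addr    # step 4: move the valid block
        moved[addr] = next_free_addr
        next_free_addr += 1
    return moved


usage = {0: 300, 1: 2, 2: 120}           # segment number -> valid block count
victim = select_victim(usage)
summary = {512: ("ino5", 0), 513: ("ino5", 1), 514: ("ino9", 0)}
index = {("ino5", 0): 512, ("ino5", 1): 900, ("ino9", 0): 514}
moved = clean([512, 513, 514], summary, index, next_free_addr=2048)
print(victim, moved)  # 1 {512: 2048, 514: 2049}
```

Block 513 is skipped because its owner already points elsewhere, so only the
two valid blocks are copied.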

Key Features
============

Flash Awareness
---------------
- Enlarge the random write area for better performance, but provide the high
  spatial locality
- Align FS data structures to the operational units in FTL as best efforts

Wandering Tree Problem
----------------------
- Use a term, “node”, that represents inodes as well as various pointer blocks
- Introduce Node Address Table (NAT) containing the locations of all the “node”
  blocks; this will cut off the update propagation.
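
A minimal sketch of how such a table cuts off the propagation is shown here.
It is a toy model, not the on-disk NAT format: the ``NodeStore`` class, the
node IDs, and the dictionary contents are assumptions made for illustration.

```python
class NodeStore:
    """Toy model: node blocks live in a log and are found through a NAT."""

    def __init__(self):
        self.log = []   # append-only list of (node ID, content) blocks
        self.nat = {}   # node ID -> current address of that node block

    def write_node(self, nid, content):
        self.log.append((nid, content))
        self.nat[nid] = len(self.log) - 1   # only the NAT entry changes

    def read_node(self, nid):
        return self.log[self.nat[nid]][1]


store = NodeStore()
store.write_node(1, {"kind": "inode", "child_nid": 2})
store.write_node(2, {"kind": "direct", "data_addr": 500})
inode_addr = store.nat[1]

# The data block moves, so the direct node is rewritten out of place.
store.write_node(2, {"kind": "direct", "data_addr": 501})

# The inode refers to node ID 2, not to an address, so it stays put.
child = store.read_node(store.read_node(1)["child_nid"])
print(store.nat[1] == inode_addr, store.nat[2], child["data_addr"])  # True 2 501
```

Because parents hold node IDs rather than block addresses, rewriting a node
block updates one NAT entry instead of every index block above it.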

Cleaning Overhead
-----------------
- Support a background cleaning process
- Support greedy and cost-benefit algorithms for victim selection policies
- Support multi-head logs for static/dynamic hot and cold data separation
- Introduce adaptive logging for efficient block allocation

Mount Options
=============


======================== ============================================================
background_gc=%s         Turn on/off cleaning operations, namely garbage
                         collection, triggered in background when I/O subsystem is
                         idle. If background_gc=on, it will turn on the garbage
                         collection and if background_gc=off, garbage collection
                         will be turned off. If background_gc=sync, it will turn
                         on synchronous garbage collection running in background.
                         Default value for this option is on. So garbage
                         collection is on by default.
gc_merge                 When background_gc is on, this option can be enabled to
                         let the background GC thread handle foreground GC
                         requests. It can eliminate the sluggishness caused by slow
                         foreground GC operation when GC is triggered from a
                         process with limited I/O and CPU resources.
nogc_merge               Disable GC merge feature.
disable_roll_forward     Disable the roll-forward recovery routine.
norecovery               Disable the roll-forward recovery routine, mounted
                         read-only (i.e., -o ro,disable_roll_forward).
discard/nodiscard        Enable/disable real-time discard in f2fs. If discard is
                         enabled, f2fs will issue discard/TRIM commands when a
                         segment is cleaned.
no_heap                  Disable heap-style segment allocation which finds free
                         segments for data from the beginning of main area, while
                         for node from the end of main area.
nouser_xattr             Disable Extended User Attributes. Note: xattr is enabled
                         by default if CONFIG_F2FS_FS_XATTR is selected.
noacl                    Disable POSIX Access Control List. Note: acl is enabled
                         by default if CONFIG_F2FS_FS_POSIX_ACL is selected.
active_logs=%u           Support configuring the number of active logs. In the
                         current design, f2fs supports only 2, 4, and 6 logs.
                         Default number is 6.
disable_ext_identify     Disable the extension list configured by mkfs, so f2fs
                         is not aware of cold files such as media files.
inline_xattr             Enable the inline xattrs feature.
noinline_xattr           Disable the inline xattrs feature.
inline_xattr_size=%u     Support configuring inline xattr size. It depends on
                         the flexible inline xattr feature.
inline_data              Enable the inline data feature: Newly created small (<~3.4k)
                         files can be written into inode block.
inline_dentry            Enable the inline dir feature: data in newly created
                         directory entries can be written into inode block. The
                         space of inode block which is used to store inline
                         dentries is limited to ~3.4k.
noinline_dentry          Disable the inline dentry feature.
flush_merge              Merge concurrent cache_flush commands as much as possible
                         to eliminate redundant command issues. If the underlying
                         device handles the cache_flush command relatively slowly,
                         enabling this option is recommended.
nobarrier                This option can be used if underlying storage guarantees
                         its cached data should be written to the non-volatile area.
                         If this option is set, no cache_flush commands are issued
                         but f2fs still guarantees the write ordering of all the
                         data writes.
fastboot                 This option is used when a system wants to reduce mount
                         time as much as possible, even though normal performance
                         can be sacrificed.
extent_cache             Enable an extent cache based on rb-tree. It can cache
                         as many extents as possible, each mapping between
                         contiguous logical and physical addresses per inode,
                         which increases the cache hit ratio. Set by default.
noextent_cache           Disable an extent cache based on rb-tree explicitly, see
                         the above extent_cache mount option.
noinline_data            Disable the inline data feature. The inline data feature
                         is enabled by default.
data_flush               Enable data flushing before checkpoint in order to
                         persist data of regular and symlink.
reserve_root=%d          Support configuring reserved space which is used for
                         allocation from a privileged user with specified uid or
                         gid, unit: 4KB, the default limit is 0.2% of user blocks.
resuid=%d                The user ID which may use the reserved blocks.
resgid=%d                The group ID which may use the reserved blocks.
fault_injection=%d       Enable fault injection in all supported types with
                         specified injection rate.
fault_type=%d            Support configuring fault injection type, should be
                         enabled with fault_injection option, fault type value
                         is shown below, it supports single or combined type.

                         ===================      ===========
                         Type_Name                Type_Value
                         ===================      ===========
                         FAULT_KMALLOC            0x000000001
                         FAULT_KVMALLOC           0x000000002
                         FAULT_PAGE_ALLOC         0x000000004
                         FAULT_PAGE_GET           0x000000008
                         FAULT_ALLOC_NID          0x000000020
                         FAULT_ORPHAN             0x000000040
                         FAULT_BLOCK              0x000000080
                         FAULT_DIR_DEPTH          0x000000100
                         FAULT_EVICT_INODE        0x000000200
                         FAULT_TRUNCATE           0x000000400
                         FAULT_READ_IO            0x000000800
                         FAULT_CHECKPOINT         0x000001000
                         FAULT_DISCARD            0x000002000
                         FAULT_WRITE_IO           0x000004000
                         ===================      ===========
mode=%s                  Control block allocation mode which supports "adaptive"
                         and "lfs". In "lfs" mode, there should be no random
                         writes towards main area.
io_bits=%u               Set the bit size of write IO requests. It should be set
                         with "mode=lfs".
usrquota                 Enable plain user disk quota accounting.
grpquota                 Enable plain group disk quota accounting.
prjquota                 Enable plain project quota accounting.
usrjquota=<file>         Appoint specified file and type during mount, so that quota
grpjquota=<file>         information can be properly updated during recovery flow,
prjjquota=<file>         <quota file>: must be in root directory;
jqfmt=<quota type>       <quota type>: [vfsold,vfsv0,vfsv1].
offusrjquota             Turn off user journalled quota.
offgrpjquota             Turn off group journalled quota.
offprjjquota             Turn off project journalled quota.
quota                    Enable plain user disk quota accounting.
noquota                  Disable all plain disk quota option.
whint_mode=%s            Control which write hints are passed down to block
                         layer. This supports "off", "user-based", and
                         "fs-based". In "off" mode (default), f2fs does not pass
                         down hints. In "user-based" mode, f2fs tries to pass
                         down hints given by users. And in "fs-based" mode, f2fs
                         passes down hints with its policy.
alloc_mode=%s            Adjust block allocation policy, which supports "reuse"
                         and "default".
fsync_mode=%s            Control the policy of fsync. Currently supports "posix",
                         "strict", and "nobarrier". In "posix" mode, which is
                         default, fsync will follow POSIX semantics and does a
                         light operation to improve the filesystem performance.
                         In "strict" mode, fsync will be heavy and behaves in line
                         with xfs, ext4 and btrfs, where xfstest generic/342 will
                         pass, but the performance will regress. "nobarrier" is
                         based on "posix", but doesn't issue flush command for
                         non-atomic files, like the "nobarrier" mount option.
test_dummy_encryption
test_dummy_encryption=%s
                         Enable dummy encryption, which provides a fake fscrypt
                         context. The fake fscrypt context is used by xfstests.
                         The argument may be either "v1" or "v2", in order to
                         select the corresponding fscrypt policy version.
checkpoint=%s[:%u[%]]    Set to "disable" to turn off checkpointing. Set to "enable"
                         to reenable checkpointing. Checkpointing is enabled by
                         default. While disabled, any unmounting or unexpected
                         shutdowns will cause the filesystem contents to appear as
                         they did when the filesystem was mounted with that option.
                         While mounting with checkpoint=disable, the filesystem must
                         run garbage collection to ensure that all available space can
                         be used. If this takes too much time, the mount may return
                         EAGAIN. You may optionally add a value to indicate how much
                         of the disk you would be willing to temporarily give up to
                         avoid additional garbage collection. This can be given as a
                         number of blocks, or as a percent. For instance, mounting
                         with checkpoint=disable:100% would always succeed, but it may
                         hide up to all remaining free space. The actual space that
                         would be unusable can be viewed at
                         /sys/fs/f2fs/<disk>/unusable.
                         This space is reclaimed once checkpoint=enable.
checkpoint_merge         When checkpoint is enabled, this can be used to create a kernel
                         daemon and make it merge concurrent checkpoint requests as
                         much as possible to eliminate redundant checkpoint issues. Plus,
                         we can eliminate the sluggishness caused by slow checkpoint
                         operation when the checkpoint is done in a process context in
                         a cgroup having low i/o budget and cpu shares. To make this
                         do better, we set the default i/o priority of the kernel daemon
                         to "3", to give one higher priority than other kernel threads.
                         This is the same way to give an I/O priority to the jbd2
                         journaling thread of ext4 filesystem.
nocheckpoint_merge       Disable checkpoint merge feature.
compress_algorithm=%s    Control compress algorithm, currently f2fs supports "lzo",
                         "lz4", "zstd" and "lzo-rle" algorithm.
compress_algorithm=%s:%d Control compress algorithm and its compress level. Currently
                         only "lz4" and "zstd" support compress level config.

                         =========      ===========
                         algorithm      level range
                         =========      ===========
                         lz4            3 - 16
                         zstd           1 - 22
                         =========      ===========
compress_log_size=%u     Support configuring compress cluster size, the size will
                         be 4KB * (1 << %u), 16KB is minimum size, also it's
                         default size.
compress_extension=%s    Support adding specified extension, so that f2fs can enable
                         compression on those corresponding files, e.g. if all files
                         with '.ext' have a high compression rate, we can set the '.ext'
                         on compression extension list and enable compression on
                         these files by default rather than to enable it via ioctl.
                         For other files, we can still enable compression via ioctl.
                         Note that there is one reserved special extension '*', it
                         can be set to enable compression for all files.
nocompress_extension=%s  Support adding specified extension, so that f2fs can disable
                         compression on those corresponding files, just contrary to
                         compression extension.
                         If you know exactly which files cannot be compressed, you can
                         use this.
                         The same extension name can't appear in both compress and
                         nocompress extension at the same time.
                         If the compress extension specifies all files, the types
                         specified by the nocompress extension will be treated as
                         special cases and will not be compressed.
                         Using '*' to specify all files is not allowed in the
                         nocompress extension.
                         After adding nocompress_extension, the priority should be:
                         dir_flag < compress_extension,nocompress_extension <
                         comp_file_flag,no_comp_file_flag.
                         See more in compression sections.

compress_chksum          Support verifying chksum of raw data in compressed cluster.
compress_mode=%s         Control file compression mode. This supports "fs" and "user"
                         modes. In "fs" mode (default), f2fs does automatic compression
                         on the compression enabled files. In "user" mode, f2fs disables
                         the automatic compression and gives the user discretion of
                         choosing the target file and the timing. The user can do manual
                         compression/decompression on the compression enabled files using
                         ioctls.
compress_cache           Support to use address space of a filesystem managed inode to
                         cache compressed block, in order to improve cache hit ratio of
                         random read.
inlinecrypt              When possible, encrypt/decrypt the contents of encrypted
                         files using the blk-crypto framework rather than
                         filesystem-layer encryption. This allows the use of
                         inline encryption hardware. The on-disk format is
                         unaffected. For more details, see
                         Documentation/block/inline-encryption.rst.
atgc                     Enable age-threshold garbage collection, it provides high
                         effectiveness and efficiency on background GC.
discard_unit=%s          Control discard unit, the argument can be "block", "segment"
                         and "section". The offset/size of an issued discard command
                         will be aligned to the unit. By default, "discard_unit=block"
                         is set, so that small discard functionality is enabled.
                         For blkzoned device, "discard_unit=section" will be set by
                         default, it is helpful for large sized SMR or ZNS devices to
                         reduce memory cost by getting rid of fs metadata that
                         supports small discard.
======================== ============================================================
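
Two of the numeric options above are easy to get wrong by hand, so a small
sketch may help. It only restates what the table says: ``fault_type`` is a
combination of the listed type values, and the compress cluster size is
4KB * (1 << compress_log_size) with a 16KB minimum. The helper names and the
injection rate of 100 are assumptions made for the example.

```python
# Fault type values copied from the fault_type table.
FAULT_TYPES = {
    "FAULT_KMALLOC": 0x1,
    "FAULT_KVMALLOC": 0x2,
    "FAULT_PAGE_ALLOC": 0x4,
    "FAULT_PAGE_GET": 0x8,
    "FAULT_ALLOC_NID": 0x20,
    "FAULT_ORPHAN": 0x40,
    "FAULT_BLOCK": 0x80,
    "FAULT_DIR_DEPTH": 0x100,
    "FAULT_EVICT_INODE": 0x200,
    "FAULT_TRUNCATE": 0x400,
    "FAULT_READ_IO": 0x800,
    "FAULT_CHECKPOINT": 0x1000,
    "FAULT_DISCARD": 0x2000,
    "FAULT_WRITE_IO": 0x4000,
}


def fault_type_mask(names):
    """OR together the values of the named fault types."""
    mask = 0
    for name in names:
        mask |= FAULT_TYPES[name]
    return mask


def compress_cluster_bytes(log_size):
    """Cluster size for compress_log_size=%u: 4KB * (1 << log_size)."""
    size = 4096 * (1 << log_size)
    if size < 16 * 1024:
        raise ValueError("cluster size is below the 16KB minimum")
    return size


mask = fault_type_mask(["FAULT_KMALLOC", "FAULT_PAGE_ALLOC"])
options = "fault_injection=100,fault_type=%d" % mask
print(options)                    # fault_injection=100,fault_type=5
print(compress_cluster_bytes(2))  # 16384
```

The resulting string is meant to be passed to ``mount -o``; whether fault
injection is available depends on how the kernel was configured.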

Debugfs Entries
===============

/sys/kernel/debug/f2fs/ contains information about all the partitions mounted as
f2fs. Each file shows the whole f2fs information.

/sys/kernel/debug/f2fs/status includes:

- major file system information managed by f2fs currently
- average SIT information about whole segments
- current memory footprint consumed by f2fs.

Sysfs Entries
=============

Information about mounted f2fs file systems can be found in
/sys/fs/f2fs. Each mounted filesystem will have a directory in
/sys/fs/f2fs based on its device name (e.g., /sys/fs/f2fs/sda).
The files in each per-device directory are shown in the table below.

Files in /sys/fs/f2fs/<devname>
(see also Documentation/ABI/testing/sysfs-fs-f2fs)
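
A short sketch for enumerating these directories from user space follows. The
helper name is hypothetical, and treating every subdirectory of the base path
as a device directory is an assumption. The demonstration builds a throwaway
directory tree so that it runs without an f2fs mount.

```python
import os
import tempfile


def f2fs_sysfs_entries(base="/sys/fs/f2fs"):
    """Return {directory name: sorted entry names} for each directory in base."""
    result = {}
    if not os.path.isdir(base):
        return result              # no f2fs sysfs tree on this system
    for devname in sorted(os.listdir(base)):
        path = os.path.join(base, devname)
        if os.path.isdir(path):
            result[devname] = sorted(os.listdir(path))
    return result


# Demonstration against a fake tree that mimics /sys/fs/f2fs/sda/unusable.
with tempfile.TemporaryDirectory() as fake_base:
    os.makedirs(os.path.join(fake_base, "sda"))
    open(os.path.join(fake_base, "sda", "unusable"), "w").close()
    demo = f2fs_sysfs_entries(fake_base)

print(demo)  # {'sda': ['unusable']}
```

Calling ``f2fs_sysfs_entries()`` without arguments reads the real tree on a
system where f2fs is mounted.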

Usage
=====

1. Download userland tools and compile them.

2. Skip, if f2fs was compiled statically inside kernel.
   Otherwise, insert the f2fs.ko module::

        # insmod f2fs.ko

3. Create a directory to use when mounting::

        # mkdir /mnt/f2fs

4. Format the block device, and then mount as f2fs::

        # mkfs.f2fs -l label /dev/block_device
        # mount -t f2fs /dev/block_device /mnt/f2fs

mkfs.f2fs
---------
The mkfs.f2fs tool formats a partition as an f2fs filesystem, building the
basic on-disk layout.

The quick options consist of:

===============    ===========================================================
``-l [label]``     Give a volume label, a unicode name of up to 512 characters.
``-a [0 or 1]``    Split start location of each area for heap-based allocation.

                   1 is set by default, which performs this.
``-o [int]``       Set overprovision ratio in percent over volume size.

                   5 is set by default.
``-s [int]``       Set the number of segments per section.

                   1 is set by default.
``-z [int]``       Set the number of sections per zone.

                   1 is set by default.
``-e [str]``       Set basic extension list. e.g. "mp3,gif,mov"
``-t [0 or 1]``    Disable discard command or not.

                   1 is set by default, which conducts discard.
===============    ===========================================================

Note: please refer to the manpage of mkfs.f2fs(8) to get full option list.

fsck.f2fs
---------
The fsck.f2fs is a tool to check the consistency of an f2fs-formatted
partition, which examines whether the filesystem metadata and user-made data
are cross-referenced correctly or not.
Note that the initial version of the tool does not fix any inconsistency.

The quick options consist of::

  -d debug level [default:0]

Note: please refer to the manpage of fsck.f2fs(8) to get full option list.

dump.f2fs
---------
The dump.f2fs shows the information of a specific inode and dumps SSA and SIT to
files. The output files are dump_ssa and dump_sit.

The dump.f2fs is used to debug on-disk data structures of the f2fs filesystem.
It shows on-disk inode information recognized by a given inode number, and is
able to dump all the SSA and SIT entries into predefined files, ./dump_ssa and
./dump_sit respectively.

The options consist of::

  -d debug level [default:0]
  -i inode no (hex)
  -s [SIT dump segno from #1~#2 (decimal), for all 0~-1]
  -a [SSA dump segno from #1~#2 (decimal), for all 0~-1]

Examples::

    # dump.f2fs -i [ino] /dev/sdx
    # dump.f2fs -s 0~-1 /dev/sdx (SIT dump)
    # dump.f2fs -a 0~-1 /dev/sdx (SSA dump)

Note: please refer to the manpage of dump.f2fs(8) to get full option list.

sload.f2fs
----------
The sload.f2fs gives a way to insert files and directories into an existing disk
image. This tool is useful when building f2fs images given compiled files.

Note: please refer to the manpage of sload.f2fs(8) to get full option list.

resize.f2fs
-----------
The resize.f2fs lets a user resize the f2fs-formatted disk image, while preserving
all the files and directories stored in the image.

Note: please refer to the manpage of resize.f2fs(8) to get full option list.

defrag.f2fs
-----------
The defrag.f2fs can be used to defragment scattered written data as well as
filesystem metadata across the disk. This can improve the write speed by giving
more free consecutive space.

Note: please refer to the manpage of defrag.f2fs(8) to get full option list.

f2fs_io
-------
The f2fs_io is a simple tool to issue various filesystem APIs as well as
f2fs-specific ones, which is very useful for QA tests.

Note: please refer to the manpage of f2fs_io(8) to get full option list.

Design
======

On-disk Layout
--------------

F2FS divides the whole volume into a number of segments, each of which is fixed
to 2MB in size. A section is composed of consecutive segments, and a zone
consists of a set of sections. By default, section and zone sizes are both set
to one segment size, but users can easily modify the sizes with mkfs.
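
The relation between the three units is plain multiplication, sketched below.
The function name is hypothetical; the two parameters correspond to the
segments-per-section and sections-per-zone values that mkfs.f2fs accepts
through ``-s`` and ``-z``.

```python
SEGMENT_BYTES = 2 * 1024 * 1024   # each segment is fixed to 2MB


def layout_sizes(segs_per_sec=1, secs_per_zone=1):
    """Return (section size, zone size) in bytes."""
    section = segs_per_sec * SEGMENT_BYTES
    zone = secs_per_zone * section
    return section, zone


print(layout_sizes())       # (2097152, 2097152)
print(layout_sizes(4, 2))   # (8388608, 16777216)
```

With the defaults, a section and a zone are each exactly one segment.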
				473
				474	F2FS splits the entire volume into six areas, and all the areas except superblock
Randy Dunlap	ca313c8	2020-09-02 17:08:31 -0700	[diff] [blame]	475	consist of multiple segments as described below::
Jaegeuk Kim	98e4da8	2012-11-02 17:05:42 +0900	[diff] [blame]	476
				477	align with the zone size <-\|
				478	\|-> align with the segment size
				479	_________________________________________________________________________
Huajun Li	9268cc3	2012-12-31 13:59:04 +0800	[diff] [blame]	480	\| \| \| Segment \| Node \| Segment \| \|
				481	\| Superblock \| Checkpoint \| Info. \| Address \| Summary \| Main \|
				482	\| (SB) \| (CP) \| Table (SIT) \| Table (NAT) \| Area (SSA) \| \|
Jaegeuk Kim	98e4da8	2012-11-02 17:05:42 +0900	[diff] [blame]	483	\|____________\|_____2______\|______N______\|______N______\|______N_____\|__N___\|
				484	. .
				485	. .
				486	. .
				487	._________________________________________.
				488	\|_Segment_\|_..._\|_Segment_\|_..._\|_Segment_\|
				489	. .
				490	._________._________
				491	\|_section_\|__...__\|_
				492	. .
				493	.________.
				494	\|__zone__\|
				495
- Superblock (SB)
   It is located at the beginning of the partition, and two copies are kept
   to guard against file system crashes. It contains basic partition
   information and some default parameters of f2fs.

- Checkpoint (CP)
   It contains file system information, bitmaps for valid NAT/SIT sets, orphan
   inode lists, and summary entries of current active segments.

- Segment Information Table (SIT)
   It contains segment information such as valid block count and bitmap for the
   validity of all the blocks.

- Node Address Table (NAT)
   It is composed of a block address table for all the node blocks stored in
   Main area.

- Segment Summary Area (SSA)
   It contains summary entries which contain the owner information of all the
   data and node blocks stored in Main area.

- Main Area
   It contains file and directory data including their indices.

In order to avoid misalignment between file system and flash-based storage, F2FS
aligns the start block address of CP with the segment size. Also, it aligns the
start block address of Main area with the zone size by reserving some segments
in SSA area.

Reference the following survey for additional technical details.
https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey

File System Metadata Structure
------------------------------

F2FS adopts the checkpointing scheme to maintain file system consistency. At
mount time, F2FS first tries to find the last valid checkpoint data by scanning
the CP area. In order to reduce the scanning time, F2FS uses only two copies of
CP. One of them always indicates the last valid data, which is called the shadow
copy mechanism. In addition to CP, NAT and SIT also adopt the shadow copy
mechanism.

For file system consistency, each CP points to which NAT and SIT copies are
valid, as shown below::

  +--------+----------+---------+
  |   CP   |    SIT   |   NAT   |
  +--------+----------+---------+
  .         .          .          .
  .            .              .              .
  .               .                 .                 .
  +-------+-------+--------+--------+--------+--------+
  | CP #0 | CP #1 | SIT #0 | SIT #1 | NAT #0 | NAT #1 |
  +-------+-------+--------+--------+--------+--------+
     |             ^                          ^
     |             |                          |
     `----------------------------------------'
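
One way to picture the shadow copy selection at mount time is the sketch below.
It assumes, purely for illustration, that each CP copy carries a version
counter and a validity result; the ``CheckpointCopy`` type and its fields are
hypothetical and are not the on-disk format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CheckpointCopy:
    version: int   # hypothetical: grows with every checkpoint written
    valid: bool    # hypothetical: result of an integrity check on this copy
    nat_copy: int  # which NAT copy (0 or 1) this checkpoint points to
    sit_copy: int  # which SIT copy (0 or 1) this checkpoint points to


def pick_checkpoint(copies: Sequence[CheckpointCopy]) -> Optional[CheckpointCopy]:
    """Return the newest valid copy, or None if no copy is usable."""
    candidates = [c for c in copies if c.valid]
    return max(candidates, key=lambda c: c.version, default=None)
```

The chosen copy then tells the file system which NAT and SIT copies to trust,
which is the relationship drawn in the figure above.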

Index Structure
---------------

The key data structure to manage the data locations is a "node". Similar to
traditional file structures, F2FS has three types of node: inode, direct node,
and indirect node. F2FS assigns 4KB to an inode block, which contains 923 data
block indices, two direct node pointers, two indirect node pointers, and one
double indirect node pointer as described below. One direct node block points
to 1018 data blocks, and one indirect node block likewise points to 1018 node
blocks. Thus, one inode block (i.e., a file) covers::

  4KB * (923 + 2 * 1018 + 2 * 1018 * 1018 + 1018 * 1018 * 1018) := 3.94TB.

  Inode block (4KB)
    |- data (923)
    |- direct node (2)
    |          `- data (1018)
    |- indirect node (2)
    |            `- direct node (1018)
    |                       `- data (1018)
    `- double indirect node (1)
                         `- indirect node (1018)
                                      `- direct node (1018)
                                               `- data (1018)
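
The coverage figure above can be checked with a few lines of arithmetic. The
constant names below are chosen for this sketch only; the numbers are the ones
given in the text.

```python
BLOCK_SIZE = 4096        # 4KB blocks
ADDRS_PER_INODE = 923    # data block indices held directly in the inode
ADDRS_PER_NODE = 1018    # entries per direct or indirect node block


def max_file_blocks():
    """Count the data blocks reachable from a single inode block."""
    direct = 2 * ADDRS_PER_NODE               # two direct node pointers
    indirect = 2 * ADDRS_PER_NODE ** 2        # two indirect node pointers
    double_indirect = ADDRS_PER_NODE ** 3     # one double indirect pointer
    return ADDRS_PER_INODE + direct + indirect + double_indirect


def max_file_bytes():
    return BLOCK_SIZE * max_file_blocks()
```

Dividing ``max_file_bytes()`` by 2^40 gives roughly 3.94, matching the figure
quoted above when TB is read as a binary terabyte.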

Note that all the node blocks are mapped by NAT, which means the location of
each node is translated by the NAT table. With respect to the wandering tree
problem, this lets F2FS cut off the propagation of node updates caused by
leaf data writes.

Directory Structure
-------------------

A directory entry occupies 11 bytes, which consists of the following attributes.

- hash          hash value of the file name
- ino           inode number
- len           the length of file name
- type          file type such as directory, symlink, etc

A dentry block consists of 214 dentry slots and file names. Therein a bitmap is
used to represent whether each dentry is valid or not. A dentry block occupies
4KB with the following composition.

::

  Dentry Block(4 K) = bitmap (27 bytes) + reserved (3 bytes) +
                      dentries(11 * 214 bytes) + file name (8 * 214 bytes)

                         [Bucket]
             +--------------------------------+
             |dentry block 1 | dentry block 2 |
             +--------------------------------+
             .               .
       .                             .
  .       [Dentry Block Structure: 4KB]       .
  +--------+----------+----------+------------+
  | bitmap | reserved | dentries | file names |
  +--------+----------+----------+------------+
  [Dentry Block: 4KB] .   .
                 .               .
            .                          .
            +------+------+-----+------+
            | hash | ino  | len | type |
            +------+------+-----+------+
            [Dentry Structure: 11 bytes]
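
The 4KB composition above can be verified numerically. In this sketch the
split of the 11-byte dentry into 4 + 4 + 2 + 1 bytes is an assumption that is
merely consistent with the stated total, and ``slots_for_name`` is a
hypothetical helper showing how many 8-byte name slots a file name would need.

```python
import math

NR_DENTRY_IN_BLOCK = 214
DENTRY_SIZE = 4 + 4 + 2 + 1   # hash, ino, len, type (assumed field widths)
SLOT_LEN = 8                  # bytes of file name stored per slot
RESERVED = 3


def bitmap_bytes():
    """One validity bit per dentry slot, rounded up to whole bytes."""
    return math.ceil(NR_DENTRY_IN_BLOCK / 8)


def dentry_block_bytes():
    return (bitmap_bytes() + RESERVED
            + DENTRY_SIZE * NR_DENTRY_IN_BLOCK
            + SLOT_LEN * NR_DENTRY_IN_BLOCK)


def slots_for_name(name_len):
    """Hypothetical helper: consecutive slots needed to hold a name."""
    return math.ceil(name_len / SLOT_LEN)
```

The four parts add up to exactly one 4KB block, which is why 214 is the slot
count per dentry block.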

F2FS implements multi-level hash tables for directory structure. Each level has
a hash table with dedicated number of hash buckets as shown below. Note that
"A(2B)" means a bucket includes 2 data blocks.

::

    ----------------------
    A : bucket
    B : block
    N : MAX_DIR_HASH_DEPTH
    ----------------------

    level #0   | A(2B)
               |
    level #1   | A(2B) - A(2B)
               |
    level #2   | A(2B) - A(2B) - A(2B) - A(2B)
         .     |   .       .       .       .
    level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
         .     |   .       .       .       .
    level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)

The number of blocks and buckets are determined by::

                            ,- 2, if n < MAX_DIR_HASH_DEPTH / 2,
  # of blocks in level #n = |
                            `- 4, Otherwise

                             ,- 2^(n + dir_level),
                             |        if n + dir_level < MAX_DIR_HASH_DEPTH / 2,
  # of buckets in level #n = |
                             `- 2^((MAX_DIR_HASH_DEPTH / 2) - 1),
                                      Otherwise

When F2FS finds a file name in a directory, at first a hash value of the file
name is calculated. Then, F2FS scans the hash table in level #0 to find the
dentry consisting of the file name and its inode number. If not found, F2FS
scans the next hash table in level #1. In this way, F2FS scans hash tables in
each level incrementally from 1 to N. In each level F2FS needs to scan only
one bucket determined by the following equation, which shows O(log(# of files))
complexity::

  bucket number to scan in level #n = (hash value) % (# of buckets in level #n)
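
The three formulas above translate directly into code. This is a sketch of the
arithmetic only; the value 63 for ``MAX_DIR_HASH_DEPTH`` is an assumption made
here so that the example can run, and integer division stands in for the
``/ 2`` in the formulas.

```python
MAX_DIR_HASH_DEPTH = 63  # assumption for this sketch


def blocks_in_bucket(level):
    """Blocks per bucket: 2 in the shallow levels, 4 otherwise."""
    return 2 if level < MAX_DIR_HASH_DEPTH // 2 else 4


def buckets_in_level(level, dir_level=0):
    """Bucket count doubles per level until it reaches a fixed ceiling."""
    if level + dir_level < MAX_DIR_HASH_DEPTH // 2:
        return 1 << (level + dir_level)
    return 1 << (MAX_DIR_HASH_DEPTH // 2 - 1)


def bucket_to_scan(hash_value, level, dir_level=0):
    """Only this one bucket has to be scanned in the given level."""
    return hash_value % buckets_in_level(level, dir_level)
```

A lookup would call ``bucket_to_scan`` once per level and stop at the first
level whose bucket holds a matching dentry.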

In the case of file creation, F2FS finds empty consecutive slots that cover the
file name. F2FS searches the empty slots in the hash tables of all levels from
1 to N in the same way as the lookup operation.

The following figure shows an example of two cases holding children::

       --------------> Dir <--------------
       |                                 |
    child                             child

    child - child                     [hole] - child

    child - child - child             [hole] - [hole] - child

   Case 1:                           Case 2:
   Number of children = 6,           Number of children = 3,
   File size = 7                     File size = 7

Default Block Allocation
------------------------

At runtime, F2FS manages six active logs inside "Main" area: Hot/Warm/Cold node
and Hot/Warm/Cold data.

- Hot node      contains direct node blocks of directories.
- Warm node     contains direct node blocks except hot node blocks.
- Cold node     contains indirect node blocks
- Hot data      contains dentry blocks
- Warm data     contains data blocks except hot and cold data blocks
- Cold data     contains multimedia data or migrated data blocks

LFS has two schemes for free space management: threaded log and
copy-and-compaction. The copy-and-compaction scheme, which is known as cleaning,
is well-suited for devices showing very good sequential write performance, since
free segments are served all the time for writing new data. However, it suffers
from cleaning overhead under high utilization. In contrast, the threaded log
scheme suffers from random writes, but no cleaning process is needed. F2FS
adopts a hybrid scheme where the copy-and-compaction scheme is adopted by
default, but the policy is dynamically changed to the threaded log scheme
according to the file system status.

In order to align F2FS with underlying flash-based storage, F2FS allocates a
segment in a unit of section. F2FS expects that the section size would be the
same as the unit size of garbage collection in FTL. Furthermore, with respect
to the mapping granularity in FTL, F2FS allocates each section of the active
logs from different zones as much as possible, since FTL can write the data in
the active logs into one allocation unit according to its mapping granularity.

Cleaning process
----------------

F2FS does cleaning both on demand and in the background. On-demand cleaning is
triggered when there are not enough free segments to serve VFS calls. The
background cleaner is operated by a kernel thread, and triggers the cleaning job
when the system is idle.

F2FS supports two victim selection policies: greedy and cost-benefit algorithms.
In the greedy algorithm, F2FS selects a victim segment having the smallest number
of valid blocks. In the cost-benefit algorithm, F2FS selects a victim segment
according to the segment age and the number of valid blocks in order to address
the log block thrashing problem in the greedy algorithm. F2FS adopts the greedy
algorithm for the on-demand cleaner, while the background cleaner adopts the
cost-benefit algorithm.
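
The difference between the two policies can be illustrated with a small sketch.
The text above only says that cost-benefit weighs segment age against the
number of valid blocks; the concrete score used here is the classic LFS
cost-benefit ratio and is an assumption, not necessarily the formula F2FS
applies. The ``Segment`` type, its ``age`` field and the 512 blocks per segment
are likewise hypothetical.

```python
from dataclasses import dataclass

BLOCKS_PER_SEGMENT = 512  # assumption: 2MB segment of 4KB blocks


@dataclass
class Segment:
    segno: int
    valid_blocks: int
    age: int  # hypothetical: time since the segment was last modified


def greedy_victim(segments):
    """Greedy: the segment with the fewest valid blocks is cheapest to clean."""
    return min(segments, key=lambda s: s.valid_blocks)


def cost_benefit_score(seg):
    """Classic LFS ratio (assumed): space reclaimed times age over copy cost."""
    u = seg.valid_blocks / BLOCKS_PER_SEGMENT
    return (1 - u) * seg.age / (1 + u)


def cost_benefit_victim(segments):
    """Cost-benefit: prefer old segments even if they hold more valid data."""
    return max(segments, key=cost_benefit_score)
```

With one young, nearly empty segment and one old, slightly fuller segment, the
greedy policy picks the former and the cost-benefit policy picks the latter,
which is how the age term counters thrashing on recently written segments.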

In order to identify whether the data in the victim segment are valid or not,
F2FS manages a bitmap. Each bit represents the validity of a block, and the
bitmap is composed of a bit stream covering all blocks in the Main area.

Write-hint Policy
-----------------

1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.

2) whint_mode=user-based. F2FS tries to pass down hints given by
users.

===================== ======================== ===================
User                  F2FS                     Block
===================== ======================== ===================
N/A                   META                     WRITE_LIFE_NOT_SET
N/A                   HOT_NODE                 "
N/A                   WARM_NODE                "
N/A                   COLD_NODE                "
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
===================== ======================== ===================

3) whint_mode=fs-based. F2FS passes down hints with its policy.

===================== ======================== ===================
User                  F2FS                     Block
===================== ======================== ===================
N/A                   META                     WRITE_LIFE_MEDIUM
N/A                   HOT_NODE                 WRITE_LIFE_NOT_SET
N/A                   WARM_NODE                "
N/A                   COLD_NODE                WRITE_LIFE_NONE
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
===================== ======================== ===================
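
The data rows of the two tables above can be condensed into a small lookup.
This sketch covers only user-supplied hints on data writes; the META and node
rows and the ioctl(COLD) and extension list cases are left out. The function
names are made up for this example, and hints are modelled as plain strings.

```python
def data_temperature(user_hint):
    """Map a user write hint to the F2FS data log it lands in."""
    if user_hint == "WRITE_LIFE_EXTREME":
        return "COLD_DATA"
    if user_hint == "WRITE_LIFE_SHORT":
        return "HOT_DATA"
    return "WARM_DATA"


def block_hint(user_hint, whint_mode, direct_io):
    """Return the hint passed down to the block layer for a data write."""
    if whint_mode == "off":
        return "WRITE_LIFE_NOT_SET"
    if direct_io:
        # Direct I/O keeps the user's hint unchanged in both modes.
        return user_hint
    temperature = data_temperature(user_hint)
    if temperature == "COLD_DATA":
        return "WRITE_LIFE_EXTREME"
    if temperature == "HOT_DATA":
        return "WRITE_LIFE_SHORT"
    # Buffered warm data is the only row where the two modes differ.
    return "WRITE_LIFE_LONG" if whint_mode == "fs-based" else "WRITE_LIFE_NOT_SET"
```

Written this way, it is easy to see that the two modes disagree only for
buffered writes that end up in the warm data log.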

Fallocate(2) Policy
-------------------

The default policy follows the POSIX rule below.

Allocating disk space
    The default operation (i.e., mode is zero) of fallocate() allocates
    the disk space within the range specified by offset and len. The
    file size (as reported by stat(2)) will be changed if offset+len is
    greater than the file size. Any subregion within the range specified
    by offset and len that did not contain data before the call will be
    initialized to zero. This default behavior closely resembles the
    behavior of the posix_fallocate(3) library function, and is intended
    as a method of optimally implementing that function.

However, once F2FS receives ioctl(fd, F2FS_IOC_SET_PIN_FILE) prior to
fallocate(fd, DEFAULT_MODE), it allocates on-disk block addresses having
zero or random data, which is useful in the scenario below:

 1. create(fd)
 2. ioctl(fd, F2FS_IOC_SET_PIN_FILE)
 3. fallocate(fd, 0, 0, size)
 4. address = fibmap(fd, offset)
 5. open(blkdev)
 6. write(blkdev, address)

Compression implementation
--------------------------

- A new term, cluster, is defined as the basic unit of compression; a file can
  be divided logically into multiple clusters. One cluster includes 4 << n
  (n >= 0) logical pages, the compression size is also the cluster size, and
  each cluster can be compressed or not.

- In the cluster metadata layout, one special block address is used to indicate
  whether a cluster is a compressed one or a normal one. For a compressed
  cluster, the following metadata maps the cluster to [1, (4 << n) - 1] physical
  blocks, in which f2fs stores data including the compress header and the
  compressed data.

- In order to eliminate write amplification during overwrite, F2FS only
  supports compression on write-once files. Data can be compressed only when
  all logical blocks in the cluster contain valid data and the compress ratio
  of the cluster data is lower than the specified threshold.

- To enable compression on regular inode, there are four ways:

  * chattr +c file
  * chattr +c dir; touch dir/file
  * mount w/ -o compress_extension=ext; touch file.ext
  * mount w/ -o compress_extension=*; touch any_file

- To disable compression on regular inode, there are two ways:

  * chattr -c file
  * mount w/ -o nocompress_extension=ext; touch file.ext

- Priority in between FS_COMPR_FL, FS_NOCOMP_FL, extensions:

  * compress_extension=so; nocompress_extension=zip; chattr +c dir; touch
    dir/foo.so; touch dir/bar.zip; touch dir/baz.txt; then foo.so and baz.txt
    should be compressed, bar.zip should be non-compressed. chattr +c dir/bar.zip
    can enable compress on bar.zip.
  * compress_extension=so; nocompress_extension=zip; chattr -c dir; touch
    dir/foo.so; touch dir/bar.zip; touch dir/baz.txt; then foo.so should be
    compressed, bar.zip and baz.txt should be non-compressed.
    chattr +c dir/bar.zip; chattr +c dir/baz.txt; can enable compress on bar.zip
    and baz.txt.

- At this point, the compression feature doesn't expose compressed space to the
  user directly, so that potential later data updates to that space remain
  guaranteed. Instead, the main goal is to reduce data writes to the flash disk
  as much as possible, which extends disk lifetime and relaxes IO congestion.
  Alternatively, we've added the ioctl(F2FS_IOC_RELEASE_COMPRESS_BLOCKS)
  interface to reclaim compressed space and show it to the user after setting
  the immutable bit. After the release, the immutable bit disallows writing to
  or mmaping the file until the compressed space is reserved again via
  ioctl(F2FS_IOC_RESERVE_COMPRESS_BLOCKS) or the file size is truncated to zero.
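
The priority examples in the list above can be read as a small decision rule
applied when a file is created. The function below is a reading of those two
examples only, not a statement of the kernel logic; its name and parameters are
hypothetical. A later ``chattr +c`` on the file itself still enables
compression regardless of the result.

```python
def compressed_on_create(filename, dir_has_compr_flag,
                         compress_ext=(), nocompress_ext=()):
    """Guess whether a newly created file starts out compression-enabled."""
    ext = filename.rsplit(".", 1)[-1] if "." in filename else ""
    if ext in nocompress_ext:
        # nocompress_extension wins over an inherited directory flag.
        return False
    if "*" in compress_ext or ext in compress_ext:
        # compress_extension applies even when the directory is not flagged.
        return True
    # Otherwise the file inherits the directory's compression flag.
    return dir_has_compr_flag
```

Running it over foo.so, bar.zip and baz.txt with the directory flag set and
then cleared reproduces both cases described in the list.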

Compress metadata layout::

                           [Dnode Structure]
                +-----------------------------------------------+
                | cluster 1 | cluster 2 | ......... | cluster N |
                +-----------------------------------------------+
                .           .                       .           .
          .                      .                .                      .
    .         Compressed Cluster       .        .        Normal Cluster            .
    +----------+---------+---------+---------+  +---------+---------+---------+---------+
    |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
    +----------+---------+---------+---------+  +---------+---------+---------+---------+
               .                             .
            .                                           .
        .                                                           .
        +-------------+-------------+----------+----------------------------+
        | data length | data chksum | reserved |      compressed data       |
        +-------------+-------------+----------+----------------------------+
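
The cluster arithmetic described in this section can be sketched as follows.
The 4KB block size and the helper names are assumptions for illustration, and
``compressed_bytes`` is taken to include the compress header shown in the
layout above.

```python
import math

BLOCK_SIZE = 4096  # assumption: 4KB pages and blocks


def cluster_size(n):
    """Number of logical pages in one cluster: 4 << n, with n >= 0."""
    return 4 << n


def cluster_of_page(page_index, n):
    """Index of the cluster that contains a given logical page."""
    return page_index // cluster_size(n)


def physical_blocks_if_compressed(compressed_bytes, n):
    """Blocks needed for a compressed cluster, or None if it does not fit.

    One slot of the cluster is taken by the compress flag, so at most
    (4 << n) - 1 physical blocks are available for header plus data.
    """
    needed = max(1, math.ceil(compressed_bytes / BLOCK_SIZE))
    return needed if needed <= cluster_size(n) - 1 else None
```

A cluster whose compressed form would still need every block gains nothing, so
the sketch reports it as not fitting and it would stay a normal cluster.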

Compression mode
----------------

f2fs supports "fs" and "user" compression modes with the "compress_mode" mount
option. With this option, f2fs lets the user choose how compression-enabled
files are compressed (refer to the "Compression implementation" section for how
to enable compression on a regular inode).

1) compress_mode=fs
This is the default option. f2fs does automatic compression in the writeback of
the compression-enabled files.

2) compress_mode=user
This disables the automatic compression and gives the user discretion in
choosing the target file and the timing. The user can do manual
compression/decompression on the compression-enabled files using the
F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE ioctls as shown below.

To decompress a file::

  fd = open(filename, O_WRONLY, 0);
  ret = ioctl(fd, F2FS_IOC_DECOMPRESS_FILE);

To compress a file::

  fd = open(filename, O_WRONLY, 0);
  ret = ioctl(fd, F2FS_IOC_COMPRESS_FILE);

NVMe Zoned Namespace devices
----------------------------

- ZNS defines a per-zone capacity which can be equal to or less than the
  zone-size. Zone-capacity is the number of usable blocks in the zone.
  F2FS checks whether zone-capacity is less than zone-size; if it is, any
  segment which starts after the zone-capacity is marked as not-free in
  the free segment bitmap at initial mount time. These segments are marked
  as permanently used so they are not allocated for writes and
  consequently do not need to be garbage collected. In case the
  zone-capacity is not aligned to the default segment size (2MB), a segment
  can start before the zone-capacity and span across the zone-capacity
  boundary. Such spanning segments are also considered usable segments. All
  blocks past the zone-capacity are considered unusable in these segments.
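
The zone-capacity rule above can be expressed as a per-segment calculation.
This sketch assumes 512 blocks per 2MB segment and counts positions in blocks
from the start of the zone; the function name is hypothetical, and a segment
that starts exactly at the zone-capacity is treated as unusable.

```python
BLOCKS_PER_SEGMENT = 512  # assumption: 2MB segment of 4KB blocks


def usable_blocks_in_segment(seg_index_in_zone, zone_capacity_blocks):
    """Return how many blocks of a segment lie below the zone-capacity.

    0 means the segment starts at or past the zone-capacity and would be
    marked as permanently used; a value below BLOCKS_PER_SEGMENT means the
    segment spans the boundary and is only partly usable.
    """
    start = seg_index_in_zone * BLOCKS_PER_SEGMENT
    if start >= zone_capacity_blocks:
        return 0
    return min(BLOCKS_PER_SEGMENT, zone_capacity_blocks - start)
```

For a zone with a capacity of 1300 blocks, the first two segments are fully
usable, the third spans the boundary with 276 usable blocks, and the fourth is
never allocated.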