BTRFS
=====

Btrfs is a copy-on-write filesystem for Linux aimed at
implementing advanced features while focusing on fault tolerance,
repair and easy administration. Initially developed by Oracle, Btrfs
is licensed under the GPL and open for contribution from anyone.

Linux has a wealth of filesystems to choose from, but we are facing a
number of challenges with scaling to the large storage subsystems that
are becoming common in today's data centers. Filesystems need to scale
in their ability to address and manage large storage, and also in
their ability to detect, repair and tolerate errors in the data stored
on disk. Btrfs is under heavy development, and is not suitable for
any uses other than benchmarking and review. The Btrfs disk format is
not yet finalized.

The main Btrfs features include:

* Extent based file storage (2^64 max file size)
* Space efficient packing of small files
* Space efficient indexed directories
* Dynamic inode allocation
* Writable snapshots
* Subvolumes (separate internal filesystem roots)
* Object level mirroring and striping
* Checksums on data and metadata (multiple algorithms available)
* Compression
* Integrated multiple device support, with several raid algorithms
* Online filesystem check (not yet implemented)
* Very fast offline filesystem check
* Efficient incremental backup and FS mirroring (not yet implemented)
* Online filesystem defragmentation


Mount Options
=============

When mounting a btrfs filesystem, the following options are accepted.
Unless otherwise specified, all options default to off.

  alloc_start=<bytes>
        Debugging option to force all block allocations above a certain
        byte threshold on each block device. The value is specified in
        bytes, optionally with a K, M, or G suffix, case insensitive.
        Default is 1MB.

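        As a rough illustration of the size syntax described above, the
        following shell sketch converts a value with an optional K, M, or G
        suffix into bytes. The helper name to_bytes is hypothetical, and the
        sketch assumes the suffixes are binary multiples (1K = 1024 bytes);
        it is not the kernel's own parser.

```shell
# to_bytes: hypothetical helper, not part of btrfs or btrfs-progs.
# Converts "<number>[K|M|G]" (suffix case insensitive) into bytes,
# assuming binary multiples (1K = 1024 bytes).
to_bytes() {
    case "$1" in
        *[kK]) echo $(( ${1%?} * 1024 )) ;;                 # strip suffix, scale by 2^10
        *[mM]) echo $(( ${1%?} * 1024 * 1024 )) ;;          # scale by 2^20
        *[gG]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;   # scale by 2^30
        *)     echo $(( $1 )) ;;                            # plain byte count
    esac
}

to_bytes 1M    # prints 1048576
to_bytes 4k    # prints 4096
```

        Under this reading, alloc_start=1M and alloc_start=1048576 would
        name the same threshold.
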
  autodefrag
        Detect small random writes into files and queue them up for the
        defrag process. Works best for small files; not well suited for
        large database workloads.

  check_int
  check_int_data
  check_int_print_mask=<value>
        These debugging options control the behavior of the integrity checking
        module (requires the BTRFS_FS_CHECK_INTEGRITY config option).

        check_int enables the integrity checker module, which examines all
        block write requests to ensure on-disk consistency, at a large
        memory and CPU cost.

        check_int_data includes extent data in the integrity checks, and
        implies the check_int option.

        check_int_print_mask takes a bitmask of BTRFSIC_PRINT_MASK_* values
        as defined in fs/btrfs/check-integrity.c, to control the integrity
        checker module behavior.

        See comments at the top of fs/btrfs/check-integrity.c for more info.

  commit=<seconds>
        Set the interval of periodic commit, 30 seconds by default. Higher
        values defer data being synced to permanent storage with obvious
        consequences when the system crashes. The upper bound is not forced,
        but a warning is printed if it's more than 300 seconds (5 minutes).

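        The threshold described above can be sketched as a small shell
        check. The helper name check_commit and the message text are
        hypothetical; the kernel's actual warning wording is not reproduced
        here.

```shell
# check_commit: hypothetical helper mirroring the rule described above.
# Intervals above 300 seconds are still accepted but produce a warning.
check_commit() {
    if [ "$1" -gt 300 ]; then
        echo "warning: commit=$1 exceeds 300 seconds"
    else
        echo "commit=$1"
    fi
}

check_commit 30     # prints commit=30
check_commit 600    # prints warning: commit=600 exceeds 300 seconds
```

        Note that 300 itself does not trigger the warning in this sketch,
        since the text says "more than 300 seconds".
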
  compress
  compress=<type>
  compress-force
  compress-force=<type>
        Control BTRFS file data compression. Type may be specified as "zlib",
        "lzo", or "no" (for no compression, used for remounting). If no type
        is specified, zlib is used. If compress-force is specified,
        all files will be compressed, whether or not they compress well.
        If compression is enabled, nodatacow and nodatasum are disabled.

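        The rules above for choosing a compression type can be sketched as
        follows. The helper name compress_type is hypothetical, and the
        sketch only models the option syntax listed here; it does not mount
        anything.

```shell
# compress_type: hypothetical helper, not part of btrfs or btrfs-progs.
# Prints the compression type implied by a single compress or
# compress-force mount option, following the rules described above.
compress_type() {
    case "$1" in
        compress|compress-force)
            echo zlib ;;                      # no type given: zlib is used
        compress=*|compress-force=*)
            ctype=${1#*=}                     # text after the first "="
            case "$ctype" in
                zlib|lzo|no) echo "$ctype" ;;
                *) echo "unknown type: $ctype" >&2; return 1 ;;
            esac ;;
        *)
            echo "not a compress option: $1" >&2; return 1 ;;
    esac
}

compress_type compress              # prints zlib
compress_type compress-force=lzo    # prints lzo
```
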
  degraded
        Allow mounts to continue with missing devices. A read-write mount may
        fail with too many devices missing, for example if a stripe member
        is completely missing.

  device=<devicepath>
        Specify a device during mount so that ioctls on the control device
        can be avoided. Especially useful when trying to mount a multi-device
        setup as root. May be specified multiple times for multiple devices.

  discard
        Issue frequent commands to let the block device reclaim space freed by
        the filesystem. This is useful for SSD devices, thinly provisioned
        LUNs and virtual machine images, but may have a significant
        performance impact. (The fstrim command is also available to
        initiate batch trims from userspace).

  enospc_debug
        Debugging option to be more verbose in some ENOSPC conditions.

  fatal_errors=<action>
        Action to take when encountering a fatal error:
          "bug" - BUG() on a fatal error. This is the default.
          "panic" - panic() on a fatal error.

  flushoncommit
        The 'flushoncommit' mount option forces any data dirtied by a write in a
        prior transaction to commit as part of the current commit. This makes
        the committed state a fully consistent view of the file system from the
        application's perspective (i.e., it includes all completed file system
        operations). This was previously the behavior only when a snapshot is
        created.

  inode_cache
        Enable free inode number caching. Defaults to off due to an overflow
        problem when the free space crcs don't fit inside a single page.

  max_inline=<bytes>
        Specify the maximum amount of space, in bytes, that can be inlined in
        a metadata B-tree leaf. The value is specified in bytes, optionally
        with a K, M, or G suffix, case insensitive. In practice, this value
        is limited by the root sector size, with some space unavailable due
        to leaf headers. For a 4k sectorsize, max inline data is ~3900 bytes.

  metadata_ratio=<value>
        Specify that 1 metadata chunk should be allocated after every <value>
        data chunks. Off by default.

  noacl
        Disable support for POSIX Access Control Lists (ACLs). See the
        acl(5) manual page for more information about ACLs.

  nobarrier
        Disables the use of block layer write barriers. Write barriers ensure
        that certain IOs make it through the device cache and are on persistent
        storage. If used on a device with a volatile (non-battery-backed)
        write-back cache, this option will lead to filesystem corruption on a
        system crash or power loss.

  nodatacow
        Disable data copy-on-write for newly created files. Implies nodatasum,
        and disables all compression.

  nodatasum
        Disable data checksumming for newly created files.

  notreelog
        Disable the tree logging used for fsync and O_SYNC writes.

  recovery
        Enable autorecovery attempts if a bad tree root is found at mount time.
        Currently this scans a list of several previous tree roots and tries to
        use the first readable.

  rescan_uuid_tree
        Force check and rebuild procedure of the UUID tree. This should not
        normally be needed.

  skip_balance
        Skip automatic resume of an interrupted balance operation after mount.
        The balance may be resumed with "btrfs balance resume".

  space_cache (*)
        Enable the on-disk freespace cache.
  nospace_cache
        Disable freespace cache loading without clearing the cache.
  clear_cache
        Force clearing and rebuilding of the disk space cache if something
        has gone wrong.

  ssd
  nossd
  ssd_spread
        Options to control SSD allocation schemes. By default, BTRFS will
        enable or disable SSD allocation heuristics depending on whether a
        rotational or nonrotational disk is in use. The ssd and nossd options
        can override this autodetection.

        The ssd_spread mount option attempts to allocate into big chunks
        of unused space, and may perform better on low-end SSDs. ssd_spread
        implies ssd, enabling all other SSD heuristics as well.

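        On Linux the block layer reports whether a device is rotational
        through sysfs, typically in /sys/block/<dev>/queue/rotational
        (0 for nonrotational, 1 for rotational). The sketch below reads such
        a file and prints the option that matches it. The helper name
        ssd_hint is hypothetical, and it is an assumption that this sysfs
        value agrees with what Btrfs autodetects on a given system.

```shell
# ssd_hint: hypothetical helper, not part of btrfs or btrfs-progs.
# Takes the path of a sysfs "rotational" file, for example
# /sys/block/sda/queue/rotational, and prints "ssd" when the file
# contains 0 (nonrotational) and "nossd" otherwise.
ssd_hint() {
    if [ "$(cat "$1")" = "0" ]; then
        echo ssd
    else
        echo nossd
    fi
}
```

        The printed value could then be passed to mount with -o to override
        the autodetection, as described above.
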
  subvol=<path>
        Mount subvolume at <path> rather than the root subvolume. <path> is
        relative to the top level subvolume.

  subvolid=<ID>
        Mount subvolume specified by an ID number rather than the root subvolume.
        This allows mounting of subvolumes which are not in the root of the mounted
        filesystem.
        You can use "btrfs subvolume list" to see subvolume ID numbers.

  subvolrootid=<objectid> (deprecated)
        Mount subvolume specified by <objectid> rather than the root subvolume.
        This allows mounting of subvolumes which are not in the root of the mounted
        filesystem.
        You can use "btrfs subvolume show " to see the object ID for a subvolume.

  thread_pool=<number>
        The number of worker threads to allocate. The default number is equal
        to the number of CPUs + 2, or 8, whichever is smaller.

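        The default described above works out to min(CPUs + 2, 8). A short
        shell sketch of that arithmetic, using the hypothetical helper name
        default_thread_pool:

```shell
# default_thread_pool: hypothetical helper that evaluates the rule above,
# min(number of CPUs + 2, 8), for a CPU count given as the first argument.
default_thread_pool() {
    n=$(( $1 + 2 ))
    if [ "$n" -gt 8 ]; then
        n=8
    fi
    echo "$n"
}

default_thread_pool 2     # prints 4
default_thread_pool 16    # prints 8
```

        By this rule, any machine with six or more CPUs gets the same
        default of 8 unless thread_pool is set explicitly.
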
  user_subvol_rm_allowed
        Allow subvolumes to be deleted by a non-root user. Use with caution.

MAILING LIST
============

There is a Btrfs mailing list hosted on vger.kernel.org. You can
find details on how to subscribe here:

http://vger.kernel.org/vger-lists.html#linux-btrfs

Mailing list archives are available from gmane:

http://dir.gmane.org/gmane.comp.file-systems.btrfs



IRC
===

Discussion of Btrfs also occurs on the #btrfs channel of the Freenode
IRC network.



UTILITIES
=========

Userspace tools for creating and manipulating Btrfs file systems are
available from the git repository at the following location:

 http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git
 git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git

These include the following tools:

mkfs.btrfs: create a filesystem

btrfsctl: control program to create snapshots and subvolumes:

        mount /dev/sda2 /mnt
        btrfsctl -s new_subvol_name /mnt
        btrfsctl -s snapshot_of_default /mnt/default
        btrfsctl -s snapshot_of_new_subvol /mnt/new_subvol_name
        btrfsctl -s snapshot_of_a_snapshot /mnt/snapshot_of_new_subvol
        ls /mnt
        default  snapshot_of_a_snapshot  snapshot_of_new_subvol
        new_subvol_name  snapshot_of_default

Snapshots and subvolumes cannot be deleted right now, but you can
rm -rf all the files and directories inside them.

btrfsck: do a limited check of the FS extent trees.

btrfs-debug-tree: print all of the FS metadata in text form. Example:

        btrfs-debug-tree /dev/sda2 >& big_output_file