dm-raid
=======

The device-mapper RAID (dm-raid) target provides a bridge from DM to MD.
It allows the MD RAID drivers to be accessed using a device-mapper
interface.


Mapping Table Interface
-----------------------
The target is named "raid" and it accepts the following parameters:

  <raid_type> <#raid_params> <raid_params> \
    <#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]

<raid_type>:
  raid0        RAID0 striping (no resilience)
  raid1        RAID1 mirroring
  raid4        RAID4 with dedicated last parity disk
  raid5_n      RAID5 with dedicated last parity disk supporting takeover
               Same as raid4
               - Transitory layout
  raid5_la     RAID5 left asymmetric
               - rotating parity 0 with data continuation
  raid5_ra     RAID5 right asymmetric
               - rotating parity N with data continuation
  raid5_ls     RAID5 left symmetric
               - rotating parity 0 with data restart
  raid5_rs     RAID5 right symmetric
               - rotating parity N with data restart
  raid6_zr     RAID6 zero restart
               - rotating parity zero (left-to-right) with data restart
  raid6_nr     RAID6 N restart
               - rotating parity N (right-to-left) with data restart
  raid6_nc     RAID6 N continue
               - rotating parity N (right-to-left) with data continuation
  raid6_n_6    RAID6 with dedicated parity disks
               - parity and Q-syndrome on the last 2 disks;
                 layout for takeover from/to raid4/raid5_n
  raid6_la_6   Same as "raid5_la" plus dedicated last Q-syndrome disk
               - layout for takeover between raid5_la and raid6
  raid6_ra_6   Same as "raid5_ra" plus dedicated last Q-syndrome disk
               - layout for takeover between raid5_ra and raid6
  raid6_ls_6   Same as "raid5_ls" plus dedicated last Q-syndrome disk
               - layout for takeover between raid5_ls and raid6
  raid6_rs_6   Same as "raid5_rs" plus dedicated last Q-syndrome disk
               - layout for takeover between raid5_rs and raid6
  raid10       Various RAID10 inspired algorithms chosen by additional params
               (see raid10_format and raid10_copies below)
               - RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
               - RAID1E: Integrated Adjacent Stripe Mirroring
               - RAID1E: Integrated Offset Stripe Mirroring
               - and other similar RAID10 variants

  Reference: Chapter 4 of
  http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf

<#raid_params>: The number of parameters that follow.

<raid_params> consists of
    Mandatory parameters:
        <chunk_size>: Chunk size in sectors.  This parameter is often known as
                      "stripe size".  It is the only mandatory parameter and
                      is placed first.

    followed by optional parameters (in any order):
        [sync|nosync]   Force or prevent RAID initialization.

        [rebuild <idx>] Rebuild drive number 'idx' (first drive is 0).

        [daemon_sleep <ms>]
                Interval between runs of the bitmap daemon that
                clears bits.  A longer interval means less bitmap I/O but
                resyncing after a failure is likely to take longer.

        [min_recovery_rate <kB/sec/disk>]  Throttle RAID initialization
        [max_recovery_rate <kB/sec/disk>]  Throttle RAID initialization
        [write_mostly <idx>]               Mark drive index 'idx' write-mostly.
        [max_write_behind <sectors>]       See '--write-behind=' (man mdadm)
        [stripe_cache <sectors>]           Stripe cache size (RAID 4/5/6 only)
        [region_size <sectors>]
                The region_size multiplied by the number of regions is the
                logical size of the array.  The bitmap records the device
                synchronisation state for each region.

        [raid10_copies   <# copies>]
        [raid10_format   <near|far|offset>]
                These two options are used to alter the default layout of
                a RAID10 configuration.  The number of copies can be
                specified, but the default is 2.  There are also three
                variations to how the copies are laid down - the default
                is "near".  Near copies are what most people think of with
                respect to mirroring.  If these options are left unspecified,
                or 'raid10_copies 2' and/or 'raid10_format near' are given,
                then the layouts for 2, 3 and 4 devices are:
                2 drives         3 drives          4 drives
                --------         ----------        --------------
                A1  A1           A1  A1  A2        A1  A1  A2  A2
                A2  A2           A2  A3  A3        A3  A3  A4  A4
                A3  A3           A4  A4  A5        A5  A5  A6  A6
                A4  A4           A5  A6  A6        A7  A7  A8  A8
                ..  ..           ..  ..  ..        ..  ..  ..  ..
                The 2-device layout is equivalent to 2-way RAID1.  The 4-device
                layout is what a traditional RAID10 would look like.  The
                3-device layout is what might be called a 'RAID1E - Integrated
                Adjacent Stripe Mirroring'.

                If 'raid10_copies 2' and 'raid10_format far', then the layouts
                for 2, 3 and 4 devices are:
                2 drives             3 drives             4 drives
                --------             --------------       --------------------
                A1  A2               A1   A2   A3         A1   A2   A3   A4
                A3  A4               A4   A5   A6         A5   A6   A7   A8
                A5  A6               A7   A8   A9         A9   A10  A11  A12
                ..  ..               ..   ..   ..         ..   ..   ..   ..
                A2  A1               A3   A1   A2         A2   A1   A4   A3
                A4  A3               A6   A4   A5         A6   A5   A8   A7
                A6  A5               A9   A7   A8         A10  A9   A12  A11
                ..  ..               ..   ..   ..         ..   ..   ..   ..

                If 'raid10_copies 2' and 'raid10_format offset', then the
                layouts for 2, 3 and 4 devices are:
                2 drives       3 drives           4 drives
                --------       ------------       -----------------
                A1  A2         A1  A2  A3         A1  A2  A3  A4
                A2  A1         A3  A1  A2         A2  A1  A4  A3
                A3  A4         A4  A5  A6         A5  A6  A7  A8
                A4  A3         A6  A4  A5         A6  A5  A8  A7
                A5  A6         A7  A8  A9         A9  A10 A11 A12
                A6  A5         A9  A7  A8         A10 A9  A12 A11
                ..  ..         ..  ..  ..         ..  ..  ..  ..
                Here we see layouts closely akin to 'RAID1E - Integrated
                Offset Stripe Mirroring'.

        [delta_disks <N>]
                The delta_disks option value (-251 < N < +251) triggers
                device removal (negative value) or device addition (positive
                value) to any reshape supporting raid levels 4/5/6 and 10.
                RAID levels 4/5/6 allow for addition of devices (metadata
                and data device tuple); raid10_near and raid10_offset only
                allow for device addition.  raid10_far does not support any
                reshaping at all.
                A minimum number of devices has to be kept to enforce
                resilience, which is 3 devices for raid4/5 and 4 devices
                for raid6.

        [data_offset <sectors>]
                This option value defines the offset into each data device
                where the data starts.  This is used to provide out-of-place
                reshaping space to avoid writing over data whilst
                changing the layout of stripes, hence an interruption/crash
                may happen at any time without the risk of losing data.
                E.g. when adding devices to an existing raid set during
                forward reshaping, the out-of-place space will be allocated
                at the beginning of each raid device.  The kernel raid4/5/6/10
                MD personalities supporting such device addition will read
                the data from the existing first stripes (those spanning the
                smaller number of devices) starting at data_offset to fill up
                a new stripe spanning the larger number of devices, calculate
                the redundancy blocks (parity/Q-syndrome) and write that new
                stripe to offset 0.  The same is applied to all N-1 other new
                stripes.  This out-of-place scheme is used to change the RAID
                type (i.e. the allocation algorithm) as well, e.g. changing
                from raid5_ls to raid5_n.

        [journal_dev <dev>]
                This option adds a journal device to raid4/5/6 raid sets and
                uses it to close the 'write hole' caused by the non-atomic
                updates to the component devices, which can cause data loss
                during recovery.  The journal device is used in writethrough
                mode, so writes are throttled compared with non-journaled
                raid4/5/6 sets.
                Takeover/reshape is not possible with a raid4/5/6 journal
                device; it has to be deconfigured before requesting these.

        [journal_mode <mode>]
                This option sets the caching mode on journaled raid4/5/6 raid
                sets (see 'journal_dev <dev>' above) to 'writethrough' or
                'writeback'.  If 'writeback' is selected, the journal device
                has to be resilient and must not suffer from the 'write hole'
                problem itself (e.g. use raid1 or raid10) to avoid a single
                point of failure.

<#raid_devs>: The number of devices composing the array.
        Each device consists of two entries.  The first is the device
        containing the metadata (if any); the second is the one containing the
        data.  A maximum of 64 metadata/data device entries are supported
        up to target version 1.8.0.
        1.9.0 supports up to 253, a limit enforced by the underlying MD
        kernel runtime.

If a drive has failed or is missing at creation time, a '-' can be
given for both the metadata and data drives for a given position.
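
The raid10_format layouts tabulated above can be reproduced with a short
Python sketch. It is only an illustration: the function names are
hypothetical, and the rule that far/offset copies rotate within sets of
'raid10_copies' devices (with the last set absorbing any remainder) is
inferred from the tables in this document, not taken from the MD sources.

```python
def near_layout(devices, copies=2, rows=4):
    """Each chunk is repeated `copies` times in consecutive device slots."""
    return [[(r * devices + c) // copies + 1 for c in range(devices)]
            for r in range(rows)]


def _device_sets(devices, copies):
    """Group devices into sets of `copies`; the last set takes the remainder.

    Assumption: this grouping is inferred from the tables in this document.
    """
    count = max(devices // copies, 1)
    sets = []
    for i in range(count):
        start = i * copies
        end = devices if i == count - 1 else start + copies
        sets.append(list(range(start, end)))
    return sets


def _rotate(row, copies, shift):
    """Rotate a striped row by `shift` positions within each device set."""
    out = list(row)
    for group in _device_sets(len(row), copies):
        for pos, dev in enumerate(group):
            out[dev] = row[group[(pos - shift) % len(group)]]
    return out


def _striped(devices, rows):
    """Plain striping: chunk numbers run left to right, row by row."""
    return [[r * devices + c + 1 for c in range(devices)] for r in range(rows)]


def far_layout(devices, copies=2, rows=3):
    """One section per copy; section j is the striped data rotated by j."""
    striped = _striped(devices, rows)
    return [[_rotate(row, copies, j) for row in striped] for j in range(copies)]


def offset_layout(devices, copies=2, rows=3):
    """Each striped row is followed directly by its rotated copies."""
    return [_rotate(row, copies, j)
            for row in _striped(devices, rows) for j in range(copies)]


if __name__ == "__main__":
    for row in offset_layout(3):
        print("  ".join(f"A{n}" for n in row))
```

Each function returns chunk numbers (1 for A1, 2 for A2, ...) with one
inner list per row and one entry per drive.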


Example Tables
--------------
# RAID4 - 4 data drives, 1 parity (no metadata devices)
# No metadata devices specified to hold superblock/bitmap info
# Chunk size of 1MiB
# (Lines separated for easy reading)

0 1960893648 raid \
        raid4 1 2048 \
        5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81

# RAID4 - 4 data drives, 1 parity (with metadata devices)
# Chunk size of 1MiB, force RAID initialization,
#       min recovery rate at 20 kiB/sec/disk

0 1960893648 raid \
        raid4 4 2048 sync min_recovery_rate 20 \
        5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82
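
As a minimal sketch of how such table lines are put together, the Python
helper below assembles the fields in the order given under "Mapping Table
Interface" and counts <#raid_params> and <#raid_devs> automatically. The
helper names are hypothetical and 512-byte sectors are assumed; it only
builds the string and does not call dmsetup.

```python
SECTOR_BYTES = 512  # assumption: chunk sizes are counted in 512-byte sectors


def mib_to_sectors(mib):
    """Convert a size in MiB to sectors (1 MiB -> 2048 sectors)."""
    return mib * 1024 * 1024 // SECTOR_BYTES


def raid_table(start, length, raid_type, chunk_sectors, devices, optional=()):
    """Build a one-line dm-raid table.

    devices:  list of (metadata_dev, data_dev) pairs; '-' marks an absent one.
    optional: flat sequence of optional parameter words, e.g.
              ("sync", "min_recovery_rate", 20).
    """
    # chunk_size is the only mandatory raid parameter and comes first.
    params = [str(chunk_sectors)] + [str(word) for word in optional]
    dev_words = [word for pair in devices for word in pair]
    fields = [start, length, "raid", raid_type,
              len(params), *params,
              len(devices), *dev_words]
    return " ".join(str(field) for field in fields)


if __name__ == "__main__":
    data = ["8:17", "8:33", "8:49", "8:65", "8:81"]
    print(raid_table(0, 1960893648, "raid4", mib_to_sectors(1),
                     [("-", dev) for dev in data]))
```

Counting the words instead of typing <#raid_params> by hand avoids the
most common mistake when optional parameters are added or removed.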


Status Output
-------------
'dmsetup table' displays the table used to construct the mapping.
The optional parameters are always printed in the order listed
above with "sync" or "nosync" always output ahead of the other
arguments, regardless of the order used when originally loading the table.
Arguments that can be repeated are ordered by value.


'dmsetup status' yields information on the state and health of the array.
The output is as follows (normally a single line, but expanded here for
clarity):
1: <s> <l> raid \
2:      <raid_type> <#devices> <health_chars> \
3:      <sync_ratio> <sync_action> <mismatch_cnt>

Line 1 is the standard output produced by device-mapper.
Lines 2 and 3 are produced by the raid target and are best explained by
example:
        0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0
Here we can see the RAID type is raid4, there are 5 devices - all of
which are 'A'live, and the array is 2/490221568 complete with its initial
recovery.  Here is a fuller description of the individual fields:
        <raid_type>     Same as the <raid_type> used to create the array.
        <health_chars>  One char for each device, indicating: 'A' = alive and
                        in-sync, 'a' = alive but not in-sync, 'D' = dead/failed.
        <sync_ratio>    The ratio indicating how much of the array has undergone
                        the process described by 'sync_action'.  If the
                        'sync_action' is "check" or "repair", then the process
                        of "resync" or "recover" can be considered complete.
        <sync_action>   One of the following possible states:
                        idle    - No synchronization action is being performed.
                        frozen  - The current action has been halted.
                        resync  - Array is undergoing its initial synchronization
                                  or is resynchronizing after an unclean shutdown
                                  (possibly aided by a bitmap).
                        recover - A device in the array is being rebuilt or
                                  replaced.
                        check   - A user-initiated full check of the array is
                                  being performed.  All blocks are read and
                                  checked for consistency.  The number of
                                  discrepancies found is recorded in
                                  <mismatch_cnt>.  No changes are made to the
                                  array by this action.
                        repair  - The same as "check", but discrepancies are
                                  corrected.
                        reshape - The array is undergoing a reshape.
        <mismatch_cnt>  The number of discrepancies found between mirror copies
                        in RAID1/10 or wrong parity values found in RAID4/5/6.
                        This value is valid only after a "check" of the array
                        is performed.  A healthy array has a 'mismatch_cnt' of 0.
        <data_offset>   The current data offset to the start of the user data on
                        each component device of a raid set (see the respective
                        raid parameter to support out-of-place reshaping).
        <journal_char>  'A' - active write-through journal device.
                        'a' - active write-back journal device.
                        'D' - dead journal device.
                        '-' - no journal device.
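
The fields described above can be picked apart with a few lines of Python.
This is a sketch under stated assumptions: the function name and the
dictionary keys are hypothetical, and <data_offset> and <journal_char> are
assumed to follow <mismatch_cnt> in the order they are described here, on
target versions that emit them.

```python
def parse_raid_status(line):
    """Split one 'dmsetup status' line of a raid target into named fields."""
    fields = line.split()
    if len(fields) < 9 or fields[2] != "raid":
        raise ValueError("not a dm-raid status line: %r" % line)

    # <sync_ratio> is printed as "<done>/<total>".
    done, total = (int(part) for part in fields[6].split("/"))
    health = fields[5]

    status = {
        "start": int(fields[0]),
        "length": int(fields[1]),
        "raid_type": fields[3],
        "devices": int(fields[4]),
        "health": health,
        # Indexes of devices reported as 'D' (dead/failed).
        "failed": [i for i, ch in enumerate(health) if ch == "D"],
        "synced": (done, total),
        "sync_action": fields[7],
        "mismatch_cnt": int(fields[8]),
    }
    # Assumed trailing fields; absent in the example line above.
    if len(fields) > 9:
        status["data_offset"] = int(fields[9])
    if len(fields) > 10:
        status["journal_char"] = fields[10]
    return status


if __name__ == "__main__":
    example = "0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0"
    print(parse_raid_status(example))
```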


Message Interface
-----------------
The dm-raid target will accept certain actions through the 'message' interface.
('man dmsetup' for more information on the message interface.)  These actions
include:
        "idle"    - Halt the current sync action.
        "frozen"  - Freeze the current sync action.
        "resync"  - Initiate/continue a resync.
        "recover" - Initiate/continue a recover process.
        "check"   - Initiate a check (i.e. a "scrub") of the array.
        "repair"  - Initiate a repair of the array.
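
A small Python sketch of how one of these actions might be sent is shown
below. It only builds the argument list for
'dmsetup message <device> <sector> <message>' and does not run it; the
helper name and the device name "my_raid" are hypothetical, and passing 0
as the sector argument is an assumption.

```python
SYNC_MESSAGES = ("idle", "frozen", "resync", "recover", "check", "repair")


def message_argv(dm_name, action):
    """Return the argv for sending a sync action to a dm-raid device."""
    if action not in SYNC_MESSAGES:
        raise ValueError("unsupported dm-raid message: %r" % action)
    # The sector argument is assumed to be 0 here.
    return ["dmsetup", "message", dm_name, "0", action]


if __name__ == "__main__":
    # "my_raid" is a placeholder device name.
    print(" ".join(message_argv("my_raid", "check")))
```

The resulting list can be handed to subprocess.run() by a caller with the
required privileges.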


Discard Support
---------------
The implementation of discard support among hardware vendors varies.
When a block is discarded, some storage devices will return zeroes when
the block is read.  These devices set the 'discard_zeroes_data'
attribute.  Other devices will return random data.  Confusingly, some
devices that advertise 'discard_zeroes_data' will not reliably return
zeroes when discarded blocks are read!  Since RAID 4/5/6 uses blocks
from a number of devices to calculate parity blocks and (for performance
reasons) relies on 'discard_zeroes_data' being reliable, it is important
that the devices be consistent.  Blocks may be discarded in the middle
of a RAID 4/5/6 stripe and if subsequent read results are not
consistent, the parity blocks may be calculated differently at any time,
making the parity blocks useless for redundancy.  It is important to
understand how your hardware behaves with discards if you are going to
enable discards with RAID 4/5/6.

Since the behavior of storage devices is unreliable in this respect,
even when reporting 'discard_zeroes_data', by default RAID 4/5/6
discard support is disabled -- this ensures data integrity at the
expense of losing some performance.

Storage devices that properly support 'discard_zeroes_data' are
increasingly whitelisted in the kernel and can thus be trusted.

For trusted devices, the following dm-raid module parameter can be set
to safely enable discard support for RAID 4/5/6:
    'devices_handle_discard_safely'


Version History
---------------
1.0.0   Initial version.  Support for RAID 4/5/6
1.1.0   Added support for RAID 1
1.2.0   Handle creation of arrays that contain failed devices.
1.3.0   Added support for RAID 10
1.3.1   Allow device replacement/rebuild for RAID 10
1.3.2   Fix/improve redundancy checking for RAID10
1.4.0   Non-functional change.  Removes arg from mapping function.
1.4.1   RAID10 fix redundancy validation checks (commit 55ebbb5).
1.4.2   Add RAID10 "far" and "offset" algorithm support.
1.5.0   Add message interface to allow manipulation of the sync_action.
        New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
1.5.1   Add ability to restore transiently failed devices on resume.
1.5.2   'mismatch_cnt' is zero unless [last_]sync_action is "check".
1.6.0   Add discard support (and devices_handle_discard_safely module param).
1.7.0   Add support for MD RAID0 mappings.
1.8.0   Explicitly check for compatible flags in the superblock metadata
        and refuse to start the raid set if any are set by a newer
        target version, thus avoiding data corruption on a raid set
        with a reshape in progress.
1.9.0   Add support for RAID level takeover/reshape/region size
        and set size reduction.
1.9.1   Fix activation of existing RAID 4/10 mapped devices
1.9.2   Don't emit '- -' on the status table line in case the constructor
        fails reading a superblock.  Correctly emit 'maj:min1 maj:min2' and
        'D' on the status line.  If '- -' is passed into the constructor, emit
        '- -' on the table line and '-' as the status line health character.
1.10.0  Add support for raid4/5/6 journal device
1.10.1  Fix data corruption on reshape request
1.11.0  Fix table line argument order
        (wrong raid10_copies/raid10_format sequence)
1.11.1  Add raid4/5/6 journal write-back support via journal_mode option
1.12.1  Fix for MD deadlock between mddev_suspend() and md_write_start() available
1.13.0  Fix dev_health status at end of "recover" (was 'a', now 'A')
1.13.1  Fix deadlock caused by early md_stop_writes().  Also fix size and
        state races.
1.13.2  Fix raid redundancy validation and avoid keeping raid set frozen