Bryan Gurney | e4f3fab | 2019-03-07 15:42:39 -0500

dm-dust
=======

This target emulates the behavior of bad sectors at arbitrary
locations, and the ability to enable the emulation of the failures
at an arbitrary time.

This target behaves similarly to a linear target. At a given time,
the user can send a message to the target to start failing read
requests on specific blocks (to emulate the behavior of a hard disk
drive with bad sectors).

When the failure behavior is enabled (i.e., when the output of
"dmsetup status" displays "fail_read_on_bad_block"), reads of blocks
in the "bad block list" will fail with EIO ("Input/output error").

Writes of blocks in the "bad block list" will result in the following:

1. Remove the block from the "bad block list".
2. Successfully complete the write.

This emulates the "remapped sector" behavior of a drive with bad
sectors.

Normally, a drive that is encountering bad sectors will most likely
encounter more bad sectors, at an unknown time or location.
With dm-dust, the user can use the "addbadblock" and "removebadblock"
messages to add or remove bad blocks at arbitrary locations, and the
"enable" and "disable" messages to control whether the configured
"bad blocks" are treated as bad or bypassed.
This allows the pre-writing of test data and metadata prior to
simulating a "failure" event where bad sectors start to appear.

Table parameters:
-----------------
<device_path> <offset> <blksz>

Mandatory parameters:
    <device_path>: path to the block device.
    <offset>: offset to data area from start of device_path
              (in 512-byte sectors)
    <blksz>: block size in bytes
             (minimum 512, maximum 1073741824, must be a power of 2)

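The block size constraints above can be checked before a table is
built. The following is a minimal sketch of such a check; it is not
part of dm-dust or dmsetup, and the function name dust_valid_blksz is
hypothetical:

```shell
# Return 0 if $1 is a valid dm-dust block size: a power of two
# between 512 and 1073741824 bytes, inclusive.
# (Hypothetical helper, for illustration only.)
dust_valid_blksz() {
    case "$1" in
        ''|0*|*[!0-9]*) return 1 ;;   # reject empty, zero-led, or non-numeric input
    esac
    [ "$1" -ge 512 ] && [ "$1" -le 1073741824 ] || return 1
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    [ $(( $1 & ($1 - 1) )) -eq 0 ]
}
```

For example, "dust_valid_blksz 4096" succeeds, while
"dust_valid_blksz 1000" fails because 1000 is not a power of 2.
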
Usage instructions:
-------------------

First, find the size (in 512-byte sectors) of the device to be used:

$ sudo blockdev --getsz /dev/vdb1
33552384

Create the dm-dust device:
(For a device with a block size of 512 bytes)
$ sudo dmsetup create dust1 --table '0 33552384 dust /dev/vdb1 0 512'

(For a device with a block size of 4096 bytes)
$ sudo dmsetup create dust1 --table '0 33552384 dust /dev/vdb1 0 4096'

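As an illustration of how the table string is assembled from the
values above, a small helper could look like the following. This is a
sketch only: dust_table is a hypothetical name, and it assumes the
mapping starts at sector 0 and uses an <offset> of 0, as in the
examples above:

```shell
# Print a dm-dust table line covering an entire device.
# $1 = device size in 512-byte sectors (from "blockdev --getsz")
# $2 = path to the underlying block device
# $3 = block size in bytes
# (Hypothetical helper; assumes start sector 0 and offset 0.)
dust_table() {
    printf '0 %s dust %s 0 %s\n' "$1" "$2" "$3"
}
```

For example, "dust_table 33552384 /dev/vdb1 4096" prints the table
string used in the second "dmsetup create" command above.
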
Check the status of the read behavior ("bypass" indicates that all I/O
will be passed through to the underlying device; "verbose" indicates
normal logging):
$ sudo dmsetup status dust1
0 33552384 dust 252:17 bypass verbose

$ sudo dd if=/dev/mapper/dust1 of=/dev/null bs=512 count=128 iflag=direct
128+0 records in
128+0 records out

$ sudo dd if=/dev/zero of=/dev/mapper/dust1 bs=512 count=128 oflag=direct
128+0 records in
128+0 records out

Adding and removing bad blocks:
-------------------------------

At any time (i.e., whether the device has the "bad block" emulation
enabled or disabled), bad blocks may be added or removed from the
device via the "addbadblock" and "removebadblock" messages:

$ sudo dmsetup message dust1 0 addbadblock 60
kernel: device-mapper: dust: badblock added at block 60

$ sudo dmsetup message dust1 0 addbadblock 67
kernel: device-mapper: dust: badblock added at block 67

$ sudo dmsetup message dust1 0 addbadblock 72
kernel: device-mapper: dust: badblock added at block 72

These bad blocks will be stored in the "bad block list".
While the device is in "bypass" mode, reads and writes will succeed:

$ sudo dmsetup status dust1
0 33552384 dust 252:17 bypass verbose

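When a test needs a contiguous run of bad blocks, the "addbadblock"
message can be sent in a loop. A minimal sketch, assuming a
hypothetical helper named dust_add_range and a DMSETUP variable (also
hypothetical) that can be overridden for a dry run:

```shell
# Command used to talk to device-mapper; override with "echo" for a
# dry run, or with "sudo dmsetup" when not running as root.
DMSETUP=${DMSETUP:-dmsetup}

# Mark every block from $2 through $3 (inclusive) as bad on device $1.
# Stops at the first failure, for example if a block is already listed.
# (Hypothetical helper, for illustration only.)
dust_add_range() {
    dev=$1
    blk=$2
    while [ "$blk" -le "$3" ]; do
        $DMSETUP message "$dev" 0 addbadblock "$blk" || return 1
        blk=$((blk + 1))
    done
}
```

For example, "dust_add_range dust1 60 72" sends one "addbadblock"
message for each of blocks 60 through 72.
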
Enabling block read failures:
-----------------------------

To enable the "fail read on bad block" behavior, send the "enable" message:

$ sudo dmsetup message dust1 0 enable
kernel: device-mapper: dust: enabling read failures on bad sectors

$ sudo dmsetup status dust1
0 33552384 dust 252:17 fail_read_on_bad_block verbose

With the device in "fail read on bad block" mode, attempting to read a
block in the bad block list will fail with an "Input/output error":

$ sudo dd if=/dev/mapper/dust1 of=/dev/null bs=512 count=1 skip=67 iflag=direct
dd: error reading '/dev/mapper/dust1': Input/output error
0+0 records in
0+0 records out
0 bytes copied, 0.00040651 s, 0.0 kB/s

...and writing to the bad blocks will remove the blocks from the list,
therefore emulating the "remap" behavior of hard disk drives:

$ sudo dd if=/dev/zero of=/dev/mapper/dust1 bs=512 count=128 oflag=direct
128+0 records in
128+0 records out

kernel: device-mapper: dust: block 60 removed from badblocklist by write
kernel: device-mapper: dust: block 67 removed from badblocklist by write
kernel: device-mapper: dust: block 72 removed from badblocklist by write
kernel: device-mapper: dust: block 87 removed from badblocklist by write

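To relate a block number to the 512-byte sectors it covers (for
example, when choosing dd arguments on a device with a 4096-byte block
size), the arithmetic can be sketched as follows. dust_block_sectors
is a hypothetical helper, and the sketch assumes that block numbers
count from the start of the device in units of <blksz>, with an
<offset> of 0:

```shell
# Print the first and last 512-byte sector covered by block $1 on a
# device created with block size $2 (in bytes).
# (Hypothetical helper; assumes an <offset> of 0.)
dust_block_sectors() {
    spb=$(( $2 / 512 ))        # sectors per block
    first=$(( $1 * spb ))      # first sector of the block
    echo "$first $(( first + spb - 1 ))"
}
```

With a 512-byte block size, block 67 is sector 67, matching the dd
example above. With a 4096-byte block size, block 67 covers sectors
536 through 543.
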
Bad block add/remove error handling:
------------------------------------

Attempting to add a bad block that already exists in the list will
result in an "Invalid argument" error, as well as a helpful message:

$ sudo dmsetup message dust1 0 addbadblock 88
device-mapper: message ioctl on dust1 failed: Invalid argument
kernel: device-mapper: dust: block 88 already in badblocklist

Attempting to remove a bad block that doesn't exist in the list will
result in an "Invalid argument" error, as well as a helpful message:

$ sudo dmsetup message dust1 0 removebadblock 87
device-mapper: message ioctl on dust1 failed: Invalid argument
kernel: device-mapper: dust: block 87 not found in badblocklist

Counting the number of bad blocks in the bad block list:
--------------------------------------------------------

To count the number of bad blocks configured in the device, run the
following message command:

$ sudo dmsetup message dust1 0 countbadblocks

A message will print with the number of bad blocks currently
configured on the device:

kernel: device-mapper: dust: countbadblocks: 895 badblock(s) found

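In a test script, the count can be pulled out of the log line shown
above. A minimal sketch, assuming the message text matches the example
exactly; dust_parse_count is a hypothetical name:

```shell
# Read kernel log lines on stdin and print the bad block count from
# each "countbadblocks" message found.
# (Hypothetical helper; assumes the log format shown above.)
dust_parse_count() {
    sed -n 's/.*dust: countbadblocks: \([0-9][0-9]*\) badblock(s) found.*/\1/p'
}
```

For example, "sudo dmesg | dust_parse_count | tail -n 1" prints the
most recently logged count, assuming the kernel log is readable via
dmesg.
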
Querying for specific bad blocks:
---------------------------------

To find out if a specific block is in the bad block list, run the
following message command:

$ sudo dmsetup message dust1 0 queryblock 72

The following message will print if the block is in the list:
device-mapper: dust: queryblock: block 72 found in badblocklist

The following message will print if the block is not in the list:
device-mapper: dust: queryblock: block 72 not found in badblocklist

The "queryblock" message command will work in both the "enabled"
and "disabled" modes, allowing the verification of whether a block
will be treated as "bad" without having to issue I/O to the device,
or having to "enable" the bad block emulation.

Clearing the bad block list:
----------------------------

To clear the bad block list (without needing to individually run
a "removebadblock" message command for every block), run the
following message command:

$ sudo dmsetup message dust1 0 clearbadblocks

After clearing the bad block list, the following message will appear:

kernel: device-mapper: dust: clearbadblocks: badblocks cleared

If there were no bad blocks to clear, the following message will
appear:

kernel: device-mapper: dust: clearbadblocks: no badblocks found

Message commands list:
----------------------

Below is a list of the messages that can be sent to a dust device:

Operations on blocks (require a <blknum> argument):

addbadblock <blknum>
queryblock <blknum>
removebadblock <blknum>

...where <blknum> is a block number within range of the device
(corresponding to the block size of the device).

Single argument message commands (no <blknum> argument):

countbadblocks
clearbadblocks
disable
enable
quiet

Device removal:
---------------

When finished, remove the device via the "dmsetup remove" command:

$ sudo dmsetup remove dust1

Quiet mode:
-----------

On test runs with many bad blocks, it may be desirable to avoid
excessive logging (from bad blocks added, removed, or "remapped").
This can be done by enabling "quiet mode" via the following message:

$ sudo dmsetup message dust1 0 quiet

This will suppress log messages from add / remove / removed by write
operations. Log messages from "countbadblocks" or "queryblock"
message commands will still print in quiet mode.

The status of quiet mode can be seen by running "dmsetup status":

$ sudo dmsetup status dust1
0 33552384 dust 252:17 fail_read_on_bad_block quiet

To disable quiet mode, send the "quiet" message again:

$ sudo dmsetup message dust1 0 quiet

$ sudo dmsetup status dust1
0 33552384 dust 252:17 fail_read_on_bad_block verbose

(The presence of "verbose" indicates normal logging.)

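A script can read both modes from the status line. A minimal sketch,
assuming the status format shown above (read-failure mode in field 5,
logging mode in field 6); dust_status_modes is a hypothetical name:

```shell
# Read "dmsetup status" output on stdin and, for each dust target
# line, print the read-failure mode followed by the logging mode.
# (Hypothetical helper; assumes the field layout shown above.)
dust_status_modes() {
    awk '$3 == "dust" { print $5, $6 }'
}
```

For example, "sudo dmsetup status dust1 | dust_status_modes" prints
"fail_read_on_bad_block quiet" for the first status line shown above.
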
"Why not...?"
-------------

scsi_debug has a "medium error" mode that can fail reads on one
specified sector (sector 0x1234, hardcoded in the source code), but
it uses RAM for the persistent storage, which drastically decreases
the potential device size.

dm-flakey fails all I/O from all block locations at a specified time
frequency, and not at a given point in time.

When a bad sector occurs on a hard disk drive, reads to that sector
are failed by the device, usually resulting in an error code of EIO
("I/O error") or ENODATA ("No data available"). However, a write to
the sector may succeed, and result in the sector becoming readable
after the device controller no longer experiences errors reading the
sector (or after a reallocation of the sector). Even so, other bad
sectors may occur on the device in the future, in a different,
unpredictable location.

This target seeks to provide a device that can exhibit the behavior
of a bad sector at a known sector location, at a known time, based
on a large storage device (at least tens of gigabytes, not occupying
system memory).