Device-mapper snapshot support
==============================

Device-mapper allows you, without massive data copying:

*) To create snapshots of any block device, i.e. mountable, saved states of
the block device which are also writable without interfering with the
original content;
*) To create device "forks", i.e. multiple different versions of the
same data stream;
*) To merge a snapshot of a block device back into the snapshot's origin
device.

In the first two cases, dm copies only the chunks of data that get
changed and uses a separate copy-on-write (COW) block device for
storage.

For snapshot merge, the contents of the COW storage are merged back into
the origin device.


There are three dm targets available:
snapshot, snapshot-origin, and snapshot-merge.

*) snapshot-origin <origin>

which will normally have one or more snapshots based on it.
Reads will be mapped directly to the backing device. For each write, the
original data will be saved in the <COW device> of each snapshot to keep
its visible content unchanged, at least until the <COW device> fills up.


*) snapshot <origin> <COW device> <persistent?> <chunksize>

A snapshot of the <origin> block device is created. Changed chunks of
<chunksize> sectors will be stored on the <COW device>. Writes will
only go to the <COW device>. Reads will come from the <COW device> or
from <origin> for unchanged data. <COW device> will often be
smaller than the origin and if it fills up the snapshot will become
useless and be disabled, returning errors. So it is important to monitor
the amount of free space and expand the <COW device> before it fills up.

<persistent?> is P (Persistent) or N (Not persistent - will not survive
a reboot). O (Overflow) can be added as a persistent store option
to allow userspace to advertise its support for seeing "Overflow" in the
snapshot status. So supported store types are "P", "PO" and "N".

The difference between persistent and transient is that with transient
snapshots less metadata must be saved on disk; it can be kept in
memory by the kernel.
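
As an illustration of the table format and store types described above,
the following shell sketch assembles a "snapshot" table line and rejects
store types other than "P", "PO" and "N". The helper name
snapshot_table_line is hypothetical, and the start sector 0 and the sample
device numbers are simply borrowed from the LVM2 example in this document.

```shell
#!/bin/sh
# Hypothetical helper (not part of dmsetup or LVM2): print a "snapshot"
# table line.
# Arguments: <length in sectors> <origin> <COW device> <persistent?> <chunksize>
snapshot_table_line() {
    length=$1 origin=$2 cow=$3 store=$4 chunk=$5
    # Only the store types listed in this document are accepted.
    case "$store" in
        P|PO|N) ;;
        *) echo "unsupported store type: $store" >&2; return 1 ;;
    esac
    # Assumption: the mapping starts at sector 0, as in the LVM2 example.
    printf '0 %s snapshot %s %s %s %s\n' \
        "$length" "$origin" "$cow" "$store" "$chunk"
}

# Sample values taken from the LVM2 example tables.
snapshot_table_line 2097152 254:11 254:12 P 16
```

A line of this shape is what one would pass to dmsetup create with its
--table option; that step needs root and real devices, so it is left out
of the sketch.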

When loading or unloading the snapshot target, the corresponding
snapshot-origin or snapshot-merge target must be suspended. A failure to
suspend the origin target could result in data corruption.


*) snapshot-merge <origin> <COW device> <persistent> <chunksize>

takes the same table arguments as the snapshot target except it only
works with persistent snapshots. This target assumes the role of the
"snapshot-origin" target and must not be loaded if the "snapshot-origin"
is still present for <origin>.

Creates a merging snapshot that takes control of the changed chunks
stored in the <COW device> of an existing snapshot, through a handover
procedure, and merges these chunks back into the <origin>. Once merging
has started (in the background) the <origin> may be opened and the merge
will continue while I/O is flowing to it. Changes to the <origin> are
deferred until the merging snapshot's corresponding chunk(s) have been
merged. Once merging has started, the snapshot device associated with
the "snapshot" target will return -EIO when accessed.


How snapshot is used by LVM2
============================
When you create the first LVM2 snapshot of a volume, four dm devices are used:

1) a device containing the original mapping table of the source volume;
2) a device used as the <COW device>;
3) a "snapshot" device, combining #1 and #2, which is the visible snapshot
   volume;
4) the "original" volume (which uses the device number used by the original
   source volume), whose table is replaced by a "snapshot-origin" mapping
   from device #1.

A fixed naming scheme is used, so with the following commands:

  lvcreate -L 1G -n base volumeGroup
  lvcreate -L 100M --snapshot -n snap volumeGroup/base

we'll have this situation (with volumes in above order):

  # dmsetup table|grep volumeGroup

  volumeGroup-base-real: 0 2097152 linear 8:19 384
  volumeGroup-snap-cow: 0 204800 linear 8:19 2097536
  volumeGroup-snap: 0 2097152 snapshot 254:11 254:12 P 16
  volumeGroup-base: 0 2097152 snapshot-origin 254:11

  # ls -lL /dev/mapper/volumeGroup-*
  brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
  brw------- 1 root root 254, 12 29 ago 18:15 /dev/mapper/volumeGroup-snap-cow
  brw------- 1 root root 254, 13 29 ago 18:15 /dev/mapper/volumeGroup-snap
  brw------- 1 root root 254, 10 29 ago 18:14 /dev/mapper/volumeGroup-base
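
The numbers in these tables can be checked with a little arithmetic,
assuming the 512-byte sectors in which dm table lines are expressed. The
helper names below are made up for this sketch.

```shell
#!/bin/sh
# dm table lines count in 512-byte sectors, so 1 MiB is 2048 sectors.
SECTORS_PER_MIB=2048

# Hypothetical helpers for converting sizes.
mib_to_sectors() { echo $(( $1 * SECTORS_PER_MIB )); }
gib_to_sectors() { echo $(( $1 * 1024 * SECTORS_PER_MIB )); }
sectors_to_kib() { echo $(( $1 * 512 / 1024 )); }

gib_to_sectors 1      # length of volumeGroup-base-real (lvcreate -L 1G)
mib_to_sectors 100    # length of volumeGroup-snap-cow (lvcreate -L 100M)
sectors_to_kib 16     # chunk size of volumeGroup-snap, in KiB

# Offset 384 on 8:19 plus the length of the base volume gives the
# offset at which the COW area starts in this example.
echo $(( 384 + $(gib_to_sectors 1) ))
```

In this particular layout the COW area therefore sits directly behind the
base volume on 8:19; other layouts need not be contiguous.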


How snapshot-merge is used by LVM2
==================================
A merging snapshot assumes the role of the "snapshot-origin" while
merging. As such, the "snapshot-origin" is replaced with
"snapshot-merge". The "-real" device is not changed and the "-cow"
device is renamed to <origin name>-cow to aid LVM2's cleanup of the
merging snapshot after it completes. The "snapshot" that hands over its
COW device to the "snapshot-merge" is deactivated (unless using lvchange
--refresh); but if it is left active it will simply return I/O errors.

A snapshot will merge into its origin with the following command:

  lvconvert --merge volumeGroup/snap

We'll now have this situation:

  # dmsetup table|grep volumeGroup

  volumeGroup-base-real: 0 2097152 linear 8:19 384
  volumeGroup-base-cow: 0 204800 linear 8:19 2097536
  volumeGroup-base: 0 2097152 snapshot-merge 254:11 254:12 P 16

  # ls -lL /dev/mapper/volumeGroup-*
  brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
  brw------- 1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow
  brw------- 1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base
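
Comparing this table with the one from the previous section, the
"snapshot-merge" line carries the same arguments as the former "snapshot"
line. The sketch below only illustrates that relationship; it is not how
LVM2 performs the handover. The helper name is hypothetical, and treating
"PO" as persistent in the same way as "P" is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: derive a "snapshot-merge" table line from a
# "snapshot" table line. It only rewrites text and touches no device.
to_merge_table_line() {
    # Split the line into fields:
    # <start> <length> snapshot <origin> <COW device> <persistent?> <chunksize>
    set -- $1
    if [ "$#" -ne 7 ] || [ "$3" != snapshot ]; then
        echo "not a snapshot table line" >&2
        return 1
    fi
    # snapshot-merge only works with persistent snapshots.
    # Assumption: "PO" is accepted as persistent, like "P".
    case "$6" in
        P|PO) ;;
        *) echo "snapshot-merge needs a persistent snapshot" >&2; return 1 ;;
    esac
    printf '%s %s snapshot-merge %s %s %s %s\n' "$1" "$2" "$4" "$5" "$6" "$7"
}

# Table of volumeGroup-snap before the merge was requested.
to_merge_table_line "0 2097152 snapshot 254:11 254:12 P 16"
```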


How to determine when merging is complete
=========================================
The snapshot-merge and snapshot status lines end with:
  <sectors_allocated>/<total_sectors> <metadata_sectors>

Both <sectors_allocated> and <total_sectors> include both data and metadata.
During merging, the number of sectors allocated gets smaller and
smaller. Merging has finished when the number of sectors holding data
is zero, in other words <sectors_allocated> == <metadata_sectors>.
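
The completion test above can be sketched as a small shell function that
takes one status line as text. The function name merge_finished is
hypothetical, and the sketch assumes the five-field layout described here;
any other layout is reported as an error rather than guessed at.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when a status line shows that
# <sectors_allocated> == <metadata_sectors>, i.e. merging has finished.
merge_finished() {
    # Fields: <start> <length> <target> <allocated>/<total> <metadata>
    set -- $1
    [ "$#" -eq 5 ] || return 2      # unexpected layout
    allocated=${4%%/*}              # part before the slash
    metadata=$5
    [ "$allocated" -eq "$metadata" ]
}

# Sample status lines in the documented format.
for line in \
    "0 8388608 snapshot-merge 180480/2097152 712" \
    "0 8388608 snapshot-merge 16/2097152 16"
do
    if merge_finished "$line"; then
        echo "finished: $line"
    else
        echo "merging:  $line"
    fi
done
```

In practice the argument would be the output of dmsetup status for the
merging device, polled until the function succeeds.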

Here is a practical example (using a hybrid of lvm and dmsetup commands):

  # lvs
    LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
    base    volumeGroup owi-a- 4.00g
    snap    volumeGroup swi-a- 1.00g base    18.97

  # dmsetup status volumeGroup-snap
  0 8388608 snapshot 397896/2097152 1560
                                    ^^^^ metadata sectors

  # lvconvert --merge -b volumeGroup/snap
    Merging of volume snap started.

  # lvs volumeGroup/snap
    LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
    base    volumeGroup Owi-a- 4.00g         17.23

  # dmsetup status volumeGroup-base
  0 8388608 snapshot-merge 281688/2097152 1104

  # dmsetup status volumeGroup-base
  0 8388608 snapshot-merge 180480/2097152 712

  # dmsetup status volumeGroup-base
  0 8388608 snapshot-merge 16/2097152 16

Merging has finished.

  # lvs
    LV      VG          Attr   LSize Origin  Snap%  Move Log Copy%  Convert
    base    volumeGroup owi-a- 4.00g
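
In the first pair of commands in this example, the Snap% value reported
by lvs agrees with the ratio <sectors_allocated>/<total_sectors> from
dmsetup status. The sketch below reproduces that figure from the status
line; the function name is hypothetical, and that lvs derives Snap% in
exactly this way is an assumption based only on the numbers shown here.

```shell
#!/bin/sh
# Hypothetical helper: print <sectors_allocated>/<total_sectors> as a
# percentage with two decimals (truncated, not rounded).
# Note: the intermediate product needs 64-bit shell arithmetic.
cow_usage_percent() {
    # Fields: <start> <length> <target> <allocated>/<total> <metadata>
    set -- $1
    usage=$4
    allocated=${usage%%/*}
    total=${usage##*/}
    hundredths=$(( allocated * 10000 / total ))
    printf '%d.%02d\n' $(( hundredths / 100 )) $(( hundredths % 100 ))
}

# Status of volumeGroup-snap before the merge was started.
cow_usage_percent "0 8388608 snapshot 397896/2097152 1560"
```

The later lvs and dmsetup status outputs were taken at different moments
of a running merge, so their values are not expected to line up in the
same way.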