Joe Thornber | 3241b1d | 2011-10-31 20:19:11 +0000

Introduction
============

The more sophisticated device-mapper targets require complex metadata
that is managed in the kernel. In late 2010 we were seeing that various
targets were rolling their own data structures, for example:

- Mikulas Patocka's multisnap implementation
- Heinz Mauelshagen's thin provisioning target
- Another btree-based caching target posted to dm-devel
- Another multi-snapshot target based on a design of Daniel Phillips

Maintaining these data structures takes a lot of work, so if possible
we'd like to reduce the number.

The persistent-data library is an attempt to provide a re-usable
framework for people who want to store metadata in device-mapper
targets. It's currently used by the thin-provisioning target and an
upcoming hierarchical storage target.

Overview
========

The main documentation is in the header files, which can all be found
under drivers/md/persistent-data.

The block manager
-----------------

dm-block-manager.[hc]

This provides access to the data on disk in fixed-size blocks. There
is a read/write locking interface to prevent conflicting concurrent
accesses, and to keep blocks that are in use held in the cache.

Clients of persistent-data are unlikely to use this directly.
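
To make the locking rules concrete, here is a minimal single-threaded sketch in C. It is not the dm-block-manager API: the names (`bm_read_lock`, `bm_write_lock`, `bm_unlock`, `bm_flush`), the tiny block and cache sizes, and the in-memory array standing in for the disk are all hypothetical. A block may have many readers or one writer, locked blocks are never evicted from the cache, and a conflicting request simply fails where a real implementation would sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, deliberately tiny sizes. */
#define BM_BLOCK_SIZE 64
#define BM_NR_BLOCKS 16
#define BM_CACHE_SLOTS 4

/* In-memory stand-in for the metadata device. */
static uint8_t disk[BM_NR_BLOCKS][BM_BLOCK_SIZE];

struct cache_slot {
	bool valid;
	uint64_t block;
	unsigned read_locks;	/* any number of readers ... */
	bool write_locked;	/* ... or exactly one writer */
	bool dirty;
	uint8_t data[BM_BLOCK_SIZE];
};

struct block_manager {
	struct cache_slot slots[BM_CACHE_SLOTS];
};

/* Return the cached copy of block b, reading it in if necessary.
 * Locked slots are never chosen for eviction, so data in use stays cached. */
static struct cache_slot *bm_get(struct block_manager *bm, uint64_t b)
{
	struct cache_slot *victim = NULL;

	if (b >= BM_NR_BLOCKS)
		return NULL;
	for (int i = 0; i < BM_CACHE_SLOTS; i++) {
		struct cache_slot *s = &bm->slots[i];
		if (s->valid && s->block == b)
			return s;
		if (!victim && !s->read_locks && !s->write_locked)
			victim = s;
	}
	if (!victim)
		return NULL;	/* every slot is locked */
	if (victim->valid && victim->dirty)	/* write back before reuse */
		memcpy(disk[victim->block], victim->data, BM_BLOCK_SIZE);
	memcpy(victim->data, disk[b], BM_BLOCK_SIZE);
	victim->valid = true;
	victim->block = b;
	victim->dirty = false;
	return victim;
}

static const uint8_t *bm_read_lock(struct block_manager *bm, uint64_t b)
{
	struct cache_slot *s = bm_get(bm, b);
	if (!s || s->write_locked)
		return NULL;	/* a writer holds the block */
	s->read_locks++;
	return s->data;
}

static uint8_t *bm_write_lock(struct block_manager *bm, uint64_t b)
{
	struct cache_slot *s = bm_get(bm, b);
	if (!s || s->write_locked || s->read_locks)
		return NULL;	/* readers or another writer hold the block */
	s->write_locked = true;
	s->dirty = true;
	return s->data;
}

static void bm_unlock(struct block_manager *bm, uint64_t b)
{
	for (int i = 0; i < BM_CACHE_SLOTS; i++) {
		struct cache_slot *s = &bm->slots[i];
		if (!s->valid || s->block != b)
			continue;
		if (s->write_locked)
			s->write_locked = false;
		else if (s->read_locks)
			s->read_locks--;
		return;
	}
}

/* Write every dirty block that is not currently write-locked back to disk. */
static void bm_flush(struct block_manager *bm)
{
	for (int i = 0; i < BM_CACHE_SLOTS; i++) {
		struct cache_slot *s = &bm->slots[i];
		if (s->valid && s->dirty && !s->write_locked) {
			memcpy(disk[s->block], s->data, BM_BLOCK_SIZE);
			s->dirty = false;
		}
	}
}
```

The many-readers-or-one-writer rule is the usual reader/writer lock discipline; a kernel implementation also has to cope with real concurrency and I/O errors, which this sketch ignores.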

The transaction manager
-----------------------

dm-transaction-manager.[hc]

This restricts access to blocks and enforces copy-on-write semantics.
The only way you can get hold of a writable block through the
transaction manager is by shadowing an existing block (i.e. doing
copy-on-write) or allocating a fresh one. Shadowing is elided within
the same transaction: a block that has already been shadowed or
allocated in the current transaction is not copied again, so
performance is reasonable. The commit method ensures that all data is
flushed before it writes the superblock, so on power failure your
metadata will be as it was when last committed.
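
The shadowing and commit rules can be illustrated with a toy transaction manager. This is a sketch under simplifying assumptions, not the kernel interface: `struct txn_mgr`, `tm_new_block`, `tm_shadow` and `tm_commit` are hypothetical names, blocks live in an in-memory array, allocation is a bump pointer that never reuses space, and the superblock is reduced to a single `committed_root` field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TM_BLOCK_SIZE 64
#define TM_NR_BLOCKS 32
#define TM_NO_BLOCK UINT64_MAX

struct txn_mgr {
	uint8_t blocks[TM_NR_BLOCKS][TM_BLOCK_SIZE]; /* stand-in for the device */
	bool fresh[TM_NR_BLOCKS];	/* allocated or shadowed in this transaction */
	uint64_t next_free;		/* bump allocator; no space is ever reused */
	uint64_t committed_root;	/* what the superblock points at */
};

/* Allocate a zeroed block that stays writable for the rest of this transaction. */
static uint64_t tm_new_block(struct txn_mgr *tm)
{
	if (tm->next_free >= TM_NR_BLOCKS)
		return TM_NO_BLOCK;
	uint64_t b = tm->next_free++;
	memset(tm->blocks[b], 0, TM_BLOCK_SIZE);
	tm->fresh[b] = true;
	return b;
}

/* Get a writable version of 'orig'. Blocks from earlier transactions are
 * never written in place: their contents are copied to a new block. */
static uint64_t tm_shadow(struct txn_mgr *tm, uint64_t orig)
{
	if (tm->fresh[orig])
		return orig;	/* already private to this transaction: no copy */
	uint64_t copy = tm_new_block(tm);
	if (copy != TM_NO_BLOCK)
		memcpy(tm->blocks[copy], tm->blocks[orig], TM_BLOCK_SIZE);
	return copy;
}

/* Make 'new_root' the committed state. */
static void tm_commit(struct txn_mgr *tm, uint64_t new_root)
{
	/* Step 1 would flush all dirty data blocks to the device; the
	 * in-memory array needs no flushing. Only after that completes
	 * does step 2 point the superblock at the new root. */
	tm->committed_root = new_root;
	/* Everything written so far is now part of the committed state. */
	memset(tm->fresh, 0, sizeof(tm->fresh));
}
```

Because a committed block is never overwritten, throwing away everything done since the last `tm_commit` (as a power failure would) leaves `committed_root` pointing at intact data. Shadowing the same block twice in one transaction returns it unchanged the second time, which is the elision described above.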

The space maps
--------------

dm-space-map.h
dm-space-map-metadata.[hc]
dm-space-map-disk.[hc]

These are on-disk data structures that keep track of the reference
counts of blocks. A space map also acts as the allocator of new
blocks. There are currently two implementations: a simpler one for
managing blocks on a different device (e.g. thinly-provisioned data
blocks), and one for managing the metadata space. The latter is
complicated by the need to store its own data within the space it is
managing.
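
A space map's core job, reference counting plus allocation, fits in a few lines if the on-disk encoding is ignored. The sketch below keeps the counts in a plain in-memory array; `struct space_map`, `sm_new_block`, `sm_inc`, `sm_dec` and `sm_nr_free` are hypothetical names, and neither the on-disk format nor the metadata space map's problem of storing itself inside the space it manages is modelled.

```c
#include <assert.h>
#include <stdint.h>

#define SM_NR_BLOCKS 8
#define SM_NO_SPACE UINT64_MAX

struct space_map {
	uint32_t ref_count[SM_NR_BLOCKS];	/* 0 means the block is free */
	uint64_t search_start;			/* where the next search begins */
};

/* Allocate a free block and give it a reference count of 1. */
static uint64_t sm_new_block(struct space_map *sm)
{
	for (uint64_t i = 0; i < SM_NR_BLOCKS; i++) {
		uint64_t b = (sm->search_start + i) % SM_NR_BLOCKS;
		if (sm->ref_count[b] == 0) {
			sm->ref_count[b] = 1;
			sm->search_start = (b + 1) % SM_NR_BLOCKS;
			return b;
		}
	}
	return SM_NO_SPACE;
}

/* Another user (e.g. a second snapshot) now shares block b. */
static void sm_inc(struct space_map *sm, uint64_t b)
{
	sm->ref_count[b]++;
}

/* Drop one reference; the block becomes free when the count reaches 0. */
static void sm_dec(struct space_map *sm, uint64_t b)
{
	assert(sm->ref_count[b] > 0);
	sm->ref_count[b]--;
}

static uint64_t sm_nr_free(const struct space_map *sm)
{
	uint64_t n = 0;
	for (uint64_t b = 0; b < SM_NR_BLOCKS; b++)
		if (sm->ref_count[b] == 0)
			n++;
	return n;
}
```

A shared block only returns to the free pool once every user has dropped its reference, which is what lets several trees point at the same block safely.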

The data structures
-------------------

dm-btree.[hc]
dm-btree-remove.c
dm-btree-spine.c
dm-btree-internal.h

Currently there is only one data structure, a hierarchical btree.
There are plans to add more. For example, something with an
array-like interface would see a lot of use.

The btree is 'hierarchical' in that you can define it to be composed
of nested btrees, so that a lookup takes multiple keys, one per level.
For example, the thin-provisioning target uses a btree with two levels
of nesting. The first maps a device id to a mapping tree, and that in
turn maps a virtual block to a physical block.
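
The two-level lookup can be sketched as a loop that consumes one 64-bit key per level, treating every value except the last as the root of the next tree down. To keep the example short, each btree is replaced here by a single sorted array searched with binary search, and a "root" is an index into a static table; `struct tree`, `tree_insert`, `tree_lookup` and `hier_lookup` are hypothetical and do not reflect the on-disk node format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 8
#define MAX_TREES 4

/* Stand-in for one btree: a sorted array of (key, value) pairs. */
struct tree {
	size_t nr;
	uint64_t keys[MAX_ENTRIES];
	uint64_t values[MAX_ENTRIES];
};

/* Stand-in for the metadata device: trees addressed by "root" number. */
static struct tree trees[MAX_TREES];

static bool tree_lookup(const struct tree *t, uint64_t key, uint64_t *value)
{
	size_t lo = 0, hi = t->nr;
	while (lo < hi) {	/* binary search for the first key >= 'key' */
		size_t mid = lo + (hi - lo) / 2;
		if (t->keys[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo < t->nr && t->keys[lo] == key) {
		*value = t->values[lo];
		return true;
	}
	return false;
}

/* Insert or overwrite, keeping the keys sorted. */
static bool tree_insert(struct tree *t, uint64_t key, uint64_t value)
{
	size_t i = 0;
	while (i < t->nr && t->keys[i] < key)
		i++;
	if (i < t->nr && t->keys[i] == key) {
		t->values[i] = value;
		return true;
	}
	if (t->nr == MAX_ENTRIES)
		return false;
	for (size_t j = t->nr; j > i; j--) {
		t->keys[j] = t->keys[j - 1];
		t->values[j] = t->values[j - 1];
	}
	t->keys[i] = key;
	t->values[i] = value;
	t->nr++;
	return true;
}

/* One key per level: each level but the last yields the next tree's root. */
static bool hier_lookup(uint64_t root, unsigned levels,
			const uint64_t *keys, uint64_t *result)
{
	uint64_t v = root;
	for (unsigned level = 0; level < levels; level++) {
		if (v >= MAX_TREES || !tree_lookup(&trees[v], keys[level], &v))
			return false;
	}
	*result = v;
	return true;
}
```

If tree 0 maps device id 7 to tree 1, the key pair {7, 100} resolves to whatever tree 1 maps virtual block 100 to, mirroring the device id, then virtual block, order described above.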

Values stored in the btrees can have arbitrary size. Keys are always
64 bits, although nesting allows you to use multiple keys.
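
One way to combine fixed 64-bit keys with values of arbitrary size is to treat each value as an opaque run of bytes whose size is declared when the tree is created, assumed here to be the same for every value in a given tree. The leaf layout below is a hypothetical sketch (`struct leaf`, `leaf_init`, `leaf_append` and `leaf_value` are invented names, not the dm-btree node format); it shows how a larger value size reduces the number of entries that fit in a node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEYS 16
#define VALUE_AREA 256

struct leaf {
	uint32_t nr_entries;
	uint32_t max_entries;	/* depends on value_size */
	size_t value_size;	/* fixed per tree, chosen by the client */
	uint64_t keys[MAX_KEYS];	/* keys are always 64 bits */
	unsigned char values[VALUE_AREA]; /* value i is at values + i * value_size */
};

static void leaf_init(struct leaf *l, size_t value_size)
{
	assert(value_size > 0 && value_size <= VALUE_AREA);
	memset(l, 0, sizeof(*l));
	l->value_size = value_size;
	size_t fit = VALUE_AREA / value_size;
	l->max_entries = fit < MAX_KEYS ? (uint32_t)fit : MAX_KEYS;
}

static void *leaf_value(struct leaf *l, uint32_t i)
{
	return l->values + (size_t)i * l->value_size;
}

/* Append a (key, value) pair; in this sketch keys must arrive in ascending order. */
static int leaf_append(struct leaf *l, uint64_t key, const void *value)
{
	if (l->nr_entries == l->max_entries)
		return -1;	/* node full */
	if (l->nr_entries && l->keys[l->nr_entries - 1] >= key)
		return -1;	/* out of order */
	l->keys[l->nr_entries] = key;
	memcpy(leaf_value(l, l->nr_entries), value, l->value_size);
	l->nr_entries++;
	return 0;
}
```

Because the btree code only copies value bytes around, the same node logic can serve a tree whose values are single block numbers and one whose values are larger client-defined records.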