Reference counting in pnfs:
==========================

There are several inter-related caches. We have layouts which can
reference multiple devices, each of which can reference multiple data servers.
Each data server can be referenced by multiple devices. Each device
can be referenced by multiple layouts. To keep all of this straight,
we need to reference count.


struct pnfs_layout_hdr
----------------------
The on-the-wire command LAYOUTGET corresponds to struct
pnfs_layout_segment, usually referred to by the variable name lseg.
Each nfs_inode may hold a pointer to a cache of these layout
segments in nfsi->layout, of type struct pnfs_layout_hdr.

We take a reference on the header for the inode pointing to it, for
each outstanding RPC call that references it (LAYOUTGET, LAYOUTRETURN,
LAYOUTCOMMIT), and for each lseg held within it.

Each header is also (when non-empty) put on a list associated with
struct nfs_client (cl_layouts). Being put on this list does not bump
the reference count, as the layout is kept around by the lseg that
keeps it in the list.
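
A minimal userspace sketch of the header rules above, assuming a plain
integer count in place of the kernel's atomic reference counting and
locking. All names (layout_hdr, hdr_get(), hdr_put(), hdr_add_lseg(),
hdr_remove_lseg()) are hypothetical. It shows that list membership is
tracked separately and never touches the count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical, simplified stand-in for struct pnfs_layout_hdr.
 * The names below are illustrative only, not the kernel's.
 */
struct layout_hdr {
	int refcount;        /* inode + in-flight RPCs + held lsegs */
	int nr_lsegs;        /* lsegs currently cached in this header */
	bool on_client_list; /* models membership of nfs_client's cl_layouts */
};

static void hdr_get(struct layout_hdr *lo)
{
	lo->refcount++;
}

/* Drop a reference; returns true if it was the last one and lo was freed. */
static bool hdr_put(struct layout_hdr *lo)
{
	assert(lo->refcount > 0);
	if (--lo->refcount == 0) {
		free(lo);
		return true;
	}
	return false;
}

/*
 * Each held lseg pins the header. Joining the client list when the header
 * becomes non-empty takes no reference of its own: the lseg's reference
 * already keeps the header alive while it is listed.
 */
static void hdr_add_lseg(struct layout_hdr *lo)
{
	hdr_get(lo);
	if (lo->nr_lsegs++ == 0)
		lo->on_client_list = true;
}

static bool hdr_remove_lseg(struct layout_hdr *lo)
{
	assert(lo->nr_lsegs > 0);
	if (--lo->nr_lsegs == 0)
		lo->on_client_list = false; /* empty: leave cl_layouts */
	return hdr_put(lo);
}
```

For example, an inode reference plus one lseg plus one in-flight
LAYOUTCOMMIT gives a count of 3, while the client list adds nothing.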

deviceid_cache
--------------
lsegs reference device ids, which are resolved per nfs_client and
layout driver type. The device ids are held in an RCU cache (struct
nfs4_deviceid_cache). The cache itself is referenced across each
mount. The entries (struct nfs4_deviceid) themselves are held across
the lifetime of each lseg referencing them.

RCU is used because the deviceid is basically a write-once, read-many
data structure. The hlist size of 32 buckets needs better
justification, but seems reasonable given that we can have multiple
deviceids per filesystem, and multiple filesystems per nfs_client.

The hash code is copied from the nfsd code base. A discussion of
hashing and variations of this algorithm can be found at:
http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809
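
The 32-bucket table and its hash can be sketched as follows. This
assumes the multiply-by-37 byte hash that nfsd uses for opaque values
and 16-byte device ids; the identifiers are hypothetical, and the real
cache additionally uses RCU-protected hlists, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumptions: 16-byte device ids (NFS4_DEVICEID4_SIZE in RFC 5661) and a
 * 32-bucket table, i.e. 5 hash bits, as described above.
 */
#define DEVICEID_SIZE       16
#define DEVICEID_HASH_BITS  5
#define DEVICEID_HASH_SIZE  (1u << DEVICEID_HASH_BITS)
#define DEVICEID_HASH_MASK  (DEVICEID_HASH_SIZE - 1)

static_assert(DEVICEID_HASH_SIZE == 32, "table should have 32 buckets");

/* Multiply-by-37 hash over the opaque id bytes, masked to a bucket index. */
static uint32_t deviceid_hash(const unsigned char id[DEVICEID_SIZE])
{
	uint32_t x = 0;

	for (size_t i = 0; i < DEVICEID_SIZE; i++)
		x = x * 37 + id[i];   /* unsigned arithmetic wraps, by design */
	return x & DEVICEID_HASH_MASK;
}
```

Masking keeps only the low 5 bits, so every id lands in one of the 32
buckets regardless of its contents.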

data server cache
-----------------
File layout driver devices refer to data servers, which are kept in a
module-level cache. A reference to each data server is held over the
lifetime of the deviceid pointing to it.
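
A sketch of the reference chain from device to data server, assuming
plain integer counts and a single data server per device for brevity.
The types and functions are hypothetical; the real cache's lookup,
locking and hashing are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; names are illustrative, not the kernel's. */
struct data_server {
	int refcount;              /* one per device pointing at it */
};

struct device {
	int refcount;              /* one per lseg referencing the deviceid */
	struct data_server *ds;    /* a single server, for brevity */
};

static struct data_server *ds_get(struct data_server *ds)
{
	ds->refcount++;
	return ds;
}

static void ds_put(struct data_server *ds)
{
	assert(ds->refcount > 0);
	if (--ds->refcount == 0)
		free(ds);          /* would also be unhashed from the cache */
}

/* A new device pins its data server for as long as the device lives. */
static struct device *device_new(struct data_server *ds)
{
	struct device *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->refcount = 1;
	d->ds = ds_get(ds);
	return d;
}

static void device_put(struct device *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		ds_put(d->ds);     /* release the server with the device */
		free(d);
	}
}
```

Two devices sharing one data server each hold a reference, so the
server outlives whichever device is released first.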

lseg
----
Each lseg maintains an extra reference corresponding to the
NFS_LSEG_VALID bit, which holds it in the pnfs_layout_hdr's list. When
the final lseg is removed from the pnfs_layout_hdr's list, the
NFS_LAYOUT_DESTROYED bit is set, preventing any new lsegs from being
added.
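
The VALID-bit reference and the DESTROYED cutoff can be modelled like
this. LSEG_VALID and LAYOUT_DESTROYED are hypothetical stand-ins for
NFS_LSEG_VALID and NFS_LAYOUT_DESTROYED, the header's list is reduced to
a counter, and locking is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags modelled on NFS_LSEG_VALID and NFS_LAYOUT_DESTROYED. */
#define LSEG_VALID        0x1u
#define LAYOUT_DESTROYED  0x1u

struct hdr {
	unsigned int flags;
	int nr_lsegs;              /* length of the header's lseg list */
};

struct lseg {
	unsigned int flags;
	int refcount;
	struct hdr *hdr;
};

/* Add an lseg to the header; refused once the header is destroyed. */
static bool lseg_insert(struct hdr *lo, struct lseg *ls)
{
	if (lo->flags & LAYOUT_DESTROYED)
		return false;
	ls->hdr = lo;
	ls->flags |= LSEG_VALID;
	ls->refcount = 1;          /* the extra reference owned by VALID */
	lo->nr_lsegs++;
	return true;
}

static void lseg_get(struct lseg *ls)
{
	ls->refcount++;
}

/* The last put unlinks the lseg; an empty list destroys the header. */
static void lseg_put(struct lseg *ls)
{
	assert(ls->refcount > 0);
	if (--ls->refcount == 0) {
		assert(ls->hdr->nr_lsegs > 0);
		if (--ls->hdr->nr_lsegs == 0)
			ls->hdr->flags |= LAYOUT_DESTROYED;
	}
}

/* Clearing VALID drops the reference that kept the lseg listed. */
static void lseg_invalidate(struct lseg *ls)
{
	if (ls->flags & LSEG_VALID) {
		ls->flags &= ~LSEG_VALID;
		lseg_put(ls);
	}
}
```

An lseg that is invalidated while I/O still holds it stays on the list
until that last user drops its reference.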

layout drivers
--------------

pNFS uses what are called layout drivers. The standard defines 4 basic
layout types: "files", "objects", "blocks", and "flexfiles". For each
of these types there is a layout driver with a common function-vector
table, which is called by the NFS client's pNFS core to implement the
different layout types.

Files-layout-driver code is in: fs/nfs/filelayout/.. directory
Objects-layout-driver code is in: fs/nfs/objlayout/.. directory
Blocks-layout-driver code is in: fs/nfs/blocklayout/.. directory
Flexfiles-layout-driver code is in: fs/nfs/flexfilelayout/.. directory
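
A toy version of the function-vector dispatch described above. The
layout type numbers are those assigned by the pNFS RFCs, but struct
layout_ops, register_layout_driver() and core_alloc_lseg() are
hypothetical and far smaller than the kernel's own table
(struct pnfs_layoutdriver_type).

```c
#include <assert.h>
#include <stddef.h>

/* Layout type numbers from the pNFS RFCs (RFC 5661, RFC 8435). */
enum layout_type {
	LAYOUT_FILES     = 1,
	LAYOUT_OBJECTS   = 2,
	LAYOUT_BLOCKS    = 3,
	LAYOUT_FLEXFILES = 4,
};
#define MAX_LAYOUT_TYPE LAYOUT_FLEXFILES

/* Hypothetical, heavily trimmed function-vector table. */
struct layout_ops {
	enum layout_type id;
	const char *name;
	int (*alloc_lseg)(const void *body, size_t len);
};

static const struct layout_ops *drivers[MAX_LAYOUT_TYPE + 1];

static int register_layout_driver(const struct layout_ops *ops)
{
	assert(ops != NULL);
	if (ops->id < LAYOUT_FILES || ops->id > MAX_LAYOUT_TYPE ||
	    drivers[ops->id] != NULL)
		return -1;
	drivers[ops->id] = ops;
	return 0;
}

/* The core never knows the layout details: it just calls through ops. */
static int core_alloc_lseg(enum layout_type t, const void *body, size_t len)
{
	if (t < LAYOUT_FILES || t > MAX_LAYOUT_TYPE || drivers[t] == NULL)
		return -1;             /* no driver loaded for this type */
	return drivers[t]->alloc_lseg(body, len);
}

/* Toy "files" driver: pretends to decode len bytes of layout body. */
static int files_alloc_lseg(const void *body, size_t len)
{
	(void)body;
	return (int)len;
}

static const struct layout_ops files_ops = {
	.id = LAYOUT_FILES,
	.name = "files",
	.alloc_lseg = files_alloc_lseg,
};
```

Keeping one table per layout type lets each driver live in its own
module while the core stays layout-agnostic.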

objects-layout setup
--------------------

As part of the full standard implementation, objlayoutdriver.ko at times
needs to automatically log in to as yet undiscovered iSCSI/OSD devices.
For this, the driver makes up-calls to a user-mode script called
*osd_login*.

The default path of the script is:
/sbin/osd_login
This can be overridden by the kernel module parameter:
objlayoutdriver.osd_login_prog

If the kernel does not find the osd_login_prog path, it zeroes it out
and does not attempt further logins. An admin can then write a new value
to the objlayoutdriver.osd_login_prog kernel parameter to re-enable it.
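
The disable-on-failure behavior of osd_login_prog can be sketched as
below, assuming the user-mode helper reports a missing program as
-ENOENT. try_osd_login(), set_osd_login_prog() and the fake helpers are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Models the objlayoutdriver.osd_login_prog parameter and its default. */
static char osd_login_prog[256] = "/sbin/osd_login";

/*
 * run_helper is a hypothetical stand-in for the kernel's user-mode helper
 * call; it is assumed to return 0 or a negative errno, with -ENOENT
 * meaning the program was not found.
 */
static int try_osd_login(int (*run_helper)(const char *prog))
{
	int ret;

	if (osd_login_prog[0] == '\0')
		return -ENOENT;            /* disabled: no further attempts */
	ret = run_helper(osd_login_prog);
	if (ret == -ENOENT)
		osd_login_prog[0] = '\0';  /* zero out the path */
	return ret;
}

/* What an admin writing a new parameter value amounts to. */
static void set_osd_login_prog(const char *path)
{
	strncpy(osd_login_prog, path, sizeof(osd_login_prog) - 1);
	osd_login_prog[sizeof(osd_login_prog) - 1] = '\0';
}

/* Fake helpers for trying the logic out; they count their invocations. */
static int helper_calls;

static int helper_missing(const char *prog)
{
	(void)prog;
	helper_calls++;
	return -ENOENT;
}

static int helper_ok(const char *prog)
{
	(void)prog;
	helper_calls++;
	return 0;
}
```

Once the path is zeroed, the helper is never run again until a new value
is written.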

The /sbin/osd_login script is part of the nfs-utils package and should
usually be installed on distributions that support this kernel version.

The API to the login script is as follows:
    Usage: $0 -u <URI> -o <OSDNAME> -s <SYSTEMID>
    Options:
        -u    target URI, e.g. iscsi://<ip>:<port>
              (always present)
              (More protocols can be defined in the future.
               The client does not interpret this string; it is
               passed unchanged as received from the server.)
        -o    osdname of the requested target OSD
              (might be empty)
              (A string which denotes the OSD name; there is a
               limit of 64 chars on this string.)
        -s    systemid of the requested target OSD
              (might be empty)
              (This string, if not empty, is always a hex
               representation of the 20-byte osd_system_id.)
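
For illustration, a caller could assemble the script's argument vector
as follows. The helper names are hypothetical, and lower-case hex for
the system id is an assumption; the text above only says the string is
a hex representation of 20 bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OSD_SYSTEMID_LEN  20                      /* bytes, see -s above */
#define OSD_SYSTEMID_HEX  (2 * OSD_SYSTEMID_LEN + 1)
#define OSD_NAME_MAX      64                      /* chars, see -o above */

/* Hex-encode the system id; leave an empty string when there is none. */
static void systemid_to_hex(const unsigned char *id, char out[OSD_SYSTEMID_HEX])
{
	out[0] = '\0';
	if (id == NULL)
		return;
	for (int i = 0; i < OSD_SYSTEMID_LEN; i++)
		snprintf(out + 2 * i, 3, "%02x", (unsigned int)id[i]);
}

/*
 * Fill args for: <prog> -u <URI> -o <OSDNAME> -s <SYSTEMID>.
 * Returns -1 if the OSD name exceeds the 64-character limit.
 */
static int build_osd_login_argv(const char *args[8], const char *prog,
				const char *uri, const char *osdname,
				const char *systemid_hex)
{
	if (strlen(osdname) > OSD_NAME_MAX)
		return -1;
	args[0] = prog;
	args[1] = "-u";
	args[2] = uri;          /* passed through uninterpreted */
	args[3] = "-o";
	args[4] = osdname;      /* may be "" */
	args[5] = "-s";
	args[6] = systemid_hex; /* may be "" */
	args[7] = NULL;
	return 0;
}
```

A 20-byte system id always encodes to 40 hex characters, and empty -o
or -s values are passed as empty strings rather than omitted.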

blocks-layout setup
-------------------

TODO: Document the setup needs of the blocks layout driver