===============================================================================
WHAT IS EXOFS?
===============================================================================

exofs is a file system that uses an OSD and exports the API of a normal Linux
file system. Users access exofs like any other local file system, and exofs
will in turn issue commands to the local OSD initiator.

OSD is a new T10 command set that views storage devices not as a large/flat
array of sectors but as a container of objects, each having a length, quota,
time attributes and more. Each object is addressed by a 64-bit ID and is
contained in a partition that is itself identified by a 64-bit ID. Each object
has associated attributes, which are an integral part of the object and provide
metadata about it. The standard defines some common mandatory attributes, but
user attributes can be added as needed.
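
To make the object model concrete, here is a minimal C sketch of how an OSD
object might be addressed and described. The struct and function names
(osd_obj_id, osd_object, osd_object_init, osd_obj_id_fmt) and the three
attributes chosen are hypothetical illustrations, not the T10 wire format or
the open-osd API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model: an object is named by a 64-bit partition ID plus a
 * 64-bit object ID within that partition. */
struct osd_obj_id {
	uint64_t partition;	/* e.g. the pid given to mkfs.exofs */
	uint64_t id;		/* object ID within the partition */
};

/* A few attributes of the kinds mentioned above; a real OSD defines many
 * more standard attributes and also allows user-defined ones. */
struct osd_obj_attrs {
	uint64_t logical_length;	/* bytes of data stored in the object */
	uint64_t quota;			/* upper limit on the object's length */
	uint64_t mtime;			/* last data modification time */
};

/* The attributes travel with the object rather than in a separate table. */
struct osd_object {
	struct osd_obj_id oid;
	struct osd_obj_attrs attrs;
};

/* Initialize an empty object: all attributes start at zero. */
void osd_object_init(struct osd_object *obj, uint64_t partition, uint64_t id)
{
	assert(obj != NULL);
	memset(obj, 0, sizeof(*obj));
	obj->oid.partition = partition;
	obj->oid.id = id;
}

/* Format an object address as "partition:id" in hex; returns the length
 * snprintf() reports. */
int osd_obj_id_fmt(const struct osd_obj_id *oid, char *buf, size_t n)
{
	return snprintf(buf, n, "0x%llx:0x%llx",
			(unsigned long long)oid->partition,
			(unsigned long long)oid->id);
}
```

For example, an object with ID 0x10003 in partition 0x10000 formats as
"0x10000:0x10003".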

===============================================================================
ENVIRONMENT
===============================================================================

To use this file system, you need an object store to run it on. You may
download a target from:
    http://open-osd.org

See Documentation/scsi/osd.txt for how to set up a working OSD environment.

===============================================================================
USAGE
===============================================================================

1. Download and compile exofs and the open-osd initiator:
   You need an external kernel source tree or kernel headers from your
   distribution (anything based on 2.6.26 or later).

   a. Download open-osd, including the exofs source, using:
        [parent-directory]$ git clone git://git.open-osd.org/open-osd.git

   b. Build the library module like this:
        [parent-directory]$ make -C open-osd KSRC=$(KER_DIR)

      This builds both the open-osd initiator and the exofs kernel module.
      Use whatever parameters you compiled your kernel with, and set
      $(KER_DIR) above to point to the kernel you compile against. See the
      file open-osd/top-level-Makefile for an example.

2. Set up the OSD initiator and target properly, and log in to the target.
   See Documentation/scsi/osd.txt for further instructions. Also see ./do-osd
   for an example script that performs all these steps.

3. Load the exofs.ko module:
        [exofs]$ insmod exofs.ko

4. Make sure the directory you want to mount on exists; if not, create it
   (for example, mkdir /mnt/exofs).

5. On first use, run the mkfs.exofs application to create the file system.

   As an example, this creates the file system on /dev/osd0, partition
   ID 65536:

        mkfs.exofs --pid=65536 --format /dev/osd0

   The --format flag is optional. If it is not specified, no OSD_FORMAT is
   performed, and a clean file system is created in the specified pid, in
   the space available on the target. (Use --format=size_in_meg to limit the
   total LUN space available.)

   If the pid already exists, it will be deleted and a new one will be
   created in its place. Be careful.

   An exofs file system lives inside a single OSD partition. You can create
   multiple exofs file systems on the same device by using multiple pids.

   (Run mkfs.exofs without any parameters for a usage help message.)

6. Mount the file system.

   For example, to mount /dev/osd0, partition ID 0x10000 (65536), on
   /mnt/exofs:

        mount -t exofs -o pid=65536 /dev/osd0 /mnt/exofs/

7. For reference (see the do-exofs example script):
        do-exofs start  - an example of how to perform the above steps.
        do-exofs stop   - an example of how to unmount the file system.
        do-exofs format - an example of how to format and mkfs a new exofs.

8. Extra compilation flags (uncomment in fs/exofs/Kbuild):
        CONFIG_EXOFS_DEBUG - for debug messages and extra checks.
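
Steps 4 and 6 can also be done from a program. The following is a minimal
sketch, assuming a Linux system with exofs.ko loaded and root privileges; it
uses the standard mkdir(2) and mount(2) system calls and passes the exofs
options as mount(2)'s data string. The helper names exofs_build_opts() and
exofs_mount() are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Build the exofs option string passed as mount(2)'s data argument,
 * e.g. "pid=65536". Returns 0 on success, -1 if buf is too small. */
int exofs_build_opts(char *buf, size_t n, unsigned long long pid)
{
	int len = snprintf(buf, n, "pid=%llu", pid);

	return (len < 0 || (size_t)len >= n) ? -1 : 0;
}

/* Steps 4 and 6: create the mount point if it is missing, then mount the
 * OSD device as exofs. Needs root and a loaded exofs module. */
int exofs_mount(const char *dev, const char *dir, unsigned long long pid)
{
	char opts[64];

	if (mkdir(dir, 0755) != 0 && errno != EEXIST) {
		fprintf(stderr, "mkdir %s: %s\n", dir, strerror(errno));
		return -1;
	}
	if (exofs_build_opts(opts, sizeof(opts), pid) != 0)
		return -1;
	if (mount(dev, dir, "exofs", 0, opts) != 0) {
		fprintf(stderr, "mount %s on %s: %s\n", dev, dir, strerror(errno));
		return -1;
	}
	return 0;
}
```

Under these assumptions, exofs_mount("/dev/osd0", "/mnt/exofs", 65536)
corresponds to the mount command shown in step 6.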

===============================================================================
exofs mount options
===============================================================================
Similar to any mount command:
        mount -t exofs -o exofs_options /dev/osdX mount_exofs_directory

Where:
    -t exofs: specifies the exofs file system

    /dev/osdX: X is a decimal number. /dev/osdX is created after a successful
               login to an OSD target.

    mount_exofs_directory: the directory to mount the file system on

    exofs-specific options: options are separated by commas (,)
        pid=<integer> - The partition number to mount/create as the
                        container of the file system.
                        This option is mandatory. The integer can be
                        given in hex by prefixing it with 0x.
        osdname=<id>  - Mount by a device's osdname.
                        osdname is usually a 36-character UUID of the
                        form "d2683732-c906-4ee1-9dbd-c10c27bb40df".
                        It is one of the device UUIDs specified in the
                        mkfs.exofs format command.
                        If this option is specified, the /dev/osdX
                        above can be empty and is ignored.
        to=<integer>  - Timeout in ticks for a single command.
                        The default is (60 * HZ). [for debugging only]

===============================================================================
DESIGN
===============================================================================

* The file system control block (AKA on-disk superblock) resides in an object
  with a special ID (defined in common.h).
  Information in the file system control block is used to fill the in-memory
  superblock structure at mount time. This object is created by mkexofs.c
  before the file system is used. It contains information such as:
    - The file system's magic number
    - The next inode number to be allocated

* Each file resides in its own object, which contains the file's data. (It
  will be possible to extend a file over multiple objects, though this has
  not been implemented yet.)

* A directory is treated as a file, and essentially contains a list of <file
  name, inode #> pairs for the files found in that directory. The object IDs
  correspond to the files' inode numbers and will be allocated according to a
  bitmap (stored in a separate object). Currently they are allocated using a
  counter.

* Each file's control block (AKA on-disk inode) is stored in its object's
  attributes. This applies to both regular files and other types (directories,
  device files, symlinks, etc.).

* Credentials are generated per object (inode and superblock) when the object
  is created in memory (read from disk or newly created). The credential works
  for all operations and is used as long as the object remains in memory.

* Async OSD operations are used whenever possible, but the target may execute
  them out of order. The operations that concern us are create, delete,
  readpage, writepage, update_inode, and truncate. The following pairs of
  operations should execute in the order written, and we need to prevent them
  from executing in reverse order:
    - The following are handled with the OBJ_CREATED and OBJ_2BCREATED
      flags. OBJ_CREATED is set when we know the object exists on the OSD:
      in create's callback function, and when we successfully do a
      read_inode.
      OBJ_2BCREATED is set at the beginning of the create function, so we
      know that we should wait.
        - create/delete: delete should wait until the object is created
          on the OSD.
        - create/readpage: readpage should be able to return a page
          full of zeroes in this case. If there was a write already
          en route (i.e. create, writepage, readpage), then the page
          would be locked, so it would really be the same as
          create/writepage.
        - create/writepage: if writepage is called for a sync write, it
          should wait until the object is created on the OSD.
          Otherwise, it should just return.
        - create/truncate: truncate should wait until the object is
          created on the OSD.
        - create/update_inode: update_inode should wait until the
          object is created on the OSD.
    - Handled by VFS locks:
        - readpage/delete: shouldn't happen because of page lock.
        - writepage/delete: shouldn't happen because of page lock.
        - readpage/writepage: shouldn't happen because of page lock.
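
The superblock, directory, and inode-number points in the design list can be
sketched in C as follows. The structures, field names, and magic value are
hypothetical and do not reproduce exofs's actual on-disk layout (which is
defined in common.h); the sketch only shows counter-based ID allocation and
the <file name, inode #> pairing stored in a directory object.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SUPER_MAGIC 0x5DF5u	/* illustrative value only */
#define SKETCH_NAME_LEN 255

/* Contents of the superblock object (the file system control block). */
struct sketch_fscb {
	uint16_t magic;		/* file system magic number */
	uint64_t next_ino;	/* next inode number (= object ID) to hand out */
};

/* One <file name, inode #> pair stored in a directory object's data. */
struct sketch_dirent {
	uint64_t ino;		/* also the object ID holding the file */
	uint8_t name_len;
	char name[SKETCH_NAME_LEN + 1];
};

/* Counter-based allocation: each new file gets the next ID. The updated
 * counter must later be written back to the superblock object. */
uint64_t sketch_alloc_ino(struct sketch_fscb *fscb)
{
	assert(fscb->magic == SKETCH_SUPER_MAGIC);	/* valid superblock */
	return fscb->next_ino++;
}

/* Fill a directory entry; returns -1 if the name is empty or too long. */
int sketch_make_dirent(struct sketch_dirent *de, const char *name,
		       uint64_t ino)
{
	size_t len = strlen(name);

	if (len == 0 || len > SKETCH_NAME_LEN)
		return -1;
	de->ino = ino;
	de->name_len = (uint8_t)len;
	memcpy(de->name, name, len + 1);	/* copies the terminating NUL */
	return 0;
}
```

A bitmap allocator, as planned in the design notes, would replace
sketch_alloc_ino() with a search for a free bit, allowing IDs of deleted files
to be reused.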
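The OBJ_2BCREATED/OBJ_CREATED ordering rule from the design list can be
modeled in user space with a mutex and a condition variable. This is a
hypothetical sketch using POSIX threads; the kernel code uses its own inode
flags and wait queues, and the sketch_* names are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct sketch_obj {
	pthread_mutex_t lock;
	pthread_cond_t created_cv;
	bool to_be_created;	/* models OBJ_2BCREATED: create was issued */
	bool created;		/* models OBJ_CREATED: object exists on the OSD */
};

/* exists_on_osd is true e.g. after a successful read_inode. */
void sketch_obj_init(struct sketch_obj *o, bool exists_on_osd)
{
	pthread_mutex_init(&o->lock, NULL);
	pthread_cond_init(&o->created_cv, NULL);
	o->to_be_created = false;
	o->created = exists_on_osd;
}

/* Beginning of create: mark the create as in flight before sending it. */
void sketch_begin_create(struct sketch_obj *o)
{
	pthread_mutex_lock(&o->lock);
	o->to_be_created = true;
	pthread_mutex_unlock(&o->lock);
}

/* Create's completion callback: the object now exists; wake all waiters. */
void sketch_create_done(struct sketch_obj *o)
{
	pthread_mutex_lock(&o->lock);
	assert(o->to_be_created);	/* only runs for a create we issued */
	o->created = true;
	pthread_cond_broadcast(&o->created_cv);
	pthread_mutex_unlock(&o->lock);
}

/* For delete, truncate, update_inode and sync writepage: block while a
 * create is still in flight, then return so the caller can proceed. */
void sketch_wait_created(struct sketch_obj *o)
{
	pthread_mutex_lock(&o->lock);
	while (o->to_be_created && !o->created)
		pthread_cond_wait(&o->created_cv, &o->lock);
	pthread_mutex_unlock(&o->lock);
}

/* Non-blocking check for readpage (which may return a zero-filled page)
 * and async writepage (which may just return). */
bool sketch_create_pending(struct sketch_obj *o)
{
	bool pending;

	pthread_mutex_lock(&o->lock);
	pending = o->to_be_created && !o->created;
	pthread_mutex_unlock(&o->lock);
	return pending;
}
```

In this model, a delete path calls sketch_wait_created() before sending its
own command, so it can never overtake the create on the OSD.
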
===============================================================================
LICENSE/COPYRIGHT
===============================================================================
The exofs file system is based on ext2 v0.5b (distributed with the Linux kernel
version 2.6.10). All files include the original copyrights, and the license
is GPL version 2 (only version 2, as is true for the Linux kernel). The
Linux kernel can be downloaded from www.kernel.org.