.. SPDX-License-Identifier: GPL-2.0

===================
The QNX6 Filesystem
===================

The qnx6fs is used by newer QNX operating system versions (e.g. Neutrino).
It was introduced in QNX 6.4.0 and has been the default since 6.4.1.

Option
======

mmi_fs		Mount filesystem as used for example by Audi MMI 3G system

Specification
=============

qnx6fs shares many properties with traditional Unix filesystems. It has the
concepts of blocks, inodes and directories.

On QNX it is possible to create little endian and big endian qnx6 filesystems.
This makes it possible to create a filesystem whose endianness matches the
target platform (QNX is used on quite a range of embedded systems), even when
the system creating it has a different endianness.

The Linux driver handles both endiannesses (LE and BE) transparently.
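
As an illustration of what transparent handling can look like, the sketch
below decodes the first four bytes of a superblock in both byte orders and
keeps the one that yields the expected magic. The magic value and its position
at offset 0 are assumptions taken from the Linux driver's on-disk definitions,
not from this document, and ``detect_endianness`` is a hypothetical helper.

```python
import struct

# Assumption: magic number used by the Linux driver (QNX6_SUPER_MAGIC),
# stored as the first 32-bit field of the superblock.
QNX6_SUPER_MAGIC = 0x68191122


def detect_endianness(superblock: bytes) -> str:
    """Return '<' for a little endian or '>' for a big endian filesystem."""
    for order in ("<", ">"):
        (magic,) = struct.unpack_from(order + "I", superblock, 0)
        if magic == QNX6_SUPER_MAGIC:
            return order
    raise ValueError("magic not found in either byte order")
```

The returned prefix can then be passed to every later ``struct`` call, so the
rest of a parser never needs to care which byte order the filesystem uses.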

Blocks
------

The space in the device or file is split up into blocks. These have a fixed
size of 512, 1024, 2048 or 4096 bytes, which is decided when the filesystem
is created.

Block pointers are 32 bits wide, so the maximum space that can be addressed
is 2^32 * 4096 bytes, or 16 TiB.
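
The limit follows directly from the pointer width and the block size. A small
sketch of the arithmetic for all four block sizes (the function name is
illustrative only):

```python
BLOCK_SIZES = (512, 1024, 2048, 4096)  # block sizes listed above
POINTER_BITS = 32                      # width of a block pointer


def max_addressable_bytes(block_size: int) -> int:
    """Largest byte range reachable with 32-bit block pointers."""
    if block_size not in BLOCK_SIZES:
        raise ValueError("unsupported block size")
    return (1 << POINTER_BITS) * block_size


for size in BLOCK_SIZES:
    print(size, max_addressable_bytes(size) // 2**40, "TiB")
```

Only the largest block size reaches the 16 TiB figure; smaller block sizes
scale the limit down proportionally.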

The superblocks
---------------

The superblock contains all global information about the filesystem.
Each qnx6fs has two superblocks, each one having a 64-bit serial number.
That serial number is used to identify the "active" superblock.
In write mode, with each new snapshot (after each synchronous write), the
serial of the new master superblock is increased (old superblock serial + 1).

So basically the snapshot functionality is realized by an atomic final
update of the serial number. Before updating that serial, all modifications
are done by copying all modified blocks during that specific write request
(or period) and building up a new (stable) filesystem structure under the
inactive superblock.
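
The serial number scheme can be modelled in a few lines. This is a toy sketch,
not driver code: ``Superblock``, ``active_index`` and ``commit_snapshot`` are
hypothetical names, and treating the first superblock as active when both
serials are equal is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Superblock:
    serial: int


def active_index(superblocks: list) -> int:
    """Index of the superblock with the higher serial number."""
    first, second = superblocks
    # Assumption: on a tie the first superblock wins.
    return 0 if first.serial >= second.serial else 1


def commit_snapshot(superblocks: list) -> int:
    """Make the inactive superblock the new master; return its index."""
    active = active_index(superblocks)
    inactive = 1 - active
    # The new structure has already been built under the inactive
    # superblock; bumping its serial is the final step of the switch.
    superblocks[inactive].serial = superblocks[active].serial + 1
    return inactive
```

After ``commit_snapshot`` returns, ``active_index`` reports the other
superblock, while the previously active one still describes the old, complete
filesystem state.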

Each superblock holds a set of root inodes for the different filesystem
parts (Inode, Bitmap and Longfilenames).
Each of these root nodes holds information like the total size of the stored
data and the number of addressing levels in that specific tree.
If the level value is 0, up to 16 direct blocks can be addressed by each
node.

Level 1 adds an additional indirect addressing level where each indirect
addressing block holds up to blocksize / 4 pointers (4 bytes each) to data
blocks.
Level 2 adds another indirect addressing block level (so, with a blocksize
of 1024, up to 16 * 256 * 256 = 1048576 blocks can already be addressed by
such a tree).
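
The capacity of such a tree grows by a factor of blocksize / 4 per level. A
short sketch of that calculation, with illustrative names:

```python
DIRECT_POINTERS = 16  # pointers held by a root node or inode
POINTER_SIZE = 4      # bytes per 32-bit block pointer


def max_tree_blocks(block_size: int, levels: int) -> int:
    """Number of data blocks addressable by a tree with `levels` levels."""
    pointers_per_block = block_size // POINTER_SIZE
    return DIRECT_POINTERS * pointers_per_block ** levels


for level in range(3):
    print(level, max_tree_blocks(1024, level))
```

For a blocksize of 1024 this gives 16, 4096 and 1048576 blocks for levels 0,
1 and 2.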

Unused block pointers are always set to ~0, regardless of whether they are
in root nodes, indirect addressing blocks or inodes.

Data leaves are always on the lowest level. So no data is stored on upper
tree levels.

The first superblock is located at 0x2000 (0x2000 is the bootblock size).
On the Audi MMI 3G, the first superblock starts directly at byte 0.

The position of the second superblock can either be calculated from the
superblock information (total number of filesystem blocks) or by taking the
highest device address, zeroing the last 3 bytes and then subtracting 0x1000
from that address.

0x1000 is the size reserved for each superblock, regardless of the
blocksize of the filesystem.
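
The first of the two methods can be sketched as follows. The sketch assumes
that the second superblock directly follows the last filesystem block, which
is how the Linux driver appears to compute it but is not stated in this
document. The function names are hypothetical.

```python
BOOTBLOCK_SIZE = 0x2000   # offset of the first superblock (regular layout)
SUPERBLOCK_AREA = 0x1000  # space reserved for each superblock


def first_superblock_offset(mmi_fs: bool = False) -> int:
    """Byte offset of the first superblock; 0 for the Audi MMI 3G layout."""
    return 0 if mmi_fs else BOOTBLOCK_SIZE


def second_superblock_offset(num_blocks: int, block_size: int,
                             mmi_fs: bool = False) -> int:
    """Byte offset of the second superblock.

    Assumption: it sits right after the last of `num_blocks` blocks.
    """
    data_start = first_superblock_offset(mmi_fs) + SUPERBLOCK_AREA
    return data_start + num_blocks * block_size
```

``num_blocks`` and ``block_size`` would be read from the first superblock
before the second one is located.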

Inodes
------

Each object in the filesystem is represented by an inode (index node).
The inode structure contains pointers to the filesystem blocks which contain
the data held in the object and all of the metadata about an object except
its longname (filenames longer than 27 characters).
The metadata about an object includes the permissions, owner, group, flags,
size, number of blocks used, access time, change time and modification time.

The object mode field is in POSIX format (which makes things easier).
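
Because the mode field uses the POSIX layout, standard tooling can interpret
it without translation. A minimal sketch using Python's ``stat`` module, where
``describe_mode`` is a hypothetical helper and the mode values are made-up
examples rather than data read from a real image:

```python
import stat


def describe_mode(mode: int) -> tuple:
    """Return (object kind, ls-style permission string) for a mode value."""
    if stat.S_ISDIR(mode):
        kind = "directory"
    elif stat.S_ISLNK(mode):
        kind = "symlink"
    elif stat.S_ISREG(mode):
        kind = "regular file"
    else:
        kind = "other"
    return kind, stat.filemode(mode)


print(describe_mode(0o100644))  # a regular file with rw-r--r-- permissions
print(describe_mode(0o040755))  # a directory with rwxr-xr-x permissions
```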

There are also pointers to the first 16 blocks, if the object data can be
addressed with 16 direct blocks.

For more than 16 blocks, indirect addressing in the form of another tree is
used (the scheme is the same as the one used for the superblock root nodes).

The file size is stored as a 64-bit value. Inode counting starts with 1
(while long filename inodes start with 0).

Directories
-----------

A directory is a filesystem object and has an inode just like a file.
It is a specially formatted file containing records which associate each
name with an inode number.

'.' inode number points to the directory inode

'..' inode number points to the parent directory inode

Each filename record additionally has a filename length field.

One special case is long filenames or long subdirectory names.

These have a filename length field of 0xff in the corresponding directory
record, plus the longfile inode number also stored in that record.

With that longfilename inode number, the longfilename tree can be walked
starting with the superblock longfilename root node pointers.
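
A directory record could be decoded roughly as shown below. The byte layout
(a 32-bit inode number, a one-byte length, then either up to 27 name bytes or
three unused bytes followed by the 32-bit longfile inode number) is an
assumption based on the Linux driver's structure definitions; this document
only specifies the length field and the 0xff marker. ``parse_dir_record`` is a
hypothetical helper.

```python
import struct

SHORT_NAME_MAX = 27      # longest name stored directly in a record
LONG_NAME_MARKER = 0xFF  # length value that flags a long filename


def parse_dir_record(record: bytes, order: str = "<") -> dict:
    """Decode one directory record; `order` is '<' (LE) or '>' (BE)."""
    inode, length = struct.unpack_from(order + "IB", record, 0)
    if length == LONG_NAME_MARKER:
        # Assumed layout: 3 unused bytes, then the longfile inode number.
        (long_inode,) = struct.unpack_from(order + "I", record, 8)
        return {"inode": inode, "long_inode": long_inode}
    if length > SHORT_NAME_MAX:
        raise ValueError("invalid short name length")
    return {"inode": inode, "name": record[5:5 + length]}
```

A caller that receives ``long_inode`` would then look up that number in the
longfilename tree to obtain the actual name.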

Special files
-------------

Symbolic links are also filesystem objects with inodes. They have a specific
bit in the inode mode field identifying them as symbolic links.

The directory entry file inode pointer points to the target file inode.

Hard links have an inode and a directory entry, but with a specific mode bit
set, no block pointers and the directory file record pointing to the target
file inode.

Character and block special devices do not exist in QNX as those files
are handled by the QNX kernel/drivers and created in /dev independent of the
underlying filesystem.

Long filenames
--------------

Long filenames are stored in a separate addressing tree. The starting point
is the longfilename root node in the active superblock.

Each data block (tree leaf) holds one long filename. That filename is
limited to 510 bytes. The first two bytes are used as a length field
for the actual filename.

Since that structure has to fit into every allowed blocksize, including the
smallest one of 512 bytes, it is clear why there is a limit of 510 bytes for
the actual filename stored.
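
Reading a name out of such a leaf block is then a matter of decoding the
two-byte length and slicing. A minimal sketch, with ``parse_long_name`` as a
hypothetical helper and the byte order passed in by the caller:

```python
import struct

LENGTH_FIELD_SIZE = 2  # bytes used by the length field
MAX_LONG_NAME = 510    # 512-byte minimum block minus the length field


def parse_long_name(block: bytes, order: str = "<") -> bytes:
    """Extract the filename stored in a longfilename leaf block."""
    (length,) = struct.unpack_from(order + "H", block, 0)
    if length > MAX_LONG_NAME:
        raise ValueError("long filename length out of range")
    return block[LENGTH_FIELD_SIZE:LENGTH_FIELD_SIZE + length]
```

The length check doubles as a cheap sanity test when walking a possibly
corrupted image.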

Bitmap
------

The qnx6fs filesystem allocation bitmap is stored in a tree under the bitmap
root node in the superblock, and each bit in the bitmap represents one
filesystem block.

The first block is block 0, which starts 0x1000 after the superblock start.
So for a normal qnx6fs, 0x3000 (bootblock + superblock) is the physical
address at which block 0 is located.
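
Translating a block number into a physical byte address therefore only needs
the superblock start and the blocksize. A sketch with an illustrative function
name, where ``superblock_start`` defaults to the regular 0x2000 layout:

```python
SUPERBLOCK_AREA = 0x1000  # space reserved for the superblock


def block_address(block: int, block_size: int,
                  superblock_start: int = 0x2000) -> int:
    """Physical byte address of filesystem block number `block`."""
    return superblock_start + SUPERBLOCK_AREA + block * block_size


print(hex(block_address(0, 1024)))  # 0x3000
```

Passing ``superblock_start=0`` covers a filesystem whose first superblock
starts at byte 0.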

Bits at the end of the last bitmap block are set to 1 if the device is
smaller than the addressing space in the bitmap.
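
How many such trailing bits exist can be estimated as below. The sketch
assumes the bitmap occupies a whole number of filesystem blocks, which this
document implies but does not state outright. ``bitmap_padding_bits`` is a
hypothetical name.

```python
def bitmap_padding_bits(num_blocks: int, block_size: int) -> int:
    """Bits in the last bitmap block that do not map to a real block."""
    bits_per_bitmap_block = block_size * 8
    # Round up to whole bitmap blocks (assumption, see above).
    bitmap_blocks = -(-num_blocks // bits_per_bitmap_block)
    return bitmap_blocks * bits_per_bitmap_block - num_blocks
```

For example, 10000 filesystem blocks with a blocksize of 512 need three bitmap
blocks (12288 bits), leaving 2288 trailing bits that would be set to 1.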

Bitmap system area
------------------

The bitmap itself is divided into three parts.

First the system area, which is split into two halves.

Then userspace.

The requirement for a static, fixed preallocated system area comes from how
qnx6fs deals with writes.

Each superblock has its own half of the system area. So superblock #1
always uses blocks from the lower half while superblock #2 just writes to
blocks represented by the upper half bitmap system area bits.

Bitmap blocks, Inode blocks and indirect addressing blocks for those two
tree structures are treated as system blocks.

The rationale behind that is that a write request can work on a new snapshot
(system area of the inactive, i.e. lower serial numbered, superblock) while
at the same time there is still a complete stable filesystem structure in the
other half of the system area.

When writing is finished (a sync write is completed, the maximum sync leap
time is reached or a filesystem sync is requested), the serial of the
previously inactive superblock is atomically increased and the fs switches
over to that superblock, which is then declared stable.

For all data outside the system area, blocks are just copied while writing.