Mauro Carvalho Chehab | 7b001bf | 2017-05-14 21:02:59 -0300 | [diff] [blame] | 1 | =========================================================== |
Willy Tarreau | d98a052 | 2014-09-27 12:31:35 +0200 | [diff] [blame] | 2 | LZO stream format as understood by Linux's LZO decompressor |
| 3 | =========================================================== |
| 4 | |
| 5 | Introduction |
Mauro Carvalho Chehab | 7b001bf | 2017-05-14 21:02:59 -0300 | [diff] [blame] | 6 | ============ |
Willy Tarreau | d98a052 | 2014-09-27 12:31:35 +0200 | [diff] [blame] | 7 | |
| 8 | This is not a specification. No specification seems to be publicly available |
| 9 | for the LZO stream format. This document describes what input format the LZO |
| 10 | decompressor as implemented in the Linux kernel understands. The file subject |
| 11 | of this analysis is lib/lzo/lzo1x_decompress_safe.c. No analysis was made on |
| 12 | the compressor nor on any other implementations though it seems likely that |
| 13 | the format matches the standard one. The purpose of this document is to |
| 14 | better understand what the code does in order to propose more efficient fixes |
| 15 | for future bug reports. |
| 16 | |
| 17 | Description |
Mauro Carvalho Chehab | 7b001bf | 2017-05-14 21:02:59 -0300 | [diff] [blame] | 18 | =========== |
Willy Tarreau | d98a052 | 2014-09-27 12:31:35 +0200 | [diff] [blame] | 19 | |
| 20 | The stream is composed of a series of instructions, operands, and data. The |
| 21 | instructions consist in a few bits representing an opcode, and bits forming |
| 22 | the operands for the instruction, whose size and position depend on the |
| 23 | opcode and on the number of literals copied by previous instruction. The |
Mauro Carvalho Chehab | 7b001bf | 2017-05-14 21:02:59 -0300 | [diff] [blame] | 24 | operands are used to indicate: |
Willy Tarreau | d98a052 | 2014-09-27 12:31:35 +0200 | [diff] [blame] | 25 | |
| 26 | - a distance when copying data from the dictionary (past output buffer) |
| 27 | - a length (number of bytes to copy from dictionary) |
| 28 | - the number of literals to copy, which is retained in variable "state" |
| 29 | as a piece of information for next instructions. |
| 30 | |
| 31 | Optionally depending on the opcode and operands, extra data may follow. These |
| 32 | extra data can be a complement for the operand (eg: a length or a distance |
| 33 | encoded on larger values), or a literal to be copied to the output buffer. |
| 34 | |
| 35 | The first byte of the block follows a different encoding from other bytes, it |
| 36 | seems to be optimized for literal use only, since there is no dictionary yet |
| 37 | prior to that byte. |
| 38 | |
| 39 | Lengths are always encoded on a variable size starting with a small number |
| 40 | of bits in the operand. If the number of bits isn't enough to represent the |
| 41 | length, up to 255 may be added in increments by consuming more bytes with a |
| 42 | rate of at most 255 per extra byte (thus the compression ratio cannot exceed |
Mauro Carvalho Chehab | 7b001bf | 2017-05-14 21:02:59 -0300 | [diff] [blame] | 43 | around 255:1). The variable length encoding using #bits is always the same:: |
Willy Tarreau | d98a052 | 2014-09-27 12:31:35 +0200 | [diff] [blame] | 44 | |
| 45 | length = byte & ((1 << #bits) - 1) |
| 46 | if (!length) { |
| 47 | length = ((1 << #bits) - 1) |
| 48 | length += 255*(number of zero bytes) |
| 49 | length += first-non-zero-byte |
| 50 | } |
| 51 | length += constant (generally 2 or 3) |
| 52 | |
| 53 | For references to the dictionary, distances are relative to the output |
| 54 | pointer. Distances are encoded using very few bits belonging to certain |
| 55 | ranges, resulting in multiple copy instructions using different encodings. |
| 56 | Certain encodings involve one extra byte, others involve two extra bytes |
| 57 | forming a little-endian 16-bit quantity (marked LE16 below). |
| 58 | |
| 59 | After any instruction except the large literal copy, 0, 1, 2 or 3 literals |
| 60 | are copied before starting the next instruction. The number of literals that |
| 61 | were copied may change the meaning and behaviour of the next instruction. In |
| 62 | practice, only one instruction needs to know whether 0, less than 4, or more |
| 63 | literals were copied. This is the information stored in the <state> variable |
| 64 | in this implementation. This number of immediate literals to be copied is |
| 65 | generally encoded in the last two bits of the instruction but may also be |
| 66 | taken from the last two bits of an extra operand (eg: distance). |
| 67 | |
| 68 | End of stream is declared when a block copy of distance 0 is seen. Only one |
| 69 | instruction may encode this distance (0001HLLL), it takes one LE16 operand |
| 70 | for the distance, thus requiring 3 bytes. |
| 71 | |
Mauro Carvalho Chehab | 7b001bf | 2017-05-14 21:02:59 -0300 | [diff] [blame] | 72 | .. important:: |
| 73 | |
| 74 | In the code some length checks are missing because certain instructions |
| 75 | are called under the assumption that a certain number of bytes follow |
| 76 | because it has already been guaranteed before parsing the instructions. |
| 77 | They just have to "refill" this credit if they consume extra bytes. This |
| 78 | is an implementation design choice independent on the algorithm or |
| 79 | encoding. |
Willy Tarreau | d98a052 | 2014-09-27 12:31:35 +0200 | [diff] [blame] | 80 | |
Dave Rodgman | 5ee4014 | 2019-03-07 16:30:40 -0800 | [diff] [blame] | 81 | Versions |
| 82 | |
| 83 | 0: Original version |
| 84 | 1: LZO-RLE |
| 85 | |
| 86 | Version 1 of LZO implements an extension to encode runs of zeros using run |
| 87 | length encoding. This improves speed for data with many zeros, which is a |
| 88 | common case for zram. This modifies the bitstream in a backwards compatible way |
| 89 | (v1 can correctly decompress v0 compressed data, but v0 cannot read v1 data). |
| 90 | |
Dave Rodgman | 45ec975 | 2019-03-07 16:30:44 -0800 | [diff] [blame] | 91 | For maximum compatibility, both versions are available under different names |
| 92 | (lzo and lzo-rle). Differences in the encoding are noted in this document with |
| 93 | e.g.: version 1 only. |
| 94 | |
Willy Tarreau | d98a052 | 2014-09-27 12:31:35 +0200 | [diff] [blame] | 95 | Byte sequences |
Mauro Carvalho Chehab | 7b001bf | 2017-05-14 21:02:59 -0300 | [diff] [blame] | 96 | ============== |
Willy Tarreau | d98a052 | 2014-09-27 12:31:35 +0200 | [diff] [blame] | 97 | |
Mauro Carvalho Chehab | 7b001bf | 2017-05-14 21:02:59 -0300 | [diff] [blame] | 98 | First byte encoding:: |
Willy Tarreau | d98a052 | 2014-09-27 12:31:35 +0200 | [diff] [blame] | 99 | |
Dave Rodgman | 5ee4014 | 2019-03-07 16:30:40 -0800 | [diff] [blame] | 100 | 0..16 : follow regular instruction encoding, see below. It is worth |
| 101 | noting that code 16 will represent a block copy from the |
| 102 | dictionary which is empty, and that it will always be |
Willy Tarreau | d98a052 | 2014-09-27 12:31:35 +0200 | [diff] [blame] | 103 | invalid at this place. |
| 104 | |
Dave Rodgman | b11ed18 | 2019-04-05 18:38:58 -0700 | [diff] [blame] | 105 | 17 : bitstream version. If the first byte is 17, and compressed |
| 106 | stream length is at least 5 bytes (length of shortest possible |
| 107 | versioned bitstream), the next byte gives the bitstream version |
| 108 | (version 1 only). |
| 109 | Otherwise, the bitstream version is 0. |
Dave Rodgman | 5ee4014 | 2019-03-07 16:30:40 -0800 | [diff] [blame] | 110 | |
Willy Tarreau | d98a052 | 2014-09-27 12:31:35 +0200 | [diff] [blame] | 111 | 18..21 : copy 0..3 literals |
| 112 | state = (byte - 17) = 0..3 [ copy <state> literals ] |
| 113 | skip byte |
| 114 | |
| 115 | 22..255 : copy literal string |
| 116 | length = (byte - 17) = 4..238 |
| 117 | state = 4 [ don't copy extra literals ] |
| 118 | skip byte |
| 119 | |
Mauro Carvalho Chehab | 7b001bf | 2017-05-14 21:02:59 -0300 | [diff] [blame] | 120 | Instruction encoding:: |
Willy Tarreau | d98a052 | 2014-09-27 12:31:35 +0200 | [diff] [blame] | 121 | |
| 122 | 0 0 0 0 X X X X (0..15) |
| 123 | Depends on the number of literals copied by the last instruction. |
| 124 | If last instruction did not copy any literal (state == 0), this |
| 125 | encoding will be a copy of 4 or more literal, and must be interpreted |
| 126 | like this : |
| 127 | |
| 128 | 0 0 0 0 L L L L (0..15) : copy long literal string |
| 129 | length = 3 + (L ?: 15 + (zero_bytes * 255) + non_zero_byte) |
| 130 | state = 4 (no extra literals are copied) |
| 131 | |
| 132 | If last instruction used to copy between 1 to 3 literals (encoded in |
| 133 | the instruction's opcode or distance), the instruction is a copy of a |
| 134 | 2-byte block from the dictionary within a 1kB distance. It is worth |
| 135 | noting that this instruction provides little savings since it uses 2 |
| 136 | bytes to encode a copy of 2 other bytes but it encodes the number of |
| 137 | following literals for free. It must be interpreted like this : |
| 138 | |
| 139 | 0 0 0 0 D D S S (0..15) : copy 2 bytes from <= 1kB distance |
| 140 | length = 2 |
| 141 | state = S (copy S literals after this block) |
| 142 | Always followed by exactly one byte : H H H H H H H H |
| 143 | distance = (H << 2) + D + 1 |
| 144 | |
| 145 | If last instruction used to copy 4 or more literals (as detected by |
| 146 | state == 4), the instruction becomes a copy of a 3-byte block from the |
| 147 | dictionary from a 2..3kB distance, and must be interpreted like this : |
| 148 | |
| 149 | 0 0 0 0 D D S S (0..15) : copy 3 bytes from 2..3 kB distance |
| 150 | length = 3 |
| 151 | state = S (copy S literals after this block) |
| 152 | Always followed by exactly one byte : H H H H H H H H |
| 153 | distance = (H << 2) + D + 2049 |
| 154 | |
| 155 | 0 0 0 1 H L L L (16..31) |
| 156 | Copy of a block within 16..48kB distance (preferably less than 10B) |
| 157 | length = 2 + (L ?: 7 + (zero_bytes * 255) + non_zero_byte) |
| 158 | Always followed by exactly one LE16 : D D D D D D D D : D D D D D D S S |
| 159 | distance = 16384 + (H << 14) + D |
| 160 | state = S (copy S literals after this block) |
| 161 | End of stream is reached if distance == 16384 |
Dave Rodgman | b5265c8 | 2020-06-11 17:34:54 -0700 | [diff] [blame] | 162 | In version 1 only, to prevent ambiguity with the RLE case when |
| 163 | ((distance & 0x803f) == 0x803f) && (261 <= length <= 264), the |
| 164 | compressor must not emit block copies where distance and length |
| 165 | meet these conditions. |
Willy Tarreau | d98a052 | 2014-09-27 12:31:35 +0200 | [diff] [blame] | 166 | |
Dave Rodgman | 45ec975 | 2019-03-07 16:30:44 -0800 | [diff] [blame] | 167 | In version 1 only, this instruction is also used to encode a run of |
Dave Rodgman | b5265c8 | 2020-06-11 17:34:54 -0700 | [diff] [blame] | 168 | zeros if distance = 0xbfff, i.e. H = 1 and the D bits are all 1. |
Dave Rodgman | 5ee4014 | 2019-03-07 16:30:40 -0800 | [diff] [blame] | 169 | In this case, it is followed by a fourth byte, X. |
Dave Rodgman | b5265c8 | 2020-06-11 17:34:54 -0700 | [diff] [blame] | 170 | run length = ((X << 3) | (0 0 0 0 0 L L L)) + 4 |
Dave Rodgman | 5ee4014 | 2019-03-07 16:30:40 -0800 | [diff] [blame] | 171 | |
Willy Tarreau | d98a052 | 2014-09-27 12:31:35 +0200 | [diff] [blame] | 172 | 0 0 1 L L L L L (32..63) |
| 173 | Copy of small block within 16kB distance (preferably less than 34B) |
| 174 | length = 2 + (L ?: 31 + (zero_bytes * 255) + non_zero_byte) |
| 175 | Always followed by exactly one LE16 : D D D D D D D D : D D D D D D S S |
| 176 | distance = D + 1 |
| 177 | state = S (copy S literals after this block) |
| 178 | |
| 179 | 0 1 L D D D S S (64..127) |
| 180 | Copy 3-4 bytes from block within 2kB distance |
| 181 | state = S (copy S literals after this block) |
| 182 | length = 3 + L |
| 183 | Always followed by exactly one byte : H H H H H H H H |
| 184 | distance = (H << 3) + D + 1 |
| 185 | |
| 186 | 1 L L D D D S S (128..255) |
| 187 | Copy 5-8 bytes from block within 2kB distance |
| 188 | state = S (copy S literals after this block) |
| 189 | length = 5 + L |
| 190 | Always followed by exactly one byte : H H H H H H H H |
| 191 | distance = (H << 3) + D + 1 |
| 192 | |
| 193 | Authors |
Mauro Carvalho Chehab | 7b001bf | 2017-05-14 21:02:59 -0300 | [diff] [blame] | 194 | ======= |
Willy Tarreau | d98a052 | 2014-09-27 12:31:35 +0200 | [diff] [blame] | 195 | |
| 196 | This document was written by Willy Tarreau <w@1wt.eu> on 2014/07/19 during an |
Dave Rodgman | 5ee4014 | 2019-03-07 16:30:40 -0800 | [diff] [blame] | 197 | analysis of the decompression code available in Linux 3.16-rc5, and updated |
| 198 | by Dave Rodgman <dave.rodgman@arm.com> on 2018/10/30 to introduce run-length |
| 199 | encoding. The code is tricky, it is possible that this document contains |
| 200 | mistakes or that a few corner cases were overlooked. In any case, please |
| 201 | report any doubt, fix, or proposed updates to the author(s) so that the |
| 202 | document can be updated. |