The Kernel Address Sanitizer (KASAN)
====================================

Overview
--------

KernelAddressSANitizer (KASAN) is a dynamic memory error detector designed to
find out-of-bounds and use-after-free bugs. KASAN has two modes: generic KASAN
(similar to userspace ASan) and software tag-based KASAN (similar to userspace
HWASan).

KASAN uses compile-time instrumentation to insert validity checks before every
memory access, and therefore requires a compiler version that supports it.

Generic KASAN is supported in both GCC and Clang. With GCC it requires version
4.9.2 or later for basic support and version 5.0 or later for detection of
out-of-bounds accesses for stack and global variables and for inline
instrumentation mode (see the Usage section). With Clang it requires version
7.0.0 or later and it doesn't support detection of out-of-bounds accesses for
global variables yet.

Tag-based KASAN is only supported in Clang and requires version 7.0.0 or later.

Currently generic KASAN is supported for the x86_64, arm64, xtensa, s390 and
riscv architectures, and tag-based KASAN is supported only for arm64.

Usage
-----

To enable KASAN, configure the kernel with::

    CONFIG_KASAN=y

and choose between CONFIG_KASAN_GENERIC (to enable generic KASAN) and
CONFIG_KASAN_SW_TAGS (to enable software tag-based KASAN).

You also need to choose between CONFIG_KASAN_OUTLINE and CONFIG_KASAN_INLINE.
Outline and inline are compiler instrumentation types. The former produces a
smaller binary while the latter is 1.1 - 2 times faster.

Both KASAN modes work with both SLUB and SLAB memory allocators.
For better bug detection and nicer reporting, enable CONFIG_STACKTRACE.

To augment reports with the last allocation and freeing stack of the physical
page, it is recommended to also enable CONFIG_PAGE_OWNER and boot with
page_owner=on.

To disable instrumentation for specific files or directories, add a line
similar to the following to the respective kernel Makefile:

- For a single file (e.g. main.o)::

    KASAN_SANITIZE_main.o := n

- For all files in one directory::

    KASAN_SANITIZE := n

Error reports
~~~~~~~~~~~~~

A typical out-of-bounds access generic KASAN report looks like this::

    ==================================================================
    BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa8/0xbc [test_kasan]
    Write of size 1 at addr ffff8801f44ec37b by task insmod/2760

    CPU: 1 PID: 2760 Comm: insmod Not tainted 4.19.0-rc3+ #698
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    Call Trace:
     dump_stack+0x94/0xd8
     print_address_description+0x73/0x280
     kasan_report+0x144/0x187
     __asan_report_store1_noabort+0x17/0x20
     kmalloc_oob_right+0xa8/0xbc [test_kasan]
     kmalloc_tests_init+0x16/0x700 [test_kasan]
     do_one_initcall+0xa5/0x3ae
     do_init_module+0x1b6/0x547
     load_module+0x75df/0x8070
     __do_sys_init_module+0x1c6/0x200
     __x64_sys_init_module+0x6e/0xb0
     do_syscall_64+0x9f/0x2c0
     entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f96443109da
    RSP: 002b:00007ffcf0b51b08 EFLAGS: 00000202 ORIG_RAX: 00000000000000af
    RAX: ffffffffffffffda RBX: 000055dc3ee521a0 RCX: 00007f96443109da
    RDX: 00007f96445cff88 RSI: 0000000000057a50 RDI: 00007f9644992000
    RBP: 000055dc3ee510b0 R08: 0000000000000003 R09: 0000000000000000
    R10: 00007f964430cd0a R11: 0000000000000202 R12: 00007f96445cff88
    R13: 000055dc3ee51090 R14: 0000000000000000 R15: 0000000000000000

    Allocated by task 2760:
     save_stack+0x43/0xd0
     kasan_kmalloc+0xa7/0xd0
     kmem_cache_alloc_trace+0xe1/0x1b0
     kmalloc_oob_right+0x56/0xbc [test_kasan]
     kmalloc_tests_init+0x16/0x700 [test_kasan]
     do_one_initcall+0xa5/0x3ae
     do_init_module+0x1b6/0x547
     load_module+0x75df/0x8070
     __do_sys_init_module+0x1c6/0x200
     __x64_sys_init_module+0x6e/0xb0
     do_syscall_64+0x9f/0x2c0
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Freed by task 815:
     save_stack+0x43/0xd0
     __kasan_slab_free+0x135/0x190
     kasan_slab_free+0xe/0x10
     kfree+0x93/0x1a0
     umh_complete+0x6a/0xa0
     call_usermodehelper_exec_async+0x4c3/0x640
     ret_from_fork+0x35/0x40

    The buggy address belongs to the object at ffff8801f44ec300
     which belongs to the cache kmalloc-128 of size 128
    The buggy address is located 123 bytes inside of
     128-byte region [ffff8801f44ec300, ffff8801f44ec380)
    The buggy address belongs to the page:
    page:ffffea0007d13b00 count:1 mapcount:0 mapping:ffff8801f7001640 index:0x0
    flags: 0x200000000000100(slab)
    raw: 0200000000000100 ffffea0007d11dc0 0000001a0000001a ffff8801f7001640
    raw: 0000000000000000 0000000080150015 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
     ffff8801f44ec200: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
     ffff8801f44ec280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
    >ffff8801f44ec300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03
                                                                    ^
     ffff8801f44ec380: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
     ffff8801f44ec400: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
    ==================================================================

The header of the report provides a short summary of what kind of bug happened
and what kind of access caused it. It's followed by a stack trace of the bad
access, a stack trace of where the accessed memory was allocated (in case the
bad access happens on a slab object), and a stack trace of where the object was
freed (in case of a use-after-free bug report). Next comes a description of
the accessed slab object and information about the accessed memory page.

In the last section the report shows the memory state around the accessed
address. Reading this part requires some understanding of how KASAN works.

The state of each aligned 8-byte block of memory is encoded in one shadow byte.
Those 8 bytes can be accessible, partially accessible, freed, or part of a
redzone. We use the following encoding for each shadow byte: 0 means that all
8 bytes of the corresponding memory region are accessible; a number N
(1 <= N <= 7) means that the first N bytes are accessible, and the other
(8 - N) bytes are not; any negative value indicates that the entire 8-byte
word is inaccessible. We use different negative values to distinguish between
different kinds of inaccessible memory like redzones or freed memory (see
mm/kasan/kasan.h).

In the report above the arrow points to the shadow byte 03, which means that
the 8-byte block containing the accessed address is only partially accessible:
its first 3 bytes are valid and the write touched a byte beyond them.

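The shadow byte encoding described above can be sketched in plain C. This is a
minimal userspace illustration, not the kernel's checker: the name
``shadow_allows_access`` and the ``GRANULE_SIZE`` macro are hypothetical, and
the sketch assumes the access does not cross an 8-byte block boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE_SIZE 8 /* bytes of memory described by one shadow byte */

/*
 * Return true if an access of `size` bytes at `addr` is allowed by `shadow`,
 * the shadow byte of the 8-byte block containing `addr`. The access must not
 * cross a block boundary; a real checker handles that case separately.
 */
bool shadow_allows_access(int8_t shadow, uint64_t addr, size_t size)
{
    uint64_t first = addr & (GRANULE_SIZE - 1); /* offset inside the block */
    uint64_t last = first + size - 1;           /* last byte touched */

    assert(size >= 1 && last < GRANULE_SIZE);

    if (shadow == 0)
        return true;                /* all 8 bytes are accessible */
    if (shadow < 0)
        return false;               /* redzone, freed memory, ... */
    return last < (uint64_t)shadow; /* only the first N bytes are accessible */
}
```

With the values from the report above, ``shadow_allows_access(0x03,
0xffff8801f44ec37b, 1)`` returns false: the address sits at offset 3 in its
block, one byte past the 3 accessible bytes.
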
For tag-based KASAN this last report section shows the memory tags around the
accessed address (see the Implementation details section).


Implementation details
----------------------

Generic KASAN
~~~~~~~~~~~~~

From a high level, our approach to memory error detection is similar to that
of kmemcheck: use shadow memory to record whether each byte of memory is safe
to access, and use compile-time instrumentation to insert checks of shadow
memory on each memory access.

Generic KASAN dedicates 1/8th of kernel memory to its shadow memory (e.g. 16TB
to cover 128TB on x86_64) and uses direct mapping with a scale and offset to
translate a memory address to its corresponding shadow address.

Here is the function which translates an address to its corresponding shadow
address::

    static inline void *kasan_mem_to_shadow(const void *addr)
    {
        return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
            + KASAN_SHADOW_OFFSET;
    }

where ``KASAN_SHADOW_SCALE_SHIFT = 3``.

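The translation can be modelled in userspace with plain integers. The sketch
below is illustrative only: ``mem_to_shadow``, ``shadow_size`` and
``EXAMPLE_SHADOW_OFFSET`` are hypothetical names, and the offset value is an
assumed example, since the real ``KASAN_SHADOW_OFFSET`` depends on the
architecture and configuration.

```c
#include <stdint.h>

#define SHADOW_SCALE_SHIFT 3 /* one shadow byte per 2^3 = 8 bytes of memory */

/* Assumed example value; the real offset depends on architecture and config. */
#define EXAMPLE_SHADOW_OFFSET 0xdffffc0000000000ULL

/* Userspace model of the translation: scale the address, then add the offset. */
uint64_t mem_to_shadow(uint64_t addr)
{
    return (addr >> SHADOW_SCALE_SHIFT) + EXAMPLE_SHADOW_OFFSET;
}

/* Shadow bytes needed to cover `size` bytes of memory (size a multiple of 8). */
uint64_t shadow_size(uint64_t size)
{
    return size >> SHADOW_SCALE_SHIFT;
}
```

All addresses inside one aligned 8-byte block map to the same shadow address,
and a 128-byte object is covered by 16 consecutive shadow bytes, which is why
each row of the memory state dump in a report spans 128 bytes.
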
Compile-time instrumentation is used to insert memory access checks. The
compiler inserts function calls (__asan_load*(addr), __asan_store*(addr))
before each memory access of size 1, 2, 4, 8 or 16. These functions check
whether the memory access is valid by checking the corresponding shadow memory.

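To make the idea of outline instrumentation concrete, here is a small
userspace model. It is a sketch under stated assumptions, not the kernel's
code: ``arena``, ``arena_shadow``, ``check_store1`` and ``store1`` are
hypothetical names, the 64-byte array stands in for kernel memory, and a
counter stands in for printing a report. As a further simplification the
sketch skips a store that fails the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 64
#define GRANULE 8

uint8_t arena[ARENA_SIZE];                 /* stand-in for kernel memory */
int8_t arena_shadow[ARENA_SIZE / GRANULE]; /* one shadow byte per 8 bytes */
unsigned int bad_accesses;                 /* stand-in for printing a report */

/* Hypothetical analogue of an outline check for 1-byte stores. */
bool check_store1(size_t off)
{
    int8_t s;

    assert(off < ARENA_SIZE);
    s = arena_shadow[off / GRANULE];

    /* 0: whole block accessible; N in 1..7: only the first N bytes are. */
    if (s == 0 || (s > 0 && (int8_t)(off % GRANULE) < s))
        return true;

    bad_accesses++;
    return false;
}

/* What an instrumented "arena[off] = val" conceptually becomes. */
void store1(size_t off, uint8_t val)
{
    if (check_store1(off)) /* call inserted before the access */
        arena[off] = val;  /* the original store */
}
```

Setting ``arena_shadow[1] = 3`` marks bytes 8 to 10 as accessible, so
``store1(10, ...)`` succeeds while ``store1(11, ...)`` is counted as a bad
access.
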
GCC 5.0 and later can perform inline instrumentation. Instead of making
function calls, GCC directly inserts the code to check the shadow memory.
This option significantly enlarges the kernel, but it gives a 1.1x-2x
performance boost over an outline-instrumented kernel.

Software tag-based KASAN
~~~~~~~~~~~~~~~~~~~~~~~~

Tag-based KASAN uses the Top Byte Ignore (TBI) feature of modern arm64 CPUs to
store a pointer tag in the top byte of kernel pointers. Like generic KASAN it
uses shadow memory to store memory tags associated with each 16-byte memory
cell (therefore it dedicates 1/16th of the kernel memory for shadow memory).

On each memory allocation tag-based KASAN generates a random tag, tags the
allocated memory with this tag, and embeds this tag into the returned pointer.
Software tag-based KASAN uses compile-time instrumentation to insert checks
before each memory access. These checks make sure that the tag of the memory
that is being accessed is equal to the tag of the pointer that is used to
access this memory. In case of a tag mismatch tag-based KASAN prints a bug
report.

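The tag comparison described above can be sketched with 64-bit integers
standing in for pointers. This is a minimal illustration with hypothetical
helper names (``pointer_tag``, ``set_pointer_tag``, ``tags_match``); it only
models storing a tag in the top byte and comparing it with a memory tag, and
leaves out tag generation and the shadow lookup.

```c
#include <stdbool.h>
#include <stdint.h>

#define TAG_SHIFT 56 /* the tag lives in the top byte of a 64-bit pointer */

/* Extract the tag stored in the top byte of `ptr`. */
uint8_t pointer_tag(uint64_t ptr)
{
    return (uint8_t)(ptr >> TAG_SHIFT);
}

/* Return `ptr` with its top byte replaced by `tag`. */
uint64_t set_pointer_tag(uint64_t ptr, uint8_t tag)
{
    return (ptr & ~(0xffULL << TAG_SHIFT)) | ((uint64_t)tag << TAG_SHIFT);
}

/* An access passes the check only if pointer tag and memory tag are equal. */
bool tags_match(uint64_t ptr, uint8_t memory_tag)
{
    return pointer_tag(ptr) == memory_tag;
}
```

For example, tagging ``0xffff000012345678`` with ``0x5a`` yields
``0x5aff000012345678``; an access through that pointer matches memory tagged
``0x5a`` and mismatches memory that has since been retagged with another
value.
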
Software tag-based KASAN also has two instrumentation modes (outline, which
emits callbacks to check memory accesses; and inline, which performs the shadow
memory checks inline). With outline instrumentation mode, a bug report is
simply printed from the function that performs the access check. With inline
instrumentation a brk instruction is emitted by the compiler, and a dedicated
brk handler is used to print bug reports.

A potential expansion of this mode is a hardware tag-based mode, which would
use hardware memory tagging support instead of compiler instrumentation and
manual shadow memory manipulation.

What memory accesses are sanitised by KASAN?
--------------------------------------------

The kernel maps memory in a number of different parts of the address
space. This poses something of a problem for KASAN, which requires
that all addresses accessed by instrumented code have a valid shadow
region.

The range of kernel virtual addresses is large: there is not enough
real memory to support a real shadow region for every address that
could be accessed by the kernel.

By default
~~~~~~~~~~

By default, architectures only map real memory over the shadow region
for the linear mapping (and potentially other small areas). For all
other areas - such as vmalloc and vmemmap space - a single read-only
page is mapped over the shadow area. This read-only shadow page
declares all memory accesses as permitted.

This presents a problem for modules: they do not live in the linear
mapping, but in a dedicated module space. By hooking into the module
allocator, KASAN can temporarily map real shadow memory to cover
them. This allows detection of invalid accesses to module globals, for
example.

This also creates an incompatibility with ``VMAP_STACK``: if the stack
lives in vmalloc space, it will be shadowed by the read-only page, and
the kernel will fault when trying to set up the shadow data for stack
variables.

CONFIG_KASAN_VMALLOC
~~~~~~~~~~~~~~~~~~~~

With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
cost of greater memory usage. Currently this is only supported on x86.

This works by hooking into vmalloc and vmap, and dynamically
allocating real shadow memory to back the mappings.

Most mappings in vmalloc space are small, requiring less than a full
page of shadow space. Allocating a full shadow page per mapping would
therefore be wasteful. Furthermore, to ensure that different mappings
use different shadow pages, mappings would have to be aligned to
``KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE``.

Instead, we share backing space across multiple mappings. We allocate
a backing page when a mapping in vmalloc space uses a particular page
of the shadow region. This page can be shared by other vmalloc
mappings later on.

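The arithmetic behind this sharing can be sketched as follows. The sketch is
illustrative and rests on assumptions: a 4096-byte page, a page-aligned shadow
offset (so it does not affect which mappings share a shadow page), and
addresses given as plain integers. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_PAGE_SIZE 4096ULL   /* assumed page size */
#define SHADOW_SCALE_SIZE 8ULL /* bytes of memory per shadow byte */

/* Bytes of memory whose shadow fits in exactly one shadow page. */
#define SHADOW_PAGE_SPAN (SHADOW_SCALE_SIZE * EX_PAGE_SIZE)

/* Index of the shadow page that holds the shadow byte for `addr`. */
uint64_t shadow_page_of(uint64_t addr)
{
    return addr / SHADOW_PAGE_SPAN;
}

/* True if the shadow bytes of both addresses live in the same shadow page. */
bool share_shadow_page(uint64_t a, uint64_t b)
{
    return shadow_page_of(a) == shadow_page_of(b);
}

/* Shadow bytes needed for a mapping of `size` bytes, rounded up. */
uint64_t shadow_bytes_needed(uint64_t size)
{
    return (size + SHADOW_SCALE_SIZE - 1) / SHADOW_SCALE_SIZE;
}
```

Under these assumptions a one-page mapping needs only 512 shadow bytes, and
two one-page mappings at ``0x10000`` and ``0x11000`` fall within the same
32768-byte span, so their shadow bytes can be backed by a single shared page.
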
We hook into the vmap infrastructure to lazily clean up unused shadow
memory.

To avoid the difficulties of swapping mappings around, we expect
that the part of the shadow region that covers the vmalloc space will
not be covered by the early shadow page, but will be left
unmapped. This will require changes in arch-specific code.

This allows ``VMAP_STACK`` support on x86, and can simplify support of
architectures that do not have a fixed module region.