.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2020, Google LLC.

Kernel Electric-Fence (KFENCE)
==============================

Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
invalid-free errors.

KFENCE is designed to be enabled in production kernels, and has near zero
performance overhead. Compared to KASAN, KFENCE trades precision for
performance. The main motivation behind KFENCE's design is that with enough
total uptime KFENCE will detect bugs in code paths not typically exercised by
non-production test workloads. One way to quickly achieve a large enough total
uptime is to deploy the tool across a large fleet of machines.

Usage
-----

To enable KFENCE, configure the kernel with::

    CONFIG_KFENCE=y

To build a kernel with KFENCE support, but disabled by default (to enable, set
``kfence.sample_interval`` to a non-zero value), configure the kernel with::

    CONFIG_KFENCE=y
    CONFIG_KFENCE_SAMPLE_INTERVAL=0

KFENCE provides several other configuration options to customize behaviour (see
the respective help text in ``lib/Kconfig.kfence`` for more info).

Tuning performance
~~~~~~~~~~~~~~~~~~

The most important parameter is KFENCE's sample interval, which can be set via
the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
sample interval determines the frequency with which heap allocations will be
guarded by KFENCE. The default is configurable via the Kconfig option
``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
disables KFENCE.

The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
further KFENCE allocations occur. The number of available guarded objects is
controlled by ``CONFIG_KFENCE_NUM_OBJECTS`` (default 255). Each object
requires 2 pages, one for the object itself and the other one used as a guard
page; object pages are interleaved with guard pages, and every object page is
therefore surrounded by two guard pages.

The total memory dedicated to the KFENCE memory pool can be computed as::

    ( #objects + 1 ) * 2 * PAGE_SIZE

Using the default config, and assuming a page size of 4 KiB, results in
dedicating 2 MiB to the KFENCE memory pool.
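
As a quick check of the formula above, the following C sketch computes the
pool size in bytes. The helper name ``kfence_pool_bytes`` is made up for this
illustration and is not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative helper (hypothetical, not a kernel API): bytes reserved for
 * the KFENCE pool, following (#objects + 1) * 2 * PAGE_SIZE from the text.
 */
static size_t kfence_pool_bytes(size_t num_objects, size_t page_size)
{
	assert(page_size > 0);
	return (num_objects + 1) * 2 * page_size;
}
```

With the default of 255 objects and 4 KiB pages, the helper returns
2097152 bytes, which is the 2 MiB stated above.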

Note: On architectures that support huge pages, KFENCE will ensure that the
pool is using pages of size ``PAGE_SIZE``. This will result in additional page
tables being allocated.

Error reports
~~~~~~~~~~~~~

A typical out-of-bounds access looks like this::

    ==================================================================
    BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa3/0x22b

    Out-of-bounds read at 0xffffffffb672efff (1B left of kfence-#17):
     test_out_of_bounds_read+0xa3/0x22b
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, size=32, cache=kmalloc-32] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_out_of_bounds_read+0x98/0x22b
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

The header of the report provides a short summary of the function involved in
the access. It is followed by more detailed information about the access and
its origin. Note that real kernel addresses are only shown for
``CONFIG_DEBUG_KERNEL=y`` builds.
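
To make the "1B left of kfence-#17" wording concrete, the following C sketch
relates a faulting address to the object bounds printed in the report. The
type and function names are hypothetical, and the distance convention for the
right-hand side (bytes past the last object byte) is an assumption of this
sketch rather than a statement about how the kernel computes it.

```c
#include <stdint.h>

enum oob_side { OOB_LEFT, OOB_INSIDE, OOB_RIGHT };

struct oob_info {
	enum oob_side side;
	uint64_t distance; /* bytes outside the object, 0 if inside */
};

/* Hypothetical helper: locate addr relative to [obj_start, obj_start + obj_size). */
static struct oob_info locate_access(uint64_t addr, uint64_t obj_start,
				     uint64_t obj_size)
{
	struct oob_info info = { OOB_INSIDE, 0 };

	if (addr < obj_start) {
		info.side = OOB_LEFT;
		info.distance = obj_start - addr;
	} else if (addr - obj_start >= obj_size) {
		info.side = OOB_RIGHT;
		/* Assumption: count bytes past the last valid object byte. */
		info.distance = addr - (obj_start + obj_size - 1);
	}
	return info;
}
```

For the report above, the access at ``0xffffffffb672efff`` lies one byte
before the object start ``0xffffffffb672f000``, so the sketch classifies it as
1 byte to the left.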

Use-after-free accesses are reported as::

    ==================================================================
    BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143

    Use-after-free read at 0xffffffffb673dfe0 (in kfence-#24):
     test_use_after_free_read+0xb3/0x143
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#24 [0xffffffffb673dfe0-0xffffffffb673dfff, size=32, cache=kmalloc-32] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_use_after_free_read+0x76/0x143
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    freed by task 507:
     test_use_after_free_read+0xa8/0x143
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 109 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

KFENCE also reports on invalid frees, such as double-frees::

    ==================================================================
    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171

    Invalid free of 0xffffffffb6741000:
     test_double_free+0xdc/0x171
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#26 [0xffffffffb6741000-0xffffffffb674101f, size=32, cache=kmalloc-32] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_double_free+0x76/0x171
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    freed by task 507:
     test_double_free+0xa8/0x171
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 111 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

KFENCE also uses pattern-based redzones on the other side of an object's guard
page, to detect out-of-bounds writes on the unprotected side of the object.
These are reported on frees::

    ==================================================================
    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184

    Corrupted memory at 0xffffffffb6797ff9 [ 0xac . . . . . . ] (in kfence-#69):
     test_kmalloc_aligned_oob_write+0xef/0x184
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    kfence-#69 [0xffffffffb6797fb0-0xffffffffb6797ff8, size=73, cache=kmalloc-96] allocated by task 507:
     test_alloc+0xf3/0x25b
     test_kmalloc_aligned_oob_write+0x57/0x184
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 120 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

For such errors, the address where the corruption occurred as well as the
invalidly written bytes (offset from the address) are shown; in this
representation, '.' denotes an untouched byte. In the example above ``0xac`` is
the value written to the invalid address at offset 0, and the remaining '.'
denote that no following bytes have been touched. Note that real values are
only shown for ``CONFIG_DEBUG_KERNEL=y`` builds; to avoid information
disclosure for non-debug builds, '!' is used instead to denote invalidly
written bytes.
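
The byte representation described above can be sketched in userspace C as
follows. This is an illustration only, not the kernel's reporting code: the
function names and the ``show_values`` flag (standing in for a
``CONFIG_DEBUG_KERNEL=y`` build) are hypothetical, and the expected redzone
pattern is passed in by the caller rather than assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append s to out if it fits (including the terminator); return new length. */
static size_t append(char *out, size_t out_len, size_t used, const char *s)
{
	size_t len = strlen(s);

	if (used + len < out_len) {
		memcpy(out + used, s, len + 1);
		return used + len;
	}
	return used; /* text that does not fit is dropped in this sketch */
}

/*
 * Hypothetical formatter: '.' for bytes that still match the expected
 * pattern, the written value if show_values is set, '!' otherwise.
 */
static void format_corruption(char *out, size_t out_len,
			      const unsigned char *actual,
			      const unsigned char *expected,
			      size_t n, bool show_values)
{
	size_t used = 0;
	char tok[16];

	assert(out_len > 0);
	out[0] = '\0';
	used = append(out, out_len, used, "[");
	for (size_t i = 0; i < n; i++) {
		if (actual[i] == expected[i])
			snprintf(tok, sizeof(tok), " .");
		else if (show_values)
			snprintf(tok, sizeof(tok), " 0x%02x", (unsigned int)actual[i]);
		else
			snprintf(tok, sizeof(tok), " !");
		used = append(out, out_len, used, tok);
	}
	append(out, out_len, used, " ]");
}
```

For seven redzone bytes of which only the first was overwritten with
``0xac``, the sketch produces ``[ 0xac . . . . . . ]`` when values are shown
and ``[ ! . . . . . . ]`` when they are hidden.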

And finally, KFENCE may also report on invalid accesses to any protected page
where it was not possible to determine an associated object, e.g. if adjacent
object pages had not yet been allocated::

    ==================================================================
    BUG: KFENCE: invalid read in test_invalid_access+0x26/0xe0

    Invalid read at 0xffffffffb670b00a:
     test_invalid_access+0x26/0xe0
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G        W         5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

DebugFS interface
~~~~~~~~~~~~~~~~~

Some debugging information is exposed via debugfs:

* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.

* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
  allocated via KFENCE, including those already freed but protected.

Implementation Details
----------------------

Guarded allocations are set up based on the sample interval. After expiration
of the sample interval, the next allocation through the main allocator (SLAB or
SLUB) returns a guarded allocation from the KFENCE object pool (allocation
sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and
the next allocation is set up after the expiration of the interval. To "gate" a
KFENCE allocation through the main allocator's fast-path without overhead,
KFENCE relies on static branches via the static keys infrastructure. The static
branch is toggled to redirect the allocation to KFENCE.
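
The gating behaviour described above can be modelled in userspace C as
follows. This is a simplified sketch under stated assumptions: a plain boolean
stands in for the static branch, the timer is represented by an explicit
function call, and all names are hypothetical rather than taken from the
kernel sources.

```c
#include <stdbool.h>

struct sample_gate {
	bool armed;            /* stands in for the toggled static branch */
	unsigned long guarded; /* allocations redirected to the guarded pool */
	unsigned long regular; /* allocations served by the main allocator */
};

/* Models expiry of the sample interval: arm the gate for one allocation. */
static void gate_timer_expired(struct sample_gate *g)
{
	g->armed = true;
}

/*
 * Called on every allocation. Returns true if this allocation should be
 * served from the guarded pool; the gate then stays closed until the next
 * simulated timer expiry.
 */
static bool gate_should_guard(struct sample_gate *g)
{
	if (!g->armed) {
		g->regular++;
		return false;
	}
	g->armed = false;
	g->guarded++;
	return true;
}
```

In this model, only the first allocation after each expiry is guarded; every
other allocation takes the regular path and pays only for the flag check.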

KFENCE objects each reside on a dedicated page, at either the left or right
page boundaries selected at random. The pages to the left and right of the
object page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access. Such page faults are then
intercepted by KFENCE, which handles the fault gracefully by reporting an
out-of-bounds access, and marking the page as accessible so that the faulting
code can (wrongly) continue executing (set ``panic_on_warn`` to panic instead).

To detect out-of-bounds writes to memory within the object's page itself,
KFENCE also uses pattern-based redzones. For each object page, a redzone is set
up for all non-object memory. For typical alignments, the redzone is only
required on the unguarded side of an object. Because KFENCE must honor the
cache's requested alignment, special alignments may result in unprotected gaps
on either side of an object, all of which are redzoned.

The following figure illustrates the page layout::

    ---+-----------+-----------+-----------+-----------+-----------+---
       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
       | x GUARD x | J : RED-  | x GUARD x | RED-  : J | x GUARD x |
       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
    ---+-----------+-----------+-----------+-----------+-----------+---
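
The placement and redzone scheme shown in the figure can be sketched in
userspace C as follows. This is not the kernel implementation: the 4 KiB page
size, the single fixed redzone byte, and all names are assumptions made for
the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE    4096u
#define SKETCH_REDZONE_BYTE 0xaa /* hypothetical pattern for this sketch */

/* Offset of the object within its page, honoring a power-of-two alignment. */
static size_t place_object(size_t size, size_t align, bool right)
{
	assert(size > 0 && size <= SKETCH_PAGE_SIZE);
	assert(align > 0 && (align & (align - 1)) == 0);
	if (!right)
		return 0;
	/* Push the object towards the right boundary, then align downwards. */
	return (SKETCH_PAGE_SIZE - size) & ~(align - 1);
}

/* Fill all non-object memory in the page with the redzone pattern. */
static void fill_redzone(unsigned char *page, size_t off, size_t size)
{
	memset(page, SKETCH_REDZONE_BYTE, off);
	memset(page + off + size, SKETCH_REDZONE_BYTE,
	       SKETCH_PAGE_SIZE - off - size);
}

/* Offset of the first corrupted redzone byte, or SKETCH_PAGE_SIZE if intact. */
static size_t check_redzone(const unsigned char *page, size_t off, size_t size)
{
	for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
		if (i >= off && i < off + size)
			continue; /* object bytes are not part of the redzone */
		if (page[i] != SKETCH_REDZONE_BYTE)
			return i;
	}
	return SKETCH_PAGE_SIZE;
}
```

Under these assumptions, a 73-byte object with 8-byte alignment placed at the
right boundary starts at page offset ``0xfb0`` and leaves a 7-byte gap before
the guard page; a stray write into that gap is only noticed when the redzone
is checked, which is why such errors are reported on frees.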

Upon deallocation of a KFENCE object, the object's page is again protected and
the object is marked as freed. Any further access to the object causes a fault
and KFENCE reports a use-after-free access. Freed objects are inserted at the
tail of KFENCE's freelist, so that the least recently freed objects are reused
first, and the chances of detecting use-after-frees of recently freed objects
are increased.
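
The reuse order described above amounts to a FIFO queue. The following C
sketch models it with a small ring buffer of object indices; the pool size of
4 and all names are hypothetical, chosen only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NUM_OBJECTS 4 /* tiny pool, for illustration only */

struct freelist {
	int slots[SKETCH_NUM_OBJECTS];
	size_t head;  /* position of the next object to hand out */
	size_t count; /* number of free objects currently queued */
};

/* Start with every object free, in index order. */
static void freelist_init(struct freelist *fl)
{
	for (size_t i = 0; i < SKETCH_NUM_OBJECTS; i++)
		fl->slots[i] = (int)i;
	fl->head = 0;
	fl->count = SKETCH_NUM_OBJECTS;
}

/* Take the least recently freed object from the head; -1 if exhausted. */
static int freelist_take(struct freelist *fl)
{
	int obj;

	if (fl->count == 0)
		return -1;
	obj = fl->slots[fl->head];
	fl->head = (fl->head + 1) % SKETCH_NUM_OBJECTS;
	fl->count--;
	return obj;
}

/* Insert a freed object at the tail so that it is reused last. */
static void freelist_put(struct freelist *fl, int obj)
{
	size_t tail;

	assert(fl->count < SKETCH_NUM_OBJECTS);
	tail = (fl->head + fl->count) % SKETCH_NUM_OBJECTS;
	fl->slots[tail] = obj;
	fl->count++;
}
```

In this model, an object that has just been freed is handed out again only
after all objects queued before it, so its page stays protected for as long as
the pool allows.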

Interface
---------

The following describes the functions which are used by allocators as well as
page handling code to set up and deal with KFENCE allocations.

.. kernel-doc:: include/linux/kfence.h
   :functions: is_kfence_address
               kfence_shutdown_cache
               kfence_alloc kfence_free __kfence_free
               kfence_ksize kfence_object_start
               kfence_handle_page_fault

Related Tools
-------------

In userspace, a similar approach is taken by `GWP-ASan
<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another
similar but non-sampling approach, which also inspired the name "KFENCE", can
be found in the userspace `Electric Fence Malloc Debugger
<https://linux.die.net/man/3/efence>`_.

In the kernel, several tools exist to debug memory access errors, and in
particular KASAN can detect all bug classes that KFENCE can detect. While KASAN
is more precise, relying on compiler instrumentation, this comes at a
performance cost.

It is worth highlighting that KASAN and KFENCE are complementary, with
different target environments. For instance, KASAN is the better debugging aid
where test cases or reproducers exist: because KFENCE has a lower chance of
detecting the error, debugging with it would require more effort. Deployments
at scale that cannot afford to enable KASAN, however, would benefit from using
KFENCE to discover bugs due to code paths not exercised by test cases or
fuzzers.