.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2020, Google LLC.

Kernel Electric-Fence (KFENCE)
==============================

Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
invalid-free errors.

KFENCE is designed to be enabled in production kernels, and has near-zero
performance overhead. Compared to KASAN, KFENCE trades performance for
precision. The main motivation behind KFENCE's design is that with enough
total uptime KFENCE will detect bugs in code paths not typically exercised by
non-production test workloads. One way to quickly achieve a large enough total
uptime is to deploy the tool across a large fleet of machines.

Usage
-----

To enable KFENCE, configure the kernel with::

    CONFIG_KFENCE=y

To build a kernel with KFENCE support, but disabled by default (to enable, set
``kfence.sample_interval`` to a non-zero value), configure the kernel with::

    CONFIG_KFENCE=y
    CONFIG_KFENCE_SAMPLE_INTERVAL=0

KFENCE provides several other configuration options to customize behaviour (see
the respective help text in ``lib/Kconfig.kfence`` for more info).

Tuning performance
~~~~~~~~~~~~~~~~~~

The most important parameter is KFENCE's sample interval, which can be set via
the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
sample interval determines the frequency with which heap allocations will be
guarded by KFENCE. The default is configurable via the Kconfig option
``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
disables KFENCE.

The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
further KFENCE allocations occur. The number of available guarded objects is
controlled by ``CONFIG_KFENCE_NUM_OBJECTS`` (default 255). Each object
requires 2 pages, one for the object itself and the other one used as a guard
page; object pages are interleaved with guard pages, and every object page is
therefore surrounded by two guard pages.

The total memory dedicated to the KFENCE memory pool can be computed as::

    ( #objects + 1 ) * 2 * PAGE_SIZE

Using the default config, and assuming a page size of 4 KiB, this results in
dedicating (255 + 1) * 2 * 4 KiB = 2 MiB to the KFENCE memory pool.

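
The formula can be checked with a small userspace C sketch. The helpers
``kfence_pool_size()`` and ``kfence_page_to_object()`` are hypothetical and
not part of the kernel API. The page classification additionally assumes a
layout consistent with the ``+ 1`` term: the pool starts with one extra pair
of guard pages, object *i* occupies page ``2 * (i + 1)``, and every
odd-numbered page is a guard page. The actual kernel layout may differ in
detail.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /* Total pool size in bytes: ( #objects + 1 ) * 2 * PAGE_SIZE. */
    static size_t kfence_pool_size(size_t num_objects, size_t page_size)
    {
        return (num_objects + 1) * 2 * page_size;
    }

    /*
     * Assumed (hypothetical) layout: pages 0 and 1 are guard pages, object i
     * lives on page 2 * (i + 1), and every odd page is a guard page.
     * Returns the object index for an object page, or -1 for a guard page.
     */
    static long kfence_page_to_object(size_t page, size_t num_objects)
    {
        assert(page < 2 * (num_objects + 1)); /* page must lie in the pool */
        if (page < 2 || page % 2 == 1)
            return -1;                        /* guard page */
        return (long)(page / 2) - 1;          /* object page */
    }

With the default of 255 objects and 4 KiB pages,
``kfence_pool_size(255, 4096)`` returns 2097152 bytes, i.e. 2 MiB. Under the
assumed layout, the last object (index 254) sits on page 510 and is followed
by guard page 511.
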
Note: On architectures that support huge pages, KFENCE will ensure that the
pool is using pages of size ``PAGE_SIZE``. This will result in additional page
tables being allocated.

Error reports
~~~~~~~~~~~~~

A typical out-of-bounds access looks like this::

    ==================================================================
    BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa6/0x234

    Out-of-bounds read at 0xffff8c3f2e291fff (1B left of kfence-#72):
     test_out_of_bounds_read+0xa6/0x234
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#72: 0xffff8c3f2e292000-0xffff8c3f2e29201f, size=32, cache=kmalloc-32

    allocated by task 484 on cpu 0 at 32.919330s:
     test_alloc+0xfe/0x738
     test_out_of_bounds_read+0x9b/0x234
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 0 PID: 484 Comm: kunit_try_catch Not tainted 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

The header of the report provides a short summary of the error type and the
function involved in the access. It is followed by more detailed information
about the access and its origin. Note that real kernel addresses are only
shown when using the kernel command line option ``no_hash_pointers``.

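
When reports are collected from many machines, they are typically grouped by
bug type and location. A minimal userspace sketch of extracting both from the
header line follows. It assumes the header has exactly the form shown in the
examples (``BUG: KFENCE: <type> in <location>``) and that any printk
timestamp prefix has already been stripped; ``parse_kfence_header()`` is a
hypothetical helper, not a kernel or tooling API.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /*
     * Split a KFENCE report header of the assumed form
     *   "BUG: KFENCE: <type> in <location>"
     * into <type> and <location>. Returns false if the line is not a KFENCE
     * header or if an output buffer is too small.
     */
    static bool parse_kfence_header(const char *line,
                                    char *type, size_t type_len,
                                    char *loc, size_t loc_len)
    {
        static const char prefix[] = "BUG: KFENCE: ";
        static const char sep[] = " in ";
        const size_t prefix_len = sizeof(prefix) - 1;

        assert(line != NULL && type != NULL && loc != NULL);
        if (strncmp(line, prefix, prefix_len) != 0)
            return false;

        const char *rest = line + prefix_len;
        const char *sep_pos = strstr(rest, sep);
        if (sep_pos == NULL)
            return false;

        size_t tlen = (size_t)(sep_pos - rest);
        const char *where = sep_pos + (sizeof(sep) - 1);
        size_t wlen = strcspn(where, "\r\n");   /* stop at end of line */

        if (tlen >= type_len || wlen >= loc_len)
            return false;

        memcpy(type, rest, tlen);
        type[tlen] = '\0';
        memcpy(loc, where, wlen);
        loc[wlen] = '\0';
        return true;
    }

For the out-of-bounds example above, this yields the type
``out-of-bounds read`` and the location ``test_out_of_bounds_read+0xa6/0x234``.
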
Use-after-free accesses are reported as::

    ==================================================================
    BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143

    Use-after-free read at 0xffff8c3f2e2a0000 (in kfence-#79):
     test_use_after_free_read+0xb3/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#79: 0xffff8c3f2e2a0000-0xffff8c3f2e2a001f, size=32, cache=kmalloc-32

    allocated by task 488 on cpu 2 at 33.871326s:
     test_alloc+0xfe/0x738
     test_use_after_free_read+0x76/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    freed by task 488 on cpu 2 at 33.871358s:
     test_use_after_free_read+0xa8/0x143
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 2 PID: 488 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

KFENCE also reports on invalid frees, such as double-frees::

    ==================================================================
    BUG: KFENCE: invalid free in test_double_free+0xdc/0x171

    Invalid free of 0xffff8c3f2e2a4000 (in kfence-#81):
     test_double_free+0xdc/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#81: 0xffff8c3f2e2a4000-0xffff8c3f2e2a401f, size=32, cache=kmalloc-32

    allocated by task 490 on cpu 1 at 34.175321s:
     test_alloc+0xfe/0x738
     test_double_free+0x76/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    freed by task 490 on cpu 1 at 34.175348s:
     test_double_free+0xa8/0x171
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 1 PID: 490 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

KFENCE also uses pattern-based redzones on the side of an object opposite its
guard page, to detect out-of-bounds writes on the unprotected side of the
object. Such corruptions are reported when the object is freed::

    ==================================================================
    BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184

    Corrupted memory at 0xffff8c3f2e33aff9 [ 0xac . . . . . . ] (in kfence-#156):
     test_kmalloc_aligned_oob_write+0xef/0x184
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    kfence-#156: 0xffff8c3f2e33afb0-0xffff8c3f2e33aff8, size=73, cache=kmalloc-96

    allocated by task 502 on cpu 7 at 42.159302s:
     test_alloc+0xfe/0x738
     test_kmalloc_aligned_oob_write+0x57/0x184
     kunit_try_run_case+0x61/0xa0
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x176/0x1b0
     ret_from_fork+0x22/0x30

    CPU: 7 PID: 502 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
    ==================================================================

For such errors, the report shows the address where the corruption occurred
as well as the invalidly written bytes, as offsets from that address; in this
representation, '.' denotes an untouched byte. In the example above, ``0xac``
is the value written to the invalid address at offset 0, and the remaining '.'
characters denote that the following bytes have not been touched. Note that
real values are only shown if the kernel was booted with ``no_hash_pointers``;
otherwise, to avoid information disclosure, '!' is used instead to denote
invalidly written bytes.

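
The byte display can be modelled by comparing the bytes found next to the
object with the expected redzone pattern. The following userspace C sketch is
illustrative only: ``format_corruption()`` is a hypothetical helper, the
expected pattern is supplied by the caller (no particular kernel canary
pattern is assumed), and the kernel's own formatting code may differ in
detail.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /*
     * Render n redzone bytes as "[ 0xac . . ]": a byte that differs from the
     * expected pattern is shown as its hex value (or '!' if show_values is
     * false), an untouched byte as '.'. Returns the number of characters
     * written (excluding the NUL), or -1 if out is too small.
     */
    static int format_corruption(const uint8_t *actual, const uint8_t *expected,
                                 size_t n, bool show_values,
                                 char *out, size_t out_len)
    {
        assert(out != NULL && out_len > 0);
        size_t pos = 0;
        int w = snprintf(out, out_len, "[");
        if (w < 0 || (size_t)w >= out_len)
            return -1;
        pos += (size_t)w;

        for (size_t i = 0; i < n; i++) {
            if (actual[i] == expected[i])
                w = snprintf(out + pos, out_len - pos, " .");
            else if (show_values)
                w = snprintf(out + pos, out_len - pos, " 0x%02x",
                             (unsigned int)actual[i]);
            else
                w = snprintf(out + pos, out_len - pos, " !");
            if (w < 0 || (size_t)w >= out_len - pos)
                return -1;
            pos += (size_t)w;
        }

        w = snprintf(out + pos, out_len - pos, " ]");
        if (w < 0 || (size_t)w >= out_len - pos)
            return -1;
        pos += (size_t)w;
        return (int)pos;
    }

Passing seven expected redzone bytes and actual bytes that differ only at
offset 0 (value ``0xac``) yields ``[ 0xac . . . . . . ]``, as in the report
above, or ``[ ! . . . . . . ]`` when values are hidden.
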
Finally, KFENCE may also report invalid accesses to any protected page where
it was not possible to determine an associated object, e.g. if adjacent
object pages had not yet been allocated::

    ==================================================================
    BUG: KFENCE: invalid read in test_invalid_access+0x26/0xe0

    Invalid read at 0xffffffffb670b00a:
     test_invalid_access+0x26/0xe0
     kunit_try_run_case+0x51/0x85
     kunit_generic_run_threadfn_adapter+0x16/0x30
     kthread+0x137/0x160
     ret_from_fork+0x22/0x30

    CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
    ==================================================================

DebugFS interface
~~~~~~~~~~~~~~~~~

Some debugging information is exposed via debugfs:

* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.

* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
  allocated via KFENCE, including those already freed but protected.

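
A monitoring agent might read the statistics file line by line. The sketch
below assumes, without relying on any particular field names, that each line
of ``stats`` has the form ``<name>: <value>`` with an integer value; check the
file on the running kernel before depending on this. ``parse_stat_line()`` is
a hypothetical helper.

.. code-block:: c

    #include <assert.h>
    #include <errno.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>

    /*
     * Parse one line of the assumed form "<name>: <value>". The name is taken
     * up to the last ':' so that names containing spaces or parentheses
     * still work. Returns false for lines that do not match.
     */
    static bool parse_stat_line(const char *line, char *name, size_t name_len,
                                long *value)
    {
        assert(line != NULL && name != NULL && value != NULL);

        const char *colon = strrchr(line, ':');
        if (colon == NULL)
            return false;

        size_t nlen = (size_t)(colon - line);
        if (nlen == 0 || nlen >= name_len)
            return false;

        char *end;
        errno = 0;
        long v = strtol(colon + 1, &end, 10);   /* skips leading whitespace */
        if (errno != 0 || end == colon + 1)
            return false;                        /* overflow or no digits */
        while (*end == ' ' || *end == '\n')
            end++;
        if (*end != '\0')
            return false;                        /* trailing garbage */

        memcpy(name, line, nlen);
        name[nlen] = '\0';
        *value = v;
        return true;
    }

In practice, each line would come from ``fgets()`` on
``/sys/kernel/debug/kfence/stats``; debugfs must be mounted, and reading it
typically requires root privileges.
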
Implementation Details
----------------------

Guarded allocations are set up based on the sample interval. After expiration
of the sample interval, the next allocation through the main allocator (SLAB or
SLUB) returns a guarded allocation from the KFENCE object pool (allocation
sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and
the next allocation is set up after the expiration of the interval.

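
The sampling policy can be modelled in a few lines of C. This is a
simplified, single-threaded sketch that assumes a millisecond clock supplied
by the caller; the kernel instead uses a timer-driven allocation gate that has
to work with concurrent allocators on many CPUs. ``struct sampler`` and its
functions are hypothetical.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    struct sampler {
        uint64_t interval_ms;  /* 0 disables sampling (kfence.sample_interval=0) */
        uint64_t deadline_ms;  /* earliest time an allocation may be guarded */
    };

    static void sampler_init(struct sampler *s, uint64_t interval_ms,
                             uint64_t now_ms)
    {
        assert(s != NULL);
        s->interval_ms = interval_ms;
        s->deadline_ms = now_ms + interval_ms;
    }

    /*
     * Called on every allocation. Returns true if this allocation should be
     * served from the guarded pool. Once the interval has expired, only the
     * next allocation is guarded, and the timer restarts from that moment.
     */
    static bool sampler_should_guard(struct sampler *s, uint64_t now_ms)
    {
        if (s->interval_ms == 0)
            return false;
        if (now_ms < s->deadline_ms)
            return false;
        s->deadline_ms = now_ms + s->interval_ms;
        return true;
    }

Starting at t = 0 ms with a 100 ms interval, an allocation at 50 ms is not
guarded, the first allocation at or after 100 ms (say at 130 ms) is, and the
next guarded allocation can happen no earlier than 230 ms, because the
interval restarts from the guarded allocation. An interval of 0 never guards.
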
When using ``CONFIG_KFENCE_STATIC_KEYS=y``, KFENCE allocations are "gated"
through the main allocator's fast-path by relying on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE. Depending on the sample interval, target workloads, and
system architecture, this may perform better than the simple dynamic branch.
Careful benchmarking is recommended.

KFENCE objects each reside on a dedicated page, at either the left or the
right page boundary, selected at random. The pages to the left and right of
the object page are "guard pages", whose attributes are changed to a protected
state so that any attempted access causes a page fault. Such page faults are
then intercepted by KFENCE, which handles the fault gracefully by reporting an
out-of-bounds access, and marking the page as accessible so that the faulting
code can (wrongly) continue executing (set ``panic_on_warn`` to panic instead).

To detect out-of-bounds writes to memory within the object's page itself,
KFENCE also uses pattern-based redzones. For each object page, a redzone is set
up for all non-object memory. For typical alignments, the redzone is only
required on the unguarded side of an object. Because KFENCE must honor the
cache's requested alignment, special alignments may result in unprotected gaps
on either side of an object, all of which are redzoned.

The following figure illustrates the page layout::

    ---+-----------+-----------+-----------+-----------+-----------+---
       | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
       | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
       | x GUARD x | J : RED-  | x GUARD x |  RED- : J | x GUARD x |
       | xxxxxxxxx | E :  ZONE | xxxxxxxxx |  ZONE : E | xxxxxxxxx |
       | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
       | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
    ---+-----------+-----------+-----------+-----------+-----------+---

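
The placement arithmetic can be sketched in C, assuming that a right-placed
object is moved as close to the end of its page as the cache's alignment
permits, and that all remaining bytes of the page form the redzone. The caller
chooses ``right`` here (KFENCE picks the side at random); ``place_object()``,
``struct placement`` and ``SKETCH_PAGE_SIZE`` are hypothetical names, and a
4 KiB page is assumed.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

    struct placement {
        uint64_t addr;         /* start address of the object */
        size_t left_redzone;   /* bytes between page start and object start */
        size_t right_redzone;  /* bytes between object end and page end */
    };

    /*
     * Place an object of `size` bytes on the page starting at `page`, either
     * at the left page boundary or as close to the right boundary as `align`
     * (a power of two) permits. All other bytes on the page are redzone.
     */
    static struct placement place_object(uint64_t page, size_t size,
                                         size_t align, bool right)
    {
        assert(size > 0 && size <= SKETCH_PAGE_SIZE);
        assert(align != 0 && (align & (align - 1)) == 0);

        struct placement p;
        p.addr = page;
        if (right) {
            p.addr += SKETCH_PAGE_SIZE - size;
            p.addr &= ~(uint64_t)(align - 1);  /* align down */
        }
        p.left_redzone = (size_t)(p.addr - page);
        p.right_redzone = (size_t)(page + SKETCH_PAGE_SIZE - (p.addr + size));
        return p;
    }

Plugging in the memory-corruption report above (a 73-byte object from
``kmalloc-96`` on the page at ``0xffff8c3f2e33a000``, assuming 8-byte
alignment) gives an object start of ``0xffff8c3f2e33afb0``, as in that
report, with a 4016-byte redzone on the left and a 7-byte redzone on the
right. The latter is consistent with the seven bytes shown in
``[ 0xac . . . . . . ]``.
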
Upon deallocation of a KFENCE object, the object's page is again protected and
the object is marked as freed. Any further access to the object causes a fault
and KFENCE reports a use-after-free access. Freed objects are inserted at the
tail of KFENCE's freelist, so that the least recently freed objects are reused
first, and the chances of detecting use-after-frees of recently freed objects
are increased.

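
This reuse order is that of a FIFO queue: free objects are taken from the
head, and freed objects are appended at the tail. The fixed-capacity ring
buffer below is a minimal sketch of that policy only; ``struct freelist``,
``MAX_OBJECTS`` and the functions are hypothetical, and the kernel keeps its
own list with proper locking.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define MAX_OBJECTS 8

    struct freelist {
        size_t items[MAX_OBJECTS];  /* object indices, in reuse order */
        size_t head;                /* position of the next object to hand out */
        size_t count;               /* number of free objects */
    };

    static void freelist_init(struct freelist *fl, size_t n)
    {
        assert(n <= MAX_OBJECTS);
        for (size_t i = 0; i < n; i++)
            fl->items[i] = i;
        fl->head = 0;
        fl->count = n;
    }

    /* Take the least recently freed object from the head. */
    static bool freelist_pop(struct freelist *fl, size_t *obj)
    {
        if (fl->count == 0)
            return false;  /* pool exhausted: no guarded allocation */
        *obj = fl->items[fl->head];
        fl->head = (fl->head + 1) % MAX_OBJECTS;
        fl->count--;
        return true;
    }

    /* Append a freed object at the tail, so it is reused as late as possible. */
    static void freelist_push(struct freelist *fl, size_t obj)
    {
        assert(fl->count < MAX_OBJECTS);
        fl->items[(fl->head + fl->count) % MAX_OBJECTS] = obj;
        fl->count++;
    }

After handing out objects 0 and 1 from a four-object pool and freeing object 0
again, the next allocations return 2 and 3 before object 0 is reused, which
keeps a freed object protected for as long as possible.
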
If pool utilization reaches 75% (default) or above, KFENCE limits currently
covered allocations of the same source from further filling up the pool. This
reduces the risk of the pool eventually being fully occupied by allocated
objects, while still ensuring diverse coverage of allocations. The "source" of
an allocation is based on its partial allocation stack trace. A side-effect is
that this also prevents frequent long-lived allocations (e.g. pagecache) of the
same source from filling up the pool permanently, which is the most common
risk for the pool becoming full and the sampled allocation rate dropping to
zero. The threshold at which to start limiting currently covered allocations
can be configured via the boot parameter ``kfence.skip_covered_thresh`` (pool
usage%).

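
One way to track which sources are currently covered is a small counting
Bloom filter keyed by a hash of the allocation stack: increment on a guarded
allocation, decrement on free, and skip an allocation when the pool is at or
above the threshold and the filter reports its source as present. This sketch
assumes that approach and a precomputed 32-bit stack hash; the table size, the
hash-to-slot mapping and all names are illustrative, and the kernel's
implementation may differ. A Bloom filter can report false positives
(occasionally skipping an uncovered source) but no false negatives.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define COVERED_SLOTS 64u  /* illustrative table size */

    struct covered {
        unsigned int count[COVERED_SLOTS];
    };

    /* Two slot indices derived from one stack hash (illustrative mapping). */
    static size_t slot_a(uint32_t h) { return h % COVERED_SLOTS; }
    static size_t slot_b(uint32_t h) { return (h >> 16) % COVERED_SLOTS; }

    static void covered_add(struct covered *c, uint32_t h)
    {
        c->count[slot_a(h)]++;
        c->count[slot_b(h)]++;
    }

    static void covered_remove(struct covered *c, uint32_t h)
    {
        assert(c->count[slot_a(h)] > 0 && c->count[slot_b(h)] > 0);
        c->count[slot_a(h)]--;
        c->count[slot_b(h)]--;
    }

    static bool covered_contains(const struct covered *c, uint32_t h)
    {
        return c->count[slot_a(h)] > 0 && c->count[slot_b(h)] > 0;
    }

    /*
     * Skip sampling this allocation if the pool is at least thresh_pct
     * percent full and an allocation from the same source is already there.
     */
    static bool should_skip(const struct covered *c, uint32_t h,
                            size_t used, size_t total, unsigned int thresh_pct)
    {
        if (used * 100 < total * thresh_pct)
            return false;
        return covered_contains(c, h);
    }

With a threshold of 75 and a 100-object pool, a source that already has a
guarded allocation is skipped at 80 objects in use but not at 50; once its
object is freed and ``covered_remove()`` is called, the source is eligible
again.
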
Interface
---------

The following describes the functions which are used by allocators as well as
page handling code to set up and deal with KFENCE allocations.

.. kernel-doc:: include/linux/kfence.h
   :functions: is_kfence_address
               kfence_shutdown_cache
               kfence_alloc kfence_free __kfence_free
               kfence_ksize kfence_object_start
               kfence_handle_page_fault

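
Because allocators need to tell KFENCE objects apart from their own, a pool
membership test such as ``is_kfence_address()`` should be cheap. A common way
to write such a range check is a single unsigned comparison, since an address
below the pool start wraps around to a very large value. The userspace sketch
below shows that technique with hypothetical names (``addr_in_pool``,
``pool_base``, ``pool_size``); it is not the kernel's implementation.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /*
     * True if addr lies in [pool_base, pool_base + pool_size). If addr is
     * below pool_base, the unsigned subtraction wraps to a huge value and the
     * single comparison fails, so no second bound check is needed. A zero
     * base means the pool was never set up.
     */
    static bool addr_in_pool(uintptr_t addr, uintptr_t pool_base,
                             size_t pool_size)
    {
        /* The pool itself must not wrap around the end of the address space. */
        assert(pool_size <= UINTPTR_MAX - pool_base);
        return pool_base != 0 && addr - pool_base < pool_size;
    }
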
Related Tools
-------------

In userspace, a similar approach is taken by `GWP-ASan
<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another
similar but non-sampling approach, which also inspired the name "KFENCE", can be
found in the userspace `Electric Fence Malloc Debugger
<https://linux.die.net/man/3/efence>`_.

In the kernel, several tools exist to debug memory access errors, and in
particular KASAN can detect all bug classes that KFENCE can detect. KASAN is
more precise because it relies on compiler instrumentation, but this comes at
a performance cost.

It is worth highlighting that KASAN and KFENCE are complementary, with
different target environments. For instance, KASAN is the better debugging aid
where test cases or reproducers exist: because KFENCE has a lower chance of
detecting any given error, debugging with KFENCE would require more effort.
Deployments at scale that cannot afford to enable KASAN, however, would benefit
from using KFENCE to discover bugs in code paths not exercised by test cases or
fuzzers.
321discover bugs due to code paths not exercised by test cases or fuzzers.