Marco Elver | bd0ccc4 | 2021-01-15 18:09:53 +0100 | [diff] [blame] | 1 | .. SPDX-License-Identifier: GPL-2.0 |
| 2 | .. Copyright (C) 2019, Google LLC. |
| 3 | |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 4 | The Kernel Concurrency Sanitizer (KCSAN) |
| 5 | ======================================== |
| 6 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 7 | The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector, which |
| 8 | relies on compile-time instrumentation, and uses a watchpoint-based sampling |
| 9 | approach to detect races. KCSAN's primary purpose is to detect `data races`_. |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 10 | |
| 11 | Usage |
| 12 | ----- |
| 13 | |
Marco Elver | e68dcd8 | 2020-06-18 11:31:16 +0200 | [diff] [blame] | 14 | KCSAN is supported by both GCC and Clang. With GCC we require version 11 or |
| 15 | later, and with Clang also require version 11 or later. |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 16 | |
| 17 | To enable KCSAN configure the kernel with:: |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 18 | |
| 19 | CONFIG_KCSAN = y |
| 20 | |
| 21 | KCSAN provides several other configuration options to customize behaviour (see |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 22 | the respective help text in ``lib/Kconfig.kcsan`` for more info). |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 23 | |
| 24 | Error reports |
| 25 | ~~~~~~~~~~~~~ |
| 26 | |
| 27 | A typical data race report looks like this:: |
| 28 | |
| 29 | ================================================================== |
Marco Elver | b930226 | 2021-04-14 13:28:25 +0200 | [diff] [blame] | 30 | BUG: KCSAN: data-race in test_kernel_read / test_kernel_write |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 31 | |
Marco Elver | b930226 | 2021-04-14 13:28:25 +0200 | [diff] [blame] | 32 | write to 0xffffffffc009a628 of 8 bytes by task 487 on cpu 0: |
| 33 | test_kernel_write+0x1d/0x30 |
| 34 | access_thread+0x89/0xd0 |
| 35 | kthread+0x23e/0x260 |
| 36 | ret_from_fork+0x22/0x30 |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 37 | |
Marco Elver | b930226 | 2021-04-14 13:28:25 +0200 | [diff] [blame] | 38 | read to 0xffffffffc009a628 of 8 bytes by task 488 on cpu 6: |
| 39 | test_kernel_read+0x10/0x20 |
| 40 | access_thread+0x89/0xd0 |
| 41 | kthread+0x23e/0x260 |
| 42 | ret_from_fork+0x22/0x30 |
| 43 | |
| 44 | value changed: 0x00000000000009a6 -> 0x00000000000009b2 |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 45 | |
| 46 | Reported by Kernel Concurrency Sanitizer on: |
Marco Elver | b930226 | 2021-04-14 13:28:25 +0200 | [diff] [blame] | 47 | CPU: 6 PID: 488 Comm: access_thread Not tainted 5.12.0-rc2+ #1 |
| 48 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 49 | ================================================================== |
| 50 | |
| 51 | The header of the report provides a short summary of the functions involved in |
| 52 | the race. It is followed by the access types and stack traces of the 2 threads |
Marco Elver | b930226 | 2021-04-14 13:28:25 +0200 | [diff] [blame] | 53 | involved in the data race. If KCSAN also observed a value change, the observed |
| 54 | old value and new value are shown on the "value changed" line respectively. |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 55 | |
| 56 | The other less common type of data race report looks like this:: |
| 57 | |
| 58 | ================================================================== |
Marco Elver | b930226 | 2021-04-14 13:28:25 +0200 | [diff] [blame] | 59 | BUG: KCSAN: data-race in test_kernel_rmw_array+0x71/0xd0 |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 60 | |
Marco Elver | b930226 | 2021-04-14 13:28:25 +0200 | [diff] [blame] | 61 | race at unknown origin, with read to 0xffffffffc009bdb0 of 8 bytes by task 515 on cpu 2: |
| 62 | test_kernel_rmw_array+0x71/0xd0 |
| 63 | access_thread+0x89/0xd0 |
| 64 | kthread+0x23e/0x260 |
| 65 | ret_from_fork+0x22/0x30 |
| 66 | |
| 67 | value changed: 0x0000000000002328 -> 0x0000000000002329 |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 68 | |
| 69 | Reported by Kernel Concurrency Sanitizer on: |
Marco Elver | b930226 | 2021-04-14 13:28:25 +0200 | [diff] [blame] | 70 | CPU: 2 PID: 515 Comm: access_thread Not tainted 5.12.0-rc2+ #1 |
| 71 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 72 | ================================================================== |
| 73 | |
| 74 | This report is generated where it was not possible to determine the other |
| 75 | racing thread, but a race was inferred due to the data value of the watched |
Marco Elver | b930226 | 2021-04-14 13:28:25 +0200 | [diff] [blame] | 76 | memory location having changed. These reports always show a "value changed" |
| 77 | line. A common reason for reports of this type are missing instrumentation in |
| 78 | the racing thread, but could also occur due to e.g. DMA accesses. Such reports |
| 79 | are shown only if ``CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN=y``, which is |
| 80 | enabled by default. |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 81 | |
| 82 | Selective analysis |
| 83 | ~~~~~~~~~~~~~~~~~~ |
| 84 | |
Marco Elver | 7161177 | 2019-12-12 01:07:08 +0100 | [diff] [blame] | 85 | It may be desirable to disable data race detection for specific accesses, |
| 86 | functions, compilation units, or entire subsystems. For static blacklisting, |
| 87 | the below options are available: |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 88 | |
Marco Elver | 7161177 | 2019-12-12 01:07:08 +0100 | [diff] [blame] | 89 | * KCSAN understands the ``data_race(expr)`` annotation, which tells KCSAN that |
| 90 | any data races due to accesses in ``expr`` should be ignored and resulting |
Paul E. McKenney | ea04846 | 2021-03-04 16:04:09 -0800 | [diff] [blame] | 91 | behaviour when encountering a data race is deemed safe. Please see |
Akira Yokosawa | 117232c | 2021-05-13 10:49:41 -0700 | [diff] [blame] | 92 | `"Marking Shared-Memory Accesses" in the LKMM`_ for more information. |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 93 | |
Marco Elver | 7161177 | 2019-12-12 01:07:08 +0100 | [diff] [blame] | 94 | * Disabling data race detection for entire functions can be accomplished by |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 95 | using the function attribute ``__no_kcsan``:: |
| 96 | |
| 97 | __no_kcsan |
| 98 | void foo(void) { |
| 99 | ... |
| 100 | |
| 101 | To dynamically limit for which functions to generate reports, see the |
| 102 | `DebugFS interface`_ blacklist/whitelist feature. |
| 103 | |
Marco Elver | 7161177 | 2019-12-12 01:07:08 +0100 | [diff] [blame] | 104 | * To disable data race detection for a particular compilation unit, add to the |
| 105 | ``Makefile``:: |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 106 | |
| 107 | KCSAN_SANITIZE_file.o := n |
| 108 | |
Marco Elver | 7161177 | 2019-12-12 01:07:08 +0100 | [diff] [blame] | 109 | * To disable data race detection for all compilation units listed in a |
| 110 | ``Makefile``, add to the respective ``Makefile``:: |
| 111 | |
| 112 | KCSAN_SANITIZE := n |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 113 | |
Akira Yokosawa | 117232c | 2021-05-13 10:49:41 -0700 | [diff] [blame] | 114 | .. _"Marking Shared-Memory Accesses" in the LKMM: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt |
| 115 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 116 | Furthermore, it is possible to tell KCSAN to show or hide entire classes of |
| 117 | data races, depending on preferences. These can be changed via the following |
| 118 | Kconfig options: |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 119 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 120 | * ``CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY``: If enabled and a conflicting write |
| 121 | is observed via a watchpoint, but the data value of the memory location was |
| 122 | observed to remain unchanged, do not report the data race. |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 123 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 124 | * ``CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC``: Assume that plain aligned writes |
| 125 | up to word size are atomic by default. Assumes that such writes are not |
| 126 | subject to unsafe compiler optimizations resulting in data races. The option |
| 127 | causes KCSAN to not report data races due to conflicts where the only plain |
| 128 | accesses are aligned writes up to word size. |
| 129 | |
Marco Elver | 49f72d5 | 2021-06-07 14:56:51 +0200 | [diff] [blame] | 130 | * ``CONFIG_KCSAN_PERMISSIVE``: Enable additional permissive rules to ignore |
| 131 | certain classes of common data races. Unlike the above, the rules are more |
| 132 | complex involving value-change patterns, access type, and address. This |
| 133 | option depends on ``CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=y``. For details |
| 134 | please see the ``kernel/kcsan/permissive.h``. Testers and maintainers that |
| 135 | only focus on reports from specific subsystems and not the whole kernel are |
| 136 | recommended to disable this option. |
| 137 | |
Marco Elver | e675d25 | 2021-06-07 14:56:49 +0200 | [diff] [blame] | 138 | To use the strictest possible rules, select ``CONFIG_KCSAN_STRICT=y``, which |
| 139 | configures KCSAN to follow the Linux-kernel memory consistency model (LKMM) as |
| 140 | closely as possible. |
| 141 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 142 | DebugFS interface |
| 143 | ~~~~~~~~~~~~~~~~~ |
| 144 | |
| 145 | The file ``/sys/kernel/debug/kcsan`` provides the following interface: |
| 146 | |
| 147 | * Reading ``/sys/kernel/debug/kcsan`` returns various runtime statistics. |
| 148 | |
| 149 | * Writing ``on`` or ``off`` to ``/sys/kernel/debug/kcsan`` allows turning KCSAN |
| 150 | on or off, respectively. |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 151 | |
| 152 | * Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds |
| 153 | ``some_func_name`` to the report filter list, which (by default) blacklists |
| 154 | reporting data races where either one of the top stackframes are a function |
| 155 | in the list. |
| 156 | |
| 157 | * Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan`` |
| 158 | changes the report filtering behaviour. For example, the blacklist feature |
| 159 | can be used to silence frequently occurring data races; the whitelist feature |
| 160 | can help with reproduction and testing of fixes. |
| 161 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 162 | Tuning performance |
| 163 | ~~~~~~~~~~~~~~~~~~ |
| 164 | |
| 165 | Core parameters that affect KCSAN's overall performance and bug detection |
| 166 | ability are exposed as kernel command-line arguments whose defaults can also be |
| 167 | changed via the corresponding Kconfig options. |
| 168 | |
| 169 | * ``kcsan.skip_watch`` (``CONFIG_KCSAN_SKIP_WATCH``): Number of per-CPU memory |
| 170 | operations to skip, before another watchpoint is set up. Setting up |
| 171 | watchpoints more frequently will result in the likelihood of races to be |
| 172 | observed to increase. This parameter has the most significant impact on |
| 173 | overall system performance and race detection ability. |
| 174 | |
| 175 | * ``kcsan.udelay_task`` (``CONFIG_KCSAN_UDELAY_TASK``): For tasks, the |
| 176 | microsecond delay to stall execution after a watchpoint has been set up. |
| 177 | Larger values result in the window in which we may observe a race to |
| 178 | increase. |
| 179 | |
| 180 | * ``kcsan.udelay_interrupt`` (``CONFIG_KCSAN_UDELAY_INTERRUPT``): For |
| 181 | interrupts, the microsecond delay to stall execution after a watchpoint has |
| 182 | been set up. Interrupts have tighter latency requirements, and their delay |
| 183 | should generally be smaller than the one chosen for tasks. |
| 184 | |
| 185 | They may be tweaked at runtime via ``/sys/module/kcsan/parameters/``. |
| 186 | |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 187 | Data Races |
| 188 | ---------- |
| 189 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 190 | In an execution, two memory accesses form a *data race* if they *conflict*, |
| 191 | they happen concurrently in different threads, and at least one of them is a |
| 192 | *plain access*; they *conflict* if both access the same memory location, and at |
| 193 | least one is a write. For a more thorough discussion and definition, see `"Plain |
| 194 | Accesses and Data Races" in the LKMM`_. |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 195 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 196 | .. _"Plain Accesses and Data Races" in the LKMM: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/explanation.txt#n1922 |
| 197 | |
| 198 | Relationship with the Linux-Kernel Memory Consistency Model (LKMM) |
| 199 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 200 | |
| 201 | The LKMM defines the propagation and ordering rules of various memory |
| 202 | operations, which gives developers the ability to reason about concurrent code. |
| 203 | Ultimately this allows to determine the possible executions of concurrent code, |
| 204 | and if that code is free from data races. |
| 205 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 206 | KCSAN is aware of *marked atomic operations* (``READ_ONCE``, ``WRITE_ONCE``, |
Marco Elver | 82eb691 | 2021-11-30 12:44:17 +0100 | [diff] [blame] | 207 | ``atomic_*``, etc.), and a subset of ordering guarantees implied by memory |
| 208 | barriers. With ``CONFIG_KCSAN_WEAK_MEMORY=y``, KCSAN models load or store |
| 209 | buffering, and can detect missing ``smp_mb()``, ``smp_wmb()``, ``smp_rmb()``, |
| 210 | ``smp_store_release()``, and all ``atomic_*`` operations with equivalent |
| 211 | implied barriers. |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 212 | |
Marco Elver | 82eb691 | 2021-11-30 12:44:17 +0100 | [diff] [blame] | 213 | Note, KCSAN will not report all data races due to missing memory ordering, |
| 214 | specifically where a memory barrier would be required to prohibit subsequent |
| 215 | memory operation from reordering before the barrier. Developers should |
| 216 | therefore carefully consider the required memory ordering requirements that |
| 217 | remain unchecked. |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 218 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 219 | Race Detection Beyond Data Races |
| 220 | -------------------------------- |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 221 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 222 | For code with complex concurrency design, race-condition bugs may not always |
| 223 | manifest as data races. Race conditions occur if concurrently executing |
| 224 | operations result in unexpected system behaviour. On the other hand, data races |
| 225 | are defined at the C-language level. The following macros can be used to check |
| 226 | properties of concurrent code where bugs would not manifest as data races. |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 227 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 228 | .. kernel-doc:: include/linux/kcsan-checks.h |
Marco Elver | d8949ef | 2020-03-25 17:41:58 +0100 | [diff] [blame] | 229 | :functions: ASSERT_EXCLUSIVE_WRITER ASSERT_EXCLUSIVE_WRITER_SCOPED |
| 230 | ASSERT_EXCLUSIVE_ACCESS ASSERT_EXCLUSIVE_ACCESS_SCOPED |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 231 | ASSERT_EXCLUSIVE_BITS |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 232 | |
| 233 | Implementation Details |
| 234 | ---------------------- |
| 235 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 236 | KCSAN relies on observing that two accesses happen concurrently. Crucially, we |
| 237 | want to (a) increase the chances of observing races (especially for races that |
| 238 | manifest rarely), and (b) be able to actually observe them. We can accomplish |
| 239 | (a) by injecting various delays, and (b) by using address watchpoints (or |
| 240 | breakpoints). |
| 241 | |
| 242 | If we deliberately stall a memory access, while we have a watchpoint for its |
| 243 | address set up, and then observe the watchpoint to fire, two accesses to the |
| 244 | same address just raced. Using hardware watchpoints, this is the approach taken |
| 245 | in `DataCollider |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 246 | <http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_. |
| 247 | Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 248 | relies on compiler instrumentation and "soft watchpoints". |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 249 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 250 | In KCSAN, watchpoints are implemented using an efficient encoding that stores |
| 251 | access type, size, and address in a long; the benefits of using "soft |
| 252 | watchpoints" are portability and greater flexibility. KCSAN then relies on the |
| 253 | compiler instrumenting plain accesses. For each instrumented plain access: |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 254 | |
| 255 | 1. Check if a matching watchpoint exists; if yes, and at least one access is a |
| 256 | write, then we encountered a racing access. |
| 257 | |
| 258 | 2. Periodically, if no matching watchpoint exists, set up a watchpoint and |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 259 | stall for a small randomized delay. |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 260 | |
| 261 | 3. Also check the data value before the delay, and re-check the data value |
| 262 | after delay; if the values mismatch, we infer a race of unknown origin. |
| 263 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 264 | To detect data races between plain and marked accesses, KCSAN also annotates |
| 265 | marked accesses, but only to check if a watchpoint exists; i.e. KCSAN never |
| 266 | sets up a watchpoint on marked accesses. By never setting up watchpoints for |
| 267 | marked operations, if all accesses to a variable that is accessed concurrently |
| 268 | are properly marked, KCSAN will never trigger a watchpoint and therefore never |
| 269 | report the accesses. |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 270 | |
Marco Elver | 82eb691 | 2021-11-30 12:44:17 +0100 | [diff] [blame] | 271 | Modeling Weak Memory |
| 272 | ~~~~~~~~~~~~~~~~~~~~ |
| 273 | |
| 274 | KCSAN's approach to detecting data races due to missing memory barriers is |
| 275 | based on modeling access reordering (with ``CONFIG_KCSAN_WEAK_MEMORY=y``). |
| 276 | Each plain memory access for which a watchpoint is set up, is also selected for |
| 277 | simulated reordering within the scope of its function (at most 1 in-flight |
| 278 | access). |
| 279 | |
| 280 | Once an access has been selected for reordering, it is checked along every |
| 281 | other access until the end of the function scope. If an appropriate memory |
| 282 | barrier is encountered, the access will no longer be considered for simulated |
| 283 | reordering. |
| 284 | |
| 285 | When the result of a memory operation should be ordered by a barrier, KCSAN can |
| 286 | then detect data races where the conflict only occurs as a result of a missing |
| 287 | barrier. Consider the example:: |
| 288 | |
| 289 | int x, flag; |
| 290 | void T1(void) |
| 291 | { |
| 292 | x = 1; // data race! |
| 293 | WRITE_ONCE(flag, 1); // correct: smp_store_release(&flag, 1) |
| 294 | } |
| 295 | void T2(void) |
| 296 | { |
| 297 | while (!READ_ONCE(flag)); // correct: smp_load_acquire(&flag) |
| 298 | ... = x; // data race! |
| 299 | } |
| 300 | |
| 301 | When weak memory modeling is enabled, KCSAN can consider ``x`` in ``T1`` for |
| 302 | simulated reordering. After the write of ``flag``, ``x`` is again checked for |
| 303 | concurrent accesses: because ``T2`` is able to proceed after the write of |
| 304 | ``flag``, a data race is detected. With the correct barriers in place, ``x`` |
| 305 | would not be considered for reordering after the proper release of ``flag``, |
| 306 | and no data race would be detected. |
| 307 | |
| 308 | Deliberate trade-offs in complexity but also practical limitations mean only a |
| 309 | subset of data races due to missing memory barriers can be detected. With |
| 310 | currently available compiler support, the implementation is limited to modeling |
| 311 | the effects of "buffering" (delaying accesses), since the runtime cannot |
| 312 | "prefetch" accesses. Also recall that watchpoints are only set up for plain |
| 313 | accesses, and the only access type for which KCSAN simulates reordering. This |
| 314 | means reordering of marked accesses is not modeled. |
| 315 | |
| 316 | A consequence of the above is that acquire operations do not require barrier |
| 317 | instrumentation (no prefetching). Furthermore, marked accesses introducing |
| 318 | address or control dependencies do not require special handling (the marked |
| 319 | access cannot be reordered, later dependent accesses cannot be prefetched). |
| 320 | |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 321 | Key Properties |
| 322 | ~~~~~~~~~~~~~~ |
| 323 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 324 | 1. **Memory Overhead:** The overall memory overhead is only a few MiB |
| 325 | depending on configuration. The current implementation uses a small array of |
| 326 | longs to encode watchpoint information, which is negligible. |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 327 | |
| 328 | 2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an |
| 329 | efficient watchpoint encoding that does not require acquiring any shared |
| 330 | locks in the fast-path. For kernel boot on a system with 8 CPUs: |
| 331 | |
| 332 | - 5.0x slow-down with the default KCSAN config; |
| 333 | - 2.8x slow-down from runtime fast-path overhead only (set very large |
| 334 | ``KCSAN_SKIP_WATCH`` and unset ``KCSAN_SKIP_WATCH_RANDOMIZE``). |
| 335 | |
| 336 | 3. **Annotation Overheads:** Minimal annotations are required outside the KCSAN |
| 337 | runtime. As a result, maintenance overheads are minimal as the kernel |
| 338 | evolves. |
| 339 | |
| 340 | 4. **Detects Racy Writes from Devices:** Due to checking data values upon |
| 341 | setting up watchpoints, racy writes from devices can also be detected. |
| 342 | |
Marco Elver | 82eb691 | 2021-11-30 12:44:17 +0100 | [diff] [blame] | 343 | 5. **Memory Ordering:** KCSAN is aware of only a subset of LKMM ordering rules; |
| 344 | this may result in missed data races (false negatives). |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 345 | |
| 346 | 6. **Analysis Accuracy:** For observed executions, due to using a sampling |
| 347 | strategy, the analysis is *unsound* (false negatives possible), but aims to |
| 348 | be complete (no false positives). |
| 349 | |
| 350 | Alternatives Considered |
| 351 | ----------------------- |
| 352 | |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 353 | An alternative data race detection approach for the kernel can be found in the |
Marco Elver | 905e672 | 2019-11-14 19:02:56 +0100 | [diff] [blame] | 354 | `Kernel Thread Sanitizer (KTSAN) <https://github.com/google/ktsan/wiki>`_. |
| 355 | KTSAN is a happens-before data race detector, which explicitly establishes the |
| 356 | happens-before order between memory operations, which can then be used to |
Marco Elver | e7325b7 | 2020-03-05 15:21:08 +0100 | [diff] [blame] | 357 | determine data races as defined in `Data Races`_. |
| 358 | |
| 359 | To build a correct happens-before relation, KTSAN must be aware of all ordering |
| 360 | rules of the LKMM and synchronization primitives. Unfortunately, any omission |
| 361 | leads to large numbers of false positives, which is especially detrimental in |
| 362 | the context of the kernel which includes numerous custom synchronization |
| 363 | mechanisms. To track the happens-before relation, KTSAN's implementation |
| 364 | requires metadata for each memory location (shadow memory), which for each page |
| 365 | corresponds to 4 pages of shadow memory, and can translate into overhead of |
| 366 | tens of GiB on a large system. |