.. SPDX-License-Identifier: GPL-2.0

.. include:: <isonum.txt>

===============================
Bus lock detection and handling
===============================

:Copyright: |copy| 2021 Intel Corporation
:Authors: - Fenghua Yu <fenghua.yu@intel.com>
          - Tony Luck <tony.luck@intel.com>

Problem
=======

A split lock is any atomic operation whose operand crosses two cache lines.
Since the operand spans two cache lines and the operation must be atomic,
the system locks the bus while the CPU accesses the two cache lines.

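The definition of a split lock can be checked with simple address arithmetic. The following is a minimal sketch, assuming a 64-byte cache line (the line size is not stated in this document) and using the hypothetical helper name ``crosses_cache_line``:

```python
CACHE_LINE_SIZE = 64  # assumption: 64-byte cache lines


def crosses_cache_line(address, size, line_size=CACHE_LINE_SIZE):
    """Return True if the operand [address, address + size) spans two cache lines."""
    if size <= 0:
        raise ValueError("size must be positive")
    first_line = address // line_size               # line holding the first byte
    last_line = (address + size - 1) // line_size   # line holding the last byte
    return first_line != last_line
```

For example, a 4-byte operand at address ``0x103E`` occupies the last two bytes of one line and the first two bytes of the next, so an atomic operation on it would be a split lock.
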
A bus lock is acquired through either a split-locked access to writeback
(WB) memory or any locked access to non-WB memory. This is typically
thousands of cycles slower than an atomic operation within a cache line. It
also disrupts performance on other cores and brings the whole system to its
knees.

Detection
=========

Intel processors may support either or both of the following hardware
mechanisms to detect split locks and bus locks.

#AC exception for split lock detection
--------------------------------------

Beginning with the Tremont Atom CPU, a split lock operation may raise an
Alignment Check (#AC) exception when it is attempted.

#DB exception for bus lock detection
------------------------------------

Some CPUs can notify the kernel with a #DB trap after a user instruction
acquires a bus lock and is executed. This allows the kernel to terminate
the application or to enforce throttling.

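Whether a given machine advertises either mechanism can be checked from user space. The sketch below rests on an assumption that is not stated in this document: that Linux reports the two features as ``split_lock_detect`` and ``bus_lock_detect`` on the ``flags`` line of ``/proc/cpuinfo``. The function name and the sample text are hypothetical.

```python
def lock_detection_features(cpuinfo_text):
    """Report which detection mechanisms the cpuinfo flags advertise."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return {
        # assumed flag names, see the note above
        "split_lock_ac": "split_lock_detect" in flags,
        "bus_lock_db": "bus_lock_detect" in flags,
    }


# Hypothetical excerpt, only for illustration.
SAMPLE = "processor\t: 0\nflags\t\t: fpu vme split_lock_detect\n"
```

On a live system the same function could be fed the contents of ``/proc/cpuinfo``, for instance ``lock_detection_features(open("/proc/cpuinfo").read())``.
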
Software handling
=================

The kernel #AC and #DB handlers handle bus locks based on the kernel
parameter "split_lock_detect". Here is a summary of the different options:

+------------------+----------------------------+-----------------------+
|split_lock_detect=|#AC for split lock          |#DB for bus lock       |
+------------------+----------------------------+-----------------------+
|off               |Do nothing                  |Do nothing             |
+------------------+----------------------------+-----------------------+
|warn              |Kernel OOPs                 |Warn once per task and |
|(default)         |Warn once per task and      |continue to run.       |
|                  |disable future checking     |                       |
|                  |When both features are      |                       |
|                  |supported, warn in #AC      |                       |
+------------------+----------------------------+-----------------------+
|fatal             |Kernel OOPs                 |Send SIGBUS to user.   |
|                  |Send SIGBUS to user         |                       |
|                  |When both features are      |                       |
|                  |supported, fatal in #AC     |                       |
+------------------+----------------------------+-----------------------+
|ratelimit:N       |Do nothing                  |Limit bus lock rate to |
|(0 < N <= 1000)   |                            |N bus locks per second |
|                  |                            |system wide and warn on|
|                  |                            |bus locks.             |
+------------------+----------------------------+-----------------------+

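The accepted values in the table can be captured in a small validator. This is a user-space sketch for illustration only, not the kernel's own parser; the function name ``parse_split_lock_detect`` and the choice to raise ``ValueError`` on invalid input are assumptions.

```python
def parse_split_lock_detect(value="warn"):
    """Parse a split_lock_detect= value into (mode, rate).

    rate is None except for "ratelimit:N", where 0 < N <= 1000.
    """
    if value in ("off", "warn", "fatal"):
        return value, None
    prefix, sep, number = value.partition(":")
    if prefix == "ratelimit" and sep and number.isdecimal():
        rate = int(number)
        if 0 < rate <= 1000:
            return "ratelimit", rate
    # Illustration choice: reject anything the table does not list.
    raise ValueError(f"invalid split_lock_detect value: {value!r}")
```

For example, ``parse_split_lock_detect("ratelimit:100")`` yields ``("ratelimit", 100)``, while ``ratelimit:0`` and ``ratelimit:1001`` are rejected because they fall outside 0 < N <= 1000.
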
Usages
======

Detecting and handling bus locks is useful in several areas:

It is critical for designers who build consolidated real time systems.
These systems run hard real time code on some cores and "untrusted" user
processes on other cores. The hard real time code cannot afford to have any
bus lock from the untrusted processes hurt real time performance. To date,
the designers have been unable to deploy these solutions because they have
no way to prevent the "untrusted" user code from generating split locks and
bus locks that block the hard real time code from accessing memory while
the bus is locked.

It is also useful in general computing to prevent guests or user
applications from slowing down the overall system by executing instructions
that take a bus lock.

Guidance
========

off
---

Disable checking for split locks and bus locks. This option can be useful
if legacy applications trigger these events at such a low rate that
mitigation is not needed.

warn
----

A warning is emitted when a bus lock is detected, which makes it possible
to identify the offending application. This is the default behavior.

fatal
-----

In this case, the bus lock is not tolerated and the process is killed.

ratelimit
---------

A system wide bus lock rate limit N is specified, where 0 < N <= 1000. This
allows a bus lock rate of up to N bus locks per second. When the bus lock
rate is exceeded, any task that is caught via the bus lock #DB exception is
throttled by enforced sleeps until the rate drops below the limit again.

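The throttling behavior can be modeled in user space. The sketch below is a simplification and not the kernel's implementation: it assumes a fixed one-second counting window and a hypothetical 20 ms nap, and it takes the clock and sleep functions as parameters so the model can be driven by a simulated clock. All class and parameter names are made up for illustration.

```python
class FakeClock:
    """Simulated millisecond clock, so the model runs without real sleeping."""

    def __init__(self):
        self.now_ms = 0

    def time_ms(self):
        return self.now_ms

    def sleep_ms(self, duration_ms):
        self.now_ms += duration_ms


class BusLockRateLimiter:
    """Allow at most `rate` bus locks per 1000 ms window, system wide."""

    def __init__(self, rate, time_ms, sleep_ms):
        if not 0 < rate <= 1000:
            raise ValueError("rate must satisfy 0 < N <= 1000")
        self.rate = rate
        self.time_ms = time_ms
        self.sleep_ms = sleep_ms
        self.window_start = time_ms()
        self.count = 0

    def on_bus_lock(self, nap_ms=20):
        """Account for one detected bus lock; return the number of enforced naps."""
        naps = 0
        while True:
            now = self.time_ms()
            if now - self.window_start >= 1000:
                self.window_start = now  # new window, budget is reset
                self.count = 0
            if self.count < self.rate:
                self.count += 1
                return naps
            self.sleep_ms(nap_ms)  # over the limit: the task is put to sleep
            naps += 1
```

With a limit of 2, the first two bus locks in a window pass without delay, and the task causing a third one sleeps in 20 ms steps until the next window opens.
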
This is an effective mitigation in cases where a minimal impact can be
tolerated, but an eventual Denial of Service attack has to be prevented. It
makes it possible to identify the offending processes and analyze whether
they are malicious or just badly written.

Selecting a rate limit of 1000 allows the bus to be locked for up to about
seven million cycles each second (assuming 7000 cycles for each bus
lock). On a 2 GHz processor that would be about a 0.35% system slowdown.
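
The estimate above is a single division, reproduced here as a short sketch. The function name ``bus_lock_slowdown`` is hypothetical; the inputs are the figures quoted in the text (1000 bus locks per second, 7000 cycles per bus lock, a 2 GHz clock).

```python
def bus_lock_slowdown(rate_per_second, cycles_per_lock, cpu_hz):
    """Fraction of each second's CPU cycles during which the bus is locked."""
    return rate_per_second * cycles_per_lock / cpu_hz


locked_cycles = 1000 * 7000  # seven million cycles per second
slowdown = bus_lock_slowdown(1000, 7000, 2_000_000_000)
print(f"{slowdown:.2%}")  # prints 0.35%
```

Halving the rate limit or doubling the clock frequency halves the estimated slowdown, since the relationship is linear in both.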