===========================================================================
Proper Locking Under a Preemptible Kernel: Keeping Kernel Code Preempt-Safe
===========================================================================

:Author: Robert Love <rml@tech9.net>


Introduction
============


A preemptible kernel creates new locking issues. The issues are the same as
those under SMP: concurrency and reentrancy. Thankfully, the Linux preemptible
kernel model leverages existing SMP locking mechanisms. Thus, the kernel
requires explicit additional locking for very few additional situations.

This document is for all kernel hackers. Developing code in the kernel
requires protecting these situations.


RULE #1: Per-CPU data structures need explicit protection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Two similar problems arise. An example code snippet::

	struct this_needs_locking tux[NR_CPUS];
	tux[smp_processor_id()] = some_value;
	/* task is preempted here... */
	something = tux[smp_processor_id()];

First, because the data is per-CPU, it may have no explicit SMP locking, yet
it still needs protection from preemption. Second, when a preempted task is
finally rescheduled, smp_processor_id() may return a different CPU than it did
before the preemption. You must protect these situations by disabling
preemption around them.

You can also use get_cpu() and put_cpu(): get_cpu() disables preemption and
returns the current CPU number, and put_cpu() re-enables preemption.
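
The get_cpu()/put_cpu() pattern can be illustrated with a small userspace
model. This is only a sketch, not kernel code: every ``model_*`` name, the
``MODEL_NR_CPUS`` constant, and the two ``*_roundtrip`` helpers are
hypothetical stand-ins invented for illustration, and "migration" is simulated
by an explicit function call rather than by a real scheduler.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4              /* hypothetical CPU count for the model */

static int model_preempt_count;      /* > 0 means preemption is disabled */
static int model_cpu;                /* CPU the modeled task is running on */
static int tux[MODEL_NR_CPUS];       /* per-CPU data, one slot per CPU */

/* Model of get_cpu(): disable preemption, then report the current CPU. */
static int model_get_cpu(void)
{
	model_preempt_count++;
	return model_cpu;
}

/* Model of put_cpu(): re-enable preemption. */
static void model_put_cpu(void)
{
	assert(model_preempt_count > 0);  /* unbalanced put is a bug */
	model_preempt_count--;
}

/*
 * Model of a preemption point: the task may be moved to another CPU,
 * but only while preemption is enabled. Returns 1 if the task moved.
 */
static int model_try_migrate(int new_cpu)
{
	if (model_preempt_count > 0)
		return 0;
	model_cpu = new_cpu;
	return 1;
}

/* Unsafe: the task may change CPU between the write and the read. */
static int unsafe_roundtrip(int value, int migrate_to)
{
	tux[model_cpu] = value;
	model_try_migrate(migrate_to);    /* "task is preempted here..." */
	return tux[model_cpu];            /* may read another CPU's slot */
}

/* Safe: both accesses are guaranteed to use the same CPU slot. */
static int safe_roundtrip(int value, int migrate_to)
{
	int cpu = model_get_cpu();
	int result;

	tux[cpu] = value;
	model_try_migrate(migrate_to);    /* refused: preemption is disabled */
	result = tux[cpu];
	model_put_cpu();
	return result;
}
```

In this model, the unsafe variant can return whatever happens to be stored in
the other CPU's slot, while the safe variant always reads back the value it
just wrote.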


RULE #2: CPU state must be protected.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Under preemption, the state of the CPU must be protected. This is
arch-dependent, but includes CPU structures and state not preserved over a
context switch. For example, on x86, entering and exiting FPU mode is now a
critical section that must occur while preemption is disabled. Think what would
happen if the kernel is executing a floating-point instruction and is then
preempted. Remember, the kernel does not save FPU state except for user tasks.
Therefore, upon preemption, the FPU registers will be sold to the lowest
bidder. Thus, preemption must be disabled around such regions.

Note, some FPU functions are already explicitly preempt safe. For example,
kernel_fpu_begin and kernel_fpu_end will disable and enable preemption.
However, fpu__restore() must be called with preemption disabled.


RULE #3: Lock acquire and release must be performed by same task
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


A lock acquired in one task must be released by the same task. This
means you can't do oddball things like acquire a lock and go off to
play while another task releases it. If you want to do something
like this, acquire and release the lock in the same code path and
have the caller wait on an event by the other task.


Solution
========


Data protection under preemption is achieved by disabling preemption for the
duration of the critical region.

::

	preempt_enable()             decrement the preempt counter
	preempt_disable()            increment the preempt counter
	preempt_enable_no_resched()  decrement, but do not immediately preempt
	preempt_check_resched()      if needed, reschedule
	preempt_count()              return the preempt counter

The functions are nestable. In other words, you can call preempt_disable
n times in a code path, and preemption will not be reenabled until the n-th
call to preempt_enable. The preempt statements define to nothing if
preemption is not enabled.
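
The nesting behaviour can be sketched with a counter in plain userspace C.
This is a toy model under stated assumptions, not the kernel implementation:
the ``model_*`` names and ``enables_until_resched()`` are hypothetical, and a
"reschedule" is represented by incrementing a counter once the preempt count
drops back to zero while a reschedule is pending.

```c
#include <assert.h>

static int model_preempt_count;   /* models the preempt counter */
static int model_need_resched;    /* set when a reschedule is pending */
static int model_reschedules;     /* how many times the model "scheduled" */

/* Model of preempt_disable(): increment the counter. */
static void model_preempt_disable(void)
{
	model_preempt_count++;
}

/*
 * Model of preempt_enable(): decrement the counter and, only when it
 * reaches zero, act on a pending reschedule.
 */
static void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);   /* unbalanced enable is a bug */
	if (--model_preempt_count == 0 && model_need_resched) {
		model_need_resched = 0;
		model_reschedules++;
	}
}

/*
 * Nest n disable calls (n >= 1), let a reschedule request arrive, then
 * count how many enable calls are needed before the reschedule happens.
 */
static int enables_until_resched(int n)
{
	int i, calls = 0;

	for (i = 0; i < n; i++)
		model_preempt_disable();
	model_need_resched = 1;            /* request arrives while disabled */

	while (model_need_resched) {
		model_preempt_enable();
		calls++;
	}
	return calls;
}
```

With three nested disables, the pending reschedule in this model is acted on
only by the third enable, which mirrors the "n-th call" rule described above.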

Note that you do not need to explicitly prevent preemption if you are holding
any locks or if interrupts are disabled, since preemption is implicitly
disabled in those cases.

But keep in mind that 'irqs disabled' is a fundamentally unsafe way of
disabling preemption: any cond_resched() or cond_resched_lock() might trigger
a reschedule if the preempt count is 0. A simple printk() might trigger a
reschedule. So use this implicit preemption-disabling property only if you
know that the affected codepath does not do any of this. The best policy is to
use it only for small, atomic code that you wrote and which calls no complex
functions.

Example::

	cpucache_t *cc; /* this is per-CPU */
	preempt_disable();
	cc = cc_data(searchp);
	if (cc && cc->avail) {
		__free_block(searchp, cc_entry(cc), cc->avail);
		cc->avail = 0;
	}
	preempt_enable();
	return 0;

Notice how the preemption statements must encompass every reference to the
critical variables. Another example::

	int buf[NR_CPUS];
	set_cpu_val(buf);
	if (buf[smp_processor_id()] == -1) printk(KERN_INFO "wee!\n");
	spin_lock(&buf_lock);
	/* ... */

This code is not preempt-safe, but see how easily we can fix it by simply
moving the spin_lock call up two lines, above the call to set_cpu_val.
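
The reordered version can be sketched as a userspace model in which taking a
spinlock raises the preempt count. Everything here is an assumption made for
illustration: the ``model_*`` helpers, ``MODEL_NR_CPUS``, and
``fixed_example()`` are hypothetical, ``printf`` stands in for ``printk``, and
the body of ``model_set_cpu_val()`` is a guess at what ``set_cpu_val()`` might
do, since the original snippet does not show it.

```c
#include <assert.h>
#include <stdio.h>

#define MODEL_NR_CPUS 2              /* hypothetical CPU count for the model */

struct model_spinlock {
	int locked;
};

static int model_preempt_count;      /* > 0 means preemption is disabled */
static int model_cpu;                /* CPU the modeled task is running on */
static struct model_spinlock buf_lock;

/* Model of spin_lock(): holding the lock implicitly disables preemption. */
static void model_spin_lock(struct model_spinlock *lock)
{
	model_preempt_count++;
	assert(!lock->locked);           /* single-task model: no contention */
	lock->locked = 1;
}

/* Model of spin_unlock(): release the lock, then re-enable preemption. */
static void model_spin_unlock(struct model_spinlock *lock)
{
	assert(lock->locked);
	lock->locked = 0;
	model_preempt_count--;
}

/* Assumed behaviour of set_cpu_val(): mark the current CPU's slot. */
static void model_set_cpu_val(int *buf)
{
	buf[model_cpu] = -1;
}

/* Model of a preemption point: move to the next CPU if preemptible. */
static void model_preempt_point(void)
{
	if (model_preempt_count == 0)
		model_cpu = (model_cpu + 1) % MODEL_NR_CPUS;
}

/* Fixed ordering: the lock is taken before the first per-CPU access. */
static int fixed_example(void)
{
	int buf[MODEL_NR_CPUS] = { 0 };
	int seen;

	model_spin_lock(&buf_lock);
	model_set_cpu_val(buf);
	model_preempt_point();           /* no effect: the lock is held */
	seen = (buf[model_cpu] == -1);
	if (seen)
		printf("wee!\n");
	/* ... */
	model_spin_unlock(&buf_lock);
	return seen;
}
```

Because the lock is held across both per-CPU accesses, the modeled task cannot
change CPU between writing the slot and reading it back.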


Preventing preemption using interrupt disabling
===============================================


It is possible to prevent a preemption event using local_irq_disable and
local_irq_save. Note, when doing so, you must be very careful to not cause
an event that would set need_resched and result in a preemption check. When
in doubt, rely on locking or explicit preemption disabling.

Note that as of 2.5, interrupt disabling is per-CPU only (i.e. local).

An additional concern is proper usage of local_irq_disable and local_irq_save.
These may be used to protect from preemption. However, on exit, if preemption
may be enabled, a test to see if preemption is required should be done. If
these are called from the spin_lock and read/write lock macros, the right thing
is done. They may also be called within a spin-lock protected region. However,
if they are ever called outside of this context, a test for preemption should
be made. Do note that calls from interrupt context or bottom halves/tasklets
are also protected by preemption locks and so may use the versions which do
not check preemption.