Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 1 | .. _up_doc: |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 2 | |
Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 3 | RCU on Uniprocessor Systems |
| 4 | =========================== |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 5 | |
| 6 | A common misconception is that, on UP systems, the call_rcu() primitive |
Paul E. McKenney | 240ebbf | 2009-06-25 09:08:18 -0700 | [diff] [blame] | 7 | may immediately invoke its function. The basis of this misconception |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 8 | is that since there is only one CPU, it should not be necessary to |
| 9 | wait for anything else to get done, since there are no other CPUs for |
Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 10 | anything else to be happening on. Although this approach will *sort of* |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 11 | work a surprising amount of the time, it is a very bad idea in general. |
Paul E. McKenney | 240ebbf | 2009-06-25 09:08:18 -0700 | [diff] [blame] | 12 | This document presents three examples that demonstrate exactly how bad |
| 13 | an idea this is. |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 14 | |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 15 | Example 1: softirq Suicide |
Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 16 | -------------------------- |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 17 | |
| 18 | Suppose that an RCU-based algorithm scans a linked list containing |
| 19 | elements A, B, and C in process context, and can delete elements from |
| 20 | this same list in softirq context. Suppose that the process-context scan |
| 21 | is referencing element B when it is interrupted by softirq processing, |
| 22 | which deletes element B, and then invokes call_rcu() to free element B |
| 23 | after a grace period. |
| 24 | |
| 25 | Now, if call_rcu() were to directly invoke its arguments, then upon return |
| 26 | from softirq, the list scan would find itself referencing a newly freed |
| 27 | element B. This situation can greatly decrease the life expectancy of |
| 28 | your kernel. |
| 29 | |
Paul E. McKenney | dd81eca | 2005-09-10 00:26:24 -0700 | [diff] [blame] | 30 | This same problem can occur if call_rcu() is invoked from a hardware |
| 31 | interrupt handler. |
| 32 | |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 33 | Example 2: Function-Call Fatality |
Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 34 | --------------------------------- |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 35 | |
| 36 | Of course, one could avert the suicide described in the preceding example |
| 37 | by having call_rcu() directly invoke its arguments only if it was called |
| 38 | from process context. However, this can fail in a similar manner. |
| 39 | |
| 40 | Suppose that an RCU-based algorithm again scans a linked list containing |
| 41 | elements A, B, and C in process contexts, but that it invokes a function |
| 42 | on each element as it is scanned. Suppose further that this function |
| 43 | deletes element B from the list, then passes it to call_rcu() for deferred |
| 44 | freeing. This may be a bit unconventional, but it is perfectly legal |
| 45 | RCU usage, since call_rcu() must wait for a grace period to elapse. |
| 46 | Therefore, in this case, allowing call_rcu() to immediately invoke |
| 47 | its arguments would cause it to fail to make the fundamental guarantee |
| 48 | underlying RCU, namely that call_rcu() defers invoking its arguments until |
| 49 | all RCU read-side critical sections currently executing have completed. |
| 50 | |
Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 51 | Quick Quiz #1: |
| 52 | Why is it *not* legal to invoke synchronize_rcu() in this case? |
Paul E. McKenney | dd81eca | 2005-09-10 00:26:24 -0700 | [diff] [blame] | 53 | |
Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 54 | :ref:`Answers to Quick Quiz <answer_quick_quiz_up>` |
Paul E. McKenney | dd81eca | 2005-09-10 00:26:24 -0700 | [diff] [blame] | 55 | |
| 56 | Example 3: Death by Deadlock |
Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 57 | ---------------------------- |
Paul E. McKenney | dd81eca | 2005-09-10 00:26:24 -0700 | [diff] [blame] | 58 | |
| 59 | Suppose that call_rcu() is invoked while holding a lock, and that the |
| 60 | callback function must acquire this same lock. In this case, if |
| 61 | call_rcu() were to directly invoke the callback, the result would |
| 62 | be self-deadlock. |
| 63 | |
| 64 | In some cases, it would possible to restructure to code so that |
| 65 | the call_rcu() is delayed until after the lock is released. However, |
| 66 | there are cases where this can be quite ugly: |
| 67 | |
| 68 | 1. If a number of items need to be passed to call_rcu() within |
| 69 | the same critical section, then the code would need to create |
| 70 | a list of them, then traverse the list once the lock was |
| 71 | released. |
| 72 | |
| 73 | 2. In some cases, the lock will be held across some kernel API, |
| 74 | so that delaying the call_rcu() until the lock is released |
| 75 | requires that the data item be passed up via a common API. |
| 76 | It is far better to guarantee that callbacks are invoked |
| 77 | with no locks held than to have to modify such APIs to allow |
| 78 | arbitrary data items to be passed back up through them. |
| 79 | |
| 80 | If call_rcu() directly invokes the callback, painful locking restrictions |
| 81 | or API changes would be required. |
| 82 | |
Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 83 | Quick Quiz #2: |
| 84 | What locking restriction must RCU callbacks respect? |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 85 | |
Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 86 | :ref:`Answers to Quick Quiz <answer_quick_quiz_up>` |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 87 | |
| 88 | Summary |
Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 89 | ------- |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 90 | |
Paul E. McKenney | 240ebbf | 2009-06-25 09:08:18 -0700 | [diff] [blame] | 91 | Permitting call_rcu() to immediately invoke its arguments breaks RCU, |
| 92 | even on a UP system. So do not do it! Even on a UP system, the RCU |
Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 93 | infrastructure *must* respect grace periods, and *must* invoke callbacks |
Paul E. McKenney | 240ebbf | 2009-06-25 09:08:18 -0700 | [diff] [blame] | 94 | from a known environment in which no locks are held. |
| 95 | |
Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 96 | Note that it *is* safe for synchronize_rcu() to return immediately on |
| 97 | UP systems, including PREEMPT SMP builds running on UP systems. |
Paul E. McKenney | 240ebbf | 2009-06-25 09:08:18 -0700 | [diff] [blame] | 98 | |
Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 99 | Quick Quiz #3: |
| 100 | Why can't synchronize_rcu() return immediately on UP systems running |
| 101 | preemptable RCU? |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 102 | |
Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 103 | .. _answer_quick_quiz_up: |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 104 | |
Paul E. McKenney | dd81eca | 2005-09-10 00:26:24 -0700 | [diff] [blame] | 105 | Answer to Quick Quiz #1: |
Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 106 | Why is it *not* legal to invoke synchronize_rcu() in this case? |
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 107 | |
Paul E. McKenney | dd81eca | 2005-09-10 00:26:24 -0700 | [diff] [blame] | 108 | Because the calling function is scanning an RCU-protected linked |
| 109 | list, and is therefore within an RCU read-side critical section. |
| 110 | Therefore, the called function has been invoked within an RCU |
| 111 | read-side critical section, and is not permitted to block. |
| 112 | |
| 113 | Answer to Quick Quiz #2: |
| 114 | What locking restriction must RCU callbacks respect? |
| 115 | |
Jiunn Chang | acb6258 | 2019-06-27 16:01:47 -0500 | [diff] [blame] | 116 | Any lock that is acquired within an RCU callback must be acquired |
| 117 | elsewhere using an _bh variant of the spinlock primitive. |
| 118 | For example, if "mylock" is acquired by an RCU callback, then |
| 119 | a process-context acquisition of this lock must use something |
| 120 | like spin_lock_bh() to acquire the lock. Please note that |
| 121 | it is also OK to use _irq variants of spinlocks, for example, |
| 122 | spin_lock_irqsave(). |
Paul E. McKenney | dd81eca | 2005-09-10 00:26:24 -0700 | [diff] [blame] | 123 | |
| 124 | If the process-context code were to simply use spin_lock(), |
| 125 | then, since RCU callbacks can be invoked from softirq context, |
| 126 | the callback might be called from a softirq that interrupted |
| 127 | the process-context critical section. This would result in |
| 128 | self-deadlock. |
| 129 | |
| 130 | This restriction might seem gratuitous, since very few RCU |
| 131 | callbacks acquire locks directly. However, a great many RCU |
Jiunn Chang | 2a5b0c84 | 2019-06-26 15:07:03 -0500 | [diff] [blame] | 132 | callbacks do acquire locks *indirectly*, for example, via |
Paul E. McKenney | dd81eca | 2005-09-10 00:26:24 -0700 | [diff] [blame] | 133 | the kfree() primitive. |
Paul E. McKenney | 240ebbf | 2009-06-25 09:08:18 -0700 | [diff] [blame] | 134 | |
| 135 | Answer to Quick Quiz #3: |
| 136 | Why can't synchronize_rcu() return immediately on UP systems |
| 137 | running preemptable RCU? |
| 138 | |
| 139 | Because some other task might have been preempted in the middle |
| 140 | of an RCU read-side critical section. If synchronize_rcu() |
| 141 | simply immediately returned, it would prematurely signal the |
| 142 | end of the grace period, which would come as a nasty shock to |
| 143 | that other thread when it started running again. |