Reference-count design for elements of lists/arrays protected by RCU.


Please note that the percpu-ref feature is likely your first
stop if you need to combine reference counts and RCU.  Please see
include/linux/percpu-refcount.h for more information.  However, in
those unusual cases where percpu-ref would consume too much memory,
please read on.

------------------------------------------------------------------------

Reference counting on elements of lists which are protected by traditional
reader/writer spinlocks or semaphores is straightforward:

CODE LISTING A:
1.                                      2.
add()                                   search_and_reference()
{                                       {
    alloc_object                            read_lock(&list_lock);
    ...                                     search_for_element
    atomic_set(&el->rc, 1);                 atomic_inc(&el->rc);
    write_lock(&list_lock);                 ...
    add_element                             read_unlock(&list_lock);
    ...                                     ...
    write_unlock(&list_lock);           }
}

3.                                      4.
release_referenced()                    delete()
{                                       {
    ...                                     write_lock(&list_lock);
    if (atomic_dec_and_test(&el->rc))       ...
        kfree(el);
    ...                                     remove_element
}                                           write_unlock(&list_lock);
                                            ...
                                            if (atomic_dec_and_test(&el->rc))
                                                kfree(el);
                                            ...
                                        }

If this list/array is made lock free using RCU, for example by changing the
write_lock() in add() and delete() to spin_lock() and changing read_lock()
in search_and_reference() to rcu_read_lock(), the atomic_inc() in
search_and_reference() could potentially acquire a reference to an element
which has already been deleted from the list/array.  Use
atomic_inc_not_zero() in this scenario as follows:

CODE LISTING B:
1.                                      2.
add()                                   search_and_reference()
{                                       {
    alloc_object                            rcu_read_lock();
    ...                                     search_for_element
    atomic_set(&el->rc, 1);                 if (!atomic_inc_not_zero(&el->rc)) {
    spin_lock(&list_lock);                      rcu_read_unlock();
                                                return FAIL;
    add_element                             }
    ...                                     ...
    spin_unlock(&list_lock);                rcu_read_unlock();
}                                       }

3.                                      4.
release_referenced()                    delete()
{                                       {
    ...                                     spin_lock(&list_lock);
    if (atomic_dec_and_test(&el->rc))       ...
        call_rcu(&el->head, el_free);       remove_element
    ...                                     spin_unlock(&list_lock);
}                                           ...
                                            if (atomic_dec_and_test(&el->rc))
                                                call_rcu(&el->head, el_free);
                                            ...
                                        }
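
As a rough userspace illustration of the conditional increment used in
listing B, the sketch below builds stand-ins for atomic_inc_not_zero()
and atomic_dec_and_test() from C11 atomics.  The names struct el,
el_get_unless_zero() and el_put_and_test() are hypothetical and exist
only for this example.  It says nothing about how the kernel implements
these primitives, and kernel code should use the real ones.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical element type; "rc" mirrors el->rc in the listings. */
struct el {
	atomic_int rc;
};

/*
 * Userspace stand-in for atomic_inc_not_zero(): take a reference only
 * if the count is still nonzero.  Returns true on success.
 */
static bool el_get_unless_zero(struct el *el)
{
	int old = atomic_load(&el->rc);

	while (old != 0) {
		/* On failure, "old" is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&el->rc, &old, old + 1))
			return true;
	}
	return false;	/* Count already reached zero: do not resurrect. */
}

/*
 * Userspace stand-in for atomic_dec_and_test(): drop a reference and
 * report whether it was the last one.
 */
static bool el_put_and_test(struct el *el)
{
	int old = atomic_fetch_sub(&el->rc, 1);

	assert(old > 0);	/* Catch a put without a matching get. */
	return old == 1;
}
```

A lookup that gets false back from el_get_unless_zero() must treat the
element as not found, which corresponds to the "return FAIL" path of
search_and_reference() in listing B.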

Sometimes, a reference to the element needs to be obtained in the
update (write) stream.  In such cases, atomic_inc_not_zero() might be
overkill, since we hold the update-side spinlock.  One might instead
use atomic_inc() in such cases.

It is not always convenient to deal with "FAIL" in the
search_and_reference() code path.  In such cases, the
atomic_dec_and_test() may be moved from delete() to el_free()
as follows:

CODE LISTING C:
1.                                      2.
add()                                   search_and_reference()
{                                       {
    alloc_object                            rcu_read_lock();
    ...                                     search_for_element
    atomic_set(&el->rc, 1);                 atomic_inc(&el->rc);
    spin_lock(&list_lock);                  ...
    add_element                             rcu_read_unlock();
    ...                                 }
    spin_unlock(&list_lock);
}

3.                                      4.
release_referenced()                    delete()
{                                       {
    ...                                     spin_lock(&list_lock);
    if (atomic_dec_and_test(&el->rc))       ...
        kfree(el);                          remove_element
    ...                                     spin_unlock(&list_lock);
}                                           ...
                                            call_rcu(&el->head, el_free);
                                            ...
                                        }

5.
void el_free(struct rcu_head *rhp)
{
    release_referenced();
}

The key point is that the initial reference added by add() is not removed
until after a grace period has elapsed following removal.  By that time,
search_and_reference() can no longer find this element, which means that
the value of el->rc cannot increase.  Thus, once it reaches zero, there are
no readers that can or ever will be able to reference the element.  The
element can therefore safely be freed.  This in turn guarantees that if
any reader finds the element, that reader may safely acquire a reference
without checking the value of the reference counter.
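
The ordering argument above can be walked through with a small
sequential model.  This is only a sketch under loud assumptions: it is
single-threaded userspace C, one pending-callback slot stands in for
call_rcu(), model_grace_period_ends() stands in for the end of a grace
period, and a "freed" flag stands in for kfree().  Every name prefixed
with model_ is hypothetical.  It does not demonstrate concurrency; it
only shows that the count cannot reach zero while the element is still
findable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical element; the two flags exist only for this model. */
struct model_el {
	atomic_int rc;
	bool on_list;	/* Can a search still find the element? */
	bool freed;	/* Stands in for kfree(el). */
};

/* Single slot standing in for a callback queued by call_rcu(). */
static struct model_el *model_pending;

static void model_release_referenced(struct model_el *el)
{
	if (atomic_fetch_sub(&el->rc, 1) == 1)
		el->freed = true;
}

static void model_add(struct model_el *el)
{
	atomic_init(&el->rc, 1);	/* Initial reference held by the list. */
	el->freed = false;
	el->on_list = true;
}

/* Reader side: unconditional increment, as in listing C. */
static bool model_search_and_reference(struct model_el *el)
{
	if (!el->on_list)
		return false;		/* Not found. */
	/* The initial reference is still held, so the count is nonzero. */
	assert(atomic_load(&el->rc) > 0);
	atomic_fetch_add(&el->rc, 1);
	return true;
}

static void model_delete(struct model_el *el)
{
	el->on_list = false;		/* remove_element */
	model_pending = el;		/* Defer dropping the initial reference. */
}

/* Run the deferred el_free() equivalent. */
static void model_grace_period_ends(void)
{
	if (model_pending) {
		model_release_referenced(model_pending);
		model_pending = NULL;
	}
}
```

In this model, a reader that took its reference before model_delete()
keeps the element alive past model_grace_period_ends(), and the element
is marked freed only when that reader calls model_release_referenced().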

A clear advantage of the RCU-based pattern in listing C over the one
in listing B is that any call to search_and_reference() that locates
a given object will succeed in obtaining a reference to that object,
even given a concurrent invocation of delete() for that same object.
Similarly, a clear advantage of both listings B and C over listing A is
that a call to delete() is not delayed even if there are an arbitrarily
large number of calls to search_and_reference() searching for the same
object that delete() was invoked on.  Instead, all that is delayed is
the eventual invocation of kfree(), which is usually not a problem on
modern computer systems, even the small ones.

In cases where delete() can sleep, synchronize_rcu() can be called from
delete(), so that el_free() can be subsumed into delete() as follows:

4.
delete()
{
    spin_lock(&list_lock);
    ...
    remove_element
    spin_unlock(&list_lock);
    ...
    synchronize_rcu();
    if (atomic_dec_and_test(&el->rc))
        kfree(el);
    ...
}

As additional examples in the kernel, the pattern in listing C is used by
reference counting of struct pid, while the pattern in listing B is used by
struct posix_acl.