======================================
Wound/Wait Deadlock-Proof Mutex Design
======================================

Please read mutex-design.rst first, as it applies to wait/wound mutexes too.

Motivation for WW-Mutexes
-------------------------

GPUs do operations that commonly involve many buffers. Those buffers
can be shared across contexts/processes, exist in different memory
domains (for example VRAM vs system memory), and so on. And with
PRIME / dmabuf, they can even be shared across devices. So there are
a handful of situations where the driver needs to wait for buffers to
become ready. If you think about this in terms of waiting on a buffer
mutex for it to become available, this presents a problem because
there is no way to guarantee that buffers appear in an execbuf/batch in
the same order in all contexts. That is directly under control of
userspace, and a result of the sequence of GL calls that an application
makes, which results in the potential for deadlock. The problem gets
more complex when you consider that the kernel may need to migrate the
buffer(s) into VRAM before the GPU operates on the buffer(s), which
may in turn require evicting some other buffers (and you don't want to
evict other buffers which are already queued up to the GPU), but for a
simplified understanding of the problem you can ignore this.

The algorithm that the TTM graphics subsystem came up with for dealing with
this problem is quite simple. For each group of buffers (execbuf) that need
to be locked, the caller would be assigned a unique reservation id/ticket,
from a global counter. In case of deadlock while locking all the buffers
associated with an execbuf, the one with the lowest reservation ticket (i.e.
the oldest task) wins, and the one with the higher reservation id (i.e. the
younger task) unlocks all of the buffers that it has already locked, and then
tries again.

In the RDBMS literature, a reservation ticket is associated with a transaction,
and the deadlock handling approach is called Wait-Die. The name is based on
the actions of a locking thread when it encounters an already locked mutex.
If the transaction holding the lock is younger, the locking transaction waits.
If the transaction holding the lock is older, the locking transaction backs off
and dies. Hence Wait-Die.
There is also another algorithm called Wound-Wait:
If the transaction holding the lock is younger, the locking transaction
wounds the transaction holding the lock, requesting it to die.
If the transaction holding the lock is older, it waits for the other
transaction. Hence Wound-Wait.
The two algorithms are both fair in that a transaction will eventually succeed.
However, the Wound-Wait algorithm is typically stated to generate fewer backoffs
compared to Wait-Die, but is, on the other hand, associated with more work than
Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
algorithm in that transactions are wounded by other transactions, and that
requires a reliable way to pick up the wounded condition and preempt the
running transaction. Note that this is not the same as process preemption. A
Wound-Wait transaction is considered preempted when it dies (returning
-EDEADLK) following a wound.
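
To make the difference concrete, the contention rules can be expressed as the
following conceptual sketch (pseudocode, not kernel code; requester, holder,
ticket and the helper functions are all hypothetical, and a lower ticket value
means an older transaction)::

  if (wait_die) {
          if (requester->ticket < holder->ticket)
                  wait_for(holder);     /* older requester waits */
          else
                  back_off(requester);  /* younger requester dies: it drops
                                         * its locks, sees -EDEADLK and
                                         * restarts the transaction */
  } else { /* Wound-Wait */
          if (requester->ticket < holder->ticket)
                  wound(holder);        /* older requester wounds the younger
                                         * holder and waits for the lock; the
                                         * holder backs off at its next
                                         * contention point */
          else
                  wait_for(holder);     /* younger requester waits */
  }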

Concepts
--------

Compared to normal mutexes two additional concepts/objects show up in the lock
interface for w/w mutexes:

Acquire context: To ensure eventual forward progress it is important that a task
trying to acquire locks doesn't grab a new reservation id, but keeps the one it
acquired when starting the lock acquisition. This ticket is stored in the
acquire context. Furthermore the acquire context keeps track of debugging state
to catch w/w mutex interface abuse. An acquire context represents a
transaction.

W/w class: In contrast to normal mutexes the lock class needs to be explicit for
w/w mutexes, since it is required to initialize the acquire context. The lock
class also specifies what algorithm to use, Wound-Wait or Wait-Die.
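
A minimal sketch of the resulting transaction lifecycle (assuming a class named
ww_class as in the usage examples below; the actual locking and unlocking steps
are elided)::

  static DEFINE_WW_CLASS(ww_class);

  struct ww_acquire_ctx ctx;

  ww_acquire_init(&ctx, &ww_class);
  /* ... acquire all needed ww_mutexes, passing &ctx ... */
  ww_acquire_done(&ctx);   /* optional annotation: no further ww_mutex_lock()
                            * calls with &ctx after this point */
  /* ... use the locked objects ... */
  /* ... release all ww_mutexes ... */
  ww_acquire_fini(&ctx);   /* ends the transaction */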

Furthermore there are three different classes of w/w lock acquire functions:

* Normal lock acquisition with a context, using ww_mutex_lock.

* Slowpath lock acquisition on the contending lock, used by the task that just
  killed its transaction after having dropped all already acquired locks.
  These functions have the _slow postfix.

  From a simple semantics point-of-view the _slow functions are not strictly
  required, since simply calling the normal ww_mutex_lock functions on the
  contending lock (after having dropped all other already acquired locks) will
  work correctly. After all, if no other ww mutex has been acquired yet there's
  no deadlock potential and hence the ww_mutex_lock call will block and not
  prematurely return -EDEADLK. The advantage of the _slow functions is in
  interface safety:

  - ww_mutex_lock has a __must_check int return type, whereas ww_mutex_lock_slow
    has a void return type. Note that since ww mutex code needs loops/retries
    anyway, the __must_check doesn't result in spurious warnings, even though the
    very first lock operation can never fail.
  - When full debugging is enabled ww_mutex_lock_slow checks that all acquired
    ww mutexes have been released (preventing deadlocks) and makes sure that we
    block on the contending lock (preventing spinning through the -EDEADLK
    slowpath until the contended lock can be acquired).

* Functions to only acquire a single w/w mutex, which results in the exact same
  semantics as a normal mutex. This is done by calling ww_mutex_lock with a NULL
  context.

  Again this is not strictly required. But often you only want to acquire a
  single lock in which case it's pointless to set up an acquire context (and so
  better to avoid grabbing a deadlock avoidance ticket).

Of course, all the usual variants for handling wake-ups due to signals are also
provided.
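
For example, the interruptible variants can additionally fail with -EINTR when a
signal is received but otherwise follow the same pattern. A rough sketch only:
struct obj with its ww_mutex lock member is taken from the usage examples in the
next section, and unwinding of the other locks held in the context on the
-EDEADLK path is elided::

  static int lock_obj_interruptible(struct obj *obj, struct ww_acquire_ctx *ctx)
  {
          int ret;

          ret = ww_mutex_lock_interruptible(&obj->lock, ctx);
          if (ret == -EDEADLK) {
                  /* all other locks held with ctx must be dropped here
                   * (elided) before taking the contended lock */
                  ret = ww_mutex_lock_slow_interruptible(&obj->lock, ctx);
          }
          /* ret is now 0, -EALREADY or -EINTR (signal received) */
          return ret;
  }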

Usage
-----

The algorithm (Wait-Die vs Wound-Wait) is chosen by using either
DEFINE_WW_CLASS() (Wound-Wait) or DEFINE_WD_CLASS() (Wait-Die). As a rough rule
of thumb, use Wound-Wait if you expect the number of simultaneous competing
transactions to be typically small, and you want to reduce the number of
rollbacks.
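
For example (the class names here are arbitrary)::

  /* Wound-Wait: older transactions preempt (wound) younger lock holders */
  static DEFINE_WW_CLASS(gfx_ww_class);

  /* Wait-Die: younger contenders simply back off and retry */
  static DEFINE_WD_CLASS(gfx_wd_class);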

There are three different ways to acquire locks within the same w/w class.
Common definitions for methods #1 and #2::

  static DEFINE_WW_CLASS(ww_class);

  struct obj {
          struct ww_mutex lock;
          /* obj data */
  };

  struct obj_entry {
          struct list_head head;
          struct obj *obj;
  };

Method 1, using a list in execbuf->buffers that's not allowed to be reordered.
This is useful if a list of required objects is already tracked somewhere.
Furthermore the lock helper can propagate the -EALREADY return code back to the
caller as a signal that an object is twice on the list. This is useful if
the list is constructed from userspace input and the ABI requires userspace to
not have duplicate entries (e.g. for a gpu commandbuffer submission ioctl)::

  int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
  {
          struct obj *res_obj = NULL;
          struct obj_entry *contended_entry = NULL;
          struct obj_entry *entry;
          int ret;

          ww_acquire_init(ctx, &ww_class);

  retry:
          list_for_each_entry (entry, list, head) {
                  if (entry->obj == res_obj) {
                          res_obj = NULL;
                          continue;
                  }
                  ret = ww_mutex_lock(&entry->obj->lock, ctx);
                  if (ret < 0) {
                          contended_entry = entry;
                          goto err;
                  }
          }

          ww_acquire_done(ctx);
          return 0;

  err:
          list_for_each_entry_continue_reverse (entry, list, head)
                  ww_mutex_unlock(&entry->obj->lock);

          if (res_obj)
                  ww_mutex_unlock(&res_obj->lock);

          if (ret == -EDEADLK) {
                  /* we lost out in a seqno race, lock and retry.. */
                  ww_mutex_lock_slow(&contended_entry->obj->lock, ctx);
                  res_obj = contended_entry->obj;
                  goto retry;
          }
          ww_acquire_fini(ctx);

          return ret;
  }

Method 2, using a list in execbuf->buffers that can be reordered. The same
-EALREADY duplicate-entry detection semantics as in method 1 above apply. But
the list reordering allows for a bit more idiomatic code::

  int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
  {
          struct obj_entry *entry, *entry2;
          int ret;

          ww_acquire_init(ctx, &ww_class);

          list_for_each_entry (entry, list, head) {
                  ret = ww_mutex_lock(&entry->obj->lock, ctx);
                  if (ret < 0) {
                          entry2 = entry;

                          list_for_each_entry_continue_reverse (entry2, list, head)
                                  ww_mutex_unlock(&entry2->obj->lock);

                          if (ret != -EDEADLK) {
                                  ww_acquire_fini(ctx);
                                  return ret;
                          }

                          /* we lost out in a seqno race, lock and retry.. */
                          ww_mutex_lock_slow(&entry->obj->lock, ctx);

                          /*
                           * Move entry to the head of the list, this will
                           * point entry->head.next to the first unlocked
                           * entry, restarting the for loop.
                           */
                          list_del(&entry->head);
                          list_add(&entry->head, list);
                  }
          }

          ww_acquire_done(ctx);
          return 0;
  }

Unlocking works the same way for both methods #1 and #2::

  void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
  {
          struct obj_entry *entry;

          list_for_each_entry (entry, list, head)
                  ww_mutex_unlock(&entry->obj->lock);

          ww_acquire_fini(ctx);
  }

Method 3 is useful if the list of objects is constructed ad-hoc and not upfront,
e.g. when adjusting edges in a graph where each node has its own ww_mutex lock,
and edges can only be changed when holding the locks of all involved nodes. w/w
mutexes are a natural fit for such a case for two reasons:

- They can handle lock-acquisition in any order which allows us to start walking
  a graph from a starting point and then iteratively discovering new edges and
  locking down the nodes those edges connect to.
- Due to the -EALREADY return code signalling that a given object is already
  held there's no need for additional book-keeping to break cycles in the graph
  or keep track of which locks are already held (when using more than one node
  as a starting point).

Note that this approach differs in two important ways from the above methods:

- Since the list of objects is dynamically constructed (and might very well be
  different when retrying due to hitting the -EDEADLK die condition) there's
  no need to keep any object on a persistent list when it's not locked. We can
  therefore move the list_head into the object itself.
- On the other hand the dynamic object list construction also means that the
  -EALREADY return code can't be propagated.

Note also that methods #1 and #2 can be combined with method #3, e.g. to first
lock a list of starting nodes (passed in from userspace) using one of the above
methods, and then lock any additional objects affected by the operations using
method #3 below. The backoff/retry procedure will be a bit more involved, since
when the dynamic locking step hits -EDEADLK we also need to unlock all the
objects acquired with the fixed list. But the w/w mutex debug checks will catch
any interface misuse for these cases.
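
A very rough skeleton of such a combined scheme might look as follows. The
helpers are hypothetical: lock_fixed_list() locks a caller-provided list as in
method #1 or #2 but on an already initialised context (dropping its own locks
on failure), lock_graph_objs() implements the dynamic step of method #3 but
drops its own locks and returns -EDEADLK instead of retrying internally, and
struct graph stands in for whatever describes the objects still to be locked.
Taking the contended lock with ww_mutex_lock_slow before retrying is left out
for brevity::

  int lock_everything(struct list_head *list, struct graph *graph,
                      struct ww_acquire_ctx *ctx)
  {
          int ret;

          ww_acquire_init(ctx, &ww_class);
  retry:
          ret = lock_fixed_list(list, ctx);       /* method #1 or #2 */
          if (ret)
                  goto out_fini;

          ret = lock_graph_objs(graph, ctx);      /* method #3 */
          if (ret == -EDEADLK) {
                  /* the dynamic step has already dropped its own locks;
                   * the fixed-list locks must be dropped as well before
                   * retrying the whole sequence */
                  unlock_fixed_list(list);
                  goto retry;
          }

          ww_acquire_done(ctx);
          return 0;

  out_fini:
          ww_acquire_fini(ctx);
          return ret;
  }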

Also, method 3 can't fail the lock acquisition step, since -EALREADY is handled
internally rather than returned to the caller. Of course this would be different
when using the _interruptible variants, but that's outside of the scope of
these examples here::

  struct obj {
          struct ww_mutex ww_mutex;
          struct list_head locked_list;
  };

  static DEFINE_WW_CLASS(ww_class);

  void __unlock_objs(struct list_head *list)
  {
          struct obj *entry, *temp;

          list_for_each_entry_safe (entry, temp, list, locked_list) {
                  /* need to do that before unlocking, since only the current
                   * lock holder is allowed to use object */
                  list_del(&entry->locked_list);
                  ww_mutex_unlock(&entry->ww_mutex);
          }
  }

  void lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
  {
          struct obj *obj;
          int ret;

          ww_acquire_init(ctx, &ww_class);

  retry:
          /* re-init loop start state */
          loop {
                  /* magic code which walks over a graph and decides which
                   * objects to lock */

                  ret = ww_mutex_lock(&obj->ww_mutex, ctx);
                  if (ret == -EALREADY) {
                          /* we have that one already, get to the next object */
                          continue;
                  }
                  if (ret == -EDEADLK) {
                          __unlock_objs(list);

                          ww_mutex_lock_slow(&obj->ww_mutex, ctx);
                          list_add(&obj->locked_list, list);
                          goto retry;
                  }

                  /* locked a new object, add it to the list */
                  list_add_tail(&obj->locked_list, list);
          }

          ww_acquire_done(ctx);
  }

  void unlock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
  {
          __unlock_objs(list);
          ww_acquire_fini(ctx);
  }

Method 4: Only lock a single object. In that case deadlock detection and
prevention is obviously overkill, since with grabbing just one lock you can't
produce a deadlock within just one class. To simplify this case the w/w mutex
api can be used with a NULL context.
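
For example (a minimal sketch, reusing the struct obj with a ww_mutex lock
member from the method #1/#2 examples)::

  static void update_obj(struct obj *obj)
  {
          int ret;

          /* With a NULL context this behaves exactly like a normal mutex
           * and cannot fail with -EDEADLK or -EALREADY. */
          ret = ww_mutex_lock(&obj->lock, NULL);
          WARN_ON(ret);

          /* ... modify obj ... */

          ww_mutex_unlock(&obj->lock);
  }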

Implementation Details
----------------------

Design:
^^^^^^^

ww_mutex currently encapsulates a struct mutex; this means no extra overhead for
normal mutex locks, which are far more common. As such there is only a small
increase in code size if wait/wound mutexes are not used.

We maintain the following invariants for the wait list:

(1) Waiters with an acquire context are sorted by stamp order; waiters
    without an acquire context are interspersed in FIFO order.
(2) For Wait-Die, among waiters with contexts, only the first one can have
    other locks acquired already (ctx->acquired > 0). Note that this waiter
    may come after other waiters without contexts in the list.

The Wound-Wait preemption is implemented with a lazy-preemption scheme:
The wounded status of the transaction is checked only when there is
contention for a new lock and hence a true chance of deadlock. In that
situation, if the transaction is wounded, it backs off, clears the
wounded status and retries. A great benefit of implementing preemption in
this way is that the wounded transaction can identify a contending lock to
wait for before restarting the transaction. Just blindly restarting the
transaction would likely make the transaction end up in a situation where
it would have to back off again.

In general, not much contention is expected. The locks are typically used to
serialize access to resources for devices, and optimization focus should
therefore be directed towards the uncontended cases.
Maarten Lankhorst | 040a0a3 | 2013-06-24 10:30:04 +0200 | [diff] [blame] | 363 | |
| 364 | Lockdep: |
Mauro Carvalho Chehab | 387b146 | 2019-04-10 08:32:41 -0300 | [diff] [blame] | 365 | ^^^^^^^^ |
| 366 | |
Maarten Lankhorst | 040a0a3 | 2013-06-24 10:30:04 +0200 | [diff] [blame] | 367 | Special care has been taken to warn for as many cases of api abuse |
| 368 | as possible. Some common api abuses will be caught with |
| 369 | CONFIG_DEBUG_MUTEXES, but CONFIG_PROVE_LOCKING is recommended. |
| 370 | |
| 371 | Some of the errors which will be warned about: |
| 372 | - Forgetting to call ww_acquire_fini or ww_acquire_init. |
| 373 | - Attempting to lock more mutexes after ww_acquire_done. |
| 374 | - Attempting to lock the wrong mutex after -EDEADLK and |
| 375 | unlocking all mutexes. |
| 376 | - Attempting to lock the right mutex after -EDEADLK, |
| 377 | before unlocking all mutexes. |
| 378 | |
| 379 | - Calling ww_mutex_lock_slow before -EDEADLK was returned. |
| 380 | |
| 381 | - Unlocking mutexes with the wrong unlock function. |
| 382 | - Calling one of the ww_acquire_* twice on the same context. |
| 383 | - Using a different ww_class for the mutex than for the ww_acquire_ctx. |
| 384 | - Normal lockdep errors that can result in deadlocks. |
| 385 | |
| 386 | Some of the lockdep errors that can result in deadlocks: |
| 387 | - Calling ww_acquire_init to initialize a second ww_acquire_ctx before |
| 388 | having called ww_acquire_fini on the first. |
| 389 | - 'normal' deadlocks that can occur. |
| 390 | |
Mauro Carvalho Chehab | 387b146 | 2019-04-10 08:32:41 -0300 | [diff] [blame] | 391 | FIXME: |
| 392 | Update this section once we have the TASK_DEADLOCK task state flag magic |
| 393 | implemented. |