			     ====================================
			     SLOW WORK ITEM EXECUTION THREAD POOL
			     ====================================

By: David Howells <dhowells@redhat.com>

The slow work item execution thread pool is a pool of threads for performing
things that take a relatively long time, such as making mkdir calls.
Typically, when processing something, these items will spend a lot of time
blocking a thread on I/O, thus making that thread unavailable for doing other
work.

The standard workqueue model is unsuitable for this class of work item as that
limits the owner to a single thread or a single thread per CPU. For some
tasks, however, more threads - or fewer - are required.

There is just one pool per system. It contains no threads unless something
wants to use it - and that something must register its interest first. When
the pool is active, the number of threads it contains is dynamic, varying
between a maximum and minimum setting, depending on the load.


====================
CLASSES OF WORK ITEM
====================

This pool supports two classes of work items:

 (*) Slow work items.

 (*) Very slow work items.

The former are expected to finish much more quickly than the latter.

An operation of the very slow class may do a batch combination of several
lookups, mkdirs, and a create for instance.

An operation of the ordinarily slow class may, for example, write data or
expand files, provided the time taken to do so isn't too long.

Operations of both types may sleep during execution, thus tying up the threads
loaned to them.

A further class of work item is available, based on the slow work item class:

 (*) Delayed slow work items.

These are slow work items that have a timer to defer queueing of the item for
a while.


THREAD-TO-CLASS ALLOCATION
--------------------------

Not all the threads in the pool are available to work on very slow work items.
The number will be between one and one fewer than the number of active threads.
This is configurable (see the "Pool Configuration" section).

All the threads are available to work on ordinarily slow work items, but a
percentage of the threads will prefer to work on very slow work items.

The configuration ensures that at least one thread will be available to work on
very slow work items, and at least one thread will be available that won't work
on very slow work items at all.


=====================
USING SLOW WORK ITEMS
=====================

Firstly, a module or subsystem wanting to make use of slow work items must
register its interest:

	int ret = slow_work_register_user(struct module *module);

This will return 0 if successful, or a -ve error upon failure. The module
pointer should be the module interested in using this facility (almost
certainly THIS_MODULE).


Slow work items may then be set up by:

 (1) Declaring a slow_work struct type variable:

	#include <linux/slow-work.h>

	struct slow_work myitem;

 (2) Declaring the operations to be used for this item:

	struct slow_work_ops myitem_ops = {
		.get_ref = myitem_get_ref,
		.put_ref = myitem_put_ref,
		.execute = myitem_execute,
	};

     [*] For a description of the ops, see section "Item Operations".

 (3) Initialising the item:

	slow_work_init(&myitem, &myitem_ops);

     or:

	delayed_slow_work_init(&myitem, &myitem_ops);

     or:

	vslow_work_init(&myitem, &myitem_ops);

     depending on its class.

A suitably set up work item can then be enqueued for processing:

	int ret = slow_work_enqueue(&myitem);

This will return a -ve error if the thread pool is unable to gain a reference
on the item, 0 otherwise. Delayed work is enqueued with a delay in jiffies
instead:

	int ret = delayed_slow_work_enqueue(&myitem, my_jiffy_delay);


The items are reference counted, so there ought to be no need for a flush
operation. But as the reference counting is optional, means to cancel
existing work items are also provided:

	cancel_slow_work(&myitem);
	cancel_delayed_slow_work(&myitem);

These cancel pending work. Depending on timing, a cancel function either
prevents the item from being executed or waits for an execution that has
already started to finish.


When all a module's slow work items have been processed, and the
module has no further interest in the facility, it should unregister its
interest:

	slow_work_unregister_user(struct module *module);

The module pointer is used to wait for all outstanding work items for that
module before completing the unregistration. This prevents the put_ref() code
from being taken away before it completes. The module pointer should almost
certainly be THIS_MODULE.


===============
ITEM OPERATIONS
===============

Each work item requires a table of operations of type struct slow_work_ops.
Only ->execute() is required; the getting and putting of a reference and the
describing of an item are all optional.

 (*) Get a reference on an item:

	int (*get_ref)(struct slow_work *work);

     This allows the thread pool to attempt to pin an item by getting a
     reference on it. This function should return 0 if the reference was
     granted, or a -ve error otherwise. If an error is returned,
     slow_work_enqueue() will fail.

     The reference is held whilst the item is queued and whilst it is being
     executed. The item may then be requeued with the same reference held, or
     the reference will be released.

 (*) Release a reference on an item:

	void (*put_ref)(struct slow_work *work);

     This allows the thread pool to unpin an item by releasing the reference on
     it. The thread pool will not touch the item again once this has been
     called.

 (*) Execute an item:

	void (*execute)(struct slow_work *work);

     This should perform the work required of the item. It may sleep, it may
     perform disk I/O and it may wait for locks.

 (*) View an item through /proc:

	void (*desc)(struct slow_work *work, struct seq_file *m);

     If supplied, this should print to 'm' a small string describing the work
     the item is to do. This should be no more than about 40 characters, and
     shouldn't include a newline character.

     See the 'Viewing executing and queued items' section below.


==================
POOL CONFIGURATION
==================

The slow-work thread pool has a number of configurables:

 (*) /proc/sys/kernel/slow-work/min-threads

     The minimum number of threads that should be in the pool whilst it is in
     use. This may be anywhere between 2 and max-threads.

 (*) /proc/sys/kernel/slow-work/max-threads

     The maximum number of threads that should be in the pool. This may be
     anywhere between min-threads and 255 or NR_CPUS * 2, whichever is greater.

 (*) /proc/sys/kernel/slow-work/vslow-percentage

     The percentage of active threads in the pool that may be used to execute
     very slow work items. This may be between 1 and 99. The resultant number
     is bounded to between 1 and one fewer than the number of active threads.
     This ensures there is always at least one thread that can process very
     slow work items, and always at least one thread that won't.


==================================
VIEWING EXECUTING AND QUEUED ITEMS
==================================

If CONFIG_SLOW_WORK_PROC is enabled, a proc file is made available:

	/proc/slow_work_rq

through which the list of work items being executed and the queues of items to
be executed may be viewed. The owner of a work item is given the chance to
add some information of its own.

The contents look something like the following:

	THR PID   ITEM ADDR        FL MARK  DESC
	=== ===== ================ == ===== ==========
	  0  3005 ffff880023f52348  a 952ms FSC: OBJ17d3: LOOK
	  1  3006 ffff880024e33668  2 160ms FSC: OBJ17e5 OP60d3b: Write1/Store fl=2
	  2  3165 ffff8800296dd180  a 424ms FSC: OBJ17e4: LOOK
	  3  4089 ffff8800262c8d78  a 212ms FSC: OBJ17ea: CRTN
	  4  4090 ffff88002792bed8  2 388ms FSC: OBJ17e8 OP60d36: Write1/Store fl=2
	  5  4092 ffff88002a0ef308  2 388ms FSC: OBJ17e7 OP60d2e: Write1/Store fl=2
	  6  4094 ffff88002abaf4b8  2 132ms FSC: OBJ17e2 OP60d4e: Write1/Store fl=2
	  7  4095 ffff88002bb188e0  a 388ms FSC: OBJ17e9: CRTN
	vsq     - ffff880023d99668  1 308ms FSC: OBJ17e0 OP60f91: Write1/EnQ fl=2
	vsq     - ffff8800295d1740  1 212ms FSC: OBJ16be OP4d4b6: Write1/EnQ fl=2
	vsq     - ffff880025ba3308  1 160ms FSC: OBJ179a OP58dec: Write1/EnQ fl=2
	vsq     - ffff880024ec83e0  1 160ms FSC: OBJ17ae OP599f2: Write1/EnQ fl=2
	vsq     - ffff880026618e00  1 160ms FSC: OBJ17e6 OP60d33: Write1/EnQ fl=2
	vsq     - ffff880025a2a4b8  1 132ms FSC: OBJ16a2 OP4d583: Write1/EnQ fl=2
	vsq     - ffff880023cbe6d8  9 212ms FSC: OBJ17eb: LOOK
	vsq     - ffff880024d37590  9 212ms FSC: OBJ17ec: LOOK
	vsq     - ffff880027746cb0  9 212ms FSC: OBJ17ed: LOOK
	vsq     - ffff880024d37ae8  9 212ms FSC: OBJ17ee: LOOK
	vsq     - ffff880024d37cb0  9 212ms FSC: OBJ17ef: LOOK
	vsq     - ffff880025036550  9 212ms FSC: OBJ17f0: LOOK
	vsq     - ffff8800250368e0  9 212ms FSC: OBJ17f1: LOOK
	vsq     - ffff880025036aa8  9 212ms FSC: OBJ17f2: LOOK

In the 'THR' column, executing items show the thread they're occupying and
queued items indicate which queue they're on. 'PID' shows the process ID of
a slow-work thread that's executing something. 'FL' shows the work item flags.
'MARK' indicates how long it has been since an item was queued or began
executing. Lastly, the 'DESC' column permits the owner of an item to give some
information.