=======================================
The padata parallel execution mechanism
=======================================

:Last updated: for 2.6.36

Padata is a mechanism by which the kernel can farm work out to be done in
parallel on multiple CPUs while retaining the ordering of tasks. It was
developed for use with the IPsec code, which needs to be able to perform
encryption and decryption on large numbers of packets without reordering
those packets. The crypto developers made a point of writing padata in a
sufficiently general fashion that it could be put to other uses as well.

The first step in using padata is to set up a padata_instance structure for
overall control of how tasks are to be run::

    #include <linux/padata.h>

    struct padata_instance *padata_alloc(const char *name,
                                         const struct cpumask *pcpumask,
                                         const struct cpumask *cbcpumask);

'name' simply identifies the instance.

The pcpumask describes which processors will be used to execute work
submitted to this instance in parallel. The cbcpumask defines which
processors are allowed to be used as the serialization callback processor.
The work will actually be done on a workqueue; it should be a multithreaded
queue, naturally.

To allocate a padata instance with the cpu_possible_mask for both
cpumasks, this helper function can be used::

    struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq);

Note: Padata maintains two kinds of cpumasks internally: the user-supplied
cpumasks, submitted by padata_alloc/padata_alloc_possible, and the 'usable'
cpumasks. The usable cpumasks are always a subset of active CPUs in the
user-supplied cpumasks; these are the cpumasks padata actually uses. So
it is legal to supply a cpumask to padata that contains offline CPUs.
Once an offline CPU in the user-supplied cpumask comes online, padata
will use it.
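
The relationship between the two kinds of cpumasks can be sketched in
userspace C. This is only a toy model under stated assumptions: CPUs are bits
in a 64-bit word rather than a struct cpumask, "active" is modelled as a plain
online bitmap, and toy_cpumask_t, toy_usable_mask() and toy_cpu_up() are
hypothetical names that are not part of the padata API.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: one bit per CPU. This is NOT the kernel's struct cpumask. */
typedef uint64_t toy_cpumask_t;

/*
 * The usable mask is the part of the user-supplied mask that is
 * currently active (modelled here as "online").
 */
static toy_cpumask_t toy_usable_mask(toy_cpumask_t user, toy_cpumask_t online)
{
    return user & online;
}

/* Mark one CPU as online and return the updated online mask. */
static toy_cpumask_t toy_cpu_up(toy_cpumask_t online, unsigned int cpu)
{
    assert(cpu < 64);
    return online | ((toy_cpumask_t)1 << cpu);
}
```

With a user-supplied mask covering CPUs 0-3 and only CPUs 0-1 online, the
usable mask covers CPUs 0-1; once CPU 2 comes online the usable mask grows to
CPUs 0-2 without any change to the user-supplied mask.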

There are functions for enabling and disabling the instance::

    int padata_start(struct padata_instance *pinst);
    void padata_stop(struct padata_instance *pinst);

These functions set or clear the "PADATA_INIT" flag; if that flag is not
set, other functions will refuse to work. padata_start returns zero on
success (flag set) or -EINVAL if the padata cpumask contains no active CPU
(flag not set). padata_stop clears the flag and blocks until the padata
instance is unused.
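
The flag semantics described above can be modelled in a few lines of
userspace C. This is a hedged sketch, not padata's implementation:
struct toy_instance, TOY_PADATA_INIT and the toy_*() functions are
hypothetical stand-ins, and the blocking behaviour of padata_stop is
deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PADATA_INIT 0x1u    /* stand-in for the PADATA_INIT flag */

struct toy_instance {
    unsigned int flags;
    uint64_t usable_cpus;       /* one bit per active CPU in the cpumask */
};

/* Set the flag, unless the cpumask contains no active CPU. */
static int toy_start(struct toy_instance *inst)
{
    assert(inst != NULL);
    if (inst->usable_cpus == 0)
        return -EINVAL;
    inst->flags |= TOY_PADATA_INIT;
    return 0;
}

/*
 * Clear the flag. The real padata_stop() also blocks until the
 * instance is unused; this model omits that step.
 */
static void toy_stop(struct toy_instance *inst)
{
    assert(inst != NULL);
    inst->flags &= ~TOY_PADATA_INIT;
}

/* Other operations refuse to work while the flag is clear. */
static int toy_submit(const struct toy_instance *inst)
{
    assert(inst != NULL);
    return (inst->flags & TOY_PADATA_INIT) ? 0 : -EINVAL;
}
```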

The list of CPUs to be used can be adjusted with these functions::

    int padata_set_cpumasks(struct padata_instance *pinst,
                            cpumask_var_t pcpumask,
                            cpumask_var_t cbcpumask);
    int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
                           cpumask_var_t cpumask);
    int padata_add_cpu(struct padata_instance *pinst, int cpu, int mask);
    int padata_remove_cpu(struct padata_instance *pinst, int cpu, int mask);

Changing the CPU masks is an expensive operation, though, so it should not be
done with great frequency.

It's possible to change both cpumasks of a padata instance with
padata_set_cpumasks by specifying the cpumasks for parallel execution (pcpumask)
and for the serial callback function (cbcpumask). padata_set_cpumask is used to
change just one of the cpumasks. Here cpumask_type is either PADATA_CPU_SERIAL
or PADATA_CPU_PARALLEL, and cpumask specifies the new cpumask to use.
To simply add or remove one CPU from a certain cpumask, the functions
padata_add_cpu/padata_remove_cpu are used. cpu specifies the CPU to add or
remove, and mask is either PADATA_CPU_SERIAL or PADATA_CPU_PARALLEL.

Users interested in padata cpumask changes can register with
the padata cpumask change notifier::

    int padata_register_cpumask_notifier(struct padata_instance *pinst,
                                         struct notifier_block *nblock);

To unregister from that notifier::

    int padata_unregister_cpumask_notifier(struct padata_instance *pinst,
                                           struct notifier_block *nblock);

The padata cpumask change notifier notifies about changes of the usable
cpumasks, i.e. the subset of active CPUs in the user-supplied cpumask.

Padata calls the notifier chain with::

    blocking_notifier_call_chain(&pinst->cpumask_change_notifier,
                                 notification_mask,
                                 &pd_new->cpumask);

Here cpumask_change_notifier is the registered notifier, notification_mask
is either PADATA_CPU_SERIAL or PADATA_CPU_PARALLEL, and cpumask is a pointer
to a struct padata_cpumask that contains the new cpumask information.

Actually submitting work to the padata instance requires the creation of a
padata_priv structure::

    struct padata_priv {
        /* Other stuff here... */
        void (*parallel)(struct padata_priv *padata);
        void (*serial)(struct padata_priv *padata);
    };

This structure will almost certainly be embedded within some larger
structure specific to the work to be done. Most of its fields are private to
padata, but the structure should be zeroed at initialisation time, and the
parallel() and serial() functions should be provided. Those functions will
be called in the process of getting the work done as we will see
momentarily.

The submission of work is done with::

    int padata_do_parallel(struct padata_instance *pinst,
                           struct padata_priv *padata, int cb_cpu);

The pinst and padata structures must be set up as described above; cb_cpu
specifies which CPU will be used for the final callback when the work is
done; it must be in the current instance's CPU mask. The return value from
padata_do_parallel() is zero on success, indicating that the work is in
progress. -EBUSY means that somebody, somewhere else is messing with the
instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being
in that CPU mask or about the instance not running.
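
A caller typically has to react differently to each of these return values.
The following userspace sketch only classifies the codes named above; the
enum and toy_classify_submit() are hypothetical helpers invented for
illustration, and what a real caller does in each case (retry, fall back to
doing the work itself, report an error) is a design decision that padata
does not dictate.

```c
#include <assert.h>
#include <errno.h>

enum toy_submit_action {
    TOY_SUBMIT_OK,          /* work is in progress; wait for serial() */
    TOY_SUBMIT_RETRY,       /* CPU mask is being changed elsewhere */
    TOY_SUBMIT_BAD_SETUP,   /* cb_cpu not in the mask, or instance not running */
    TOY_SUBMIT_UNKNOWN,     /* a code the text above does not describe */
};

/* Map a padata_do_parallel()-style return value to a caller action. */
static enum toy_submit_action toy_classify_submit(int err)
{
    /* Kernel-style return values are zero or a negative errno. */
    assert(err <= 0);

    switch (err) {
    case 0:
        return TOY_SUBMIT_OK;
    case -EBUSY:
        return TOY_SUBMIT_RETRY;
    case -EINVAL:
        return TOY_SUBMIT_BAD_SETUP;
    default:
        return TOY_SUBMIT_UNKNOWN;
    }
}
```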

Each task submitted to padata_do_parallel() will, in turn, be passed to
exactly one call to the above-mentioned parallel() function, on one CPU, so
true parallelism is achieved by submitting multiple tasks. parallel() runs with
software interrupts disabled and thus cannot sleep. The parallel()
function gets the padata_priv structure pointer as its lone parameter;
information about the actual work to be done is probably obtained by using
container_of() to find the enclosing structure.
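
The embedding pattern can be tried out in userspace. In this sketch the
struct padata_priv is a cut-down mock containing only the two callbacks shown
earlier, toy_container_of() is a simplified version of the kernel macro, and
struct square_job with square_parallel() is a hypothetical job type invented
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace mock: only the two callbacks from the text are modelled. */
struct padata_priv {
    void (*parallel)(struct padata_priv *padata);
    void (*serial)(struct padata_priv *padata);
};

/* Simplified container_of(); the kernel macro adds type checking. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical job: the work-specific data wraps the padata_priv. */
struct square_job {
    int input;
    int result;
    struct padata_priv padata;
};

/* parallel() callback: recover the enclosing job and do the work. */
static void square_parallel(struct padata_priv *padata)
{
    struct square_job *job;

    assert(padata != NULL);
    job = toy_container_of(padata, struct square_job, padata);
    job->result = job->input * job->input;
    /* Kernel code would now report completion via padata_do_serial(). */
}
```

Because the callback receives only the padata_priv pointer, all job-specific
state has to live in the enclosing structure, which is why the embedded
member is the natural place to hang the work description.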

Note that parallel() has no return value; the padata subsystem assumes that
parallel() will take responsibility for the task from this point. The work
need not be completed during this call, but, if parallel() leaves work
outstanding, it should be prepared to be called again with a new job before
the previous one completes. When a task does complete, parallel() (or
whatever function actually finishes the job) should inform padata of the
fact with a call to::

    void padata_do_serial(struct padata_priv *padata);

At some point in the future, padata_do_serial() will trigger a call to the
serial() function in the padata_priv structure. That call will happen on
the CPU requested in the initial call to padata_do_parallel(); it, too, is
run with local software interrupts disabled. Note that this call may be
deferred for a while since the padata code takes pains to ensure that tasks
are completed in the order in which they were submitted.
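
Why the serial() call may be deferred is easiest to see in a small model.
The sketch below is a generic reorder buffer written for illustration and is
not how padata is implemented internally: tasks are identified by their
submission sequence number, and struct toy_reorder and toy_complete() are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_TASKS 16

struct toy_reorder {
    bool done[TOY_MAX_TASKS];   /* done[i]: task i finished its parallel part */
    size_t next;                /* next sequence number allowed to serialize */
    size_t out[TOY_MAX_TASKS];  /* order in which serial callbacks "ran" */
    size_t nout;                /* number of entries used in out[] */
};

/*
 * Record that task seq has finished, then release every task that is now
 * unblocked, strictly in submission order. Returns how many were released.
 */
static size_t toy_complete(struct toy_reorder *r, size_t seq)
{
    size_t released = 0;

    assert(seq < TOY_MAX_TASKS && !r->done[seq]);
    r->done[seq] = true;

    while (r->next < TOY_MAX_TASKS && r->done[r->next]) {
        r->out[r->nout++] = r->next++;
        released++;
    }
    return released;
}
```

If tasks 2 and 1 finish before task 0, nothing is released until task 0
completes, at which point all three are serialized in the order 0, 1, 2.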

The one remaining function in the padata API should be called to clean up
when a padata instance is no longer needed::

    void padata_free(struct padata_instance *pinst);

This function will busy-wait while any remaining tasks are completed, so it
might be best not to call it while there is work outstanding.