.. SPDX-License-Identifier: GPL-2.0

=======================================
The padata parallel execution mechanism
=======================================

:Date: May 2020

Padata is a mechanism by which the kernel can farm jobs out to be done in
parallel on multiple CPUs while optionally retaining their ordering.

It was originally developed for IPsec, which needs to perform encryption and
decryption on large numbers of packets without reordering those packets. This
is currently the sole consumer of padata's serialized job support.

Padata also supports multithreaded jobs, splitting up the job evenly while load
balancing and coordinating between threads.

Running Serialized Jobs
=======================

Initializing
------------

The first step in using padata to run serialized jobs is to set up a
padata_instance structure for overall control of how jobs are to be run::

    #include <linux/padata.h>

    struct padata_instance *padata_alloc(const char *name);

'name' simply identifies the instance.

Then, complete padata initialization by allocating a padata_shell::

    struct padata_shell *padata_alloc_shell(struct padata_instance *pinst);

A padata_shell is used to submit a job to padata and allows a series of such
jobs to be serialized independently. A padata_instance may have one or more
padata_shells associated with it, each allowing a separate series of jobs.

Modifying cpumasks
------------------

The CPUs used to run jobs can be changed in two ways: programmatically with
padata_set_cpumask() or via sysfs. The former is defined::

    int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
                           cpumask_var_t cpumask);

Here cpumask_type is one of PADATA_CPU_PARALLEL or PADATA_CPU_SERIAL, where a
parallel cpumask describes which processors will be used to execute jobs
submitted to this instance in parallel and a serial cpumask defines which
processors are allowed to be used as the serialization callback processor.
cpumask specifies the new cpumask to use.

There may be sysfs files for an instance's cpumasks. For example, pcrypt's
live in /sys/kernel/pcrypt/<instance-name>. Within an instance's directory
there are two files, parallel_cpumask and serial_cpumask, and either cpumask
may be changed by echoing a bitmask into the file, for example::

    echo f > /sys/kernel/pcrypt/pencrypt/parallel_cpumask

Reading one of these files shows the user-supplied cpumask, which may be
different from the 'usable' cpumask.

Padata maintains two pairs of cpumasks internally, the user-supplied cpumasks
and the 'usable' cpumasks. (Each pair consists of a parallel and a serial
cpumask.) The user-supplied cpumasks default to all possible CPUs on instance
allocation and may be changed as above. The usable cpumasks are always a
subset of the user-supplied cpumasks and contain only the online CPUs in the
user-supplied masks; these are the cpumasks padata actually uses. So it is
legal to supply a cpumask to padata that contains offline CPUs. Once an
offline CPU in the user-supplied cpumask comes online, padata will use it.
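
The relationship between the user-supplied and usable cpumasks can be
illustrated with a small userspace model. This is only a sketch: a 64-bit word
stands in for a kernel cpumask (bit N is CPU N), and usable_cpumask() and
cpu_range_mask() are hypothetical helpers invented for this illustration, not
part of the padata API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model only: a 64-bit word stands in for a kernel cpumask,
 * with bit N representing CPU N.  Both helpers are hypothetical and do
 * not exist in padata.
 */

/* The usable mask holds the CPUs that are both requested and online. */
uint64_t usable_cpumask(uint64_t user_mask, uint64_t online_mask)
{
	return user_mask & online_mask;
}

/* Build a mask covering CPUs first..last inclusive (first <= last < 64). */
uint64_t cpu_range_mask(unsigned int first, unsigned int last)
{
	assert(first <= last && last < 64);

	uint64_t width = (uint64_t)last - first + 1;
	uint64_t bits = (width == 64) ? UINT64_MAX
				      : ((UINT64_C(1) << width) - 1);

	return bits << first;
}
```

In this model, cpu_range_mask(0, 3) yields the value f used in the sysfs
example above. With only CPUs 0 and 2 online the usable mask is 5; once CPU 3
comes online it becomes d, without the user-supplied mask changing.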

Changing the cpumasks is an expensive operation, so it should not be done
frequently.

Running A Job
-------------

Actually submitting work to the padata instance requires the creation of a
padata_priv structure, which represents one job::

    struct padata_priv {
        /* Other stuff here... */
        void (*parallel)(struct padata_priv *padata);
        void (*serial)(struct padata_priv *padata);
    };

This structure will almost certainly be embedded within some larger
structure specific to the work to be done. Most of its fields are private to
padata, but the structure should be zeroed at initialisation time, and the
parallel() and serial() functions should be provided. Those functions will
be called in the process of getting the work done, as described below.

The submission of the job is done with::

    int padata_do_parallel(struct padata_shell *ps,
                           struct padata_priv *padata, int *cb_cpu);

The ps and padata structures must be set up as described above. cb_cpu
points to the preferred CPU to be used for the final callback when the job is
done; it must be in the current instance's CPU mask (if it is not, the value
cb_cpu points to is updated to the CPU actually chosen). The return value from
padata_do_parallel() is zero on success, indicating that the job is in
progress. -EBUSY means that somebody, somewhere else is messing with the
instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being in the
serial cpumask, no online CPUs in the parallel or serial cpumasks, or a stopped
instance.

Each job submitted to padata_do_parallel() will, in turn, be passed to
exactly one call to the above-mentioned parallel() function, on one CPU, so
true parallelism is achieved by submitting multiple jobs. parallel() runs with
software interrupts disabled and thus cannot sleep. The parallel()
function gets the padata_priv structure pointer as its lone parameter;
information about the actual work to be done is probably obtained by using
container_of() to find the enclosing structure.
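
The embedding pattern described above can be sketched in userspace as follows.
This is an illustration under stated assumptions, not kernel code: the
padata_priv here is a cut-down stand-in holding only the two callbacks,
container_of() is reimplemented with offsetof(), and struct my_request,
my_request_init(), my_parallel() and my_serial() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in; the real padata_priv has further private fields. */
struct padata_priv {
	void (*parallel)(struct padata_priv *padata);
	void (*serial)(struct padata_priv *padata);
};

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical job type with the padata_priv embedded in it. */
struct my_request {
	int input;
	int result;
	int done;
	struct padata_priv padata;
};

void my_parallel(struct padata_priv *padata)
{
	struct my_request *req;

	assert(padata != NULL);
	/* Recover the enclosing request from the embedded member. */
	req = container_of(padata, struct my_request, padata);
	req->result = req->input * 2;	/* placeholder for the real work */
}

void my_serial(struct padata_priv *padata)
{
	struct my_request *req = container_of(padata, struct my_request, padata);

	req->done = 1;			/* placeholder for in-order completion */
}

void my_request_init(struct my_request *req, int input)
{
	memset(req, 0, sizeof(*req));	/* padata_priv must start out zeroed */
	req->input = input;
	req->padata.parallel = my_parallel;
	req->padata.serial = my_serial;
}
```

In a real user, the callbacks would be invoked by padata after a call to
padata_do_parallel() rather than called directly, and parallel() would have to
respect the no-sleeping constraint noted above.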

Note that parallel() has no return value; the padata subsystem assumes that
parallel() will take responsibility for the job from this point. The job
need not be completed during this call, but, if parallel() leaves work
outstanding, it should be prepared to be called again with a new job before
the previous one completes.

Serializing Jobs
----------------

When a job does complete, parallel() (or whatever function actually finishes
the work) should inform padata of the fact with a call to::

    void padata_do_serial(struct padata_priv *padata);

At some point in the future, padata_do_serial() will trigger a call to the
serial() function in the padata_priv structure. That call will happen on
the CPU requested in the initial call to padata_do_parallel(); it, too, is
run with local software interrupts disabled. Note that this call may be
deferred for a while since the padata code takes pains to ensure that jobs
are completed in the order in which they were submitted.
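
The ordering guarantee can be pictured with a toy model in which each job
receives a sequence number at submission and its serial() step is held back
until every earlier job has been serialized. The sketch below is an assumption
made for illustration: struct reorder_model and model_do_serial() are
hypothetical, run single-threaded in userspace, and do not reflect padata's
actual per-CPU data structures or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_JOBS 16

/* Hypothetical model of in-order completion; not padata's real layout. */
struct reorder_model {
	bool finished[MAX_JOBS];	/* parallel work done, not yet serialized */
	size_t next_seq;		/* next submission number to serialize */
	size_t serial_order[MAX_JOBS];	/* order in which serial steps ran */
	size_t nr_serialized;
};

/* Job 'seq' has finished its parallel work (compare padata_do_serial()). */
void model_do_serial(struct reorder_model *m, size_t seq)
{
	assert(seq < MAX_JOBS && seq >= m->next_seq && !m->finished[seq]);
	m->finished[seq] = true;

	/* Release every job whose predecessors have all been serialized. */
	while (m->next_seq < MAX_JOBS && m->finished[m->next_seq]) {
		m->serial_order[m->nr_serialized++] = m->next_seq;
		m->next_seq++;
	}
}
```

If job 2 finishes first in this model, nothing is serialized yet. Once jobs 0
and 1 finish, the recorded serial order is 0, 1, 2, matching submission order.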

Destroying
----------

Cleaning up a padata instance predictably involves calling the two free
functions that correspond to the allocation in reverse::

    void padata_free_shell(struct padata_shell *ps);
    void padata_free(struct padata_instance *pinst);

It is the user's responsibility to ensure all outstanding jobs are complete
before either of the above is called.

Running Multithreaded Jobs
==========================

A multithreaded job has a main thread and zero or more helper threads, with the
main thread participating in the job and then waiting until all helpers have
finished. padata splits the job into units called chunks, where a chunk is a
piece of the job that one thread completes in one call to the thread function.

A user has to do three things to run a multithreaded job. First, describe the
job by defining a padata_mt_job structure, which is explained in the Interface
section. This includes a pointer to the thread function, which padata will
call each time it assigns a job chunk to a thread. Then, define the thread
function, which accepts three arguments, ``start``, ``end``, and ``arg``, where
the first two delimit the range that the thread operates on and the last is a
pointer to the job's shared state, if any. Prepare the shared state, which is
typically allocated on the main thread's stack. Last, call
padata_do_multithreaded(), which will return once the job is finished.
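
How a job is cut into chunks and handed to the thread function can be sketched
with a single-threaded userspace model. Everything here is an assumption made
for illustration: struct mt_job_model is a simplified stand-in rather than the
real padata_mt_job, run_chunks() is a hypothetical replacement for
padata_do_multithreaded() that calls the thread function sequentially, the
range is treated as half-open, and the chunk size is passed in by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Thread function shape from the text: a range plus shared state. */
typedef void (*thread_fn_t)(unsigned long start, unsigned long end, void *arg);

/* Hypothetical, simplified job description used only by this model. */
struct mt_job_model {
	thread_fn_t thread_fn;
	void *fn_arg;
	unsigned long start;	/* first unit of work */
	unsigned long size;	/* number of units of work */
};

/* Walk the job one chunk at a time, as a lone thread would. */
void run_chunks(const struct mt_job_model *job, unsigned long chunk_size)
{
	unsigned long pos = job->start;
	unsigned long remaining = job->size;

	assert(chunk_size > 0);
	while (remaining > 0) {
		unsigned long len = remaining < chunk_size ? remaining
							   : chunk_size;

		/* Assumed half-open range: [pos, pos + len). */
		job->thread_fn(pos, pos + len, job->fn_arg);
		pos += len;
		remaining -= len;
	}
}

/* Example shared state and thread function: sum the indices in a range. */
struct sum_state {
	unsigned long total;
	unsigned long calls;
};

void sum_range(unsigned long start, unsigned long end, void *arg)
{
	struct sum_state *s = arg;

	for (unsigned long i = start; i < end; i++)
		s->total += i;
	s->calls++;
}
```

With real helper threads the chunks run concurrently, so shared state such as
the counters above would need atomic updates or locking; the model avoids this
only because it is sequential.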

Interface
=========

.. kernel-doc:: include/linux/padata.h
.. kernel-doc:: kernel/padata.c