The padata parallel execution mechanism
Last updated for 2.6.34

Padata is a mechanism by which the kernel can farm work out to be done in
parallel on multiple CPUs while retaining the ordering of tasks.  It was
developed for use with the IPsec code, which needs to be able to perform
encryption and decryption on large numbers of packets without reordering
those packets.  The crypto developers made a point of writing padata in a
sufficiently general fashion that it could be put to other uses as well.

The first step in using padata is to set up a padata_instance structure for
overall control of how tasks are to be run:

	#include <linux/padata.h>

	struct padata_instance *padata_alloc(const struct cpumask *cpumask,
					     struct workqueue_struct *wq);

The cpumask describes which processors will be used to execute work
submitted to this instance.  The workqueue wq is where the work will
actually be done; it should be a multithreaded queue, naturally.

There are functions for enabling and disabling the instance:

	int padata_start(struct padata_instance *pinst);
	void padata_stop(struct padata_instance *pinst);

These functions set or clear the "PADATA_INIT" flag; if that flag is not
set, other functions will refuse to work.  padata_start() returns zero on
success (flag set) or -EINVAL if the padata cpumask contains no active CPU
(flag not set).  padata_stop() clears the flag and blocks until the padata
instance is unused.

The list of CPUs to be used can be adjusted with these functions:

	int padata_set_cpumask(struct padata_instance *pinst,
			       cpumask_var_t cpumask);
	int padata_add_cpu(struct padata_instance *pinst, int cpu);
	int padata_remove_cpu(struct padata_instance *pinst, int cpu);

Changing the CPU mask appears to be an expensive operation, though, so it
probably should not be done frequently.

Actually submitting work to the padata instance requires the creation of a
padata_priv structure:

	struct padata_priv {
		/* Other stuff here... */
		void (*parallel)(struct padata_priv *padata);
		void (*serial)(struct padata_priv *padata);
	};

This structure will almost certainly be embedded within some larger
structure specific to the work to be done.  Most of its fields are private
to padata, but the structure should be zeroed at initialization time, and
the parallel() and serial() functions should be provided.  Those functions
will be called in the process of getting the work done, as we will see
momentarily.

The submission of work is done with:

	int padata_do_parallel(struct padata_instance *pinst,
			       struct padata_priv *padata, int cb_cpu);

The pinst and padata structures must be set up as described above.  cb_cpu
specifies which CPU will be used for the final callback when the work is
done; it must be in the current instance's CPU mask.  The return value from
padata_do_parallel() is zero on success, indicating that the work is in
progress.  -EBUSY means that somebody, somewhere else is messing with the
instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being
in that CPU mask or about the instance not running.
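
The three return values described above suggest three different reactions
from a caller.  The following is a minimal sketch of one possible policy,
not something padata itself provides: the enum and the
classify_submit_result() helper are hypothetical names invented for this
illustration, and treating -EBUSY as "retry later" is an assumption based
only on the description of that error as a concurrent CPU mask change.

```c
#include <errno.h>

/* Hypothetical caller-side classification of padata_do_parallel() results. */
enum submit_action {
	SUBMIT_IN_PROGRESS,	/* 0: the work has been queued */
	SUBMIT_RETRY_LATER,	/* -EBUSY: the CPU mask is being changed */
	SUBMIT_CONFIG_ERROR,	/* -EINVAL: bad cb_cpu or instance not running */
	SUBMIT_UNEXPECTED	/* anything else; not described in this document */
};

/* Map a return value from padata_do_parallel() to a caller decision. */
static enum submit_action classify_submit_result(int err)
{
	switch (err) {
	case 0:
		return SUBMIT_IN_PROGRESS;
	case -EBUSY:
		return SUBMIT_RETRY_LATER;
	case -EINVAL:
		return SUBMIT_CONFIG_ERROR;
	default:
		return SUBMIT_UNEXPECTED;
	}
}
```

A caller would pass the value returned by padata_do_parallel() straight to
this helper and then decide whether to wait, resubmit, or report an error.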

Each task submitted to padata_do_parallel() will, in turn, be passed to
exactly one call to the above-mentioned parallel() function, on one CPU, so
true parallelism is achieved by submitting multiple tasks.  Despite the
fact that the workqueue is used to make these calls, parallel() is run with
software interrupts disabled and thus cannot sleep.  The parallel()
function gets the padata_priv structure pointer as its lone parameter;
information about the actual work to be done is probably obtained by using
container_of() to find the enclosing structure.
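
The embedding and container_of() pattern can be tried out in user space.
The sketch below is only an illustration: struct padata_priv here is a
two-field stand-in for the kernel structure, container_of() is a simplified
user-space version of the kernel macro, and struct my_job, my_parallel()
and my_job_run() are hypothetical names.  In the kernel the callback would
be invoked by padata after padata_do_parallel(), not called directly.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel structure; only the two callbacks are modelled. */
struct padata_priv {
	void (*parallel)(struct padata_priv *padata);
	void (*serial)(struct padata_priv *padata);
};

/* Simplified user-space version of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical work item with the padata_priv structure embedded in it. */
struct my_job {
	int input;
	int result;
	struct padata_priv padata;
};

/* Recover the enclosing job from the padata_priv pointer, then do the work. */
static void my_parallel(struct padata_priv *padata)
{
	struct my_job *job = container_of(padata, struct my_job, padata);

	job->result = job->input * 2;
}

/* Set up a job and invoke the callback directly, standing in for padata. */
static int my_job_run(int input)
{
	struct my_job job;

	memset(&job, 0, sizeof(job));	/* zero everything, padata included */
	job.input = input;
	job.padata.parallel = my_parallel;

	job.padata.parallel(&job.padata);
	return job.result;
}
```

Because the callback receives only the padata_priv pointer, everything it
needs must be reachable from the enclosing structure; here that is the
input and result fields of struct my_job.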

Note that parallel() has no return value; the padata subsystem assumes that
parallel() will take responsibility for the task from this point.  The work
need not be completed during this call, but, if parallel() leaves work
outstanding, it should be prepared to be called again with a new job before
the previous one completes.  When a task does complete, parallel() (or
whatever function actually finishes the job) should inform padata of the
fact with a call to:

	void padata_do_serial(struct padata_priv *padata);

At some point in the future, padata_do_serial() will trigger a call to the
serial() function in the padata_priv structure.  That call will happen on
the CPU requested in the initial call to padata_do_parallel(); it, too, is
done through the workqueue, but with local software interrupts disabled.
Note that this call may be deferred for a while since the padata code takes
pains to ensure that tasks are completed in the order in which they were
submitted.
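
The reason a serial() call may be deferred can be seen in a toy model.  The
sketch below is a single-threaded user-space illustration of the ordering
guarantee only; it is not how the kernel implements it, and struct
toy_reorder and toy_do_serial() are hypothetical names.  Tasks are
identified by the sequence number they received at submission, and a
finished task is "serialized" only once every earlier task has finished.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_TASKS 8

struct toy_reorder {
	bool done[TOY_MAX_TASKS];	/* done[i]: task i finished its parallel part */
	int next;			/* next sequence number allowed to serialize */
	int order[TOY_MAX_TASKS];	/* sequence numbers in the order serialized */
	int nserial;			/* number of entries used in order[] */
};

/*
 * Toy analogue of padata_do_serial(): record that task seq has finished,
 * then serialize every task that is now at the head of the queue.
 */
static void toy_do_serial(struct toy_reorder *r, int seq)
{
	assert(seq >= 0 && seq < TOY_MAX_TASKS);
	r->done[seq] = true;

	while (r->next < TOY_MAX_TASKS && r->done[r->next]) {
		r->order[r->nserial++] = r->next;
		r->next++;
	}
}
```

Starting from a zeroed struct toy_reorder, reporting task 2 as finished
serializes nothing because tasks 0 and 1 are still outstanding.  Reporting
task 0 serializes only task 0, and reporting task 1 then releases both 1
and 2, so order[] ends up as 0, 1, 2 regardless of completion order.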

The one remaining function in the padata API should be called to clean up
when a padata instance is no longer needed:

	void padata_free(struct padata_instance *pinst);

This function will busy-wait while any remaining tasks are completed, so it
might be best not to call it while there is work outstanding.  Shutting
down the workqueue, if necessary, should be done separately.