     CPU frequency and voltage scaling code in the Linux(TM) kernel


                       L i n u x    C P U F r e q

                          C P U   D r i v e r s

                     - information for developers -


                  Dominik Brodowski  <linux@brodo.de>
             Rafael J. Wysocki <rafael.j.wysocki@intel.com>
                 Viresh Kumar <viresh.kumar@linaro.org>



Clock scaling allows you to change the clock speed of the CPUs on the
fly. This is a nice method to save battery power, because the lower
the clock speed, the less power the CPU consumes.


Contents:
---------
1. What To Do?
1.1 Initialization
1.2 Per-CPU Initialization
1.3 verify
1.4 target or target_index or setpolicy or fast_switch?
1.5 target/target_index
1.6 fast_switch
1.7 setpolicy
1.8 get_intermediate and target_intermediate
2. Frequency Table Helpers




1. What To Do?
==============

So, you just got a brand-new CPU / chipset with datasheets and want to
add cpufreq support for this CPU / chipset? Great. Here are some hints
on what is necessary:


1.1 Initialization
------------------

First of all, in a module_init() (initcall level 6, i.e. device_initcall())
or later function, check whether this kernel runs on the right CPU and the
right chipset. If so, register a struct cpufreq_driver with the CPUfreq
core using cpufreq_register_driver().

What shall this struct cpufreq_driver contain?

	.name - The name of this driver.

	.init - A pointer to the per-policy initialization function.

	.verify - A pointer to a "verification" function.

	.setpolicy _or_ .fast_switch _or_ .target _or_ .target_index - See
	below on the differences.

And optionally:

	.flags - Hints for the cpufreq core.

	.driver_data - cpufreq driver-specific data.

	.resolve_freq - Returns the most appropriate frequency for a target
	frequency. Doesn't change the frequency, though.

	.get_intermediate and .target_intermediate - Used to switch to a
	stable frequency while changing the CPU frequency.

	.get - Returns the current frequency of the CPU.

	.bios_limit - Returns HW/BIOS max frequency limitations for the CPU.

	.exit - A pointer to a per-policy cleanup function called during the
	CPU_POST_DEAD phase of the CPU hotplug process.

	.stop_cpu - A pointer to a per-policy stop function called during the
	CPU_DOWN_PREPARE phase of the CPU hotplug process.

	.suspend - A pointer to a per-policy suspend function which is called
	with interrupts disabled and _after_ the governor is stopped for the
	policy.

	.resume - A pointer to a per-policy resume function which is called
	with interrupts disabled and _before_ the governor is started again.

	.ready - A pointer to a per-policy ready function which is called after
	the policy is fully initialized.

	.attr - A pointer to a NULL-terminated list of "struct freq_attr" which
	allow exporting values to sysfs.

	.boost_enabled - If set, boost frequencies are enabled.

	.set_boost - A pointer to a per-policy function to enable/disable boost
	frequencies.


1.2 Per-CPU Initialization
--------------------------

Whenever a new CPU is registered with the device model, or after the
cpufreq driver registers itself, the per-policy initialization function
cpufreq_driver.init is called if no cpufreq policy exists for the CPU.
Note that the .init() and .exit() routines are called only once for the
policy and not for each CPU managed by the policy. The .init() routine
takes a struct cpufreq_policy *policy as argument. What to do now?

If necessary, activate the CPUfreq support on your CPU.

Then, the driver must fill in the following values:

policy->cpuinfo.min_freq _and_
policy->cpuinfo.max_freq            the minimum and maximum frequency
                                    (in kHz) supported by this CPU
policy->cpuinfo.transition_latency  the time it takes on this CPU to
                                    switch between two frequencies, in
                                    nanoseconds (if appropriate, else
                                    specify CPUFREQ_ETERNAL)

policy->cur                         the current operating frequency of
                                    this CPU (if appropriate)

policy->min,
policy->max,
policy->policy and, if necessary,
policy->governor                    must contain the "default policy" for
                                    this CPU. A few moments later,
                                    cpufreq_driver.verify and either
                                    cpufreq_driver.setpolicy or
                                    cpufreq_driver.target/target_index is
                                    called with these values.

policy->cpus                        update this with the mask of the
                                    (online + offline) CPUs that do DVFS
                                    along with this CPU (i.e. that share
                                    clock/voltage rails with it).

For setting some of these values (cpuinfo.min[max]_freq, policy->min[max]),
the frequency table helpers might be helpful. See section 2 for more
information on them.


1.3 verify
----------

When the user decides that a new policy (consisting of
"policy,governor,min,max") shall be set, this policy must be validated
so that incompatible values can be corrected. For verifying these
values, the cpufreq_verify_within_limits(struct cpufreq_policy *policy,
unsigned int min_freq, unsigned int max_freq) function might be helpful.
See section 2 for details on frequency table helpers.

You need to make sure that at least one valid frequency (or operating
range) is within policy->min and policy->max. If necessary, increase
policy->max first, and only if that does not help, decrease policy->min.
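The rule above can be sketched as a small userspace model. Everything here is hypothetical: struct demo_policy only mirrors the four fields involved, the driver's supported frequencies are a plain array assumed to lie within the cpuinfo limits, and this is not the kernel's cpufreq_frequency_table_verify(). It clamps the request to the hardware limits, then raises max to the next higher supported frequency and, only if none exists, lowers min.

```c
#include <stddef.h>

/*
 * Hypothetical userspace model of the fields verify cares about; this is
 * not the kernel's struct cpufreq_policy. All values are in kHz.
 */
struct demo_policy {
	unsigned int cpuinfo_min;	/* hardware lower limit */
	unsigned int cpuinfo_max;	/* hardware upper limit */
	unsigned int min;		/* requested lower limit */
	unsigned int max;		/* requested upper limit */
};

static unsigned int demo_clamp(unsigned int v, unsigned int lo, unsigned int hi)
{
	if (v < lo)
		return lo;
	return v > hi ? hi : v;
}

/*
 * Clamp the request into the hardware limits, then make sure at least one
 * supported frequency lies in [min, max]: raise max to the next higher
 * supported frequency first and, only if there is none, lower min to the
 * next lower one.
 */
static void demo_verify(struct demo_policy *p, const unsigned int *freqs,
			size_t n)
{
	unsigned int higher = 0, lower = 0;
	int have_higher = 0, have_lower = 0;
	size_t i;

	p->min = demo_clamp(p->min, p->cpuinfo_min, p->cpuinfo_max);
	p->max = demo_clamp(p->max, p->cpuinfo_min, p->cpuinfo_max);
	if (p->min > p->max)
		p->min = p->max;

	for (i = 0; i < n; i++) {
		unsigned int f = freqs[i];

		if (f >= p->min && f <= p->max)
			return;		/* range already holds a valid frequency */
		if (f > p->max && (!have_higher || f < higher)) {
			higher = f;
			have_higher = 1;
		}
		if (f < p->min && (!have_lower || f > lower)) {
			lower = f;
			have_lower = 1;
		}
	}

	if (have_higher)
		p->max = higher;	/* preferred: increase policy->max */
	else if (have_lower)
		p->min = lower;		/* fallback: decrease policy->min */
}
```

For example, with supported frequencies of 800000, 1200000 and 1600000 kHz, a request of 1300000..1500000 kHz contains none of them, so max is raised to 1600000 while min is left alone.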


1.4 target or target_index or setpolicy or fast_switch?
-------------------------------------------------------

Most cpufreq drivers, and indeed most CPU frequency scaling algorithms,
only allow the CPU frequency to be set to predefined fixed values. For
these, use the ->target(), ->target_index() or ->fast_switch()
callbacks.

Some cpufreq-capable processors switch the frequency between certain
limits on their own. These shall use the ->setpolicy() callback.


1.5 target/target_index
-----------------------

The target_index call has two arguments: struct cpufreq_policy *policy,
and unsigned int index (into the exposed frequency table).

The CPUfreq driver must set the new frequency when called here. The
actual frequency must be determined by freq_table[index].frequency.

It should always restore the earlier frequency (i.e. policy->restore_freq)
in case of errors, even if it switched to an intermediate frequency earlier.

Deprecated:
-----------
The target call has three arguments: struct cpufreq_policy *policy,
unsigned int target_freq, unsigned int relation.

The CPUfreq driver must set the new frequency when called here. The
actual frequency must be determined using the following rules:

- keep close to "target_freq"
- policy->min <= new_freq <= policy->max (THIS MUST BE VALID!!!)
- if relation==CPUFREQ_REL_L, try to select a new_freq higher than or equal
  to target_freq. ("L for lowest, but no lower than")
- if relation==CPUFREQ_REL_H, try to select a new_freq lower than or equal
  to target_freq. ("H for highest, but no higher than")

Here again the frequency table helper might assist you - see section 2
for details.
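The relation rules above can be illustrated with a userspace model. DEMO_REL_L and DEMO_REL_H are hypothetical stand-ins for CPUFREQ_REL_L and CPUFREQ_REL_H, the table is a plain unsorted array, and this is not the kernel's cpufreq_frequency_table_target(). The model first clamps target_freq into the policy limits, never picks an entry outside [min, max], and, as an assumption about what "try to" means, falls back to the closest entry on the other side of the target when the preferred side is empty.

```c
#include <stddef.h>

/* Hypothetical stand-ins for CPUFREQ_REL_L / CPUFREQ_REL_H. */
enum demo_relation { DEMO_REL_L, DEMO_REL_H };

/*
 * Return the index of the frequency to use, or -1 if no entry lies in
 * [min, max]. REL_L: lowest frequency >= target, else the highest below it.
 * REL_H: highest frequency <= target, else the lowest above it.
 */
static int demo_select(const unsigned int *freqs, size_t n, unsigned int min,
		       unsigned int max, unsigned int target,
		       enum demo_relation rel)
{
	int best = -1;		/* best entry on the preferred side */
	int fallback = -1;	/* closest entry on the other side */
	size_t i;

	if (target < min)
		target = min;
	if (target > max)
		target = max;

	for (i = 0; i < n; i++) {
		unsigned int f = freqs[i];

		if (f < min || f > max)
			continue;	/* never leave the policy limits */
		if (rel == DEMO_REL_L) {
			if (f >= target) {
				if (best < 0 || f < freqs[best])
					best = (int)i;
			} else if (fallback < 0 || f > freqs[fallback]) {
				fallback = (int)i;
			}
		} else {
			if (f <= target) {
				if (best < 0 || f > freqs[best])
					best = (int)i;
			} else if (fallback < 0 || f < freqs[fallback]) {
				fallback = (int)i;
			}
		}
	}
	return best >= 0 ? best : fallback;
}
```

With the table {1600000, 800000, 1200000} and a target of 1000000 kHz, REL_L selects index 2 (1200000) and REL_H selects index 1 (800000).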

1.6 fast_switch
---------------

This function is used for frequency switching from the scheduler's
context. Not all drivers are expected to implement it, as sleeping from
within this callback isn't allowed. This callback must be highly
optimized to do switching as fast as possible.

This function has two arguments: struct cpufreq_policy *policy and
unsigned int target_freq.


1.7 setpolicy
-------------

The setpolicy call only takes a struct cpufreq_policy *policy as
argument. You need to set the lower limit of the in-processor or
in-chipset dynamic frequency switching to policy->min, the upper limit
to policy->max, and (if supported) select a performance-oriented
setting when policy->policy is CPUFREQ_POLICY_PERFORMANCE, and a
powersaving-oriented setting when it is CPUFREQ_POLICY_POWERSAVE. Also
check the reference implementation in drivers/cpufreq/longrun.c.

1.8 get_intermediate and target_intermediate
--------------------------------------------

These are only for drivers that implement ->target_index() and do not set
CPUFREQ_ASYNC_NOTIFICATION.

get_intermediate() should return a stable intermediate frequency the
platform wants to switch to, and target_intermediate() should set the CPU
to that frequency, before jumping to the frequency corresponding to
'index'. The core takes care of sending notifications, so the driver
doesn't have to handle them in target_intermediate() or target_index().

Drivers can return '0' from get_intermediate() in case they don't wish to
switch to an intermediate frequency for some target frequency. In that
case the core directly calls ->target_index().

NOTE: ->target_index() should restore policy->restore_freq in case of
failures, as the core sends notifications for that.
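The call sequence described above can be modelled in userspace to show who does what. This is a simplified sketch, not the cpufreq core: struct demo_ops and every demo_* name are hypothetical, the callbacks omit the policy argument, and notifications are reduced to a counter. The model hops to the intermediate frequency only when get_intermediate() returns non-zero, then calls target_index(), and if that fails after a hop it also announces the step back to the original frequency, which the driver is expected to have restored.

```c
/*
 * Hypothetical userspace model of the switching sequence; these are not
 * the kernel's cpufreq_driver callbacks, which also take the policy.
 */
struct demo_ops {
	unsigned int (*get_intermediate)(unsigned int index);	/* 0 = skip */
	int (*target_intermediate)(unsigned int index);
	int (*target_index)(unsigned int index);		/* 0 on success */
};

/* Counts announced transitions; stands in for the core's notifications. */
static int demo_transitions;

static void demo_announce(unsigned int from, unsigned int to)
{
	(void)from;
	(void)to;
	demo_transitions++;
}

/* Core side: optional intermediate hop, then the real switch. */
static int demo_switch(const struct demo_ops *ops, const unsigned int *table,
		       unsigned int cur, unsigned int index)
{
	unsigned int intermediate = 0;
	int ret;

	if (ops->get_intermediate)
		intermediate = ops->get_intermediate(index);

	if (intermediate) {
		demo_announce(cur, intermediate);
		ret = ops->target_intermediate(index);
		if (ret)
			return ret;
	}

	demo_announce(intermediate ? intermediate : cur, table[index]);
	ret = ops->target_index(index);
	if (ret && intermediate)
		demo_announce(intermediate, cur);	/* back to restore_freq */
	return ret;
}

/*
 * Example driver callbacks (hypothetical): hop via 500000 kHz for index 2
 * and above, and pretend that switching to index 3 fails.
 */
static unsigned int demo_get_intermediate(unsigned int index)
{
	return index >= 2 ? 500000 : 0;
}

static int demo_target_intermediate(unsigned int index)
{
	(void)index;
	return 0;
}

static int demo_target_index(unsigned int index)
{
	return index == 3 ? -1 : 0;
}

static const struct demo_ops demo_driver = {
	.get_intermediate = demo_get_intermediate,
	.target_intermediate = demo_target_intermediate,
	.target_index = demo_target_index,
};
```

In this model a switch to index 1 announces one transition, a successful switch to index 2 announces two (via the intermediate frequency), and the failing switch to index 3 announces three, the last one being the return to the original frequency.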
Linus Torvalds | 1da177e | 2005-04-16 15:20:36 -0700 | [diff] [blame] | 245 | |
| 246 | |
| 247 | 2. Frequency Table Helpers |
| 248 | ========================== |
| 249 | |
| 250 | As most cpufreq processors only allow for being set to a few specific |
| 251 | frequencies, a "frequency table" with some functions might assist in |
Viresh Kumar | 7de962c | 2017-01-06 11:08:05 +0530 | [diff] [blame] | 252 | some work of the processor driver. Such a "frequency table" consists of |
| 253 | an array of struct cpufreq_frequency_table entries, with driver specific |
| 254 | values in "driver_data", the corresponding frequency in "frequency" and |
| 255 | flags set. At the end of the table, you need to add a |
| 256 | cpufreq_frequency_table entry with frequency set to CPUFREQ_TABLE_END. |
| 257 | And if you want to skip one entry in the table, set the frequency to |
| 258 | CPUFREQ_ENTRY_INVALID. The entries don't need to be in sorted in any |
| 259 | particular order, but if they are cpufreq core will do DVFS a bit |
| 260 | quickly for them as search for best match is faster. |

By calling cpufreq_table_validate_and_show(), the cpuinfo.min_freq and
cpuinfo.max_freq values are detected, and policy->min and policy->max
are set to the same values. This is helpful for the per-CPU
initialization stage.
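The table layout and the min/max detection described above can be sketched in userspace. struct demo_freq_entry, DEMO_TABLE_END and DEMO_ENTRY_INVALID are hypothetical stand-ins (the kernel defines its own struct cpufreq_frequency_table and sentinels in <linux/cpufreq.h>, and the values chosen here are arbitrary), and demo_table_limits() only approximates the cpuinfo part of what cpufreq_table_validate_and_show() is described as doing.

```c
#include <limits.h>

/* Model-only sentinels; the kernel defines its own CPUFREQ_TABLE_END and
 * CPUFREQ_ENTRY_INVALID in <linux/cpufreq.h>. */
#define DEMO_TABLE_END		UINT_MAX
#define DEMO_ENTRY_INVALID	(UINT_MAX - 1u)

/* Hypothetical stand-in for struct cpufreq_frequency_table (no flags). */
struct demo_freq_entry {
	unsigned int driver_data;	/* driver-private, e.g. a PLL setting */
	unsigned int frequency;		/* kHz, or one of the sentinels */
};

/* An unsorted example table with one skipped entry. */
static const struct demo_freq_entry demo_table[] = {
	{ 0, 1200000 },
	{ 1, DEMO_ENTRY_INVALID },
	{ 2, 600000 },
	{ 3, 1800000 },
	{ 0, DEMO_TABLE_END },
};

/*
 * Walk the table up to the end marker, skip invalid entries, and report the
 * lowest and highest usable frequency. Returns 0 on success, -1 if the
 * table holds no valid entry.
 */
static int demo_table_limits(const struct demo_freq_entry *table,
			     unsigned int *min_freq, unsigned int *max_freq)
{
	const struct demo_freq_entry *pos;
	unsigned int lo = UINT_MAX, hi = 0;
	int found = 0;

	for (pos = table; pos->frequency != DEMO_TABLE_END; pos++) {
		if (pos->frequency == DEMO_ENTRY_INVALID)
			continue;
		if (pos->frequency < lo)
			lo = pos->frequency;
		if (pos->frequency > hi)
			hi = pos->frequency;
		found = 1;
	}
	if (!found)
		return -1;
	*min_freq = lo;
	*max_freq = hi;
	return 0;
}
```

For demo_table this reports 600000 and 1800000 kHz; the invalid entry is ignored and the unsorted order does not matter for this linear scan.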

cpufreq_frequency_table_verify() ensures that at least one valid
frequency is within policy->min and policy->max, and all other criteria
are met. This is helpful for the ->verify call.

cpufreq_frequency_table_target() is the corresponding frequency table
helper for the ->target stage. Just pass the values to this function,
and this function returns the index of the frequency table entry which
contains the frequency the CPU shall be set to.

The following macros can be used as iterators over cpufreq_frequency_table:

cpufreq_for_each_entry(pos, table) - iterates over all entries of the
frequency table.

cpufreq_for_each_valid_entry(pos, table) - iterates over all entries,
excluding CPUFREQ_ENTRY_INVALID frequencies.
Use the arguments "pos", a cpufreq_frequency_table * used as a loop
cursor, and "table", the cpufreq_frequency_table * you want to iterate over.

For example:

	struct cpufreq_frequency_table *pos, *driver_freq_table;

	cpufreq_for_each_entry(pos, driver_freq_table) {
		/* Do something with pos */
		pos->frequency = ...
	}

If you need to work with the position of pos within driver_freq_table,
do not subtract the pointers, as it is quite costly. Instead, use the
macros cpufreq_for_each_entry_idx() and cpufreq_for_each_valid_entry_idx().
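The idea behind the _idx iterators can be shown with a userspace sketch: an index variable advances in lockstep with the cursor, so no pointer subtraction is needed. The demo_* macros, struct and sentinels below are hypothetical, modelled only on the description above; the real macros and types live in <linux/cpufreq.h>.

```c
#include <limits.h>

/* Model-only sentinels and entry type (hypothetical). */
#define DEMO_TABLE_END		UINT_MAX
#define DEMO_ENTRY_INVALID	(UINT_MAX - 1u)

struct demo_freq_entry {
	unsigned int driver_data;
	unsigned int frequency;		/* kHz, or one of the sentinels */
};

/* The index advances together with the cursor, so no pointer math. */
#define demo_for_each_entry_idx(pos, table, idx)			\
	for ((pos) = (table), (idx) = 0;				\
	     (pos)->frequency != DEMO_TABLE_END;			\
	     (pos)++, (idx)++)

/* Same walk, but the loop body is skipped for invalid entries. */
#define demo_for_each_valid_entry_idx(pos, table, idx)			\
	demo_for_each_entry_idx(pos, table, idx)			\
		if ((pos)->frequency == DEMO_ENTRY_INVALID)		\
			continue;					\
		else

/* Return the table index holding 'freq', or -1 if there is none. */
static int demo_index_of(const struct demo_freq_entry *table,
			 unsigned int freq)
{
	const struct demo_freq_entry *pos;
	int idx;

	demo_for_each_valid_entry_idx(pos, table, idx) {
		if (pos->frequency == freq)
			return idx;
	}
	return -1;
}
```

Because the "valid" variant wraps the body in an if/else, a `continue` for an invalid entry still runs the loop's increment step, keeping pos and idx in sync.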