     CPU frequency and voltage scaling code in the Linux(TM) kernel


                         L i n u x    C P U F r e q

                      C P U F r e q   G o v e r n o r s

                   - information for users and developers -


                    Dominik Brodowski  <linux@brodo.de>
            some additions and corrections by Nico Golde <nico@ngolde.de>



   Clock scaling allows you to change the clock speed of the CPUs on the
    fly. This is a nice method to save battery power, because the lower
            the clock speed, the less power the CPU consumes.


Contents:
---------
1.   What is a CPUFreq Governor?

2.   Governors In the Linux Kernel
2.1  Performance
2.2  Powersave
2.3  Userspace
2.4  Ondemand
2.5  Conservative
2.6  Interactive

3.   The Governor Interface in the CPUfreq Core



1. What Is A CPUFreq Governor?
==============================

Most cpufreq drivers (except intel_pstate and longrun), and even most
CPU frequency scaling algorithms, only allow the CPU to be set to one
frequency. In order to offer dynamic frequency scaling, the cpufreq
core must be able to tell these drivers a "target frequency". So
these specific drivers will be transformed to offer a
"->target/target_index" call instead of the existing "->setpolicy"
call. For "longrun", everything stays the same, though.

How is the frequency to be used within the CPUfreq policy decided?
That's done using "cpufreq governors". Two are already in this patch
-- they're the already existing "powersave" and "performance" which
set the frequency statically to the lowest or highest frequency,
respectively. At least two more such governors will be ready for
addition in the near future, but likely many more as there are various
different theories and models about dynamic frequency scaling
around. With the generic interface that cpufreq offers to scaling
governors, these can be tested extensively, and the best one can be
selected for each specific use.

Basically, it's the following flow graph:

CPU can be set to switch independently   |         CPU can only be set
      within specific "limits"           |       to specific frequencies

                                 "CPUfreq policy"
                consists of frequency limits (policy->{min,max})
                     and CPUfreq governor to be used
                         /                    \
                        /                      \
                       /                       the cpufreq governor decides
                      /                        (dynamically or statically)
                     /                         what target_freq to set within
                    /                          the limits of policy->{min,max}
                   /                                \
                  /                                  \
        Using the ->setpolicy call,              Using the ->target/target_index call,
            the limits and the                    the frequency closest
             "policy" is set.                     to target_freq is set.
                                                  It is assured that it
                                                  is within policy->{min,max}


2. Governors In the Linux Kernel
================================

2.1 Performance
---------------

The CPUfreq governor "performance" sets the CPU statically to the
highest frequency within the borders of scaling_min_freq and
scaling_max_freq.


2.2 Powersave
-------------

The CPUfreq governor "powersave" sets the CPU statically to the
lowest frequency within the borders of scaling_min_freq and
scaling_max_freq.


2.3 Userspace
-------------

The CPUfreq governor "userspace" allows the user, or any userspace
program running with UID "root", to set the CPU to a specific frequency
by making a sysfs file "scaling_setspeed" available in the CPU-device
directory.

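As an illustration of the "userspace" governor, the following is a
minimal shell sketch, not a definitive recipe. The helper name
set_cpu_speed is hypothetical, and the directory passed to it is an
assumption (on many systems the per-CPU files are found under
/sys/devices/system/cpu/cpuN/cpufreq). The sketch also assumes that the
governor is selected through the "scaling_governor" file in the same
directory and that frequencies are given in kHz.

```shell
# set_cpu_speed DIR KHZ  (hypothetical helper, for illustration only)
#   DIR: cpufreq directory of the CPU (location is an assumption)
#   KHZ: requested frequency in kHz
set_cpu_speed() {
    dir=$1
    khz=$2
    # Select the governor first: scaling_setspeed is only meaningful
    # while "userspace" is the active governor.
    echo userspace > "$dir/scaling_governor" || return 1
    # Ask for the specific frequency.
    echo "$khz" > "$dir/scaling_setspeed"
}

# Example (as root):
#   set_cpu_speed /sys/devices/system/cpu/cpu0/cpufreq 1200000
```

The requested value still has to lie within scaling_min_freq and
scaling_max_freq to be honoured as given.
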

2.4 Ondemand
------------

The CPUfreq governor "ondemand" sets the CPU frequency depending on the
current usage. To do this, the CPU must have the capability to
switch the frequency very quickly. A number of parameters are
accessible through sysfs files:

sampling_rate: measured in uS (10^-6 seconds), this is how often you
want the kernel to look at the CPU usage and to make decisions on
what to do about the frequency. Typically this is set to values of
around '10000' or more. Its default value is (cmp. with users-guide.txt):
transition_latency * 1000
Be aware that transition latency is in ns and sampling_rate is in us, so you
get the same sysfs value by default.
Sampling rate should always get adjusted considering the transition latency.
To set the sampling rate 750 times as high as the transition latency
in bash (as said, 1000 is default), do:
echo $(($(cat cpuinfo_transition_latency) * 750 / 1000)) \
    >ondemand/sampling_rate

sampling_rate_min:
The sampling rate is limited by the HW transition latency:
transition_latency * 100
Or by kernel restrictions:
If CONFIG_NO_HZ_COMMON is set, the limit is 10ms fixed.
If CONFIG_NO_HZ_COMMON is not set or nohz=off boot parameter is used, the
limits depend on the CONFIG_HZ option:
HZ=1000: min=20000us  (20ms)
HZ=250:  min=80000us  (80ms)
HZ=100:  min=200000us (200ms)
The highest value of kernel and HW latency restrictions is shown and
used as the minimum sampling rate.

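The rule for sampling_rate_min can be written down as a small
calculation. The sketch below is illustrative only: the function name
is hypothetical, the ns-to-us conversion of the hardware limit is
assumed to work the same way as for the sampling_rate default, and the
kernel limit without NO_HZ is assumed to be 20 timer ticks, which is
what the three table rows above have in common.

```shell
# min_sampling_rate_us LATENCY_NS HZ NOHZ  (hypothetical helper)
#   LATENCY_NS: transition latency in nanoseconds
#   HZ:         CONFIG_HZ of the kernel
#   NOHZ:       1 if CONFIG_NO_HZ_COMMON is in effect, 0 otherwise
min_sampling_rate_us() {
    latency_ns=$1
    hz=$2
    nohz=$3
    # Hardware limit: transition_latency * 100, converted from ns to us.
    hw_us=$(( latency_ns * 100 / 1000 ))
    if [ "$nohz" -eq 1 ]; then
        kernel_us=10000                      # fixed 10ms limit
    else
        # Assumption: 20 ticks, matching the HZ=1000/250/100 rows.
        kernel_us=$(( 20 * 1000000 / hz ))
    fi
    # The higher of the two restrictions is the minimum sampling rate.
    if [ "$hw_us" -gt "$kernel_us" ]; then
        echo "$hw_us"
    else
        echo "$kernel_us"
    fi
}
```

For instance, with a transition latency of 10000 ns the hardware limit
is only 1000 us, so on a HZ=250 kernel without NO_HZ the kernel limit of
80000 us wins.
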
up_threshold: defines what the average CPU usage between the samplings
of 'sampling_rate' needs to be for the kernel to make a decision on
whether it should increase the frequency. For example, when it is set
to its default value of '95', the CPU needs to be on average more than
95% in use between the checking intervals for the kernel to decide that
the CPU frequency needs to be increased.

ignore_nice_load: this parameter takes a value of '0' or '1'. When
set to '0' (its default), all processes are counted towards the
'cpu utilisation' value. When set to '1', the processes that are
run with a 'nice' value will not count (and thus be ignored) in the
overall usage calculation. This is useful if you are running a CPU
intensive calculation on your laptop and do not care how long it takes
to complete: you can 'nice' it and prevent it from taking part in the
decision on whether to increase your CPU frequency.

sampling_down_factor: this parameter controls the rate at which the
kernel makes a decision on when to decrease the frequency while running
at top speed. When set to 1 (the default) decisions to reevaluate load
are made at the same interval regardless of current clock speed. But
when set to greater than 1 (e.g. 100) it acts as a multiplier for the
scheduling interval for reevaluating load when the CPU is at its top
speed due to high load. This improves performance by reducing the overhead
of load evaluation and helping the CPU stay at its top speed when truly
busy, rather than shifting back and forth in speed. This tunable has no
effect on behavior at lower speeds/lower CPU loads.

powersave_bias: this parameter takes a value between 0 and 1000. It
defines the percentage (times 10) value of the target frequency that
will be shaved off of the target. For example, when set to 100 -- 10%,
when ondemand governor would have targeted 1000 MHz, it will target
1000 MHz - (10% of 1000 MHz) = 900 MHz instead. This is set to 0
(disabled) by default.
When AMD frequency sensitivity powersave bias driver --
drivers/cpufreq/amd_freq_sensitivity.c is loaded, this parameter
defines the workload frequency sensitivity threshold in which a lower
frequency is chosen instead of ondemand governor's original target.
The frequency sensitivity is a hardware reported (on AMD Family 16h
Processors and above) value between 0 and 100% that tells software how
the performance of the workload running on a CPU will change when
frequency changes. A workload with sensitivity of 0% (memory/IO-bound)
will not perform any better on higher core frequency, whereas a
workload with sensitivity of 100% (CPU-bound) will perform better
the higher the frequency. When the driver is loaded, this is set to 400
by default -- for CPUs running workloads with sensitivity value below
40%, a lower frequency is chosen. Unloading the driver or writing 0
will disable this feature.

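The powersave_bias arithmetic from the 1000 MHz example can be checked
with a few lines of shell. This is only a sketch of the calculation
described above; the function name is hypothetical and the kernel may
map the result onto the frequencies the hardware actually supports.

```shell
# biased_target_khz TARGET_KHZ POWERSAVE_BIAS  (hypothetical helper)
#   TARGET_KHZ:     frequency ondemand would have targeted, in kHz
#   POWERSAVE_BIAS: 0..1000, i.e. tenths of a percent
biased_target_khz() {
    target=$1
    bias=$2
    # Shave bias/1000 of the target off of the target.
    echo $(( target - target * bias / 1000 ))
}

# Example: biased_target_khz 1000000 100   (1000 MHz with a 10% bias)
```

A bias of 0 leaves the target untouched, which matches the "disabled"
default.
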

2.5 Conservative
----------------

The CPUfreq governor "conservative", much like the "ondemand"
governor, sets the CPU frequency depending on the current usage. It
differs in behaviour in that it gracefully increases and decreases the
CPU speed rather than jumping to max speed the moment there is any load
on the CPU. This behaviour is more suitable in a battery powered
environment. The governor is tweaked in the same manner as the
"ondemand" governor through sysfs with the addition of:

freq_step: this describes the percentage steps by which the CPU
frequency should be smoothly increased and decreased. By default the CPU
frequency will increase in 5% chunks of your maximum CPU frequency. You
can change this value to anywhere between 0 and 100 where '0' will
effectively lock your CPU at a speed regardless of its load whilst
'100' will, in theory, make it behave identically to the "ondemand"
governor.

down_threshold: same as the 'up_threshold' found for the "ondemand"
governor but for the opposite direction. For example when set to its
default value of '20' it means that the CPU usage needs to be below
20% between samples to have the frequency decreased.

sampling_down_factor: similar functionality as in "ondemand" governor.
But in "conservative", it controls the rate at which the kernel makes
a decision on when to decrease the frequency while running at any
speed. Load for frequency increase is still evaluated every
sampling rate.

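How freq_step, up_threshold and down_threshold interact can be sketched
as a single decision step. This is a simplified model under stated
assumptions, not the kernel's implementation: the function name is
hypothetical, sampling_down_factor is left out, and the result is not
rounded to a frequency the hardware supports.

```shell
# next_freq_khz CUR MIN MAX LOAD UP DOWN FREQ_STEP  (hypothetical helper)
#   CUR, MIN, MAX: current frequency and policy limits, in kHz
#   LOAD:          average CPU usage since the last sample, in percent
#   UP, DOWN:      up_threshold and down_threshold, in percent
#   FREQ_STEP:     step size as a percentage of MAX
next_freq_khz() {
    cur=$1; min=$2; max=$3; load=$4; up=$5; down=$6; step_pct=$7
    step=$(( max * step_pct / 100 ))
    if [ "$load" -gt "$up" ]; then
        next=$(( cur + step ))      # busy: go one step up
    elif [ "$load" -lt "$down" ]; then
        next=$(( cur - step ))      # idle: go one step down
    else
        next=$cur                   # in between: stay put
    fi
    # Keep the result within the policy limits.
    if [ "$next" -gt "$max" ]; then next=$max; fi
    if [ "$next" -lt "$min" ]; then next=$min; fi
    echo "$next"
}
```

With FREQ_STEP set to 0 the step is zero and the frequency never moves,
which is the "lock" behaviour described for freq_step above.
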
2.6 Interactive
---------------

The CPUfreq governor "interactive" is designed for latency-sensitive,
interactive workloads. This governor sets the CPU speed depending on
usage, similar to "ondemand" and "conservative" governors, but with a
different set of configurable behaviors.

The tuneable values for this governor are:

target_loads: CPU load values used to adjust speed to influence the
current CPU load toward that value. In general, the lower the target
load, the more often the governor will raise CPU speeds to bring load
below the target. The format is a single target load, optionally
followed by pairs of CPU speeds and CPU loads to target at or above
those speeds. Colons can be used between the speeds and associated
target loads for readability. For example:

   85 1000000:90 1700000:99

targets CPU load 85% below speed 1GHz, 90% at or above 1GHz, until
1.7GHz and above, at which load 99% is targeted. If speeds are
specified these must appear in ascending order. Higher target load
values are typically specified for higher speeds, that is, target load
values also usually appear in an ascending order. The default is
target load 90% for all speeds.

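To make the target_loads format concrete, here is a small shell sketch
that looks up the value applying to a given speed. It only models the
format as described above; the function name is hypothetical and the
governor's own parser may accept or reject inputs differently.

```shell
# lookup_by_speed SPEC FREQ_KHZ  (hypothetical helper)
#   SPEC:     "DEFAULT [SPEED:VALUE ...]", speeds in ascending order
#   FREQ_KHZ: the CPU speed to look up, in kHz
lookup_by_speed() {
    spec=$1
    freq=$2
    # Colons are optional separators, so treat them like spaces.
    set -- $(echo "$spec" | tr ':' ' ')
    value=$1            # applies below the first listed speed
    shift
    while [ $# -ge 2 ]; do
        # Speeds ascend, so stop at the first one above FREQ_KHZ.
        if [ "$freq" -lt "$1" ]; then
            break
        fi
        value=$2        # applies at or above speed $1
        shift 2
    done
    echo "$value"
}

# Example: lookup_by_speed "85 1000000:90 1700000:99" 1200000
```

The above_hispeed_delay tunable uses the same "single value, then
speed:value pairs" layout, so the same lookup applies to it.
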
min_sample_time: The minimum amount of time to spend at the current
frequency before ramping down. Default is 80000 uS.

hispeed_freq: An intermediate "hi speed" to which to initially ramp
when CPU load hits the value specified in go_hispeed_load. If load
stays high for the amount of time specified in above_hispeed_delay,
then speed may be bumped higher. Default is the maximum speed
allowed by the policy at governor initialization time.

go_hispeed_load: The CPU load at which to ramp to hispeed_freq.
Default is 99%.

above_hispeed_delay: When speed is at or above hispeed_freq, wait for
this long before raising speed in response to continued high load.
The format is a single delay value, optionally followed by pairs of
CPU speeds and the delay to use at or above those speeds. Colons can
be used between the speeds and associated delays for readability. For
example:

   80000 1300000:200000 1500000:40000

uses delay 80000 uS until CPU speed 1.3 GHz, at which speed delay
200000 uS is used until speed 1.5 GHz, at which speed (and above)
delay 40000 uS is used. If speeds are specified these must appear in
ascending order. Default is 20000 uS.

timer_rate: Sample rate for reevaluating CPU load when the CPU is not
idle. A deferrable timer is used, such that the CPU will not be woken
from idle to service this timer until something else needs to run.
(The maximum time to allow deferring this timer when not running at
minimum speed is configurable via timer_slack.) Default is 20000 uS.

timer_slack: Maximum additional time to defer handling the governor
sampling timer beyond timer_rate when running at speeds above the
minimum. For platforms that consume additional power at idle when
CPUs are running at speeds greater than minimum, this places an upper
bound on how long the timer will be deferred prior to re-evaluating
load and dropping speed. For example, if timer_rate is 20000uS and
timer_slack is 10000uS then timers will be deferred for up to 30msec
when not at lowest speed. A value of -1 means defer timers
indefinitely at all speeds. Default is 80000 uS.

boost: If non-zero, immediately boost speed of all CPUs to at least
hispeed_freq until zero is written to this attribute. If zero, allow
CPU speeds to drop below hispeed_freq according to load as usual.
Default is zero.

boostpulse: On each write, immediately boost speed of all CPUs to
hispeed_freq for at least the period of time specified by
boostpulse_duration, after which speeds are allowed to drop below
hispeed_freq according to load as usual.

boostpulse_duration: Length of time to hold CPU speed at hispeed_freq
on a write to boostpulse, before allowing speed to drop according to
load as usual. Default is 80000 uS.

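One possible use of the boost attribute is to hold CPU speed up for the
duration of a single command. The following shell sketch shows the
idea; the function name is hypothetical, and the location of the
governor's sysfs directory is an assumption that depends on the kernel
(it is passed in as an argument for that reason).

```shell
# with_boost GOV_DIR COMMAND [ARGS...]  (hypothetical helper)
#   GOV_DIR: directory holding the interactive governor's tunables
#            (its location is an assumption, check your system)
with_boost() {
    gov_dir=$1
    shift
    # Non-zero: keep all CPUs at least at hispeed_freq.
    echo 1 > "$gov_dir/boost" || return 1
    "$@"
    status=$?
    # Zero: let speeds follow load again.
    echo 0 > "$gov_dir/boost"
    return "$status"
}

# Example (as root, path is illustrative only):
#   with_boost /sys/devices/system/cpu/cpufreq/interactive make -j4
```

For short, self-limiting bursts, a write to boostpulse avoids the need
to remember to write zero afterwards.
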
align_windows: If non-zero, align the governor timer window to fire at
multiples of the number of jiffies that timer_rate converts to.

use_sched_load: If non-zero, query scheduler for CPU busy time,
instead of collecting it directly in governor. This would allow
scheduler to adjust the busy time of each CPU to account for known
information such as migration. If non-zero, this also implies governor
sampling windows are aligned across CPUs, with same timer_rate,
regardless of what align_windows is set to. Default is zero.

use_migration_notif: If non-zero, schedule hrtimer to fire in 1ms
to reevaluate frequency of notified CPU, unless the hrtimer is already
pending. If zero, ignore scheduler notification. Default is zero.

max_freq_hysteresis: Each time frequency evaluation chooses
policy->max, the following max_freq_hysteresis uS are treated as a
hysteresis period. During this period, frequency target will not drop
below hispeed_freq, no matter how light actual workload is. If CPU load
of any sampling window exceeds go_hispeed_load during this period,
governor will directly increase frequency back to policy->max. Default
is 0 uS.

ignore_hispeed_on_notif: If non-zero, do not apply hispeed related
logic if frequency evaluation is triggered by scheduler notification.
This includes ignoring go_hispeed_load, hispeed_freq in frequency
selection, and ignoring above_hispeed_delay that prevents frequency
ramp up. For evaluation triggered by timer, hispeed logic is still
always applied. ignore_hispeed_on_notif has no effect if
use_migration_notif is set to zero. Default is zero.

fast_ramp_down: If non-zero, do not apply min_sample_time if
frequency evaluation is triggered by scheduler notification. For
evaluation triggered by timer, min_sample_time is still always
enforced. fast_ramp_down has no effect if use_migration_notif is
set to zero. Default is zero.

enable_prediction: If non-zero, two frequencies will be calculated
during each sampling period: one based on busy time in previous sampling
period (f_prev), and the other based on prediction provided by scheduler
(f_pred). Max of both will be selected as final frequency. Hispeed
related logic, including both frequency selection and delay, is ignored
if enable_prediction is set. If only f_pred but not f_prev picked
policy->max, max_freq_hysteresis period is not started/extended.
use_sched_load must be turned on before enabling this feature.
Default is zero.

3. The Governor Interface in the CPUfreq Core
=============================================

A new governor must register itself with the CPUfreq core using
"cpufreq_register_governor". The struct cpufreq_governor, which has to
be passed to that function, must contain the following values:

governor->name -            A unique name for this governor
governor->governor -        The governor callback function
governor->owner -           THIS_MODULE for the governor module (if
                            appropriate)

The governor->governor callback is called with the current (or to-be-set)
cpufreq_policy struct for that CPU, and an unsigned int event. The
following events are currently defined:

CPUFREQ_GOV_START:   This governor shall start its duty for the CPU
                     policy->cpu
CPUFREQ_GOV_STOP:    This governor shall end its duty for the CPU
                     policy->cpu
CPUFREQ_GOV_LIMITS:  The limits for CPU policy->cpu have changed to
                     policy->min and policy->max.

If you need other "events" external to your driver, _only_ use the
cpufreq_governor_l(unsigned int cpu, unsigned int event) call to the
CPUfreq core to ensure proper locking.


The CPUfreq governor may call the CPU processor driver using one of
these two functions:

int cpufreq_driver_target(struct cpufreq_policy *policy,
                                 unsigned int target_freq,
                                 unsigned int relation);

int __cpufreq_driver_target(struct cpufreq_policy *policy,
                                   unsigned int target_freq,
                                   unsigned int relation);

target_freq must be within policy->min and policy->max, of course.
What's the difference between these two functions? When your governor
is still in a direct code path of a call to governor->governor, the
per-CPU cpufreq lock is still held in the cpufreq core, and there's
no need to lock it again (in fact, this would cause a deadlock). So
use __cpufreq_driver_target only in these cases. In all other cases
(for example, when there's a "daemonized" function that wakes up
every second), use cpufreq_driver_target to lock the cpufreq per-CPU
lock before the command is passed to the cpufreq processor driver.
