Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 1 | ======================= |
| 2 | Energy Aware Scheduling |
| 3 | ======================= |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 4 | |
| 5 | 1. Introduction |
| 6 | --------------- |
| 7 | |
| 8 | Energy Aware Scheduling (or EAS) gives the scheduler the ability to predict |
| 9 | the impact of its decisions on the energy consumed by CPUs. EAS relies on an |
| 10 | Energy Model (EM) of the CPUs to select an energy efficient CPU for each task, |
| 11 | with a minimal impact on throughput. This document aims at providing an |
| 12 | introduction on how EAS works, what are the main design decisions behind it, and |
| 13 | details what is needed to get it to run. |
| 14 | |
Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 15 | Before going any further, please note that at the time of writing:: |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 16 | |
| 17 | /!\ EAS does not support platforms with symmetric CPU topologies /!\ |
| 18 | |
| 19 | EAS operates only on heterogeneous CPU topologies (such as Arm big.LITTLE) |
| 20 | because this is where the potential for saving energy through scheduling is |
| 21 | the highest. |
| 22 | |
| 23 | The actual EM used by EAS is _not_ maintained by the scheduler, but by a |
| 24 | dedicated framework. For details about this framework and what it provides, |
Linus Torvalds | fb4da21 | 2019-07-15 20:44:49 -0700 | [diff] [blame] | 25 | please refer to its documentation (see Documentation/power/energy-model.rst). |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 26 | |
| 27 | |
| 28 | 2. Background and Terminology |
| 29 | ----------------------------- |
| 30 | |
| 31 | To make it clear from the start: |
| 32 | - energy = [joule] (resource like a battery on powered devices) |
| 33 | - power = energy/time = [joule/second] = [watt] |
| 34 | |
| 35 | The goal of EAS is to minimize energy, while still getting the job done. That |
Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 36 | is, we want to maximize:: |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 37 | |
| 38 | performance [inst/s] |
| 39 | -------------------- |
| 40 | power [W] |
| 41 | |
Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 42 | which is equivalent to minimizing:: |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 43 | |
| 44 | energy [J] |
| 45 | ----------- |
| 46 | instruction |
| 47 | |
| 48 | while still getting 'good' performance. It is essentially an alternative |
| 49 | optimization objective to the current performance-only objective for the |
| 50 | scheduler. This alternative considers two objectives: energy-efficiency and |
| 51 | performance. |
| 52 | |
| 53 | The idea behind introducing an EM is to allow the scheduler to evaluate the |
| 54 | implications of its decisions rather than blindly applying energy-saving |
| 55 | techniques that may have positive effects only on some platforms. At the same |
| 56 | time, the EM must be as simple as possible to minimize the scheduler latency |
| 57 | impact. |
| 58 | |
| 59 | In short, EAS changes the way CFS tasks are assigned to CPUs. When it is time |
| 60 | for the scheduler to decide where a task should run (during wake-up), the EM |
| 61 | is used to break the tie between several good CPU candidates and pick the one |
| 62 | that is predicted to yield the best energy consumption without harming the |
| 63 | system's throughput. The predictions made by EAS rely on specific elements of |
| 64 | knowledge about the platform's topology, which include the 'capacity' of CPUs, |
| 65 | and their respective energy costs. |
| 66 | |
| 67 | |
| 68 | 3. Topology information |
| 69 | ----------------------- |
| 70 | |
| 71 | EAS (as well as the rest of the scheduler) uses the notion of 'capacity' to |
| 72 | differentiate CPUs with different computing throughput. The 'capacity' of a CPU |
| 73 | represents the amount of work it can absorb when running at its highest |
| 74 | frequency compared to the most capable CPU of the system. Capacity values are |
| 75 | normalized in a 1024 range, and are comparable with the utilization signals of |
| 76 | tasks and CPUs computed by the Per-Entity Load Tracking (PELT) mechanism. Thanks |
| 77 | to capacity and utilization values, EAS is able to estimate how big/busy a |
| 78 | task/CPU is, and to take this into consideration when evaluating performance vs |
| 79 | energy trade-offs. The capacity of CPUs is provided via arch-specific code |
| 80 | through the arch_scale_cpu_capacity() callback. |
| 81 | |
| 82 | The rest of platform knowledge used by EAS is directly read from the Energy |
| 83 | Model (EM) framework. The EM of a platform is composed of a power cost table |
Linus Torvalds | fb4da21 | 2019-07-15 20:44:49 -0700 | [diff] [blame] | 84 | per 'performance domain' in the system (see Documentation/power/energy-model.rst |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 85 | for futher details about performance domains). |
| 86 | |
| 87 | The scheduler manages references to the EM objects in the topology code when the |
| 88 | scheduling domains are built, or re-built. For each root domain (rd), the |
| 89 | scheduler maintains a singly linked list of all performance domains intersecting |
| 90 | the current rd->span. Each node in the list contains a pointer to a struct |
| 91 | em_perf_domain as provided by the EM framework. |
| 92 | |
| 93 | The lists are attached to the root domains in order to cope with exclusive |
| 94 | cpuset configurations. Since the boundaries of exclusive cpusets do not |
| 95 | necessarily match those of performance domains, the lists of different root |
| 96 | domains can contain duplicate elements. |
| 97 | |
| 98 | Example 1. |
| 99 | Let us consider a platform with 12 CPUs, split in 3 performance domains |
Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 100 | (pd0, pd4 and pd8), organized as follows:: |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 101 | |
| 102 | CPUs: 0 1 2 3 4 5 6 7 8 9 10 11 |
| 103 | PDs: |--pd0--|--pd4--|---pd8---| |
| 104 | RDs: |----rd1----|-----rd2-----| |
| 105 | |
| 106 | Now, consider that userspace decided to split the system with two |
| 107 | exclusive cpusets, hence creating two independent root domains, each |
| 108 | containing 6 CPUs. The two root domains are denoted rd1 and rd2 in the |
| 109 | above figure. Since pd4 intersects with both rd1 and rd2, it will be |
| 110 | present in the linked list '->pd' attached to each of them: |
Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 111 | |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 112 | * rd1->pd: pd0 -> pd4 |
| 113 | * rd2->pd: pd4 -> pd8 |
| 114 | |
| 115 | Please note that the scheduler will create two duplicate list nodes for |
| 116 | pd4 (one for each list). However, both just hold a pointer to the same |
| 117 | shared data structure of the EM framework. |
| 118 | |
| 119 | Since the access to these lists can happen concurrently with hotplug and other |
| 120 | things, they are protected by RCU, like the rest of topology structures |
| 121 | manipulated by the scheduler. |
| 122 | |
| 123 | EAS also maintains a static key (sched_energy_present) which is enabled when at |
| 124 | least one root domain meets all conditions for EAS to start. Those conditions |
| 125 | are summarized in Section 6. |
| 126 | |
| 127 | |
| 128 | 4. Energy-Aware task placement |
| 129 | ------------------------------ |
| 130 | |
| 131 | EAS overrides the CFS task wake-up balancing code. It uses the EM of the |
| 132 | platform and the PELT signals to choose an energy-efficient target CPU during |
| 133 | wake-up balance. When EAS is enabled, select_task_rq_fair() calls |
| 134 | find_energy_efficient_cpu() to do the placement decision. This function looks |
| 135 | for the CPU with the highest spare capacity (CPU capacity - CPU utilization) in |
| 136 | each performance domain since it is the one which will allow us to keep the |
| 137 | frequency the lowest. Then, the function checks if placing the task there could |
| 138 | save energy compared to leaving it on prev_cpu, i.e. the CPU where the task ran |
| 139 | in its previous activation. |
| 140 | |
| 141 | find_energy_efficient_cpu() uses compute_energy() to estimate what will be the |
| 142 | energy consumed by the system if the waking task was migrated. compute_energy() |
| 143 | looks at the current utilization landscape of the CPUs and adjusts it to |
| 144 | 'simulate' the task migration. The EM framework provides the em_pd_energy() API |
| 145 | which computes the expected energy consumption of each performance domain for |
| 146 | the given utilization landscape. |
| 147 | |
| 148 | An example of energy-optimized task placement decision is detailed below. |
| 149 | |
| 150 | Example 2. |
| 151 | Let us consider a (fake) platform with 2 independent performance domains |
| 152 | composed of two CPUs each. CPU0 and CPU1 are little CPUs; CPU2 and CPU3 |
| 153 | are big. |
| 154 | |
| 155 | The scheduler must decide where to place a task P whose util_avg = 200 |
| 156 | and prev_cpu = 0. |
| 157 | |
| 158 | The current utilization landscape of the CPUs is depicted on the graph |
| 159 | below. CPUs 0-3 have a util_avg of 400, 100, 600 and 500 respectively |
| 160 | Each performance domain has three Operating Performance Points (OPPs). |
| 161 | The CPU capacity and power cost associated with each OPP is listed in |
| 162 | the Energy Model table. The util_avg of P is shown on the figures |
Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 163 | below as 'PP':: |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 164 | |
Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 165 | CPU util. |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 166 | 1024 - - - - - - - Energy Model |
| 167 | +-----------+-------------+ |
| 168 | | Little | Big | |
| 169 | 768 ============= +-----+-----+------+------+ |
| 170 | | Cap | Pwr | Cap | Pwr | |
| 171 | +-----+-----+------+------+ |
| 172 | 512 =========== - ##- - - - - | 170 | 50 | 512 | 400 | |
| 173 | ## ## | 341 | 150 | 768 | 800 | |
| 174 | 341 -PP - - - - ## ## | 512 | 300 | 1024 | 1700 | |
| 175 | PP ## ## +-----+-----+------+------+ |
| 176 | 170 -## - - - - ## ## |
| 177 | ## ## ## ## |
| 178 | ------------ ------------- |
| 179 | CPU0 CPU1 CPU2 CPU3 |
| 180 | |
| 181 | Current OPP: ===== Other OPP: - - - util_avg (100 each): ## |
| 182 | |
| 183 | |
| 184 | find_energy_efficient_cpu() will first look for the CPUs with the |
| 185 | maximum spare capacity in the two performance domains. In this example, |
| 186 | CPU1 and CPU3. Then it will estimate the energy of the system if P was |
| 187 | placed on either of them, and check if that would save some energy |
| 188 | compared to leaving P on CPU0. EAS assumes that OPPs follow utilization |
| 189 | (which is coherent with the behaviour of the schedutil CPUFreq |
| 190 | governor, see Section 6. for more details on this topic). |
| 191 | |
Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 192 | **Case 1. P is migrated to CPU1**:: |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 193 | |
| 194 | 1024 - - - - - - - |
| 195 | |
| 196 | Energy calculation: |
| 197 | 768 ============= * CPU0: 200 / 341 * 150 = 88 |
| 198 | * CPU1: 300 / 341 * 150 = 131 |
| 199 | * CPU2: 600 / 768 * 800 = 625 |
| 200 | 512 - - - - - - - ##- - - - - * CPU3: 500 / 768 * 800 = 520 |
| 201 | ## ## => total_energy = 1364 |
| 202 | 341 =========== ## ## |
| 203 | PP ## ## |
| 204 | 170 -## - - PP- ## ## |
| 205 | ## ## ## ## |
| 206 | ------------ ------------- |
| 207 | CPU0 CPU1 CPU2 CPU3 |
| 208 | |
| 209 | |
Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 210 | **Case 2. P is migrated to CPU3**:: |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 211 | |
| 212 | 1024 - - - - - - - |
| 213 | |
| 214 | Energy calculation: |
| 215 | 768 ============= * CPU0: 200 / 341 * 150 = 88 |
| 216 | * CPU1: 100 / 341 * 150 = 43 |
| 217 | PP * CPU2: 600 / 768 * 800 = 625 |
| 218 | 512 - - - - - - - ##- - -PP - * CPU3: 700 / 768 * 800 = 729 |
| 219 | ## ## => total_energy = 1485 |
| 220 | 341 =========== ## ## |
| 221 | ## ## |
| 222 | 170 -## - - - - ## ## |
| 223 | ## ## ## ## |
| 224 | ------------ ------------- |
| 225 | CPU0 CPU1 CPU2 CPU3 |
| 226 | |
| 227 | |
Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 228 | **Case 3. P stays on prev_cpu / CPU 0**:: |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 229 | |
| 230 | 1024 - - - - - - - |
| 231 | |
| 232 | Energy calculation: |
| 233 | 768 ============= * CPU0: 400 / 512 * 300 = 234 |
| 234 | * CPU1: 100 / 512 * 300 = 58 |
| 235 | * CPU2: 600 / 768 * 800 = 625 |
| 236 | 512 =========== - ##- - - - - * CPU3: 500 / 768 * 800 = 520 |
| 237 | ## ## => total_energy = 1437 |
| 238 | 341 -PP - - - - ## ## |
| 239 | PP ## ## |
| 240 | 170 -## - - - - ## ## |
| 241 | ## ## ## ## |
| 242 | ------------ ------------- |
| 243 | CPU0 CPU1 CPU2 CPU3 |
| 244 | |
| 245 | |
| 246 | From these calculations, the Case 1 has the lowest total energy. So CPU 1 |
| 247 | is be the best candidate from an energy-efficiency standpoint. |
| 248 | |
| 249 | Big CPUs are generally more power hungry than the little ones and are thus used |
| 250 | mainly when a task doesn't fit the littles. However, little CPUs aren't always |
| 251 | necessarily more energy-efficient than big CPUs. For some systems, the high OPPs |
| 252 | of the little CPUs can be less energy-efficient than the lowest OPPs of the |
| 253 | bigs, for example. So, if the little CPUs happen to have enough utilization at |
| 254 | a specific point in time, a small task waking up at that moment could be better |
| 255 | of executing on the big side in order to save energy, even though it would fit |
| 256 | on the little side. |
| 257 | |
| 258 | And even in the case where all OPPs of the big CPUs are less energy-efficient |
| 259 | than those of the little, using the big CPUs for a small task might still, under |
| 260 | specific conditions, save energy. Indeed, placing a task on a little CPU can |
| 261 | result in raising the OPP of the entire performance domain, and that will |
| 262 | increase the cost of the tasks already running there. If the waking task is |
| 263 | placed on a big CPU, its own execution cost might be higher than if it was |
| 264 | running on a little, but it won't impact the other tasks of the little CPUs |
| 265 | which will keep running at a lower OPP. So, when considering the total energy |
| 266 | consumed by CPUs, the extra cost of running that one task on a big core can be |
| 267 | smaller than the cost of raising the OPP on the little CPUs for all the other |
| 268 | tasks. |
| 269 | |
| 270 | The examples above would be nearly impossible to get right in a generic way, and |
| 271 | for all platforms, without knowing the cost of running at different OPPs on all |
| 272 | CPUs of the system. Thanks to its EM-based design, EAS should cope with them |
| 273 | correctly without too many troubles. However, in order to ensure a minimal |
| 274 | impact on throughput for high-utilization scenarios, EAS also implements another |
| 275 | mechanism called 'over-utilization'. |
| 276 | |
| 277 | |
| 278 | 5. Over-utilization |
| 279 | ------------------- |
| 280 | |
| 281 | From a general standpoint, the use-cases where EAS can help the most are those |
| 282 | involving a light/medium CPU utilization. Whenever long CPU-bound tasks are |
| 283 | being run, they will require all of the available CPU capacity, and there isn't |
| 284 | much that can be done by the scheduler to save energy without severly harming |
| 285 | throughput. In order to avoid hurting performance with EAS, CPUs are flagged as |
| 286 | 'over-utilized' as soon as they are used at more than 80% of their compute |
| 287 | capacity. As long as no CPUs are over-utilized in a root domain, load balancing |
| 288 | is disabled and EAS overridess the wake-up balancing code. EAS is likely to load |
| 289 | the most energy efficient CPUs of the system more than the others if that can be |
| 290 | done without harming throughput. So, the load-balancer is disabled to prevent |
| 291 | it from breaking the energy-efficient task placement found by EAS. It is safe to |
| 292 | do so when the system isn't overutilized since being below the 80% tipping point |
| 293 | implies that: |
| 294 | |
| 295 | a. there is some idle time on all CPUs, so the utilization signals used by |
| 296 | EAS are likely to accurately represent the 'size' of the various tasks |
| 297 | in the system; |
| 298 | b. all tasks should already be provided with enough CPU capacity, |
| 299 | regardless of their nice values; |
| 300 | c. since there is spare capacity all tasks must be blocking/sleeping |
| 301 | regularly and balancing at wake-up is sufficient. |
| 302 | |
| 303 | As soon as one CPU goes above the 80% tipping point, at least one of the three |
| 304 | assumptions above becomes incorrect. In this scenario, the 'overutilized' flag |
| 305 | is raised for the entire root domain, EAS is disabled, and the load-balancer is |
| 306 | re-enabled. By doing so, the scheduler falls back onto load-based algorithms for |
| 307 | wake-up and load balance under CPU-bound conditions. This provides a better |
| 308 | respect of the nice values of tasks. |
| 309 | |
| 310 | Since the notion of overutilization largely relies on detecting whether or not |
| 311 | there is some idle time in the system, the CPU capacity 'stolen' by higher |
| 312 | (than CFS) scheduling classes (as well as IRQ) must be taken into account. As |
| 313 | such, the detection of overutilization accounts for the capacity used not only |
| 314 | by CFS tasks, but also by the other scheduling classes and IRQ. |
| 315 | |
| 316 | |
| 317 | 6. Dependencies and requirements for EAS |
| 318 | ---------------------------------------- |
| 319 | |
| 320 | Energy Aware Scheduling depends on the CPUs of the system having specific |
| 321 | hardware properties and on other features of the kernel being enabled. This |
| 322 | section lists these dependencies and provides hints as to how they can be met. |
| 323 | |
| 324 | |
Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 325 | 6.1 - Asymmetric CPU topology |
| 326 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 327 | |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 328 | |
| 329 | As mentioned in the introduction, EAS is only supported on platforms with |
| 330 | asymmetric CPU topologies for now. This requirement is checked at run-time by |
Beata Michalska | adf3c31 | 2021-06-03 15:06:27 +0100 | [diff] [blame] | 331 | looking for the presence of the SD_ASYM_CPUCAPACITY_FULL flag when the scheduling |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 332 | domains are built. |
| 333 | |
Mauro Carvalho Chehab | e4e29e7 | 2020-09-09 16:10:36 +0200 | [diff] [blame] | 334 | See Documentation/scheduler/sched-capacity.rst for requirements to be met for this |
Valentin Schneider | 949bcb8 | 2020-07-31 20:20:16 +0100 | [diff] [blame] | 335 | flag to be set in the sched_domain hierarchy. |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 336 | |
| 337 | Please note that EAS is not fundamentally incompatible with SMP, but no |
| 338 | significant savings on SMP platforms have been observed yet. This restriction |
| 339 | could be amended in the future if proven otherwise. |
| 340 | |
| 341 | |
Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 342 | 6.2 - Energy Model presence |
| 343 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 344 | |
| 345 | EAS uses the EM of a platform to estimate the impact of scheduling decisions on |
| 346 | energy. So, your platform must provide power cost tables to the EM framework in |
| 347 | order to make EAS start. To do so, please refer to documentation of the |
Linus Torvalds | fb4da21 | 2019-07-15 20:44:49 -0700 | [diff] [blame] | 348 | independent EM framework in Documentation/power/energy-model.rst. |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 349 | |
| 350 | Please also note that the scheduling domains need to be re-built after the |
| 351 | EM has been registered in order to start EAS. |
| 352 | |
Lukasz Luba | 5a64f77 | 2020-11-03 09:05:58 +0000 | [diff] [blame] | 353 | EAS uses the EM to make a forecasting decision on energy usage and thus it is |
| 354 | more focused on the difference when checking possible options for task |
| 355 | placement. For EAS it doesn't matter whether the EM power values are expressed |
| 356 | in milli-Watts or in an 'abstract scale'. |
| 357 | |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 358 | |
Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 359 | 6.3 - Energy Model complexity |
| 360 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 361 | |
| 362 | The task wake-up path is very latency-sensitive. When the EM of a platform is |
| 363 | too complex (too many CPUs, too many performance domains, too many performance |
| 364 | states, ...), the cost of using it in the wake-up path can become prohibitive. |
| 365 | The energy-aware wake-up algorithm has a complexity of: |
| 366 | |
| 367 | C = Nd * (Nc + Ns) |
| 368 | |
| 369 | with: Nd the number of performance domains; Nc the number of CPUs; and Ns the |
| 370 | total number of OPPs (ex: for two perf. domains with 4 OPPs each, Ns = 8). |
| 371 | |
| 372 | A complexity check is performed at the root domain level, when scheduling |
| 373 | domains are built. EAS will not start on a root domain if its C happens to be |
| 374 | higher than the completely arbitrary EM_MAX_COMPLEXITY threshold (2048 at the |
| 375 | time of writing). |
| 376 | |
| 377 | If you really want to use EAS but the complexity of your platform's Energy |
| 378 | Model is too high to be used with a single root domain, you're left with only |
| 379 | two possible options: |
| 380 | |
| 381 | 1. split your system into separate, smaller, root domains using exclusive |
| 382 | cpusets and enable EAS locally on each of them. This option has the |
| 383 | benefit to work out of the box but the drawback of preventing load |
| 384 | balance between root domains, which can result in an unbalanced system |
| 385 | overall; |
| 386 | 2. submit patches to reduce the complexity of the EAS wake-up algorithm, |
| 387 | hence enabling it to cope with larger EMs in reasonable time. |
| 388 | |
| 389 | |
Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 390 | 6.4 - Schedutil governor |
| 391 | ^^^^^^^^^^^^^^^^^^^^^^^^ |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 392 | |
| 393 | EAS tries to predict at which OPP will the CPUs be running in the close future |
| 394 | in order to estimate their energy consumption. To do so, it is assumed that OPPs |
| 395 | of CPUs follow their utilization. |
| 396 | |
| 397 | Although it is very difficult to provide hard guarantees regarding the accuracy |
| 398 | of this assumption in practice (because the hardware might not do what it is |
| 399 | told to do, for example), schedutil as opposed to other CPUFreq governors at |
| 400 | least _requests_ frequencies calculated using the utilization signals. |
| 401 | Consequently, the only sane governor to use together with EAS is schedutil, |
| 402 | because it is the only one providing some degree of consistency between |
| 403 | frequency requests and energy predictions. |
| 404 | |
| 405 | Using EAS with any other governor than schedutil is not supported. |
| 406 | |
| 407 | |
Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 408 | 6.5 Scale-invariant utilization signals |
| 409 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 410 | |
| 411 | In order to make accurate prediction across CPUs and for all performance |
| 412 | states, EAS needs frequency-invariant and CPU-invariant PELT signals. These can |
| 413 | be obtained using the architecture-defined arch_scale{cpu,freq}_capacity() |
| 414 | callbacks. |
| 415 | |
| 416 | Using EAS on a platform that doesn't implement these two callbacks is not |
| 417 | supported. |
| 418 | |
| 419 | |
Mauro Carvalho Chehab | d6a3b24 | 2019-06-12 14:53:03 -0300 | [diff] [blame] | 420 | 6.6 Multithreading (SMT) |
| 421 | ^^^^^^^^^^^^^^^^^^^^^^^^ |
Quentin Perret | 81a930d | 2019-01-10 11:05:46 +0000 | [diff] [blame] | 422 | |
| 423 | EAS in its current form is SMT unaware and is not able to leverage |
| 424 | multithreaded hardware to save energy. EAS considers threads as independent |
| 425 | CPUs, which can actually be counter-productive for both performance and energy. |
| 426 | |
| 427 | EAS on SMT is not supported. |