.. SPDX-License-Identifier: GPL-2.0

===========================
The KVM halt polling system
===========================

The KVM halt polling system provides a feature within KVM whereby the latency
of a guest can, under some circumstances, be reduced by polling in the host
for some time period after the guest has elected to no longer run by ceding.
That is, when a guest vcpu has ceded, or in the case of powerpc when all of the
vcpus of a single vcore have ceded, the host kernel polls for wakeup conditions
before giving up the cpu to the scheduler in order to let something else run.

Polling provides a latency advantage in cases where the guest can be run again
very quickly, because it saves at least a trip through the scheduler, which
normally costs on the order of a few microseconds, although performance
benefits are workload dependent. In the event that no wakeup source arrives
during the polling interval, or some other task on the runqueue is runnable,
the scheduler is invoked. Thus halt polling is especially useful on workloads
with very short wakeup periods, where the time spent halt polling is minimised
and the time saved by not invoking the scheduler is noticeable.
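
The poll-then-schedule behaviour described above can be sketched in userspace
C. This is an illustration under stated assumptions, not the kernel
implementation: ``struct poll_ops``, ``halt_poll()`` and the ``sim_*`` helpers
are hypothetical names, and the clock and the wakeup checks are passed in as
callbacks so that the loop can be exercised with a simulated clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callbacks standing in for the host's clock and run state. */
struct poll_ops {
	uint64_t (*now_ns)(void *ctx);
	bool (*wakeup_pending)(void *ctx);
	bool (*other_task_runnable)(void *ctx);
};

enum poll_result { POLL_WOKEN, POLL_MUST_SCHEDULE };

/*
 * Poll for at most poll_ns nanoseconds. Stop early if a wakeup arrives
 * (the guest can run again at once) or if another task wants the cpu.
 */
static enum poll_result halt_poll(const struct poll_ops *ops, void *ctx,
				  uint64_t poll_ns)
{
	uint64_t start = ops->now_ns(ctx);

	for (;;) {
		if (ops->wakeup_pending(ctx))
			return POLL_WOKEN;
		if (ops->other_task_runnable(ctx))
			return POLL_MUST_SCHEDULE;
		if (ops->now_ns(ctx) - start >= poll_ns)
			return POLL_MUST_SCHEDULE;
	}
}

/* Simulated environment (an assumption, for demonstration only). */
struct sim_ctx {
	uint64_t now;      /* simulated clock, in ns */
	uint64_t wake_at;  /* time at which the wakeup becomes pending */
	bool contended;    /* another task is runnable on this cpu */
};

static uint64_t sim_now(void *ctx)
{
	struct sim_ctx *c = ctx;

	c->now += 100;  /* every clock read costs 100ns of simulated time */
	return c->now;
}

static bool sim_wakeup_pending(void *ctx)
{
	const struct sim_ctx *c = ctx;

	return c->now >= c->wake_at;
}

static bool sim_other_task_runnable(void *ctx)
{
	const struct sim_ctx *c = ctx;

	return c->contended;
}

static const struct poll_ops sim_ops = {
	.now_ns = sim_now,
	.wakeup_pending = sim_wakeup_pending,
	.other_task_runnable = sim_other_task_runnable,
};
```

With the simulated clock, a wakeup that arrives inside the polling interval
returns ``POLL_WOKEN`` without involving the scheduler, while a late wakeup or
a contended cpu returns ``POLL_MUST_SCHEDULE``.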

The generic halt polling code is implemented in:

	virt/kvm/kvm_main.c: kvm_vcpu_block()

The powerpc kvm-hv specific case is implemented in:

	arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked()

Halt Polling Interval
=====================

The maximum time for which to poll before invoking the scheduler, referred to
as the halt polling interval, is increased and decreased based on the perceived
effectiveness of the polling in an attempt to limit pointless polling.
This value is stored in either the vcpu struct:

	kvm_vcpu->halt_poll_ns

or in the case of powerpc kvm-hv, in the vcore struct:

	kvmppc_vcore->halt_poll_ns

Thus this is a per vcpu (or vcore) value.

If a wakeup source is received within the halt polling interval, the interval
is left unchanged. If a wakeup source isn't received during the polling
interval (and thus schedule is invoked), there are two cases: either the
polling interval and total block time[0] were both less than the global max
polling interval (see module params below), or the total block time was
greater than the global max polling interval.
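
The three outcomes above can be written as a small decision function. This is
a minimal sketch, not the kernel code: ``enum poll_adjust`` and
``classify_block()`` are hypothetical names, and "the wakeup arrived while
polling" is modelled by the assumption that the total block time did not
exceed the vcpu's current polling interval.

```c
#include <stdint.h>

enum poll_adjust { POLL_KEEP, POLL_GROW, POLL_SHRINK };

/*
 * block_ns:     total block time for this halt
 * vcpu_poll_ns: the vcpu's (or vcore's) current polling interval
 * max_poll_ns:  the global max polling interval (halt_poll_ns)
 */
static enum poll_adjust classify_block(uint64_t block_ns,
				       uint64_t vcpu_poll_ns,
				       uint64_t max_poll_ns)
{
	/* Assumption: a block this short was satisfied while polling. */
	if (block_ns <= vcpu_poll_ns)
		return POLL_KEEP;

	/* No amount of polling under the global max would have helped. */
	if (block_ns > max_poll_ns)
		return POLL_SHRINK;

	/* Short block and room to grow: a longer poll might catch it. */
	if (vcpu_poll_ns < max_poll_ns && block_ns < max_poll_ns)
		return POLL_GROW;

	return POLL_KEEP;
}
```

For example, with a global max of 200000ns and a current interval of 10000ns,
a 50000ns block asks for growth and a 300000ns block asks for shrinking.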

In the event that both the polling interval and total block time were less than
the global max polling interval, the polling interval can be increased in the
hope that next time, during the longer polling interval, the wakeup source
will be received while the host is polling and the latency benefits will be
realised. The polling interval is grown in the function grow_halt_poll_ns():
it is multiplied by the module parameter halt_poll_ns_grow, or set to
halt_poll_ns_grow_start when growing from zero.

In the event that the total block time was greater than the global max polling
interval, the host could never have polled for long enough (limited by the
global max) to catch the wakeup during the polling interval, so the interval
may as well be shrunk in order to avoid pointless polling. The polling interval
is shrunk in the function shrink_halt_poll_ns() and is divided by the module
parameter halt_poll_ns_shrink, or set to 0 iff halt_poll_ns_shrink == 0.
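
The grow and shrink rules can be sketched as follows. These are hypothetical
userspace stand-ins (``struct poll_params``, ``grow_poll_ns()``,
``shrink_poll_ns()``), not the kernel functions named above. Clamping to the
global max follows from its role as a ceiling; treating a grow factor of 0 as
"leave the interval unchanged" is an assumption of this sketch.

```c
#include <stdint.h>

/* Illustrative copy of the tunables; field names are hypothetical. */
struct poll_params {
	uint64_t max_ns;      /* halt_poll_ns: global ceiling */
	uint64_t grow;        /* halt_poll_ns_grow */
	uint64_t grow_start;  /* halt_poll_ns_grow_start */
	uint64_t shrink;      /* halt_poll_ns_shrink */
};

static uint64_t grow_poll_ns(uint64_t cur, const struct poll_params *p)
{
	uint64_t val;

	if (p->grow == 0)  /* assumption: 0 disables growth */
		return cur;

	val = cur * p->grow;
	if (val < p->grow_start)  /* growing from zero (or near it) */
		val = p->grow_start;
	if (val > p->max_ns)      /* never exceed the global max */
		val = p->max_ns;
	return val;
}

static uint64_t shrink_poll_ns(uint64_t cur, const struct poll_params *p)
{
	if (p->shrink == 0)  /* a divisor of 0 resets the interval */
		return 0;
	return cur / p->shrink;
}
```

With a ceiling of 200000ns, a grow factor of 2 and a grow start of 10000ns,
successive grows from zero give 10000, 20000, 40000 and so on until the
ceiling is reached; with a shrink divisor of 0, a single shrink returns the
interval to 0.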

It is worth noting that this adjustment process attempts to home in on some
steady state polling interval but will only really do a good job for wakeups
which come at an approximately constant rate; otherwise there will be constant
adjustment of the polling interval.

[0] total block time:
    the time between when the halt polling function is
    invoked and a wakeup source received (irrespective of
    whether the scheduler is invoked within that function).

Module Parameters
=================

The kvm module has 4 tuneable module parameters to adjust the global max
polling interval as well as the rate at which the polling interval is grown and
shrunk. These variables are defined in include/linux/kvm_host.h and as module
parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the
powerpc kvm-hv case.

+-----------------------+---------------------------+-------------------------+
|Module Parameter       | Description               | Default Value           |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns           | The global max polling    | KVM_HALT_POLL_NS_DEFAULT|
|                       | interval which defines    | (per arch value)        |
|                       | the ceiling value of the  |                         |
|                       | polling interval for      |                         |
|                       | each vcpu.                |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_grow      | The value by which the    | 2                       |
|                       | halt polling interval is  |                         |
|                       | multiplied in the         |                         |
|                       | grow_halt_poll_ns()       |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_grow_start| The initial value to grow | 10000                   |
|                       | to from zero in the       |                         |
|                       | grow_halt_poll_ns()       |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_shrink    | The value by which the    | 0                       |
|                       | halt polling interval is  |                         |
|                       | divided in the            |                         |
|                       | shrink_halt_poll_ns()     |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+

These module parameters can be set from the sysfs files in:

	/sys/module/kvm/parameters/

Note that these module parameters are system wide values and cannot be tuned
on a per vm basis.
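
Reading or changing one of these parameters comes down to reading or writing a
small text file. A minimal sketch, assuming the usual convention that each
parameter appears as a file named after it in the directory above;
``read_uint_param()`` and ``write_uint_param()`` are hypothetical helper
names, and writing normally requires root.

```c
#include <stdio.h>

/* Read an unsigned decimal value from a parameter file. 0 on success. */
static int read_uint_param(const char *path, unsigned int *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%u", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Write an unsigned decimal value to a parameter file. 0 on success. */
static int write_uint_param(const char *path, unsigned int val)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = (fprintf(f, "%u\n", val) > 0);
	/* fclose() flushes the buffer, so a rejected write may surface here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, ``read_uint_param("/sys/module/kvm/parameters/halt_poll_ns", &v)``
would fetch the current global max polling interval on a host where the kvm
module is loaded.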

Further Notes
=============

- Care should be taken when setting the halt_poll_ns module parameter, as a
  large value has the potential to drive the cpu usage to 100% on a machine
  which would otherwise be almost entirely idle. This is because even if a
  guest has wakeups during which very little work is done and which are quite
  far apart, if the period is shorter than the global max polling interval
  (halt_poll_ns) then the host will always poll for the entire block time and
  thus cpu utilisation will go to 100%.

- Halt polling essentially presents a trade off between power usage and
  latency, and the module parameters should be used to tune this trade off.
  Idle cpu time is essentially converted to host kernel time with the aim of
  decreasing latency when entering the guest.

- Halt polling will only be conducted by the host when no other tasks are
  runnable on that cpu; otherwise the polling will cease immediately and
  schedule will be invoked to allow that other task to run. Thus halt polling
  doesn't allow a guest to mount a denial of service attack on the cpu.
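
The first note can be made concrete with some rough arithmetic. The function
below is a simplified model, not something the kernel computes:
``host_cpu_utilisation()`` is a hypothetical name, and the model assumes a
guest that wakes at a fixed period, does a fixed amount of work per wakeup,
runs alone on the cpu, and is polled for at most ``poll_ns`` on every block.

```c
#include <stdint.h>

/*
 * Fraction of host cpu time spent busy (guest work plus polling).
 * period_ns: time between consecutive guest wakeups
 * work_ns:   time the guest runs after each wakeup
 * poll_ns:   time the host polls before giving up the cpu
 */
static double host_cpu_utilisation(uint64_t period_ns, uint64_t work_ns,
				   uint64_t poll_ns)
{
	uint64_t idle_ns, polled_ns;

	if (period_ns == 0 || work_ns >= period_ns)
		return 1.0;

	idle_ns = period_ns - work_ns;
	/* Polling cannot last longer than the block itself. */
	polled_ns = poll_ns < idle_ns ? poll_ns : idle_ns;

	return (double)(work_ns + polled_ns) / (double)period_ns;
}
```

Under this model a guest that wakes every 100000ns for 1000ns of work is about
1% busy without polling, but a 200000ns polling interval covers the whole
block time and the cpu becomes 100% busy.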