The KVM halt polling system
===========================

The KVM halt polling system provides a feature within KVM whereby the latency
of a guest can, under some circumstances, be reduced by polling in the host
for some time period after the guest has elected to no longer run by ceding.
That is, when a guest vcpu has ceded, or in the case of powerpc when all of the
vcpus of a single vcore have ceded, the host kernel polls for wakeup conditions
before giving up the cpu to the scheduler in order to let something else run.

Polling provides a latency advantage in cases where the guest can be run again
very quickly, by at least saving us a trip through the scheduler, normally on
the order of a few micro-seconds, although performance benefits are workload
dependent. In the event that no wakeup source arrives during the polling
interval, or some other task on the runqueue is runnable, the scheduler is
invoked. Thus halt polling is especially useful on workloads with very short
wakeup periods, where the time spent halt polling is minimised and the time
savings of not invoking the scheduler are distinguishable.

The generic halt polling code is implemented in:

	virt/kvm/kvm_main.c: kvm_vcpu_block()

The powerpc kvm-hv specific case is implemented in:

	arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked()

Halt Polling Interval
=====================

The maximum time for which to poll before invoking the scheduler, referred to
as the halt polling interval, is increased and decreased based on the perceived
effectiveness of the polling in an attempt to limit pointless polling.
This value is stored in either the vcpu struct:

	kvm_vcpu->halt_poll_ns

or in the case of powerpc kvm-hv, in the vcore struct:

	kvmppc_vcore->halt_poll_ns

Thus this is a per vcpu (or vcore) value.

During polling if a wakeup source is received within the halt polling interval,
the interval is left unchanged. In the event that a wakeup source isn't
received during the polling interval (and thus schedule is invoked) there are
two possibilities: either the polling interval and total block time[0] were
less than the global max polling interval (see module params below), or the
total block time was greater than the global max polling interval.

In the event that both the polling interval and total block time were less than
the global max polling interval then the polling interval can be increased in
the hope that next time during the longer polling interval the wake up source
will be received while the host is polling and the latency benefits will be
received. The polling interval is grown in the function grow_halt_poll_ns():
it is multiplied by the module parameter halt_poll_ns_grow, and when growing
from zero it is set to the module parameter halt_poll_ns_grow_start.

In the event that the total block time was greater than the global max polling
interval then the host will never poll for long enough (limited by the global
max) to wake up during the polling interval, so it may as well be shrunk in
order to avoid pointless polling. The polling interval is shrunk in the
function shrink_halt_poll_ns() and is divided by the module parameter
halt_poll_ns_shrink, or set to 0 iff halt_poll_ns_shrink == 0.

It is worth noting that this adjustment process attempts to home in on some
steady state polling interval, but will only really do a good job for wakeups
which come at an approximately constant rate; otherwise there will be constant
adjustment of the polling interval.

[0] total block time: the time between when the halt polling function is
                      invoked and a wakeup source received (irrespective of
                      whether the scheduler is invoked within that function).

Module Parameters
=================

The kvm module has 4 tuneable module parameters to adjust the global max
polling interval, the initial value (to grow from zero), and the rate at which
the polling interval is grown and shrunk. These variables are defined in
include/linux/kvm_host.h and as module parameters in virt/kvm/kvm_main.c, or
arch/powerpc/kvm/book3s_hv.c in the powerpc kvm-hv case.

Module Parameter        |    Description             |        Default Value
--------------------------------------------------------------------------------
halt_poll_ns            | The global max polling     | KVM_HALT_POLL_NS_DEFAULT
                        | interval which defines     |
                        | the ceiling value of the   |
                        | polling interval for       | (per arch value)
                        | each vcpu.                 |
--------------------------------------------------------------------------------
halt_poll_ns_grow       | The value by which the     | 2
                        | halt polling interval is   |
                        | multiplied in the          |
                        | grow_halt_poll_ns()        |
                        | function.                  |
--------------------------------------------------------------------------------
halt_poll_ns_grow_start | The initial value to grow  | 10000
                        | to from zero in the        |
                        | grow_halt_poll_ns()        |
                        | function.                  |
--------------------------------------------------------------------------------
halt_poll_ns_shrink     | The value by which the     | 0
                        | halt polling interval is   |
                        | divided in the             |
                        | shrink_halt_poll_ns()      |
                        | function.                  |
--------------------------------------------------------------------------------

These module parameters can be set from the sysfs files in:

	/sys/module/kvm/parameters/

Note: these module parameters are system wide values and are not able to be
tuned on a per vm basis.

Further Notes
=============

- Care should be taken when setting the halt_poll_ns module parameter as a
large value has the potential to drive the cpu usage to 100% on a machine which
would be almost entirely idle otherwise. This is because even if a guest has
wakeups during which very little work is done and which are quite far apart, if
the period is shorter than the global max polling interval (halt_poll_ns) then
the host will always poll for the entire block time and thus cpu utilisation
will go to 100%.

- Halt polling essentially presents a trade off between power usage and latency,
and the module parameters should be used to tune the affinity for this. Idle
cpu time is essentially converted to host kernel time with the aim of decreasing
latency when entering the guest.

- Halt polling will only be conducted by the host when no other tasks are
runnable on that cpu; otherwise the polling will cease immediately and
schedule will be invoked to allow that other task to run. Thus halt polling
does not allow a guest to mount a denial of service attack on the cpu.