KVM: x86: optimize delivery of TSC deadline timer interrupt

The newly-added tracepoint shows the following results on
the tscdeadline_latency test:

        qemu-kvm-8387  [002]  6425.558974: kvm_vcpu_wakeup:      poll time 10407 ns
        qemu-kvm-8387  [002]  6425.558984: kvm_vcpu_wakeup:      poll time 0 ns
        qemu-kvm-8387  [002]  6425.561242: kvm_vcpu_wakeup:      poll time 10477 ns
        qemu-kvm-8387  [002]  6425.561251: kvm_vcpu_wakeup:      poll time 0 ns

and so on.  This is because we need to go through kvm_vcpu_block again
after the timer IRQ is injected.  Avoid it by polling once before
entering kvm_vcpu_block.

On my machine (Xeon E5 Sandy Bridge) this removes about 500 cycles (7%)
from the latency of the TSC deadline timer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6256dfa..50861dd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6406,12 +6406,13 @@
 
 static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
 {
-	srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
-	kvm_vcpu_block(vcpu);
-	vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
-
-	if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
-		return 1;
+	if (!kvm_arch_vcpu_runnable(vcpu)) {
+		srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
+		kvm_vcpu_block(vcpu);
+		vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
+		if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
+			return 1;
+	}
 
 	kvm_apic_accept_events(vcpu);
 	switch(vcpu->arch.mp_state) {