[IA64] smp_flush_tlb_mm() should only send IPI's to cpus in cpu_vm_mask

Having flush_tlb_mm->smp_flush_tlb_mm() send an IPI to every cpu
on the system is occasionally triggering spin_lock contention in
generic_smp_call_function_interrupt().

Follow x86 arch's lead and only sends IPIs to the cpus in mm->cpu_vm_mask.

Experiments with this change have shown significant improvement in this
contention issue.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
diff --git a/arch/ia64/kernel/smp.c b/arch/ia64/kernel/smp.c
index c2d9823..5230eaa 100644
--- a/arch/ia64/kernel/smp.c
+++ b/arch/ia64/kernel/smp.c
@@ -301,15 +301,12 @@
 		return;
 	}
 
+	smp_call_function_mask(mm->cpu_vm_mask,
+		(void (*)(void *))local_finish_flush_tlb_mm, mm, 1);
+	local_irq_disable();
+	local_finish_flush_tlb_mm(mm);
+	local_irq_enable();
 	preempt_enable();
-	/*
-	 * We could optimize this further by using mm->cpu_vm_mask to track which CPUs
-	 * have been running in the address space.  It's not clear that this is worth the
-	 * trouble though: to avoid races, we have to raise the IPI on the target CPU
-	 * anyhow, and once a CPU is interrupted, the cost of local_flush_tlb_all() is
-	 * rather trivial.
-	 */
-	on_each_cpu((void (*)(void *))local_finish_flush_tlb_mm, mm, 1);
 }
 
 void arch_send_call_function_single_ipi(int cpu)