iommu/io-pgtable-arm: Use dma_wmb() instead of wmb() when publishing table

When writing a new table entry, we must ensure that the contents of the
table is made visible to the SMMU page table walker before the updated
table entry itself.

This is currently achieved using wmb(), which expands to an expensive and
unnecessary DSB instruction. Ideally, we'd just use cmpxchg64_release when
writing the table entry, but this doesn't have memory ordering semantics
on !SMP systems.

Instead, use dma_wmb(), which emits DMB OSHST. Strictly speaking, this
does more than we require (since it targets the outer-shareable domain),
but it's likely to be significantly faster than the DSB approach.

Reported-by: Linu Cherian <linu.cherian@cavium.com>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 6906294..af330f5 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -408,8 +408,12 @@ static arm_v7s_iopte arm_v7s_install_table(arm_v7s_iopte *table,
 	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
 		new |= ARM_V7S_ATTR_NS_TABLE;
 
-	/* Ensure the table itself is visible before its PTE can be */
-	wmb();
+	/*
+	 * Ensure the table itself is visible before its PTE can be.
+	 * Whilst we could get away with cmpxchg64_release below, this
+	 * doesn't have any ordering semantics when !CONFIG_SMP.
+	 */
+	dma_wmb();
 
 	old = cmpxchg_relaxed(ptep, curr, new);
 	__arm_v7s_pte_sync(ptep, 1, cfg);
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 52700fa..b182039 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -331,8 +331,12 @@ static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table,
 	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_NS)
 		new |= ARM_LPAE_PTE_NSTABLE;
 
-	/* Ensure the table itself is visible before its PTE can be */
-	wmb();
+	/*
+	 * Ensure the table itself is visible before its PTE can be.
+	 * Whilst we could get away with cmpxchg64_release below, this
+	 * doesn't have any ordering semantics when !CONFIG_SMP.
+	 */
+	dma_wmb();
 
 	old = cmpxchg64_relaxed(ptep, curr, new);