.. _mmu_notifier:

When do you need to notify inside the page table lock?
======================================================

When clearing a pte/pmd, we are given a choice to notify the event under the
page table lock (the notify versions of \*_clear_flush call
mmu_notifier_invalidate_range()). However, that notification is not necessary
in all cases.

For secondary TLBs (non-CPU TLBs) such as the IOMMU TLB or a device TLB (when
a device uses something like ATS/PASID to get the IOMMU to walk the CPU page
table to access a process virtual address space), there are only two cases
where you need to notify those secondary TLBs while holding the page table
lock when clearing a pte/pmd:

  A) the page backing the address is freed before
     mmu_notifier_invalidate_range_end()
  B) a page table entry is updated to point to a new page (COW, write fault
     on zero page, __replace_page(), ...)

Case A is obvious: you do not want to take the risk of the device writing to
a page that might now be used by some completely different task.

Case B is more subtle. For correctness it requires the following sequence to
happen:

  - take the page table lock
  - clear the page table entry and notify ([pmd/pte]p_huge_clear_flush_notify())
  - set the page table entry to point to the new page

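As a rough illustration only, the sequence above can be sketched as a toy user-space C model. The struct toy_mm type, the toy_* helpers and the boolean standing in for the page table lock are hypothetical names invented for this sketch; they are not kernel APIs, and the real helpers operate on ptes, pmds and spinlocks. The point is the ordering: the device TLB entry is dropped before the new page is installed, while the lock is still held.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy user-space model, not kernel code: one pte, one device (secondary)
 * TLB entry and a flag standing in for the page table lock. Frame 0 means
 * "no mapping" for both the pte and the device TLB.
 */
struct toy_mm {
	bool ptl_held;	/* stands in for the page table spinlock */
	int pte;	/* frame the CPU pte points to */
	int dev_tlb;	/* frame cached by the device TLB */
};

/* Stand-in for mmu_notifier_invalidate_range(): drop the device TLB entry. */
static void toy_invalidate_range(struct toy_mm *mm)
{
	mm->dev_tlb = 0;
}

/* Stand-in for a *_clear_flush_notify() helper: clear the pte, then notify. */
static void toy_clear_flush_notify(struct toy_mm *mm)
{
	assert(mm->ptl_held);	/* only valid under the page table lock */
	mm->pte = 0;
	toy_invalidate_range(mm);
}

/* Case B (e.g. COW): point the pte at new_frame, notifying under the lock. */
static void toy_cow_update(struct toy_mm *mm, int new_frame)
{
	mm->ptl_held = true;		/* take the page table lock */
	toy_clear_flush_notify(mm);	/* clear the entry and notify */
	mm->pte = new_frame;		/* set the entry to the new page */
	mm->ptl_held = false;		/* release the page table lock */
}

/* Device access: hit in the device TLB, or walk the CPU page table on a miss. */
static int toy_device_translate(struct toy_mm *mm)
{
	if (mm->dev_tlb == 0)
		mm->dev_tlb = mm->pte;
	return mm->dev_tlb;
}
```

A device that cached frame 1 before toy_cow_update(&mm, 2) misses in its TLB afterwards and re-walks the page table, so toy_device_translate() returns frame 2 instead of the stale frame 1.
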
If clearing the page table entry is not followed by a notify before setting
the new pte/pmd value, then you can break a memory model like C11 or C++11 for
the device.

Consider the following scenario (the device uses a feature similar to
ATS/PASID):

Take two addresses addrA and addrB such that \|addrA - addrB\| >= PAGE_SIZE,
and assume they are write protected for COW (the other cases of B apply too).

::

  [Time N] --------------------------------------------------------------------
  CPU-thread-0  {try to write to addrA}
  CPU-thread-1  {try to write to addrB}
  CPU-thread-2  {}
  CPU-thread-3  {}
  DEV-thread-0  {read addrA and populate device TLB}
  DEV-thread-2  {read addrB and populate device TLB}
  [Time N+1] ------------------------------------------------------------------
  CPU-thread-0  {COW_step0: {mmu_notifier_invalidate_range_start(addrA)}}
  CPU-thread-1  {COW_step0: {mmu_notifier_invalidate_range_start(addrB)}}
  CPU-thread-2  {}
  CPU-thread-3  {}
  DEV-thread-0  {}
  DEV-thread-2  {}
  [Time N+2] ------------------------------------------------------------------
  CPU-thread-0  {COW_step1: {update page table to point to new page for addrA}}
  CPU-thread-1  {COW_step1: {update page table to point to new page for addrB}}
  CPU-thread-2  {}
  CPU-thread-3  {}
  DEV-thread-0  {}
  DEV-thread-2  {}
  [Time N+3] ------------------------------------------------------------------
  CPU-thread-0  {preempted}
  CPU-thread-1  {preempted}
  CPU-thread-2  {write to addrA which is a write to new page}
  CPU-thread-3  {}
  DEV-thread-0  {}
  DEV-thread-2  {}
  [Time N+4] ------------------------------------------------------------------
  CPU-thread-0  {preempted}
  CPU-thread-1  {preempted}
  CPU-thread-2  {}
  CPU-thread-3  {write to addrB which is a write to new page}
  DEV-thread-0  {}
  DEV-thread-2  {}
  [Time N+5] ------------------------------------------------------------------
  CPU-thread-0  {preempted}
  CPU-thread-1  {COW_step3: {mmu_notifier_invalidate_range_end(addrB)}}
  CPU-thread-2  {}
  CPU-thread-3  {}
  DEV-thread-0  {}
  DEV-thread-2  {}
  [Time N+6] ------------------------------------------------------------------
  CPU-thread-0  {preempted}
  CPU-thread-1  {}
  CPU-thread-2  {}
  CPU-thread-3  {}
  DEV-thread-0  {read addrA from old page}
  DEV-thread-2  {read addrB from new page}

So here, because at time N+2 the clearing of the page table entry was not
paired with a notification to invalidate the secondary TLB, the device sees the
new value for addrB before seeing the new value for addrA, even though addrA
was written first. This breaks total memory ordering for the device.

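The scenario can be replayed deterministically in a small C model to compare the two outcomes side by side. Everything here (struct sim, replay(), the frame numbering and the value 1 written by the CPU threads) is a hypothetical single-threaded stand-in for the timeline, assuming the device TLB is only refilled on a miss; it is not how the kernel or any IOMMU driver is written.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the timeline above, not kernel code. */
enum { ADDR_A, ADDR_B, NR_ADDR };
enum { OLD_A, OLD_B, NEW_A, NEW_B, NR_FRAMES };

struct sim {
	int pte[NR_ADDR];	/* CPU page table: address -> frame */
	int dev_tlb[NR_ADDR];	/* device TLB: address -> frame, -1 == empty */
	int mem[NR_FRAMES];	/* contents of each physical frame */
};

static void dev_invalidate(struct sim *s, int addr)
{
	s->dev_tlb[addr] = -1;
}

/* Device read: hit in its TLB, or walk the CPU page table on a miss. */
static int dev_read(struct sim *s, int addr)
{
	if (s->dev_tlb[addr] < 0)
		s->dev_tlb[addr] = s->pte[addr];
	return s->mem[s->dev_tlb[addr]];
}

/*
 * Replay the timeline. If notify_under_ptl is true, COW_step1 invalidates
 * the device TLB together with the pte update; otherwise invalidation only
 * happens in invalidate_range_end(), which CPU-thread-0 never reaches.
 * The device's final view of addrA and addrB is stored in *seen_a/*seen_b.
 */
static void replay(bool notify_under_ptl, int *seen_a, int *seen_b)
{
	struct sim s = {
		.pte = { OLD_A, OLD_B },
		.dev_tlb = { -1, -1 },
		.mem = { 0, 0, 0, 0 },
	};

	/* Time N: the device populates its TLB for both addresses. */
	dev_read(&s, ADDR_A);
	dev_read(&s, ADDR_B);

	/* COW_step0 (invalidate_range_start) changes nothing in this model. */

	/* COW_step1: the new pages are copies of the old ones. */
	s.mem[NEW_A] = s.mem[OLD_A];
	s.mem[NEW_B] = s.mem[OLD_B];
	s.pte[ADDR_A] = NEW_A;
	s.pte[ADDR_B] = NEW_B;
	if (notify_under_ptl) {
		dev_invalidate(&s, ADDR_A);
		dev_invalidate(&s, ADDR_B);
	}

	/* CPU-thread-2 writes addrA, then CPU-thread-3 writes addrB. */
	s.mem[s.pte[ADDR_A]] = 1;
	s.mem[s.pte[ADDR_B]] = 1;

	/* CPU-thread-1 runs COW_step3: invalidate_range_end(addrB). */
	dev_invalidate(&s, ADDR_B);

	/* Finally the device reads both addresses. */
	*seen_a = dev_read(&s, ADDR_A);
	*seen_b = dev_read(&s, ADDR_B);
}
```

With replay(false, ...) the device reads 0 for addrA and 1 for addrB: it observes the later write to addrB but not the earlier write to addrA. With replay(true, ...), where the device TLB is invalidated together with the pte update, it reads 1 for both.
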
When changing a pte to write protect it, or to point to a new write-protected
page with the same content (KSM), it is fine to delay the
mmu_notifier_invalidate_range() call to mmu_notifier_invalidate_range_end(),
outside the page table lock. This is true even if the thread doing the page
table update is preempted right after releasing the page table lock but before
calling mmu_notifier_invalidate_range_end().
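The rules in this section can be summarised as a small decision helper. The enum pte_change type and must_notify_under_ptl() are hypothetical names for illustration only; as described here, the kernel makes this choice at each call site by using either the notify or the non-notify variant of the \*_clear_flush helpers, not through a function like this. The sketch assumes that case A takes precedence over every other kind of change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classification of a pte/pmd change; not a kernel API. */
enum pte_change {
	PTE_WRPROTECT,	/* same page, made read-only */
	PTE_KSM_MERGE,	/* new write-protected page with the same content */
	PTE_NEW_PAGE,	/* COW, write fault on zero page, __replace_page() */
};

/*
 * Return true if secondary TLBs must be invalidated while the page table
 * lock is held (notify version of *_clear_flush), false if the
 * invalidation may be deferred to mmu_notifier_invalidate_range_end().
 */
static bool must_notify_under_ptl(enum pte_change change,
				  bool page_freed_before_range_end)
{
	if (page_freed_before_range_end)
		return true;		/* case A always wins */

	switch (change) {
	case PTE_NEW_PAGE:
		return true;		/* case B */
	case PTE_WRPROTECT:
	case PTE_KSM_MERGE:
		return false;		/* safe to defer to range_end() */
	}
	assert(!"unknown pte_change");
	return true;			/* be conservative if asserts are off */
}
```

For example, a COW fault maps to PTE_NEW_PAGE and must notify under the lock, while a KSM merge whose old page stays allocated until after mmu_notifier_invalidate_range_end() can defer the invalidation.
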