==================================
Memory Attribute Aliasing on IA-64
==================================

Bjorn Helgaas <bjorn.helgaas@hp.com>

May 4, 2006


Memory Attributes
=================

    Itanium supports several attributes for virtual memory references.
    The attribute is part of the virtual translation, i.e., it is
    contained in the TLB entry. The ones of most interest to the Linux
    kernel are:

    == ======================
    WB Write-back (cacheable)
    UC Uncacheable
    WC Write-coalescing
    == ======================

    System memory typically uses the WB attribute. The UC attribute is
    used for memory-mapped I/O devices. The WC attribute is uncacheable
    like UC is, but writes may be delayed and combined to increase
    performance for things like frame buffers.

    The Itanium architecture requires that we avoid accessing the same
    page with both a cacheable mapping and an uncacheable mapping[1].

    The design of the chipset determines which attributes are supported
    on which regions of the address space. For example, some chipsets
    support either WB or UC access to main memory, while others support
    only WB access.

Memory Map
==========

    Platform firmware describes the physical memory map and the
    supported attributes for each region. At boot time, the kernel uses
    the EFI GetMemoryMap() interface. ACPI can also describe memory
    devices and the attributes they support, but Linux/ia64 currently
    doesn't use this information.

    The kernel uses the efi_memmap table returned from GetMemoryMap() to
    learn the attributes supported by each region of physical address
    space. Unfortunately, this table does not completely describe the
    address space because some machines omit some or all of the MMIO
    regions from the map.

    The kernel maintains another table, kern_memmap, which describes the
    memory Linux is actually using and the attribute for each region.
    This contains only system memory; it does not contain MMIO space.

    The kern_memmap table typically contains only a subset of the system
    memory described by the efi_memmap. Linux/ia64 can't use all memory
    in the system because of constraints imposed by the identity mapping
    scheme.

    The efi_memmap table is preserved unmodified because the original
    boot-time information is required for kexec.

Kernel Identity Mappings
========================

    Linux/ia64 identity mappings are done with large pages, currently
    either 16MB or 64MB, referred to as "granules." Cacheable mappings
    are speculative[2], so the processor can read any location in the
    page at any time, independent of the programmer's intentions. This
    means that to avoid attribute aliasing, Linux can create a cacheable
    identity mapping only when the entire granule supports cacheable
    access.

    Therefore, kern_memmap contains only full granule-sized regions that
    can be referenced safely by an identity mapping.

    Uncacheable mappings are not speculative, so the processor will
    generate UC accesses only to locations explicitly referenced by
    software. This allows UC identity mappings to cover granules that
    are only partially populated, or populated with a combination of UC
    and WB regions.

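    As a rough illustration of the granule rule above, the following C
    sketch rounds an address down to its granule and checks whether
    every byte of that granule lies in a WB-capable region. The struct
    region type, the function names, and the requirement that the map
    be sorted and non-overlapping are assumptions made for this
    example; they do not mirror the kernel's efi_memmap or kern_memmap
    code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region descriptor; not the kernel's real memory map layout. */
struct region {
    uint64_t start;   /* first byte of the region */
    uint64_t end;     /* one past the last byte */
    bool     wb_ok;   /* region supports WB access */
};

/* Round addr down to the start of its granule (granule size = 1 << shift). */
static uint64_t granule_start(uint64_t addr, unsigned shift)
{
    assert(shift > 0 && shift < 64);
    return addr & ~((UINT64_C(1) << shift) - 1);
}

/*
 * Return true only if every byte of the granule containing addr lies in
 * a region that supports WB. The map must be sorted by start address and
 * must not contain overlapping entries. Granules at the very top of the
 * 64-bit address space are not handled.
 */
static bool granule_is_wb(const struct region *map, size_t n,
                          uint64_t addr, unsigned shift)
{
    uint64_t pos = granule_start(addr, shift);
    uint64_t end = pos + (UINT64_C(1) << shift);

    for (size_t i = 0; i < n && pos < end; i++) {
        if (map[i].end <= pos)
            continue;         /* region lies entirely below pos */
        if (map[i].start > pos || !map[i].wb_ok)
            return false;     /* hole or non-WB range inside the granule */
        pos = map[i].end;     /* covered up to the end of this region */
    }
    return pos >= end;
}
```

    With a 16MB granule (a shift of 24), the granule containing 0xC0000
    starts at 0x0 and also covers the VGA frame buffer at 0xA0000, so a
    map that marks 0xA0000-0xBFFFF as UC only makes granule_is_wb()
    return false for any address in the first 16MB.
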
User Mappings
=============

    User mappings are typically done with 16K or 64K pages. The smaller
    page size allows more flexibility because only 16K or 64K has to be
    homogeneous with respect to memory attributes.

Potential Attribute Aliasing Cases
==================================

    There are several ways the kernel creates new mappings:

mmap of /dev/mem
----------------

    This uses remap_pfn_range(), which creates user mappings. These
    mappings may be either WB or UC. If the region being mapped
    happens to be in kern_memmap, meaning that it may also be mapped
    by a kernel identity mapping, the user mapping must use the same
    attribute as the kernel mapping.

    If the region is not in kern_memmap, the user mapping should use
    an attribute reported as being supported in the EFI memory map.

    Since the EFI memory map does not describe MMIO on some
    machines, this should use an uncacheable mapping as a fallback.

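    The three rules above can be read as a lookup cascade. The C sketch
    below is one possible rendering, not the kernel's implementation:
    struct range, lookup() and devmem_mmap_attr() are hypothetical
    names, the sketch assumes the requested range falls inside a single
    map entry, and preferring WB over UC when EFI reports both is an
    assumption of the example. The attribute bit values follow the UEFI
    memory map definitions for UC and WB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Attribute bits, using the UEFI memory map values for UC and WB. */
#define ATTR_UC UINT64_C(0x1)
#define ATTR_WB UINT64_C(0x8)

/* Hypothetical map entry covering [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
    uint64_t attr;
};

/* Return the attribute bits of the entry containing [start, end), or 0. */
static uint64_t lookup(const struct range *map, size_t n,
                       uint64_t start, uint64_t end)
{
    assert(start < end);
    for (size_t i = 0; i < n; i++)
        if (map[i].start <= start && end <= map[i].end)
            return map[i].attr;
    return 0;
}

/* Choose the attribute for a user mapping of [start, end). */
static uint64_t devmem_mmap_attr(const struct range *kern, size_t nk,
                                 const struct range *efi, size_t ne,
                                 uint64_t start, uint64_t end)
{
    uint64_t attr = lookup(kern, nk, start, end);

    if (attr)
        return attr;      /* must match the kernel identity mapping */

    attr = lookup(efi, ne, start, end);
    if (attr & ATTR_WB)
        return ATTR_WB;   /* assumption: prefer WB when EFI allows it */
    if (attr & ATTR_UC)
        return ATTR_UC;

    return ATTR_UC;       /* not described by EFI: treat as MMIO */
}
```

    A range that EFI does not describe at all, such as an MMIO window
    omitted by the firmware, ends up with the UC fallback in this
    sketch.
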
mmap of /sys/class/pci_bus/.../legacy_mem
-----------------------------------------

    This is very similar to mmap of /dev/mem, except that legacy_mem
    only allows mmap of the one megabyte "legacy MMIO" area for a
    specific PCI bus. Typically this is the first megabyte of
    physical address space, but it may be different on machines with
    several VGA devices.

    "X" uses this to access VGA frame buffers. Using legacy_mem
    rather than /dev/mem allows multiple instances of X to talk to
    different VGA cards.

    The /dev/mem mmap constraints apply.

mmap of /proc/bus/pci/.../??.?
------------------------------

    This is an MMIO mmap of PCI functions, which may additionally be
    requested with the WC attribute.

    If WC is requested, and the region in kern_memmap is either WC
    or UC, and the EFI memory map designates the region as WC, then
    the WC mapping is allowed.

    Otherwise, the user mapping must use the same attribute as the
    kernel mapping.

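    The WC rule reduces to a single condition. The following C sketch
    shows it under stated assumptions: enum mem_attr and
    pci_mmap_attr() are illustrative names, and the caller is assumed
    to have already looked up the kernel attribute and whether the EFI
    memory map designates the region as WC.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative attribute codes; these are not kernel or EFI constants. */
enum mem_attr { ATTR_UC, ATTR_WC, ATTR_WB };

/*
 * want_wc:     the caller asked for a WC mapping
 * kern_attr:   attribute recorded for the region in kern_memmap
 * efi_says_wc: the EFI memory map designates the region as WC
 */
static enum mem_attr pci_mmap_attr(bool want_wc, enum mem_attr kern_attr,
                                   bool efi_says_wc)
{
    assert(kern_attr == ATTR_UC || kern_attr == ATTR_WC ||
           kern_attr == ATTR_WB);

    /* WC is allowed only over a WC or UC kernel mapping that EFI marks WC. */
    if (want_wc && kern_attr != ATTR_WB && efi_says_wc)
        return ATTR_WC;

    /* Otherwise the user mapping follows the kernel mapping. */
    return kern_attr;
}
```
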
read/write of /dev/mem
----------------------

    This uses copy_from_user(), which implicitly uses a kernel
    identity mapping. This is obviously safe for things in
    kern_memmap.

    There may be corner cases of things that are not in kern_memmap,
    but could be accessed this way. For example, registers in MMIO
    space are not in kern_memmap, but could be accessed with a UC
    mapping. This would not cause attribute aliasing. But
    registers typically can be accessed only with four-byte or
    eight-byte accesses, and the copy_from_user() path doesn't allow
    any control over the access size, so this would be dangerous.

ioremap()
---------

    This returns a mapping for use inside the kernel.

    If the region is in kern_memmap, we should use the attribute
    specified there.

    If the EFI memory map reports that the entire granule supports
    WB, we should use that (granules that are partially reserved
    or occupied by firmware do not appear in kern_memmap).

    If the granule contains non-WB memory, but we can cover the
    region safely with kernel page table mappings, we can use
    ioremap_page_range() as most other architectures do.

    Failing all of the above, we have to fall back to a UC mapping.

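    The order of the checks matters, so it may help to see them as a
    cascade. This C sketch only encodes the decision order described
    above; struct region_facts, enum ioremap_kind and ioremap_choose()
    are hypothetical, and the three boolean inputs stand in for lookups
    that real code would perform against kern_memmap and the EFI memory
    map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Possible outcomes, in the order the text above tries them. */
enum ioremap_kind {
    MAP_KERN_ATTR,    /* reuse the attribute recorded in kern_memmap */
    MAP_WB_GRANULE,   /* WB identity mapping of the whole granule */
    MAP_PAGE_TABLE,   /* kernel page table mapping of just the region */
    MAP_UC            /* uncacheable fallback */
};

/* Hypothetical summary of what is known about the requested region. */
struct region_facts {
    bool in_kern_memmap;    /* region appears in kern_memmap */
    bool granule_all_wb;    /* EFI reports the entire granule as WB */
    bool page_table_safe;   /* page table mappings can cover it safely */
};

static enum ioremap_kind ioremap_choose(const struct region_facts *f)
{
    assert(f != NULL);

    if (f->in_kern_memmap)
        return MAP_KERN_ATTR;
    if (f->granule_all_wb)
        return MAP_WB_GRANULE;
    if (f->page_table_safe)
        return MAP_PAGE_TABLE;
    return MAP_UC;
}
```
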
Past Problem Cases
==================

mmap of various MMIO regions from /dev/mem by "X" on Intel platforms
--------------------------------------------------------------------

    The EFI memory map may not report these MMIO regions.

    These must be allowed so that X will work. This means that
    when the EFI memory map is incomplete, every /dev/mem mmap must
    succeed. It may create either WB or UC user mappings, depending
    on whether the region is in kern_memmap or the EFI memory map.

mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled
----------------------------------------------------------------------

    The EFI memory map reports the following attributes:

    =============== ======= ==================
    0x00000-0x9FFFF WB only
    0xA0000-0xBFFFF UC only (VGA frame buffer)
    0xC0000-0xFFFFF WB only
    =============== ======= ==================

    This mmap is done with user pages, not kernel identity mappings,
    so it is safe to use WB mappings.

    The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
    which uses a granule-sized UC mapping. This granule will cover some
    WB-only memory, but since UC is non-speculative, the processor will
    never generate an uncacheable reference to the WB-only areas unless
    the driver explicitly touches them.

mmap of 0x0-0xFFFFF legacy_mem by "X"
-------------------------------------

    If the EFI memory map reports that the entire range supports the
    same attributes, we can allow the mmap (and we will prefer WB if
    supported, as is the case with HP sx[12]000 machines with VGA
    disabled).

    If EFI reports the range as partly WB and partly UC (as on sx[12]000
    machines with VGA enabled), we must fail the mmap because there's no
    safe attribute to use.

    If EFI reports some of the range but not all (as on Intel firmware
    that doesn't report the VGA frame buffer at all), we should fail the
    mmap and force the user to map just the specific region of interest.

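    The three outcomes above can be sketched as a single pass over the
    EFI entries that intersect the requested range. This is an
    illustration only: struct range and legacy_mmap_attr() are
    hypothetical names, map entries are assumed not to overlap, and the
    UC result for a range that EFI does not describe at all follows the
    /dev/mem fallback rule rather than anything specific to legacy_mem.
    A return value of 0 means the mmap should fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Attribute bits, using the UEFI memory map values for UC and WB. */
#define ATTR_UC UINT64_C(0x1)
#define ATTR_WB UINT64_C(0x8)

/* Hypothetical EFI map entry covering [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
    uint64_t attr;
};

/* Return the attribute to use for [start, end), or 0 to fail the mmap. */
static uint64_t legacy_mmap_attr(const struct range *efi, size_t n,
                                 uint64_t start, uint64_t end)
{
    uint64_t common = ATTR_UC | ATTR_WB;   /* attributes shared so far */
    uint64_t covered = 0;                  /* bytes described by EFI */

    assert(start < end);
    for (size_t i = 0; i < n; i++) {
        uint64_t lo = efi[i].start > start ? efi[i].start : start;
        uint64_t hi = efi[i].end < end ? efi[i].end : end;

        if (lo >= hi)
            continue;                      /* entry does not intersect */
        covered += hi - lo;
        common &= efi[i].attr;
    }

    if (covered == 0)
        return ATTR_UC;                    /* EFI is silent: assume MMIO */
    if (covered != end - start)
        return 0;                          /* only partly reported */
    if (common & ATTR_WB)
        return ATTR_WB;                    /* prefer WB if supported */
    return common & ATTR_UC;               /* 0 if nothing is shared */
}
```

    For the map with VGA enabled (WB, UC, WB across the first
    megabyte), a request for 0x0-0xFFFFF yields 0 because no attribute
    is shared, while a request for just 0xA0000-0xBFFFF yields UC.
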
mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled
------------------------------------------------------------------------

    The EFI memory map reports the following attributes::

        0x00000-0xFFFFF WB only (no VGA MMIO hole)

    This is a special case of the previous case, and the mmap should
    fail for the same reason as above.

read of /sys/devices/.../rom
----------------------------

    For VGA devices, this may cause an ioremap() of 0xC0000. This
    used to be done with a UC mapping, because the VGA frame buffer
    at 0xA0000 prevents use of a WB granule. The UC mapping causes
    an MCA on HP sx[12]000 chipsets.

    We should use WB page table mappings to avoid covering the VGA
    frame buffer.

Notes
=====

    [1] SDM rev 2.2, vol 2, sec 4.4.1.
    [2] SDM rev 2.2, vol 2, sec 4.4.6.