==============================
Memory Layout on AArch64 Linux
==============================

Author: Catalin Marinas <catalin.marinas@arm.com>

This document describes the virtual memory layout used by the AArch64
Linux kernel. The architecture allows up to 4 levels of translation
tables with a 4KB page size and up to 3 levels with a 64KB page size.

AArch64 Linux uses either 3 levels or 4 levels of translation tables
with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit
(256TB) virtual addresses, respectively, for both user and kernel. With
64KB pages, only 2 levels of translation tables are used, allowing
42-bit (4TB) virtual addresses, but the memory layout is the same.
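
As a rough check of the figures above, the VA width follows from the
page size and the number of levels: a table occupies one page and holds
8-byte descriptors, so each level resolves (page_shift - 3) bits. The
sketch below is illustrative only. The helper names are hypothetical,
and it assumes that every level is a full table.

.. code-block:: c

   #include <stdint.h>

   /* VA width when every level is a full table of 8-byte descriptors. */
   static inline unsigned int va_bits_for(unsigned int page_shift,
                                          unsigned int levels)
   {
       return page_shift + levels * (page_shift - 3);
   }

   /* Size in bytes of a va_bits-wide address space (va_bits < 64). */
   static inline uint64_t va_space_size(unsigned int va_bits)
   {
       return UINT64_C(1) << va_bits;
   }

With 4KB pages (page_shift 12), 3 levels give 39 bits and 4 levels give
48 bits. With 64KB pages (page_shift 16), 2 levels give 42 bits.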

ARMv8.2 adds optional support for Large Virtual Address space. This is
only available when running with a 64KB page size and expands the
number of descriptors in the first level of translation.

User addresses have bits 63:48 set to 0 while the kernel addresses have
the same bits set to 1. TTBRx selection is given by bit 63 of the
virtual address. The swapper_pg_dir contains only kernel (global)
mappings while the user pgd contains only user (non-global) mappings.
The swapper_pg_dir address is written to TTBR1 and never written to
TTBR0.
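
The split described above can be sketched as a pair of predicates for
the 48-bit configuration. This is a user-space illustration, not kernel
code, and the function names are hypothetical.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* User addresses: bits 63:48 are all zero (48-bit configuration). */
   static inline bool is_user_addr_48(uint64_t addr)
   {
       return (addr >> 48) == 0;
   }

   /* Kernel addresses: bits 63:48 are all one (48-bit configuration). */
   static inline bool is_kernel_addr_48(uint64_t addr)
   {
       return (addr >> 48) == 0xffff;
   }

An address for which both predicates are false lies in the hole between
the two ranges and is not a valid 48-bit virtual address.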


AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit)::

  Start                 End                     Size    Use
  -----------------------------------------------------------------------
  0000000000000000      0000ffffffffffff         256TB    user
  ffff000000000000      ffff7fffffffffff         128TB    kernel logical memory map
 [ffff600000000000      ffff7fffffffffff]         32TB    [kasan shadow region]
  ffff800000000000      ffff800007ffffff         128MB    bpf jit region
  ffff800008000000      ffff80000fffffff         128MB    modules
  ffff800010000000      fffffdffbffeffff         125TB    vmalloc
  fffffdffbfff0000      fffffdfffe5f8fff        ~998MB    [guard region]
  fffffdfffe5f9000      fffffdfffe9fffff        4124KB    fixed mappings
  fffffdfffea00000      fffffdfffebfffff           2MB    [guard region]
  fffffdfffec00000      fffffdffffbfffff          16MB    PCI I/O space
  fffffdffffc00000      fffffdffffdfffff           2MB    [guard region]
  fffffdffffe00000      ffffffffffdfffff           2TB    vmemmap
  ffffffffffe00000      ffffffffffffffff           2MB    [guard region]


AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support)::

  Start                 End                     Size    Use
  -----------------------------------------------------------------------
  0000000000000000      000fffffffffffff           4PB    user
  fff0000000000000      ffff7fffffffffff          ~4PB    kernel logical memory map
 [fffd800000000000      ffff7fffffffffff]        512TB    [kasan shadow region]
  ffff800000000000      ffff800007ffffff         128MB    bpf jit region
  ffff800008000000      ffff80000fffffff         128MB    modules
  ffff800010000000      fffff81ffffeffff         120TB    vmalloc
  fffff81fffff0000      fffffc1ffe58ffff          ~3TB    [guard region]
  fffffc1ffe590000      fffffc1ffe9fffff        4544KB    fixed mappings
  fffffc1ffea00000      fffffc1ffebfffff           2MB    [guard region]
  fffffc1ffec00000      fffffc1fffbfffff          16MB    PCI I/O space
  fffffc1fffc00000      fffffc1fffdfffff           2MB    [guard region]
  fffffc1fffe00000      ffffffffffdfffff        3968GB    vmemmap
  ffffffffffe00000      ffffffffffffffff           2MB    [guard region]


Translation table lookup with 4KB pages::

  +--------+--------+--------+--------+--------+--------+--------+--------+
  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
  +--------+--------+--------+--------+--------+--------+--------+--------+
   |                 |         |         |         |         |
   |                 |         |         |         |         v
   |                 |         |         |         |   [11:0]  in-page offset
   |                 |         |         |         +-> [20:12] L3 index
   |                 |         |         +-----------> [29:21] L2 index
   |                 |         +---------------------> [38:30] L1 index
   |                 +-------------------------------> [47:39] L0 index
   +-------------------------------------------------> [63] TTBR0/1
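
The 4KB bit fields in the diagram can be extracted with shifts and
masks, as in the following sketch. The macro and function names are
hypothetical and chosen for illustration. They are not the kernel's
page table helpers.

.. code-block:: c

   #include <stdint.h>

   #define PAGE_SHIFT_4K       12  /* bits [11:0] are the in-page offset */
   #define BITS_PER_LEVEL_4K    9  /* 512 descriptors per 4KB table */

   /* Index into the level 0..3 table for a virtual address. */
   static inline unsigned int table_index_4k(uint64_t va, unsigned int level)
   {
       unsigned int shift = PAGE_SHIFT_4K + BITS_PER_LEVEL_4K * (3 - level);

       return (unsigned int)(va >> shift) & ((1u << BITS_PER_LEVEL_4K) - 1);
   }

   /* Offset of the address within its 4KB page. */
   static inline unsigned int page_offset_4k(uint64_t va)
   {
       return (unsigned int)(va & ((1u << PAGE_SHIFT_4K) - 1));
   }

Level 0 therefore uses a shift of 39, level 1 a shift of 30, level 2 a
shift of 21 and level 3 a shift of 12, matching the bit ranges shown.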


Translation table lookup with 64KB pages::

  +--------+--------+--------+--------+--------+--------+--------+--------+
  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
  +--------+--------+--------+--------+--------+--------+--------+--------+
   |                 |    |               |              |
   |                 |    |               |              v
   |                 |    |               |            [15:0]  in-page offset
   |                 |    |               +----------> [28:16] L3 index
   |                 |    +--------------------------> [41:29] L2 index
   |                 +-------------------------------> [47:42] L1 index (48-bit)
   |                                                   [51:42] L1 index (52-bit)
   +-------------------------------------------------> [63] TTBR0/1
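
With 64KB pages only the width of the L1 index depends on the VA size:
6 bits for 48-bit and 10 bits for 52-bit. A minimal sketch, using
hypothetical names and assuming va_bits is either 48 or 52:

.. code-block:: c

   #include <stdint.h>

   #define L3_SHIFT_64K    16  /* bits [28:16] */
   #define L2_SHIFT_64K    29  /* bits [41:29] */
   #define L1_SHIFT_64K    42  /* bits [47:42] or [51:42] */
   #define INDEX_MASK_64K  0x1fffu  /* 13 bits: 8192 descriptors per table */

   static inline unsigned int l3_index_64k(uint64_t va)
   {
       return (unsigned int)(va >> L3_SHIFT_64K) & INDEX_MASK_64K;
   }

   static inline unsigned int l2_index_64k(uint64_t va)
   {
       return (unsigned int)(va >> L2_SHIFT_64K) & INDEX_MASK_64K;
   }

   /* The L1 index is (va_bits - 42) bits wide; va_bits must be 48 or 52. */
   static inline unsigned int l1_index_64k(uint64_t va, unsigned int va_bits)
   {
       unsigned int width = va_bits - L1_SHIFT_64K;

       return (unsigned int)(va >> L1_SHIFT_64K) & ((1u << width) - 1);
   }

The L1 table therefore holds 64 descriptors for 48-bit VAs and 1024
descriptors for 52-bit VAs.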


When using KVM without the Virtualization Host Extensions, the
hypervisor maps kernel pages in EL2 at a fixed (and potentially
random) offset from the linear mapping. See the kern_hyp_va macro and
kvm_update_va_mask function for more details. MMIO devices such as
GICv2 get mapped next to the HYP idmap page, as do vectors when
ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs.

When using KVM with the Virtualization Host Extensions, no additional
mappings are created, since the host kernel runs directly in EL2.

52-bit VA support in the kernel
-------------------------------
If the ARMv8.2-LVA optional feature is present and the kernel is
running with a 64KB page size, then it is possible to use 52 bits of
address space for both userspace and kernel addresses. However, any
kernel binary that supports 52-bit must also be able to fall back to
48-bit at early boot time if the hardware feature is not present.

This fallback mechanism requires the kernel .text to be placed in the
higher addresses, so that its addresses are invariant to 48/52-bit VAs.
Because the kasan shadow is a fraction of the entire kernel VA space,
the end of the kasan shadow must also be in the higher half of the
kernel VA space for both 48/52-bit. (When switching from 48-bit to
52-bit, the end of the kasan shadow is invariant and dependent on ~0UL,
whilst the start address will "grow" towards the lower addresses.)
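
The way the shadow start "grows" downwards from a fixed end can be
checked against the layout tables in this document. The sketch below
assumes a shadow scale of one byte per 8 bytes of VA space (a shift of
3), which is consistent with the 32TB and 512TB sizes listed in the
tables. The names are hypothetical and the code is a user-space
illustration only.

.. code-block:: c

   #include <stdint.h>

   /* Assumption: one shadow byte covers 8 bytes of address space. */
   #define SHADOW_SCALE_SHIFT  3
   /* First address past the shadow region in both layout tables. */
   #define SHADOW_END          UINT64_C(0xffff800000000000)

   static inline uint64_t shadow_size(unsigned int va_bits)
   {
       return UINT64_C(1) << (va_bits - SHADOW_SCALE_SHIFT);
   }

   /* The end is fixed, so a larger VA space moves the start downwards. */
   static inline uint64_t shadow_start(unsigned int va_bits)
   {
       return SHADOW_END - shadow_size(va_bits);
   }

For 48-bit this gives a start of ffff600000000000, and for 52-bit a
start of fffd800000000000, as listed in the two tables.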

In order to optimise phys_to_virt and virt_to_phys, the PAGE_OFFSET
is kept constant at 0xFFF0000000000000 (corresponding to 52-bit).
This obviates the need for an extra variable read. The physvirt
offset and vmemmap offsets are computed at early boot to enable
this logic.
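
One way to picture this is a single offset that is computed once at
boot and then used for every linear-map translation. The following is a
simplified user-space model under stated assumptions, not the kernel's
implementation: the function names, the model_ prefix and the
phys_base parameter are all hypothetical.

.. code-block:: c

   #include <stdint.h>

   /* Lowest kernel address for a given VA width, e.g. 52 -> 0xfff0... */
   static inline uint64_t model_page_offset(unsigned int va_bits)
   {
       return ~UINT64_C(0) << va_bits;
   }

   /* Hypothetical stand-in for the offset computed at early boot. */
   static uint64_t model_physvirt_offset;

   /* phys_base: physical address where RAM starts (assumed parameter). */
   static inline void model_init(uint64_t phys_base, unsigned int actual_va_bits)
   {
       model_physvirt_offset = phys_base - model_page_offset(actual_va_bits);
   }

   static inline uint64_t model_phys_to_virt(uint64_t pa)
   {
       return pa - model_physvirt_offset;
   }

   static inline uint64_t model_virt_to_phys(uint64_t va)
   {
       return va + model_physvirt_offset;
   }

In this model, the translation functions themselves never consult the
VA size. Only the one-off initialisation depends on it.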
128
129As a single binary will need to support both 48-bit and 52-bit VA
130spaces, the VMEMMAP must be sized large enough for 52-bit VAs and
Scott Brandence4a64e2020-02-19 14:14:03 -0800131also must be sized large enough to accommodate a fixed PAGE_OFFSET.
Steve Capperd2c68de2019-08-07 16:55:24 +0100132
133Most code in the kernel should not need to consider the VA_BITS, for
134code that does need to know the VA size the variables are
135defined as follows:
136
137VA_BITS constant the *maximum* VA space size
138
139VA_BITS_MIN constant the *minimum* VA space size
140
141vabits_actual variable the *actual* VA space size
142
143
144Maximum and minimum sizes can be useful to ensure that buffers are
145sized large enough or that addresses are positioned close enough for
146the "worst" case.

52-bit userspace VAs
--------------------
To maintain compatibility with software that relies on the ARMv8.0
VA space maximum size of 48 bits, the kernel will, by default,
return virtual addresses to userspace from a 48-bit range.

Software can "opt-in" to receiving VAs from a 52-bit space by
specifying an mmap hint parameter that is larger than 48-bit.

For example:

.. code-block:: c

   maybe_high_address = mmap((void *)~0UL, size, prot, flags,...);
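
A slightly fuller version of the call above might look like the
following sketch. It assumes Linux with an anonymous private mapping,
and the helper names are hypothetical. Whether the returned address
actually lies above 48 bits depends on the hardware and kernel
configuration, so the result should be checked rather than assumed.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>
   #include <sys/mman.h>

   /* Request an anonymous mapping with the highest possible hint. */
   static inline void *map_with_high_hint(size_t size)
   {
       return mmap((void *)~0UL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
   }

   /* True if any bit above bit 47 is set in the pointer value. */
   static inline int is_above_48bit(const void *p)
   {
       return ((uint64_t)(uintptr_t)p >> 48) != 0;
   }

A caller would compare the result against MAP_FAILED first, and only
then use is_above_48bit() to see which range the kernel chose.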

It is also possible to build a debug kernel that returns addresses
from a 52-bit space by enabling the following kernel config options:

.. code-block:: sh

   CONFIG_EXPERT=y && CONFIG_ARM64_FORCE_52BIT=y

Note that this option is only intended for debugging applications
and should not be used in production.