.. SPDX-License-Identifier: GPL-2.0

======================
Memory Protection Keys
======================

Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
which is found on Intel's Skylake "Scalable Processor" Server CPUs.
It will be available in future non-server parts.

For anyone wishing to test or use this feature, it is available in
Amazon's EC2 C5 instances and is known to work there using an Ubuntu
17.04 image.

Memory Protection Keys provides a mechanism for enforcing page-based
protections, but without requiring modification of the page tables
when an application changes protection domains. It works by
dedicating 4 previously ignored bits in each page table entry to a
"protection key", giving 16 possible keys.

There is also a new user-accessible register (PKRU) with two separate
bits (Access Disable and Write Disable) for each key. Being a CPU
register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.
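
As an illustration of the layout described above, the following sketch
decodes the two bits that belong to one key from a PKRU value. It
assumes that key N occupies bits 2N (Access Disable) and 2N+1 (Write
Disable). The names ``PKRU_AD``, ``PKRU_WD`` and ``pkru_rights()`` are
made up for this example and are not kernel or libc interfaces::

    #include <assert.h>
    #include <stdint.h>

    #define PKRU_AD   0x1u    /* Access Disable: no data access at all */
    #define PKRU_WD   0x2u    /* Write Disable: reads allowed, writes not */
    #define NR_PKEYS  16      /* 4 bits in the page table entry */

    /* Extract the two permission bits for 'pkey' from a PKRU value. */
    static inline uint32_t pkru_rights(uint32_t pkru, int pkey)
    {
        assert(pkey >= 0 && pkey < NR_PKEYS);
        return (pkru >> (pkey * 2)) & (PKRU_AD | PKRU_WD);
    }

    /* Reads are allowed unless Access Disable is set. */
    static inline int pkru_allows_read(uint32_t pkru, int pkey)
    {
        return !(pkru_rights(pkru, pkey) & PKRU_AD);
    }

    /* Writes are allowed only if neither disable bit is set. */
    static inline int pkru_allows_write(uint32_t pkru, int pkey)
    {
        return pkru_rights(pkru, pkey) == 0;
    }

Because the helpers only do arithmetic on a 32-bit value, they can be
exercised on any machine, including one without protection key support.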

There are two new instructions (RDPKRU/WRPKRU) for reading and writing
to the new register. The feature is only available in 64-bit mode,
even though there is theoretically space in the PAE PTEs. These
permissions are enforced on data access only and have no effect on
instruction fetches.

Syscalls
========

There are 3 system calls which directly interact with pkeys::

    int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
    int pkey_free(int pkey);
    int pkey_mprotect(unsigned long start, size_t len,
                      unsigned long prot, int pkey);

Before a pkey can be used, it must first be allocated with
pkey_alloc(). An application calls the WRPKRU instruction
directly to change access permissions to memory covered
by a key. In this example WRPKRU is wrapped by a C function
called pkey_set().
::

    int real_prot = PROT_READ|PROT_WRITE;
    pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
    ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
    ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
    ... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access::

    pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
    *ptr = foo; // assign something
    pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

Now when it frees the memory, it will also free the pkey since it
is no longer in use::

    munmap(ptr, PAGE_SIZE);
    pkey_free(pkey);

.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
   An example implementation can be found in
   tools/testing/selftests/x86/protection_keys.c.
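
For orientation, a wrapper of this kind could be sketched as a
read-modify-write of PKRU, as below. This is not a copy of the selftest
code: the names ``pkru_update()``, ``rdpkru()``, ``wrpkru()`` and
``pkey_set_sketch()`` are chosen for this example, the instructions are
emitted as raw opcode bytes so that older assemblers accept them, and
the sketch assumes that both instructions require ECX to be zero and
that WRPKRU also requires EDX to be zero. Running the instructions on a
CPU or kernel without protection key support is expected to fault::

    #include <assert.h>
    #include <stdint.h>

    #define PKEY_RIGHTS_MASK 0x3u   /* Access Disable | Write Disable */

    /* Pure helper: return 'pkru' with the two bits of 'pkey' replaced. */
    static inline uint32_t pkru_update(uint32_t pkru, int pkey, uint32_t rights)
    {
        int shift = pkey * 2;

        assert(pkey >= 0 && pkey < 16);
        pkru &= ~(PKEY_RIGHTS_MASK << shift);
        return pkru | ((rights & PKEY_RIGHTS_MASK) << shift);
    }

    #if defined(__x86_64__)
    /* RDPKRU: ECX must be 0, the register value is returned in EAX. */
    static inline uint32_t rdpkru(void)
    {
        uint32_t eax, edx;

        __asm__ volatile(".byte 0x0f,0x01,0xee"
                         : "=a" (eax), "=d" (edx)
                         : "c" (0));
        return eax;
    }

    /* WRPKRU: EAX holds the new value, ECX and EDX must be 0. */
    static inline void wrpkru(uint32_t pkru)
    {
        __asm__ volatile(".byte 0x0f,0x01,0xef"
                         : /* no outputs */
                         : "a" (pkru), "c" (0), "d" (0)
                         : "memory");
    }

    /* Replace the calling thread's rights for 'pkey'. */
    static inline void pkey_set_sketch(int pkey, uint32_t rights)
    {
        wrpkru(pkru_update(rdpkru(), pkey, rights));
    }
    #endif

Keeping the bit manipulation in a separate pure function means it can be
checked without executing WRPKRU. The "memory" clobber is a design
choice of this sketch, meant to stop the compiler from moving memory
accesses across the permission change.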

Behavior
========

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect(). For instance if you do this::

    mprotect(ptr, size, PROT_NONE);
    something(ptr);

you can expect the same effects with protection keys when doing this::

    pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_ACCESS);
    pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
    something(ptr);

That should be true whether something() is a direct access to 'ptr'
like::

    *ptr = foo;

or when the kernel does the access on the application's behalf like
with a read()::

    read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKUERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
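
A signal handler can use si_code to tell the two cases apart. The
sketch below installs a temporary SIGSEGV handler, attempts a one-byte
write and reports the si_code it observed. The helper names
(``probe_write()``, ``fault_name()``, ``demo_plain_mprotect()``) are
hypothetical, and the fallback definition of ``SEGV_PKUERR`` assumes
the value 4 for C libraries whose headers do not provide it::

    #define _GNU_SOURCE
    #include <setjmp.h>
    #include <signal.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #ifndef SEGV_PKUERR
    #define SEGV_PKUERR 4   /* assumed value, for older C library headers */
    #endif

    static sigjmp_buf fault_env;
    static volatile sig_atomic_t fault_code;

    /* Record why the fault happened, then jump back past the access. */
    static void on_segv(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig;
        (void)ctx;
        fault_code = info->si_code;
        siglongjmp(fault_env, 1);
    }

    /* Write one byte to 'p'. Returns 0 on success, the si_code of the
     * resulting SIGSEGV on a fault, or -1 if no handler could be set. */
    static int probe_write(volatile char *p)
    {
        struct sigaction sa, old;
        int code = 0;

        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = on_segv;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGSEGV, &sa, &old) != 0)
            return -1;

        if (sigsetjmp(fault_env, 1) == 0)
            *p = 1;                 /* may raise SIGSEGV */
        else
            code = fault_code;      /* resumed here from the handler */

        sigaction(SIGSEGV, &old, NULL);
        return code;
    }

    /* Map an si_code value to a short description. */
    static const char *fault_name(int code)
    {
        switch (code) {
        case SEGV_PKUERR: return "protection key violation";
        case SEGV_ACCERR: return "page permission violation";
        default:          return "other";
        }
    }

    /* Usage: a write to a PROT_NONE page reports SEGV_ACCERR. */
    static int demo_plain_mprotect(void)
    {
        size_t len = (size_t)sysconf(_SC_PAGESIZE);
        char *p = mmap(NULL, len, PROT_NONE,
                       MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        int code;

        if (p == MAP_FAILED)
            return -1;
        code = probe_write(p);
        munmap(p, len);
        return code;
    }

The usage function only exercises the plain mprotect() path, so it does
not need protection key hardware. Calling probe_write() on memory whose
key denies writes would be the way to observe SEGV_PKUERR instead.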