======================
Asymmetric 32-bit SoCs
======================

Author: Will Deacon <will@kernel.org>

This document describes the impact of asymmetric 32-bit SoCs on the
execution of 32-bit (``AArch32``) applications.

Date: 2021-05-17

Introduction
============

Some Armv9 SoCs suffer from a big.LITTLE misfeature where only a subset
of the CPUs are capable of executing 32-bit user applications. On such
a system, Linux by default treats the asymmetry as a "mismatch" and
disables support for both the ``PER_LINUX32`` personality and
``execve(2)`` of 32-bit ELF binaries, with the latter returning
``-ENOEXEC``. If the mismatch is detected during late onlining of a
64-bit-only CPU, then the onlining operation fails and the new CPU is
unavailable for scheduling.

Surprisingly, these SoCs have been produced with the intention of
running legacy 32-bit binaries. Unsurprisingly, that doesn't work very
well with the default behaviour of Linux.

It seems inevitable that future SoCs will drop 32-bit support
altogether, so if you're stuck in the unenviable position of needing to
run 32-bit code on one of these transitionary platforms then you would
be wise to consider alternatives such as recompilation, emulation or
retirement. If none of those options is practical, then read on.

Enabling kernel support
=======================

Since the kernel support is not completely transparent to userspace,
allowing 32-bit tasks to run on an asymmetric 32-bit system requires an
explicit "opt-in" and can be enabled by passing the
``allow_mismatched_32bit_el0`` parameter on the kernel command-line.

For the remainder of this document we will refer to an *asymmetric
system* to mean an asymmetric 32-bit SoC running Linux with this kernel
command-line option enabled.
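
As a rough illustration, userspace could check whether the opt-in was
passed at boot by scanning ``/proc/cmdline``. The sketch below is not
part of the kernel; the function names are hypothetical, and it only
matches the bare parameter exactly as written above. Finding the
parameter says nothing about whether the SoC is actually asymmetric.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if the whitespace-separated command line in `cmdline`
 * contains `param` as a complete token. Hypothetical helper.
 */
bool cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    while (*p) {
        /* Skip separators, then measure the next token. */
        p += strspn(p, " \t\n");
        size_t tlen = strcspn(p, " \t\n");

        if (tlen == plen && strncmp(p, param, plen) == 0)
            return true;
        p += tlen;
    }
    return false;
}

/* Read /proc/cmdline and look for the opt-in parameter. */
bool mismatched_32bit_el0_requested(void)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");
    size_t n;

    if (!f)
        return false;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return cmdline_has_param(buf, "allow_mismatched_32bit_el0");
}
```

Matching whole tokens avoids false positives from longer parameters
that merely contain the same substring.
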

Userspace impact
================

32-bit tasks running on an asymmetric system behave in mostly the same
way as on a homogeneous system, with a few key differences relating to
CPU affinity.

sysfs
-----

The subset of CPUs capable of running 32-bit tasks is described in
``/sys/devices/system/cpu/aarch32_el0`` and is documented further in
``Documentation/ABI/testing/sysfs-devices-system-cpu``.

**Note:** CPUs are advertised by this file as they are detected and so
late-onlining of 32-bit-capable CPUs can result in the file contents
being modified by the kernel at runtime. Once advertised, CPUs are never
removed from the file.
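
A minimal sketch of reading this file from userspace follows. It
assumes the file holds a CPU list in the usual ``0-3,6`` style used by
the other CPU mask files in that directory; consult the ABI document
named above for the authoritative format. The function names and the
``bool`` array representation are illustrative choices only.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse a CPU list such as "0-3,6" into cpus[0..ncpus-1].
 * Returns the number of distinct CPUs listed, or -1 if the input is
 * malformed or names a CPU >= ncpus. Hypothetical helper.
 */
int parse_cpulist(const char *s, bool *cpus, int ncpus)
{
    int count = 0;

    for (int i = 0; i < ncpus; i++)
        cpus[i] = false;

    while (*s && *s != '\n') {
        char *end;
        long lo, hi;

        lo = hi = strtol(s, &end, 10);
        if (end == s || lo < 0)
            return -1;
        if (*end == '-') {
            /* A range: parse its upper bound. */
            s = end + 1;
            hi = strtol(s, &end, 10);
            if (end == s)
                return -1;
        }
        if (hi < lo || hi >= ncpus)
            return -1;
        for (long cpu = lo; cpu <= hi; cpu++) {
            if (!cpus[cpu])
                count++;
            cpus[cpu] = true;
        }
        s = end;
        if (*s == ',')
            s++;
        else if (*s && *s != '\n')
            return -1;
    }
    return count;
}

/* Read the sysfs file; returns -1 if it cannot be opened or parsed. */
int read_aarch32_el0(bool *cpus, int ncpus)
{
    char buf[4096];
    FILE *f = fopen("/sys/devices/system/cpu/aarch32_el0", "r");
    int ret = -1;

    if (!f)
        return -1;
    if (fgets(buf, sizeof(buf), f))
        ret = parse_cpulist(buf, cpus, ncpus);
    fclose(f);
    return ret;
}
```

Because the kernel may add CPUs to the file at runtime, a long-running
program would need to re-read it rather than cache the result.
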

``execve(2)``
-------------

On a homogeneous system, the CPU affinity of a task is preserved across
``execve(2)``. This is not always possible on an asymmetric system,
specifically when the new program being executed is 32-bit yet the
affinity mask contains 64-bit-only CPUs. In this situation, the kernel
determines the new affinity mask as follows:

1. If the 32-bit-capable subset of the affinity mask is not empty,
   then the affinity is restricted to that subset and the old affinity
   mask is saved. This saved mask is inherited over ``fork(2)`` and
   preserved across ``execve(2)`` of 32-bit programs.

   **Note:** This step does not apply to ``SCHED_DEADLINE`` tasks.
   See `SCHED_DEADLINE`_.

2. Otherwise, the cpuset hierarchy of the task is walked until an
   ancestor is found containing at least one 32-bit-capable CPU. The
   affinity of the task is then changed to match the 32-bit-capable
   subset of the cpuset determined by the walk.

3. On failure (i.e. out of memory), the affinity is changed to the set
   of all 32-bit-capable CPUs of which the kernel is aware.

A subsequent ``execve(2)`` of a 64-bit program by the 32-bit task will
invalidate the affinity mask saved in (1) and attempt to restore the CPU
affinity of the task using the saved mask if it was previously valid.
This restoration may fail due to intervening changes to the deadline
policy or cpuset hierarchy, in which case the ``execve(2)`` continues
with the affinity unchanged.

Calls to ``sched_setaffinity(2)`` for a 32-bit task will consider only
the 32-bit-capable CPUs of the requested affinity mask. On success, the
affinity for the task is updated and any saved mask from a prior
``execve(2)`` is invalidated.
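
The three-step selection described above can be modelled in userspace
as a pure function over bitmasks. This is a toy model for reasoning
about the rules, not the kernel implementation: it is limited to 64
CPUs, it ignores the ``SCHED_DEADLINE`` exception, and every name in it
is hypothetical. The ``ancestors`` array is assumed to start at the
task's own cpuset and end at the root, and ``failed`` stands in for the
out-of-memory case of step (3).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of the affinity chosen when a task execs a 32-bit program.
 * Bit N of each mask represents CPU N.
 *
 * affinity:  current affinity mask of the task
 * cap32:     all 32-bit-capable CPUs known to the kernel
 * ancestors: cpuset masks, task's own cpuset first, root last
 * failed:    simulate the failure path of step (3)
 * saved:     receives the mask saved by step (1), or 0 if none
 */
uint64_t exec32_affinity(uint64_t affinity, uint64_t cap32,
                         const uint64_t *ancestors, size_t depth,
                         bool failed, uint64_t *saved)
{
    *saved = 0;

    /* No 64-bit-only CPUs in the mask: affinity is preserved. */
    if ((affinity & ~cap32) == 0)
        return affinity;

    /* Step 3: on failure, use every known 32-bit-capable CPU. */
    if (failed)
        return cap32;

    /* Step 1: restrict to the 32-bit-capable subset, save the old mask. */
    if (affinity & cap32) {
        *saved = affinity;
        return affinity & cap32;
    }

    /* Step 2: walk up until a cpuset has a 32-bit-capable CPU. */
    for (size_t i = 0; i < depth; i++) {
        if (ancestors[i] & cap32)
            return ancestors[i] & cap32;
    }

    /* Model-only choice: an empty walk also falls back to cap32. */
    return cap32;
}
```

For example, with CPUs 0-3 being 32-bit-capable (``cap32 = 0x0f``), a
task bound to CPUs 2-5 (``0x3c``) ends up on CPUs 2-3 (``0x0c``) with
``0x3c`` saved, while a task bound only to CPUs 4-5 takes its new mask
from the cpuset walk and has nothing saved.
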

``SCHED_DEADLINE``
------------------

Explicit admission of a 32-bit deadline task to the default root domain
(e.g. by calling ``sched_setattr(2)``) is rejected on an asymmetric
32-bit system unless admission control is disabled by writing -1 to
``/proc/sys/kernel/sched_rt_runtime_us``.

``execve(2)`` of a 32-bit program from a 64-bit deadline task will
return ``-ENOEXEC`` if the root domain for the task contains any
64-bit-only CPUs and admission control is enabled. Concurrent offlining
of 32-bit-capable CPUs may still necessitate the procedure described in
`execve(2)`_, in which case step (1) is skipped and a warning is
emitted on the console.

**Note:** It is recommended that a set of 32-bit-capable CPUs is placed
into a separate root domain if ``SCHED_DEADLINE`` is to be used with
32-bit tasks on an asymmetric system. Failure to do so is likely to
result in missed deadlines.
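
Since the behaviour above depends on whether admission control is
enabled, a program might want to check the current setting before
relying on it. The following sketch reads the procfs file named above
and treats a value of -1 as "disabled", as described in this section.
The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Return true if `text` parses as the integer -1. Hypothetical helper. */
bool rt_runtime_is_unlimited(const char *text)
{
    char *end;
    long val = strtol(text, &end, 10);

    return end != text && val == -1;
}

/*
 * Returns 1 if admission control is disabled, 0 if it is enabled and
 * -1 if the setting could not be read.
 */
int deadline_admission_disabled(void)
{
    char buf[64];
    FILE *f = fopen("/proc/sys/kernel/sched_rt_runtime_us", "r");
    int ret = -1;

    if (!f)
        return -1;
    if (fgets(buf, sizeof(buf), f))
        ret = rt_runtime_is_unlimited(buf) ? 1 : 0;
    fclose(f);
    return ret;
}
```

The value can change at any time, so this is only a snapshot and the
result of ``sched_setattr(2)`` or ``execve(2)`` remains the final word.
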

Cpusets
-------

The affinity of a 32-bit task on an asymmetric system may include CPUs
that are not explicitly allowed by the cpuset to which it is attached.
This can occur as a result of the following two situations:

- A 64-bit task attached to a cpuset which allows only 64-bit-only CPUs
  executes a 32-bit program.

- All of the 32-bit-capable CPUs allowed by a cpuset containing a
  32-bit task are offlined.

In both of these cases, the new affinity is calculated according to step
(2) of the process described in `execve(2)`_ and the cpuset hierarchy is
unchanged irrespective of the cgroup version.

CPU hotplug
-----------

On an asymmetric system, the first detected 32-bit-capable CPU is
prevented from being offlined by userspace and any such attempt will
return ``-EPERM``. Note that suspend is still permitted even if the
primary CPU (i.e. CPU 0) is 64-bit-only.

KVM
---

Although KVM will not advertise 32-bit EL0 support to any vCPUs on an
asymmetric system, a broken guest at EL1 could still attempt to execute
32-bit code at EL0. In this case, an exit from a vCPU thread in 32-bit
mode will return to host userspace with an ``exit_reason`` of
``KVM_EXIT_FAIL_ENTRY``, and the vCPU will remain non-runnable until
successfully re-initialised by a subsequent ``KVM_ARM_VCPU_INIT``
operation.