===================================================
Scalable Vector Extension support for AArch64 Linux
===================================================

Author: Dave Martin <Dave.Martin@arm.com>

Date:   4 August 2017

This document outlines briefly the interface provided to userspace by Linux in
order to support use of the ARM Scalable Vector Extension (SVE).

This is an outline of the most important features and issues only and is not
intended to be exhaustive.

This document does not aim to describe the SVE architecture or programmer's
model. To aid understanding, a minimal description of relevant programmer's
model features for SVE is included in Appendix A.


1.  General
-----------

* SVE registers Z0..Z31, P0..P15 and FFR, and the current vector length VL, are
  tracked per-thread.

* The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
  AT_HWCAP entry. Presence of this flag implies the presence of the SVE
  instructions and registers, and the Linux-specific system interfaces
  described in this document. SVE is reported in /proc/cpuinfo as "sve".

* Support for the execution of SVE instructions in userspace can also be
  detected by reading the CPU ID register ID_AA64PFR0_EL1 using an MRS
  instruction, and checking that the value of the SVE field is nonzero. [3]

  This check does not guarantee the presence of the system interfaces
  described in the following sections: software that needs to verify that
  those interfaces are present must check for HWCAP_SVE instead.

* On hardware that supports the SVE2 extensions, HWCAP2_SVE2 will also
  be reported in the AT_HWCAP2 aux vector entry. In addition to this,
  optional extensions to SVE2 may be reported by the presence of:

	HWCAP2_SVEAES
	HWCAP2_SVEPMULL
	HWCAP2_SVEBITPERM
	HWCAP2_SVESHA3
	HWCAP2_SVESM4

  This list may be extended over time as the SVE architecture evolves.

  These extensions are also reported via the CPU ID register ID_AA64ZFR0_EL1,
  which userspace can read using an MRS instruction. See elf_hwcaps.rst and
  cpu-feature-registers.rst for details.

* Debuggers should restrict themselves to interacting with the target via the
  NT_ARM_SVE regset. The recommended way of detecting support for this regset
  is to connect to a target process first and then attempt a
  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).

* Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
  between userspace and the kernel, the register value is encoded in memory in
  an endianness-invariant layout, with bits [(8 * i + 7) : (8 * i)] encoded at
  byte offset i from the start of the memory representation. This affects, for
  example, the signal frame (struct sve_context) and the ptrace interface
  (struct user_sve_header) and associated data.

  Beware that on big-endian systems this results in a different byte order than
  for the FPSIMD V-registers, which are stored as single host-endian 128-bit
  values, with bits [(127 - 8 * i) : (120 - 8 * i)] of the register encoded at
  byte offset i (struct fpsimd_context, struct user_fpsimd_state).

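The HWCAP-based detection described in the list above can be sketched as
follows. This is a minimal illustration, not part of the kernel interface: the
helper names are hypothetical, and the fallback bit values are assumed to match
the arm64 <asm/hwcap.h> definitions, which should be preferred when available.

```c
#include <stdbool.h>
#include <sys/auxv.h>	/* getauxval(), AT_HWCAP, AT_HWCAP2 */

/*
 * Fallback definitions so that the sketch builds without arm64 headers.
 * Assumption: these values match the arm64 <asm/hwcap.h>.
 */
#ifndef HWCAP_SVE
#define HWCAP_SVE	(1UL << 22)
#endif
#ifndef HWCAP2_SVE2
#define HWCAP2_SVE2	(1UL << 1)
#endif

/* Pure helpers (hypothetical names): test one bit of an aux vector value. */
bool hwcap_has_sve(unsigned long hwcap)
{
	return (hwcap & HWCAP_SVE) != 0;
}

bool hwcap2_has_sve2(unsigned long hwcap2)
{
	return (hwcap2 & HWCAP2_SVE2) != 0;
}

/* Query the running system; the bits only have this meaning on AArch64. */
bool system_has_sve(void)
{
#if defined(__aarch64__)
	return hwcap_has_sve(getauxval(AT_HWCAP));
#else
	return false;
#endif
}

bool system_has_sve2(void)
{
#if defined(__aarch64__)
	return system_has_sve() && hwcap2_has_sve2(getauxval(AT_HWCAP2));
#else
	return false;
#endif
}
```

A program would typically call system_has_sve() once at startup and only then
use the prctl() and signal frame interfaces described in later sections.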

2.  Vector length terminology
-----------------------------

The size of an SVE vector (Z) register is referred to as the "vector length".

To avoid confusion about the units used to express vector length, the kernel
adopts the following conventions:

* Vector length (VL) = size of a Z-register in bytes

* Vector quadwords (VQ) = size of a Z-register in units of 128 bits

(So, VL = 16 * VQ.)

The VQ convention is used where the underlying granularity is important, such
as in data structure definitions. In most other situations, the VL convention
is used. This is consistent with the meaning of the "VL" pseudo-register in
the SVE instruction set architecture.

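The relationship between the two units can be sketched in a few lines of C.
The function names below are hypothetical; the uapi headers provide their own
helpers (for example sve_vq_from_vl(), referenced later in this document),
which real code should use instead.

```c
#include <stdbool.h>

#define SKETCH_VQ_BYTES	16u	/* one quadword is 128 bits, i.e. 16 bytes */

/* Convert a vector length in bytes to a count of 128-bit quadwords. */
unsigned int vq_from_vl(unsigned int vl)
{
	return vl / SKETCH_VQ_BYTES;
}

/* Convert a count of 128-bit quadwords to a vector length in bytes. */
unsigned int vl_from_vq(unsigned int vq)
{
	return vq * SKETCH_VQ_BYTES;
}

/* A vector length is only meaningful as a nonzero multiple of 16 bytes. */
bool vl_is_quadword_multiple(unsigned int vl)
{
	return vl >= SKETCH_VQ_BYTES && vl % SKETCH_VQ_BYTES == 0;
}
```

For example, a 512-bit implementation has VL = 64 and VQ = 4.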

3.  System call behaviour
-------------------------

* On syscall, V0..V31 are preserved (as without SVE). Thus, bits [127:0] of
  Z0..Z31 are preserved. All other bits of Z0..Z31, and all of P0..P15 and FFR
  become unspecified on return from a syscall.

* The SVE registers are not used to pass arguments to or receive results from
  any syscall.

* In practice the affected registers/bits will be preserved or will be replaced
  with zeros on return from a syscall, but userspace should not make
  assumptions about this. The kernel behaviour may vary on a case-by-case
  basis.

* All other SVE state of a thread, including the currently configured vector
  length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector
  length (if any), is preserved across all syscalls, subject to the specific
  exceptions for execve() described in section 6.

  In particular, on return from a fork() or clone(), the parent and new child
  process or thread share identical SVE configuration, matching that of the
  parent before the call.


4.  Signal handling
-------------------

* A new signal frame record sve_context encodes the SVE registers on signal
  delivery. [1]

* This record is supplementary to fpsimd_context. The FPSR and FPCR registers
  are only present in fpsimd_context. For convenience, the content of V0..V31
  is duplicated between sve_context and fpsimd_context.

* The signal frame record for SVE always contains basic metadata, in particular
  the thread's vector length (in sve_context.vl).

* The SVE registers may or may not be included in the record, depending on
  whether the registers are live for the thread. The registers are present if
  and only if:
  sve_context.head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)).

* If the registers are present, the remainder of the record has a vl-dependent
  size and layout. Macros SVE_SIG_* are defined [1] to facilitate access to
  the members.

* Each scalable register (Zn, Pn, FFR) is stored in an endianness-invariant
  layout, with bits [(8 * i + 7) : (8 * i)] stored at byte offset i from the
  start of the register's representation in memory.

* If the SVE context is too big to fit in sigcontext.__reserved[], then extra
  space is allocated on the stack, and an extra_context record is written in
  __reserved[] referencing this space. sve_context is then written in the
  extra space. Refer to [1] for further details about this mechanism.

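The record lookup, the presence test and the endianness-invariant layout
described above can be sketched as follows. This is an illustration under
stated assumptions: the structures and constants are local mirrors of what [1]
is understood to define (all names prefixed or suffixed here are hypothetical),
the size formula assumes 32 Z-registers, 16 P-registers and FFR packed after a
16-byte header, and the extra_context indirection is not followed. Real code
should use the SVE_SIG_* macros from [1].

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the record layout in [1]; assumptions for illustration. */
struct ctx_head {		/* mirrors struct _aarch64_ctx */
	uint32_t magic;
	uint32_t size;
};

struct sve_ctx {		/* mirrors struct sve_context */
	struct ctx_head head;
	uint16_t vl;
	uint16_t reserved[3];
};

#define SKETCH_SVE_MAGIC	0x53564501u	/* assumed value of SVE_MAGIC */
#define SKETCH_REGS_OFFSET	16u		/* header rounded up to 16 bytes */

/* Assumed equivalent of SVE_SIG_CONTEXT_SIZE(vq): 32 Z + 16 P + FFR. */
uint32_t sketch_sve_context_size(uint32_t vq)
{
	return SKETCH_REGS_OFFSET + vq * (32u * 16u + 16u * 2u + 2u);
}

/*
 * Walk the records in buf (as found in sigcontext.__reserved[]) and return
 * the byte offset of the sve_context record, or -1 if none is found before
 * the terminating record (magic == 0) or the end of the buffer.
 */
long find_sve_record(const uint8_t *buf, size_t len)
{
	size_t off = 0;

	while (off + sizeof(struct ctx_head) <= len) {
		struct ctx_head h;

		memcpy(&h, buf + off, sizeof(h));
		if (h.magic == 0)
			break;
		if (h.size < sizeof(h) || h.size > len - off)
			break;		/* malformed record: stop walking */
		if (h.magic == SKETCH_SVE_MAGIC)
			return (long)off;
		off += h.size;
	}
	return -1;
}

/* Register data is present iff the record is large enough for its vl. */
int sve_regs_present(const struct sve_ctx *ctx)
{
	return ctx->head.size >= sketch_sve_context_size(ctx->vl / 16u);
}

/*
 * Read bits [63:0] of Zn from a record that contains register data.
 * Byte i of the register image holds bits [(8 * i + 7) : (8 * i)],
 * regardless of host endianness.
 */
uint64_t zreg_low64(const uint8_t *record, uint32_t vq, unsigned int n)
{
	const uint8_t *z = record + SKETCH_REGS_OFFSET + (size_t)n * vq * 16u;
	uint64_t v = 0;
	unsigned int i;

	for (i = 0; i < 8; i++)
		v |= (uint64_t)z[i] << (8 * i);
	return v;
}
```

In a signal handler, buf would be the __reserved[] area of the sigcontext
reached through the ucontext argument; callers should check sve_regs_present()
before reading any register data.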

5.  Signal return
-----------------

When returning from a signal handler:

* If there is no sve_context record in the signal frame, or if the record is
  present but contains no register data as described in the previous section,
  then the SVE registers/bits become non-live and take unspecified values.

* If sve_context is present in the signal frame and contains full register
  data, the SVE registers become live and are populated with the specified
  data. However, for backward compatibility reasons, bits [127:0] of Z0..Z31
  are always restored from the corresponding members of fpsimd_context.vregs[]
  and not from sve_context. The remaining bits are restored from sve_context.

* Inclusion of fpsimd_context in the signal frame remains mandatory,
  irrespective of whether sve_context is present or not.

* The vector length cannot be changed via signal return. If sve_context.vl in
  the signal frame does not match the current vector length, the signal return
  attempt is treated as illegal, resulting in a forced SIGSEGV.


6.  prctl extensions
--------------------

Some new prctl() calls are added to allow programs to manage the SVE vector
length:

prctl(PR_SVE_SET_VL, unsigned long arg)

    Sets the vector length of the calling thread and related flags, where
    arg == vl | flags. Other threads of the calling process are unaffected.

    vl is the desired vector length, where sve_vl_valid(vl) must be true.

    flags:

	PR_SVE_VL_INHERIT

	    Inherit the current vector length across execve(). Otherwise, the
	    vector length is reset to the system default at execve(). (See
	    Section 9.)

	PR_SVE_SET_VL_ONEXEC

	    Defer the requested vector length change until the next execve()
	    performed by this thread.

	    The effect is equivalent to implicit execution of the following
	    call immediately after the next execve() (if any) by the thread:

		prctl(PR_SVE_SET_VL, arg & ~PR_SVE_SET_VL_ONEXEC)

	    This allows launching of a new program with a different vector
	    length, while avoiding runtime side effects in the caller.

	    Without PR_SVE_SET_VL_ONEXEC, the requested change takes effect
	    immediately.


    Return value: a nonnegative value on success, or a negative value on error:
	EINVAL: SVE not supported, invalid vector length requested, or
	    invalid flags.


    On success:

    * Either the calling thread's vector length or the deferred vector length
      to be applied at the next execve() by the thread (dependent on whether
      PR_SVE_SET_VL_ONEXEC is present in arg), is set to the largest value
      supported by the system that is less than or equal to vl. If vl ==
      SVE_VL_MAX, the value set will be the largest value supported by the
      system.

    * Any previously outstanding deferred vector length change in the calling
      thread is cancelled.

    * The returned value describes the resulting configuration, encoded as for
      PR_SVE_GET_VL. The vector length reported in this value is the new
      current vector length for this thread if PR_SVE_SET_VL_ONEXEC was not
      present in arg; otherwise, the reported vector length is the deferred
      vector length that will be applied at the next execve() by the calling
      thread.

    * Changing the vector length causes all of P0..P15, FFR and all bits of
      Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
      unspecified. Calling PR_SVE_SET_VL with vl equal to the thread's current
      vector length, or calling PR_SVE_SET_VL with the PR_SVE_SET_VL_ONEXEC
      flag, does not constitute a change to the vector length for this purpose.

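A minimal sketch of calling PR_SVE_SET_VL from C is shown below. The wrapper
names are hypothetical, and the fallback constant values are assumed to match
<linux/prctl.h>. Note that the C library prctl() wrapper reports errors as -1
with errno set, rather than returning the negative error value directly.

```c
#include <sys/prctl.h>

/* Fallbacks for older headers; assumed to match <linux/prctl.h>. */
#ifndef PR_SVE_SET_VL
#define PR_SVE_SET_VL		50
#endif
#ifndef PR_SVE_SET_VL_ONEXEC
#define PR_SVE_SET_VL_ONEXEC	(1 << 18)
#endif
#ifndef PR_SVE_VL_INHERIT
#define PR_SVE_VL_INHERIT	(1 << 17)
#endif
#ifndef PR_SVE_VL_LEN_MASK
#define PR_SVE_VL_LEN_MASK	0xffff
#endif

/* Build the prctl() argument: arg == vl | flags. */
unsigned long sve_set_vl_arg(unsigned long vl, unsigned long flags)
{
	return vl | flags;
}

/*
 * Request a vector length for the calling thread.  Returns the vector length
 * actually configured (possibly smaller than requested), or -1 with errno
 * set, for example to EINVAL when SVE is not supported.
 */
long sve_set_vl(unsigned long vl, unsigned long flags)
{
	int ret = prctl(PR_SVE_SET_VL, sve_set_vl_arg(vl, flags));

	if (ret < 0)
		return -1;
	return ret & PR_SVE_VL_LEN_MASK;
}

/*
 * Defer the change until the next execve() by this thread, leaving the
 * current vector length (and current register contents) untouched.
 */
long sve_set_vl_onexec(unsigned long vl)
{
	return sve_set_vl(vl, PR_SVE_SET_VL_ONEXEC);
}
```

A launcher that wants a child program to run with a 32-byte vector length
could call sve_set_vl_onexec(32) between fork() and execve(). Because the
kernel may round the request down, the returned length should be checked
rather than assumed.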

prctl(PR_SVE_GET_VL)

    Gets the vector length of the calling thread.

    The following flag may be OR-ed into the result:

	PR_SVE_VL_INHERIT

	    Vector length will be inherited across execve().

    There is no way to determine whether there is an outstanding deferred
    vector length change (which would only normally be the case between a
    fork() or vfork() and the corresponding execve() in typical use).

    To extract the vector length from the result, bitwise-AND it with
    PR_SVE_VL_LEN_MASK.

    Return value: a nonnegative value on success, or a negative value on error:
	EINVAL: SVE not supported.

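Decoding the PR_SVE_GET_VL result can be sketched as follows. The structure
and function names are hypothetical, and the fallback constant values are
assumed to match <linux/prctl.h>.

```c
#include <stdbool.h>
#include <sys/prctl.h>

/* Fallbacks for older headers; assumed to match <linux/prctl.h>. */
#ifndef PR_SVE_GET_VL
#define PR_SVE_GET_VL		51
#endif
#ifndef PR_SVE_VL_INHERIT
#define PR_SVE_VL_INHERIT	(1 << 17)
#endif
#ifndef PR_SVE_VL_LEN_MASK
#define PR_SVE_VL_LEN_MASK	0xffff
#endif

struct sve_vl_info {
	unsigned int vl;	/* vector length in bytes */
	bool inherit;		/* PR_SVE_VL_INHERIT is set */
};

/* Split a nonnegative PR_SVE_GET_VL (or PR_SVE_SET_VL) result into fields. */
struct sve_vl_info sve_decode_vl(long ret)
{
	struct sve_vl_info info;

	info.vl = (unsigned int)(ret & PR_SVE_VL_LEN_MASK);
	info.inherit = (ret & PR_SVE_VL_INHERIT) != 0;
	return info;
}

/* Query the calling thread.  Returns 0 on success, -1 with errno set. */
int sve_get_vl(struct sve_vl_info *out)
{
	int ret = prctl(PR_SVE_GET_VL);

	if (ret < 0)
		return -1;
	*out = sve_decode_vl(ret);
	return 0;
}
```

The same decoder applies to a successful PR_SVE_SET_VL return value, since
that value is encoded as for PR_SVE_GET_VL.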

7.  ptrace extensions
---------------------

* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and
  PTRACE_SETREGSET.

  Refer to [2] for definitions.

The regset data starts with struct user_sve_header, containing:

    size

	Size of the complete regset, in bytes.
	This depends on vl and possibly on other things in the future.

	If a call to PTRACE_GETREGSET requests less data than the value of
	size, the caller can allocate a larger buffer and retry in order to
	read the complete regset.

    max_size

	Maximum size in bytes that the regset can grow to for the target
	thread. The regset won't grow bigger than this even if the target
	thread changes its vector length etc.

    vl

	Target thread's current vector length, in bytes.

    max_vl

	Maximum possible vector length for the target thread.

    flags

	either

	    SVE_PT_REGS_FPSIMD

		SVE registers are not live (GETREGSET) or are to be made
		non-live (SETREGSET).

		The payload is of type struct user_fpsimd_state, with the same
		meaning as for NT_PRFPREG, starting at offset
		SVE_PT_FPSIMD_OFFSET from the start of user_sve_header.

		Extra data might be appended in the future: the size of the
		payload should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags).

		vq should be obtained using sve_vq_from_vl(vl).

	or

	    SVE_PT_REGS_SVE

		SVE registers are live (GETREGSET) or are to be made live
		(SETREGSET).

		The payload contains the SVE register data, starting at offset
		SVE_PT_SVE_OFFSET from the start of user_sve_header, and with
		size SVE_PT_SVE_SIZE(vq, flags).

	... OR-ed with zero or more of the following flags, which have the same
	meaning and behaviour as the corresponding PR_SVE_SET_VL flags:

	    SVE_PT_VL_INHERIT

	    SVE_PT_VL_ONEXEC (SETREGSET only).

* The effects of changing the vector length and/or flags are equivalent to
  those documented for PR_SVE_SET_VL.

  The caller must make a further GETREGSET call if it needs to know what VL is
  actually set by SETREGSET, unless it is known in advance that the requested
  VL is supported.

* In the SVE_PT_REGS_SVE case, the size and layout of the payload depend on
  the header fields. The SVE_PT_SVE_*() macros are provided to facilitate
  access to the members.

* In either case, for SETREGSET it is permissible to omit the payload, in which
  case only the vector length and flags are changed (along with any
  consequences of those changes).

* For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the
  requested VL is not supported, the effect will be the same as if the
  payload were omitted, except that an EIO error is reported. No
  attempt is made to translate the payload data to the correct layout
  for the vector length actually set. The thread's FPSIMD state is
  preserved, but the remaining bits of the SVE registers become
  unspecified. It is up to the caller to translate the payload layout
  for the actual VL and retry.

* The effect of writing a partial, incomplete payload is unspecified.

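The "read the header, then retry with a larger buffer" pattern described for
the size field can be sketched as follows. This is an illustration only: the
structure is a local mirror of what struct user_sve_header in [2] is
understood to contain, the SKETCH_* names and both function names are
hypothetical, and the NT_ARM_SVE fallback value is an assumption. The tracee
is assumed to be attached and stopped.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef NT_ARM_SVE
#define NT_ARM_SVE	0x405	/* assumed value; normally from <elf.h> */
#endif

/* Local mirror of struct user_sve_header from [2]; an assumption. */
struct sve_hdr {
	uint32_t size;
	uint32_t max_size;
	uint16_t vl;
	uint16_t max_vl;
	uint16_t flags;
	uint16_t reserved;
};

#define SKETCH_PT_REGS_MASK	(1u << 0)	/* assumed SVE_PT_REGS_MASK */
#define SKETCH_PT_REGS_SVE	SKETCH_PT_REGS_MASK

/* True if the payload holds SVE register data rather than FPSIMD data. */
bool sve_hdr_regs_live(const struct sve_hdr *h)
{
	return (h->flags & SKETCH_PT_REGS_MASK) == SKETCH_PT_REGS_SVE;
}

/*
 * Read the complete NT_ARM_SVE regset of a stopped tracee.  Returns a
 * malloc()ed buffer that starts with the header (caller frees it) and stores
 * the number of bytes read in *len, or returns NULL on failure.
 */
void *read_sve_regset(pid_t pid, size_t *len)
{
	struct sve_hdr hdr;
	struct iovec iov = { .iov_base = &hdr, .iov_len = sizeof(hdr) };
	void *buf;

	/* First pass: header only, to learn the size of the full regset. */
	if (ptrace(PTRACE_GETREGSET, pid, (void *)(uintptr_t)NT_ARM_SVE,
		   &iov) < 0)
		return NULL;
	if (iov.iov_len < sizeof(hdr) || hdr.size < sizeof(hdr))
		return NULL;

	buf = malloc(hdr.size);
	if (!buf)
		return NULL;

	/* Second pass: header plus payload. */
	iov.iov_base = buf;
	iov.iov_len = hdr.size;
	if (ptrace(PTRACE_GETREGSET, pid, (void *)(uintptr_t)NT_ARM_SVE,
		   &iov) < 0) {
		free(buf);
		return NULL;
	}
	*len = iov.iov_len;
	return buf;
}
```

A failure of the first call is also how a debugger would conclude that the
regset is not supported for the target, as recommended in section 1.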

8.  ELF coredump extensions
---------------------------

* A NT_ARM_SVE note will be added to each coredump for each thread of the
  dumped process. The contents will be equivalent to the data that would have
  been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread
  when the coredump was generated.


9.  System runtime configuration
--------------------------------

* To mitigate the ABI impact of expansion of the signal frame, a policy
  mechanism is provided for administrators, distro maintainers and developers
  to set the default vector length for userspace processes:

/proc/sys/abi/sve_default_vector_length

    Writing the text representation of an integer to this file sets the system
    default vector length to the specified value, unless the value is greater
    than the maximum vector length supported by the system, in which case the
    default vector length is set to that maximum.

    The result can be determined by reopening the file and reading its
    contents.

    At boot, the default vector length is initially set to 64 or the maximum
    supported vector length, whichever is smaller. This determines the initial
    vector length of the init process (PID 1).

    Reading this file returns the current system default vector length.

* At every execve() call, the new vector length of the new process is set to
  the system default vector length, unless

    * PR_SVE_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the
      calling thread, or

    * a deferred vector length change is pending, established via the
      PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_ONEXEC).

* Modifying the system default vector length does not affect the vector length
  of any existing process or thread that does not make an execve() call.

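The write-then-reopen procedure described above can be sketched as follows.
The function names and the path macro are hypothetical; the path is passed as
a parameter so that the helpers can be exercised against an ordinary file.
Writing to the real file normally requires appropriate privileges.

```c
#include <stdio.h>

#define SVE_DEFAULT_VL_PATH	"/proc/sys/abi/sve_default_vector_length"

/* Read a decimal integer from the given file.  Returns -1 on failure. */
long read_vl_file(const char *path)
{
	FILE *f = fopen(path, "r");
	long vl;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &vl) != 1)
		vl = -1;
	fclose(f);
	return vl;
}

/*
 * Write the requested default vector length, then reopen the file to learn
 * the value that actually took effect (the kernel may clamp the request).
 * Returns the effective value, or -1 on failure.
 */
long set_default_vl(const char *path, long vl)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%ld\n", vl) < 0) {
		fclose(f);
		return -1;
	}
	if (fclose(f) != 0)
		return -1;
	return read_vl_file(path);
}
```

For example, set_default_vl(SVE_DEFAULT_VL_PATH, 32) requests a 32-byte
default for programs started by subsequent execve() calls and returns the
value the kernel reports afterwards.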

Appendix A.  SVE programmer's model (informative)
=================================================

This section provides a minimal description of the additions made by SVE to the
ARMv8-A programmer's model that are relevant to this document.

Note: This section is for information only and not intended to be complete or
to replace any architectural specification.

A.1.  Registers
---------------

In A64 state, SVE adds the following:

* 32 8VL-bit vector registers Z0..Z31
  For each Zn, Zn bits [127:0] alias the ARMv8-A vector register Vn.

  A register write using a Vn register name zeros all bits of the corresponding
  Zn except for bits [127:0].

* 16 VL-bit predicate registers P0..P15

* 1 VL-bit special-purpose predicate register FFR (the "first-fault register")

* a VL "pseudo-register" that determines the size of each vector register

  The SVE instruction set architecture provides no way to write VL directly.
  Instead, it can be modified only by EL1 and above, by writing appropriate
  system registers.

* The value of VL can be configured at runtime by EL1 and above:
  16 <= VL <= VLmax, where VL must be a multiple of 16.

* The maximum vector length is determined by the hardware:
  16 <= VLmax <= 256.

  (The SVE architecture specifies 256, but permits future architecture
  revisions to raise this limit.)

* FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point
  operations in a similar way to the way in which they interact with ARMv8
  floating-point operations::

         8VL-1                        128               0  bit index
        +----          ////            -----------------+
     Z0 |                               :       V0      |
      :                                          :
     Z7 |                               :       V7      |
     Z8 |                               :     * V8      |
      :                                       :  :
    Z15 |                               :     *V15      |
    Z16 |                               :      V16      |
      :                                          :
    Z31 |                               :      V31      |
        +----          ////            -----------------+
                                                 31    0
         VL-1                  0                +-------+
        +----       ////      --+          FPSR |       |
     P0 |                       |               +-------+
      : |                       |         *FPCR |       |
    P15 |                       |               +-------+
        +----       ////      --+
    FFR |                       |               +-----+
        +----       ////      --+            VL |     |
                                                +-----+

(*) callee-save:
    This only applies to bits [63:0] of Z-/V-registers.
    FPCR contains callee-save and caller-save bits. See [4] for details.

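The register sizes listed above follow directly from VL. As a small worked
illustration (the function names are hypothetical, and FPSR/FPCR and any
alignment padding are deliberately left out):

```c
#include <stddef.h>

/* Each Z-register is 8 * VL bits wide, i.e. VL bytes. */
size_t sve_zreg_bytes(size_t vl)
{
	return vl;
}

/* Each P-register, and FFR, is VL bits wide, i.e. VL / 8 bytes. */
size_t sve_preg_bytes(size_t vl)
{
	return vl / 8;
}

/* Raw size of Z0..Z31, P0..P15 and FFR together, without padding. */
size_t sve_state_bytes(size_t vl)
{
	return 32 * sve_zreg_bytes(vl) + 16 * sve_preg_bytes(vl) +
	       sve_preg_bytes(vl);
}
```

At the minimum VL of 16 this gives 512 + 32 + 2 = 546 bytes, and at VL = 256
it gives 8192 + 512 + 32 = 8736 bytes.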

A.2.  Procedure call standard
-----------------------------

The ARMv8-A base procedure call standard is extended as follows with respect to
the additional SVE register state:

* All SVE register bits that are not shared with FP/SIMD are caller-save.

* Z8 bits [63:0] .. Z15 bits [63:0] are callee-save.

  This follows from the way these bits are mapped to V8..V15, whose bits [63:0]
  are callee-save in the base procedure call standard.


Appendix B.  ARMv8-A FP/SIMD programmer's model
===============================================

Note: This section is for information only and not intended to be complete or
to replace any architectural specification.

Refer to [4] for more information.

ARMv8-A defines the following floating-point / SIMD register state:

* 32 128-bit vector registers V0..V31
* 2 32-bit status/control registers FPSR, FPCR

::

         127           0  bit index
        +---------------+
     V0 |               |
      : :               :
     V7 |               |
   * V8 |               |
   :  : :               :
   *V15 |               |
    V16 |               |
      : :               :
    V31 |               |
        +---------------+

         31    0
        +-------+
   FPSR |       |
        +-------+
  *FPCR |       |
        +-------+

(*) callee-save:
    This only applies to bits [63:0] of V-registers.
    FPCR contains a mixture of callee-save and caller-save bits.


References
==========

[1] arch/arm64/include/uapi/asm/sigcontext.h
    AArch64 Linux signal ABI definitions

[2] arch/arm64/include/uapi/asm/ptrace.h
    AArch64 Linux ptrace ABI definitions

[3] Documentation/arm64/cpu-feature-registers.rst

[4] ARM IHI0055C
    http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
    http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
    Procedure Call Standard for the ARM 64-bit Architecture (AArch64)