============================
NUMA resource associativity
============================

Associativity represents the groupings of the various platform resources into
domains of substantially similar mean performance relative to resources outside
of that domain. Resource subsets of a given domain that exhibit better
performance relative to each other than relative to other resource subsets
are represented as being members of a sub-grouping domain. This performance
characteristic is presented in terms of NUMA node distance within the Linux kernel.
From the platform view, these groups are also referred to as domains.

The PAPR interface currently supports three ways of communicating these resource
grouping details to the OS. These are referred to as Form 0, Form 1 and Form 2
associativity grouping. Form 0 is the oldest format and is now considered deprecated.

The hypervisor indicates the type/form of associativity used via the "ibm,architecture-vec-5"
property. Bit 0 of byte 5 in the "ibm,architecture-vec-5" property indicates usage of Form 0
or Form 1. A value of 1 indicates the usage of Form 1 associativity. For Form 2 associativity,
bit 2 of byte 5 in the "ibm,architecture-vec-5" property is used.
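
The bit tests described above can be sketched as follows. This is only an
illustration, not kernel code: the name ``associativity_form`` is hypothetical,
``byte5`` is assumed to be byte 5 of the vector as already extracted by the caller,
and bits are assumed to be numbered MSB-first (bit 0 is mask 0x80, bit 2 is mask 0x20).
Preferring Form 2 when both bits are set is also an assumption of this sketch.

.. code-block:: python

    FORM1_MASK = 0x80  # bit 0 of byte 5, assuming MSB-first bit numbering
    FORM2_MASK = 0x20  # bit 2 of byte 5, same assumption


    def associativity_form(byte5: int) -> int:
        """Return 0, 1 or 2 for the associativity form signalled in byte 5."""
        if byte5 & FORM2_MASK:
            return 2
        if byte5 & FORM1_MASK:
            return 1
        return 0

Under these assumptions, a byte 5 value of 0x80 selects Form 1 and a value with 0x20 set
selects Form 2.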

Form 0
------
Form 0 associativity supports only two NUMA distances (LOCAL and REMOTE).

Form 1
------
With Form 1, a combination of the "ibm,associativity-reference-points" and "ibm,associativity"
device tree properties is used to determine the NUMA distance between resource groups/domains.

The "ibm,associativity" property contains a list of one or more numbers (domainID)
representing the resource's platform grouping domains.

The "ibm,associativity-reference-points" property contains a list of one or more numbers
(domainID index) that represents the 1-based ordinal in the associativity lists.
The list of domainID indexes represents an increasing hierarchy of resource grouping.

For ex::

    { primary domainID index, secondary domainID index, tertiary domainID index.. }

The Linux kernel uses the domainID at the primary domainID index as the NUMA node id.
It computes the NUMA distance between two domains by recursively comparing
whether they belong to the same higher-level domains. For a mismatch at every higher
level of the resource group, the kernel doubles the NUMA distance between the
domains being compared.
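
One reading of the doubling rule above is sketched below. It is a simplified model,
not the kernel implementation: the function name ``form1_distance`` is hypothetical,
the starting value of 10 is the distance Linux uses for a node to itself, and each
associativity list is assumed to carry its length in cell 0 so that the 1-based
reference points can be used directly as list indexes.

.. code-block:: python

    LOCAL_DISTANCE = 10  # distance of a node to itself in Linux


    def form1_distance(assoc_a, assoc_b, reference_points):
        """Double the distance for each reference-point level where domains differ."""
        distance = LOCAL_DISTANCE
        for index in reference_points:
            # Reference points are 1-based ordinals; cell 0 of each list is its length.
            if assoc_a[index] == assoc_b[index]:
                break  # same domain at this level, so stop doubling
            distance *= 2
        return distance


    # Hypothetical data: primary index 4, secondary index 2.
    reference_points = [4, 2]
    res_a = [4, 0, 1, 2, 5]
    res_b = [4, 0, 1, 2, 6]  # different node, same secondary domain as res_a
    res_c = [4, 0, 3, 4, 7]  # different node and different secondary domain

With this data, ``res_a`` and ``res_b`` mismatch at one level and ``res_a`` and ``res_c``
mismatch at two, so the second pair ends up twice as far apart as the first.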

Form 2
------
Form 2 associativity format adds separate device tree properties representing NUMA node distance,
thereby making the node distance computation flexible. Form 2 also allows flexible primary
domain numbering. With NUMA distance computation now detached from the index value in the
"ibm,associativity-reference-points" property, Form 2 allows a large number of primary domain
ids at the same domainID index representing resource groups of different performance/latency
characteristics.

The hypervisor indicates the usage of Form 2 associativity using bit 2 of byte 5 in the
"ibm,architecture-vec-5" property.

The "ibm,numa-lookup-index-table" property contains a list of one or more numbers representing
the domainIDs present in the system. The offset of the domainID in this property is
used as an index while computing NUMA distance information via "ibm,numa-distance-table".

prop-encoded-array: The number N of the domainIDs encoded as with encode-int, followed by
N domainIDs encoded as with encode-int.

For ex:
"ibm,numa-lookup-index-table" = {4, 0, 8, 250, 252}. The offset of domainID 8 (2) is used when
computing the distance of domain 8 from other domains present in the system. For the rest of
this document, this offset will be referred to as domain distance offset.
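
As an illustration of the encoding, the sketch below decodes such a property from raw
bytes. It assumes that encode-int yields a 32-bit big-endian cell; the function name
``decode_lookup_index_table`` is hypothetical and the input is built by hand from the
example values above.

.. code-block:: python

    import struct


    def decode_lookup_index_table(raw: bytes) -> list:
        """Decode a count cell followed by that many domainID cells."""
        (count,) = struct.unpack_from(">I", raw, 0)
        if len(raw) < 4 * (count + 1):
            raise ValueError("property is shorter than its count cell claims")
        return list(struct.unpack_from(f">{count}I", raw, 4))


    # The example property {4, 0, 8, 250, 252} as raw big-endian cells.
    raw = struct.pack(">5I", 4, 0, 8, 250, 252)
    domain_ids = decode_lookup_index_table(raw)

The order of the decoded list matters, since the position of a domainID in the property
selects its entries in "ibm,numa-distance-table".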

The "ibm,numa-distance-table" property contains a list of one or more numbers representing the NUMA
distance between resource groups/domains present in the system.

prop-encoded-array: The number N of the distance values encoded as with encode-int, followed by
N distance values encoded as with encode-bytes. The maximum distance value that can be encoded is 255.
The number N must be equal to the square of m, where m is the number of domainIDs in the
"ibm,numa-lookup-index-table" property.
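
The layout and the N = m * m constraint can be checked with a short sketch. As before,
a 32-bit big-endian count cell is assumed, the function name ``decode_distance_table``
is hypothetical, and the two-domain input is made up for the illustration.

.. code-block:: python

    import struct


    def decode_distance_table(raw: bytes, num_domains: int) -> list:
        """Decode a count cell followed by that many one-byte distances."""
        (count,) = struct.unpack_from(">I", raw, 0)
        if count != num_domains * num_domains:
            raise ValueError("distance count is not the square of the domain count")
        if len(raw) < 4 + count:
            raise ValueError("property is shorter than its count cell claims")
        return list(raw[4:4 + count])


    # Hypothetical two-domain table: count cell of 4, then four distance bytes.
    raw = struct.pack(">I", 4) + bytes([10, 40, 40, 10])
    distances = decode_distance_table(raw, 2)

Since each distance occupies a single byte, no value above 255 can appear in the
decoded list.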

For ex::

    ibm,numa-lookup-index-table = <3 0 8 40>;
    ibm,numa-distance-table = <9>, /bits/ 8 < 10 20 80 20 10 160 80 160 10>;

::

      | 0    8    40
    --|--------------
      |
    0 | 10   20   80
      |
    8 | 20   10   160
      |
    40| 80   160  10

A possible "ibm,associativity" property for resources in node 0, 8 and 40::

    { 3, 6, 7, 0 }
    { 3, 6, 9, 8 }
    { 3, 6, 7, 40}

With "ibm,associativity-reference-points" { 0x3 }
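
Putting the example together, the sketch below builds a distance lookup from the two
Form 2 properties and resolves the node of each resource from its associativity list.
The helper names are hypothetical, the distance values are assumed to be stored row by
row in the order given by the lookup index table, and cell 0 of each associativity list
is taken to be its length.

.. code-block:: python

    def build_distance_lookup(domain_ids, distances):
        """Map (domainID, domainID) pairs to distances, assuming row-major order."""
        m = len(domain_ids)
        if len(distances) != m * m:
            raise ValueError("expected m * m distance values")
        return {
            (a, b): distances[i * m + j]
            for i, a in enumerate(domain_ids)
            for j, b in enumerate(domain_ids)
        }


    def node_id(associativity, primary_index):
        """Pick the domainID at the 1-based primary index; cell 0 is the length."""
        return associativity[primary_index]


    lookup = build_distance_lookup([0, 8, 40], [10, 20, 80, 20, 10, 160, 80, 160, 10])
    node_a = node_id([3, 6, 9, 8], 3)   # resource in node 8
    node_b = node_id([3, 6, 7, 40], 3)  # resource in node 40
    distance = lookup[(node_a, node_b)]

Here the distance comes straight from the table rather than from comparing the
associativity lists level by level.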

"ibm,numa-lookup-index-table" allows a compact representation of the distance matrix.
Since domainIDs can be sparse, a matrix of distances indexed directly by domainID would
also be effectively sparse. Indexing it by the domain distance offset instead keeps the
distance information compact.