===============================
Adjunct Processor (AP) facility
===============================


Introduction
============
The Adjunct Processor (AP) facility is an IBM Z cryptographic facility consisting
of three AP instructions and from 1 up to 256 PCIe cryptographic adapter cards.
The AP devices provide cryptographic functions to all CPUs assigned to a
linux system running in an IBM Z system LPAR.

The AP adapter cards are exposed via the AP bus. The motivation for vfio-ap
is to make AP cards available to KVM guests using the VFIO mediated device
framework. This implementation relies considerably on the s390 virtualization
facilities, which do most of the hard work of providing direct access to AP
devices.

AP Architectural Overview
=========================
To facilitate comprehension of the design, let's start with some
definitions:

* AP adapter

  An AP adapter is an IBM Z adapter card that can perform cryptographic
  functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters
  assigned to the LPAR in which a linux host is running will be available to
  the linux host. Each adapter is identified by a number from 0 to 255; however,
  the maximum adapter number is determined by machine model and/or adapter type.
  When installed, an AP adapter is accessed by AP instructions executed by any
  CPU.

  The AP adapter cards are assigned to a given LPAR via the system's Activation
  Profile, which can be edited via the HMC. When the linux host system is IPL'd
  in the LPAR, the AP bus detects the AP adapter cards assigned to the LPAR and
  creates a sysfs device for each assigned adapter. For example, if AP adapters
  4 and 10 (0x0a) are assigned to the LPAR, the AP bus will create the following
  sysfs device entries::

    /sys/devices/ap/card04
    /sys/devices/ap/card0a

  Symbolic links to these devices will also be created in the AP bus devices
  sub-directory::

    /sys/bus/ap/devices/[card04]
    /sys/bus/ap/devices/[card0a]

* AP domain

  An adapter is partitioned into domains. An adapter can hold up to 256 domains
  depending upon the adapter type and hardware configuration. A domain is
  identified by a number from 0 to 255; however, the maximum domain number is
  determined by machine model and/or adapter type. A domain can be thought of
  as a set of hardware registers and memory used for processing AP commands. A
  domain can be configured with a secure private key used for clear key
  encryption. A domain is classified in one of two ways depending upon how it
  may be accessed:

  * Usage domains are domains that are targeted by an AP instruction to
    process an AP command.

  * Control domains are domains that are changed by an AP command sent to a
    usage domain; for example, to set the secure private key for the control
    domain.

  The AP usage and control domains are assigned to a given LPAR via the system's
  Activation Profile, which can be edited via the HMC. When a linux host system
  is IPL'd in the LPAR, the AP bus module detects the AP usage and control
  domains assigned to the LPAR. The domain number of each usage domain and
  adapter number of each AP adapter are combined to create AP queue devices
  (see the AP Queue definition below). The domain number of each control domain
  is represented in a bitmask stored in the sysfs file
  /sys/bus/ap/ap_control_domain_mask. The bits in the mask, from most to least
  significant bit, correspond to domains 0-255.

* AP Queue

  An AP queue is the means by which an AP command is sent to a usage domain
  inside a specific adapter. An AP queue is identified by a tuple
  consisting of an AP adapter ID (APID) and an AP queue index (APQI). The
  APQI corresponds to a given usage domain number within the adapter. This tuple
  forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP
  instructions include a field containing the APQN to identify the AP queue to
  which the AP command is to be sent for processing.

  The AP bus will create a sysfs device for each APQN that can be derived from
  the cross product of the AP adapter and usage domain numbers detected when the
  AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage
  domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the
  following sysfs entries::

    /sys/devices/ap/card04/04.0006
    /sys/devices/ap/card04/04.0047
    /sys/devices/ap/card0a/0a.0006
    /sys/devices/ap/card0a/0a.0047

  The following symbolic links to these devices will be created in the AP bus
  devices subdirectory::

    /sys/bus/ap/devices/[04.0006]
    /sys/bus/ap/devices/[04.0047]
    /sys/bus/ap/devices/[0a.0006]
    /sys/bus/ap/devices/[0a.0047]

* AP Instructions:

  There are three AP instructions:

  * NQAP: to enqueue an AP command-request message to a queue
  * DQAP: to dequeue an AP command-reply message from a queue
  * PQAP: to administer the queues

  AP instructions identify the domain that is targeted to process the AP
  command; this must be one of the usage domains. An AP command may modify a
  domain that is not one of the usage domains, but the modified domain
  must be one of the control domains.
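
The queue device names listed under the AP Queue definition follow directly
from the cross product of adapter and usage domain numbers. The standalone C
sketch below is not taken from the AP bus code; it reproduces the names from
that example. The helper name format_apqn_names() is hypothetical, and the
"XX.YYYY" format (two hex digits for the APID, four for the APQI) is inferred
from the sysfs listings shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Length of "XX.YYYY" plus the terminating NUL. */
#define APQN_NAME_LEN 8

/*
 * Hypothetical helper (not part of the AP bus code): build the queue
 * device name for every APQN in the cross product of the given adapter
 * IDs (APIDs) and usage domain numbers (APQIs), adapter by adapter, in
 * the "XX.YYYY" form seen under /sys/bus/ap/devices. Returns the number
 * of names written, which is at most max_names.
 */
size_t format_apqn_names(const unsigned int *apids, size_t n_apids,
			 const unsigned int *apqis, size_t n_apqis,
			 char names[][APQN_NAME_LEN], size_t max_names)
{
	size_t count = 0;

	for (size_t i = 0; i < n_apids; i++) {
		for (size_t j = 0; j < n_apqis; j++) {
			/* APIDs and APQIs are in the range 0-255 */
			assert(apids[i] <= 0xff && apqis[j] <= 0xff);
			if (count == max_names)
				return count;
			snprintf(names[count], APQN_NAME_LEN, "%02x.%04x",
				 apids[i], apqis[j]);
			count++;
		}
	}
	return count;
}
```

With adapters 4 and 10 and usage domains 6 and 71, the four names come out in
the same order as the listing: 04.0006, 04.0047, 0a.0006, 0a.0047.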

AP and SIE
==========
Let's now take a look at how AP instructions executed on a guest are interpreted
by the hardware.

A satellite control block called the Crypto Control Block (CRYCB) is attached to
our main hardware virtualization control block, the SIE state description. The
CRYCB contains three fields to identify the adapters, usage domains and control
domains assigned to the KVM guest:

* The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned
  to the KVM guest. Each bit in the mask, from left to right (i.e. from most
  significant to least significant bit in big endian order), corresponds to
  an APID from 0-255. If a bit is set, the corresponding adapter is valid for
  use by the KVM guest.

* The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains
  assigned to the KVM guest. Each bit in the mask, from left to right (i.e. from
  most significant to least significant bit in big endian order), corresponds to
  an AP queue index (APQI) from 0-255. If a bit is set, the corresponding queue
  is valid for use by the KVM guest.

* The AP Domain Mask (ADM) field is a bit mask that identifies the AP control
  domains assigned to the KVM guest. The ADM bit mask controls which domains can
  be changed by an AP command-request message sent to a usage domain from the
  guest. Each bit in the mask, from left to right (i.e. from most significant to
  least significant bit in big endian order), corresponds to a domain from
  0-255. If a bit is set, the corresponding domain can be modified by an AP
  command-request message sent to a usage domain.
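
All three masks number their bits from the left, which is the opposite of the
usual C habit of counting from the least significant bit. A minimal user-space
sketch, assuming a 256-bit mask stored as 32 bytes; struct ap_mask,
ap_mask_set() and ap_mask_test() are illustrative names, not kernel data
structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A 256-bit AP mask (APM, AQM or ADM) held as 32 bytes. Bit number 0 is
 * the leftmost (most significant) bit of byte 0; bit number 255 is the
 * rightmost (least significant) bit of byte 31. */
#define AP_MASK_BITS  256
#define AP_MASK_BYTES (AP_MASK_BITS / 8)

struct ap_mask {
	uint8_t bytes[AP_MASK_BYTES];
};

/* Mark adapter/queue/domain number nr as assigned. */
static void ap_mask_set(struct ap_mask *mask, unsigned int nr)
{
	assert(nr < AP_MASK_BITS);
	/* nr / 8 picks the byte; 0x80 >> (nr % 8) picks the bit, MSB first */
	mask->bytes[nr / 8] |= (uint8_t)(0x80u >> (nr % 8));
}

/* Return true if number nr is assigned in the mask. */
static bool ap_mask_test(const struct ap_mask *mask, unsigned int nr)
{
	assert(nr < AP_MASK_BITS);
	return (mask->bytes[nr / 8] & (0x80u >> (nr % 8))) != 0;
}
```

For instance, setting number 10 yields 0x20 in byte 1, because 10 is the third
bit from the left of the second byte.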

Recall from the description of an AP queue that AP instructions include an
APQN to identify the AP queue to which an AP command-request message is to be
sent (NQAP and PQAP instructions), or from which a command-reply message is to
be received (DQAP instruction). The validity of an APQN is defined by the matrix
calculated from the APM and AQM; it is the cross product of all assigned adapter
numbers (APM) with all assigned queue indexes (AQM). For example, if adapters 1
and 2 and usage domains 5 and 6 are assigned to a guest, the APQNs (1,5), (1,6),
(2,5) and (2,6) will be valid for the guest.
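
Expressed in terms of the masks, an APQN is valid only when its APID bit is set
in the APM and its APQI bit is set in the AQM. The sketch below models just that
rule; struct guest_matrix and the helper functions are hypothetical, and the
masks use the left-to-right bit order described for the CRYCB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of the APM and AQM: 256 bits each,
 * bit 0 = most significant bit of byte 0. */
struct guest_matrix {
	uint8_t apm[32];	/* assigned adapters (APIDs) */
	uint8_t aqm[32];	/* assigned usage domains (APQIs) */
};

static void mask_set(uint8_t mask[32], unsigned int nr)
{
	assert(nr < 256);
	mask[nr / 8] |= (uint8_t)(0x80u >> (nr % 8));
}

static bool mask_test(const uint8_t mask[32], unsigned int nr)
{
	assert(nr < 256);
	return (mask[nr / 8] & (0x80u >> (nr % 8))) != 0;
}

/* An APQN (apid, apqi) is valid for the guest only if it lies in the
 * cross product APM x AQM, i.e. both bits are set. */
static bool apqn_valid(const struct guest_matrix *m,
		       unsigned int apid, unsigned int apqi)
{
	return mask_test(m->apm, apid) && mask_test(m->aqm, apqi);
}
```

Assigning adapters 1 and 2 and usage domains 5 and 6 makes exactly the four
APQNs from the example valid; (1,7) or (3,5) would be rejected.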

The APQNs can provide secure key functionality - i.e., a private key is stored
on the adapter card for each of its domains - so an APQN must never be shared:
it may be assigned to at most one guest, or else to the linux host::

   Example 1: Valid configuration:
   ------------------------------
   Guest1: adapters 1,2 domains 5,6
   Guest2: adapters 1,2 domain 7

   This is valid because both guests have a unique set of APQNs:
      Guest1 has APQNs (1,5), (1,6), (2,5), (2,6);
      Guest2 has APQNs (1,7), (2,7)

   Example 2: Valid configuration:
   ------------------------------
   Guest1: adapters 1,2 domains 5,6
   Guest2: adapters 3,4 domains 5,6

   This is also valid because both guests have a unique set of APQNs:
      Guest1 has APQNs (1,5), (1,6), (2,5), (2,6);
      Guest2 has APQNs (3,5), (3,6), (4,5), (4,6)

   Example 3: Invalid configuration:
   --------------------------------
   Guest1: adapters 1,2 domains 5,6
   Guest2: adapter 1 domains 6,7

   This is an invalid configuration because both guests have access to
   APQN (1,6).
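
Because each guest's APQNs form a cross product, two guests share an APQN
exactly when they share at least one adapter and at least one usage domain;
neither condition alone is enough, as Examples 1 and 2 show. The sketch below
checks that condition for two guests. The type and function names are
hypothetical, and the sketch does not model the host's share of the APQNs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct guest_matrix {
	uint8_t apm[32];	/* adapters, bit 0 = MSB of byte 0 */
	uint8_t aqm[32];	/* usage domains, same bit order */
};

static void mask_set(uint8_t mask[32], unsigned int nr)
{
	assert(nr < 256);
	mask[nr / 8] |= (uint8_t)(0x80u >> (nr % 8));
}

/* True if the two masks have at least one bit in common. */
static bool masks_intersect(const uint8_t a[32], const uint8_t b[32])
{
	for (int i = 0; i < 32; i++)
		if (a[i] & b[i])
			return true;
	return false;
}

/* Two guests conflict when some APQN lies in both cross products,
 * which happens iff the APMs overlap AND the AQMs overlap. */
static bool matrices_conflict(const struct guest_matrix *g1,
			      const struct guest_matrix *g2)
{
	return masks_intersect(g1->apm, g2->apm) &&
	       masks_intersect(g1->aqm, g2->aqm);
}
```

Checking overlap per mask costs two scans of 32 bytes, instead of enumerating
every APQN of both guests.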

The Design
==========
The design introduces three new objects:

1. AP matrix device
2. VFIO AP device driver (vfio_ap.ko)
3. VFIO AP mediated matrix pass-through device

The VFIO AP device driver
-------------------------
The VFIO AP (vfio_ap) device driver serves the following purposes:

1. Provides the interfaces to secure APQNs for exclusive use of KVM guests.

2. Sets up the VFIO mediated device interfaces to manage a mediated matrix
   device and creates the sysfs interfaces for assigning adapters, usage
   domains, and control domains comprising the matrix for a KVM guest.

3. Configures the APM, AQM and ADM in the CRYCB referenced by a KVM guest's
   SIE state description to grant the guest access to a matrix of AP devices.

Reserve APQNs for exclusive use of KVM guests
---------------------------------------------
The following block diagram illustrates the mechanism by which APQNs are
reserved::

                                    +------------------+
                    7 remove        |                  |
           +------------------------> cex4queue driver |
           |                        |                  |
           |                        +------------------+
           |
           |                        +------------------+          +----------------+
           |  5 register driver     |                  | 3 create |                |
           |   +--------------------> Device core      +----------> matrix device  |
           |   |                    |                  |          |                |
           |   |                    +--------^---------+          +----------------+
           |   |                             |
           |   |                             +----------------+
           |   | +----------------------------------+         |
           |   | |       4 register AP driver       |         | 2 register device
           |   | |                                  |         |
  +--------+---+-v---+                    +---------+---------+--+
  |                  |                    |                      |
  |      ap_bus      +-------------------->    vfio_ap driver    |
  |                  |      8 probe       |                      |
  +--------^---------+                    +--^--^----------------+
  6 edit   |                                 |  |
    apmask |     +---------------------------+  | 9 mdev create
    aqmask |     |     1 modprobe               |
  +--------+-----+---+        +-----------------+-+           +----------------+
  |                  |        |                   | 9 create  | mediated       |
  |      admin       |        | VFIO device core  +-----------> matrix         |
  |                  |        |                   |           | device         |
  +------+-+---------+        +--------^----------+           +--------^-------+
         | |                           |                               |
         | +---------------------------+                               |
         |   9 create vfio_ap-passthrough                              |
         +-------------------------------------------------------------+
                    10 assign adapter/domain/control domain

The process for reserving an AP queue for use by a KVM guest is:

1. The administrator loads the vfio_ap device driver.
2. During its initialization, the vfio_ap driver registers a single 'matrix'
   device with the device core. This will serve as the parent device for
   all mediated matrix devices used to configure an AP matrix for a guest.
3. The /sys/devices/vfio_ap/matrix device is created by the device core.
4. The vfio_ap device driver registers with the AP bus for AP queue devices
   of type 10 and higher (CEX4 and newer). The driver provides the vfio_ap
   driver's probe and remove callback interfaces. Queues on devices older than
   CEX4 are not supported: supporting them would needlessly complicate the
   design, those devices will go out of service in the relatively near future,
   and few systems remain on which to test them.
5. The AP bus registers the vfio_ap device driver with the device core.
6. The administrator edits the AP adapter and queue masks to reserve AP queues
   for use by the vfio_ap device driver.
7. The AP bus removes the AP queues reserved for the vfio_ap driver from the
   default zcrypt cex4queue driver.
8. The AP bus probes the vfio_ap device driver to bind the queues reserved for
   it.
9. The administrator creates a passthrough type mediated matrix device to be
   used by a guest.
10. The administrator assigns the adapters, usage domains and control domains
    to be exclusively used by a guest.

Set up the VFIO mediated device interfaces
------------------------------------------
The VFIO AP device driver uses the common interface of the VFIO mediated
device core driver to:

* Register an AP mediated bus driver to add a mediated matrix device to and
  remove it from a VFIO group.
* Create and destroy a mediated matrix device.
* Add a mediated matrix device to and remove it from the AP mediated bus driver.
* Add a mediated matrix device to and remove it from an IOMMU group.

The following high-level block diagram shows the main components and interfaces
of the VFIO AP mediated matrix device driver::

  +-------------+
  |             |
  | +---------+ | mdev_register_driver() +--------------+
  | |  Mdev   | +<-----------------------+              |
  | |  bus    | |                        | vfio_mdev.ko |<-> VFIO user
  | | driver  | +----------------------->+              |    APIs
  | +---------+ |    probe()/remove()    +--------------+
  |             |
  |  MDEV CORE  |
  |   MODULE    |
  |   mdev.ko   |
  | +---------+ | mdev_register_device() +--------------+
  | |Physical | +<-----------------------+              |
  | | device  | |                        |  vfio_ap.ko  |<-> matrix
  | |interface| +----------------------->+              |    device
  | +---------+ |        callback        +--------------+
  +-------------+

During initialization of the vfio_ap module, the matrix device is registered
with an 'mdev_parent_ops' structure that provides the sysfs attribute
structures, mdev functions and callback interfaces for managing the mediated
matrix device.

* sysfs attribute structures:

  supported_type_groups
    The VFIO mediated device framework supports creation of user-defined
    mediated device types. These mediated device types are specified
    via the 'supported_type_groups' structure when a device is registered
    with the mediated device framework. The registration process creates the
    sysfs structures for each mediated device type specified in the
    'mdev_supported_types' sub-directory of the device being registered. Along
    with the device type, the sysfs attributes of the mediated device type are
    provided.

    The VFIO AP device driver will register one mediated device type for
    passthrough devices::

      /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough

    Only the read-only attributes required by the VFIO mdev framework will
    be provided::

      ... name
      ... device_api
      ... available_instances

    Where:

    * name: specifies the name of the mediated device type
    * device_api: specifies the VFIO API implemented by the mediated device
      type
    * available_instances: the number of mediated matrix passthrough devices
      that can be created

  mdev_attr_groups
    This attribute group identifies the user-defined sysfs attributes of the
    mediated device. When a device is registered with the VFIO mediated device
    framework, the sysfs attribute files identified in the 'mdev_attr_groups'
    structure will be created in the mediated matrix device's directory. The
    sysfs attributes for a mediated matrix device are:

    assign_adapter / unassign_adapter:
      Write-only attributes for assigning/unassigning an AP adapter to/from the
      mediated matrix device. To assign/unassign an adapter, the APID of the
      adapter is echoed to the respective attribute file.
    assign_domain / unassign_domain:
      Write-only attributes for assigning/unassigning an AP usage domain to/from
      the mediated matrix device. To assign/unassign a domain, the domain
      number of the usage domain is echoed to the respective attribute
      file.
    matrix:
      A read-only file for displaying the APQNs derived from the cross product
      of the adapter and domain numbers assigned to the mediated matrix device.
    assign_control_domain / unassign_control_domain:
      Write-only attributes for assigning/unassigning an AP control domain
      to/from the mediated matrix device. To assign/unassign a control domain,
      the ID of the domain to be assigned/unassigned is echoed to the respective
      attribute file.
    control_domains:
      A read-only file for displaying the control domain numbers assigned to the
      mediated matrix device.

* functions:

  create:
    allocates the ap_matrix_mdev structure used by the vfio_ap driver to:

    * Store the reference to the KVM structure for the guest using the mdev
    * Store the AP matrix configuration for the adapters, domains, and control
      domains assigned via the corresponding sysfs attribute files

  remove:
    deallocates the mediated matrix device's ap_matrix_mdev structure. This is
    allowed only if no running guest is using the mdev.

* callback interfaces

  open:
    The vfio_ap driver uses this callback to register a
    VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the mdev matrix
    device. The open callback is invoked when QEMU connects the VFIO iommu group
    for the mdev matrix device to the MDEV bus. Access to the KVM structure used
    to configure the KVM guest is provided via this callback. The KVM structure
    is used to configure the guest's access to the AP matrix defined via the
    mediated matrix device's sysfs attribute files.
  release:
    unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the
    mdev matrix device and deconfigures the guest's AP matrix.
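
The assign and unassign attributes take a number written with echo. The sketch
below shows one way such input might be validated in user space before it is
written. It assumes, without confirmation from this document, that decimal and
0x-prefixed hexadecimal values are accepted, and it rejects anything outside
0-255, although the driver may apply a tighter limit based on the machine's
maximum adapter or domain number. parse_ap_id() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical parser for a value destined for assign_adapter,
 * assign_domain or assign_control_domain. strtoul() with base 0 accepts
 * decimal, 0x-prefixed hex, or 0-prefixed octal. One trailing newline,
 * as written by echo, is tolerated; anything else after the number, or
 * a value above 255, is rejected. Returns 0 on success, -EINVAL otherwise.
 */
int parse_ap_id(const char *buf, unsigned int *id)
{
	char *end;
	unsigned long val;

	assert(buf != NULL && id != NULL);
	errno = 0;
	val = strtoul(buf, &end, 0);
	if (end == buf || errno != 0)
		return -EINVAL;		/* no digits, or out of range */
	if (*end == '\n')
		end++;			/* echo appends a newline */
	if (*end != '\0' || val > 255)
		return -EINVAL;
	*id = (unsigned int)val;
	return 0;
}
```

Note that with base 0 a leading zero selects octal, so "010" parses as 8.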

Configure the APM, AQM and ADM in the CRYCB
-------------------------------------------
Configuring the AP matrix for a KVM guest will be performed when the
VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier
function is called when QEMU connects to KVM. The guest's AP matrix is
configured via its CRYCB by:

* Setting the bits in the APM corresponding to the APIDs assigned to the
  mediated matrix device via its 'assign_adapter' interface.
* Setting the bits in the AQM corresponding to the domains assigned to the
  mediated matrix device via its 'assign_domain' interface.
* Setting the bits in the ADM corresponding to the control domain IDs assigned
  to the mediated matrix device via its 'assign_control_domain' interface.
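
The three configuration steps amount to clearing each mask and setting one bit
per assigned ID. This is a sketch under stated assumptions: the masks are
modeled as 256-bit byte arrays with the left-to-right bit order described in
the AP and SIE section, struct ap_masks and configure_ap_masks() are
hypothetical, and the real CRYCB layout, which depends on the CRYCB format, is
not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three masks, 256 bits each,
 * bit 0 = most significant bit of byte 0. */
struct ap_masks {
	uint8_t apm[32];	/* adapters from assign_adapter */
	uint8_t aqm[32];	/* usage domains from assign_domain */
	uint8_t adm[32];	/* control domains from assign_control_domain */
};

/* Set the bit for every ID in ids[0..n-1]. */
static void set_ids(uint8_t mask[32], const unsigned int *ids, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(ids[i] < 256);
		mask[ids[i] / 8] |= (uint8_t)(0x80u >> (ids[i] % 8));
	}
}

/* Clear all three masks, then set one bit per assigned ID. */
void configure_ap_masks(struct ap_masks *m,
			const unsigned int *apids, size_t n_apids,
			const unsigned int *apqis, size_t n_apqis,
			const unsigned int *ctrl_ids, size_t n_ctrl)
{
	memset(m, 0, sizeof(*m));
	set_ids(m->apm, apids, n_apids);
	set_ids(m->aqm, apqis, n_apqis);
	set_ids(m->adm, ctrl_ids, n_ctrl);
}
```

Assigning adapters 5 and 6, for example, sets byte 0 of the APM to 0x06, and
domain 0xab (171) lands in byte 21 as 0x10.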

The CPU model features for AP
-----------------------------
The AP stack relies on the presence of the AP instructions as well as two
facilities: the AP Facilities Test (APFT) facility and the AP Query
Configuration Information (QCI) facility. These features/facilities are made
available to a KVM guest via the following CPU model features:

1. ap: Indicates whether the AP instructions are installed on the guest. This
   feature will be enabled by KVM only if the AP instructions are installed
   on the host.

2. apft: Indicates the APFT facility is available on the guest. This facility
   can be made available to the guest only if it is available on the host (i.e.,
   facility bit 15 is set).

3. apqci: Indicates the AP QCI facility is available on the guest. This facility
   can be made available to the guest only if it is available on the host (i.e.,
   facility bit 12 is set).
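
Facility bits are numbered from the left, so facility 0 is the most significant
bit of the first byte of the facility list. The sketch below tests a bit in
such a list; facility_installed() is a hypothetical name, and on the host the
kernel has its own helpers for querying its facility list (for example
test_facility()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: report whether facility bit nr is set in a
 * facility list of len bytes. Bits are numbered from the left, so
 * facility 0 is 0x80 in byte 0, facility 12 is 0x08 in byte 1 and
 * facility 15 is 0x01 in byte 1. Bits beyond the list are treated as
 * not installed.
 */
static bool facility_installed(const uint8_t *list, size_t len,
			       unsigned int nr)
{
	assert(list != NULL);
	if (nr / 8 >= len)
		return false;
	return (list[nr / 8] & (0x80u >> (nr % 8))) != 0;
}
```

A list whose second byte is 0x09 therefore reports both apqci (bit 12) and
apft (bit 15) as available.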

Note: If the user chooses to specify a CPU model different from the 'host'
model to QEMU, the CPU model features and facilities need to be turned on
explicitly; for example::

     /usr/bin/qemu-system-s390x ... -cpu z13,ap=on,apqci=on,apft=on

A guest can be precluded from using AP features/facilities by turning them off
explicitly; for example::

     /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off

Note: If the APFT facility is turned off (apft=off) for the guest, the guest
will not see any AP devices. The zcrypt device drivers that register for type 10
and newer AP devices - i.e., the cex4card and cex4queue device drivers - need
the APFT facility to ascertain the facilities installed on a given AP device. If
the APFT facility is not installed on the guest, then the probe of device
drivers will fail since only type 10 and newer devices can be configured for
guest use.

Example
=======
Let's now provide an example to illustrate how KVM guests may be given
access to AP facilities. For this example, we will show how to configure
three guests such that executing the lszcrypt command on the guests would
look like this:

Guest1
------

=========== ===== ============
CARD.DOMAIN TYPE  MODE
=========== ===== ============
05          CEX5C CCA-Coproc
05.0004     CEX5C CCA-Coproc
05.00ab     CEX5C CCA-Coproc
06          CEX5A Accelerator
06.0004     CEX5A Accelerator
06.00ab     CEX5A Accelerator
=========== ===== ============

Guest2
------

=========== ===== ============
CARD.DOMAIN TYPE  MODE
=========== ===== ============
05          CEX5C CCA-Coproc
05.0047     CEX5C CCA-Coproc
05.00ff     CEX5C CCA-Coproc
=========== ===== ============

Guest3
------

=========== ===== ============
CARD.DOMAIN TYPE  MODE
=========== ===== ============
06          CEX5A Accelerator
06.0047     CEX5A Accelerator
06.00ff     CEX5A Accelerator
=========== ===== ============

| 497 | These are the steps: |
| 498 | |
| 499 | 1. Install the vfio_ap module on the linux host. The dependency chain for the |
| 500 | vfio_ap module is: |
| 501 | * iommu |
| 502 | * s390 |
| 503 | * zcrypt |
| 504 | * vfio |
| 505 | * vfio_mdev |
| 506 | * vfio_mdev_device |
| 507 | * KVM |
| 508 | |
| 509 | To build the vfio_ap module, the kernel build must be configured with the |
| 510 | following Kconfig elements selected: |
| 511 | * IOMMU_SUPPORT |
| 512 | * S390 |
| 513 | * ZCRYPT |
| 514 | * S390_AP_IOMMU |
| 515 | * VFIO |
| 516 | * VFIO_MDEV |
| 517 | * VFIO_MDEV_DEVICE |
| 518 | * KVM |
| 519 | |
   If using ``make menuconfig``, select the following to build the vfio_ap
   module::

     -> Device Drivers
        -> IOMMU Hardware Support
           select S390 AP IOMMU Support
        -> VFIO Non-Privileged userspace driver framework
           -> Mediated device driver framework
              -> VFIO driver for Mediated devices
     -> I/O subsystem
        -> VFIO support for AP devices
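
   As a rough pre-flight check, the options listed above can be looked up in an
   existing kernel ``.config``. The bash function below is a minimal sketch
   that assumes the usual ``CONFIG_<NAME>=y`` / ``CONFIG_<NAME>=m`` lines. It
   also checks ``CONFIG_VFIO_AP``, which is assumed here to be the symbol behind
   the "VFIO support for AP devices" menu entry (it is not in the list above)::

      # Report required options that are neither built in (=y) nor modular (=m).
      # CONFIG_VFIO_AP is an assumption: the option for the vfio_ap module itself.
      check_vfio_ap_kconfig() {
          local config=$1 opt missing=0
          for opt in IOMMU_SUPPORT S390 ZCRYPT S390_AP_IOMMU VFIO VFIO_MDEV \
                     VFIO_MDEV_DEVICE KVM VFIO_AP; do
              if ! grep -Eq "^CONFIG_${opt}=(y|m)$" "$config"; then
                  echo "missing: CONFIG_${opt}"
                  missing=1
              fi
          done
          return "$missing"
      }

   On distributions that install the kernel configuration as
   ``/boot/config-$(uname -r)``, passing that file prints nothing and succeeds
   when every option is enabled.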

2. Secure the AP queues to be used by the three guests so that the host cannot
   access them. To secure them, there are two sysfs files that specify
   bitmasks marking a subset of the APQN range as 'usable by the default AP
   queue device drivers' or 'not usable by the default device drivers', and
   thus available for use by the vfio_ap device driver. The sysfs files
   containing the masks are::

     /sys/bus/ap/apmask
     /sys/bus/ap/aqmask

   The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs
   (APID). Each bit in the mask, from left to right (i.e., from most significant
   to least significant bit in big endian order), corresponds to an APID from
   0-255. If a bit is set, the APID is marked as usable only by the default AP
   queue device drivers; otherwise, the APID is usable by the vfio_ap
   device driver.

   The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes
   (APQI). Each bit in the mask, from left to right (i.e., from most significant
   to least significant bit in big endian order), corresponds to an APQI from
   0-255. If a bit is set, the APQI is marked as usable only by the default AP
   queue device drivers; otherwise, the APQI is usable by the vfio_ap device
   driver.

   Take, for example, the following mask::

      0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

   It indicates that IDs 1, 2, 3, 4, 5, and 7-255 belong to the default drivers'
   pool, and IDs 0 and 6 belong to the vfio_ap device driver's pool.
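
   To see which IDs a given mask value marks for the default drivers, the hex
   string can be decoded bit by bit in the left-to-right order described above.
   A minimal bash sketch (``mask_bits`` is a hypothetical helper, not part of
   any tool); hex digits missing on the right are treated as 0::

      # Print the numbers of all set bits; bit 0 is the leftmost (most
      # significant) bit of the first hex digit after the "0x" prefix.
      mask_bits() {
          local hex=${1#0x} bit=0 i j nibble
          local -a out=()
          for ((i = 0; i < ${#hex}; i++)); do
              nibble=$((16#${hex:i:1}))
              for ((j = 3; j >= 0; j--)); do
                  if (( (nibble >> j) & 1 )); then
                      out+=("$bit")
                  fi
                  bit=$((bit + 1))
              done
          done
          echo "${out[*]}"
      }

   Applied to the example mask above, ``mask_bits`` prints every number from 1
   to 255 except 6, i.e. the IDs left in the default drivers' pool.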

   The APQN of each AP queue device assigned to the linux host is checked by the
   AP bus against the set of APQNs derived from the cross product of APIDs
   and APQIs marked as usable only by the default AP queue device drivers. If a
   match is detected, only the default AP queue device drivers will be probed;
   otherwise, the vfio_ap device driver will be probed.
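
   Put differently, an APQN stays with the default drivers only if its APID bit
   is set in 'apmask' *and* its APQI bit is set in 'aqmask'. The check can be
   sketched in bash as follows (``mask_test_bit`` and ``apqn_pool`` are
   hypothetical helpers; masks use the same "0x..." notation)::

      # Succeed if bit number $2 is set in hex mask $1 (bit 0 = leftmost bit).
      # Bits beyond the end of the string count as 0.
      mask_test_bit() {
          local hex=${1#0x} bit=$2
          local idx=$((bit / 4))
          (( idx < ${#hex} )) || return 1
          local nibble=$((16#${hex:idx:1}))
          (( (nibble >> (3 - bit % 4)) & 1 ))
      }

      # Print which pool the APQN (APID $1, APQI $2) falls into,
      # given apmask $3 and aqmask $4.
      apqn_pool() {
          if mask_test_bit "$3" "$1" && mask_test_bit "$4" "$2"; then
              echo default
          else
              echo vfio_ap
          fi
      }

   With the boot-time masks ``ap.apmask=0xffff ap.aqmask=0x40`` discussed
   below, ``apqn_pool 0 1 0xffff 0x40`` prints ``default``, while
   ``apqn_pool 0 0 0xffff 0x40`` prints ``vfio_ap``.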

   By default, the two masks are set to reserve all APQNs for use by the default
   AP queue device drivers. There are two ways the default masks can be changed:

   1. The sysfs mask files can be edited by echoing a string into the
      respective sysfs mask file in one of two formats:

      * An absolute hex string starting with 0x, such as "0x12345678", sets
        the mask. If the given string is shorter than the mask, it is padded
        with 0s on the right; for example, specifying a mask value of 0x41 is
        the same as specifying::

          0x4100000000000000000000000000000000000000000000000000000000000000

        Keep in mind that the mask reads from left to right (i.e., most
        significant to least significant bit in big endian order), so the mask
        above identifies IDs 1 and 7 (01000001).

        If the string is longer than the mask, the operation is terminated with
        an error (EINVAL).

      * Individual bits in the mask can be switched on and off by specifying
        each bit number to be switched in a comma-separated list. Each bit
        number string must be prefixed with a plus ('+') or minus ('-') to
        indicate whether the corresponding bit is to be switched on ('+') or
        off ('-'). Some valid values are:

        - "+0" switches bit 0 on
        - "-13" switches bit 13 off
        - "+0x41" switches bit 65 on
        - "-0xff" switches bit 255 off

        The following example::

          +0,-6,+0x47,-0xf0

        switches bits 0 and 71 (0x47) on and switches bits 6 and 240 (0xf0)
        off.

        Note that the bits not specified in the list remain as they were before
        the operation.

   2. The masks can also be changed at boot time via parameters on the kernel
      command line like this::

        ap.apmask=0xffff ap.aqmask=0x40

      This would create the following masks::

        apmask:
        0xffff000000000000000000000000000000000000000000000000000000000000

        aqmask:
        0x4000000000000000000000000000000000000000000000000000000000000000

      This results in the following two pools::

        default drivers pool:    adapters 0-15, domain 1
        alternate drivers pool:  all other APQNs, i.e. adapters 16-255 with
                                 any domain, plus adapters 0-15 with
                                 domains 0 and 2-255
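
   Before writing an absolute mask, it can help to compute the full 64-digit
   value that a list of '+'/'-' changes would produce. The bash function below
   is a hypothetical helper that only mimics the two input formats described
   above (right-padding of short hex strings, and a comma-separated bit list);
   it does not read or write sysfs. Bit numbers are assumed to be decimal or
   0x-prefixed hex (bash arithmetic would read a leading 0 as octal)::

      # Apply a "+N,-N,..." list to hex mask $1 and print the result as "0x"
      # followed by 64 hex digits (bit 0 = leftmost bit).
      apply_bit_list() {
          local hex=${1#0x} list=$2 out="" i j item num nibble
          local -a bits=() items=()
          if (( ${#hex} > 64 )); then
              echo "mask too long" >&2   # the kernel rejects this with EINVAL
              return 1
          fi
          while (( ${#hex} < 64 )); do hex+=0; done   # pad with 0s on the right
          for ((i = 0; i < 64; i++)); do
              nibble=$((16#${hex:i:1}))
              for ((j = 3; j >= 0; j--)); do
                  bits+=($(( (nibble >> j) & 1 )))
              done
          done
          IFS=, read -ra items <<< "$list"
          for item in "${items[@]}"; do
              num=$(( ${item:1} ))
              if (( num < 0 || num > 255 )); then
                  echo "bit out of range: $item" >&2
                  return 1
              fi
              case ${item:0:1} in
                  +) bits[num]=1 ;;
                  -) bits[num]=0 ;;
                  *) echo "missing +/- prefix: $item" >&2; return 1 ;;
              esac
          done
          for ((i = 0; i < 256; i += 4)); do
              out+=$(printf '%x' $(( bits[i] << 3 | bits[i+1] << 2 | bits[i+2] << 1 | bits[i+3] )))
          done
          echo "0x$out"
      }

   For example, applying ``-5,-6`` to a mask of all ones ("0x" followed by 64
   "f" digits) yields 0xf9 followed by 62 "f" digits, which is the absolute
   'apmask' value needed to secure adapters 5 and 6 in the next section.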

Securing the APQNs for our example
----------------------------------
To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 06.0047,
06.00ab, and 06.00ff for use by the vfio_ap device driver, the corresponding
APQNs can either be removed from the default masks::

   echo -5,-6 > /sys/bus/ap/apmask

   echo -4,-0x47,-0xab,-0xff > /sys/bus/ap/aqmask

Or the masks can be set as follows::

   echo 0xf9ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff \
   > /sys/bus/ap/apmask

   echo 0xf7fffffffffffffffeffffffffffffffffffffffffeffffffffffffffffffffe \
   > /sys/bus/ap/aqmask

This will result in AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004,
06.0047, 06.00ab, and 06.00ff getting bound to the vfio_ap device driver. The
sysfs directory for the vfio_ap device driver will now contain symbolic links
to the AP queue devices bound to it::

   /sys/bus/ap
   ... [drivers]
   ...... [vfio_ap]
   ......... [05.0004]
   ......... [05.0047]
   ......... [05.00ab]
   ......... [05.00ff]
   ......... [06.0004]
   ......... [06.0047]
   ......... [06.00ab]
   ......... [06.00ff]
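
The bound queues can also be listed from a script. A minimal bash sketch,
assuming only the directory layout shown above (queue links named
``XX.YYYY`` in hex); ``list_vfio_ap_queues`` is a hypothetical name, and the
driver directory is a parameter so that it can be pointed at a copy of the
tree for testing::

   # Print the APQNs (links named like 05.0004) found in a driver directory,
   # one per line; other entries such as bind/unbind are skipped.
   list_vfio_ap_queues() {
       local dir=${1:-/sys/bus/ap/drivers/vfio_ap} entry name
       for entry in "$dir"/*; do
           name=${entry##*/}
           if [[ $name =~ ^[0-9a-f]{2}\.[0-9a-f]{4}$ ]]; then
               echo "$name"
           fi
       done
   }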

Keep in mind that only type 10 and newer adapters (i.e., CEX4 and later)
can be bound to the vfio_ap device driver. This restriction keeps the
implementation simple: supporting older devices, which will go out of service
in the relatively near future and for which few systems remain to test on,
would needlessly complicate the design.

The administrator, therefore, must take care to secure only AP queues that
can be bound to the vfio_ap device driver. The device type for a given AP
queue device can be read from the parent card's sysfs directory. For example,
to see the hardware type of the queue 05.0004::

   cat /sys/bus/ap/devices/card05/hwtype

The hwtype must be 10 or higher (CEX4 or newer) in order to be bound to the
vfio_ap device driver.
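
Checking each card of the example before securing its queues could look like
the bash sketch below. It relies only on the ``card<NN>/hwtype`` attribute
shown above; the function name and the optional second argument (an
alternative sysfs root, useful for testing) are assumptions of this sketch::

   # Succeed if card $1 (two hex digits, e.g. 05) reports hwtype >= 10.
   card_supports_vfio_ap() {
       local card=$1 root=${2:-/sys/bus/ap/devices} hwtype
       hwtype=$(cat "$root/card$card/hwtype" 2>/dev/null) || return 1
       [[ $hwtype =~ ^[0-9]+$ ]] || return 1
       (( 10#$hwtype >= 10 ))
   }

   # Usage:
   #   for c in 05 06; do card_supports_vfio_ap "$c" || echo "card$c: too old"; done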

3. Create the mediated devices needed to configure the AP matrixes for the
   three guests and to provide an interface to the vfio_ap driver for
   use by the guests::

     /sys/devices/vfio_ap/matrix/
     --- [mdev_supported_types]
     ------ [vfio_ap-passthrough] (passthrough mediated matrix device type)
     --------- create
     --------- [devices]

   To create the mediated devices for the three guests, write one UUID per
   guest to the create attribute of the [vfio_ap-passthrough] type::

     uuidgen > create
     uuidgen > create
     uuidgen > create

   or, with UUIDs generated beforehand::

     echo $uuid1 > create
     echo $uuid2 > create
     echo $uuid3 > create
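
   Because the UUIDs are needed again later (for example, on the QEMU command
   line in step 5), it is convenient to generate them once and keep them in
   variables. A minimal bash sketch; ``create_mdevs`` is a hypothetical helper,
   the UUIDs are passed in, and the type directory can be overridden through
   the (also hypothetical) ``MDEV_TYPE_DIR`` variable for testing::

      # Write each UUID given as an argument to the create attribute of the
      # vfio_ap-passthrough type; stop at the first failure.
      create_mdevs() {
          local type_dir=${MDEV_TYPE_DIR:-/sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough}
          local uuid
          for uuid in "$@"; do
              echo "$uuid" > "$type_dir/create" || return 1
          done
      }

      # Usage:
      #   uuid1=$(uuidgen) uuid2=$(uuidgen) uuid3=$(uuidgen)
      #   create_mdevs "$uuid1" "$uuid2" "$uuid3"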

   This will create three mediated devices in the [devices] subdirectory named
   after the UUID written to the create attribute file. We call them $uuid1,
   $uuid2 and $uuid3 and this is the sysfs directory structure after creation::

     /sys/devices/vfio_ap/matrix/
     --- [mdev_supported_types]
     ------ [vfio_ap-passthrough]
     --------- [devices]
     ------------ [$uuid1]
     --------------- assign_adapter
     --------------- assign_control_domain
     --------------- assign_domain
     --------------- matrix
     --------------- unassign_adapter
     --------------- unassign_control_domain
     --------------- unassign_domain

     ------------ [$uuid2]
     --------------- assign_adapter
     --------------- assign_control_domain
     --------------- assign_domain
     --------------- matrix
     --------------- unassign_adapter
     --------------- unassign_control_domain
     --------------- unassign_domain

     ------------ [$uuid3]
     --------------- assign_adapter
     --------------- assign_control_domain
     --------------- assign_domain
     --------------- matrix
     --------------- unassign_adapter
     --------------- unassign_control_domain
     --------------- unassign_domain

4. The administrator now needs to configure the matrixes for the mediated
   devices $uuid1 (for Guest1), $uuid2 (for Guest2) and $uuid3 (for Guest3).
   The commands below are run in the sysfs directory of the respective
   mediated device (for example, /sys/devices/vfio_ap/matrix/$uuid1 for
   Guest1).

   This is how the matrix is configured for Guest1::

     echo 5 > assign_adapter
     echo 6 > assign_adapter
     echo 4 > assign_domain
     echo 0xab > assign_domain

   Control domains can similarly be assigned using the assign_control_domain
   sysfs file.

   If a mistake is made configuring an adapter, domain or control domain,
   you can use the unassign_xxx files to unassign the adapter, domain or
   control domain.

   To display the matrix configuration for Guest1::

     cat matrix

   This is how the matrix is configured for Guest2::

     echo 5 > assign_adapter
     echo 0x47 > assign_domain
     echo 0xff > assign_domain

   This is how the matrix is configured for Guest3::

     echo 6 > assign_adapter
     echo 0x47 > assign_domain
     echo 0xff > assign_domain
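
   The APQNs a matrix gives a guest are the cross product of its adapters and
   domains, so Guest1 above gets 05.0004, 05.00ab, 06.0004 and 06.00ab. A
   small bash sketch of that computation (``matrix_apqns`` is a hypothetical
   helper; the ``XX.YYYY`` output format simply follows the queue names used
   in this document)::

      # Print every APQN in the cross product of the adapter list $1 and the
      # domain list $2 (space-separated, decimal or 0x-prefixed hex).
      matrix_apqns() {
          local apid apqi
          for apid in $1; do
              for apqi in $2; do
                  printf '%02x.%04x\n' "$apid" "$apqi"
              done
          done
      }

      # Example: matrix_apqns "5 6" "4 0xab"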

   In order to successfully assign an adapter:

   * The adapter number specified must represent a value from 0 up to the
     maximum adapter number configured for the system. If an adapter number
     higher than the maximum is specified, the operation will terminate with
     an error (ENODEV).

   * All APQNs that can be derived from the adapter ID and the IDs of
     the previously assigned domains must be bound to the vfio_ap device
     driver. If no domains have yet been assigned, then there must be at least
     one APQN with the specified APID bound to the vfio_ap driver. If no such
     APQNs are bound to the driver, the operation will terminate with an
     error (EADDRNOTAVAIL).

   * No APQN that can be derived from the adapter ID and the IDs of the
     previously assigned domains can be assigned to another mediated matrix
     device. If an APQN is assigned to another mediated matrix device, the
     operation will terminate with an error (EADDRINUSE).

   In order to successfully assign a domain:

   * The domain number specified must represent a value from 0 up to the
     maximum domain number configured for the system. If a domain number
     higher than the maximum is specified, the operation will terminate with
     an error (ENODEV).

   * All APQNs that can be derived from the domain ID and the IDs of
     the previously assigned adapters must be bound to the vfio_ap device
     driver. If no adapters have yet been assigned, then there must be at least
     one APQN with the specified APQI bound to the vfio_ap driver. If no such
     APQNs are bound to the driver, the operation will terminate with an
     error (EADDRNOTAVAIL).

   * No APQN that can be derived from the domain ID and the IDs of the
     previously assigned adapters can be assigned to another mediated matrix
     device. If an APQN is assigned to another mediated matrix device, the
     operation will terminate with an error (EADDRINUSE).

   In order to successfully assign a control domain, the domain number
   specified must represent a value from 0 up to the maximum domain number
   configured for the system. If a control domain number higher than the maximum
   is specified, the operation will terminate with an error (ENODEV).
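
   Because the EADDRNOTAVAIL condition depends only on which queues are bound
   to vfio_ap, it can be checked before writing to the assign_* files. The bash
   sketch below is a hypothetical helper, not a kernel interface: it checks
   that every APQN in the cross product of the given adapters and domains has
   an entry in the vfio_ap driver directory listed earlier. It covers only the
   case in which both lists are non-empty and does not check the EADDRINUSE or
   ENODEV conditions::

      # Succeed if all APQNs in adapters $1 x domains $2 are bound to vfio_ap;
      # print the missing ones otherwise. $3 optionally overrides the driver dir.
      all_apqns_bound() {
          local drv=${3:-/sys/bus/ap/drivers/vfio_ap} apid apqi apqn rc=0
          for apid in $1; do
              for apqi in $2; do
                  apqn=$(printf '%02x.%04x' "$apid" "$apqi")
                  if [ ! -e "$drv/$apqn" ]; then
                      echo "not bound to vfio_ap: $apqn"
                      rc=1
                  fi
              done
          done
          return "$rc"
      }

      # Example (Guest1): all_apqns_bound "5 6" "4 0xab"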

5. Start Guest1::

     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
        -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ...

6. Start Guest2::

     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
        -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ...

7. Start Guest3::

     /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \
        -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ...
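
   The three command lines differ only in the mediated device UUID. A small
   bash sketch that prints just the AP-related options shown above for a given
   UUID (``vfio_ap_qemu_opts`` is a hypothetical helper; the rest of the QEMU
   command line, elided as "..." above, is not covered)::

      # Print the vfio-ap specific QEMU options for mediated device $1,
      # one argument per line.
      vfio_ap_qemu_opts() {
          printf '%s\n' -cpu host,ap=on,apqci=on,apft=on \
              -device "vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$1"
      }

      # Usage:
      #   mapfile -t ap_opts < <(vfio_ap_qemu_opts "$uuid1")
      #   /usr/bin/qemu-system-s390x ... "${ap_opts[@]}" ...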

When a guest is shut down, its mediated matrix device may be removed.

Using our example again, to remove the mediated matrix device $uuid1, write 1
to its remove attribute::

   /sys/devices/vfio_ap/matrix/
   --- [mdev_supported_types]
   ------ [vfio_ap-passthrough]
   --------- [devices]
   ------------ [$uuid1]
   --------------- remove

::

   echo 1 > remove

This will remove all of the mdev matrix device's sysfs structures, including
the mdev device itself. To recreate and reconfigure the mdev matrix device,
all of the steps starting with step 3 will have to be performed again. Note
that the removal will fail if a guest using the mdev is still running.

It is not necessary to remove an mdev matrix device, but one may want to
remove it if no guest will use it during the remaining lifetime of the linux
host. If the mdev matrix device is removed, one may also want to reconfigure
the pool of adapters and queues reserved for use by the default drivers.

Limitations
===========
* The KVM/kernel interfaces do not provide a way to prevent an APQN from being
  returned to the default drivers' pool while the queue is still assigned to a
  mediated device in use by a guest. It is incumbent upon the administrator to
  ensure that no mediated device in use by a guest has the APQN assigned;
  otherwise, the host could be given access to the private data of the AP
  queue device, such as a private key configured specifically for the guest.

* Dynamically modifying the AP matrix for a running guest (which would amount to
  hot(un)plug of AP devices for the guest) is currently not supported.

* Live guest migration is not supported for guests using AP devices.