==================================
vfio-ccw: the basic infrastructure
==================================

Introduction
------------

Here we describe the vfio support for I/O subchannel devices for
Linux/s390. The motivation for vfio-ccw is to pass through subchannels
to a virtual machine, with vfio as the means.

Unlike other hardware architectures, s390 defines a unified I/O access
method, the so-called Channel I/O. It has its own access patterns:

- Channel programs run asynchronously on a separate (co)processor.
- The channel subsystem accesses any memory designated by the caller
  in the channel program directly, i.e. there is no IOMMU involved.

Thus, when we introduce vfio support for these devices, we realize it
with a mediated device (mdev) implementation. The vfio mdev is added
to an IOMMU group so that it can be managed by the vfio framework. We
also add read/write callbacks for special vfio I/O regions to pass the
channel programs from the mdev to its parent device (the real I/O
subchannel device), which does further address translation and
performs the I/O instructions.

This document does not intend to explain the s390 I/O architecture in
every detail. More information can be found in the following
references:

- A good starting point for Channel I/O in general:
  https://en.wikipedia.org/wiki/Channel_I/O
- s390 architecture:
  s390 Principles of Operation manual (IBM Form. No. SA22-7832)
- The existing QEMU code, which implements a simple emulated channel
  subsystem, can also be a good reference. It makes it easier to follow
  the flow:
  qemu/hw/s390x/css.c

For the vfio mediated device framework:

- Documentation/driver-api/vfio-mediated-device.rst

Motivation of vfio-ccw
----------------------

Typically, a guest virtualized via QEMU/KVM on s390 only sees
paravirtualized virtio devices via the "Virtio Over Channel I/O
(virtio-ccw)" transport. This makes virtio devices discoverable via
standard operating system algorithms for handling channel devices.

However, this is not enough. On s390, the majority of devices use the
standard Channel I/O based mechanism, and we also need to be able to
pass them through to a QEMU virtual machine. This includes devices
that don't have a virtio counterpart (e.g. tape drives) or that have
specific characteristics which guests want to exploit.

For passing a device to a guest, we want to use the same interface as
everybody else, namely vfio. We implement this vfio support for channel
devices via the vfio mediated device framework and the subchannel device
driver "vfio_ccw".

Access patterns of CCW devices
------------------------------

The s390 architecture implements a so-called channel subsystem that
provides a unified view of the devices physically attached to the
system. Although the s390 hardware platform supports a huge variety of
peripheral attachments, such as disk devices (aka DASDs), tapes and
communication controllers, they can all be accessed by a well-defined
access method, and they all present I/O completion in a unified way:
I/O interruptions.

All I/O requires the use of channel command words (CCWs). A CCW is an
instruction to a specialized I/O channel processor. A channel program is
a sequence of CCWs which are executed by the I/O channel subsystem. To
issue a channel program to the channel subsystem, the operating system
builds an operation request block (ORB), which specifies the address of
the channel program, the format of the CCWs and other control
information. The operating system signals the I/O channel subsystem to
begin executing the channel program with an SSCH (START SUBCHANNEL)
instruction. The central processor is then free to proceed with non-I/O
instructions until interrupted. The I/O completion result is received by
the interrupt handler in the form of an interruption response block
(IRB).
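
To illustrate what a channel program looks like in memory, the sketch
below models a format-1 CCW (8 bytes: command code, flags, count and
data address, laid out like Linux's ``struct ccw1``). It counts the CCWs
in a chain by following the chain-command and chain-data flags. This is
a simplified sketch. It ignores TIC (transfer in channel) and treats the
chain as a flat array. ``ccw_chain_len()`` is a hypothetical helper and
is not a kernel API.

```c
#include <stddef.h>
#include <stdint.h>

/* Format-1 CCW: 8 bytes, laid out like Linux's struct ccw1. */
struct ccw1 {
	uint8_t  cmd_code;  /* command code, e.g. read or write */
	uint8_t  flags;     /* chaining and control flags */
	uint16_t count;     /* byte count of the data area */
	uint32_t cda;       /* data address, or IDAL address with CCW_FLAG_IDA */
} __attribute__((packed, aligned(8)));

_Static_assert(sizeof(struct ccw1) == 8, "a CCW is 8 bytes");

#define CCW_FLAG_DC  0x80  /* chain data: next CCW continues this data area */
#define CCW_FLAG_CC  0x40  /* chain command: next CCW is a new command */
#define CCW_FLAG_IDA 0x04  /* cda points to an indirect data address list */

/* Hypothetical helper: count the CCWs in a chain that starts at prog[0].
 * The channel keeps fetching the next CCW while CC or DC is set, so the
 * chain ends at the first CCW with neither flag. TIC is ignored here.
 * Returns 0 if no terminating CCW is found within max entries. */
size_t ccw_chain_len(const struct ccw1 *prog, size_t max)
{
	for (size_t i = 0; i < max; i++) {
		if (!(prog[i].flags & (CCW_FLAG_CC | CCW_FLAG_DC)))
			return i + 1;
	}
	return 0;
}
```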

Back to vfio-ccw, in short:

- ORBs and channel programs are built in the guest kernel (with guest
  physical addresses).
- ORBs and channel programs are passed to the host kernel.
- The host kernel translates the guest physical addresses to real
  addresses and starts the I/O by issuing a privileged Channel I/O
  instruction (e.g. SSCH).
- Channel programs run asynchronously on a separate processor.
- I/O completion is signaled to the host with I/O interruptions. The
  result is copied as an IRB to user space, which passes it back to the
  guest.

Physical vfio ccw device and its child mdev
-------------------------------------------

As mentioned above, we realize vfio-ccw with an mdev implementation.

Channel I/O does not have IOMMU hardware support, so the physical
vfio-ccw device has no IOMMU-level translation or isolation.

Subchannel I/O instructions are all privileged instructions. When
handling an intercepted I/O instruction, vfio-ccw polices and translates
the channel program in software before it is sent to the hardware.

Within this implementation, we have two drivers for two types of
devices:

- The vfio_ccw driver for the physical subchannel device.
  This is an I/O subchannel driver for the real subchannel device. It
  implements a group of callbacks and registers with the mdev framework
  as a parent (physical) device. As a consequence, mdev provides
  vfio_ccw with a generic interface (sysfs) to create mdev devices. A
  vfio mdev can then be created by vfio_ccw and added to the mediated
  bus. It is this vfio device that is added to an IOMMU group and a vfio
  group.
  vfio_ccw also provides an I/O region to accept channel program
  requests from user space and to store I/O interrupt results for user
  space to retrieve. To notify user space of an I/O completion, it
  offers an interface to set up an eventfd for asynchronous signaling.

- The vfio_mdev driver for the mediated vfio ccw device.
  This is provided by the mdev framework. It is a vfio device driver for
  the mdev created by vfio_ccw.
  It implements a group of vfio device driver callbacks, adds itself to
  a vfio group, and registers itself with the mdev framework as an mdev
  driver.
  It uses a vfio iommu backend that uses the existing map and unmap
  ioctls. Rather than programming them into an IOMMU for a device, it
  simply stores the translations for use by later requests. This means
  that a device programmed in a VM with guest physical addresses can
  have the vfio kernel convert that address to a process virtual
  address, pin the page and program the hardware with the host physical
  address in one step.
  For an mdev, the vfio iommu backend does not pin the pages during the
  VFIO_IOMMU_MAP_DMA ioctl. Only a database of the iova<->vaddr mappings
  is maintained in this operation. The vfio_pin_pages and
  vfio_unpin_pages interfaces are exported from the vfio iommu backend
  so that the physical device drivers can pin and unpin pages on demand.
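
To make the "database of iova<->vaddr mappings" concrete, here is a
minimal sketch of the lookup such a database answers. Given a guest
physical address (iova), it finds the process virtual address recorded
by VFIO_IOMMU_MAP_DMA. The ``struct dma_mapping`` type and the
``iova_to_vaddr()`` helper are hypothetical. The real vfio iommu type1
backend uses its own data structures. Pinning (via vfio_pin_pages) would
only happen later, when a driver asks for it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one VFIO_IOMMU_MAP_DMA call: a range of guest
 * physical addresses (iova) backed by process virtual memory (vaddr). */
struct dma_mapping {
	uint64_t iova;
	uint64_t vaddr;
	uint64_t size;
};

/* Translate iova to the backing vaddr. Nothing is pinned here; a real
 * backend pins a page only when a driver requests it. */
bool iova_to_vaddr(const struct dma_mapping *maps, size_t n,
		   uint64_t iova, uint64_t *vaddr)
{
	for (size_t i = 0; i < n; i++) {
		/* Subtraction form avoids overflow at the top of the range. */
		if (iova >= maps[i].iova && iova - maps[i].iova < maps[i].size) {
			*vaddr = maps[i].vaddr + (iova - maps[i].iova);
			return true;
		}
	}
	return false;
}
```

A linear scan keeps the sketch short. A real backend with many mappings
would use a balanced tree or similar structure keyed by iova.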

Below is a high-level block diagram::
Dong Jia Shi | 25627ba | 2017-03-17 04:17:41 +0100 | [diff] [blame] | 147 | |
| 148 | +-------------+ |
| 149 | | | |
| 150 | | +---------+ | mdev_register_driver() +--------------+ |
| 151 | | | Mdev | +<-----------------------+ | |
| 152 | | | bus | | | vfio_mdev.ko | |
| 153 | | | driver | +----------------------->+ |<-> VFIO user |
| 154 | | +---------+ | probe()/remove() +--------------+ APIs |
| 155 | | | |
| 156 | | MDEV CORE | |
| 157 | | MODULE | |
| 158 | | mdev.ko | |
| 159 | | +---------+ | mdev_register_device() +--------------+ |
| 160 | | |Physical | +<-----------------------+ | |
| 161 | | | device | | | vfio_ccw.ko |<-> subchannel |
| 162 | | |interface| +----------------------->+ | device |
| 163 | | +---------+ | callback +--------------+ |
| 164 | +-------------+ |
| 165 | |
The process of how these work together:

1. vfio_ccw.ko drives the physical I/O subchannel and registers the
   physical device (with callbacks) with the mdev framework.
   When vfio_ccw probes the subchannel device, it registers the device
   pointer and callbacks with the mdev framework. Mdev-related file
   nodes are created under the device node in sysfs for the subchannel
   device, namely 'mdev_create', 'mdev_destroy' and
   'mdev_supported_types'.
2. Create a mediated vfio ccw device.
   Using the 'mdev_create' sysfs file, we manually create one (and, in
   our case, only one) mediated device.
3. vfio_mdev.ko drives the mediated ccw device.
   vfio_mdev is also the vfio device driver. It probes the mdev and
   adds it to an iommu_group and a vfio_group. Then we can pass the
   mdev through to a guest.

VFIO-CCW Regions
----------------

The vfio-ccw driver exposes MMIO regions to accept requests from and return
results to userspace.

vfio-ccw I/O region
-------------------

An I/O region is used to accept channel program requests from user
space and to store I/O interrupt results for user space to retrieve.
The definition of the region is::

  struct ccw_io_region {
  #define ORB_AREA_SIZE 12
      __u8    orb_area[ORB_AREA_SIZE];
  #define SCSW_AREA_SIZE 12
      __u8    scsw_area[SCSW_AREA_SIZE];
  #define IRB_AREA_SIZE 96
      __u8    irb_area[IRB_AREA_SIZE];
      __u32   ret_code;
  } __packed;

This region is always available.

When starting an I/O request, orb_area should be filled with the
guest ORB, and scsw_area should be filled with the SCSW of the virtual
subchannel.

irb_area stores the I/O result.

ret_code stores a return code for each access of the region. The following
values may occur:

``0``
  The operation was successful.

``-EOPNOTSUPP``
  The ORB specified transport mode or an unidentified IDAW format, or the
  SCSW specified a function other than the start function.

``-EIO``
  A request was issued while the device was not in a state ready to accept
  requests, or an internal error occurred.

``-EBUSY``
  The subchannel was status pending or busy, or a request is already active.

``-EAGAIN``
  A request was being processed, and the caller should retry.

``-EACCES``
  The channel path(s) used for the I/O were found to be not operational.

``-ENODEV``
  The device was found to be not operational.

``-EINVAL``
  The ORB specified a chain longer than 255 CCWs, or an internal error
  occurred.
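
Here is a sketch of how user space might prepare this region and
interpret ret_code. It mirrors the region layout with standard C types.
The packed size is 12 + 12 + 96 + 4 = 124 bytes. It fills the region for
a start request and reads ret_code, which holds 0 or a negative errno
value in an unsigned field. The helper names are hypothetical. The
transfer of the buffer itself is not shown; it is assumed to be a
pwrite() at the region offset reported by VFIO_DEVICE_GET_REGION_INFO.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of struct ccw_io_region (__u8/__u32/__packed spelled
 * out with standard types for a standalone build). */
#define ORB_AREA_SIZE 12
#define SCSW_AREA_SIZE 12
#define IRB_AREA_SIZE 96
struct ccw_io_region {
	uint8_t orb_area[ORB_AREA_SIZE];
	uint8_t scsw_area[SCSW_AREA_SIZE];
	uint8_t irb_area[IRB_AREA_SIZE];
	uint32_t ret_code;
} __attribute__((packed));

_Static_assert(sizeof(struct ccw_io_region) == 124, "unexpected region size");

/* Hypothetical helper: fill the region for a start request with the guest
 * ORB and the SCSW of the virtual subchannel; everything else is zeroed. */
void io_region_prepare(struct ccw_io_region *r,
		       const uint8_t orb[ORB_AREA_SIZE],
		       const uint8_t scsw[SCSW_AREA_SIZE])
{
	memset(r, 0, sizeof(*r));
	memcpy(r->orb_area, orb, ORB_AREA_SIZE);
	memcpy(r->scsw_area, scsw, SCSW_AREA_SIZE);
}

/* ret_code is unsigned in the region but carries 0 or a negative errno. */
int io_region_status(const struct ccw_io_region *r)
{
	return (int)(int32_t)r->ret_code;
}

/* Per the list above, only -EAGAIN asks the caller to simply retry. */
bool io_region_should_retry(const struct ccw_io_region *r)
{
	return io_region_status(r) == -EAGAIN;
}
```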

vfio-ccw cmd region
-------------------

The vfio-ccw cmd region is used to accept asynchronous instructions
from userspace::

  #define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
  #define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)
  struct ccw_cmd_region {
      __u32 command;
      __u32 ret_code;
  } __packed;

This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD.

Currently, CLEAR SUBCHANNEL and HALT SUBCHANNEL use this region.

command specifies the command to be issued; ret_code stores a return code
for each access of the region. The following values may occur:

``0``
  The operation was successful.

``-ENODEV``
  The device was found to be not operational.

``-EINVAL``
  A command other than halt or clear was specified.

``-EIO``
  A request was issued while the device was not in a state ready to accept
  requests.

``-EAGAIN``
  A request was being processed, and the caller should retry.

``-EBUSY``
  The subchannel was status pending or busy while processing a halt request.
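
The following sketch shows one way user space could issue CLEAR or HALT
SUBCHANNEL through this region. It rests on several assumptions, modeled
on QEMU's vfio-ccw code. The whole struct is written with pwrite() at the
region offset obtained from VFIO_DEVICE_GET_REGION_INFO. A non-zero
ret_code is assumed to surface as a failed write with errno set. Both
``ccw_issue_async_cmd()`` and the retry bound are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define VFIO_CCW_ASYNC_CMD_HSCH (1 << 0)
#define VFIO_CCW_ASYNC_CMD_CSCH (1 << 1)

/* Userspace mirror of struct ccw_cmd_region. */
struct ccw_cmd_region {
	uint32_t command;
	uint32_t ret_code;
} __attribute__((packed));

#define CMD_MAX_RETRIES 16  /* hypothetical bound on -EAGAIN retries */

/* Issue HALT or CLEAR SUBCHANNEL via the cmd region at region_offset in
 * device_fd. Returns 0 on success or a negative errno value. */
int ccw_issue_async_cmd(int device_fd, off_t region_offset, uint32_t command)
{
	struct ccw_cmd_region region;

	memset(&region, 0, sizeof(region));
	region.command = command;

	for (int i = 0; i < CMD_MAX_RETRIES; i++) {
		ssize_t n = pwrite(device_fd, &region, sizeof(region),
				   region_offset);
		if (n == (ssize_t)sizeof(region))
			return 0;
		if (n < 0 && errno == EAGAIN)
			continue;  /* a request was being processed: retry */
		return n < 0 ? -errno : -EIO;  /* short write: report as -EIO */
	}
	return -EAGAIN;
}
```

Bounding the retries keeps a caller from spinning forever if the device
never becomes ready. The limit itself is an arbitrary choice.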

vfio-ccw schib region
---------------------

The vfio-ccw schib region is used to return Subchannel-Information
Block (SCHIB) data to userspace::

  struct ccw_schib_region {
  #define SCHIB_AREA_SIZE 52
      __u8 schib_area[SCHIB_AREA_SIZE];
  } __packed;

This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_SCHIB.

Reading this region triggers a STORE SUBCHANNEL to be issued to the
associated hardware.
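
A consumer of this region might decode a few path-management control
word (PMCW) fields from the returned bytes. The sketch below is
hypothetical and assumes the architected SCHIB layout from the Principles
of Operation:

- The 28-byte PMCW comes first, and word 0 is the interruption parameter.
- In byte 5, the 0x80 bit is the enabled bit and the 0x01 bit is
  device-number-valid.
- Bytes 6-7 hold the device number (big-endian).
- Byte 11 is the path-installed mask.

Please verify these offsets against the architecture before relying on
them.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCHIB_AREA_SIZE 52

/* Hypothetical decoded view of a few PMCW fields. */
struct schib_info {
	bool enabled;      /* PMCW bit 8: subchannel enabled */
	bool devno_valid;  /* PMCW bit 15: device number valid */
	uint16_t devno;    /* PMCW bits 16-31: device number */
	uint8_t pim;       /* path-installed mask */
};

/* Decode from the raw bytes, so host endianness does not matter. */
void schib_decode(const uint8_t schib[SCHIB_AREA_SIZE], struct schib_info *out)
{
	out->enabled = schib[5] & 0x80;
	out->devno_valid = schib[5] & 0x01;
	out->devno = (uint16_t)((schib[6] << 8) | schib[7]);
	out->pim = schib[11];
}
```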

vfio-ccw crw region
-------------------

The vfio-ccw crw region is used to return Channel Report Word (CRW)
data to userspace::

  struct ccw_crw_region {
      __u32 crw;
      __u32 pad;
  } __packed;

This region is exposed via region type VFIO_REGION_SUBTYPE_CCW_CRW.

Reading this region returns a CRW if one that is relevant for this
subchannel (e.g. one reporting changes in channel path state) is
pending, or all zeroes if not. If multiple CRWs are pending (including
possibly chained CRWs), reading this region again will return the next
one, until no more CRWs are pending and zeroes are returned. This is
similar to how STORE CHANNEL REPORT WORD works.
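
The read-until-zero behaviour described above suggests a simple drain
loop. In this sketch, the reader callback stands in for a pread() of the
crw field at the region offset, which is assumed and not shown.
``crw_decode()`` splits a CRW into fields using the architected bit
layout, counting bit 0 as the most significant bit:

- bit 3: chaining
- bits 4-7: reporting-source code
- bits 10-15: error-recovery code
- bits 16-31: reporting-source ID

All names are hypothetical. The in-memory reader is only a stand-in,
e.g. for testing.

```c
#include <stddef.h>
#include <stdint.h>

/* A few CRW fields; bit numbers count from the most significant bit. */
struct crw_info {
	unsigned int chained;  /* bit 3: more CRWs belong to this report */
	unsigned int rsc;      /* bits 4-7: reporting-source code */
	unsigned int erc;      /* bits 10-15: error-recovery code */
	unsigned int rsid;     /* bits 16-31: reporting-source ID */
};

struct crw_info crw_decode(uint32_t crw)
{
	struct crw_info info = {
		.chained = (crw >> 28) & 0x1,
		.rsc = (crw >> 24) & 0xf,
		.erc = (crw >> 16) & 0x3f,
		.rsid = crw & 0xffff,
	};
	return info;
}

/* Reader callback: stores the next CRW word (0 if none is pending) in
 * *crw and returns 0, or returns -1 on error. */
typedef int (*crw_reader)(void *ctx, uint32_t *crw);

/* Read CRWs until the region reports all zeroes, decoding up to max of
 * them into out[]. Returns the number decoded, or -1 if a read failed. */
int crw_drain(crw_reader read_crw, void *ctx, struct crw_info *out, int max)
{
	int n = 0;
	uint32_t crw;

	while (n < max) {
		if (read_crw(ctx, &crw) < 0)
			return -1;
		if (crw == 0)
			break;
		out[n++] = crw_decode(crw);
	}
	return n;
}

/* In-memory stand-in for the region, e.g. for testing. */
struct crw_buf {
	const uint32_t *words;
	size_t n;
	size_t pos;
};

int crw_buf_read(void *ctx, uint32_t *crw)
{
	struct crw_buf *b = ctx;

	*crw = b->pos < b->n ? b->words[b->pos++] : 0;
	return 0;
}
```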

vfio-ccw operation details
--------------------------

vfio-ccw follows what vfio-pci did on the s390 platform and uses
vfio-iommu-type1 as the vfio iommu backend.

* CCW translation APIs
  A group of APIs (prefixed with ``cp_``) that perform CCW translation.
  The CCWs passed in by a user space program reference guest physical
  memory addresses. These APIs copy the CCWs into kernel space and
  assemble a runnable kernel channel program by replacing the guest
  physical addresses with their corresponding host physical addresses.
  Note that we have to use IDALs even for direct-access CCWs, as the
  referenced memory can be located anywhere, including above 2G.

* vfio_ccw device driver
  This driver utilizes the CCW translation APIs and introduces
  vfio_ccw, which is the driver for the I/O subchannel devices you want
  to pass through.
  vfio_ccw implements the following vfio ioctls::

    VFIO_DEVICE_GET_INFO
    VFIO_DEVICE_GET_IRQ_INFO
    VFIO_DEVICE_GET_REGION_INFO
    VFIO_DEVICE_RESET
    VFIO_DEVICE_SET_IRQS

  This provides an I/O region so that the user space program can pass a
  channel program to the kernel for further CCW translation before it is
  issued to a real device.
  It also provides the VFIO_DEVICE_SET_IRQS ioctl to set up an event
  notifier that notifies the user space program of I/O completion
  asynchronously.
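
An IDAL helps here because a guest buffer that is contiguous in guest
physical memory may be backed by scattered host pages. A single data
address in a CCW cannot describe such a buffer. The sketch below builds
an IDAL from a list of host page addresses. It assumes 64-bit (format-2)
IDAWs with 4K blocks, where only the first IDAW may start in the middle
of a block. ``idal_nr_words()`` and ``idal_build()`` are illustrative
helpers written for this sketch. The host page addresses are assumed to
have been obtained by pinning the guest pages beforehand.

```c
#include <stddef.h>
#include <stdint.h>

#define IDA_BLOCK 4096u  /* assumption: format-2 IDAWs with 4K blocks */

/* Number of IDAWs needed for len bytes starting at guest address gaddr. */
size_t idal_nr_words(uint64_t gaddr, size_t len)
{
	uint64_t off = gaddr & (IDA_BLOCK - 1);

	return (size_t)((off + len + IDA_BLOCK - 1) / IDA_BLOCK);
}

/* host_pages[i] is the host address of the i-th 4K page spanned by the
 * guest buffer; the pages need not be contiguous. Fills idal[] and
 * returns the number of IDAWs, or 0 if len is 0 or cap is too small. */
size_t idal_build(uint64_t *idal, size_t cap, uint64_t gaddr, size_t len,
		  const uint64_t *host_pages)
{
	size_t n = idal_nr_words(gaddr, len);

	if (n == 0 || n > cap)
		return 0;
	/* Only the first IDAW may start mid-block: keep the page offset. */
	idal[0] = host_pages[0] + (gaddr & (IDA_BLOCK - 1));
	/* Every further IDAW must address the start of a block. */
	for (size_t i = 1; i < n; i++)
		idal[i] = host_pages[i];
	return n;
}
```

For example, a 0x200-byte buffer at guest address 0x10f00 crosses a 4K
boundary and therefore needs two IDAWs.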

The use of vfio-ccw is not limited to QEMU, but QEMU is a good example
for understanding how these patches work. Here is a little more detail
on how an I/O request triggered by the QEMU guest is handled (without
error handling).

Explanation:

- Q1-Q7: QEMU side process.
- K1-K6: Kernel side process.

Q1.
  Get I/O region info during initialization.

Q2.
  Set up an event notifier and a handler to handle I/O completion.

... ...

Q3.
  Intercept an SSCH instruction.
Q4.
  Write the guest channel program and ORB to the I/O region.

K1.
  Copy from guest to kernel.
K2.
  Translate the guest channel program to a host kernel space
  channel program, which becomes runnable for a real device.
K3.
  With the necessary information contained in the ORB passed in
  by QEMU, issue the CCW chain to the device.
K4.
  Return the SSCH condition code (CC).
Q5.
  Return the CC to the guest.

... ...

K5.
  The interrupt handler gets the I/O result and writes it to
  the I/O region.
K6.
  Signal QEMU to retrieve the result.

Q6.
  On the signal, the event handler reads the result from the I/O
  region.
Q7.
  Update the IRB for the guest.
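
Steps Q2 and Q6 can be sketched on the user space side as follows. The
sketch assumes that the eventfd has already been registered for I/O
interrupts via VFIO_DEVICE_SET_IRQS. It also assumes that the device fd
and the I/O region offset were obtained in Q1 via
VFIO_DEVICE_GET_REGION_INFO. Neither step is shown, and
``handle_io_completion()`` is a hypothetical name. The function consumes
the eventfd counter (the signal from K6) and then reads the whole region
back, so irb_area holds the IRB to present to the guest in Q7.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Userspace mirror of struct ccw_io_region from the I/O region section. */
struct ccw_io_region {
	uint8_t orb_area[12];
	uint8_t scsw_area[12];
	uint8_t irb_area[96];
	uint32_t ret_code;
} __attribute__((packed));

/* Q6: consume the eventfd counter signalled by the kernel (K6), then read
 * the whole I/O region, whose irb_area carries the IRB for Q7.
 * Returns 0 on success, -1 if the eventfd read or region read fails. */
int handle_io_completion(int event_fd, int device_fd, off_t region_offset,
			 struct ccw_io_region *out)
{
	eventfd_t count;

	if (eventfd_read(event_fd, &count) < 0)
		return -1;
	if (pread(device_fd, out, sizeof(*out), region_offset) !=
	    (ssize_t)sizeof(*out))
		return -1;
	return 0;
}
```

In an event loop, event_fd would typically be opened with EFD_NONBLOCK
and watched with poll() or epoll, calling the handler whenever it
becomes readable.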

Limitations
-----------

The current vfio-ccw implementation focuses on supporting basic commands
needed to implement block device functionality (read/write) of DASD/ECKD
devices only. Some commands may need special handling in the future, for
example, anything related to path grouping.

DASD is a kind of storage device, while ECKD is a data recording format.
More information on DASD and ECKD can be found here:

- https://en.wikipedia.org/wiki/Direct-access_storage_device
- https://en.wikipedia.org/wiki/Count_key_data

Together with the corresponding work in QEMU, we can now bring a
passed-through DASD/ECKD device online in a guest and use it as a block
device.

The current code allows the guest to start channel programs via
START SUBCHANNEL, and to issue HALT SUBCHANNEL, CLEAR SUBCHANNEL,
and STORE SUBCHANNEL.

Currently, all channel programs are prefetched, regardless of the
p-bit setting in the ORB. As a result, self-modifying channel
programs are not supported. For this reason, IPL has to be handled as
a special case by a userspace/guest program; this has been implemented
in QEMU's s390-ccw bios as of QEMU 4.1.

vfio-ccw supports classic (command mode) channel I/O only. Transport
mode (HPF) is not supported.

QDIO subchannels are currently not supported. Classic devices other than
DASD/ECKD might work, but have not been tested.

Reference
---------

1. z/Architecture Principles of Operation manual (IBM Form. No. SA22-7832)
2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
3. https://en.wikipedia.org/wiki/Channel_I/O
4. Documentation/s390/cds.rst
5. Documentation/driver-api/vfio.rst
6. Documentation/driver-api/vfio-mediated-device.rst