.. SPDX-License-Identifier: GPL-2.0
.. Kenneth Lee, aa017ab9, 2020-02-11 15:54:22 +0800

Introduction of Uacce
---------------------

Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so that an accelerator can access any data structure of the main CPU.
This differs from the data sharing between a CPU and an I/O device, which
share only data content rather than addresses.
Because of the unified address, the hardware and the user space of a process
can share the same virtual address in their communication.
Uacce treats the hardware accelerator as a heterogeneous processor, while
the IOMMU shares the same CPU page tables and, as a result, the same
translation from VA to PA.

::

         __________________________       __________________________
        |                          |     |                          |
        |  User application (CPU)  |     |   Hardware Accelerator   |
        |__________________________|     |__________________________|

                     |                                 |
                     | va                              | va
                     V                                 V
                 __________                        __________
                |          |                      |          |
                |   MMU    |                      |  IOMMU   |
                |__________|                      |__________|
                     |                                 |
                     |                                 |
                     V pa                              V pa
                 _______________________________________
                |                                       |
                |              Memory                   |
                |_______________________________________|


Architecture
------------

Uacce is the kernel module, taking charge of IOMMU and address sharing.
The user drivers and libraries are called WarpDrive.

The uacce device, built around the IOMMU SVA API, can access multiple
address spaces, including the one without PASID.

A virtual concept, the queue, is used for communication. It provides a
FIFO-like interface and maintains a unified address space between the
application and all involved hardware.

::

                         ___________________                  ________________
                        |                   |   user API     |                |
                        | WarpDrive library | ------------>  |  user driver   |
                        |___________________|                |________________|
                                       |                              |
                                       |                              |
                                       | queue fd                     |
                                       |                              |
                                       |                              |
                                       v                              |
         ___________________       _________                          |
        |                   |     |         |                         | mmap memory
        | Other framework   |     |  uacce  |                         | r/w interface
        | crypto/nic/others |     |_________|                         |
        |___________________|                                         |
               |                       |                              |
               | register              | register                     |
               |                       |                              |
               |                       |                              |
               |               _________________       __________     |
               |              |                 |     |          |    |
                ------------- |  Device Driver  |     |  IOMMU   |    |
                              |_________________|     |__________|    |
                                       |                              |
                                       |                              V
                                       |                     ___________________
                                       |                    |                   |
                                       -------------------- | Device(Hardware)  |
                                                            |___________________|


How does it work
----------------

Uacce uses mmap and IOMMU to play the trick.

Uacce creates a chrdev for every device registered to it. A new queue is
created when a user application opens the chrdev. The file descriptor is used
as the user handle of the queue.
The accelerator device presents itself as a Uacce object, which is exported
as a chrdev to user space. The user application communicates with the
hardware by ioctl (as the control path) or shared memory (as the data path).

The control path to the hardware is via file operations, while the data path
is via the mmap space of the queue fd.
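
The open-then-control flow described above can be sketched from user space
as follows. This is a minimal illustration, not the WarpDrive implementation:
the helper names are hypothetical, the device path is supplied by the caller,
and the ``UACCE_CMD_START_Q`` value is an assumption that should be taken from
the uacce uapi header rather than from this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed value; prefer the definition from the uacce uapi header. */
#ifndef UACCE_CMD_START_Q
#define UACCE_CMD_START_Q _IO('W', 0)
#endif

/*
 * Open the uacce chrdev. Opening creates a new queue, and the returned
 * file descriptor is the user handle of that queue.
 * Returns the fd on success or a negative errno value on failure.
 */
int uacce_queue_open(const char *dev_path)
{
	int fd;

	assert(dev_path != NULL);
	fd = open(dev_path, O_RDWR);
	if (fd < 0)
		return -errno;
	return fd;
}

/*
 * Control path: ask the driver to start the queue through ioctl.
 * Returns 0 on success or a negative errno value on failure.
 */
int uacce_queue_start(int queue_fd)
{
	if (ioctl(queue_fd, UACCE_CMD_START_Q) < 0)
		return -errno;
	return 0;
}
```

A caller would pass the chrdev path of the accelerator it selected, keep the
returned fd for later mmap calls, and close the fd to release the queue.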

The queue file address space:

::

    /**
     * enum uacce_qfrt: qfrt type
     * @UACCE_QFRT_MMIO: device mmio region
     * @UACCE_QFRT_DUS: device user share region
     */
    enum uacce_qfrt {
            UACCE_QFRT_MMIO = 0,
            UACCE_QFRT_DUS = 1,
    };

All regions are optional and differ from one device type to another.
Each region can be mmapped only once; otherwise -EEXIST is returned.

The device MMIO region is mapped to the hardware MMIO space. It is generally
used for doorbells or other notifications to the hardware. It is not fast
enough to serve as a data channel.

The device user share region is used for sharing data buffers between the
user process and the device.
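
As a rough sketch of how a user driver might map one of these regions: the
example below assumes that the region type is selected through the mmap page
offset (region type multiplied by the page size). That convention is an
assumption about the driver and is not stated in this document, so verify it
against the kernel source. The helper names are hypothetical, and the region
size must come from the device, for example from its sysfs attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Copied from the enum quoted above. */
enum uacce_qfrt {
	UACCE_QFRT_MMIO = 0,
	UACCE_QFRT_DUS = 1,
};

/* Assumption: the region type is encoded as the mmap page offset. */
off_t uacce_region_offset(enum uacce_qfrt type)
{
	long page = sysconf(_SC_PAGESIZE);

	assert(page > 0);
	return (off_t)type * (off_t)page;
}

/*
 * Map one region of the queue fd.
 * Returns MAP_FAILED on error with errno set, for example to EEXIST
 * when the region has already been mapped once.
 */
void *uacce_region_map(int queue_fd, enum uacce_qfrt type, size_t size)
{
	return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		    queue_fd, uacce_region_offset(type));
}
```

With this layout, the MMIO mapping would typically be used to ring doorbells,
while the DUS mapping carries the shared data buffers.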


The Uacce register API
----------------------

The register API is defined in uacce.h.

::

    struct uacce_interface {
            char name[UACCE_MAX_NAME_SIZE];
            unsigned int flags;
            const struct uacce_ops *ops;
    };

According to the IOMMU capability, uacce_interface flags can be:

::

    /**
     * UACCE Device flags:
     * UACCE_DEV_SVA: Shared Virtual Addresses
     *                Support PASID
     *                Support device page faults (PCI PRI or SMMU Stall)
     */
    #define UACCE_DEV_SVA               BIT(0)

    struct uacce_device *uacce_alloc(struct device *parent,
                                     struct uacce_interface *interface);
    int uacce_register(struct uacce_device *uacce);
    void uacce_remove(struct uacce_device *uacce);

The result of uacce_alloc can be:

a. If the uacce module is not compiled, ERR_PTR(-ENODEV)

b. Success with the desired flags

c. Success with the negotiated flags, for example when SVA was requested but
   is not available:

  uacce_interface.flags = UACCE_DEV_SVA but uacce->flags = ~UACCE_DEV_SVA

So the driver using uacce needs to check the return value as well as the
negotiated uacce->flags.
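
The flag check can be illustrated with a small stand-alone helper. This is
only a sketch of the comparison, not kernel code: ``UACCE_DEV_SVA`` is
redefined locally as a stand-in for the kernel's ``BIT(0)``, and the helper
names are hypothetical. In a real driver the two arguments would be
``uacce_interface.flags`` and the ``uacce->flags`` of the returned device.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the kernel definition BIT(0). */
#define UACCE_DEV_SVA (1u << 0)

/* True only if SVA was requested and is still set after negotiation. */
bool uacce_sva_granted(unsigned int requested, unsigned int negotiated)
{
	return (requested & UACCE_DEV_SVA) && (negotiated & UACCE_DEV_SVA);
}

/* Usage: the outcomes a driver has to tell apart. */
void uacce_sva_granted_demo(void)
{
	/* Case b: success with the desired flags. */
	assert(uacce_sva_granted(UACCE_DEV_SVA, UACCE_DEV_SVA));
	/* Case c: SVA was requested but dropped during negotiation. */
	assert(!uacce_sva_granted(UACCE_DEV_SVA, 0));
	/* SVA was never requested. */
	assert(!uacce_sva_granted(0, 0));
}
```

A driver that gets ``false`` here has to fall back to a mode that does not
rely on shared virtual addresses, or fail its probe.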


The user driver
---------------

The queue file mmap space needs a user driver to wrap the communication
protocol. Uacce provides some attributes in sysfs for the user driver to
match the right accelerator accordingly.
More details are in Documentation/ABI/testing/sysfs-driver-uacce.
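
A user driver could read those attributes with a helper along these lines.
This is a hedged sketch: the helper name is hypothetical, and the directory
and attribute names a caller passes in (for instance a per-device directory
under ``/sys/class/uacce/`` and an attribute such as ``api``) are assumptions
that must be checked against the ABI file named above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read one sysfs attribute into buf as a NUL-terminated string with the
 * trailing newline removed. Returns 0 on success and -1 on failure.
 */
int uacce_read_attr(const char *dev_dir, const char *attr,
		    char *buf, size_t len)
{
	char path[512];
	FILE *f;
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(path, sizeof(path), "%s/%s", dev_dir, attr);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Cut the string at the first newline, if there is one. */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

The returned string can then be compared against the interface the user
driver implements before it opens the matching chrdev.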