=================================================
Linux API for read access to z/VM Monitor Records
=================================================

Date  : 2004-Nov-26

Author: Gerald Schaefer (geraldsc@de.ibm.com)


Description
===========
This item delivers a new Linux API in the form of a misc char device that is
usable from user space and allows read access to the z/VM Monitor Records
collected by the `*MONITOR` System Service of z/VM.


User Requirements
=================
The z/VM guest on which you want to access this API needs to be configured to
allow IUCV connections to the `*MONITOR` service, i.e. it needs the
IUCV `*MONITOR` statement in its user entry. If the monitor DCSS to be used is
restricted (likely), you also need the NAMESAVE <DCSS NAME> statement.
This item uses the IUCV device driver to access the z/VM services, so you
need a kernel with IUCV support. You also need z/VM version 4.4 or 5.1.

There are two options for making the monitor DCSS loadable (the examples assume
that the monitor DCSS begins at 144MB and ends at 152MB). You can query the
location of the monitor DCSS with the Class E privileged CP command Q NSS MAP
(the values BEGPAG and ENDPAG are given in units of 4K pages).
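
Because BEGPAG and ENDPAG are page numbers, you need a little arithmetic to turn them into the byte addresses used in the examples. The following C sketch assumes the page numbers have already been parsed into integers (the exact Q NSS MAP output format is not covered here) and that ENDPAG names the last page of the segment, inclusive. The function names are hypothetical.

```c
#include <stdint.h>

#define DCSS_PAGE_SIZE 4096ULL /* BEGPAG/ENDPAG are counted in 4K pages */

/* Byte address of the first byte of the DCSS. */
static inline uint64_t dcss_begin_addr(uint64_t begpag)
{
	return begpag * DCSS_PAGE_SIZE;
}

/* Byte address of the last byte of the DCSS (ENDPAG assumed inclusive). */
static inline uint64_t dcss_end_addr(uint64_t endpag)
{
	return (endpag + 1) * DCSS_PAGE_SIZE - 1;
}

/* Total DCSS size in bytes. */
static inline uint64_t dcss_size(uint64_t begpag, uint64_t endpag)
{
	return (endpag - begpag + 1) * DCSS_PAGE_SIZE;
}
```

Under these assumptions, the example DCSS (144MB to 152MB) has BEGPAG 36864 (0x9000) and ENDPAG 38911 (0x97FF), which gives a size of 8MB.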

See also "CP Command and Utility Reference" (SC24-6081-00) for more information
on the DEF STOR and Q NSS MAP commands, as well as "Saved Segments Planning
and Administration" (SC24-6116-00) for more information on DCSSes.

1st option:
-----------
You can use the CP command DEF STOR CONFIG to define a "memory hole" in your
guest virtual storage around the address range of the DCSS.

Example::

	DEF STOR CONFIG 0.140M 200M.200M

This defines two blocks of storage: the first is 140MB in size and begins at
address 0MB, the second is 200MB in size and begins at address 200MB,
resulting in a total storage of 340MB. Note that the first block should
always start at 0 and be at least 64MB in size.
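
The constraints for this option can be checked mechanically. The C sketch below tests whether a two-block DEF STOR CONFIG layout leaves the DCSS inside the hole between the blocks, and whether the first block starts at 0 and is at least 64MB in size. The struct and function names are hypothetical. All values are in MB, and the DCSS end is treated as an exclusive bound.

```c
#include <stdbool.h>

/* One DEF STOR CONFIG block, e.g. "200M.200M" = start 200, size 200 (MB). */
struct stor_block {
	unsigned long start_mb;
	unsigned long size_mb;
};

/* dcss_begin_mb/dcss_end_mb: DCSS address range in MB, end exclusive. */
bool hole_fits_dcss(struct stor_block first, struct stor_block second,
		    unsigned long dcss_begin_mb, unsigned long dcss_end_mb)
{
	/* The first block must start at 0 and be at least 64MB. */
	if (first.start_mb != 0 || first.size_mb < 64)
		return false;

	/* The hole spans [end of first block, start of second block). */
	unsigned long hole_begin = first.start_mb + first.size_mb;
	unsigned long hole_end = second.start_mb;

	return hole_begin <= dcss_begin_mb && dcss_end_mb <= hole_end;
}

unsigned long total_storage_mb(struct stor_block first, struct stor_block second)
{
	return first.size_mb + second.size_mb;
}
```

For the example above, the blocks {0, 140} and {200, 200} leave a hole from 140MB to 200MB. That hole contains the DCSS at 144MB to 152MB, and the total storage is 340MB.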

2nd option:
-----------
Your guest virtual storage has to end below the starting address of the DCSS,
and you have to specify the "mem=" kernel parameter in your parmfile with a
value greater than the ending address of the DCSS.

Example::

	DEF STOR 140M

This defines a storage size of 140MB for your guest; in addition, the parameter
"mem=160M" is added to the parmfile.
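
The two conditions of this option can be expressed as a single check. This C sketch uses a hypothetical function name and takes all values in MB. It assumes the guest storage covers [0, def_stor_mb), so storage ending exactly at the DCSS start still counts as "below" it.

```c
#include <stdbool.h>

/* 2nd option: guest storage must end below the DCSS start, and the
 * "mem=" value must be greater than the DCSS end (all values in MB). */
bool mem_param_ok(unsigned long def_stor_mb, unsigned long mem_mb,
		  unsigned long dcss_begin_mb, unsigned long dcss_end_mb)
{
	return def_stor_mb <= dcss_begin_mb && mem_mb > dcss_end_mb;
}
```

With the example values, DEF STOR 140M and mem=160M satisfy both conditions for a DCSS at 144MB to 152MB.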


User Interface
==============
The char device is implemented as a kernel module named "monreader",
which can be loaded via the modprobe command, or it can be compiled into the
kernel instead. There is one optional module (or kernel) parameter, "mondcss",
to specify the name of the monitor DCSS. If the module is compiled into the
kernel, the kernel parameter "monreader.mondcss=<DCSS NAME>" can be specified
in the parmfile.
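
If a script or tool generates this parameter, a small helper can produce either form. The C sketch below emits "mondcss=NAME" for modprobe or "monreader.mondcss=NAME" for a built-in driver. The function name is hypothetical. The 8-character limit on the DCSS name is an assumption based on z/VM saved segment naming and is not stated in this document.

```c
#include <stdio.h>
#include <string.h>

/* Returns 0 on success, -1 if the name is empty, longer than the
 * assumed 8-character limit, or does not fit into buf. */
int format_mondcss_param(char *buf, size_t len, const char *name, int builtin)
{
	size_t n = strlen(name);

	if (n == 0 || n > 8)
		return -1;

	int w = snprintf(buf, len, "%smondcss=%s",
			 builtin ? "monreader." : "", name);

	return (w < 0 || (size_t)w >= len) ? -1 : 0;
}
```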

The default name for the DCSS is "MONDCSS" if none is specified. If other
users are already connected to the `*MONITOR` service (e.g. Performance
Toolkit), the monitor DCSS is already defined and you have to use the same
DCSS. The CP command Q MONITOR (Class E privileged) shows the name of the
monitor DCSS, if already defined, and the users connected to the `*MONITOR`
service.
Refer to the "z/VM Performance" book (SC24-6109-00) for how to create a monitor
DCSS if your z/VM doesn't have one already. You need Class E privileges to
define and save a DCSS.

Example:
--------

::

	modprobe monreader mondcss=MYDCSS

This loads the module and sets the DCSS name to "MYDCSS".

NOTE:
-----
This API provides no interface to control the `*MONITOR` service, e.g. to
specify which data should be collected. This can be done with the CP command
MONITOR (Class E privileged); see "CP Command and Utility Reference".

Device nodes with udev:
-----------------------
After loading the module, a char device will be created along with the device
node /<udev directory>/monreader.

Device nodes without udev:
--------------------------
If your distribution does not support udev, a device node will not be created
automatically and you have to create it manually after loading the module.
Therefore you need to know the major and minor numbers of the device. These
numbers can be found in /sys/class/misc/monreader/dev.

Typing ``cat /sys/class/misc/monreader/dev`` will give an output of the form
<major>:<minor>. The device node can be created with the mknod command: enter
``mknod <name> c <major> <minor>``, where <name> is the name of the device node
to be created.

Example:
--------

::

	# modprobe monreader
	# cat /sys/class/misc/monreader/dev
	10:63
	# mknod /dev/monreader c 10 63

This loads the module with the default monitor DCSS (MONDCSS) and creates a
device node.

File operations:
----------------
The following file operations are supported: open, release, read, poll.
There are two alternative methods for reading: either non-blocking read in
conjunction with polling, or blocking read without polling. IOCTLs are not
supported.
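
For the non-blocking method, a common pattern is to wait with poll(2) and then read. The C sketch below assumes the device was opened with O_RDONLY | O_NONBLOCK. The function name is hypothetical, and reporting a poll timeout as EAGAIN is a choice made for this sketch.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait until fd is readable, then read once. Returns the byte count
 * (0 marks the end of a data set) or -1 with errno set. A timeout is
 * reported as EAGAIN. A spurious EAGAIN from read() restarts the wait,
 * so the total wait can exceed timeout_ms. */
ssize_t poll_and_read(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	for (;;) {
		int r = poll(&pfd, 1, timeout_ms);

		if (r < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (r == 0) {
			errno = EAGAIN; /* timed out, no data yet */
			return -1;
		}
		ssize_t n = read(fd, buf, len);
		if (n < 0 && errno == EAGAIN)
			continue; /* nothing there after all, poll again */
		return n;
	}
}
```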

Read:
-----
Reading from the device provides a 12-byte monitor control element (MCE),
followed by a set of one or more contiguous monitor records (similar to the
output of the CMS utility MONWRITE without the 4K control blocks). The MCE
contains information on the type of the following record set (sample/event
data), the monitor domains contained within it and the start and end address
of the record set in the monitor DCSS. The start and end address can be used
to determine the size of the record set; the end address is the address of the
last byte of data. The start address is needed to handle "end-of-frame" records
(domain 1, record 13) correctly, i.e. it can be used to determine the record
start offset relative to a 4K page (frame) boundary.
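
The size and frame-offset calculations described above are simple arithmetic. The C sketch below decodes the start and end address from a 12-byte MCE and derives both values. The field offsets it uses are an assumption based on the s390 monreader driver: start address in bytes 4 to 7 and end address in bytes 8 to 11, both big-endian 32-bit values. The authoritative layout is in Appendix A of the "z/VM Performance" book. The function names are hypothetical.

```c
#include <stdint.h>

#define MCE_SIZE   12
#define FRAME_SIZE 4096u

/* Decode a big-endian 32-bit value (s390 is big-endian). */
static uint32_t be32(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* ASSUMED layout: bytes 4..7 = start address, bytes 8..11 = end address. */
uint32_t mce_start(const unsigned char mce[MCE_SIZE])
{
	return be32(mce + 4);
}

uint32_t mce_end(const unsigned char mce[MCE_SIZE])
{
	return be32(mce + 8);
}

/* The end address is the last byte of data, so add 1 for the size. */
uint32_t record_set_size(const unsigned char mce[MCE_SIZE])
{
	return mce_end(mce) - mce_start(mce) + 1;
}

/* Offset of the record set start within its 4K frame. */
uint32_t frame_offset(const unsigned char mce[MCE_SIZE])
{
	return mce_start(mce) % FRAME_SIZE;
}
```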

See "Appendix A: `*MONITOR`" in the "z/VM Performance" document for a description
of the monitor control element layout. The layout of the monitor records can
be found here (z/VM 5.1): https://www.vm.ibm.com/pubs/mon510/index.html

The layout of the data stream provided by the monreader device is as follows::

	...
	<0 byte read>
	<first MCE>              \
	<first set of records>    |
	...                       |- data set
	<last MCE>                |
	<last set of records>    /
	<0 byte read>
	...

There may be more than one combination of MCE and corresponding record set
within one data set, and the end of each data set is indicated by a successful
read with a return value of 0 (0 byte read).
Any received data must be considered invalid until a complete set has been
read successfully, including the closing 0 byte read. Therefore you should
always read the complete set into a buffer before processing the data.
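
Here is a minimal C sketch of the blocking method. It reads into a heap buffer that doubles as needed until the closing 0-byte read arrives, and only then hands the complete data set to the caller. The function name and the 4096-byte starting size are illustrative choices. The fd is assumed to be opened without O_NONBLOCK.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Read one complete data set from fd. Returns its length and stores a
 * malloc'ed buffer in *out (caller frees), or returns -1 with errno set.
 * On any error this sketch simply discards the partial data, although
 * after EOVERFLOW it would still be valid. */
ssize_t read_data_set(int fd, char **out)
{
	size_t cap = 4096, len = 0;
	char *buf = malloc(cap);

	if (!buf)
		return -1;
	for (;;) {
		if (len == cap) { /* grow before the next read */
			char *tmp = realloc(buf, cap * 2);
			if (!tmp) {
				free(buf);
				return -1;
			}
			buf = tmp;
			cap *= 2;
		}
		ssize_t n = read(fd, buf + len, cap - len);
		if (n == 0) { /* closing 0 byte read: set is complete */
			*out = buf;
			return (ssize_t)len;
		}
		if (n < 0) {
			if (errno == EINTR)
				continue;
			int saved = errno;
			free(buf);
			errno = saved;
			return -1;
		}
		len += (size_t)n;
	}
}
```

Call it once per data set, then walk the MCEs and record sets in the returned buffer.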

The maximum size of a data set can be as large as the size of the
monitor DCSS, so size your buffer accordingly or use dynamic memory allocation.
The size of the monitor DCSS will be printed into syslog after loading the
module. You can also use the (Class E privileged) CP command Q NSS MAP to
list all available segments and information about them.

As with most char devices, error conditions are indicated by returning a
negative value for the number of bytes read. In this case, the errno variable
indicates the error condition:

EIO:
	reply failed, read data is invalid and the application
	should discard the data read since the last successful read with 0 size.
EFAULT:
	copy_to_user failed, read data is invalid and the application should
	discard the data read since the last successful read with 0 size.
EAGAIN:
	occurs on a non-blocking read if there is no data available at the
	moment. There is no data missing or corrupted; just try again, or
	preferably use polling for non-blocking reads.
EOVERFLOW:
	message limit reached, the data read since the last successful
	read with 0 size is valid but subsequent records may be missing.

In the last case (EOVERFLOW) there may be missing data; in the first two cases
(EIO, EFAULT) there will be missing data. It is up to the application whether
it continues reading subsequent data or exits.
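
The error handling rules above can be written as a small decision function. This C sketch uses hypothetical enum and function names. How to react to MON_UNKNOWN, and whether to keep reading after data loss, remain application decisions.

```c
#include <errno.h>

/* What to do with the data read since the last 0 byte read. */
enum mon_action {
	MON_RETRY,     /* EAGAIN: nothing lost, poll or read again */
	MON_KEEP_DATA, /* EOVERFLOW: data so far valid, later records may be missing */
	MON_DISCARD,   /* EIO, EFAULT: data since last 0 byte read is invalid */
	MON_UNKNOWN    /* any other errno: not covered by the list above */
};

enum mon_action classify_read_error(int err)
{
	switch (err) {
	case EAGAIN:
		return MON_RETRY;
	case EOVERFLOW:
		return MON_KEEP_DATA;
	case EIO:
	case EFAULT:
		return MON_DISCARD;
	default:
		return MON_UNKNOWN;
	}
}
```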

Open:
-----
Only one user is allowed to open the char device. If it is already in use, the
open function will fail (return a negative value) and set errno to EBUSY.
The open function may also fail if an IUCV connection to the `*MONITOR` service
cannot be established. In this case errno will be set to EIO and an error
message with an IPUSER SEVER code will be printed into syslog. The IPUSER SEVER
codes are described in the "z/VM Performance" book, Appendix A.
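
The following C sketch opens the device and reports the two documented failure modes (EBUSY and EIO) with a hint. The path, function names and message texts are illustrative. The nonblock flag selects the poll-based reading method.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Short explanation for an open() failure on the monreader device. */
const char *monreader_open_hint(int err)
{
	switch (err) {
	case EBUSY:
		return "device already opened by another user";
	case EIO:
		return "IUCV connection to *MONITOR failed, see syslog for the IPUSER SEVER code";
	default:
		return strerror(err);
	}
}

int open_monreader(const char *path, int nonblock)
{
	int fd = open(path, O_RDONLY | (nonblock ? O_NONBLOCK : 0));

	if (fd < 0) {
		int saved = errno;
		fprintf(stderr, "%s: %s\n", path, monreader_open_hint(saved));
		errno = saved; /* keep errno for the caller */
	}
	return fd;
}
```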

NOTE:
-----
As soon as the device is opened, incoming messages will be accepted and they
will count towards the message limit, i.e. opening the device without reading
from it will eventually provoke the "message limit reached" error (EOVERFLOW
error code).