======================
Firmware-Assisted Dump
======================

July 2011

The goal of firmware-assisted dump is to enable the dump of
a crashed system, and to do so from a fully-reset system, and
to minimize the total elapsed time until the system is back
in production use.

- Firmware-Assisted Dump (FADump) infrastructure is intended to replace
  the existing phyp assisted dump.
- FADump uses the same firmware interfaces and memory reservation model
  as phyp assisted dump.
- Unlike phyp dump, FADump exports the memory dump through /proc/vmcore
  in the ELF format in the same way as kdump. This helps us reuse the
  kdump infrastructure for dump capture and filtering.
- Unlike phyp dump, userspace tools do not need to refer to any sysfs
  interface while reading /proc/vmcore.
- Unlike phyp dump, FADump allows the user to release all the memory
  reserved for dump with a single operation:
  echo 1 > /sys/kernel/fadump_release_mem.
- Once enabled through the kernel boot parameter, FADump can be
  started/stopped through the /sys/kernel/fadump_registered interface (see
  the sysfs files section below) and can be easily integrated with kdump
  service start/stop init scripts.

Compared with kdump or other strategies, firmware-assisted
dump offers several strong, practical advantages:

- Unlike kdump, the system has been reset, and loaded
  with a fresh copy of the kernel. In particular,
  PCI and I/O devices have been reinitialized and are
  in a clean, consistent state.
- Once the dump is copied out, the memory that held the dump
  is immediately available to the running kernel. Therefore,
  unlike kdump, FADump doesn't need a 2nd reboot to get
  the system back to the production configuration.

The above can only be accomplished by coordination with,
and assistance from, the Power firmware. The procedure is
as follows:

- The first kernel registers the sections of memory with the
  Power firmware for dump preservation during OS initialization.
  These registered sections of memory are reserved by the first
  kernel during early boot.

- When the system crashes, the Power firmware will copy the registered
  low memory regions (boot memory) from source to destination area.
  It will also save hardware PTEs.

  NOTE:
        The term 'boot memory' means the size of the low memory chunk
        that is required for a kernel to boot successfully when
        booted with restricted memory. By default, the boot memory
        size will be the larger of 5% of system RAM or 256MB.
        Alternatively, the user can also specify the boot memory size
        through the boot parameter 'crashkernel=', which will override
        the default calculated size. Use this option if the default
        boot memory size is not sufficient for the second kernel to
        boot successfully. For the syntax of the crashkernel= parameter,
        refer to Documentation/admin-guide/kdump/kdump.rst. If any
        offset is provided in the crashkernel= parameter, it will be
        ignored, as FADump uses a predefined offset to reserve memory
        for boot memory dump preservation in case of a crash.

- After the low memory (boot memory) area has been saved, the
  firmware will reset PCI and other hardware state. It will
  *not* clear the RAM. It will then launch the bootloader, as
  normal.

- The freshly booted kernel will notice that there is a new node
  (rtas/ibm,kernel-dump on pSeries or ibm,opal/dump/mpipl-boot
  on OPAL platform) in the device tree, indicating that
  there is crash data available from a previous boot. During
  early boot, the OS will reserve the rest of the memory above
  boot memory size, effectively booting with a restricted memory
  size. This will make sure that this kernel (also referred
  to as the second kernel or capture kernel) will not touch any
  of the dump memory area.

- User-space tools will read /proc/vmcore to obtain the contents
  of memory, which holds the previous crashed kernel dump in ELF
  format. The userspace tools may copy this info to disk, or
  network, NAS, SAN, iSCSI, etc. as desired.

- Once the userspace tool is done saving the dump, it will echo
  '1' to /sys/kernel/fadump_release_mem to release the reserved
  memory back to general use, except the memory required for
  the next firmware-assisted dump registration.

  e.g.::

    # echo 1 > /sys/kernel/fadump_release_mem

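The default boot memory sizing rule from the NOTE above can be written out as a short calculation. This is a minimal Python sketch of the stated rule only (the larger of 5% of system RAM or 256MB, overridden by a 'crashkernel=' size when one is given); the function and parameter names are illustrative, and any alignment or rounding the kernel may apply is not modelled here.

```python
MIB = 1024 * 1024


def boot_memory_size(system_ram_bytes, crashkernel_size=None):
    """Return the boot memory size in bytes under the rule quoted above.

    crashkernel_size is the size given through 'crashkernel=' (already
    converted to bytes by the caller); when present it overrides the
    default calculation.
    """
    if crashkernel_size is not None:
        return crashkernel_size
    five_percent = system_ram_bytes * 5 // 100
    return max(five_percent, 256 * MIB)
```

For example, a system with 4GB of RAM falls back to the 256MB floor, because 5% of 4GB is smaller than 256MB.
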
Please note that the firmware-assisted dump feature
is only available on POWER6 and above systems on the pSeries
(PowerVM) platform and POWER9 and above systems with OP940
or later firmware versions on the PowerNV (OPAL) platform.
Note that OPAL firmware exports the ibm,opal/dump node when
FADump is supported on the PowerNV platform.

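The device tree nodes named above can be probed from userspace. The following is a minimal Python sketch, assuming the device tree is exposed as a directory tree under /proc/device-tree (an assumption about the running system, not something this document states). The node names are the ones quoted in the text; the function names are illustrative.

```python
import os

# Nodes that indicate crash data from a previous boot, relative to the
# device-tree root (names taken from the procedure described above).
PENDING_DUMP_NODES = (
    os.path.join("rtas", "ibm,kernel-dump"),         # pSeries
    os.path.join("ibm,opal", "dump", "mpipl-boot"),  # OPAL
)


def crash_data_pending(dt_root="/proc/device-tree"):
    """Return True if either 'previous crash' node is present."""
    return any(
        os.path.exists(os.path.join(dt_root, node))
        for node in PENDING_DUMP_NODES
    )


def opal_fadump_supported(dt_root="/proc/device-tree"):
    """Return True if OPAL firmware exports the ibm,opal/dump node."""
    return os.path.exists(os.path.join(dt_root, "ibm,opal", "dump"))
```

The dt_root parameter exists so the check can be exercised against a scratch directory instead of a live system.
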
On OPAL based machines, the system first boots into an intermediate
kernel (referred to as the petitboot kernel) before booting into the
capture kernel. This kernel would have minimal kernel and/or
userspace support to process crash data. Such a kernel needs to
preserve the previously crashed kernel's memory so that the subsequent
capture kernel boot can process this crash data. The kernel config
option CONFIG_PRESERVE_FA_DUMP has to be enabled on such a kernel
to ensure that crash data is preserved for later processing.

- On OPAL based machines (PowerNV), if the kernel is built with
  CONFIG_OPAL_CORE=y, OPAL memory at the time of crash is also
  exported as the /sys/firmware/opal/mpipl/core file. This sysfs file is
  helpful in debugging OPAL crashes with GDB. The kernel memory
  used for exporting this sysfs file can be released by echoing
  '1' to the /sys/firmware/opal/mpipl/release_core node.

  e.g.::

    # echo 1 > /sys/firmware/opal/mpipl/release_core

Implementation details:
-----------------------

During boot, a check is made to see if firmware supports
this feature on that particular machine. If it does, then
we check to see if an active dump is waiting for us. If yes,
then everything but boot memory size of RAM is reserved during
early boot (see Fig. 2). This area is released once the userland
scripts (e.g. kdump scripts) that are run have finished collecting
the dump. If there is dump data, then the
/sys/kernel/fadump_release_mem file is created, and the reserved
memory is held.

If there is no waiting dump data, then only the memory required to
hold CPU state, HPTE region, boot memory dump, FADump header and
elfcore header is reserved, usually at an offset greater than boot
memory size (see Fig. 1). This area is *not* released: this region
will be kept permanently reserved, so that it can act as a receptacle
for a copy of the boot memory content in addition to CPU state and
HPTE region, in case a crash does occur.

Since this reserved memory area is used only after a system crash,
there is no point in blocking this significant chunk of memory from
the production kernel. Hence, the implementation uses the Linux kernel's
Contiguous Memory Allocator (CMA) for memory reservation if CMA is
configured for the kernel. With CMA reservation, this memory will be
available for applications to use, while the kernel is prevented from
using it. With this, FADump will still be able to capture all of the
kernel memory and most of the user space memory, except the user pages
that were present in the CMA region::

  o Memory Reservation during first kernel

  Low memory                                             Top of memory
  0    boot memory size   |<--- Reserved dump area --->|       |
  |           |           |    Permanent Reservation   |       |
  V           V           |                            |       V
  +-----------+-----/ /---+---+----+-------+-----+-----+----+--+
  |           |           |///|////|  DUMP | HDR | ELF |////|  |
  +-----------+-----/ /---+---+----+-------+-----+-----+----+--+
        |                   ^    ^     ^      ^           ^
        |                   |    |     |      |           |
        \                  CPU HPTE    /      |           |
         ------------------------------       |           |
      Boot memory content gets transferred    |           |
      to reserved area by firmware at the     |           |
      time of crash.                          |           |
                                        FADump Header     |
                                         (meta area)      |
                                                          |
                                                          |
                      Metadata: This area holds a metadata structure whose
                      address is registered with f/w and retrieved in the
                      second kernel after crash, on platforms that support
                      tags (OPAL). Having such structure with info needed
                      to process the crashdump eases dump capture process.

                   Fig. 1


  o Memory Reservation during second kernel after crash

  Low memory                                             Top of memory
  0      boot memory size                                      |
  |           |<------------ Crash preserved area ------------>|
  V           V           |<--- Reserved dump area --->|       |
  +-----------+-----/ /---+---+----+-------+-----+-----+----+--+
  |           |           |///|////|  DUMP | HDR | ELF |////|  |
  +-----------+-----/ /---+---+----+-------+-----+-----+----+--+
        |                                           |
        V                                           V
   Used by second                             /proc/vmcore
   kernel to boot

        +---+
        |///| -> Regions (CPU, HPTE & Metadata) marked like this in the above
        +---+    figures are not always present. For example, OPAL platform
                 does not have CPU & HPTE regions while Metadata region is
                 not supported on pSeries currently.

                   Fig. 2


Currently the dump will be copied from /proc/vmcore to a new file upon
user intervention. The dump data available through /proc/vmcore will be
in ELF format. Hence the existing kdump infrastructure (kdump scripts)
to save the dump works fine with minor modifications. The kdump scripts on
major distro releases have already been modified to work seamlessly (no
user intervention in saving the dump) when FADump is used, instead of
kdump, as the dump mechanism.

The tools to examine the dump will be the same as the ones
used for kdump.

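The capture step described above (read /proc/vmcore, save it, then release the reserved memory) can be sketched in Python as follows. The destination path and the function name are hypothetical, and real kdump scripts typically also filter and compress the dump, which this sketch does not attempt.

```python
import os
import shutil


def capture_dump(vmcore="/proc/vmcore",
                 dest="/var/crash/vmcore",  # hypothetical destination
                 release_file="/sys/kernel/fadump_release_mem",
                 chunk_size=1 << 20):
    """Copy the dump to dest, then release the reserved memory.

    Returns the number of bytes saved. The release step runs only after
    the copy has completed without raising.
    """
    with open(vmcore, "rb") as src, open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    # Equivalent of: echo 1 > /sys/kernel/fadump_release_mem
    with open(release_file, "w") as f:
        f.write("1\n")
    return os.path.getsize(dest)
```

Releasing only after a successful copy matters because the dump memory is no longer preserved once it has been handed back to the kernel.
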
How to enable firmware-assisted dump (FADump):
----------------------------------------------

1. Set config option CONFIG_FA_DUMP=y and build the kernel.
2. Boot into the Linux kernel with the 'fadump=on' kernel cmdline option.
   By default, FADump reserved memory will be initialized as a CMA area.
   Alternatively, the user can boot the Linux kernel with 'fadump=nocma'
   to prevent FADump from using CMA.
3. Optionally, the user can also set the 'crashkernel=' kernel cmdline
   option to specify the size of the memory to reserve for boot memory
   dump preservation.

NOTE:
     1. The 'fadump_reserve_mem=' parameter has been deprecated. Instead,
        use 'crashkernel=' to specify the size of the memory to reserve
        for boot memory dump preservation.
     2. If firmware-assisted dump fails to reserve memory, then it
        will fall back to the existing kdump mechanism if the
        'crashkernel=' option is set on the kernel cmdline.
     3. If the user wants to capture all of user space memory and is OK
        with the reserved memory not being available to the production
        system, then the 'fadump=nocma' kernel parameter can be used to
        fall back to the old behaviour.

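A script that needs to know how a system was booted can inspect the kernel command line for the options listed above. The following Python sketch is illustrative only: it recognises just the two 'fadump=' values this document mentions ('on' and 'nocma'), treats any other value as disabled (an assumption of the sketch), and returns the 'crashkernel=' value as an unparsed string.

```python
def parse_fadump_cmdline(cmdline):
    """Summarise the FADump-related options in a kernel command line."""
    opts = {"fadump": None, "crashkernel": None}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in opts:
            opts[key] = value  # in this sketch the last occurrence wins
    mode = opts["fadump"]
    return {
        "enabled": mode in ("on", "nocma"),
        "use_cma": mode == "on",  # CMA is the default with fadump=on
        "crashkernel": opts["crashkernel"],
    }
```

On a running system the input would typically come from reading /proc/cmdline.
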
Sysfs/debugfs files:
--------------------

The firmware-assisted dump feature uses the sysfs file system to hold
the control files and a debugfs file to display the reserved memory
regions.

Here is the list of files under kernel sysfs:

 /sys/kernel/fadump_enabled
    This is used to display the FADump status.

    - 0 = FADump is disabled
    - 1 = FADump is enabled

    This interface can be used by kdump init scripts to identify if
    FADump is enabled in the kernel and act accordingly.

 /sys/kernel/fadump_registered
    This is used to display the FADump registration status as well
    as to control (start/stop) the FADump registration.

    - 0 = FADump is not registered.
    - 1 = FADump is registered and ready to handle system crash.

    To register FADump, echo 1 > /sys/kernel/fadump_registered; to
    un-register and stop FADump, echo 0 > /sys/kernel/fadump_registered.
    Once FADump is un-registered, a system crash will not
    be handled and vmcore will not be captured. This interface can be
    easily integrated with kdump service start/stop.

 /sys/kernel/fadump/mem_reserved

    This is used to display the memory reserved by FADump for saving the
    crash dump.

 /sys/kernel/fadump_release_mem
    This file is available only when FADump is active during the
    second kernel. This is used to release the reserved memory
    regions that are held for saving the crash dump. To release the
    reserved memory, echo 1 to it::

        echo 1 > /sys/kernel/fadump_release_mem

    After echo 1, the content of the /sys/kernel/debug/powerpc/fadump_region
    file will change to reflect the new memory reservations.

    The existing userspace tools (kdump infrastructure) can be easily
    enhanced to use this interface to release the memory reserved for
    dump and continue without a 2nd reboot.

Note: /sys/kernel/fadump_release_opalcore sysfs has moved to
      /sys/firmware/opal/mpipl/release_core

 /sys/firmware/opal/mpipl/release_core

    This file is available only on OPAL based machines when FADump is
    active during the capture kernel. This is used to release the memory
    used by the kernel to export the /sys/firmware/opal/mpipl/core file. To
    release this memory, echo '1' to it::

        echo 1 > /sys/firmware/opal/mpipl/release_core

Note: The following FADump sysfs files are deprecated.

+----------------------------------+--------------------------------+
| Deprecated                       | Alternative                    |
+----------------------------------+--------------------------------+
| /sys/kernel/fadump_enabled       | /sys/kernel/fadump/enabled     |
+----------------------------------+--------------------------------+
| /sys/kernel/fadump_registered    | /sys/kernel/fadump/registered  |
+----------------------------------+--------------------------------+
| /sys/kernel/fadump_release_mem   | /sys/kernel/fadump/release_mem |
+----------------------------------+--------------------------------+

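Because both the deprecated and the alternative file names may be encountered depending on kernel version, a script can try the alternative first and fall back to the deprecated name. The Python sketch below shows one way to do this; the helper names are illustrative, and the sys_kernel parameter is only there so the logic can be tried against a scratch directory.

```python
import os

# Deprecated name -> alternative, as listed in the table above.
ALTERNATIVES = {
    "fadump_enabled": "fadump/enabled",
    "fadump_registered": "fadump/registered",
    "fadump_release_mem": "fadump/release_mem",
}


def resolve_fadump_file(name, sys_kernel="/sys/kernel"):
    """Return the path to use for a deprecated file name, or None."""
    for rel in (ALTERNATIVES[name], name):
        path = os.path.join(sys_kernel, rel)
        if os.path.exists(path):
            return path
    return None


def read_flag(name, sys_kernel="/sys/kernel"):
    """Return True/False for a 1/0 status file, or None if it is absent."""
    path = resolve_fadump_file(name, sys_kernel)
    if path is None:
        return None
    with open(path) as f:
        return f.read().strip() == "1"


def set_registered(register, sys_kernel="/sys/kernel"):
    """Write 1 (register) or 0 (un-register) to the registration file."""
    path = resolve_fadump_file("fadump_registered", sys_kernel)
    if path is None:
        raise FileNotFoundError("FADump registration file not found")
    with open(path, "w") as f:
        f.write("1" if register else "0")
```

A kdump-style service could call read_flag("fadump_enabled") at start-up and then set_registered(True) or set_registered(False) from its start and stop actions.
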
Here is the list of files under powerpc debugfs:
(Assuming debugfs is mounted on /sys/kernel/debug directory.)

 /sys/kernel/debug/powerpc/fadump_region
    This file shows the reserved memory regions if FADump is
    enabled; otherwise this file is empty. The output format
    is::

      <region>: [<start>-<end>] <reserved-size> bytes, Dumped: <dump-size>

    and for the kernel DUMP region it is::

      DUMP: Src: <src-addr>, Dest: <dest-addr>, Size: <size>, Dumped: # bytes

    e.g.
    Contents when FADump is registered during first kernel::

      # cat /sys/kernel/debug/powerpc/fadump_region
      CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x0
      HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x0
      DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x0

    Contents when FADump is active during second kernel::

      # cat /sys/kernel/debug/powerpc/fadump_region
      CPU : [0x0000006ffb0000-0x0000006fff001f] 0x40020 bytes, Dumped: 0x40020
      HPTE: [0x0000006fff0020-0x0000006fff101f] 0x1000 bytes, Dumped: 0x1000
      DUMP: [0x0000006fff1020-0x0000007fff101f] 0x10000000 bytes, Dumped: 0x10000000
          : [0x00000010000000-0x0000006ffaffff] 0x5ffb0000 bytes, Dumped: 0x5ffb0000


NOTE:
      Please refer to Documentation/filesystems/debugfs.rst on
      how to mount the debugfs filesystem.


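The fadump_region contents shown above follow a regular enough pattern to parse mechanically. The Python sketch below handles the bracketed ``<region>: [<start>-<end>] ... Dumped: ...`` form that appears in the example output, including the last line with an empty region name. It does not handle the separate ``Src:``/``Dest:`` form, and lines that do not match are skipped. The function name and the dictionary keys are illustrative.

```python
import re

REGION_RE = re.compile(
    r"^\s*(?P<name>[A-Za-z]*)\s*:\s*"
    r"\[(?P<start>0x[0-9a-fA-F]+)-(?P<end>0x[0-9a-fA-F]+)\]\s+"
    r"(?P<size>0x[0-9a-fA-F]+) bytes, Dumped: (?P<dumped>0x[0-9a-fA-F]+)\s*$"
)


def parse_fadump_region(text):
    """Parse fadump_region output into a list of dicts with int fields."""
    regions = []
    for line in text.splitlines():
        m = REGION_RE.match(line)
        if m is None:
            continue  # e.g. the shell prompt line or a blank line
        regions.append({
            "name": m.group("name"),
            "start": int(m.group("start"), 16),
            "end": int(m.group("end"), 16),
            "size": int(m.group("size"), 16),
            "dumped": int(m.group("dumped"), 16),
        })
    return regions
```

In the example output, each region's size equals end - start + 1, which gives a quick sanity check on parsed values.
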
TODO:
-----
- Need to come up with a better approach to find a more
  accurate boot memory size that is required for a kernel to
  boot successfully when booted with restricted memory.
- The FADump implementation introduces a FADump crash info structure
  in the scratch area before the ELF core header. The idea of introducing
  this structure is to pass some important crash info data to the second
  kernel, which will help the second kernel to populate the ELF core
  header with correct data before it gets exported through /proc/vmcore.
  The current design implementation does not address the possibility of
  introducing additional fields (in future) to this structure without
  affecting compatibility. Need to come up with a better approach to
  address this.

  The possible approaches are:

  1. Introduce a version field for version tracking, and bump up the
     version whenever a new field is added to the structure in future.
     The version field can be used to find out what fields are valid for
     the current version of the structure.
  2. Reserve an area of predefined size (say PAGE_SIZE) for this
     structure and keep the unused area reserved (initialized to zero)
     for future field additions.

  The advantage of approach 1 over 2 is that we don't need to reserve
  extra space.

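Approach 1 above can be illustrated with a small, purely hypothetical model. The Python sketch below is not the kernel's crash info structure: the field names, widths and byte order are invented for illustration. It only shows how a leading version field lets a reader decide which fields are valid.

```python
import struct

VERSION_FMT = ">I"  # hypothetical: 32-bit big-endian version field

# Hypothetical per-version layouts; version 2 appends one field.
LAYOUTS = {
    1: (">Q", ("elfcorehdr_addr",)),
    2: (">QI", ("elfcorehdr_addr", "crashing_cpu")),
}


def pack_info(version, **fields):
    """Serialise the fields that are valid for the given version."""
    fmt, names = LAYOUTS[version]
    body = struct.pack(fmt, *(fields[name] for name in names))
    return struct.pack(VERSION_FMT, version) + body


def unpack_info(blob):
    """Read the version first, then only the fields that version defines."""
    (version,) = struct.unpack_from(VERSION_FMT, blob, 0)
    if version not in LAYOUTS:
        raise ValueError("unsupported structure version: %d" % version)
    fmt, names = LAYOUTS[version]
    values = struct.unpack_from(fmt, blob, struct.calcsize(VERSION_FMT))
    return version, dict(zip(names, values))
```

A reader built for version 2 can still consume a version 1 blob, since it consults the version before touching any later field. No space is reserved up front, which is the advantage noted for approach 1.
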
Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

This document is based on the original documentation written for phyp
assisted dump by Linas Vepstas and Manish Ahuja.