Lianbo Jiang | f263245 | 2019-01-10 20:19:43 +0800
================================================================
VMCOREINFO
================================================================

===========
What is it?
===========

VMCOREINFO is a special ELF note section. It contains various
information from the kernel like structure size, page size, symbol
values, field offsets, etc. These data are packed into an ELF note
section and used by user-space tools like crash and makedumpfile to
analyze a kernel's memory layout.
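
As an illustration of how a user-space tool might consume this data, the
sketch below parses the note payload, assuming it is plain text with one
KEY=VALUE entry per line. The helper name parse_vmcoreinfo and all sample
values are hypothetical and not taken from a real dump.

```python
def parse_vmcoreinfo(text):
    """Split VMCOREINFO text into a dict of raw string values."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        info[key] = value
    return info


# Hypothetical sample; real values differ per kernel build.
SAMPLE = """OSRELEASE=5.0.0-example
PAGESIZE=4096
SYMBOL(swapper_pg_dir)=ffffffff8220e000
OFFSET(page.flags)=0
SIZE(page)=64
"""

info = parse_vmcoreinfo(SAMPLE)
page_size = int(info["PAGESIZE"])                          # assumed decimal
swapper_pg_dir = int(info["SYMBOL(swapper_pg_dir)"], 16)   # assumed hex
```

Values are kept as strings because their base differs per entry type; the
caller converts each one as needed.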

================
Common variables
================

init_uts_ns.name.release
------------------------

The version of the Linux kernel. Used to find the corresponding source
code from which the kernel has been built. For example, crash uses it to
find the corresponding vmlinux in order to process vmcore.

PAGE_SIZE
---------

The size of a page. It is the smallest unit of data used by the memory
management facilities. It is usually 4096 bytes in size, and pages are
aligned on 4096-byte boundaries. Used for computing page addresses.
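
A minimal sketch of the page-address arithmetic this value enables,
assuming a power-of-two page size. The function names are illustrative.

```python
PAGE_SIZE = 4096  # assumed; read the real value from VMCOREINFO


def page_shift(page_size=PAGE_SIZE):
    """log2 of the page size (12 for 4096-byte pages)."""
    return page_size.bit_length() - 1


def page_base(addr, page_size=PAGE_SIZE):
    """Round an address down to the start of its page."""
    return addr & ~(page_size - 1)


def page_offset(addr, page_size=PAGE_SIZE):
    """Byte offset of an address within its page."""
    return addr & (page_size - 1)
```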

init_uts_ns
-----------

The UTS namespace which is used to isolate two specific elements of the
system that relate to the uname(2) system call. It is named after the
data structure used to store information returned by the uname(2) system
call.

User-space tools can get the kernel name, host name, kernel release
number, kernel version, architecture name and OS type from it.

node_online_map
---------------

An array node_states[N_ONLINE] which represents the set of online nodes
in a system, one bit position per node number. Used to keep track of
which nodes are in the system and online.
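
The "one bit position per node number" layout can be decoded as in the
sketch below. It assumes the bitmap bytes were read from a little-endian
dump; the function name and the sample bytes are hypothetical.

```python
def online_nodes(mask_bytes):
    """Return the node numbers whose bit is set in the bitmap."""
    bits = int.from_bytes(mask_bytes, "little")  # assumes little-endian dump
    return [n for n in range(len(mask_bytes) * 8) if (bits >> n) & 1]


# Hypothetical 8-byte mask with nodes 0 and 2 online.
sample_mask = bytes([0b00000101]) + bytes(7)
nodes = online_nodes(sample_mask)
```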

swapper_pg_dir
--------------

The global page directory pointer of the kernel. Used to translate
virtual to physical addresses.

_stext
------

Defines the beginning of the text section. In general, _stext indicates
the kernel start address. Used to convert a virtual address from the
direct kernel map to a physical address.

vmap_area_list
--------------

Stores the virtual area list. makedumpfile gets the vmalloc start value
from this variable and its value is necessary for vmalloc translation.

mem_map
-------

Physical addresses are translated to struct pages by treating them as
an index into the mem_map array. Right-shifting a physical address
PAGE_SHIFT bits converts it into a page frame number which is an index
into that mem_map array.

Used to map an address to the corresponding struct page.
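
The translation described above can be sketched as follows. This assumes
a flat mem_map whose first entry corresponds to page frame number 0; the
function name and all addresses are hypothetical.

```python
PAGE_SHIFT = 12  # assumed: log2 of a 4096-byte PAGE_SIZE


def phys_to_page(paddr, mem_map, sizeof_page, page_shift=PAGE_SHIFT):
    """Return the virtual address of the struct page for paddr.

    mem_map is the value of the mem_map symbol and sizeof_page is the
    structure size exported for page.
    """
    pfn = paddr >> page_shift            # page frame number
    return mem_map + pfn * sizeof_page   # address of mem_map[pfn]


# Hypothetical values for illustration only.
page_addr = phys_to_page(0x3000, 0xFFFF000000000000, 64)
```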

contig_page_data
----------------

Makedumpfile gets the pglist_data structure from this symbol, which is
used to describe the memory layout.

User-space tools use this to exclude free pages when dumping memory.

mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
--------------------------------------------------------------------------

The address of the mem_section array, its length, structure size, and
the section_mem_map offset.

It exists in the sparse memory mapping model. Like the mem_map variable,
it is used to translate an address.

page
----

The size of a page structure. struct page is an important data structure
and it is widely used to compute contiguous memory.

pglist_data
-----------

The size of a pglist_data structure. This value is used to check if the
pglist_data structure is valid. It is also used for checking the memory
type.

zone
----

The size of a zone structure. This value is used to check if the zone
structure has been found. It is also used for excluding free pages.

free_area
---------

The size of a free_area structure. It indicates whether the free_area
structure is valid or not. Useful when excluding free pages.

list_head
---------

The size of a list_head structure. Used when iterating lists in a
post-mortem analysis session.

nodemask_t
----------

The size of a nodemask_t type. Used to compute the number of online
nodes.

(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|
compound_order|compound_head)
-------------------------------------------------------------------

User-space tools compute their values based on the offset of these
variables. The variables are used when excluding unnecessary pages.

(pglist_data, node_zones|nr_zones|node_mem_map|node_start_pfn|
node_spanned_pages|node_id)
-------------------------------------------------------------------

On NUMA machines, each NUMA node has a pg_data_t to describe its memory
layout. On UMA machines there is a single pglist_data which describes the
whole memory.

These values are used to check the memory type and to compute the
virtual address for the memory map.

(zone, free_area|vm_stat|spanned_pages)
---------------------------------------

Each node is divided into a number of blocks called zones which
represent ranges within memory. A zone is described by a struct zone.

User-space tools compute required values based on the offset of these
variables.

(free_area, free_list)
----------------------

Offset of the free_list member. This value is used to compute the number
of free pages.

Each zone has a free_area structure array called free_area[MAX_ORDER].
The free_list represents a linked list of free page blocks.

(list_head, next|prev)
----------------------

Offsets of the list_head's members. list_head is used to define a
circular linked list. User-space tools need these in order to traverse
lists.
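
A minimal sketch of such a traversal over a memory image. The read_ptr
callback, the toy memory dict and the offset value are hypothetical
stand-ins for a real dump reader and the exported next offset.

```python
def walk_list(read_ptr, head, next_off, limit=1_000_000):
    """Yield the address of every node after head until the list wraps.

    read_ptr(addr) must return the pointer stored at addr in the dump.
    """
    node = read_ptr(head + next_off)
    seen = 0
    while node != head:
        yield node
        node = read_ptr(node + next_off)
        seen += 1
        if seen > limit:  # guard against a corrupted, non-circular list
            raise RuntimeError("list does not return to head")


# Toy memory image: address -> pointer value stored there.
NEXT_OFF = 0  # assumed offset of list_head.next
memory = {0x1000: 0x2000, 0x2000: 0x3000, 0x3000: 0x1000}
nodes = list(walk_list(memory.__getitem__, 0x1000, NEXT_OFF))
```

The head itself is not yielded, because in a circular list it acts as the
anchor rather than as an entry.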

(vmap_area, va_start|list)
--------------------------

Offsets of the vmap_area's members. They carry vmalloc-specific
information. Makedumpfile gets the start address of the vmalloc region
from this.

(zone.free_area, MAX_ORDER)
---------------------------

Free areas descriptor. User-space tools use this value to iterate the
free_area ranges. MAX_ORDER is used by the zone buddy allocator.

log_first_idx
-------------

Index of the first record stored in the buffer log_buf. Used by
user-space tools to read the strings in the log_buf.

log_buf
-------

Console output is written to the ring buffer log_buf; the oldest record
still stored is at index log_first_idx. Used to get the kernel log.

log_buf_len
-----------

log_buf's length.

clear_idx
---------

The index of the next printk() record to read after the last clear
command. It indicates the first record after the last
SYSLOG_ACTION_CLEAR, such as one issued by 'dmesg -c'. Used by
user-space tools to dump the dmesg log.

log_next_idx
------------

The index of the next record to store in the buffer log_buf. Used to
compute the index of the current buffer position.

printk_log
----------

The size of a printk_log structure. Used to compute the size of
messages and to extract the dmesg log. The structure encapsulates header
information for log_buf, such as timestamp, syslog level, etc.

(printk_log, ts_nsec|len|text_len|dict_len)
-------------------------------------------

The field offsets in struct printk_log. User-space tools parse them and
check whether the values of printk_log's members have changed.
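
The log variables above fit together roughly as in this sketch, which
walks records from log_first_idx to log_next_idx. The structure size, the
field offsets, the little-endian encoding and the helper make_record are
assumptions for illustration; a real tool would use the exported values.

```python
import struct

# Assumed sample values; take the real ones from VMCOREINFO.
SIZE_PRINTK_LOG = 16
OFF_TS_NSEC, OFF_LEN, OFF_TEXT_LEN = 0, 8, 10


def read_log(log_buf, first_idx, next_idx):
    """Return (ts_nsec, text) tuples from first_idx up to next_idx."""
    records = []
    idx = first_idx
    while idx != next_idx and idx + SIZE_PRINTK_LOG <= len(log_buf):
        (rec_len,) = struct.unpack_from("<H", log_buf, idx + OFF_LEN)
        if rec_len == 0:
            if idx == 0:
                break      # empty or corrupted buffer
            idx = 0        # zero length marks a wrap to the buffer start
            continue
        (ts,) = struct.unpack_from("<Q", log_buf, idx + OFF_TS_NSEC)
        (text_len,) = struct.unpack_from("<H", log_buf, idx + OFF_TEXT_LEN)
        start = idx + SIZE_PRINTK_LOG
        text = bytes(log_buf[start:start + text_len]).decode("ascii", "replace")
        records.append((ts, text))
        idx += rec_len
    return records


def make_record(ts, text):
    """Build one fake record (header plus text) for the demo below."""
    body = text.encode("ascii")
    rec_len = (SIZE_PRINTK_LOG + len(body) + 3) & ~3  # pad to 4 bytes
    header = struct.pack("<QHHHBB", ts, rec_len, len(body), 0, 0, 0)
    return (header + body).ljust(rec_len, b"\0")


buf = make_record(1000, "hello") + make_record(2000, "world!")
records = read_log(buf, 0, len(buf))
```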

(free_area.free_list, MIGRATE_TYPES)
------------------------------------

The number of migrate types for pages. The free_list is an array with
one entry per migrate type. Used by tools to compute the number of free
pages.

NR_FREE_PAGES
-------------

On Linux 2.6.21 or later, the number of free pages is in
vm_stat[NR_FREE_PAGES]. Used to get the number of free pages.

PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoison
|PG_head_mask|PAGE_BUDDY_MAPCOUNT_VALUE(~PG_buddy)
|PAGE_OFFLINE_MAPCOUNT_VALUE(~PG_offline)
-----------------------------------------------------------------

Page attributes. These flags are used to filter out various pages that
are unnecessary for dumping.
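
A sketch of how such attributes might be tested, assuming the PG_*
entries are bit numbers within page.flags and that buddy pages are
recognised by comparing _mapcount against the exported value. The
constants below are hypothetical samples, not values from a real kernel.

```python
PG_LRU = 5                         # assumed bit number for PG_lru
PAGE_BUDDY_MAPCOUNT_VALUE = -129   # assumed sample value


def flag_set(page_flags, bit_number):
    """True if the given bit number is set in page.flags."""
    return bool((page_flags >> bit_number) & 1)


def is_buddy_page(mapcount):
    """True if _mapcount marks the page as free in the buddy allocator."""
    return mapcount == PAGE_BUDDY_MAPCOUNT_VALUE
```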

HUGETLB_PAGE_DTOR
-----------------

The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile
excludes these pages.

======
x86_64
======

phys_base
---------

Used to convert the virtual address of an exported kernel symbol to its
corresponding physical address.
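
A minimal sketch of that conversion for addresses in the x86_64 kernel
text mapping, assuming the usual mapping base of 0xffffffff80000000. The
function name and sample values are illustrative.

```python
START_KERNEL_MAP = 0xFFFFFFFF80000000  # assumed x86_64 kernel text mapping base


def symbol_to_phys(vaddr, phys_base):
    """Translate a kernel text mapping address to a physical address."""
    if vaddr < START_KERNEL_MAP:
        raise ValueError("not a kernel text mapping address")
    return vaddr - START_KERNEL_MAP + phys_base


# Hypothetical symbol address and phys_base.
paddr = symbol_to_phys(0xFFFFFFFF81000000, 0x1000000)
```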

init_top_pgt
------------

Used to walk through the whole page table and convert virtual addresses
to physical addresses. The init_top_pgt is somewhat similar to
swapper_pg_dir, but it is only used in x86_64.

pgtable_l5_enabled
------------------

User-space tools need to know whether the crash kernel was in 5-level
paging mode.

node_data
---------

This is a struct pglist_data array and stores information about all
NUMA nodes. Makedumpfile gets the pglist_data structure from it.

(node_data, MAX_NUMNODES)
-------------------------

The maximum number of nodes in the system.

KERNELOFFSET
------------

The kernel randomization offset. Used to compute the page offset. If
KASLR is disabled, this value is zero.
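
One way a tool might apply the offset is to shift link-time symbol
addresses from vmlinux to their runtime locations, as sketched below. The
hexadecimal encoding of the value and the sample addresses are
assumptions.

```python
def relocate(static_addr, kernel_offset_hex):
    """Shift a link-time symbol address by the randomization offset."""
    return static_addr + int(kernel_offset_hex, 16)  # assumed hex text


# Hypothetical link-time address and offset.
runtime_addr = relocate(0xFFFFFFFF81000000, "1a000000")
unchanged = relocate(0xFFFFFFFF81000000, "0")  # KASLR disabled
```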

KERNEL_IMAGE_SIZE
-----------------

Currently unused by makedumpfile. Used by crash to compute the module
virtual address.

sme_mask
--------

AMD-specific with SME support: it indicates the secure memory encryption
mask. Makedumpfile tools need to know whether the crash kernel was
encrypted. If SME is enabled in the first kernel, the crash kernel's
page table entries (pgd/pud/pmd/pte) contain the memory encryption
mask. This is used to remove the SME mask and obtain the true physical
address.

Currently, sme_mask stores the value of the C-bit position. If needed,
additional SME-relevant info can be placed in that variable.

For example:
[ misc ][ enc bit ][ other misc SME info ]
0000_0000_0000_0000_1000_0000_0000_0000_0000_0000_..._0000
63   59   55   51   47   43   39   35   31   27   ... 3
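
Removing the mask from a page table entry can be sketched as follows,
using the bit-47 position from the example above. The address mask
covering bits 12 to 51 and the sample entry are assumptions.

```python
PTE_ADDR_MASK = 0x000FFFFFFFFFF000  # assumed: address bits 12..51


def pte_to_phys(entry, sme_mask):
    """Clear the encryption bit, then keep only the address bits."""
    return entry & ~sme_mask & PTE_ADDR_MASK


sme_mask = 1 << 47                     # C-bit at position 47
entry = sme_mask | 0x12345000 | 0x63   # encrypted entry with low flag bits
paddr = pte_to_phys(entry, sme_mask)
```

The encryption bit lies inside the address field, which is why it has to
be cleared explicitly before the entry is used as a physical address.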

======
x86_32
======

X86_PAE
-------

Denotes whether physical address extensions are enabled. It has the cost
of a higher page table lookup overhead, and also consumes more page
table space per process. Used to check whether PAE was enabled in the
crash kernel when converting virtual addresses to physical addresses.

====
ia64
====

pgdat_list|(pgdat_list, MAX_NUMNODES)
-------------------------------------

pg_data_t array storing information about all NUMA nodes. MAX_NUMNODES
indicates the number of nodes.

node_memblk|(node_memblk, NR_NODE_MEMBLKS)
------------------------------------------

List of node memory chunks. Filled when parsing the SRAT table to obtain
information about memory nodes. NR_NODE_MEMBLKS indicates the number of
node memory chunks.

These values are used to compute the number of nodes the crashed kernel
used.

node_memblk_s|(node_memblk_s, start_paddr)|(node_memblk_s, size)
----------------------------------------------------------------

The size of a struct node_memblk_s and the offsets of the
node_memblk_s's members. Used to compute the number of nodes.

PGTABLE_3|PGTABLE_4
-------------------

User-space tools need to know whether the crash kernel was in 3-level or
4-level paging mode. Used to distinguish the page table.

=====
ARM64
=====

VA_BITS
-------

The maximum number of bits for virtual addresses. Used to compute the
virtual memory ranges.

kimage_voffset
--------------

The offset between the kernel virtual and physical mappings. Used to
translate virtual to physical addresses.

PHYS_OFFSET
-----------

Indicates the physical address of the start of memory. Like
kimage_voffset, it is used to translate virtual to physical
addresses.
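
The two translations can be sketched as below. The start of the linear
map (page_offset) depends on VA_BITS and the kernel version, so it is
passed in by the caller here; the function names and all sample values
are hypothetical.

```python
def kimg_to_phys(vaddr, kimage_voffset):
    """Translate a kernel image virtual address to a physical address."""
    return vaddr - kimage_voffset


def linear_to_phys(vaddr, page_offset, phys_offset):
    """Translate a linear map virtual address to a physical address.

    page_offset is the assumed start of the linear map; phys_offset is
    the physical address of the start of memory.
    """
    return (vaddr - page_offset) + phys_offset


# Hypothetical values for illustration only.
image_paddr = kimg_to_phys(0xFFFF000008080000, 0xFFFEFFFFC8000000)
linear_paddr = linear_to_phys(0xFFFF800000001000, 0xFFFF800000000000, 0x40000000)
```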

KERNELOFFSET
------------

The kernel randomization offset. Used to compute the page offset. If
KASLR is disabled, this value is zero.

===
arm
===

ARM_LPAE
--------

It indicates whether the crash kernel supports large physical address
extensions. Used to translate virtual to physical addresses.

====
s390
====

lowcore_ptr
-----------

An array with a pointer to the lowcore of every CPU. Used to print the
PSW and all register information.

high_memory
-----------

Used to get the vmalloc_start address from the high_memory symbol.

(lowcore_ptr, NR_CPUS)
----------------------

The maximum number of CPUs.

=======
powerpc
=======

node_data|(node_data, MAX_NUMNODES)
-----------------------------------

See above.

contig_page_data
----------------

See above.

vmemmap_list
------------

The vmemmap_list maintains the entire vmemmap physical mapping. Used
to get vmemmap list count and populated vmemmap regions info. If the
vmemmap address translation information is stored in the crash kernel,
it is used to translate vmemmap kernel virtual addresses.

mmu_vmemmap_psize
-----------------

The size of a page. Used to translate virtual to physical addresses.

mmu_psize_defs
--------------

Page size definitions, e.g. 4k, 64k, or 16M.

Used to make vtop translations.

vmemmap_backing|(vmemmap_backing, list)|(vmemmap_backing, phys)|
(vmemmap_backing, virt_addr)
----------------------------------------------------------------

The vmemmap virtual address space management does not have a traditional
page table to track which virtual struct pages are backed by a physical
mapping. The virtual to physical mappings are tracked in a simple linked
list format.

User-space tools need to know the offset of list, phys and virt_addr
when computing the count of vmemmap regions.
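
A sketch of a lookup over such a list, assuming each entry backs one
region of a fixed size (psize, for example derived from the shift of the
page size definition selected by mmu_vmemmap_psize). The Backing tuple
and all addresses are hypothetical.

```python
from collections import namedtuple

# One decoded vmemmap_backing entry (fields read via the exported offsets).
Backing = namedtuple("Backing", "virt_addr phys")


def vmemmap_to_phys(vaddr, backings, psize):
    """Return the physical address backing vaddr, or None if unmapped."""
    for b in backings:
        if b.virt_addr <= vaddr < b.virt_addr + psize:
            return b.phys + (vaddr - b.virt_addr)
    return None


# Hypothetical list with two 16M regions.
PSIZE = 1 << 24
backings = [
    Backing(0xF000000000000000, 0x10000000),
    Backing(0xF000000001000000, 0x30000000),
]
paddr = vmemmap_to_phys(0xF000000001000040, backings, PSIZE)
```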

mmu_psize_def|(mmu_psize_def, shift)
------------------------------------

The size of a struct mmu_psize_def and the offset of mmu_psize_def's
member.

Used in vtop translations.

==
sh
==

node_data|(node_data, MAX_NUMNODES)
-----------------------------------

See above.

X2TLB
-----

Indicates whether the crashed kernel enabled SH extended mode.