Documentation for kdump - the kexec-based crash dumping solution
================================================================

DESIGN
======

Kdump uses kexec to reboot to a second kernel whenever a dump needs to be taken.
This second kernel is booted with very little memory. The first kernel reserves
the section of memory that the second kernel uses. This ensures that on-going
DMA from the first kernel does not corrupt the second kernel.

All the necessary information about the core image is encoded in ELF format and
stored in a reserved area of memory before the crash. The physical address of
the start of the ELF header is passed to the new kernel through the command
line parameter elfcorehdr=.

On i386, the first 640 KB of physical memory is needed to boot, irrespective
of where the kernel loads. Hence, this region is backed up by kexec just before
rebooting into the new kernel.

In the second kernel, "old memory" can be accessed in two ways.

- The first one is through a /dev/oldmem device interface. A capture utility
  can read the device file and write out the memory in raw format. This is a
  raw dump of memory, so the analysis/capture tool must be intelligent enough
  to determine where to look for the right information. The ELF headers
  (elfcorehdr=) can come in handy here.

- The second interface is through /proc/vmcore. This exports the dump as an ELF
  format file which can be written out using any file copy command
  (cp, scp, etc). Further, gdb can be used to perform limited debugging on
  the dump file. This method ensures correct ordering of the dump pages
  (corresponding to the first 640 KB that has been relocated).

SETUP
=====

1) Download http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz
   and apply http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-kdump.patch
   and after that build the source.

2) Download and build the appropriate (2.6.13-rc1 onwards) vanilla kernel.

   Two kernels need to be built in order to get this feature working.

   A) First kernel:
      a) Enable "kexec system call" feature (in Processor type and features).
           CONFIG_KEXEC=y
      b) This kernel's physical load address should be the default value of
         0x100000 (1 MB) (in Processor type and features).
           CONFIG_PHYSICAL_START=0x100000
      c) Enable "sysfs file system support" (in Pseudo filesystems).
           CONFIG_SYSFS=y
      d) Boot into first kernel with the command line parameter
         "crashkernel=Y@X". Use appropriate values for X and Y. Y denotes how
         much memory to reserve for the second kernel, and X denotes at what
         physical address the reserved memory section starts.
         For example: "crashkernel=64M@16M".

   B) Second kernel:
      a) Enable "kernel crash dumps" feature (in Processor type and features).
           CONFIG_CRASH_DUMP=y
      b) Specify a suitable value for "Physical address where the kernel is
         loaded" (in Processor type and features). Typically this value
         should be the same as X (see option d) of the first kernel above),
         e.g., 16 MB or 0x1000000.
           CONFIG_PHYSICAL_START=0x1000000
      c) Enable "/proc/vmcore support" (Optional, in Pseudo filesystems).
           CONFIG_PROC_VMCORE=y
      d) Disable SMP support and build a UP kernel (Until it is fixed).
           CONFIG_SMP=n
      e) Enable "Local APIC support on uniprocessors".
           CONFIG_X86_UP_APIC=y
      f) Enable "IO-APIC support on uniprocessors".
           CONFIG_X86_UP_IOAPIC=y

      Note: i) Options a) and b) depend upon "Configure standard kernel
               features (for small systems)" (under General setup).
            ii) Option a) also depends on CONFIG_HIGHMEM (under Processor
                type and features).
            iii) Both options a) and b) are under "Processor type and
                 features".

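The X value given in "crashkernel=Y@X" for the first kernel and the
CONFIG_PHYSICAL_START of the second kernel (step 2 above) have to agree. The
following is a minimal shell sketch for cross-checking the two. The helper
names (to_bytes, crash_physical_start, crash_reserved_bytes) are hypothetical
and not part of kexec-tools, and the sketch assumes sizes are written with a
K, M or G suffix as in the example "crashkernel=64M@16M".

```shell
# Hypothetical helper: convert a size such as 64M or 16M to bytes.
# Assumes an optional K, M or G suffix; anything else is taken as bytes.
to_bytes() {
    case "$1" in
        *[Kk]) echo $(( ${1%[Kk]} * 1024 )) ;;
        *[Mm]) echo $(( ${1%[Mm]} * 1024 * 1024 )) ;;
        *[Gg]) echo $(( ${1%[Gg]} * 1024 * 1024 * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}

# Print X (the start of the reserved region) as a hex address, in the form
# expected for CONFIG_PHYSICAL_START of the second kernel.
crash_physical_start() {
    spec=${1#crashkernel=}     # strip an optional "crashkernel=" prefix
    start=${spec#*@}           # X: everything after the '@'
    printf '0x%x\n' "$(to_bytes "$start")"
}

# Print Y (the amount of reserved memory) in bytes.
crash_reserved_bytes() {
    spec=${1#crashkernel=}
    size=${spec%@*}            # Y: everything before the '@'
    to_bytes "$size"
}

crash_physical_start "crashkernel=64M@16M"    # prints 0x1000000
```

The printed value matches the CONFIG_PHYSICAL_START=0x1000000 setting shown
for the second kernel.
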
3) Boot into the first kernel. You are now ready to try out kexec-based crash
   dumps.

4) Load the second kernel to be booted using:

   kexec -p <second-kernel> --args-linux --elf32-core-headers \
      --append="root=<root-dev> init 1 irqpoll"

   Note: i) <second-kernel> has to be a vmlinux image. bzImage will not work,
            as of now.
         ii) By default ELF headers are stored in ELF64 format. Option
             --elf32-core-headers forces generation of ELF32 headers. gdb
             cannot open ELF64 headers on 32 bit systems, so ELF32 headers
             come in handy for users who have non-PAE systems and hence
             less than 4 GB of memory.
         iii) Specify "irqpoll" as command line parameter. This reduces driver
              initialization failures in second kernel due to shared
              interrupts.

5) System reboots into the second kernel when a panic occurs. A module can be
   written to force the panic, or "ALT-SysRq-c" can be used to initiate a crash
   dump for testing purposes.

6) Write out the dump file using

   cp /proc/vmcore <dump-file>

   Dump memory can also be accessed as a /dev/oldmem device for a linear/raw
   view. To create the device, type:

   mknod /dev/oldmem c 1 12

   Use "dd" with suitable options for count, bs and skip to access specific
   portions of the dump.

   Entire memory:  dd if=/dev/oldmem of=oldmem.001

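For step 6, the count, bs and skip options of "dd" follow directly from the
physical address range of interest. A minimal sketch is given below. The
helper name oldmem_dd_cmd is hypothetical, and the sketch assumes a 4096 byte
page size (as on i386) and that offsets into /dev/oldmem correspond to
physical addresses of the first kernel. It only prints the command and does
not run it.

```shell
# Hypothetical helper: print a dd command that copies LEN bytes of old memory
# starting at physical address START into OUTFILE.
#   usage: oldmem_dd_cmd START LEN OUTFILE
# START and LEN may be decimal or 0x-prefixed hex and must be page aligned.
oldmem_dd_cmd() {
    start=$(( $1 ))
    len=$(( $2 ))
    bs=4096                    # assumed page size on i386
    if [ $(( start % bs )) -ne 0 ] || [ $(( len % bs )) -ne 0 ]; then
        echo "start and length must be multiples of $bs" >&2
        return 1
    fi
    # skip and count are expressed in units of bs
    echo "dd if=/dev/oldmem of=$3 bs=$bs skip=$(( start / bs )) count=$(( len / bs ))"
}

# 64 KB starting at the 1 MB boundary:
oldmem_dd_cmd 0x100000 0x10000 oldmem.1M
# prints: dd if=/dev/oldmem of=oldmem.1M bs=4096 skip=256 count=16
```
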
ANALYSIS
========

Limited analysis can be done using gdb on the dump file copied out of
/proc/vmcore. Use vmlinux built with -g and run

   gdb vmlinux <dump-file>

Stack trace for the task on processor 0, register display, and memory display
work fine.

Note: gdb cannot analyse core files generated in ELF64 format for i386.

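Since gdb on i386 cannot read ELF64 core files, it can help to check which
header format a copied dump file has before starting gdb. The sketch below
reads the ELF identification bytes with od. The helper name elf_class is
hypothetical. The byte layout (magic 0x7f 'E' 'L' 'F', followed by the class
byte, where 1 means ELF32 and 2 means ELF64) comes from the ELF format itself.

```shell
# Hypothetical helper: report whether a file has ELF32 or ELF64 headers.
#   usage: elf_class <dump-file>
elf_class() {
    # The first four bytes must be 0x7f 'E' 'L' 'F'.
    magic=$(od -A n -t x1 -N 4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not ELF"
        return 1
    fi
    # The byte at offset 4 is the ELF class: 1 = ELF32, 2 = ELF64.
    class=$(od -A n -t u1 -j 4 -N 1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo "ELF32" ;;
        2) echo "ELF64" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

If this reports ELF64 for a dump taken on i386, reloading the second kernel
with --elf32-core-headers (see step 4 of SETUP) produces a dump that gdb can
open.
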
TODO
====

1) Provide a kernel pages filtering mechanism so that core file size is not
   insane on systems having huge memory banks.
2) Modify "crash" tool to make it recognize this dump.

CONTACT
=======

Vivek Goyal (vgoyal@in.ibm.com)
Maneesh Soni (maneesh@in.ibm.com)