=================================
Debugging hibernation and suspend
=================================

(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL

1. Testing hibernation (aka suspend to disk or STD)
===================================================

To check if hibernation works, you can try to hibernate in the "reboot" mode::

  # echo reboot > /sys/power/disk
  # echo disk > /sys/power/state

and the system should create a hibernation image, reboot, resume and get back to
the command prompt where you started the transition. If that happens,
hibernation most likely works correctly. Still, you need to repeat the
test at least a couple of times in a row for confidence. [This is necessary
because some problems only show up on a second attempt at suspending and
resuming the system.] Moreover, hibernating in the "reboot" and "shutdown"
modes causes the PM core to skip some platform-related callbacks that, on ACPI
systems, might be necessary to make hibernation work. Thus, if your machine
fails to hibernate or resume in the "reboot" mode, you should try the
"platform" mode::

  # echo platform > /sys/power/disk
  # echo disk > /sys/power/state

which is the default and recommended mode of hibernation.
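
Because each mode should be exercised a couple of times in a row, the loop can
be scripted. The following is a minimal sketch, not part of any existing tool:
the function name "hibernate_repeat" and the SYSFS_POWER variable (which lets
the function be tried against a scratch directory instead of /sys/power) are
hypothetical. Source it into a root shell and run it only when you are ready for
the machine to actually hibernate.

```sh
# Hypothetical helper: hibernate COUNT times in a row in one mode.
# SYSFS_POWER (hypothetical) overrides /sys/power for dry runs.
hibernate_repeat() {
    local mode="$1" count="${2:-2}" dir="${SYSFS_POWER:-/sys/power}" i=1

    case $mode in
        reboot|platform|shutdown) ;;
        *) echo "unsupported mode: $mode" >&2; return 2 ;;
    esac

    while [ "$i" -le "$count" ]; do
        echo "attempt $i/$count: $mode"
        echo "$mode" > "$dir/disk" || return 1
        # On real hardware this write returns only after the system has
        # resumed, so the loop continues where it left off.
        echo disk > "$dir/state" || return 1
        i=$((i + 1))
    done
}
```

For example, "hibernate_repeat platform 3" hibernates three times in the
"platform" mode and stops at the first write that fails.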

Unfortunately, the "platform" mode of hibernation does not work on some systems
with broken BIOSes. In such cases, the "shutdown" mode of hibernation might
work::

  # echo shutdown > /sys/power/disk
  # echo disk > /sys/power/state

(it is similar to the "reboot" mode, but it requires you to press the power
button to make the system resume).

If neither "platform" nor "shutdown" hibernation mode works, you will need to
identify what goes wrong.

a) Test modes of hibernation
----------------------------

To find out why hibernation fails on your system, you can use a special testing
facility that is available if the kernel is compiled with CONFIG_PM_DEBUG set.
With that option, the file /sys/power/pm_test can be used to make the
hibernation core run in a test mode. There are 5 test modes available:

freezer
  - test the freezing of processes

devices
  - test the freezing of processes and suspending of devices

platform
  - test the freezing of processes, suspending of devices and platform
    global control methods [1]_

processors
  - test the freezing of processes, suspending of devices, platform
    global control methods [1]_ and the disabling of nonboot CPUs

core
  - test the freezing of processes, suspending of devices, platform global
    control methods [1]_, the disabling of nonboot CPUs and suspending
    of platform/system devices

.. [1]

   The platform global control methods are only available on ACPI systems
   and are only tested if the hibernation mode is set to "platform".

To use one of them, write the corresponding string to /sys/power/pm_test
(e.g. "devices" to test the freezing of processes and suspending of devices)
and issue the standard hibernation commands. For example, to use the "devices"
test mode along with the "platform" mode of hibernation, you should do the
following::

  # echo devices > /sys/power/pm_test
  # echo platform > /sys/power/disk
  # echo disk > /sys/power/state

Then, the kernel will try to freeze processes, suspend devices, wait a few
seconds (5 by default, but configurable by the suspend.pm_test_delay module
parameter), resume devices and thaw processes. If "platform" is written to
/sys/power/pm_test, then after suspending devices the kernel will additionally
invoke the global control methods (e.g. ACPI global control methods) used to
prepare the platform firmware for hibernation. Next, it will wait a
configurable number of seconds and invoke the platform (e.g. ACPI) global
methods used to cancel hibernation, and so on.
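
The delay can be set at boot with "suspend.pm_test_delay=<seconds>" on the
kernel command line. It may also be adjustable at run time through sysfs; the
sketch below assumes the parameter is exposed at
/sys/module/suspend/parameters/pm_test_delay, which you should verify on your
kernel before relying on it. The function name "pm_test_delay" is hypothetical.

```sh
# Hypothetical helper: with no argument (or an empty one) print the current
# pm_test delay; with a whole number of seconds, set it.
# Assumption: the parameter lives at the default path below; pass another
# file as the second argument to override it.
pm_test_delay() {
    local file="${2:-/sys/module/suspend/parameters/pm_test_delay}"
    if [ -z "$1" ]; then
        cat "$file"                 # current delay in seconds
        return
    fi
    case $1 in
        *[!0-9]*) echo "delay must be a whole number of seconds" >&2; return 2 ;;
    esac
    echo "$1" > "$file"
}
```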
Rafael J. Wysockice2b7142007-11-19 23:43:34 +010094
95Writing "none" to /sys/power/pm_test causes the kernel to switch to the normal
96hibernation/suspend operations. Also, when open for reading, /sys/power/pm_test
97contains a space-separated list of all available tests (including "none" that
98represents the normal functionality) in which the current test level is
99indicated by square brackets.
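
Since the current level is the bracketed entry, it is easy to pick out from a
script, for instance to confirm that "none" is selected again after testing.
This is a minimal sketch; both function names are hypothetical, and both take
an optional file argument so they can be tried on a copy of the attribute.

```sh
# Hypothetical helpers; FILE defaults to /sys/power/pm_test.
# pm_test_current prints the bracketed (currently selected) level.
pm_test_current() {
    tr ' ' '\n' < "${1:-/sys/power/pm_test}" | sed -n 's/^\[\(.*\)]$/\1/p'
}

# pm_test_levels prints every available level, one per line.
pm_test_levels() {
    tr ' ' '\n' < "${1:-/sys/power/pm_test}" |
        sed -e '/^$/d' -e 's/^\[//' -e 's/]$//'
}
```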

Generally, as you can see, each test level is more "invasive" than the previous
one, and the "core" level tests the hardware and drivers as deeply as possible
without creating a hibernation image. Obviously, if the "devices" test fails,
the "platform" test will fail as well, and so on. Thus, as a rule of thumb, you
should try the test modes starting from "freezer", through "devices", "platform"
and "processors", up to "core" (repeat the test on each level a couple of times
to rule out random factors).
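
The rule of thumb above can be automated with a small loop. This is a sketch
under two assumptions: that a failing test level makes the write to
/sys/power/state return an error (the dmesg output remains the authoritative
record of what went wrong), and that the hypothetical SYSFS_POWER variable may
point the function at a scratch directory for a dry run. The function name
"pm_test_ladder" is also hypothetical; it writes "none" back before returning.

```sh
# Hypothetical helper: walk the pm_test levels from the least to the most
# invasive, REPEAT times each, with the given hibernation MODE.
# SYSFS_POWER (hypothetical) overrides /sys/power for dry runs.
pm_test_ladder() {
    local mode="${1:-platform}" repeat="${2:-2}"
    local dir="${SYSFS_POWER:-/sys/power}" level i

    echo "$mode" > "$dir/disk" || return 1
    for level in freezer devices platform processors core; do
        i=1
        while [ "$i" -le "$repeat" ]; do
            echo "testing $level ($i/$repeat)"
            echo "$level" > "$dir/pm_test" || return 1
            # Assumption: a failing level makes this write return an error.
            if ! echo disk > "$dir/state"; then
                echo "level $level failed, check dmesg" >&2
                echo none > "$dir/pm_test"
                return 1
            fi
            i=$((i + 1))
        done
    done
    echo none > "$dir/pm_test"      # back to normal operation
}
```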

If the "freezer" test fails, there is a task that cannot be frozen (in that case
it is usually possible to identify the offending task by analysing the output of
dmesg obtained after the failing test). Failure at this level usually means
that there is a problem with the task freezer subsystem, which should be
reported.

If the "devices" test fails, most likely there is a driver that cannot suspend
or resume its device (in the latter case the system may hang or become unstable
after the test, so please take that into consideration). To find this driver,
you can carry out a binary search according to the following rules:

- if the test fails, unload half of the drivers currently loaded and repeat
  (that would probably involve rebooting the system, so always note which
  drivers have been loaded before the test),
- if the test succeeds, load half of the drivers you have unloaded most
  recently and repeat.
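
Keeping track of the halves is the error-prone part of this search. The sketch
below takes a file with one driver name per line (the drivers currently loaded
after a failed test, or the ones you unloaded most recently after a successful
one) and splits it into LIST.next, the half to unload or reload next, and
LIST.rest. The function name and the file suffixes are hypothetical, and the
function only manipulates lists; loading and unloading drivers is left to you.

```sh
# Hypothetical helper for the binary search: split LIST (one name per
# line) into LIST.next (first half, rounded up) and LIST.rest.
bisect_split() {
    local list="$1"
    : > "$list.next"
    : > "$list.rest"
    awk -v out="$list" '
        NF { name[n++] = $1 }          # skip blank lines
        END {
            half = int((n + 1) / 2)
            for (i = 0; i < n; i++)
                print name[i] > (out (i < half ? ".next" : ".rest"))
            printf "%d of %d names written to %s.next\n", half, n, out
        }' "$list"
}
```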

Once you have found the failing driver (there can be more than one of
them), you have to unload it every time before hibernation. In that case,
please make sure to report the problem with the driver.

It is also possible that the "devices" test will still fail after you have
unloaded all modules. In that case, you may want to look in your kernel
configuration for the drivers that can be compiled as modules (and test again
with these drivers compiled as modules). You may also try to use some special
kernel command line options such as "noapic", "noacpi" or even "acpi=off".

If the "platform" test fails, there is a problem with the handling of the
platform (e.g. ACPI) firmware on your system. In that case, the "platform" mode
of hibernation is not likely to work. You can try the "shutdown" mode, but that
is rather a poor man's workaround.

If the "processors" test fails, the disabling/enabling of nonboot CPUs does not
work (of course, this can only be an issue on SMP systems) and the problem
should be reported. In that case, you can also try to switch the nonboot CPUs
off and on using the /sys/devices/system/cpu/cpu*/online sysfs attributes and
see if that works.
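
A minimal sketch of that manual check, assuming (as is usual, but not
guaranteed) that cpu0 is the boot CPU and must be left alone. The function name
"cpu_hotplug_cycle" is hypothetical, and the optional argument replaces
/sys/devices/system/cpu so the loop can be tried on a scratch directory. It
takes each other CPU offline and back online and stops at the first write that
fails.

```sh
# Hypothetical helper: take every CPU except cpu0 (assumed to be the boot
# CPU) offline and back online. BASE defaults to /sys/devices/system/cpu.
cpu_hotplug_cycle() {
    local base="${1:-/sys/devices/system/cpu}" f cpu
    for f in "$base"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue        # the glob matched nothing
        cpu=${f%/online}
        cpu=${cpu##*/}
        if [ "$cpu" = cpu0 ]; then
            continue
        fi
        echo "cycling $cpu"
        echo 0 > "$f" || { echo "$cpu: offline failed" >&2; return 1; }
        echo 1 > "$f" || { echo "$cpu: online failed" >&2; return 1; }
    done
}
```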

If the "core" test fails, which means that suspending of the system/platform
devices has failed (these devices are suspended on one CPU with interrupts off),
the problem is most probably hardware-related and serious, so it should be
reported.

A failure of any of the "platform", "processors" or "core" tests may cause your
system to hang or become unstable, so please beware. Such a failure usually
indicates a serious problem that may very well be related to the hardware, but
please report it anyway.

b) Testing minimal configuration
--------------------------------

If all of the hibernation test modes work, you can boot the system with the
"init=/bin/bash" command line parameter and attempt to hibernate in the
"reboot", "shutdown" and "platform" modes. If that does not work, there is
probably a problem with a driver statically compiled into the kernel, and you
can try to compile more drivers as modules so that they can be tested
individually. Otherwise, there is a problem with a modular driver and you can
find it by loading half of the modules you normally use and binary searching
in accordance with the following algorithm:

- if there are n modules loaded and the attempt to suspend and resume fails,
  unload n/2 of the modules and try again (that would probably involve
  rebooting the system),
- if there are n modules loaded and the attempt to suspend and resume succeeds,
  load n/2 more modules and try again.
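
Since each round probably involves a reboot, it helps to record which modules
were loaded before every attempt. The sketch below saves the sorted module
names from /proc/modules (whose first field is the module name) and lists the
modules present in one snapshot but missing from another, i.e. the ones you
still have to load again. Both function names are hypothetical.

```sh
# Hypothetical helpers. snapshot_modules saves the sorted names of the
# loaded modules (first field of /proc/modules) to FILE; the optional
# second argument replaces /proc/modules, e.g. for testing.
snapshot_modules() {
    awk '{ print $1 }' "${2:-/proc/modules}" | sort > "$1"
}

# missing_modules prints the names found in snapshot OLD but not in
# snapshot NEW, i.e. the modules that still have to be loaded again.
missing_modules() {
    comm -23 "$1" "$2"
}
```

For example, "snapshot_modules before.txt" just before a test and
"snapshot_modules now.txt" after the next boot, followed by
"missing_modules before.txt now.txt", shows what has been unloaded so far.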

Again, if you find the offending module(s), unload it (them) every time before
hibernation, and please report the problem with it (them).

c) Using the "test_resume" hibernation option
---------------------------------------------

/sys/power/disk generally tells the kernel what to do after creating a
hibernation image. One of the available options is "test_resume", which
causes the just-created image to be used for immediate restoration. Namely,
after doing::

  # echo test_resume > /sys/power/disk
  # echo disk > /sys/power/state

a hibernation image will be created and a resume from it will be triggered
immediately without involving the platform firmware in any way.

That test can be used to check if failures to resume from hibernation are
related to bad interactions with the platform firmware. That is, if the above
works every time, but resume from actual hibernation does not work or is
unreliable, the platform firmware may be responsible for the failures.

On architectures and platforms that support using different kernels to restore
hibernation images (that is, the kernel used to read the image from storage and
load it into memory is different from the one included in the image) or support
kernel address space layout randomization, it can also be used to check if
failures to resume may be related to the differences between the restore and
image kernels.
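
Before relying on "test_resume", you may want to check that your kernel offers
it. This sketch assumes that reading /sys/power/disk lists the supported modes
with the selected one in square brackets, in the same way as
/sys/power/pm_test; please verify that on your system. The function name
"disk_mode_available" is hypothetical.

```sh
# Hypothetical helper: succeed if MODE is listed in FILE (default
# /sys/power/disk), whether or not it is the bracketed, selected one.
disk_mode_available() {
    tr ' ' '\n' < "${2:-/sys/power/disk}" |
        sed -e 's/^\[//' -e 's/]$//' |
        grep -qx -- "$1"
}

# Example (as root; this really creates an image and resumes from it):
#   disk_mode_available test_resume &&
#       echo test_resume > /sys/power/disk && echo disk > /sys/power/state
```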

d) Advanced debugging
---------------------

If hibernation does not work on your system even in the minimal configuration
and compiling more drivers as modules is not practical or some modules cannot
be unloaded, you can use one of the more advanced debugging techniques to find
the problem. First, if there is a serial port in your box, you can boot the
kernel with the 'no_console_suspend' parameter and try to log kernel messages
using the serial console. This may provide you with some information about the
reasons for the suspend (resume) failure. Alternatively, it may be possible to
use a FireWire port for debugging with firescope
(http://v3.sk/~lkundrak/firescope/). On x86 it is also possible to
use the PM_TRACE mechanism documented in Documentation/power/s2ram.rst.
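
After rebooting with extra parameters such as 'no_console_suspend', it is worth
confirming that they actually reached the running kernel, whose command line is
available in /proc/cmdline. The function name "cmdline_has" is hypothetical; it
matches a parameter given alone or in the form name=value.

```sh
# Hypothetical helper: succeed if PARAM is on the running kernel's command
# line, either alone or as PARAM=value. FILE defaults to /proc/cmdline.
cmdline_has() {
    awk -v p="$1" '
        { for (i = 1; i <= NF; i++) if ($i == p || index($i, p "=") == 1) found = 1 }
        END { exit !found }' "${2:-/proc/cmdline}"
}
```

For example, "cmdline_has no_console_suspend || echo 'parameter missing'".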

2. Testing suspend to RAM (STR)
===============================

To verify that STR works, it is generally more convenient to use the s2ram
tool available from http://suspend.sf.net and documented at
http://en.opensuse.org/SDB:Suspend_to_RAM (S2RAM_LINK).

In addition, after writing "freezer", "devices", "platform", "processors", or
"core" into /sys/power/pm_test (available if the kernel is compiled with
CONFIG_PM_DEBUG set), the suspend code will work in the test mode corresponding
to the given string. The STR test modes are defined in the same way as for
hibernation, so please refer to Section 1 for more information about them. In
particular, the "core" test allows you to test everything except for the actual
invocation of the platform firmware in order to put the system into the sleep
state.

Among other things, testing with the help of /sys/power/pm_test may allow
you to identify drivers that fail to suspend or resume their devices. They
should be unloaded every time before an STR transition.

Next, you can follow the instructions at S2RAM_LINK to test the system, but if
it does not work "out of the box", you may need to boot it with
"init=/bin/bash" and test s2ram in the minimal configuration. In that case,
you may be able to search for failing drivers by following a procedure
analogous to the one described in Section 1. If you find some failing drivers,
you will have to unload them every time before an STR transition (i.e. before
you run s2ram), and please report the problems with them.

There is a debugfs entry which shows the suspend-to-RAM statistics. Here is an
example of its output::

  # mount -t debugfs none /sys/kernel/debug
  # cat /sys/kernel/debug/suspend_stats
  success: 20
  fail: 5
  failed_freeze: 0
  failed_prepare: 0
  failed_suspend: 5
  failed_suspend_noirq: 0
  failed_resume: 0
  failed_resume_noirq: 0
  failures:
    last_failed_dev:    alarm
                        adc
    last_failed_errno:  -16
                        -16
    last_failed_step:   suspend
                        suspend

The "success" field is the number of successful suspend-to-RAM transitions and
the "fail" field is the number of failed ones. The other counters give the
number of failures at each individual step of suspend to RAM. suspend_stats
also lists the last two failed devices, error numbers and failed steps of
suspend.
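
To track these counters from a script, for example before and after a series of
test cycles, a single field can be extracted. This is a sketch with a
hypothetical function name; it assumes the "name: value" layout shown above and
that debugfs is already mounted, and for multi-valued fields such as
last_failed_dev it prints only the value on the field's own line.

```sh
# Hypothetical helper: print the value of one suspend_stats field, e.g.
# "suspend_stat fail". FILE defaults to the debugfs path shown above.
suspend_stat() {
    awk -v key="$1:" '$1 == key { print $2; exit }' \
        "${2:-/sys/kernel/debug/suspend_stats}"
}
```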