============
Swap suspend
============

Some warnings, first.

.. warning::

   **BIG FAT WARNING**

   If you touch anything on disk between suspend and resume...
   ...kiss your data goodbye.

   If you do resume from initrd after your filesystems are mounted...
   ...bye bye root partition.

   [this is actually the same case as above]

   If you have unsupported (\*) devices using DMA, you may have some
   problems. If your disk driver does not support suspend... (IDE does),
   it may cause some problems, too. If you change the kernel command line
   between suspend and resume, it may do something wrong. If you change
   your hardware while the system is suspended... well, it was not a good
   idea; but it will probably only crash.

   (\*) suspend/resume support is needed to make it safe.

   If you have any filesystems on USB devices mounted before software suspend,
   they won't be accessible after resume and you may lose data, as though
   you had unplugged the USB devices with mounted filesystems on them;
   see the FAQ below for details. (This is not true for more traditional
   power states like "standby", which normally don't turn USB off.)

Swap partition:
  You need to append resume=/dev/your_swap_partition to the kernel command
  line or specify it using /sys/power/resume.

Swap file:
  If using a swap file, you can also specify a resume offset using
  resume_offset=<number> on the kernel command line or specify it
  in /sys/power/resume_offset.
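As a minimal sketch of the two cases above, the helper below assembles the matching kernel command line fragment. Only ``resume=`` and ``resume_offset=`` come from this document; the function name ``resume_args`` and the sample device and offset in the usage note are hypothetical.

```shell
#!/bin/sh
# resume_args DEVICE [OFFSET]
# Print the kernel command line fragment for resuming from DEVICE.
# OFFSET is only needed when the image lives in a swap file.
# (Function name is an illustrative assumption, not a kernel tool.)
resume_args() {
    dev=$1
    offset=${2:-}
    if [ -z "$dev" ]; then
        echo "usage: resume_args DEVICE [OFFSET]" >&2
        return 1
    fi
    if [ -n "$offset" ]; then
        printf 'resume=%s resume_offset=%s\n' "$dev" "$offset"
    else
        printf 'resume=%s\n' "$dev"
    fi
}
```

For example, ``resume_args /dev/sda2`` prints ``resume=/dev/sda2``, and passing a second argument appends the ``resume_offset=`` parameter.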

After preparing, you can suspend by::

    echo shutdown > /sys/power/disk; echo disk > /sys/power/state

- If you feel ACPI works pretty well on your system, you might try::

    echo platform > /sys/power/disk; echo disk > /sys/power/state

- If you would like to write the hibernation image to swap and then suspend
  to RAM (provided your platform supports it), you can try::

    echo suspend > /sys/power/disk; echo disk > /sys/power/state

- If you have SATA disks, you'll need recent kernels with SATA suspend
  support. For suspend and resume to work, make sure your disk drivers
  are built into the kernel -- not modules. [There's a way to make
  suspend/resume work with modular disk drivers, see the FAQ, but you
  probably should not do that.]
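The commands above write a mode name to /sys/power/disk without checking whether the platform offers it. The sketch below checks first. It assumes that reading /sys/power/disk returns a space-separated list of modes with the current one in square brackets; the function names are hypothetical.

```shell
#!/bin/sh
# disk_mode_supported MODE [FILE]
# Succeed if MODE is listed in FILE (default: /sys/power/disk).
# Assumption: the file looks like "[platform] shutdown reboot suspend",
# with the selected mode in brackets, so the brackets are stripped first.
disk_mode_supported() {
    mode=$1
    file=${2:-/sys/power/disk}
    [ -r "$file" ] || return 1
    for m in $(tr -d '[]' < "$file"); do
        [ "$m" = "$mode" ] && return 0
    done
    return 1
}

# hibernate_with MODE
# Select MODE and hibernate, but only if the mode is offered.
# Needs root; it is defined here but deliberately not called.
hibernate_with() {
    if disk_mode_supported "$1"; then
        echo "$1" > /sys/power/disk && echo disk > /sys/power/state
    else
        echo "hibernation mode '$1' is not available" >&2
        return 1
    fi
}
```

Calling ``hibernate_with platform`` as root then behaves like the second command above, but refuses to proceed when ``platform`` is not listed.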

If you want to limit the suspend image size to N bytes, do::

    echo N > /sys/power/image_size

before suspend (it is limited to around 2/5 of available RAM by default).
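Because the value is given in bytes, a small conversion helper can make the intent easier to read. This is only a sketch; the function names are hypothetical and the optional file argument exists so the helper can be tried on an ordinary file.

```shell
#!/bin/sh
# mib_to_bytes N: convert N mebibytes to bytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# set_image_size_mib N [FILE]
# Write N MiB, expressed in bytes, to FILE
# (default: /sys/power/image_size, which requires root).
set_image_size_mib() {
    file=${2:-/sys/power/image_size}
    mib_to_bytes "$1" > "$file"
}
```

For example, ``set_image_size_mib 512`` writes 536870912 to /sys/power/image_size.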

- The resume process checks for the presence of the resume device;
  if found, it then checks the contents for the hibernation image signature.
  If both are found, it resumes the hibernation image.

- The resume process may be triggered in two ways:

  1) During lateinit: If resume=/dev/your_swap_partition is specified on
     the kernel command line, lateinit runs the resume process. If the
     resume device has not been probed yet, the resume process fails and
     bootup continues.
  2) Manually from an initrd or initramfs: May be run from
     the init script by using the /sys/power/resume file. It is vital
     that this be done prior to remounting any filesystems (even as
     read-only); otherwise data may be corrupted.
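The signature check described above can be imitated from user space for debugging. The sketch below rests on an assumption that is not stated in this document: that the in-kernel code stores the string ``S1SUSPEND`` in the last 10 bytes of the first page of the swap area. The function name is hypothetical, and the check only reads from the device.

```shell
#!/bin/sh
# has_hibernation_signature DEVICE [PAGE_SIZE]
# Succeed if the first page of DEVICE ends with the hibernation signature.
# Assumption: the signature is the 9-character string "S1SUSPEND", stored
# in the final 10 bytes of the first page (the tenth byte is a NUL).
has_hibernation_signature() {
    dev=$1
    page=${2:-$(getconf PAGESIZE)}
    sig=$(dd if="$dev" bs=1 skip=$((page - 10)) count=9 2>/dev/null)
    [ "$sig" = "S1SUSPEND" ]
}
```

An init script could call ``has_hibernation_signature /dev/your_swap_partition`` to log whether an image appears to be present before writing to /sys/power/resume.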

Article about goals and implementation of Software Suspend for Linux
====================================================================

Author: Gábor Kuti
Last revised: 2003-10-20 by Pavel Machek

Idea and goals to achieve
-------------------------

Nowadays it is common for laptops to have a suspend button. It
saves the state of the machine to a filesystem or to a partition and switches
to standby mode. When the machine is later resumed, the saved state is loaded
back into RAM and the machine can continue its work. This has two real
benefits. First, we save ourselves the time the machine takes to go down and
later boot up, and energy costs are really high when running from batteries.
The other gain is that we don't have to interrupt our programs, so processes
that are calculating something for a long time don't need to be written to be
interruptible.

swsusp saves the state of the machine into active swaps and then reboots or
powers down. You must explicitly specify the swap partition to resume from
with the `resume=` kernel option. If the signature is found, it loads and
restores the saved state. If the option `noresume` is specified as a boot
parameter, it skips the resuming. If the option `hibernate=nocompress` is
specified as a boot parameter, it saves the hibernation image without
compression.

While the system is suspended, you should not add/remove any
of the hardware, write to the filesystems, etc.

Sleep states summary
====================

There are three different interfaces you can use; /proc/acpi should
work like this:

In a really perfect world::

    echo 1 > /proc/acpi/sleep       # for standby
    echo 2 > /proc/acpi/sleep       # for suspend to ram
    echo 3 > /proc/acpi/sleep       # for suspend to ram, but more
                                    # power-conserving
    echo 4 > /proc/acpi/sleep       # for suspend to disk
    echo 5 > /proc/acpi/sleep       # for an unfriendly shutdown of the system

and perhaps::

    echo 4b > /proc/acpi/sleep      # for suspend to disk via s4bios

Frequently Asked Questions
==========================

Q:
  Well, suspending a server is IMHO a really stupid thing,
  but... (Diego Zuccato):

A:
  You bought a new UPS for your server. How do you install it without
  bringing the machine down? Suspend to disk, rearrange power cables,
  resume.

  You have your server on a UPS. Power died, and the UPS is indicating 30
  seconds to failure. What do you do? Suspend to disk.


Q:
  Maybe I'm missing something, but why don't the regular I/O paths work?

A:
  We do use the regular I/O paths. However, we cannot restore the data
  to its original location as we load it. That would create an
  inconsistent kernel state which would certainly result in an oops.
  Instead, we load the image into unused memory and then atomically copy
  it back to its original location. This implies, of course, a maximum
  image size of half the amount of memory.

  There are two solutions to this:

  * require half of memory to be free during suspend. That way you can
    read "new" data onto free spots, then cli and copy

  * assume we had a special "polling" ide driver that only uses memory
    between 0-640KB. That way, I'd have to make sure that 0-640KB is free
    during suspending, but otherwise it would work...

  suspend2 shares this fundamental limitation, but does not include user
  data and disk caches in "used memory" by saving them in
  advance. That means that the limitation goes away in practice.

Q:
  Does Linux support ACPI S4?

A:
  Yes. That's what echo platform > /sys/power/disk does.

Q:
  What is 'suspend2'?

A:
  suspend2 is 'Software Suspend 2', a forked implementation of
  suspend-to-disk which is available as separate patches for 2.4 and 2.6
  kernels from swsusp.sourceforge.net. It includes support for SMP, 4GB
  highmem and preemption. It also has an extensible architecture that
  allows for arbitrary transformations on the image (compression,
  encryption) and arbitrary backends for writing the image (e.g. to swap
  or an NFS share [Work In Progress]). Questions regarding suspend2
  should be sent to the mailing list available through the suspend2
  website, and not to the Linux Kernel Mailing List. We are working
  toward merging suspend2 into the mainline kernel.

Q:
  What is the freezing of tasks and why are we using it?

A:
  The freezing of tasks is a mechanism by which user space processes and some
  kernel threads are controlled during hibernation or system-wide suspend (on
  some architectures). See freezing-of-tasks.txt for details.

Q:
  What is the difference between "platform" and "shutdown"?

A:
  shutdown:
    save state in linux, then tell bios to powerdown

  platform:
    save state in linux, then tell bios to powerdown and blink
    "suspended led"

  "platform" is actually the right thing to do where supported, but
  "shutdown" is most reliable (except on ACPI systems).

Q:
  I do not understand why you have such strong objections to the idea of
  selective suspend.

A:
  Do selective suspend during runtime power management, that's okay. But
  it's useless for suspend-to-disk. (And I do not see how you could use
  it for suspend-to-ram, I hope you do not want that).

  Let's see, so you suggest to

  * SUSPEND all but swap device and parents
  * Snapshot
  * Write image to disk
  * SUSPEND swap device and parents
  * Powerdown

  Oh no, that does not work: if the swap device or its parents use DMA,
  you've corrupted data. You'd have to do

  * SUSPEND all but swap device and parents
  * FREEZE swap device and parents
  * Snapshot
  * UNFREEZE swap device and parents
  * Write
  * SUSPEND swap device and parents

  Which means that you still need that FREEZE state, and you get more
  complicated code. (And I have not yet introduced details like system
  devices).

Q:
  There don't seem to be any generally useful behavioral
  distinctions between SUSPEND and FREEZE.

A:
  Doing SUSPEND when you are asked to do FREEZE is always correct,
  but it may be unnecessarily slow. If you want your driver to stay simple,
  slowness may not matter to you. It can always be fixed later.

  For devices like disks it does matter: you do not want to spin down for
  FREEZE.

Q:
  After resuming, the system is paging heavily, leading to very bad
  interactivity.

A:
  Try running::

    cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u | while read -r file
    do
      test -f "$file" && cat "$file" > /dev/null
    done

  after resume. swapoff -a; swapon -a may also be useful.

Q:
  What happens to devices during swsusp? They seem to be resumed
  during system suspend?

A:
  That's correct. We need to resume them if we want to write the image to
  disk. The whole sequence goes like this:

      **Suspend part**

      running system, user asks for suspend-to-disk

      user processes are stopped

      suspend(PMSG_FREEZE): devices are frozen so that they don't interfere
      with state snapshot

      state snapshot: copy of whole used memory is taken with interrupts
      disabled

      resume(): devices are woken up so that we can write image to swap

      write image to swap

      suspend(PMSG_SUSPEND): suspend devices so that we can power off

      turn the power off

      **Resume part**

      (is actually pretty similar)

      running system, user asks for suspend-to-disk

      user processes are stopped (in common case there are none,
      but with resume-from-initrd, no one knows)

      read image from disk

      suspend(PMSG_FREEZE): devices are frozen so that they don't interfere
      with image restoration

      image restoration: rewrite memory with image

      resume(): devices are woken up so that system can continue

      thaw all user processes

Q:
  What is this 'Encrypt suspend image' for?

A:
  First of all: it is not a replacement for dm-crypt encrypted swap.
  It cannot protect your computer while it is suspended. Instead it does
  protect from leaking sensitive data after resume from suspend.

  Think of the following: you suspend while an application is running
  that keeps sensitive data in memory. The application itself prevents
  the data from being swapped out. Suspend, however, must write these
  data to swap to be able to resume later on. Without suspend encryption
  your sensitive data are then stored in plaintext on disk. This means
  that after resume your sensitive data are accessible to all
  applications having direct access to the swap device which was used
  for suspend. If you don't need swap after resume these data can remain
  on disk virtually forever. Thus it can happen that your system gets
  broken into weeks later and sensitive data which you thought were
  encrypted and protected are retrieved and stolen from the swap device.
  To prevent this situation you should use 'Encrypt suspend image'.

  During suspend a temporary key is created and this key is used to
  encrypt the data written to disk. When, during resume, the data has been
  read back into memory, the temporary key is destroyed, which simply
  means that all data written to disk during suspend are then
  inaccessible so they can't be stolen later on. The only thing that
  you must then take care of is that you call 'mkswap' for the swap
  partition used for suspend as early as possible during regular
  boot. This ensures that any temporary key from an oopsed suspend or
  from a failed or aborted resume is erased from the swap device.

  As a rule of thumb use encrypted swap to protect your data while your
  system is shut down or suspended. Additionally use the encrypted
  suspend image to prevent sensitive data from being stolen after
  resume.

Q:
  Can I suspend to a swap file?

A:
  Generally, yes, you can. However, it requires you to use the "resume=" and
  "resume_offset=" kernel command line parameters, so the resume from a swap
  file cannot be initiated from an initrd or initramfs image. See
  swsusp-and-swap-files.txt for details.

Q:
  Is there a maximum system RAM size that is supported by swsusp?

A:
  It should work okay with highmem.

Q:
  Does swsusp (to disk) use only one swap partition or can it use
  multiple swap partitions (aggregate them into one logical space)?

A:
  Only one swap partition, sorry.

Q:
  If my application(s) cause lots of memory & swap space to be used
  (over half of the total system RAM), is it correct that it is likely
  to be useless to try to suspend to disk while that app is running?

A:
  No, it should work okay, as long as your app does not mlock()
  it. Just prepare a big enough swap partition.
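One rough way to judge whether swap is "big enough" is to compare free swap with the image size target before suspending. The sketch below assumes the usual ``SwapFree:`` line in /proc/meminfo (reported in kB) and treats /sys/power/image_size as a target rather than a guarantee; the function names are hypothetical.

```shell
#!/bin/sh
# swap_free_bytes [MEMINFO]
# Print free swap in bytes, taken from the "SwapFree:" line (given in kB).
swap_free_bytes() {
    kb=$(awk '/^SwapFree:/ { print $2 }' "${1:-/proc/meminfo}")
    echo $(( ${kb:-0} * 1024 ))
}

# swap_fits_image [MEMINFO] [IMAGE_SIZE_FILE]
# Succeed if free swap is at least the configured image size target.
# This is a heuristic only: the real image may be smaller or larger.
swap_fits_image() {
    free=$(swap_free_bytes "${1:-/proc/meminfo}")
    read -r want < "${2:-/sys/power/image_size}" || return 1
    [ "$free" -ge "$want" ]
}
```

A wrapper script could run ``swap_fits_image || echo "swap may be too small" >&2`` before writing to /sys/power/state.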

Q:
  What information is useful for debugging suspend-to-disk problems?

A:
  Well, the last messages on the screen are always useful. If something
  is broken, it is usually some kernel driver, therefore trying with as
  few modules loaded as possible helps a lot. I also prefer people to
  suspend from console, preferably without X running. Booting with
  init=/bin/bash, then swapon and starting the suspend sequence manually
  usually does the trick. Then it is a good idea to try with the latest
  vanilla kernel.

Q:
  How can distributions ship a swsusp-supporting kernel with modular
  disk drivers (especially SATA)?

A:
  Well, it can be done: load the drivers, then echo into the
  /sys/power/resume file from the initrd. Be sure not to mount
  anything, not even a read-only mount, or you are going to lose your
  data.

Q:
  How do I make suspend more verbose?

A:
  If you want to see any non-error kernel messages on the virtual
  terminal the kernel switches to during suspend, you have to set the
  kernel console loglevel to at least 4 (KERN_WARNING), for example by
  doing::

    # save the old loglevel
    read -r LOGLEVEL DUMMY < /proc/sys/kernel/printk
    # set the loglevel so we see the progress bar.
    # if the level is higher than needed, we leave it alone.
    if [ "$LOGLEVEL" -lt 5 ]; then
            echo 5 > /proc/sys/kernel/printk
    fi

    IMG_SZ=0
    read -r IMG_SZ < /sys/power/image_size
    echo -n disk > /sys/power/state
    RET=$?
    #
    # the logic here is:
    # if image_size > 0 (without kernel support, IMG_SZ will be zero),
    # then try again with image_size set to zero.
    if [ "$RET" -ne 0 ] && [ "$IMG_SZ" -ne 0 ]; then # try again with minimal image size
            echo 0 > /sys/power/image_size
            echo -n disk > /sys/power/state
            RET=$?
    fi

    # restore previous loglevel
    echo "$LOGLEVEL" > /proc/sys/kernel/printk
    exit $RET

Q:
  Is it true that if I have a mounted filesystem on a USB device and
  I suspend to disk, I can lose data unless the filesystem has been mounted
  with "sync"?

A:
  That's right ... if you disconnect that device, you may lose data.
  In fact, even with "-o sync" you can lose data if your programs have
  information in buffers they haven't written out to a disk you disconnect,
  or if you disconnect before the device finished saving data you wrote.

  Software suspend normally powers down USB controllers, which is equivalent
  to disconnecting all USB devices attached to your system.

  Your system might well support low-power modes for its USB controllers
  while the system is asleep, maintaining the connection, using true sleep
  modes like "suspend-to-RAM" or "standby". (Don't write "disk" to the
  /sys/power/state file; write "standby" or "mem".) We've not seen any
  hardware that can use these modes through software suspend, although in
  theory some systems might support "platform" modes that won't break the
  USB connections.

  Remember that it's always a bad idea to unplug a disk drive containing a
  mounted filesystem. That's true even when your system is asleep! The
  safest thing is to unmount all filesystems on removable media (such as USB,
  Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays)
  before suspending; then remount them after resuming.

  There is a work-around for this problem. For more information, see
  Documentation/driver-api/usb/persist.rst.

Q:
  Can I suspend-to-disk using a swap partition under LVM?

A:
  Yes and No. You can suspend successfully, but the kernel will not be able
  to resume on its own. You need an initramfs that can recognize the resume
  situation, activate the logical volume containing the swap volume (but not
  touch any filesystems!), and eventually call::

    echo -n "$major:$minor" > /sys/power/resume

  where $major and $minor are the respective major and minor device numbers of
  the swap volume.

  uswsusp works with LVM, too. See http://suspend.sourceforge.net/
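The ``$major:$minor`` pair used in the LVM answer above can be read from the device node itself. The sketch below assumes a ``stat`` that supports ``-c`` with the ``%t`` and ``%T`` format sequences (as GNU coreutils does), which print the numbers in hexadecimal; the function name and the device path in the usage note are hypothetical.

```shell
#!/bin/sh
# dev_numbers NODE
# Print "major:minor" in decimal for the device node NODE.
# Assumption: stat -c '%t %T' prints the major and minor numbers in hex.
dev_numbers() {
    hex=$(stat -c '%t %T' "$1") || return 1
    maj=${hex% *}
    min=${hex#* }
    printf '%d:%d\n' "0x$maj" "0x$min"
}
```

In an initramfs script, after activating the volume and before mounting anything, this could be used as ``echo -n "$(dev_numbers /dev/mapper/vg0-swap)" > /sys/power/resume``.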

Q:
  I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were
  compiled with similar configuration files. Anyway, I found that
  suspend to disk (and resume) is much slower on 2.6.16 compared to
  2.6.15. Any idea why that might happen or how I can speed it up?

A:
  This is because the size of the suspend image is now greater than
  for 2.6.15 (by saving more data we can get a more responsive system
  after resume).

  There's the /sys/power/image_size knob that controls the size of the
  image. If you set it to 0 (e.g. by echo 0 > /sys/power/image_size as
  root), the 2.6.15 behavior should be restored. If it is still too
  slow, take a look at suspend.sf.net -- userland suspend is faster and
  supports LZF compression to speed it up further.