.. _overcommit_accounting:

=====================
Overcommit Accounting
=====================

The Linux kernel supports the following overcommit handling modes:

0
	Heuristic overcommit handling. Obvious overcommits of address
	space are refused. Used for a typical system. It ensures a
	seriously wild allocation fails while allowing overcommit to
	reduce swap usage. root is allowed to allocate slightly more
	memory in this mode. This is the default.

1
	Always overcommit. Appropriate for some scientific
	applications. The classic example is code that uses sparse
	arrays and relies on the virtual memory consisting almost
	entirely of zero pages.

2
	Don't overcommit. The total address space commit for the
	system is not permitted to exceed swap + a configurable amount
	(default is 50%) of physical RAM. Depending on the amount you
	use, in most situations this means a process will not be
	killed while accessing pages but will receive errors on memory
	allocation as appropriate.

	Useful for applications that want to guarantee their memory
	allocations will be available in the future without having to
	initialize every page.

The overcommit policy is set via the sysctl ``vm.overcommit_memory``.

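As a minimal sketch, the current policy can be read back through procfs.
The example assumes the sysctl is exposed at
``/proc/sys/vm/overcommit_memory``; the ``MODES`` table and both function
names are hypothetical helpers for illustration, not kernel interfaces.

```python
from pathlib import Path

# Short labels paraphrasing the three modes described above (illustrative only).
MODES = {0: "heuristic", 1: "always overcommit", 2: "don't overcommit"}


def describe_mode(raw: str) -> str:
    """Map the text read from the sysctl file to a short description."""
    value = int(raw.strip())
    return MODES.get(value, f"unknown mode {value}")


def current_mode(path: str = "/proc/sys/vm/overcommit_memory") -> str:
    """Read the policy from procfs (assumed path) and describe it."""
    return describe_mode(Path(path).read_text())
```

On a Linux system, ``print(current_mode())`` reports the active mode.
Changing the policy requires root, for example with
``sysctl vm.overcommit_memory=2``.
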
The overcommit amount can be set via ``vm.overcommit_ratio`` (percentage)
or ``vm.overcommit_kbytes`` (absolute value). These only have an effect
when ``vm.overcommit_memory`` is set to 2.

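The mode 2 limit described above (swap plus a configurable share of
physical RAM) can be approximated with a few lines of arithmetic. This is
a simplified sketch, not the kernel's implementation: it assumes that a
nonzero ``vm.overcommit_kbytes`` replaces the ratio-based share, and it
ignores any memory set aside for huge pages. The function and parameter
names are hypothetical.

```python
def commit_limit_kb(ram_kb: int, swap_kb: int, ratio: int = 50, kbytes: int = 0) -> int:
    """Approximate the mode 2 commit limit in kB.

    ratio  -- stands in for vm.overcommit_ratio (percentage of RAM)
    kbytes -- stands in for vm.overcommit_kbytes (absolute value);
              assumed to take precedence over ratio when nonzero
    """
    ram_part = kbytes if kbytes else ram_kb * ratio // 100
    return swap_kb + ram_part


# 8 GB of RAM and 2 GB of swap (decimal kB for easy reading), default ratio.
print(commit_limit_kb(8_000_000, 2_000_000))
```

With the default ratio of 50, half of the 8,000,000 kB of RAM is added to
the 2,000,000 kB of swap.
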
The current overcommit limit and amount committed are viewable in
``/proc/meminfo`` as CommitLimit and Committed_AS respectively.

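A small parser is enough to pull these two fields out of
``/proc/meminfo``. The sketch below works on the file's text so that it
can be tried anywhere. The ``SAMPLE`` values are made up for illustration
and the function name is hypothetical.

```python
def commit_status(meminfo_text: str) -> dict:
    """Return CommitLimit and Committed_AS (in kB) from meminfo-style text."""
    wanted = ("CommitLimit", "Committed_AS")
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in wanted:
            # Lines look like "CommitLimit:    10289148 kB".
            fields[name] = int(rest.split()[0])
    return fields


# Illustrative excerpt only; real values differ per system.
SAMPLE = """MemTotal:       16384000 kB
CommitLimit:    10289148 kB
Committed_AS:    7654321 kB
"""

print(commit_status(SAMPLE))
```

On a live system, pass the contents of ``/proc/meminfo`` instead of
``SAMPLE``, for example ``commit_status(open("/proc/meminfo").read())``.
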
Gotchas
=======

The C language stack growth does an implicit mremap. If you want absolute
guarantees and run close to the edge you MUST mmap your stack for the
largest size you think you will need. For typical stack usage this does
not matter much, but it is a corner case if you really care.

In mode 2 the MAP_NORESERVE flag is ignored.


How It Works
============

The overcommit is based on the following rules

For a file backed map
	| SHARED or READ-only - 0 cost (the file is the map not swap)
	| PRIVATE WRITABLE - size of mapping per instance

For an anonymous or ``/dev/zero`` map
	| SHARED - size of mapping
	| PRIVATE READ-only - 0 cost (but of little use)
	| PRIVATE WRITABLE - size of mapping per instance

Additional accounting
	| Pages made writable copies by mmap
	| shmfs memory drawn from the same pool

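The per-mapping rules above can be restated as a small decision function.
This is only an illustration of the table, not kernel code: the function
name and its boolean parameters are hypothetical, and the additional
accounting items (writable copies, shmfs) are not modelled.

```python
def commit_charge(size: int, file_backed: bool, shared: bool, writable: bool) -> int:
    """Return the commit cost of one mapping instance, following the rules above."""
    if file_backed:
        # SHARED or read-only file maps cost nothing: the file backs the pages.
        # Only PRIVATE WRITABLE file maps are charged their full size.
        return size if (not shared and writable) else 0
    # Anonymous or /dev/zero maps.
    if shared:
        return size
    # PRIVATE: charged only when writable.
    return size if writable else 0


# A private writable anonymous mapping of 1 MiB is charged in full.
print(commit_charge(1 << 20, file_backed=False, shared=False, writable=True))
```

Because private writable mappings are charged per instance, two processes
that each create such a mapping are charged twice.
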
Status
======

* We account mmap memory mappings
* We account mprotect changes in commit
* We account mremap changes in size
* We account brk
* We account munmap
* We report the commit status in /proc
* Account and check on fork
* Review stack handling/building on exec
* SHMfs accounting
* Implement actual limit enforcement

To Do
=====

* Account ptrace pages (this is hard)