.. _overcommit_accounting:

=====================
Overcommit Accounting
=====================

The Linux kernel supports the following overcommit handling modes:

0
        Heuristic overcommit handling. Obvious overcommits of address
        space are refused. Used for a typical system. It ensures a
        seriously wild allocation fails while allowing overcommit to
        reduce swap usage. root is allowed to allocate slightly more
        memory in this mode. This is the default.

1
        Always overcommit. Appropriate for some scientific
        applications. The classic example is code that uses sparse
        arrays and relies on the virtual memory consisting almost
        entirely of zero pages.

2
        Don't overcommit. The total address space commit for the
        system is not permitted to exceed swap + a configurable amount
        (default is 50%) of physical RAM. Depending on the amount you
        use, in most situations this means a process will not be
        killed while accessing pages but will receive errors on memory
        allocation as appropriate.

        Useful for applications that want to guarantee their memory
        allocations will be available in the future without having to
        initialize every page.

The overcommit policy is set via the sysctl ``vm.overcommit_memory``.

The overcommit amount can be set via ``vm.overcommit_ratio`` (percentage)
or ``vm.overcommit_kbytes`` (absolute value).

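As a rough illustration of the mode 2 limit, the arithmetic can be sketched
as below. The function name ``commit_limit_kb`` and its parameters are
hypothetical, chosen only for this example. The sketch assumes that a
non-zero ``vm.overcommit_kbytes`` takes the place of the percentage, and it
ignores huge pages, which the kernel leaves out of the RAM term.

```python
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=50, overcommit_kbytes=0):
    """Approximate the mode 2 commit limit in kB.

    Simplification (assumption): huge pages are ignored here.
    """
    if overcommit_kbytes > 0:
        # An absolute allowance takes the place of the percentage.
        ram_allowance_kb = overcommit_kbytes
    else:
        # Percentage of physical RAM, rounded down.
        ram_allowance_kb = ram_kb * overcommit_ratio // 100
    return swap_kb + ram_allowance_kb


# Hypothetical machine: 8 GiB of RAM, 2 GiB of swap, default ratio of 50%.
print(commit_limit_kb(8 * 1024 * 1024, 2 * 1024 * 1024))  # 6291456 kB (6 GiB)
```

With the default ratio, half of RAM plus all of swap may be committed, so
this hypothetical machine refuses commitments beyond 6 GiB.
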
The current overcommit limit and amount committed are viewable in
``/proc/meminfo`` as CommitLimit and Committed_AS respectively.

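A minimal sketch of pulling those two fields out of ``/proc/meminfo`` text
follows. The helper name ``commit_status`` and the sample values are made up
for illustration. The sketch assumes the usual ``Name:   value kB`` layout of
that file.

```python
def commit_status(meminfo_text):
    """Return (CommitLimit, Committed_AS) in kB from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            # Values look like "   6291456 kB"; keep the leading number.
            fields[name.strip()] = int(rest.split()[0])
    return fields["CommitLimit"], fields["Committed_AS"]


# Made-up sample in the layout of /proc/meminfo.
sample = """\
MemTotal:        8388608 kB
CommitLimit:     6291456 kB
Committed_AS:    1234567 kB
"""
print(commit_status(sample))  # (6291456, 1234567)
```

On a running system the same helper can be fed the real file, for example
``commit_status(open("/proc/meminfo").read())``.
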
Gotchas
=======

The C language stack growth does an implicit mremap. If you want absolute
guarantees and run close to the edge, you MUST mmap your stack for the
largest size you think you will need. For typical stack usage this does
not matter much, but it is a corner case if you really care.

In mode 2 the MAP_NORESERVE flag is ignored.


How It Works
============

Overcommit accounting is based on the following rules:

For a file-backed map
        | SHARED or READ-only   -       0 cost (the file is the map not swap)
        | PRIVATE WRITABLE      -       size of mapping per instance

For an anonymous or ``/dev/zero`` map
        | SHARED                -       size of mapping
        | PRIVATE READ-only     -       0 cost (but of little use)
        | PRIVATE WRITABLE      -       size of mapping per instance

Additional accounting
        | Pages turned into writable copies by mmap
        | shmfs memory drawn from the same pool

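The per-mapping rules above can be restated as a small function. This is
only a reading aid and not the kernel's code: ``commit_cost`` and its
boolean parameters are hypothetical names, and the "additional accounting"
items are not modelled.

```python
def commit_cost(size, file_backed, shared, writable):
    """Bytes charged against the commit limit for one mapping instance."""
    if file_backed:
        # Shared or read-only file maps are backed by the file, not swap.
        if shared or not writable:
            return 0
        return size
    # Anonymous or /dev/zero maps: shared ones are always charged.
    if shared:
        return size
    # Private anonymous maps are charged only when writable.
    return size if writable else 0


MiB = 1 << 20
print(commit_cost(4 * MiB, file_backed=True, shared=True, writable=True))     # 0
print(commit_cost(4 * MiB, file_backed=True, shared=False, writable=True))    # 4194304
print(commit_cost(4 * MiB, file_backed=False, shared=False, writable=False))  # 0
```
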
Status
======

* We account mmap memory mappings
* We account mprotect changes in commit
* We account mremap changes in size
* We account brk
* We account munmap
* We report the commit status in /proc
* Account and check on fork
* Review stack handling/building on exec
* SHMfs accounting
* Implement actual limit enforcement

To Do
=====
* Account ptrace pages (this is hard)