Mike Rapoport | 2fcbc41 | 2018-03-21 21:22:27 +0200 | [diff] [blame] | 1 | .. _ksm: |
| 2 | |
| 3 | ======================= |
| 4 | Kernel Samepage Merging |
| 5 | ======================= |
Hugh Dickins | 7701c9c | 2009-09-21 17:02:24 -0700 | [diff] [blame] | 6 | |
| 7 | KSM is a memory-saving de-duplication feature, enabled by CONFIG_KSM=y, |
Mike Rapoport | 2fcbc41 | 2018-03-21 21:22:27 +0200 | [diff] [blame] | 8 | added to the Linux kernel in 2.6.32. See ``mm/ksm.c`` for its implementation, |
Alexander A. Klimov | 93431e0 | 2020-05-26 08:05:44 +0200 | [diff] [blame] | 9 | and http://lwn.net/Articles/306704/ and https://lwn.net/Articles/330589/ |
Hugh Dickins | 7701c9c | 2009-09-21 17:02:24 -0700 | [diff] [blame] | 10 | |
Mike Rapoport | c916108 | 2018-04-24 09:40:28 +0300 | [diff] [blame] | 11 | The userspace interface of KSM is described in :ref:`Documentation/admin-guide/mm/ksm.rst <admin_guide_ksm>` |
Andrea Arcangeli | 2c653d0 | 2017-07-06 15:36:55 -0700 | [diff] [blame] | 12 | |
Mike Rapoport | 064fca3 | 2018-04-24 09:40:24 +0300 | [diff] [blame] | 13 | Design |
| 14 | ====== |
| 15 | |
| 16 | Overview |
| 17 | -------- |
| 18 | |
| 19 | .. kernel-doc:: mm/ksm.c |
| 20 | :DOC: Overview |
| 21 | |
| 22 | Reverse mapping |
| 23 | --------------- |
| 24 | KSM maintains reverse mapping information for KSM pages in the stable |
| 25 | tree. |
| 26 | |
| 27 | If a KSM page is shared between less than ``max_page_sharing`` VMAs, |
| 28 | the node of the stable tree that represents such KSM page points to a |
Mauro Carvalho Chehab | 9303c9d | 2020-09-25 12:01:25 +0200 | [diff] [blame] | 29 | list of struct rmap_item and the ``page->mapping`` of the |
Mike Rapoport | 064fca3 | 2018-04-24 09:40:24 +0300 | [diff] [blame] | 30 | KSM page points to the stable tree node. |
| 31 | |
| 32 | When the sharing passes this threshold, KSM adds a second dimension to |
| 33 | the stable tree. The tree node becomes a "chain" that links one or |
| 34 | more "dups". Each "dup" keeps reverse mapping information for a KSM |
| 35 | page with ``page->mapping`` pointing to that "dup". |
| 36 | |
| 37 | Every "chain" and all "dups" linked into a "chain" enforce the |
| 38 | invariant that they represent the same write protected memory content, |
| 39 | even if each "dup" will be pointed by a different KSM page copy of |
| 40 | that content. |
| 41 | |
| 42 | This way the stable tree lookup computational complexity is unaffected |
| 43 | if compared to an unlimited list of reverse mappings. It is still |
| 44 | enforced that there cannot be KSM page content duplicates in the |
| 45 | stable tree itself. |
| 46 | |
Mike Rapoport | 6570c78 | 2018-04-24 09:40:25 +0300 | [diff] [blame] | 47 | The deduplication limit enforced by ``max_page_sharing`` is required |
| 48 | to avoid the virtual memory rmap lists to grow too large. The rmap |
| 49 | walk has O(N) complexity where N is the number of rmap_items |
| 50 | (i.e. virtual mappings) that are sharing the page, which is in turn |
| 51 | capped by ``max_page_sharing``. So this effectively spreads the linear |
| 52 | O(N) computational complexity from rmap walk context over different |
| 53 | KSM pages. The ksmd walk over the stable_node "chains" is also O(N), |
| 54 | but N is the number of stable_node "dups", not the number of |
| 55 | rmap_items, so it has not a significant impact on ksmd performance. In |
| 56 | practice the best stable_node "dup" candidate will be kept and found |
| 57 | at the head of the "dups" list. |
| 58 | |
| 59 | High values of ``max_page_sharing`` result in faster memory merging |
| 60 | (because there will be fewer stable_node dups queued into the |
| 61 | stable_node chain->hlist to check for pruning) and higher |
| 62 | deduplication factor at the expense of slower worst case for rmap |
| 63 | walks for any KSM page which can happen during swapping, compaction, |
| 64 | NUMA balancing and page migration. |
| 65 | |
Mike Rapoport | 8b898fd | 2018-04-24 09:40:27 +0300 | [diff] [blame] | 66 | The ``stable_node_dups/stable_node_chains`` ratio is also affected by the |
| 67 | ``max_page_sharing`` tunable, and an high ratio may indicate fragmentation |
| 68 | in the stable_node dups, which could be solved by introducing |
| 69 | fragmentation algorithms in ksmd which would refile rmap_items from |
| 70 | one stable_node dup to another stable_node dup, in order to free up |
| 71 | stable_node "dups" with few rmap_items in them, but that may increase |
| 72 | the ksmd CPU usage and possibly slowdown the readonly computations on |
| 73 | the KSM pages of the applications. |
| 74 | |
Mike Rapoport | 2a695ca | 2018-04-24 09:40:26 +0300 | [diff] [blame] | 75 | The whole list of stable_node "dups" linked in the stable_node |
| 76 | "chains" is scanned periodically in order to prune stale stable_nodes. |
| 77 | The frequency of such scans is defined by |
| 78 | ``stable_node_chains_prune_millisecs`` sysfs tunable. |
| 79 | |
Mike Rapoport | 064fca3 | 2018-04-24 09:40:24 +0300 | [diff] [blame] | 80 | Reference |
| 81 | --------- |
| 82 | .. kernel-doc:: mm/ksm.c |
| 83 | :functions: mm_slot ksm_scan stable_node rmap_item |
| 84 | |
Mike Rapoport | db12c00 | 2018-04-24 09:40:23 +0300 | [diff] [blame] | 85 | -- |
Hugh Dickins | 7701c9c | 2009-09-21 17:02:24 -0700 | [diff] [blame] | 86 | Izik Eidus, |
Hugh Dickins | d0f209f | 2009-12-14 17:59:34 -0800 | [diff] [blame] | 87 | Hugh Dickins, 17 Nov 2009 |