============================
Subsystem Trace Points: kmem
============================

The kmem tracing system captures events related to object and page allocation
within the kernel. Broadly speaking, there are five major subheadings.

- Slab allocation of small objects of unknown type (kmalloc)
- Slab allocation of small objects of known type
- Page allocation
- Per-CPU Allocator Activity
- External Fragmentation

This document describes what each of the tracepoints is and why they
might be useful.

1. Slab allocation of small objects of unknown type
===================================================
::

  kmalloc call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmalloc_node call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kfree call_site=%lx ptr=%p

Heavy activity for these events may indicate that a specific cache is
justified, particularly if kmalloc slab pages are suffering significant
internal fragmentation as a result of the allocation pattern. By correlating
kmalloc with kfree, it may be possible to identify memory leaks and the
call sites that allocated them.
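
The correlation described above can be sketched in Python. This is a minimal, hypothetical helper and not part of any kernel tool. It assumes each trace record has already been reduced to the event name followed by the key=value fields shown above, with the ftrace task/CPU/timestamp prefix stripped. All addresses in the sample are invented. The sketch sums bytes_alloc minus bytes_req per call_site as a rough measure of internal fragmentation. It also keeps every pointer returned by kmalloc or kmalloc_node that has not yet been passed to kfree.

```python
from collections import defaultdict


def parse(line):
    """Split 'event k1=v1 k2=v2 ...' into (event, {k1: v1, ...})."""
    event, *pairs = line.split()
    return event, dict(pair.split("=", 1) for pair in pairs)


def analyse_kmalloc(lines):
    """Return (outstanding, waste) for a sequence of kmalloc/kfree records.

    outstanding: ptr -> (call_site, bytes_alloc) for objects not yet freed
    waste:       call_site -> total of bytes_alloc - bytes_req (slack bytes)
    """
    outstanding = {}
    waste = defaultdict(int)
    for line in lines:
        event, fields = parse(line)
        if event in ("kmalloc", "kmalloc_node"):
            requested = int(fields["bytes_req"])
            allocated = int(fields["bytes_alloc"])
            waste[fields["call_site"]] += allocated - requested
            outstanding[fields["ptr"]] = (fields["call_site"], allocated)
        elif event == "kfree":
            # Frees of pointers allocated before tracing started are ignored.
            outstanding.pop(fields["ptr"], None)
    return outstanding, dict(waste)


# Hypothetical sample records; all addresses are made up.
trace = [
    "kmalloc call_site=ffffffffa0001234 ptr=ffff888001000000 bytes_req=100 bytes_alloc=128 gfp_flags=GFP_KERNEL",
    "kmalloc call_site=ffffffffa0001234 ptr=ffff888001000080 bytes_req=100 bytes_alloc=128 gfp_flags=GFP_KERNEL",
    "kfree call_site=ffffffffa0005678 ptr=ffff888001000000",
]
outstanding, waste = analyse_kmalloc(trace)
print(outstanding)  # {'ffff888001000080': ('ffffffffa0001234', 128)}
print(waste)        # {'ffffffffa0001234': 56}
```

An entry left in outstanding is only a leak candidate. The object may still be legitimately in use when tracing stops.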

2. Slab allocation of small objects of known type
=================================================
::

  kmem_cache_alloc call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmem_cache_alloc_node call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kmem_cache_free call_site=%lx ptr=%p

These events are similar in usage to the kmalloc-related events except that
it is likely easier to pin the event down to a specific cache. At the time
of writing, no information is available on which slab cache is being allocated
from, but the call_site can usually be used to extrapolate that information.
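
One way to extrapolate the cache from call_site is to map each address back to the function containing it. The Python sketch below does a nearest-preceding-symbol lookup with bisect over a sorted (address, name) table. In practice such a table might be built from /proc/kallsyms. Here the table, the symbol names and the addresses are all hypothetical.

```python
import bisect


def build_symbol_table(entries):
    """Sort (start_address, name) pairs into parallel lists for bisect."""
    table = sorted(entries)
    return [addr for addr, _ in table], [name for _, name in table]


def resolve(call_site, addrs, names):
    """Map a hex call_site string to 'symbol+0xoffset'.

    Uses the closest symbol starting at or below the address; addresses
    below every known symbol are returned unchanged.
    """
    addr = int(call_site, 16)
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return call_site
    return f"{names[i]}+{addr - addrs[i]:#x}"


# Hypothetical symbols standing in for two callers of kmem_cache_alloc.
addrs, names = build_symbol_table([
    (0xffffffffa0200000, "example_rx_alloc"),
    (0xffffffffa0100000, "example_inode_alloc"),
])
print(resolve("ffffffffa0100042", addrs, names))  # example_inode_alloc+0x42
```

The lookup only names the calling function. Someone still has to read that function to see which cache it allocates from.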

3. Page allocation
==================
::

  mm_page_alloc page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s
  mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_free page=%p pfn=%lu order=%d
  mm_page_free_batched page=%p pfn=%lu order=%d cold=%d

These four events deal with page allocation and freeing. mm_page_alloc is
a simple indicator of page allocator activity. Pages may be allocated from
the per-CPU allocator (high performance) or the buddy allocator.

If pages are allocated directly from the buddy allocator, the
mm_page_alloc_zone_locked event is triggered. This event is important because
a high rate of these events implies heavy activity on the zone->lock. Taking
this lock impairs performance by disabling interrupts, dirtying cache lines
between CPUs and serialising many CPUs.

When a page is freed directly by the caller, only the mm_page_free event
is triggered. Significant amounts of activity here could indicate that the
callers should be batching their activities.

When pages are freed in batch, the mm_page_free_batched event is also
triggered. Broadly speaking, pages are taken off the LRU in bulk under the
LRU lock and freed in batch with a page list. Significant amounts of activity
here could indicate that the system is under memory pressure and can also
indicate contention on the zone->lru_lock.
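
According to the description above, mm_page_free is reported for every freed page, and mm_page_free_batched is reported in addition for pages freed in batch. The number of frees outside a batch is therefore the difference between the two counts. The Python sketch below relies on that reading and assumes records reduced to the event name followed by its fields. The sample records are invented.

```python
from collections import Counter


def free_breakdown(lines):
    """Return (unbatched, batched) page-free counts from event records.

    Assumes mm_page_free fires for every freed page and
    mm_page_free_batched fires additionally for batched frees.
    """
    counts = Counter(line.split(maxsplit=1)[0] for line in lines if line.strip())
    batched = counts["mm_page_free_batched"]
    unbatched = counts["mm_page_free"] - batched
    return unbatched, batched


# Hypothetical sample: one direct free, then two pages freed in a batch.
trace = [
    "mm_page_free page=ffffea0000040000 pfn=4096 order=0",
    "mm_page_free_batched page=ffffea0000040040 pfn=4097 order=0 cold=0",
    "mm_page_free page=ffffea0000040040 pfn=4097 order=0",
    "mm_page_free_batched page=ffffea0000040080 pfn=4098 order=0 cold=0",
    "mm_page_free page=ffffea0000040080 pfn=4098 order=0",
]
unbatched, batched = free_breakdown(trace)
print(unbatched, batched)  # 1 2
```

A high unbatched count may point at callers that could batch. A high batched rate may point at memory pressure, as the text notes.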

4. Per-CPU Allocator Activity
=============================
::

  mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_pcpu_drain page=%p pfn=%lu order=%d cpu=%d migratetype=%d

In front of the page allocator is a per-CPU page allocator. It exists only
for order-0 pages, reduces contention on the zone->lock and reduces the
amount of writing on struct page.

When a per-CPU list is empty or pages of the wrong type are allocated,
the zone->lock will be taken once and the per-CPU list refilled. An
mm_page_alloc_zone_locked event is triggered for each page allocated, with
its percpu_refill field indicating whether the page was taken as part of a
per-CPU refill.
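
Each mm_page_alloc_zone_locked record carries both cpu and percpu_refill. The records can therefore be split per CPU into refill allocations and pages taken straight from the buddy allocator. The minimal Python sketch below again assumes records reduced to the event name plus key=value fields. The sample data is made up.

```python
from collections import defaultdict


def zone_locked_by_cpu(lines):
    """Count mm_page_alloc_zone_locked records per CPU, split by percpu_refill."""
    stats = defaultdict(lambda: {"refill": 0, "direct": 0})
    for line in lines:
        event, *pairs = line.split()
        if event != "mm_page_alloc_zone_locked":
            continue
        fields = dict(pair.split("=", 1) for pair in pairs)
        kind = "refill" if int(fields["percpu_refill"]) else "direct"
        stats[int(fields["cpu"])][kind] += 1
    return dict(stats)


# Hypothetical sample: two refill pages on CPU 0, one order-2 buddy allocation on CPU 1.
trace = [
    "mm_page_alloc_zone_locked page=ffffea0000001000 pfn=64 order=0 migratetype=0 cpu=0 percpu_refill=1",
    "mm_page_alloc_zone_locked page=ffffea0000001040 pfn=65 order=0 migratetype=0 cpu=0 percpu_refill=1",
    "mm_page_alloc_zone_locked page=ffffea0000004000 pfn=256 order=2 migratetype=0 cpu=1 percpu_refill=0",
]
print(zone_locked_by_cpu(trace))
# {0: {'refill': 2, 'direct': 0}, 1: {'refill': 0, 'direct': 1}}
```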

When the per-CPU list is too full, a number of pages are freed, each of
which triggers an mm_page_pcpu_drain event.

The events are emitted per page so that individual pages can be tracked
between allocation and freeing. A run of consecutive drain or refill events
implies that the zone->lock was taken once. Large amounts of per-CPU refills
and drains could imply an imbalance between CPUs where too much work is being
concentrated in one place. They could also indicate that the per-CPU lists
should be larger. Finally, large amounts of refills on one CPU and drains on
another could be a factor in causing large amounts of cache line bounces due
to writes between CPUs. In that case it is worth investigating whether pages
can be allocated and freed on the same CPU through some algorithm change.
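
The observation that consecutive refill or drain records imply a single zone->lock acquisition can be turned into a rough estimate. The Python sketch below treats each maximal run of identical (kind, cpu) keys as one lock hold, and any other record ends the run. That grouping rule is an approximation assumed here, because records from other CPUs can interleave in a real trace. Counting runs per CPU also exposes the pattern of refills on one CPU and drains on another. The sample records are invented.

```python
from collections import Counter
from itertools import groupby


def run_key(line):
    """Classify a record as ('refill', cpu), ('drain', cpu) or None."""
    event, *pairs = line.split()
    fields = dict(pair.split("=", 1) for pair in pairs)
    if event == "mm_page_pcpu_drain":
        return ("drain", int(fields["cpu"]))
    if event == "mm_page_alloc_zone_locked" and fields.get("percpu_refill") == "1":
        return ("refill", int(fields["cpu"]))
    return None  # any other record ends the current run


def estimate_lock_holds(lines):
    """Count estimated zone->lock acquisitions per (kind, cpu)."""
    return Counter(key for key, _ in groupby(map(run_key, lines)) if key is not None)


trace = [
    "mm_page_alloc_zone_locked page=ffffea0000001000 pfn=64 order=0 migratetype=0 cpu=0 percpu_refill=1",
    "mm_page_alloc_zone_locked page=ffffea0000001040 pfn=65 order=0 migratetype=0 cpu=0 percpu_refill=1",
    "mm_page_alloc page=ffffea0000001000 pfn=64 order=0 migratetype=0 gfp_flags=GFP_KERNEL",
    "mm_page_pcpu_drain page=ffffea0000002000 pfn=128 order=0 cpu=1 migratetype=0",
    "mm_page_pcpu_drain page=ffffea0000002040 pfn=129 order=0 cpu=1 migratetype=0",
    "mm_page_alloc_zone_locked page=ffffea0000001080 pfn=66 order=0 migratetype=0 cpu=0 percpu_refill=1",
]
print(estimate_lock_holds(trace))
# Counter({('refill', 0): 2, ('drain', 1): 1})
```

In this made-up trace, CPU 0 only refills and CPU 1 only drains. That is the cross-CPU pattern the text suggests investigating.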

5. External Fragmentation
=========================
::

  mm_page_alloc_extfrag page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d

External fragmentation affects whether a high-order allocation will be
successful or not. For some types of hardware, this is important, although
high-order allocations are avoided where possible. If the system is using huge
pages and needs to be able to resize the pool over the lifetime of the system,
external fragmentation is also important.

A large number of these events implies that memory is fragmenting and
high-order allocations will start failing at some time in the future. One
means of reducing the occurrence of this event is to increase the size of
min_free_kbytes in increments of 3*pageblock_size*nr_online_nodes where
pageblock_size is usually the default huge page size.
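
As a worked example of the increment above, the Python below computes 3 * pageblock_size * nr_online_nodes in kilobytes. The inputs are assumptions for illustration: a 2 MiB pageblock (the default huge page size on x86-64 with 4 KiB base pages) and a two-node machine. The current setting can be read from /proc/sys/vm/min_free_kbytes.

```python
def min_free_kbytes_step(pageblock_bytes, nr_online_nodes):
    """Suggested min_free_kbytes increment, 3 * pageblock_size * nr_online_nodes, in kB."""
    return 3 * pageblock_bytes * nr_online_nodes // 1024


# Assumed example system: 2 MiB pageblocks, two online NUMA nodes.
step = min_free_kbytes_step(2 * 1024 * 1024, 2)
print(step)  # 12288
```

On that hypothetical system, each step raises min_free_kbytes by 12288 kB (12 MiB).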