============================
Subsystem Trace Points: kmem
============================

The kmem tracing system captures events related to object and page allocation
within the kernel. Broadly speaking there are five major subheadings.

  - Slab allocation of small objects of unknown type (kmalloc)
  - Slab allocation of small objects of known type
  - Page allocation
  - Per-CPU Allocator Activity
  - External Fragmentation

This document describes what each of the tracepoints is and why they
might be useful.
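
The kmem tracepoints exposed by a running kernel can be listed through
tracefs. A minimal sketch, assuming tracefs is mounted at
/sys/kernel/tracing (older setups use /sys/kernel/debug/tracing)::

  # List the kmem trace events this kernel provides.
  ls /sys/kernel/tracing/events/kmem/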

1. Slab allocation of small objects of unknown type
===================================================
::

  kmalloc		call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmalloc_node	call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kfree		call_site=%lx ptr=%p

Heavy activity for these events may indicate that a dedicated cache is
justified, particularly if kmalloc slab pages are becoming significantly
internally fragmented as a result of the allocation pattern. By correlating
kmalloc with kfree, it may be possible to identify memory leaks and the
allocation sites responsible for them.
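
As a sketch of that correlation, the following assumes tracefs is mounted at
/sys/kernel/tracing, a root shell, and a workload of interest running while
the events are enabled. It pairs each kmalloc with a later kfree of the same
ptr and prints the call sites whose allocations were never freed::

  cd /sys/kernel/tracing
  echo > trace                     # clear the trace buffer
  echo 1 > events/kmem/kmalloc/enable
  echo 1 > events/kmem/kfree/enable
  sleep 10                         # run the workload of interest meanwhile
  echo 0 > events/kmem/kmalloc/enable
  echo 0 > events/kmem/kfree/enable

  # Pair kmalloc/kfree by ptr; anything left over was not freed while tracing.
  awk '/ kmalloc: / { match($0, /ptr=[0-9a-fx]+/); p = substr($0, RSTART+4, RLENGTH-4);
                      match($0, /call_site=[0-9a-f]+/); site[p] = substr($0, RSTART+10, RLENGTH-10) }
       / kfree: /   { match($0, /ptr=[0-9a-fx]+/); delete site[substr($0, RSTART+4, RLENGTH-4)] }
       END          { for (p in site) print site[p] }' trace | sort | uniq -c | sort -rn | head

The ptr values printed by %p are hashed on recent kernels, but the hash is
stable within a boot, so the pairing still works.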

2. Slab allocation of small objects of known type
=================================================
::

  kmem_cache_alloc	call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
  kmem_cache_alloc_node	call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
  kmem_cache_free		call_site=%lx ptr=%p

These events are similar in usage to the kmalloc-related events except that
it is likely easier to pin the event down to a specific cache. At the time
of writing, no information is available on what slab is being allocated from,
but the call_site can usually be used to extrapolate that information.
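
For example, counting events per call_site quickly shows which callers
dominate the allocation activity. This is only a sketch, again assuming
tracefs mounted at /sys/kernel/tracing and a root shell::

  cd /sys/kernel/tracing
  echo > trace
  echo 1 > events/kmem/kmem_cache_alloc/enable
  sleep 10
  echo 0 > events/kmem/kmem_cache_alloc/enable
  # Rank call sites by how many allocations they performed.
  grep -o 'call_site=[0-9a-f]*' trace | sort | uniq -c | sort -rn | head

The resulting addresses can then be translated to function names using
/proc/kallsyms or addr2line against a vmlinux with debug info.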

3. Page allocation
==================
::

  mm_page_alloc		  page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s
  mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_free		  page=%p pfn=%lu order=%d
  mm_page_free_batched	  page=%p pfn=%lu order=%d cold=%d

These four events deal with page allocation and freeing. mm_page_alloc is
a simple indicator of page allocator activity. Pages may be allocated from
the per-CPU allocator (high performance) or the buddy allocator.

If pages are allocated directly from the buddy allocator, the
mm_page_alloc_zone_locked event is triggered. This event is important as high
amounts of activity imply high activity on the zone->lock. Taking this lock
impairs performance by disabling interrupts, dirtying cache lines between
CPUs and serialising many CPUs.

When a page is freed directly by the caller, only the mm_page_free event
is triggered. Significant amounts of activity here could indicate that the
callers should be batching their activities.

When pages are freed in batch, the mm_page_free_batched event is also
triggered. Broadly speaking, pages are taken off the LRU list in bulk and
freed in batch with a page list. Significant amounts of activity here could
indicate that the system is under memory pressure and can also indicate
contention on the lruvec->lru_lock.
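
One way to get a feel for how much allocator traffic takes the slower buddy
path is to histogram these events by allocation order. This sketch assumes a
kernel built with CONFIG_HIST_TRIGGERS and tracefs at /sys/kernel/tracing::

  cd /sys/kernel/tracing
  # Accumulate a histogram of allocation orders for ten seconds.
  echo 'hist:keys=order:sort=hitcount' > events/kmem/mm_page_alloc/trigger
  sleep 10
  cat events/kmem/mm_page_alloc/hist
  # Remove the trigger again.
  echo '!hist:keys=order:sort=hitcount' > events/kmem/mm_page_alloc/trigger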

4. Per-CPU Allocator Activity
=============================
::

  mm_page_alloc_zone_locked	page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
  mm_page_pcpu_drain		page=%p pfn=%lu order=%d cpu=%d migratetype=%d

In front of the page allocator is a per-cpu page allocator. It exists only
for order-0 pages, reduces contention on the zone->lock and reduces the
amount of writing on struct page.

When a per-CPU list is empty or pages of the wrong type are allocated,
the zone->lock will be taken once and the per-CPU list refilled. The
mm_page_alloc_zone_locked event is triggered for each page allocated, with
the event indicating whether it is for a percpu_refill or not.

When the per-CPU list is too full, a number of pages are freed, each of
which triggers a mm_page_pcpu_drain event.

The events are emitted per page so that individual pages can be tracked
between allocation and freeing. A run of consecutive drain or refill events
implies that the zone->lock was taken only once. Large amounts of per-CPU
refills and drains could imply an imbalance between CPUs where too much work
is being concentrated in one place. It could also indicate that the per-CPU
lists should be a larger size. Finally, large amounts of refills on one CPU
and drains on another could be a factor in causing large amounts of cache
line bounces due to writes between CPUs; it is worth investigating whether
pages can be allocated and freed on the same CPU through some algorithm
change.
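
A rough check for such an imbalance is to count refills and drains per CPU
using the cpu= field of each event. This is a sketch under the same tracefs
assumptions as above::

  cd /sys/kernel/tracing
  echo > trace
  echo 1 > events/kmem/mm_page_alloc_zone_locked/enable
  echo 1 > events/kmem/mm_page_pcpu_drain/enable
  sleep 10
  echo 0 > events/kmem/mm_page_alloc_zone_locked/enable
  echo 0 > events/kmem/mm_page_pcpu_drain/enable
  # Tally refills and drains per CPU.
  awk '/ mm_page_alloc_zone_locked: .*percpu_refill=1/ { match($0, /cpu=[0-9]+/);
           refill[substr($0, RSTART+4, RLENGTH-4)]++ }
       / mm_page_pcpu_drain: / { match($0, /cpu=[0-9]+/);
           drain[substr($0, RSTART+4, RLENGTH-4)]++ }
       END { for (c in refill) print "cpu", c, "refills", refill[c], "drains", drain[c]+0 }' trace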

5. External Fragmentation
=========================
::

  mm_page_alloc_extfrag		page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d

External fragmentation affects whether a high-order allocation will be
successful or not. For some types of hardware, this is important although
it is avoided where possible. If the system is using huge pages and needs
to be able to resize the pool over the lifetime of the system, this value
is important.

Large numbers of this event imply that memory is fragmenting and
high-order allocations will start failing at some time in the future. One
means of reducing the occurrence of this event is to increase the size of
min_free_kbytes in increments of 3*pageblock_size*nr_online_nodes, where
pageblock_size is usually the default hugepage size.
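
As a worked example of that increment, assume a 2MB default hugepage size
(a pageblock_size of 2048 kB) and a hypothetical two-node machine; the
numbers are illustrative only::

  # increment = 3 * pageblock_size_in_kB * nr_online_nodes
  #           = 3 * 2048 * 2
  #           = 12288 kB
  cur=$(cat /proc/sys/vm/min_free_kbytes)
  echo $((cur + 12288)) > /proc/sys/vm/min_free_kbytes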