Mirror of https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stable.git, synced 2025-11-04 07:44:51 +10:00

Delay accounting does not track the delay of IRQ/SOFTIRQ, even though
IRQ/SOFTIRQ can have an obvious impact on the productivity of some
workloads, such as workloads running on a system that is busy handling
network IRQ/SOFTIRQ.

Getting the delay of IRQ/SOFTIRQ could help users to reduce such delay, for
example by setting interrupt affinity or task affinity, or by using kernel
threads for NAPI.  This is inspired by "sched/psi: Add PSI_IRQ to track
IRQ/SOFTIRQ pressure"[1].  Also fix some code indentation problems in older
code.

Also update tools/accounting/getdelays.c:
    / # ./getdelays -p 156 -di
    print delayacct stats ON
    printing IO accounting
    PID     156
    CPU             count     real total  virtual total    delay total  delay average
                       15       15836008       16218149      275700790         18.380ms
    IO              count    delay total  delay average
                        0              0          0.000ms
    SWAP            count    delay total  delay average
                        0              0          0.000ms
    RECLAIM         count    delay total  delay average
                        0              0          0.000ms
    THRASHING       count    delay total  delay average
                        0              0          0.000ms
    COMPACT         count    delay total  delay average
                        0              0          0.000ms
    WPCOPY          count    delay total  delay average
                       36        7586118          0.211ms
    IRQ             count    delay total  delay average
                       42         929161          0.022ms

[1] commit 52b1364ba0b1 ("sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure")
Link: https://lkml.kernel.org/r/202304081728353557233@zte.com.cn
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: Jiang Xuexin <jiang.xuexin@zte.com.cn>
Cc: wangyong <wang.yong12@zte.com.cn>
Cc: junhua huang <huang.junhua@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

================
Delay accounting
================

Tasks encounter delays in execution when they wait
for some kernel resource to become available e.g. a
runnable task may wait for a free CPU to run on.

The per-task delay accounting functionality measures
the delays experienced by a task while

a) waiting for a CPU (while being runnable)
b) waiting for completion of synchronous block I/O initiated by the task
c) swapping in pages
d) memory reclaim
e) thrashing
f) direct compaction
g) write-protect copy
h) IRQ/SOFTIRQ

and makes these statistics available to userspace through
the taskstats interface.

Such delays provide feedback for setting a task's CPU priority,
I/O priority and RSS limit values appropriately. Long delays for
important tasks could be a trigger for raising their corresponding priority.

The functionality, through its use of the taskstats interface, also provides
delay statistics aggregated for all tasks (or threads) belonging to a
thread group (corresponding to a traditional Unix process). This is a commonly
needed aggregation that is more efficiently done by the kernel.

Userspace utilities, particularly resource management applications, can also
aggregate delay statistics into arbitrary groups. To enable this, delay
statistics of a task are available both during its lifetime and on its
exit, ensuring that continuous and complete monitoring can be done.

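As a minimal sketch of such userspace aggregation, the Python snippet below
sums per-task delay counters into caller-defined groups. The record layout
(one dict per task, with counters named after the cpu_delay_total and
blkio_delay_total fields of struct taskstats) and the group_of mapping are
assumptions made for illustration; fetching the records through taskstats is
not shown.

```python
from collections import defaultdict

def aggregate_delays(records, group_of):
    """Sum delay counters (ns) of per-task records into arbitrary groups.

    records:  iterable of dicts, each with a "pid" key plus counter fields.
    group_of: hypothetical callable mapping a pid to a group label.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for rec in records:
        group = group_of(rec["pid"])
        for field, value in rec.items():
            if field != "pid":
                totals[group][field] += value
    # Convert to plain dicts so the result compares and prints cleanly.
    return {group: dict(fields) for group, fields in totals.items()}

# Example with made-up numbers: pids 10 and 11 form one group, 20 another.
records = [
    {"pid": 10, "cpu_delay_total": 300, "blkio_delay_total": 50},
    {"pid": 11, "cpu_delay_total": 200, "blkio_delay_total": 0},
    {"pid": 20, "cpu_delay_total": 70, "blkio_delay_total": 5},
]
groups = {10: "web", 11: "web", 20: "batch"}
result = aggregate_delays(records, groups.get)
```

Because the grouping is just a function of the pid, the same helper could
group by user, container or any other label the monitoring tool maintains.
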
Interface
---------

Delay accounting uses the taskstats interface, which is described
in detail in a separate document in this directory. Taskstats returns a
generic data structure to userspace corresponding to per-pid and per-tgid
statistics. The delay accounting functionality populates specific fields of
this structure. See

     include/uapi/linux/taskstats.h

for a description of the fields pertaining to delay accounting.
These fields are generally counters returning the cumulative
delay seen for CPU, sync block I/O, swapin, memory reclaim, page cache
thrashing, direct compaction, write-protect copy, IRQ/SOFTIRQ etc.

Taking the difference of two successive readings of a given
counter (say cpu_delay_total) for a task will give the delay
experienced by the task waiting for the corresponding resource
in that interval.

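The calculation above can be sketched in Python as follows. The snippet
assumes the two readings have already been fetched (for example via
taskstats) and are represented as dicts keyed by counter names from
struct taskstats, with delay totals in nanoseconds. The dict representation
and the helper names are illustrative and not part of any kernel interface.

```python
def interval_delay(earlier, later, field="cpu_delay_total"):
    """Return the delay (ns) accumulated in `field` between two readings."""
    delta = later[field] - earlier[field]
    if delta < 0:
        # A cumulative counter should not go backwards; a negative delta
        # suggests swapped readings, readings from different tasks, or a
        # counter that has wrapped around.
        raise ValueError("later reading is smaller than earlier reading")
    return delta

def average_delay_ms(delay_total_ns, count):
    """Average delay per event in milliseconds (0.0 when count is zero)."""
    return delay_total_ns / 1_000_000 / count if count else 0.0

# Made-up readings taken some interval apart.
t0 = {"cpu_count": 8, "cpu_delay_total": 3_382_277}
t1 = {"cpu_count": 15, "cpu_delay_total": 5_000_000}
waited_ns = interval_delay(t0, t1)
```

Dividing a delay total by the matching count, as average_delay_ms() does,
gives a per-event average in milliseconds comparable to the "delay average"
column printed by getdelays.
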
When a task exits, records containing the per-task statistics
are sent to userspace without requiring a command. If it is the last exiting
task of a thread group, the per-tgid statistics are also sent. More details
are given in the taskstats interface description.

The getdelays.c userspace utility in the tools/accounting directory allows simple
commands to be run and the corresponding delay statistics to be displayed. It
also serves as an example of using the taskstats interface.

Usage
-----

Compile the kernel with::

	CONFIG_TASK_DELAY_ACCT=y
	CONFIG_TASKSTATS=y

Delay accounting is disabled by default at boot up.
To enable it, add::

   delayacct

to the kernel boot options. The rest of the instructions below assume this has
been done. Alternatively, use the sysctl kernel.task_delayacct to switch the state
at runtime. Note, however, that only tasks started after enabling it will have
delayacct information.

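One way to check the runtime state from a script is to read the sysctl
through procfs. The sketch below assumes the usual mapping of
kernel.task_delayacct to /proc/sys/kernel/task_delayacct and that any value
other than 0 means enabled; the function name is made up for this example.

```python
from pathlib import Path

SYSCTL_PATH = "/proc/sys/kernel/task_delayacct"

def delayacct_enabled(path=SYSCTL_PATH):
    """Return True/False for the sysctl state, or None if it is unavailable."""
    try:
        text = Path(path).read_text()
    except OSError:
        # File missing (e.g. a kernel without this sysctl) or not readable.
        return None
    return text.strip() != "0"

state = delayacct_enabled()
print({True: "enabled", False: "disabled", None: "unavailable"}[state])
```

The path is a parameter so the helper can be exercised against a temporary
file on systems where the sysctl does not exist.
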
After the system has booted up, use a utility
similar to getdelays.c to access the delays
seen by a given task or a task group (tgid).
The utility also allows a given command to be
executed and the corresponding delays to be
seen.

General format of the getdelays command::

	getdelays [-dilv] [-t tgid] [-p pid]

Get delays, since system boot, for pid 10::

	# ./getdelays -d -p 10
	(output similar to next case)

Get sum of delays, since system boot, for all pids with tgid 5::

	# ./getdelays -d -t 5
	print delayacct stats ON
	TGID	5


	CPU             count     real total  virtual total    delay total  delay average
	                    8        7000000        6872122        3382277          0.423ms
	IO              count    delay total  delay average
	                    0              0          0.000ms
	SWAP            count    delay total  delay average
	                    0              0          0.000ms
	RECLAIM         count    delay total  delay average
	                    0              0          0.000ms
	THRASHING       count    delay total  delay average
	                    0              0          0.000ms
	COMPACT         count    delay total  delay average
	                    0              0          0.000ms
	WPCOPY          count    delay total  delay average
	                    0              0          0.000ms
	IRQ             count    delay total  delay average
	                    0              0          0.000ms

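For scripted post-processing, output in the format shown above can be parsed
with a few lines of Python. This is a rough sketch that assumes each category
is printed as a header line followed by one line of values, as in the sample;
the parser and its name are hypothetical, and the layout printed by getdelays
may differ between kernel versions.

```python
def parse_getdelays(text):
    """Parse `getdelays -d` style output into {category: {column: value}}."""
    stats = {}
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    for header, values in zip(lines, lines[1:]):
        parts = header.split()
        # A category header looks like "IO count delay total delay average".
        if len(parts) < 2 or parts[1] != "count":
            continue
        # Column names contain spaces ("delay total"), so pick values by
        # position: the first is the count, the last two are the delay
        # total and the delay average.
        fields = values.split()
        stats[parts[0]] = {
            "count": int(fields[0]),
            "delay_total": int(fields[-2]),
            "delay_average_ms": float(fields[-1].replace("ms", "")),
        }
    return stats

sample = """\
print delayacct stats ON
TGID    5


CPU             count     real total  virtual total    delay total  delay average
                    8        7000000        6872122        3382277          0.423ms
IO              count    delay total  delay average
                    0              0          0.000ms
"""
stats = parse_getdelays(sample)
```

Here stats["CPU"]["delay_total"] holds the cumulative CPU delay from the
sample, which can then be fed into further calculations.
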
Get IO accounting for pid 1 (this works only with -p)::

	# ./getdelays -i -p 1
	printing IO accounting
	linuxrc: read=65536, write=0, cancelled_write=0

The above command can be used with -v to get more debug information.