CPU
LOAD AVERAGE:
uptime
18:01pm up 217 days, 22:40, 0 users, load average: 8.78, 8.75, 8.82
uptime shows the load average over the last one, five, and fifteen minutes.
In general, each process that is running, waiting for CPU, or waiting on I/O adds one to the load average. These figures are sampled and then averaged over time.
The load average corresponds to the "r" column of vmstat, which is the number of runnable kernel threads. Compare it to the actual number of logical CPUs to judge whether the CPUs can service those threads.
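The comparison with the number of logical CPUs can be sketched as a small helper. This is only an illustration: load_per_cpu is a hypothetical name, and the CPU count is passed in by hand (for example the lcpu value that sar prints in its "System configuration" line). The value 4 below is an assumption, not taken from the uptime output.

```shell
# Sketch: divide the 1-minute load average by the number of logical CPUs.
# load_per_cpu is a hypothetical helper, not an AIX command.
load_per_cpu() {
    # $1 = one line of uptime output, $2 = number of logical CPUs (assumed known)
    printf '%s\n' "$1" | awk -v ncpu="$2" -F'load average: ' '
        { split($2, avg, ", "); printf "%.2f\n", avg[1] / ncpu }'
}

# Sample line from above, with an assumed 4 logical CPUs
load_per_cpu "18:01pm up 217 days, 22:40, 0 users, load average: 8.78, 8.75, 8.82" 4
```

A result well above 1 suggests that there are more runnable threads than logical CPUs to service them.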
----------------------------------
ps aux <--shows CPU and memory usage of processes
root@aix31: / # ps aux
USER PID %CPU %MEM SZ RSS TTY STAT STIME TIME COMMAND
root 61470 8.2 0.0 384 384 - A Aug 27 4711:53 wait <-this wait process is assigned to the CPU if the system is idle
oracle 594004 1.0 0.0 29544 30180 - A Aug 27 554:03 /u02/app/oracle
%CPU: The percentage of time the process has used the CPU. It is the average CPU utilization of the process since it was first created, not its current usage.
(If a process consumes 100% for 5 seconds, then sleeps for 5 seconds, ps reports 50% at the end of the 10 seconds.)
(This can be misleading, because at that moment the process is not actually using any CPU time.)
The figure can also be misleading because it is based on the time accumulated since the process was started.
(A process that was started long ago can show a high number.)
%MEM: The percentage of real memory used by this process
wait: In AIX the CPU must always be doing work. If the system is idle, the wait process is executed.
kproc: an idle kernel process that the operating system creates to keep the processors occupied while there is no real CPU demand.
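The %CPU arithmetic described above (CPU time divided by the lifetime of the process) can be written out as a short sketch. avg_cpu_pct is a hypothetical helper, and both arguments are supplied by hand rather than read from ps.

```shell
# Sketch: average CPU utilization over the whole life of a process.
# avg_cpu_pct is a hypothetical helper, not part of ps.
avg_cpu_pct() {
    # $1 = CPU seconds consumed, $2 = seconds since the process was created
    awk -v cpu="$1" -v life="$2" 'BEGIN { printf "%.1f\n", cpu / life * 100 }'
}

avg_cpu_pct 5 10    # busy for 5 s, then asleep for 5 s: prints 50.0
```

The same 50.0 would be reported whether the 5 busy seconds came first or last, which is why the column says little about what the process is doing right now.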
----------------------------------
ps aux | head -1; ps aux | sort -rn +2 | head -20 <--top CPU processes
ps -elf <--shows priority of processes (PRI) (60 is the normal value; keep an eye on values lower than 60, which mean higher priority)
ps -efk | grep wait <--k: also shows kernel processes
ps -fp <pid> <--check the TIME field; if it stays constant over time, a deadlock or hang has probably occurred
----------------------------------
Check the accumulated system time for key processes:
root@aix10: /root # ps -ekf | grep -v grep | egrep "syncd|lrud|nfsd|biod|wait"
root 8196 0 0 Mar 24 - 1082:56 wait
root 16392 0 0 Mar 24 - 369:35 lrud
root 49176 0 0 Mar 24 - 4973:24 wait
root 53274 0 0 Mar 24 - 498:10 wait
root 57372 0 0 Mar 24 - 3135:26 wait
root 61470 0 0 Mar 24 - 358:20 wait
root 123092 0 0 Mar 24 - 0:52 kbiod
root 147640 1 0 Mar 24 - 267:27 /usr/sbin/syncd 60
root 2560068 229562 0 May 06 - 0:00 /usr/sbin/biod 6
root 3133690 229562 0 May 06 - 0:00 /usr/sbin/nfsd 3891
root 3420312 1 0 May 06 - 5:18 nfsd
wait: every CPU has an assigned wait process, which runs when the CPU has nothing else to do. (It is a kernel process mapped 1 to 1 to a CPU.)
You can check how much time these processes have accumulated (for example, if nfsd has a high time value, it probably needs some reconfiguration).
Compare the system time of lrud against syncd: if lrud is greater than syncd, it should get your attention. lrud is a fixed-priority kernel process with process priority 16. Once lrud is running, not much else is running.
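The lrud versus syncd comparison can be sketched with a small awk filter over the ps output shown above. acc_seconds is a hypothetical helper; it assumes that the TIME column has the minutes:seconds form seen in the listing and that the command name is the field right after it.

```shell
# Sketch: sum the accumulated TIME (in seconds) of all processes with a given name.
# acc_seconds is a hypothetical helper; it reads "ps -ekf" style lines on stdin.
acc_seconds() {
    awk -v name="$1" '
    {
        # Find the first field that looks like MMM:SS (assumed TIME format)
        for (i = 5; i < NF; i++) {
            if ($i ~ /^[0-9]+:[0-9][0-9]$/) {
                cmd = $(i + 1)
                sub(/.*\//, "", cmd)          # strip the path, keep the base name
                if (cmd == name) {
                    split($i, t, ":")
                    total += t[1] * 60 + t[2]
                }
                break
            }
        }
    }
    END { print total + 0 }'
}

# Two lines taken from the listing above
sample='root 16392 0 0 Mar 24 - 369:35 lrud
root 147640 1 0 Mar 24 - 267:27 /usr/sbin/syncd 60'

printf '%s\n' "$sample" | acc_seconds lrud     # 369:35 in seconds
printf '%s\n' "$sample" | acc_seconds syncd    # 267:27 in seconds
```

In the sample, lrud has accumulated more time than syncd, which is the situation the note above says deserves attention.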
----------------------------------
time <any command or script>
real 0m0.96s
user 0m0.12s
sys 0m0.05s
It shows the CPU time spent in user mode and in system mode, and the real (elapsed) time needed to execute the command.
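To see how much of the elapsed time was actually spent on the CPU, the three figures can be combined as (user + sys) / real. The sketch below uses a hypothetical helper, cpu_share, with the values typed in by hand.

```shell
# Sketch: share of the elapsed time that was spent on the CPU.
# cpu_share is a hypothetical helper; arguments are seconds as printed by time.
cpu_share() {
    # $1 = real, $2 = user, $3 = sys
    awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (u + s) / r * 100 }'
}

cpu_share 0.96 0.12 0.05    # values from the output above: prints 17.7
```

A low percentage like this means the command spent most of its elapsed time waiting (for I/O, for the CPU, or sleeping) rather than computing.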
----------------------------------
sar:
sar -P ALL 1 15 shows CPU usage for all logical CPU (-P : CPU by CPU basis statistics)
System configuration: lcpu=4 ent=0.20 mode=Uncapped
13:49:15 cpu %usr %sys %wio %idle physc %entc
13:49:16 0 26 57 0 17 0.01 3.8
1 0 15 0 85 0.00 0.2
2 0 9 0 91 0.00 0.0
3 0 47 0 53 0.00 0.1
Check %usr+%sys together and compare it to physc.
Some examples of why physical consumption (physc) should also be checked when SMT is on:
- usr+sys=16%, but physc=0.56: the percentage suggests that 16% of a CPU is utilized, but more than half of a physical CPU (0.56) is actually consumed.
- usr+sys=100% and physc=0.45: both values have to be considered. If someone says 100% is used, then 100% of what? It is 100% of less than half a physical CPU (physc=0.45).
- %usr+%sys=83% for lcpu 0 (in the sar output above). It looks high at first sight, but physc shows that only 0.01 physical core has been used, and the entitled capacity is 0.20, so this 83% is actually very little CPU consumption.
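The last example compares physc with the entitled capacity; that ratio is what the %entc column expresses. A minimal sketch of the arithmetic follows, where entc_pct is a hypothetical helper and both values are typed in from the sar output.

```shell
# Sketch: physical consumption as a percentage of entitled capacity.
# entc_pct is a hypothetical helper, not a sar option.
entc_pct() {
    # $1 = physc (physical cores consumed), $2 = ent (entitled capacity)
    awk -v physc="$1" -v ent="$2" 'BEGIN { printf "%.1f\n", physc / ent * 100 }'
}

entc_pct 0.01 0.20    # lcpu 0 in the sar output above: prints 5.0
```

sar itself prints 3.8 for that line; the difference is presumably because the physc column is rounded to two decimals before it is displayed.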
-----
sar -u 2 10 <--shows system activity info (-u: CPU usage data, 2: interval in seconds, 10: number of intervals) (same as topas)
22:06:25 %usr %sys %wio %idle physc %entc
22:06:27 89 9 1 1 0.49 163.4
The sar command can also extract and show CPU utilization metrics that were previously saved in a file (/var/adm/sa/sadd, where dd is the day of the month). The system utilization information is saved by two shell scripts (/usr/lib/sa/sa1 and /usr/lib/sa/sa2) running in the background. These shell scripts are started by the cron daemon using the crontab file /var/spool/cron/crontabs/adm.
Collecting data in this manner is a useful way to characterize system usage over a period of time and determine peak usage hours.
To view the files:
sar -f /usr/adm/sa/sa03
sar -P ALL -f /usr/adm/sa/sa03
----------------------------------
mpstat:
performance statistics from logical processor viewpoint
root@bb_lpar: / # mpstat 5
cpu min maj mpc int cs ics rq mig lpa sysc us sy wa id pc %ec lcs
0 0 0 0 185 88 0 1 0 100 66 13 62 0 24 0.00 1.2 133
1 0 0 0 9 0 0 0 0 - 0 0 2 0 98 0.00 0.4 9
U - - - - - - - - - - - - 0 99 0.20 98.3 -
ALL 0 0 0 194 88 0 1 0 0 66 0 1 0 99 0.00 1.7 142
--------------------------------------------------------------------------------
0 0 0 0 188 90 0 1 0 100 69 14 62 0 24 0.00 1.3 135
1 0 0 0 9 0 0 0 0 100 0 0 4 0 96 0.00 0.5 10
U - - - - - - - - - - - - 0 99 0.20 98.2 -
ALL 0 0 0 197 90 0 1 0 100 69 0 1 0 99 0.00 1.8 145
mig: number of thread migrations to another logical processor
lpa: logical processor affinity, the percentage of logical processor redispatches that stayed within the scheduling affinity domain ("100" means all of them did; "-" is shown when there is nothing to report for that logical CPU, typically because no redispatch occurred in the interval)
lcs: logical processor context switches
The output shows ordinary logical processor context switches (lcs), but no thread was forced to migrate to another logical processor (mig is 0).
----------------------------------
tprof:
It reports processor usage for individual programs and for the system as a whole (it shows which sections of a program use the processor most heavily).
1. tprof -x sleep 60 <--runs tprof and executes sleep for 60 seconds (we are not profiling "sleep"; it only sets how long tprof collects data)
2. it creates a file in the current directory: cat sleep.prof
Process Freq Total Kernel User Shared Other
======= ==== ===== ====== ==== ====== =====
wait 4 99.83 99.83 0.00 0.00 0.00
nfsd 1 0.07 0.07 0.00 0.00 0.00
/usr/sbin/getty 1 0.03 0.03 0.00 0.00 0.00
rpc.lockd 1 0.03 0.03 0.00 0.00 0.00
/usr/bin/tprof 1 0.03 0.03 0.00 0.00 0.00
======= ==== ===== ====== ==== ====== =====
Total 8 100.00 100.00 0.00 0.00 0.00
Process PID TID Total Kernel User Shared Other
======= === === ===== ====== ==== ====== =====
wait 131076 131077 57.58 57.58 0.00 0.00 0.00
wait 917532 1376299 20.09 20.09 0.00 0.00 0.00
wait 1048608 1507375 20.09 20.09 0.00 0.00 0.00
wait 983070 1441837 2.08 2.08 0.00 0.00 0.00
nfsd 14221360 30146813 0.07 0.07 0.00 0.00 0.00
/usr/sbin/getty 7798814 14680259 0.03 0.03 0.00 0.00 0.00
/usr/bin/tprof 11010196 34603221 0.03 0.03 0.00 0.00 0.00
rpc.lockd 5177366 9044005 0.03 0.03 0.00 0.00 0.00
======= === === ===== ====== ==== ====== =====
Total 100.00 100.00 0.00 0.00 0.00
The first section shows the processes, the second the threads (with thread ID (TID)). It separates kernel and user usage for each process during the time the profile was running.
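Since the wait process represents idle time, the non-idle share of the profile can be read off the summary section as 100 minus the Total of the wait row. A small sketch, assuming the report layout shown above (7 fields in the per-process summary row, 8 in the per-thread rows); non_idle_pct is a hypothetical helper.

```shell
# Sketch: non-idle percentage from a tprof report read on stdin.
# non_idle_pct is a hypothetical helper; it relies on the layout shown above.
non_idle_pct() {
    # Only the per-process summary row for wait has exactly 7 fields
    awk '$1 == "wait" && NF == 7 { printf "%.2f\n", 100 - $3 }'
}

printf '%s\n' 'wait 4 99.83 99.83 0.00 0.00 0.00' | non_idle_pct    # prints 0.17
```

Against a real report this would be run as non_idle_pct < sleep.prof.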
Very helpful information. Thanks...
very nice !!! keep the good work
:-)
Hi, can you please help me: how can I find the overall CPU % on the system? Normally I just run topas and look at the idle value; if it shows 20%, I assume that CPU utilization is 80%. Please give me some more input on this. Thanks in advance.
Hi, topas shows user % and system (kern) % as well. It is a good tool for this. User % shows how much CPU is used by users and Kern by the system. The other good tool is nmon, which will show you this on a graph.
Hi,
what is the difference between Core / Processor / Physical CPU / Processing unit in AIX language?
for example POWER 6 (550) has 4 CPU with Dual Core
means 8 cores (on HMC it is showing as 8 processing units)
Can i say that, i have 8 physical CPUs on my POWER6 box ? i am kind of confused between " CPU/processor/Physical CPU/processing unit/Entitled CPU/Virtual CPU/Logical CPUs"
please give me an idea...
Hi, yes you can say that. Physical Processor, CPU or Core are the same (These are just different words, but meaning the same thing in Power systems. At HP or SUN (Oracle) probably these are not the same.) Here is some description about them: http://aix4admins.blogspot.hu/2011/08/commands-and-processes-process-you-use.html.
Excellent Work!!!!! Every page in this blog is useful!!!!
Thanks for the great work.
Very good information. Can you tell me how to control the load average in the server. what threshold value we can put for the server, in my servers I have kept for 10. I am getting daily alerts saying load average crossed 10.
How to resolve that wait processes, because in my server these are running from Dec 16 2012.
Hi,
I think the wait process becomes active once the server boots up and never stops until the server is rebooted. Please see the example below.
root@LABSERVER:/:> ps -ekf | grep -v grep | egrep "syncd|lrud|nfsd|biod|wait"
root 131076 0 0 Oct 24 - 68:52 wait
root 262152 0 0 Oct 24 - 0:00 lrud
root 983070 0 0 Oct 24 - 109:15 wait
root@LABSERVER:/:> who -b
. system boot Oct 24 06:12
This is really helpful information ... thanks a lot
How can I get CPU and memory utilization as a percentage? I just want to know the overall CPU utilization in percent. Please help me out.
Hi,
How can we find the current usage of the free CPU pool from AIX 6.1 or 7.1?
Thanks
Check out my article in IBM Systems magazine about CURT: the "CPU Usage Reporting Tool" for advanced CPU diagnostics. The article is available at this link:
http://ibmsystemsmag.com/aix/administrator/performance/using_curt/