
CPU

LOAD AVERAGE:

uptime
18:01pm  up 217 days,  22:40,  0 users,  load average:  8.78, 8.75, 8.82

shows the load average in the last minute, five minutes, and fifteen minutes

In general, each process that is running, waiting for CPU, or waiting on I/O would add one to the load average. These figures are calculated and then averaged over time.

The load average corresponds to the "r" column of vmstat: the number of runnable kernel threads. Compare it to the actual number of CPUs (logical CPUs) to see whether the CPUs can service those threads.
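
As a rough illustration of that comparison, the sketch below divides the one-minute load average by a logical CPU count. The function name load_per_cpu is made up for this example, and the CPU count is passed in by hand (for instance the lcpu= value that sar prints in its "System configuration" line). The parsing assumes the "load average:" wording shown in the uptime output above.

```shell
# Hypothetical helper: 1-minute load average divided by the number of logical CPUs.
# Reads one line of uptime output on stdin; $1 = logical CPU count (supplied by you).
load_per_cpu() {
    awk -v ncpu="$1" '
    /load average:/ {
        sub(/.*load average: */, "")     # keep only "8.78, 8.75, 8.82"
        split($0, la, /, */)             # la[1] = 1 min, la[2] = 5 min, la[3] = 15 min
        printf "%.2f\n", la[1] / ncpu    # above 1.00 suggests more runnable threads than CPUs
    }'
}

# Usage (on a live system):  uptime | load_per_cpu 4
```

A result well above 1.00 over all three averages suggests that threads are queuing for CPU; a single spike in the one-minute figure is less telling.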

----------------------------------

ps aux            <--shows CPU and memory usage of processes
root@aix31: / # ps aux
USER         PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND
root       61470  8.2  0.0  384  384      - A      Aug 27 4711:53 wait    <-this wait process is assigned to the CPU if the system is idle
oracle    594004  1.0  0.0 29544 30180    - A      Aug 27 554:03 /u02/app/oracle

%CPU: The percentage of time the process has used the CPU. It is the average CPU utilization of the process since it was created.
      (If a process consumes 100% CPU for 5 seconds and then sleeps for 5 seconds, ps reports 50% at the end of the 10 seconds.)
      (This can be misleading, because at that moment the process is not actually using any CPU time.)

      It is also misleading for long-running processes, because the value is based on the time accumulated since the process was started.
      (A process that was started long ago can still show a high number.)

%MEM: The percentage of real memory used by this process

wait: In AIX the CPU must always be doing work. If the system is idle, the wait process is executed.
??kproc: an idle kernel process created by the UNIX OS to keep the processors occupied while there is no demand for CPU.
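
The %CPU averaging described above comes down to CPU time divided by the lifetime of the process. A minimal sketch of that arithmetic (lifetime_cpu_pct is a name invented here, not a ps option):

```shell
# Hypothetical helper: lifetime-average CPU percentage, the way %CPU is described above.
# $1 = CPU seconds consumed, $2 = seconds since the process was created
lifetime_cpu_pct() {
    awk -v cpu="$1" -v life="$2" 'BEGIN { printf "%.1f\n", 100 * cpu / life }'
}

lifetime_cpu_pct 5 10     # busy for 5 s, asleep for 5 s: prints 50.0
```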

----------------------------------

ps aux | head -1; ps aux | sort -rn +2 | head -20     top CPU processes
ps -elf                  shows the priority of processes (PRI) (60 is the usual value; keep an eye on values lower than 60, which mean higher priority)
ps -efk|grep wait        k: also shows kernel processes
ps -fp <pid>             check the TIME field; if it stays constant over time, a deadlock or hang has probably occurred
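
For the hang check on the last line, it can help to turn the TIME field into plain seconds so that two readings taken some minutes apart are easy to compare. The sketch below assumes TIME looks like mm:ss, hh:mm:ss or dd-hh:mm:ss; ps_time_to_seconds is a hypothetical helper name. The usage comment relies on POSIX-style "ps -o time=" output selection, so check that your ps accepts it.

```shell
# Hypothetical helper: convert a ps TIME value to seconds.
# Assumed formats: mm:ss, hh:mm:ss, or dd-hh:mm:ss.
ps_time_to_seconds() {
    echo "$1" | awk '{
        days = 0
        if (split($0, d, "-") == 2) { days = d[1]; $0 = d[2] }   # optional day prefix
        n = split($0, part, ":")
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + part[i]
        print days * 86400 + secs
    }'
}

# Usage sketch (pid holds the process ID to watch):
#   t1=$(ps_time_to_seconds "$(ps -o time= -p "$pid")"); sleep 300
#   t2=$(ps_time_to_seconds "$(ps -o time= -p "$pid")")
#   [ "$t1" -eq "$t2" ] && echo "no CPU time consumed in 5 minutes"
```

An unchanged value only shows that the process used no CPU in that window; an idle daemon looks the same, so treat it as a hint rather than proof of a hang.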

----------------------------------

Check the accumulated system time for key processes:

root@aix10: /root # ps -ekf | grep -v grep | egrep "syncd|lrud|nfsd|biod|wait"
    root    8196       0   0   Mar 24      - 1082:56 wait
    root   16392       0   0   Mar 24      - 369:35 lrud
    root   49176       0   0   Mar 24      - 4973:24 wait
    root   53274       0   0   Mar 24      - 498:10 wait
    root   57372       0   0   Mar 24      - 3135:26 wait
    root   61470       0   0   Mar 24      - 358:20 wait
    root  123092       0   0   Mar 24      -  0:52 kbiod
    root  147640       1   0   Mar 24      - 267:27 /usr/sbin/syncd 60
    root 2560068  229562   0   May 06      -  0:00 /usr/sbin/biod 6
    root 3133690  229562   0   May 06      -  0:00 /usr/sbin/nfsd 3891
    root 3420312       1   0   May 06      -  5:18 nfsd

wait: every CPU has an assigned wait process, which runs when the CPU has nothing else to do. (It is a kernel process mapped 1 to 1 to a CPU.)
You can check how much time these processes have accumulated (for example, if nfsd has a high time value, it needs some reconfiguration...)

Match the system time of lrud against syncd: if lrud is greater than syncd, it should get your attention. lrud is a fixed-priority kernel process with process priority 16. Once lrud is running, not much else is running.
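
The lrud/syncd comparison can be scripted against the ps output shown above. This is only a sketch: lrud_vs_syncd is a made-up function name, and it assumes that the TIME value is the field immediately before the command name and is in mm:ss or hh:mm:ss form.

```shell
# Hypothetical helper: sum the accumulated TIME of lrud and syncd from "ps -ekf" output (stdin).
lrud_vs_syncd() {
    awk '
    function secs(t,    p, n, i, s) {          # "mm:ss" or "hh:mm:ss" -> seconds
        n = split(t, p, ":"); s = 0
        for (i = 1; i <= n; i++) s = s * 60 + p[i]
        return s
    }
    {
        for (i = 2; i <= NF; i++) {
            name = $i; sub(/.*\//, "", name)   # strip any leading path
            if (name == "lrud" || name == "syncd") { total[name] += secs($(i - 1)); break }
        }
    }
    END {
        printf "lrud=%d syncd=%d\n", total["lrud"], total["syncd"]
        if (total["lrud"] > total["syncd"]) print "lrud has more CPU time than syncd"
    }'
}

# Usage (on a live system):  ps -ekf | lrud_vs_syncd
```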

----------------------------------

time <any command or script>
real    0m0.96s
user    0m0.12s
sys     0m0.05s


It shows the CPU time spent in user mode and in system mode, and the real (elapsed) time needed to execute the command.
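
One way to read these three figures is to relate CPU time (user + sys) to elapsed time: a low ratio suggests the command spent most of its time waiting rather than computing. A small sketch of that arithmetic, using a made-up helper name cpu_share:

```shell
# Hypothetical helper: percentage of elapsed (real) time spent on the CPU.
# $1 = real seconds, $2 = user seconds, $3 = sys seconds
cpu_share() {
    awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN { printf "%.1f\n", 100 * (u + s) / r }'
}

cpu_share 0.96 0.12 0.05    # figures from the sample above: prints 17.7
```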

----------------------------------

sar:

sar -P ALL 1 15          shows CPU usage for all logical CPU (-P : CPU by CPU basis statistics)

System configuration: lcpu=4 ent=0.20 mode=Uncapped

13:49:15 cpu    %usr    %sys    %wio   %idle   physc   %entc
13:49:16  0       26      57       0      17    0.01     3.8
          1        0      15       0      85    0.00     0.2
          2        0       9       0      91    0.00     0.0
          3        0      47       0      53    0.00     0.1


Check %usr+%sys together and compare it to physc.

Some examples where the physical consumption of a CPU should also be checked when SMT is on:
- usr+sys=16%, but physc=0.56: it looks like 16% of a CPU is utilized, but actually more than half of a physical CPU (0.56) is used.

- if usr+sys=100 and physc=0.45, we have to look at both. If someone says 100% is used, then 100% of what? 100% of about half a CPU (physc=0.45) is used.

- %usr+%sys=83% for lcpu 0 (output from the sar command above). It looks like a high number at first sight, but if you check physc, you can see that only 0.01 physical core has been used, and the entitled capacity is 0.20, so this 83% is actually very little CPU consumption.
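
To read the two values side by side, the sar output can be condensed to one line per logical CPU. The following is a sketch only: sar_busy_vs_physc is a hypothetical name, and it assumes the column order shown above (cpu %usr %sys %wio %idle physc %entc).

```shell
# Hypothetical helper: for each logical CPU row of "sar -P ALL" output (stdin),
# print %usr+%sys next to physc so the two can be read together.
sar_busy_vs_physc() {
    awk '
    {
        off = ($1 ~ /^[0-9]+:[0-9]+:[0-9]+$/) ? 1 : 0            # first row of a sample starts with a timestamp
        if (NF - off != 7 || $(1 + off) !~ /^[0-9]+$/) next      # skip headers and summary rows
        printf "cpu %s busy=%d%% physc=%s\n", $(1 + off), $(2 + off) + $(3 + off), $(6 + off)
    }'
}

# Usage (on a live system):  sar -P ALL 1 1 | sar_busy_vs_physc
```

For the sample above, the first line comes out as "cpu 0 busy=83% physc=0.01", which makes it obvious that the 83% refers to a tiny slice of a physical core.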

-----

sar -u 2 10            <--shows system activity info (-u: CPU usage data, 2: interval in seconds, 10: number of intervals) (same as topas)
22:06:25    %usr    %sys    %wio   %idle   physc   %entc
22:06:27      89       9       1       1    0.49   163.4

The sar command can also extract and show CPU utilization metrics that were previously saved in a file (/var/adm/sa/sadd, where dd refers to the current day). The system utilization information is saved by two shell scripts (/usr/lib/sa/sa1 and /usr/lib/sa/sa2) running in the background. These shell scripts are started by the cron daemon using the crontab file /var/spool/cron/crontabs/adm.

Collecting data in this manner is a useful way to characterize system usage over a period of time and determine peak usage hours.
To view the files:
sar -f /usr/adm/sa/sa03
sar -P ALL -f /usr/adm/sa/sa03

----------------------------------

mpstat:
performance statistics from logical processor viewpoint


root@bb_lpar: / # mpstat 5

cpu  min  maj  mpc  int   cs  ics   rq  mig lpa sysc us sy wa id   pc  %ec  lcs
  0    0    0    0  185   88    0    1    0 100   66 13 62  0 24 0.00  1.2  133
  1    0    0    0    9    0    0    0    0   -    0  0  2  0 98 0.00  0.4    9
  U    -    -    -    -    -    -    -    -   -    -  -  -  0 99 0.20 98.3    -
ALL    0    0    0  194   88    0    1     0   66  0  1  0 99 0.00  1.7  142
--------------------------------------------------------------------------------
  0    0    0    0  188   90    0    1    0 100   69 14 62  0 24 0.00  1.3  135
  1    0    0    0    9    0    0    0    0 100    0  0  4  0 96 0.00  0.5   10
  U    -    -    -    -    -    -    -    -   -    -  -  -  0 99 0.20 98.2    -
ALL    0    0    0  197   90    0    1    0 100   69  0  1  0 99 0.00  1.8  145


mig: number of thread migrations to another logical processor
lpa: shows which logical CPU (SMT thread) is active ("100" means that SMT thread is in use, "-" means it is not used)
lcs: logical processor context switches

The output above shows ordinary logical processor context switches (lcs), but no thread was forced to migrate to another logical processor (mig is 0).
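
To check a column such as mig or lcs over a longer mpstat run, the per-CPU rows can be summed. This is a sketch under the assumption that the header line starts with "cpu" as in the output above; mpstat_sum is a made-up helper, and the column is looked up by name because the set of columns can differ between systems.

```shell
# Hypothetical helper: sum one mpstat column over all numbered CPU rows (stdin).
# $1 = column name as it appears in the header line, e.g. mig or lcs
mpstat_sum() {
    awk -v col="$1" '
    $1 == "cpu" { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }   # locate the column
    idx && $1 ~ /^[0-9]+$/ { sum += $idx }                                   # skip the U and ALL rows
    END { print sum + 0 }'
}

# Usage (on a live system):  mpstat 5 3 | mpstat_sum mig
```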

----------------------------------

tprof:

it reports processor usage for individual programs and for the system as a whole (it shows which sections of a program use the processor most heavily)

1. tprof -x sleep 60  <--run tprof while executing sleep for 60 seconds (we are not profiling "sleep"; the value 60 just sets how long tprof runs)

2. it creates a file in the current directory: cat sleep.prof


Process                                Freq  Total Kernel   User Shared  Other
=======                                ====  ===== ======   ==== ======  =====
wait                                      4  99.83  99.83   0.00   0.00   0.00
nfsd                                      1   0.07   0.07   0.00   0.00   0.00
/usr/sbin/getty                           1   0.03   0.03   0.00   0.00   0.00
rpc.lockd                                 1   0.03   0.03   0.00   0.00   0.00
/usr/bin/tprof                            1   0.03   0.03   0.00   0.00   0.00
=======                                ====  ===== ======   ==== ======  =====
Total                                     8 100.00 100.00   0.00   0.00   0.00

Process                   PID      TID  Total Kernel   User Shared  Other
=======                   ===      ===  ===== ======   ==== ======  =====
wait                   131076   131077  57.58  57.58   0.00   0.00   0.00
wait                   917532  1376299  20.09  20.09   0.00   0.00   0.00
wait                  1048608  1507375  20.09  20.09   0.00   0.00   0.00
wait                   983070  1441837   2.08   2.08   0.00   0.00   0.00
nfsd                 14221360 30146813   0.07   0.07   0.00   0.00   0.00
/usr/sbin/getty       7798814 14680259   0.03   0.03   0.00   0.00   0.00
/usr/bin/tprof       11010196 34603221   0.03   0.03   0.00   0.00   0.00
rpc.lockd             5177366  9044005   0.03   0.03   0.00   0.00   0.00
=======                   ===      ===  ===== ======   ==== ======  =====
Total                                  100.00 100.00   0.00   0.00   0.00

The first section lists the processes, the second the threads (with thread ID (TID)). It separates kernel and user usage for each process during the time tprof was running.
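
Since the wait process accounts for idle time, the non-idle share of the sampling period can be read off the summary table as 100 minus the Total of wait. A sketch of that, assuming the report layout shown above (tprof_busy_pct is a hypothetical name):

```shell
# Hypothetical helper: 100 minus the "wait" Total from the per-process summary of a tprof report (stdin).
tprof_busy_pct() {
    awk '
    $1 == "Process" && $2 == "Freq" { in_summary = 1; next }   # per-process summary starts
    $1 == "Process" && $2 == "PID"  { in_summary = 0 }         # per-thread table follows
    in_summary && $1 == "wait"      { idle = $3 }              # Total column of the wait row
    END { printf "%.2f\n", 100 - idle }'
}

# Usage:  tprof_busy_pct < sleep.prof
```

For the report above this gives 0.17, i.e. the system was almost entirely idle while tprof was sampling.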

14 comments:

  1. Very helpful information. Thanks...

  2. Hi, can you please help me find the overall CPU % of the system? Normally I just run topas and look at the idle value: if it shows 20%, I assume that CPU utilization is 80%. Please give me some more input on this. Thanks in advance.

    Replies
    1. Hi, topas shows user % and system (kern) % as well. It is a good tool for this. User % shows how much CPU is used by users and Kern by the system. The other good tool is nmon, which will show you this on a graph.

  3. Hi,
    What is the difference between core / processor / physical CPU / processing unit in AIX terminology?

    For example, a POWER6 (550) has 4 dual-core CPUs,
    which means 8 cores (on the HMC it shows as 8 processing units).

    Can I say that I have 8 physical CPUs on my POWER6 box? I am kind of confused between "CPU / processor / physical CPU / processing unit / entitled CPU / virtual CPU / logical CPUs".

    please give me an idea...

    Replies
    1. Hi, yes, you can say that. Physical processor, CPU, and core are the same (these are just different words meaning the same thing on Power systems; at HP or SUN (Oracle) they are probably not the same). Here is some description of them: http://aix4admins.blogspot.hu/2011/08/commands-and-processes-process-you-use.html.

  4. Excellent Work!!!!! Every page in this blog is useful!!!!
    Thanks for the great work.

  5. Very good information. Can you tell me how to control the load average on a server? What threshold value can we set for a server? On my servers I have set it to 10, and I am getting daily alerts saying the load average crossed 10.

    How can I resolve those wait processes? On my server they have been running since Dec 16 2012.

    Replies
    1. Hi,

      I think the wait process becomes active once the server boots up and never stops until the server is rebooted. Please see the example below.
      root@LABSERVER:/:> ps -ekf | grep -v grep | egrep "syncd|lrud|nfsd|biod|wait"
      root 131076 0 0 Oct 24 - 68:52 wait
      root 262152 0 0 Oct 24 - 0:00 lrud
      root 983070 0 0 Oct 24 - 109:15 wait
      root@LABSERVER:/:> who -b
      . system boot Oct 24 06:12

  6. This is really helpful information... thanks a lot

  7. How can I get CPU and memory utilization as a percentage? I just want to know the overall CPU utilization percentage. Please help me out.

  8. Hi,

    How can we find the current usage of the free CPU pool on AIX 6.1 or 7.1?

    Thanks
