PERF. - vmstat

vmstat - CPU/RAM

vmstat -t 5 3        shows 3 reports at 5-second intervals (-t adds a timestamp to each line)
vmstat -l 5          also shows large page statistics (alp: active large pages, flp: free large pages)
vmstat -s            displays counters of various events since boot (e.g. paging in and paging out events)
vmstat hdisk0 2 5    displays 5 summaries for hdisk0 at 2-second intervals

vmstat -Iwt 2        this is what IBM-ers use (-I: I/O-oriented view with fi/fo, -w: wide output, -t: timestamp):

   kthr            memory                         page                       faults                 cpu             time
----------- --------------------- ------------------------------------ ------------------ ----------------------- --------
  r   b   p        avm        fre    fi    fo    pi    po    fr     sr    in     sy    cs us sy id wa    pc    ec hr mi se
  0   0   0    1667011      35713     0     0     0     0     0      0    16    488   250  0  0 99  0  0.01   0.3 11:38:56
  0   0   0    1667012      35712     0     0     0     0     0      0    16    102   236  0  0 99  0  0.01   0.1 11:38:58
  1   0   0    1664233      38490     0     1     0     0     0      0    12    218   245  0  0 99  0  0.01   0.3 11:39:00
  0   0   0    1664207      38515     0    15     0     0     0      0   164   5150   450  1  3 96  0  0.20   4.9 11:39:02


kthr: kernel threads
    r:    threads in the run queue (runnable) or already executing (running)
    b:    threads in the virtual memory wait queue (b = blocked, waiting for a resource, e.g. a blocked jfs2 read/write or an inode lock)
          (an inode lock is held per file, so when I/O is done on a few big files, reads can block writes and vice versa)
    p:    threads waiting on physical I/O (raw I/O)


"r" is the count of running plus runnable threads, which are (or will be) dispatched to the logical CPUs, so "r" should not be higher than the number of logical CPUs; otherwise we have a possible CPU bottleneck. (Do we have too few CPUs or too many threads?)

High numbers in the blocked column (b) indicate slow disks.
(r) should always be higher than (b); if it is not, it usually means you have a CPU bottleneck

an example:
lcpu=2, r=18 (18/2=9), so on each logical CPU 1 thread is running and 8 threads are waiting. But you have to compare this number with the nature of the work being done. (Either these threads hold onto a CPU for a long time, or they use the CPU (running) for a very short time and then get taken off.) If a queue can be emptied fast, then 8 may not be a problem.
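
The arithmetic of the example above can be sketched in a few lines of Python. This is only an illustration: the function name is made up here, and the "one running thread per logical CPU" model is the simplification used in the example, not something vmstat reports directly.

```python
def run_queue_per_lcpu(r, lcpu):
    """Return (threads per logical CPU, threads waiting per logical CPU).

    r    -- the "r" column of vmstat (running + runnable threads)
    lcpu -- number of logical CPUs (assumed to be known, e.g. from lparstat)
    """
    if lcpu <= 0:
        raise ValueError("lcpu must be positive")
    per_cpu = r / lcpu
    # Simplified model: one thread per logical CPU runs, the rest are queued.
    waiting = max(per_cpu - 1, 0)
    return per_cpu, waiting


# The example from the text: lcpu=2, r=18
print(run_queue_per_lcpu(18, 2))
```

Whether the waiting number is a problem still depends on how quickly the queue drains, as described above.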

---------------------------


memory:
    avm:    The amount of active virtual memory (in 4k pages) you are using, not including file pages.
        Active virtual memory is defined as the number of virtual-memory working segment pages that have actually been touched.
from Earl Jew:
"Active Virtual Memory is computational memory which is active. AVM does not include any file buffer cache at all. AVM is your computational memory percent that you see listed under topas. AVM includes the active pages out on the paging space. It is possible you have computational memory or virtual memory which was not recently active and it would not be in this calculation."
"Over memory commitment would be a situation where AVM would be greater than the installed RAM. It is good to keep AVM at or less than 80%."

(non computational memory is your file buffer cache)

    fre:    The size of your memory free list.

We don't worry when fre is small, as AIX loves using every last drop of memory and does not return it as fast as you might like. The minimum size of the free list is determined by the minfree parameter of the vmo command.

---------------------------

page:
    pi:    Pages paged in from the paging space. (if there are any, it is not a problem)
    po:    Pages paged out to the paging space. (if there are any, this could be a problem!)
    fi:    file system reads
    fo:    file system writes
    fr:    Pages freed (replaced)
    sr:    pages scanned (pages have to be scanned to see whether they can be freed)
           the ratio of sr to fr shows how many pages had to be scanned to free that amount.
           (if we scanned 1000 and freed 999, those memory pages were not in use recently; it is an indicator)

Look at the largest value of avm (output of vmstat: active virtual pages). Multiply it by 4 KB. Compare that number with the installed RAM.
Ideally avm should be smaller than total RAM (avm * 4096 < bootinfo -r * 1024; bootinfo -r reports RAM in KB). If not, some amount of virtual memory paging will occur.
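
As a rough sketch of this comparison: the avm value below is taken from the sample output above, while the RAM size (8 GB) is only an assumed figure for illustration, standing in for whatever bootinfo -r prints on your system. The function names are made up for this example.

```python
PAGE_SIZE = 4096  # avm is reported in 4 KB pages


def avm_percent_of_ram(avm_pages, ram_kb):
    """Return active virtual memory as a percentage of installed RAM.

    avm_pages -- largest "avm" value seen in vmstat
    ram_kb    -- installed RAM in KB (the unit bootinfo -r uses)
    """
    avm_bytes = avm_pages * PAGE_SIZE
    ram_bytes = ram_kb * 1024
    return avm_bytes / ram_bytes * 100


def is_overcommitted(avm_pages, ram_kb):
    """True if avm * 4096 is not smaller than RAM, i.e. paging is expected."""
    return avm_pages * PAGE_SIZE >= ram_kb * 1024


avm = 1667012            # from the sample vmstat output
ram_kb = 8 * 1024 * 1024  # ASSUMPTION: 8 GB of RAM, not from the source
print(round(avm_percent_of_ram(avm, ram_kb), 1), is_overcommitted(avm, ram_kb))
```

With these numbers avm sits just under the 80% guideline quoted from Earl Jew above.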

If there is far more virtual memory than real memory, this could cause excessive paging, which then results in delays.
But if avm is lower than RAM and paging activity still occurs, then tuning minperm/maxperm could reduce paging.
(If the system is paging too much, using vmo/vmtune may help)

If free space in memory (fre) goes down, lrud starts scanning (sr) and freeing (fr); this should be avoided.

If sr is much higher than fr (5 times higher), then it should get your attention. If you had to scan a lot to free a little, it means that the memory was recently used, so it is harder to steal.


If fi+fo is greater than free memory (fre), then the system has to scan (sr) and free (fr) pages to push through that amount of I/O, and this increases the 'free frame waits' value. lrud is scanning and freeing the needed memory pages.
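
The two checks above (sr compared to fr, and fi+fo compared to fre) could be applied to one vmstat line roughly like this. It is a minimal sketch: the function names are invented, and the limit of 5 is simply the "5 times higher" figure from the text.

```python
def scan_free_ratio(sr, fr):
    """How many pages were scanned for each page freed."""
    if fr == 0:
        # Nothing freed: infinite effort if we scanned, otherwise no activity.
        return float("inf") if sr > 0 else 0.0
    return sr / fr


def memory_pressure_flags(fre, fi, fo, sr, fr, ratio_limit=5.0):
    """Return the two warning signs described above for one vmstat sample."""
    return {
        # Memory was recently used, so it is harder to steal.
        "hard_to_steal": scan_free_ratio(sr, fr) > ratio_limit,
        # File I/O does not fit into the free list, lrud must scan and free.
        "io_exceeds_free_list": fi + fo > fre,
    }


# Last line of the sample output: fre=38515, fi=0, fo=15, sr=0, fr=0
print(memory_pressure_flags(fre=38515, fi=0, fo=15, sr=0, fr=0))
```

On the sample output both flags stay False, which matches a system that is not under memory pressure.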

---------------------------

faults:
    in:    interrupt rate (hardware interrupts from the network or SAN... it is good if it is not high, like here)
    sy:    system calls (this shows how much work is done by the system; if it is a 6 digit number it is doing a lot of work)
    cs:    context switches (process or thread switches) (the rate is given in switches per second)
(A context switch occurs when the currently running thread is different from the previously running thread, so the previous one is taken off the CPU.)
It is not uncommon to see the context switch rate be approximately the same as the device interrupt rate (in column)

rule of thumb: for normal operation these numbers of digits can be considered normal
4 digits: in (hardware interrupts)
6 digits: sy (system calls) (sy is a good indicator if system is working or not, below 6 digits it is not doing much work)
5 digits: cs (context switches)

If cs is high, it may indicate too much process switching is occurring, thus using memory inefficiently.

If a program is written inefficiently, it may generate an unusually large number of system calls. (sy)

If cs is higher than sy, the system is doing more context switching than actual work.
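
The digit rule and the cs/sy comparison can be written down as a small check. This is just a sketch of the rules of thumb above; the function name and the wording of the notes are made up, and the result is a hint, not a diagnosis (on an idle system cs can exceed sy without any problem).

```python
# Digit counts considered normal in the rule of thumb above.
NORMAL_DIGITS = {"in": 4, "sy": 6, "cs": 5}


def faults_notes(in_, sy, cs):
    """Return a list of notes for one vmstat sample of the faults columns."""
    values = {"in": in_, "sy": sy, "cs": cs}
    notes = []
    for name, value in values.items():
        if len(str(value)) > NORMAL_DIGITS[name]:
            notes.append(f"{name} has more than {NORMAL_DIGITS[name]} digits")
    if cs > sy:
        notes.append("cs is higher than sy")
    return notes


# Two lines of the sample output (in, sy, cs)
print(faults_notes(164, 5150, 450))
print(faults_notes(16, 102, 236))
```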


High r with high cs -> possible lock contention
Lock contention occurs whenever one process or thread attempts to acquire a lock held by another process or thread. The more granular the available locks, the less likely one process/thread will request a lock held by the other. (For example, locking a row rather than the entire table, or locking a cell rather than the entire row.)

When you are seeing blocked processes or high values on waiting on I/O (wa), it usually signifies either real I/O issues where you are waiting for file accesses or an I/O condition associated with paging due to a lack of memory on your system.

---------------------------

cpu:
    us:   % of CPU time spent in user mode (not using kernel code, not able to access kernel resources)
    sy:   % of CPU time spent in system mode (it can access kernel resources; all the nfs daemons and lrud are kernel processes)
    id:   % of CPU time when the CPUs are idle
    wa:   % of CPU time when the CPUs were idle while there was at least one I/O in progress (waiting for that I/O to finish)

    pc:   physical capacity (how much physical CPU is used)
    ec:   percentage of entitled capacity consumed (it correlates with the system calls (sy))

When a wait process is running it can show up either in id (idle) or wa (wait):
-wait%: if there is at least 1 outstanding thread which is waiting for something (such as I/O to complete, or a read from disk)
-idle%: if there is nothing to wait for, it will show up as idle%

(If the CPU is waiting for data from real memory, the CPU is still considered to be in busy state.)

To measure true idle time, measure id+wa together:
- if id=0%, it does not mean all CPU is consumed, because "wait" (wa) can be 100% while waiting for an I/O to complete

- if wait=0%, it does not mean I have no I/O waiting issues, because as long as I have threads which keep the CPU busy I could have additional threads waiting for I/O, but this will be masked by the running threads

If process A is running and process B is waiting on I/O, the wait% still would be 0.
A 0 doesn't mean I/O is not occurring; it means that the system is not waiting on I/O.
If process A and process B are both waiting on I/O, and there is nothing that can use the CPU, then you would see that column increase.

- if wait% is high, it does not mean I have an I/O performance problem; it can be an indication that I am doing some I/O but the CPU is not kept busy at all

- if id% is high then likely there is no CPU or I/O problem


To measure CPU utilization, measure us+sy together (and compare it to physc):
- if us+sy is always greater than 80%, then the CPU is approaching its limits (but check physc as well, and "sar -P ALL" for each lcpu)

- if us+sy = 100% -> possible CPU bottleneck, but in an uncapped shared lpar check physc as well.

- if sy is high, your appl. is issuing many system calls to the kernel and asking the kernel to work. It measures how heavily the appl. is using kernel services.

- if sy is higher than us, this means your system is spending less time on real work (not good)


Don't forget to compare these values with outputs where each logical CPU can be seen (like "sar -P ALL 1 5")

Some examples where the physical consumption of a CPU should also be looked at when SMT is on:
- usr+sys=16%, but physc=0.56: I see 16% of a CPU utilized, but actually more than half of a physical CPU (0.56) is used.

- if us+sys=100 and physc=0.45, we have to look at both. If someone says 100% is used, then 100% of what? 100% of roughly half of a CPU (physc=0.45) is used.

- %usr+%sys=83% for lcpu 0 (output from command sar). It looks like a high number at first sight, but if you check physc, you can see only 0.01 physical core has been used, and the entitled capacity is 0.20, so this 83% is actually very little CPU consumption.
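
The last example can be put into numbers: how much of the entitlement does physc=0.01 represent when the entitled capacity is 0.20? A minimal sketch, with a made-up function name:

```python
def entitlement_used_pct(physc, entitlement):
    """Percentage of the entitled capacity that is physically consumed.

    physc       -- physical cores consumed (pc / physc column)
    entitlement -- entitled capacity of the LPAR in cores
    """
    if entitlement <= 0:
        raise ValueError("entitlement must be positive")
    return physc / entitlement * 100


# Third example above: lcpu 0 shows 83% busy, physc=0.01, entitlement=0.20
print(round(entitlement_used_pct(0.01, 0.20), 1))
```

So the 83% seen on one logical CPU corresponds to only a small percentage of what the LPAR is entitled to use.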


------------------------------------

------------------------------------

------------------------------------


# vmstat -v   

       4980736 memory pages
        739175 lruable pages
    --------------------
        432957 free pages                <--6 digit generous, 5 digit ideal, 4 digits trouble, 3 digits big trouble
    --------------------
             1 memory pools
         84650 pinned pages
          80.0 maxpin percentage
          20.0 minperm percentage
          80.0 maxperm percentage 
           2.2 numperm percentage                                      <--% of memory containing non-comp. pages
         16529 file pages                                              <--# of non-comp. pages
           0.0 compressed percentage
             0 compressed pages
           2.2 numclient percentage                                    <--% of memory containing non-comp. client pages
          80.0 maxclient percentage
         16503 client pages                                            <--# of non-comp. client pages
             0 remote pageouts scheduled
    -----------------------
        940098 pending disk I/Os blocked with no pbuf                  <--vg (lv): every disk allocated to a vg has a certain amount of pbuf
       1141440 paging space I/Os blocked with no psbuf                 <--paging space buffer
          2228 filesystem I/Os blocked with no fsbuf                   <--jfs filesystem buffer
             0 client filesystem I/Os blocked with no fsbuf            <--nfs/veritas filesystem buffer
        382716 external pager filesystem I/Os blocked with no fsbuf    <--jfs2 filesystem buffer
    -------------------------
             0 Virtualized Partition Memory Page Faults
          0.00 Time resolving virtualized partition memory page faults


pbuf, psbuf, fsbuf:
I/O buffers are pinned memory buffers used to hold the I/O requests in different layers like FS, VMM and LVM. Run vmstat -v once (when the system is busy), then run it again after 10 minutes and compare each blocked I/O counter. Running it only once won't tell you if there is a problem with the buffers now (the problem could have occurred last month, or the uptime is long). If the blocked I/O counters are increasing, there is a shortage of buffers (I/O will be pending without enough buffers). When increasing a buffer, double the current value (use power of 2 numbers).
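
Comparing two vmstat -v runs by eye is error prone, so here is a small sketch that does the comparison on saved output. The first snapshot uses two counters from the output above; the second snapshot is invented for illustration, and the function names are hypothetical.

```python
import re

# Matches lines like "  940098 pending disk I/Os blocked with no pbuf"
BLOCKED_LINE = re.compile(r"\s*(\d+)\s+(.*I/Os blocked with no \w+)")


def blocked_counters(vmstat_v_text):
    """Extract the 'blocked with no ...buf' counters from vmstat -v output."""
    counters = {}
    for line in vmstat_v_text.splitlines():
        match = BLOCKED_LINE.match(line)
        if match:
            counters[match.group(2).strip()] = int(match.group(1))
    return counters


def growing_counters(before, after):
    """Return the counters that increased between two snapshots."""
    return {name: after[name] - before[name]
            for name in after
            if name in before and after[name] > before[name]}


SNAPSHOT_1 = """
        940098 pending disk I/Os blocked with no pbuf
          2228 filesystem I/Os blocked with no fsbuf
"""
# ASSUMPTION: invented second run, taken 10 minutes later
SNAPSHOT_2 = """
        940198 pending disk I/Os blocked with no pbuf
          2228 filesystem I/Os blocked with no fsbuf
"""

print(growing_counters(blocked_counters(SNAPSHOT_1), blocked_counters(SNAPSHOT_2)))
```

Only the counters that grew between the two runs are reported; those are the buffers worth looking at.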

pbuf:
These are physical device buffers. pbufs are allocated in memory per LUN in the volume group (if you have more LUNs there will be more pbufs). The pbufs of every LUN in the vg are pooled together, and all I/Os to these LUNs go through these pbufs (these are pinned memory structures). When disk I/Os are blocked with no pbufs, more pbufs need to be allocated. Check the blocked I/O count with lvmo too (lvmo -v oradbvg -a) and increase: # lvmo -v oradbvg -o pv_pbuf_count=2048 and  # lvmo -p -o pv_min_pbuf=2048

psbuf:
These are buffers used by VMM for paging space I/O. Blocked I/Os are an indication of an out-of-memory condition or a non-optimal paging space setup. Increasing the number of paging space devices might help (the recommendation is 2 paging spaces of the same size on different disks). Check vmstat -s: paging space page outs too.

fsbuf:
When AIX mounts a fs it allocates a static number of fsbufs per filesystem. If I/Os are blocked with no fsbufs, then the buffers for that fs have been exhausted (no I/O can go through until an fs buffer is released). Increasing the following parameters can help:
fsbuf (without anything): JFS  --> increase ioo -p -o numfsbufs=(Value) (re-mount fs)
fsbuf (client fs): NFS, GPFS   --> increase nfsv3 or nfsv4: # nfso -o nfs_v3_vm_bufs=20000 and # nfso -o nfs_v3_pdts=2 (in this order)
fsbuf (ext. pager fs): JFS2    --> increase ioo -p -o j2_dynamicBufferPreallocation=(Value) (no reboot required)

------------------------------------
------------------------------------
------------------------------------

# vmstat -s
          15503846449 total address trans. faults
           3320663543 page ins                        <--filesystem reads from disk (vmstat:page:fi)
           3257961345 page outs                       <--filesystem writes to disk (vmstat:page:fo)
        -----------------------------------
              1775154 paging space page ins           <--vmstat:page:pi
              2477803 paging space page outs          <--vmstat:page:po (5 digits/90 days uptime is acceptable)
        -----------------------------------
                    0 total reclaims
           9424678118 zero filled pages faults
            158255178 executable filled pages faults
        -----------------------------------
          36410003498 pages examined by clock         <--vmstat:page:sr
               169803 revolutions of the clock hand
           2438851639 pages freed by the clock        <--vmstat:page:fr
        ------------------------------------
            179510410 backtracks
                  699 free frame waits                <--5 digits/90 days uptime is acceptable
                    0 extend XPT waits
        ------------------------------------
            192163699 pending I/O waits
           6572422694 start I/Os
            447693244 iodones
        ------------------------------------
          43768541570 cpu context switches            <--vmstat:faults:cs
          12683830408 device interrupts
            528405827 software interrupts
           4196361885 decrementer interrupts
             40062263 mpc-sent interrupts
             40062181 mpc-received interrupts
            772686338 phantom interrupts
                    0 traps
         102934901653 syscalls



total address trans. faults:
Every page in/out causes 1 total addr. trans. fault.
- If the sum of page ins+outs is higher than total addr. trans. faults, it means data is paged in and out whose address translation faults have already been counted, so I am reading in and writing out the same data
- If the sum of page ins+outs is smaller than total addr. trans. faults, it means we are not reading/writing the same data, but there is additional I/O, probably from process executions...

The value of total addr. trans. faults can be compared to the sum of the 4 lines below it (page ins/outs, paging space page ins/outs). If the 1st line is larger than the sum of those 4, then the TLB (Translation Lookaside Buffer) has to be recalculated for contents that it already had.

paging space page outs:
Earl's rule: independently of the system uptime, if paging space page outs has 5 digits, then it should grab your attention, and every additional digit should cause 10 times more concern (6 digits: 10 times more concern, 7 digits: 100 times more concern, 8 digits ....)
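
Earl's rule is easy to express as a formula: the concern factor is 10 to the power of (number of digits - 5). A minimal sketch, with a made-up function name; returning 0 below 5 digits is an assumption of this example, the rule itself only starts at 5 digits.

```python
def paging_concern_factor(paging_space_page_outs):
    """Concern factor by Earl's rule: 1 at 5 digits, x10 for every extra digit."""
    digits = len(str(paging_space_page_outs))
    if digits < 5:
        return 0  # ASSUMPTION: below 5 digits the rule does not apply
    return 10 ** (digits - 5)


# Value from the vmstat -s output above (7 digits)
print(paging_concern_factor(2477803))
```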

pages examined -revolutions of the clock hand - pages freed:
clock hand: it examines the pages in memory (in background at a very low priority).
lrud is a kernel process that does the scanning and freeing (sr and fr in vmstat -I). The clock hand is the pointer that lrud is using for scanning and freeing memory. It examines the pages and/or frees the pages.

The clock hand examines pages and if there are pages which have not been used, it frees them.
'revolutions of the clock hand' shows how many times lrud has scanned through the whole memory since boot.

'pages examined' shows how many pages have been scanned by lrud, 'pages freed' shows how many pages were freed. ratio of pages examined and pages freed is useful to know (how much work a system has to do to free some pages.)
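
Using the three counters from the vmstat -s output above, the since-boot ratios can be worked out like this. The function names are made up for the example; the numbers are the ones shown in the output.

```python
PAGES_EXAMINED = 36410003498  # pages examined by clock
REVOLUTIONS = 169803          # revolutions of the clock hand
PAGES_FREED = 2438851639      # pages freed by the clock


def examined_per_freed(examined, freed):
    """Pages lrud had to examine for each page it freed."""
    return examined / freed if freed else float("inf")


def examined_per_revolution(examined, revolutions):
    """Average number of pages examined in one sweep of the clock hand."""
    return examined / revolutions if revolutions else 0.0


print(round(examined_per_freed(PAGES_EXAMINED, PAGES_FREED), 1))
print(round(examined_per_revolution(PAGES_EXAMINED, REVOLUTIONS)))
```

On this system lrud examined roughly 15 pages for every page it freed since boot, which is well above the "5 times" figure mentioned for sr and fr.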

free frame waits:
it counts how many times since boot the amount of free memory hit zero (there was no free memory), so the system had to scan and free memory.

start I/Os - iodones:
how many I/Os were started and how many are done (if an I/O is blocked/timed out, it is not done and has to be restarted). If iodones is higher than start I/Os, then probably NFS is running there. (page ins+page outs is the start I/Os)

31 comments:

  2. Undoubtedly the best AIX performance related article ever understandably written

    It is clear crisp and examples for analyzing the performance is TOPNOTCH.

    Good Work mate

    1. Thank You very much!
      I really appreciate your nice words :)

  3. This very helpfull

  4. Hi Admin,

    i got a question regarding the Memory in AIX.

    Q) How much memory we need to allocate to a particular AIX LPAR?
    i believe that Database (DB2/Oracle/Sybase) servers consume lot of Memory when compare to middle-ware servers(WAS/Weblogic/jboss).

    please give me some idea on how to allocate Memory in AIX LPARs from your experience.

    i heard that, if we allocate 100 GB Memory or 50 GB Memory..AIX will use everything...shows that 90% usage.

    please clarify..this really helps a lot

    Thanks,
    Mahijith

  6. just want add something to my comment/qn

    i see 94% of Memory usage on DB2 server. but i never saw paging activity even the memory reaches around 97%.

    why paging space is not utilized.....in this case

    Memory --> 12 GB
    paging space --> 3G

    i understand that i need follow some memory/paging ratio...(atleast pgspace =3/4 of RAM) but i never saw paging space exceeds...2% on my new aix 7.1 servers..

    please advise.

    1. Hi,
      if you check the memory usage in nmon, you will see FS Cache (FilesystemCache) as well. If there is some free memory on the system, it will go to FS Cache, that is why you see 97% memory usage. But if a process needs more RAM, FS Cache will release memory, so that process can have it.

      Regards,
      Balazs

  7. brilliant ! thanks Sean

  8. Brilliant work...It is easy and crisp..

    Do you have any suggestion for monitoring Shared Processor Pools in capped and uncapped mode. Do you think we should rely on tools like LPAR2RRD or is it better to explore HMC tools like lslparutil. What do you think?

    1. Hi, those tools could also work; you can probably take a look at the man page of the lparstat command, this is written for the 'app' column:
      app: Indicates the available physical processors in the shared pool.

    2. Thanks ...I could figure out few options for finding out the Processor Pool usage and individual LPAR utilization using lslparutil. For example for calculating pool usage: (Outputs are in a table format, however there are easy formulas to figure out the usage :-))

      lslparutil -m p780 -r procpool --startyear 2013 \
      --startmonth 8 --startday 14 --starthour 15 --endyear 2013 \
      --endmonth 8 --endday 14 --endhour 15 --filter \
      "event_types=sample,pool_names=DefaultPool" \
      -F time,total_pool_cycles,utilized_pool_cycles


      14-Aug-2013 15:20:20,36300443082267208,29584831001708
      14-Aug-2013 15:19:20,36299949801705398,29584415732108
      14-Aug-2013 15:18:19,36299456422770681,29584017742836
      14-Aug-2013 15:17:19,36298963014842834,29583618377948
      14-Aug-2013 15:16:19,36298469722301123,29583194221204

      Pool utilisation =
      (utilized_pool_cycles / total_pool_cycles) * 100

      i.e Shared Processor Pool Default 0 usage would be=((29584831001708-29583194221204)/(36300443082267208-36298469722301123))*100

    3. That is very nice...I didn't know that before...thx a lot for this valuable feedback :-)

    4. Hi
      Many thanks for this article. Higly informative.
      Please advise the servers where high I/O is done.
      How to I determine the value of pgbuf and fsbuf

  9. Simply superb technote, I never seen anything better than this on this topic. Keep up the great work

  10. [root@sp ~]# vmstat -w 1
    procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
    r b swpd free buff cache si so bi bo in cs us sy id wa st
    0 1 0 5906004 643532 207544 0 0 203 205 46 60 1 0 98 1 0
    0 1 0 5738844 806092 207480 0 0 80640 0 806 1493 0 2 76 22 0
    2 1 0 5571568 968908 207480 0 0 80896 8 781 1807 0 2 77 21 0
    0 1 0 5444344 1092308 207604 0 0 61952 64 629 1659 0 2 76 22 0
    0 1 0 5278060 1253844 207480 0 0 79616 0 775 1463 0 2 77 22 0
    0 1 0 5110412 1416660 207736 0 0 80896 0 817 1495 0 2 77 21 0
    0 1 0 4943136 1579092 207752 0 0 80512 0 776 1471 0 2 76 22 0
    0 1 0 4869604 1649876 207480 0 0 40064 32768 488 1055 0 1 76 23 0
    0 1 0 4790800 1723988 207480 0 0 33152 40960 440 941 0 1 76 23 0


    What information i can get it from b under the procs column ? do i have any I/O issue ?


    top - 00:21:38 up 16:50, 2 users, load average: 1.23, 0.66, 1.17
    Tasks: 300 total, 1 running, 299 sleeping, 0 stopped, 0 zombie
    Cpu(s): 0.0%us, 1.0%sy, 0.0%ni, 70.2%id, 28.8%wa, 0.0%hi, 0.0%si, 0.0%st
    Mem: 8097348k total, 6834124k used, 1263224k free, 5146852k buffers
    Swap: 8191992k total, 0k used, 8191992k free, 207792k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    15123 root 20 0 112m 10m 568 D 3.1 0.1 0:02.94 dd if=/dev/rnd/N1 of=/dev/rnd/N4 bs=10M
    34 root 20 0 0 0 0 S 2.0 0.0 0:33.56 [kblockd/0]
    14839 qemu 20 0 2602m 597m 5012 S 0.5 7.6 42:33.80 /usr/libexec/qemu-kvm -name nagios -S -M rhel6.4.0 -enable-kvm -m 2048 -s
    15125 root 20 0 0 0 0 S 0.5 0.0 0:00.64 [flush-253:12]
    1 root 20 0 19352 1444 1132 S 0.0 0.0 0:00.83 /sbin/init

    Here why D-stat processing using cpu ?

  11. Very informative... highly appreciated.

  12. Excellent write-up. The best I have seen for an avm explanation. Thank you!

  13. Good stuff, This post helped a lot. Thanks!

  14. thanks for taking the time to post this very useful article - much appreciated.

  15. Very useful article ..This information is highly valuable.Thank you

  16. Technical details are clearly explained. Thanks a lot.

  17. Thank you very much. I would like like to identify the utilization during the specific date, Is it possible to view results using vmstat. Can you please advice/suggest on this.

    1. Hope you got to pull out the nmon report....

  18. Check out my article in IBM Systems Magazine on all the "stats" utilities: vmstat, iostat, netstat and others! The article can be found at this link to the magazine's website:

    http://ibmsystemsmag.com/aix/administrator/performance/art-science-stats-utilities/

    1. Thanks Mark for the link, very useful article series.

  19. The best resource on AIX I have come across. Great many thanks!
