
PERF. - TOPAS, NMON

TOPAS - NMON:


TOPAS

Reports selected local and remote system statistics. The topas command requires the bos.perf.tools and perfagent.tools filesets to be installed on the system.

Navigation between the columns is possible with the arrow keys (<-, ->, ...)
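
To check that both filesets are installed (lslpp is the standard AIX fileset query command; it reports a missing fileset as "not installed"):

lslpp -l bos.perf.tools perfagent.tools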

------------------------

Default View (lowercase letters):

- c n d f p    = CPU, Network, Disk, Filesystem, Processes

- c --> CPUs           c --> graph
                       c --> all
                       c --> off

- n --> networks       n --> totals
                       n --> each interface
                       n --> off

- d --> disks          d --> totals
                       d --> each disk
                       d --> off

- f --> filesystems    f --> totals
                       f --> each filesystem
                       f --> off

- p --> processes      p --> top 20 processes
                       p --> off

- if present: t = tape, w=WLM, @=WPARs
- a=reset all
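
These toggles all work inside a running topas; the refresh interval can be set at startup with the -i flag (a standard topas option; 5 seconds below is just an example):

topas -i 5    # refresh every 5 seconds, then toggle the subsections with the keys above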

------------------------

Detailed View (capital letters):

- D --> Disks in full detail    m --> Multi-path I/O (only if it is used)
                                d --> adapter view (scsi, vscsi)

                      in adapter view (d):
                                f --> disks (devices) attached to that adapter (first navigate to an adapter with the arrow keys, then hit f)
                                v --> virtual adapters only (vscsi)
               
- E --> Ethernet (shows adapters, SEA, EtherChannel, ...) (the SEA interface must be in the up state)

- F --> Filesystems (more details than "f")

- L --> LPAR settings (SMT, physc, %entc, ...) and individual (logical) CPU usage

- P --> Process details (CPU%, TIME, page space usage; page space usage is shown only here!!!)

- T --> Tape, if there is an ATAPE device attached

- V --> Volume group statistics
                       in volume group view:
                                 f --> LVs in the VG (first navigate to a VG, then hit f)

- W --> WLM, then @ --> WPAR
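
Some of the detailed views can also be entered directly from the command line with the matching topas flags:

topas -D    # start directly in the disk detail view
topas -L    # start directly in the LPAR view
topas -P    # start directly in the full-screen process view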

------------------------

CEC View (Cross-Partition or Whole-Machine View)

topas -C    or    topas and hit C

On the HMC, "Allow performance information collection" must be enabled for the LPAR.

topas -C might not be able to locate partitions residing on other subnets. To circumvent this, create a $HOME/Rsi.hosts file containing the fully qualified host names for each partition (including domains), one host per line.
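
A minimal Rsi.hosts might look like this (the host names below are placeholders; one fully qualified name per line):

$HOME/Rsi.hosts:
lpar1.example.com
lpar2.example.com
vios1.example.com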



- s and d --> Shared CPU and Dedicated CPU sections
- g --> Global
- m --> Memory pool = AMS stats from the Hypervisor (select pool 0, then hit f; note: only 1 pool)
- p --> CPU pool stats
- v --> VIO Server/Client disk view (use f to select the VIOS)
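
The cross-partition data can also be recorded in the background (topas -R is the documented CEC recording mode):

topas -R    # start cross-partition (CEC) recording in the background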

------------------------

Navigation between topas and nmon:

Since AIX 6.1 topas and nmon are delivered as a single binary (topas_nmon): inside topas, hit "~" to switch to the nmon screen; inside nmon, hit "~" to switch back to topas.

------------------------

NMON (Nigel's Monitor):


c, C    CPU usage (c: small view, C: large view)
l       long-term CPU averages (the # marks show physical CPU usage)
m       memory and paging statistics
n       network interface view
k       kernel statistics

t       top processes --> [1=Basic 2=CPU 3=Perf 4=Size 5=I/O 6=Cmds]
A       AIO processes

.       displays only busy disks and processes

D       disk statistics (read/write KB/s)
d       disk statistics with graph (same as D just with graph)
a       adapter I/O statistics (read/write KB/s, %busy)
^       Fibre Channel adapter statistics (similar to fcstat; press ^, then hit a key, e.g. space, to refresh)

j       jfs view
V       volume group statistics (read/write KB/s)

p       shared processor logical partition view
O       Shared Ethernet Adapter statistics ("O" for ocean, since SEA = sea)


nmon -k <disklist>    Reports only the disks in the disk list (e.g. nmon -k hdisk1,hdisk2; only with the original nmon)


If you use the same set of keys every time the nmon command is started, you can place the keys in the NMON shell variable.
For example, you can run the following command:
export NMON=mcd    (it will display memory, CPU, and disk statistics by default)
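
The variable can also be set for a single run only; this is ordinary shell behavior, nothing nmon-specific:

NMON=mcd nmon    # memory, CPU and disk views for this invocation only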

------------------------

Capturing NMON data to file:

nmon -f -s <seconds> -c <count>

capture of a busy hour with a 10-second interval: nmon -f -s 10 -c 360
(60 mins x 60 secs = 3600 secs; with a 10-second interval that is 360 snapshots)

capture of a day with a 5-minute interval: nmon -f -s 300 -c 288
(86400 seconds in a day divided by 300 (5 mins) = 288 snapshots)

For a more detailed graph you can increase the count to 600-700, but there is no value in going above that.
(Every point in a graph takes 3 pixels, so 600x3=1800 pixels are needed on your screen to see that graph)

-m <dir>    output directory for the nmon file
-T          captures top processes with command arguments (-t captures top processes without command arguments)
-N          add NFS stats
-^          add FC stats
-O          add VIOS SEA stats
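
Putting the flags together, a daily capture might look like this (the output directory /var/perf/nmon is an example, and /usr/bin/nmon is assumed; check the path with "which nmon"):

# one-day capture: 5-minute snapshots, top processes with arguments, NFS and FC stats
nmon -f -T -N -^ -s 300 -c 288 -m /var/perf/nmon

# the same capture started from cron every midnight:
0 0 * * * /usr/bin/nmon -f -T -N -^ -s 300 -c 288 -m /var/perf/nmon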

------------------------

Some performance information (related to nmon):
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power+Systems/page/nmon_FAQ

nmon reports more than 100% CPU utilisation for a process:
Unlike AIX commands, nmon reports the CPU use of a process per CPU. If your process is taking, for example, 250%, then it is using 2.5 CPUs and must be multi-threaded. This is far better than the AIX tools, because on larger machines the percentages make it very hard to determine whether a process is using a whole CPU. On a 64-CPU machine a single process uselessly spinning on a CPU takes up only 1.56% of the total (100/64), which makes it very unclear what is going on.


adapter busy goes over 100%:
There are no adapter stats in AIX; they are derived from the disk stats. The adapter busy% is simply the sum of the disk busy%.
So if the adapter busy% is, for example, 350%, then you have the equivalent of 3.5 disks busy on that adapter. Or it could be 7 disks at 50% busy, or 14 disks at 25%, or ...
There is no way to determine the real adapter busy: the adapter has a dedicated on-board CPU that is always busy, and we don't run nmon on these adapter CPUs to find out what they are really doing!!


CPU wait is too high:
CPU "waiting for I/O" means the CPU is idle but has a disk I/O outstanding. Historically this was used to highlight that your application is being held up by slow disks or disk problems. In the wait-for-I/O state the CPU is actually free to do other work, and it is NOT looping waiting for the disk: it instructed the adapter to perform the disk I/O, put the calling process to sleep, and carried on. If there is no other process, it sits in the same loop as in the idle state, i.e. it is available to do other things.

In benchmarks, wait for I/O is seen positively, as an opportunity: we can throw in more work to boost throughput. In fact, faster CPUs would mean even higher wait values.

free memory is near zero:
This is just how AIX works and is perfectly normal. After a reasonable length of time all of memory gets soaked up with copies of filesystem blocks, and free memory stays near zero. AIX then uses the lrud (least recently used daemon) kernel process to keep the free list at a reasonable level. If you see the lrud process taking more than 30% of a CPU, you need to investigate and make memory parameter changes.
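
A quick way to keep an eye on lrud (ps -k lists AIX kernel processes; the 30% figure above is the author's rule of thumb):

ps -ek | grep lrud    # find the lrud kernel process, then watch its CPU% in topas (P) or nmon (t)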

------------------------