AIX for System Administrators: PERF.

fcstat

The fcstat command reports statistics directly from the FC adapter firmware and the FC driver. Protocols such as TCP/IP are designed to tolerate packet loss and out-of-order packets with minimal disruption, but the FC protocol is intolerant of missing, damaged or out-of-order frames and cannot retransmit a single missing frame.

This moves error recovery into the SCSI layer and can result in waiting for commands to time out. In some cases an error frame is detected by neither the target nor the initiator, so the initiator simply waits for completion until the command times out after 30 or 60 seconds. These problems are often the result of physical layer problems such as a damaged fibre channel cable, a faulty or degraded laser in an SFP (in a storage controller, switch or host), a failing ASIC in a switch, or a slow-draining device causing frames to be discarded. Regardless of the cause, fibre channel transport problems must be identified and resolved before any I/O performance tuning is attempted.

It is also important to ensure the SCSI layer does not overwhelm the target ports or LUNs with excessive I/O requests. Increasing num_cmd_elems may drive more I/O to a storage device, resulting in even worse I/O service times. (errpt and iostat can help uncover some of these problems.) However, acceptable I/O service times differ: for example, some shops demand service times below 2 ms, while others tolerate 11 ms. The disk technology affects the expected I/O service time, as does the availability of read and/or write cache.

If queuing is occurring in the disk driver (iostat shows a non-zero value in qfull), resolve this first, for example by increasing queue_depth or, if I/O service times are too high, by adding storage resources. Only after ensuring there are no fibre channel physical layer problems, average I/O response times are in a good range (not exceeding 15 ms) and there is no queuing (qfull) in the disk driver should we tune the adapter.
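The checklist above can be written down as a small decision function. This is only a sketch: HostIOSnapshot and blockers_before_adapter_tuning are hypothetical names, the values are assumed to be read by hand from fcstat, errpt and iostat, and the 15 ms default is the threshold quoted above, which each shop may set differently.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class HostIOSnapshot:
    """Hypothetical container for values collected by hand from fcstat, errpt and iostat."""
    physical_errors_increasing: bool  # e.g. Invalid CRC / Loss of Sync counters still growing
    avg_service_time_ms: float        # average read/write service time reported by iostat
    disk_qfull: int                   # qfull value reported by iostat for the hdisks


def blockers_before_adapter_tuning(s: HostIOSnapshot, max_service_ms: float = 15.0) -> list[str]:
    """Return the problems to fix before changing adapter attributes (empty list = go ahead)."""
    blockers = []
    if s.physical_errors_increasing:
        blockers.append("resolve FC physical layer problems")
    if s.avg_service_time_ms > max_service_ms:
        blockers.append("service times too high: add storage resources")
    if s.disk_qfull > 0:
        blockers.append("queuing in the disk driver: review queue_depth")
    return blockers


# Usage with hypothetical values: no errors, 4.2 ms service time, no qfull
print(blockers_before_adapter_tuning(HostIOSnapshot(False, 4.2, 0)))
```

The order of the checks mirrors the order recommended in the text: physical layer first, then service times and disk queuing, and only then the adapter.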

-----------------------------------

Normally, fcstat statistics are reset only when the server is rebooted or the fcs device is reconfigured. fcstat -Z fcsX resets the statistics on demand, which makes it useful for daily monitoring.

fcstat fcsX shows FC adapter statistics
fcstat -D fcsX shows additional fcs-related details
fcstat -e fcsX shows all statistics, including the device-specific ones (driver statistics, link statistics and FC4 types)
fcstat -Z fcsX resets the statistics

-----------------------------------

root@aix1:/ # fcstat fcs0
FIBRE CHANNEL STATISTICS REPORT: fcs0
Device Type: 8Gb PCI Express Dual Port FC Adapter (df1000f114108a03)
(adapter/pciex/df1000f114108a0)
Serial Number: 1C041083F7
Option ROM Version: 02781174
ZA: U2D1.11X4 <--firmware version
World Wide Node Name: 0x20000000C9A8C4A6 <--adapter WWN
World Wide Port Name: 0x10000000C9A8C4A6 <--adapter WWPN
FC-4 TYPES:
Supported: 0x00000120000000000000000000000000000000000000
Active: 0x00000100000000000000000000000000000000000000
Class of Service: 3
Port Speed (supported): 8 GBIT <--8Gb adapter
Port Speed (running): 8 GBIT <--running at 8Gb
Port FC ID: 0x6df640 <--adapter FC ID (first 2 digits after x will show switch id, here 6d)
Port Type: Fabric <--connected in Fabric
Attention Type: Link Up <--link status

Seconds Since Last Reset: 270300 <--number of seconds the adapter has been collecting statistics (here about 3 days)

                     Transmit Statistics   Receive Statistics
                     -------------------   ------------------
Frames:              2503792149            704083655
Words:               104864195328          437384431872

LIP Count: 0
NOS Count: 0
Error Frames: 0 <--affects io when frames are damaged or discarded
Dumped Frames: 0   <--affects io when frames are damaged or discarded
Link Failure Count: 0
Loss of Sync Count: 8
Loss of Signal: 0
Primitive Seq Protocol Error Count: 0
Invalid Tx Word Count: 31 <--a fast increase may result in buffer-to-buffer credit problems, damaged FC frames and discards
Invalid CRC Count: 0   <--affects io when frames are damaged or discarded

...
Elastic buffer overrun count: 0 <--may occur with link failures

IP over FC Adapter Driver Information
No DMA Resource Count: 3207
No Adapter Elements Count: 126345

FC SCSI Adapter Driver Information
No DMA Resource Count: 3207   <--I/Os queued at the adapter due to lack of DMA resources (increase max_xfer_size)
No Adapter Elements Count: 126345 <--I/O was temporarily blocked/queued (increase num_cmd_elems)
No Command Resource Count: 133 <--there were no free cmd_elems (increase num_cmd_elems)

IP over FC Traffic Statistics
Input Requests: 0
Output Requests: 0
Control Requests: 0
Input Bytes: 0
Output Bytes: 0

FC SCSI Traffic Statistics
Input Requests: 6777091279
Output Requests: 2337796
Control Requests: 116362
Input Bytes: 57919837230920
Output Bytes: 39340971008

Adapter Effective max transfer value: 0x100000 <--value set in the kernel regardless of ODM (must be equal to or greater than the hdisk max_coalesce)
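When fcstat output is saved daily, a small parser makes it easier to compare values. The sketch below is an assumption-based helper, not an IBM tool: parse_fcstat and max_transfer_covers are hypothetical names, and the parser assumes that any non-empty line without a colon (for example "FC SCSI Adapter Driver Information") starts a new section, so the two "No DMA Resource Count" lines (IP over FC and FC SCSI) stay separate. The second function checks the rule in the last annotation: the effective max transfer value must be at least the hdisk max_coalesce.

```python
from __future__ import annotations


def parse_fcstat(text: str) -> dict[str, dict[str, str]]:
    """Group 'Key: value' lines of saved fcstat output under their section headers.

    Assumption: a non-empty line without a colon (e.g. 'FC SCSI Adapter Driver
    Information') starts a new section; earlier lines go under 'General'.
    """
    sections: dict[str, dict[str, str]] = {"General": {}}
    current = "General"
    for raw in text.splitlines():
        line = raw.split("<--")[0].strip()  # drop annotations such as '<--firmware version'
        if not line or set(line) <= {"-", " "}:
            continue  # skip blank and '-----' separator lines
        if ":" not in line:
            current = line
            sections.setdefault(current, {})
            continue
        key, _, value = line.partition(":")
        sections[current][key.strip()] = value.strip()
    return sections


def max_transfer_covers(sections: dict[str, dict[str, str]], hdisk_max_coalesce: int) -> bool:
    """True if 'Adapter Effective max transfer value' >= the hdisk max_coalesce."""
    for values in sections.values():
        raw = values.get("Adapter Effective max transfer value")
        if raw is not None:
            return int(raw, 0) >= hdisk_max_coalesce  # base 0 accepts the '0x' prefix
    raise KeyError("Adapter Effective max transfer value not found")
```

For example, with the output above saved to a (hypothetical) file fcs0.txt and an hdisk max_coalesce of 0x40000, max_transfer_covers(parse_fcstat(open("fcs0.txt").read()), 0x40000) returns True, because 0x100000 is larger.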

-----------------------------------

Port FC ID:
The FC ID gives some information about the switch in hex. Here it is 0x6df640, which is six hex digits:
1st 2 digits after 0x: domain ID of the SAN switch, which we can call the "switch ID" (here 6d)
2nd 2 digits: port ID (although this could also be a virtualized value; here f6)
3rd 2 digits: loop ID (AL_PA) if in loop mode; in Fabric mode it is simply the last byte of the address (here 40)
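The FC ID is a 24-bit address, so the three parts can be pulled out with shifts and masks. This is a minimal sketch (decode_fc_id is a hypothetical helper). It uses the FC standard's names for the three bytes (domain, area, port): the "port ID" in the list above is the area byte, and the loop ID is the last (port) byte.

```python
from __future__ import annotations


def decode_fc_id(fc_id: str) -> dict[str, int]:
    """Split a 24-bit FC ID such as '0x6df640' into its three bytes."""
    value = int(fc_id, 16)  # int() with base 16 accepts an optional '0x' prefix
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError(f"not a 24-bit FC ID: {fc_id}")
    return {
        "domain": (value >> 16) & 0xFF,  # switch (domain) ID, here 0x6d
        "area": (value >> 8) & 0xFF,     # called "port ID" above, here 0xf6
        "port": value & 0xFF,            # loop ID (AL_PA) in loop mode, here 0x40
    }


print({k: f"{v:02x}" for k, v in decode_fc_id("0x6df640").items()})
```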

Checking the switch ID shows whether the ports of an FC adapter are connected to different switches (and possibly different fabrics). Keep in mind that a fabric may contain several switches, so multiple switch IDs do not guarantee multiple fabrics.

If we check a 4-port adapter and the first 2 hex digits of two ports are the same, those ports are connected to the same switch:
fcs0: Port FC ID: 0xd1e6c0 <--Fabric 1 (switch id: d1)
fcs1: Port FC ID: 0xd1e7c0 <--Fabric 1 (switch id: d1)
fcs2: Port FC ID: 0x6de6c0 <--Fabric 2 (switch id: 6d)
fcs3: Port FC ID: 0x6de7c0 <--Fabric 2 (switch id: 6d)
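The same comparison can be done for all ports at once by grouping them on the first byte of the FC ID. This is a sketch with a hypothetical function name, fed with the four Port FC ID values above. Domain IDs are only unique within one fabric, so the grouping is a hint about switch connectivity, not proof of how many fabrics exist.

```python
from __future__ import annotations
from collections import defaultdict


def group_by_switch_id(port_fc_ids: dict[str, str]) -> dict[str, list[str]]:
    """Map each switch (domain) ID, as two hex digits, to the adapter ports behind it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for adapter, fc_id in sorted(port_fc_ids.items()):
        domain = (int(fc_id, 16) >> 16) & 0xFF  # first byte of the 24-bit FC ID
        groups[f"{domain:02x}"].append(adapter)
    return dict(groups)


ports = {"fcs0": "0xd1e6c0", "fcs1": "0xd1e7c0", "fcs2": "0x6de6c0", "fcs3": "0x6de7c0"}
print(group_by_switch_id(ports))
```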

Error Frames, Dumped Frames, Invalid CRC Count:
These counters may indicate a physical transport layer problem that damages fibre channel frames before they arrive at the adapter. They usually increment for received frames rather than for transmitted ones.

For each CRC error, AIX logs an errpt entry indicating a damaged frame. CRC errors can occur anywhere in the fabric and are usually related to a bad SFP or a bad FC cable. These errors affect I/O processing for a single read or write operation, but the driver retries these operations. They are the most difficult errors to troubleshoot.

Link Failure Count, Loss of Sync Count, Loss of Signal:
These counters indicate the health of the physical link between the switch and the host HBA. If they increase daily, we generally suspect a problem with an SFP or the FC cable between the switch and the FC HBA. These problems can affect I/O processing on the host.

Invalid Tx Word Count:
This counter is incremented when the HBA receives damaged words from the switch. In many cases this does not affect I/O processing, but it is an early indication of a problem. On certain switch models this may be due to an improper port fill word setting. If not, it may indicate a bad SFP or cable between the HBA and the switch. This counter is only relevant for physical layer (Tx/Rx) communication between the switch and the HBA.

Elastic buffer overrun count:
This counter may increment due to the same conditions behind Link Failure Count, Loss of Sync Count, Loss of Signal and Invalid Tx Word Count, or due to old, unsupported host HBA firmware levels.
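Because these counters are cumulative, the useful signal is whether they grow from one day to the next. The sketch below compares two snapshots (for example two saved fcstat outputs, converted to integers); growing_counters is a hypothetical helper and the sample values are made up apart from the 8 and 31 in the output above.

```python
from __future__ import annotations

# Physical layer counters discussed above (names as printed by fcstat)
PHYSICAL_COUNTERS = (
    "Error Frames", "Dumped Frames", "Invalid CRC Count",
    "Link Failure Count", "Loss of Sync Count", "Loss of Signal",
    "Invalid Tx Word Count", "Elastic buffer overrun count",
)


def growing_counters(earlier: dict[str, int], later: dict[str, int]) -> dict[str, int]:
    """Return the physical layer counters that increased between two snapshots.

    A negative difference means the statistics were reset in between
    (reboot, device reconfiguration or fcstat -Z), so it is ignored.
    """
    grown = {}
    for name in PHYSICAL_COUNTERS:
        if name in earlier and name in later:
            delta = later[name] - earlier[name]
            if delta > 0:
                grown[name] = delta
    return grown


yesterday = {"Loss of Sync Count": 8, "Invalid Tx Word Count": 31, "Invalid CRC Count": 0}
today = {"Loss of Sync Count": 8, "Invalid Tx Word Count": 45, "Invalid CRC Count": 2}
print(growing_counters(yesterday, today))
```

A non-empty result on consecutive days points at the SFP or cable suspects described above.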

-----------------------------------

No DMA Resource Count:
This counter shows that additional I/O DMA memory is needed to initiate (larger) I/Os from the adapter. When the adapter driver is unable to initiate an I/O request because there are no free DMA resources, the "No DMA Resource" counter is incremented and the I/O request waits. Increasing max_xfer_size can help in this situation.

No Adapter Elements Count:
The number of times since boot (or since the statistics were last reset) that an I/O was temporarily blocked because num_cmd_elems was too low. If it shows a non-zero value, increasing num_cmd_elems can help.

No Command Resource Count:
When the adapter driver is unable to initiate an I/O request because there are no free command elements (num_cmd_elems), the "No Command Resource" counter is incremented and the I/O request waits for adapter buffer resources (free command elements). Resources become available when a currently running I/O request completes. Increasing num_cmd_elems can help avoid this situation.

If the "No Command Resource Count" and/or the "No DMA Resource Count" continue to increment (and max_xfer_size and num_cmd_elems are already set to their maximum values), the I/O workload capability of the adapter has been exceeded. In this case the I/O load should be reduced by moving load to additional resources, for example by adding FC adapters and balancing the I/O workload across them. Another workaround is to reduce num_cmd_elems.
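The three driver counters and the actions above can be summarized in one function. This is a sketch under stated assumptions: adapter_resource_hints is a hypothetical name, deltas holds how much each counter grew over a monitoring interval (not the raw cumulative values), and at_maximum is a flag you set yourself when max_xfer_size and num_cmd_elems are already at their maximum for the adapter.

```python
from __future__ import annotations


def adapter_resource_hints(deltas: dict[str, int], at_maximum: bool = False) -> list[str]:
    """Suggest an action from the growth of the FC SCSI Adapter Driver counters.

    deltas: increase of each counter over the monitoring interval.
    at_maximum: True if max_xfer_size and num_cmd_elems are already at their maximum.
    """
    dma = deltas.get("No DMA Resource Count", 0) > 0
    cmd = (deltas.get("No Adapter Elements Count", 0) > 0
           or deltas.get("No Command Resource Count", 0) > 0)
    if not (dma or cmd):
        return []  # counters are stable: nothing to tune here
    if at_maximum:
        return ["adapter capacity exceeded: spread the I/O load over additional FC adapters"]
    hints = []
    if dma:
        hints.append("increase max_xfer_size")
    if cmd:
        hints.append("increase num_cmd_elems gradually")
    return hints


print(adapter_resource_hints({"No DMA Resource Count": 12, "No Command Resource Count": 3}))
```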

-----------------------------------

fcstat -D fcsX can display additional info:
(Values preceded by a 0x are in hex. All values below are reported in hex, not decimal.)

Driver Statistics:
Number of interrupts: 76534
Number of spurious interrupts: 0
Long term DMA pool size: 0x800000
I/O DMA pool size: 0x1000000 <--currently active I/O DMA pool size in the driver

FC SCSI Adapter Driver Queue Statistics <--adapter driver
Number of active commands: 0
High water mark of active commands: 11
Number of pending commands: 0
High water mark of pending commands: 1
Number of commands in the Adapter Driver Held off queue: 0
High water mark of number of commands in the Adapter Driver Held off queue: 0

FC SCSI Protocol Driver Queue Statistics <--protocol driver
Number of active commands: 0
High water mark of active commands: 11
Number of pending commands: 4
High water mark of pending commands: 5

Number of active commands:
This represents the I/O workload. Active commands are commands that have left the adapter driver and have been handed off to the adapter hardware for transport to the end device. These commands have not yet received a completion status and are considered active.

High water mark of active commands:
The "high water mark of active commands" represents the peak (highest) number of active commands. If I/O service times are low and the high water mark of active commands is close to num_cmd_elems, increasing num_cmd_elems may improve I/O performance. In certain error recovery scenarios the high water mark of active commands can rise up to the num_cmd_elems limit, so when tuning, clear these counters and monitor them for a few days to make sure there are no errors.

High water mark of pending commands:
The "high water mark of pending commands" represents the peak (highest) number of pending commands. (These commands are pending because the number of active commands reached the num_cmd_elems limit, so the additional commands above that limit have to wait.)

If the high water mark of active + pending commands is near or exceeds num_cmd_elems, we recommend increasing num_cmd_elems to cover this high water mark and improve I/O performance. Rule to follow: num_cmd_elems > (high water mark of active commands + high water mark of pending commands)

Always increase num_cmd_elems gradually, until the 'No Command Resource Count' counter stops increasing.
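The rule and the gradual approach can be combined into a "what to try next" helper. This is only a sketch: next_num_cmd_elems is a hypothetical name, and the step of 256 and ceiling of 2048 are assumptions (the real maximum num_cmd_elems depends on the adapter type). The high water marks come from fcstat -D; since the note above says those values are reported in hex, convert them with int(value, 16) first if that applies to your output.

```python
def next_num_cmd_elems(current: int, hwm_active: int, hwm_pending: int,
                       no_cmd_resource_delta: int, step: int = 256,
                       ceiling: int = 2048) -> int:
    """Next num_cmd_elems to try, raising it gradually.

    Stop (return current) once the rule num_cmd_elems > HWM active + HWM pending
    holds and 'No Command Resource Count' no longer increases.
    step and ceiling are hypothetical; the real maximum depends on the adapter.
    """
    rule_ok = current > hwm_active + hwm_pending
    if rule_ok and no_cmd_resource_delta == 0:
        return current  # nothing to change
    return min(current + step, ceiling)  # one gradual step, never above the ceiling


# Hypothetical observation: 200 elements, HWM active 180 + pending 40, counter grew by 7
print(next_num_cmd_elems(200, 180, 40, 7))
```

Apply the returned value, clear the counters with fcstat -Z, monitor for a few days, and call the function again with the new observations.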

If large sequential I/Os (like backups) show high average read and write service times and the number of active/peak commands is also high (but there are no physical layer problems and no queuing in the adapter or disk driver), then the storage server is unable to service these I/O requests in a timely manner, or the I/O load is greater than the LUN / storage controller capability (for example, handling I/Os within a ~15 ms window). A solution could be adding storage resources, for example distributing the I/O workload across additional LUNs and/or storage controllers.

-----------------------------------

Link to the related IBM descriptions: https://www.ibm.com/support/pages/node/6198385

-----------------------------------

Adapter busy%

AIX does not report a busy% for adapters; any adapter busy% is derived from the disk statistics. The adapter busy% is simply the sum of the busy% of the disks behind it.
So if the adapter busy% is, for example, 350%, then you have the equivalent of 3.5 fully busy disks on that adapter. Or it could be 7 disks at 50% busy, or 14 disks at 25%, and so on.
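The arithmetic is a plain sum, which is why the same figure can hide very different disk loads. A minimal sketch with hypothetical helper names and made-up per-disk %busy values (in practice they would come from iostat or nmon disk statistics):

```python
from __future__ import annotations


def adapter_busy_percent(disk_busy: list[float]) -> float:
    """Derived adapter busy%: the sum of the %busy of the disks behind the adapter."""
    return sum(disk_busy)


def fully_busy_disks(adapter_busy: float) -> float:
    """Number of 100%-busy disks that a given adapter busy% corresponds to."""
    return adapter_busy / 100.0


# Three different disk mixes that all add up to the same "adapter busy%"
for disks in ([100.0] * 3 + [50.0], [50.0] * 7, [25.0] * 14):
    total = adapter_busy_percent(disks)
    print(len(disks), "disks ->", total, "% busy =", fully_busy_disks(total), "busy disks")
```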

There is no way to determine the real adapter busy%, and in fact it is not clear what it would really mean. The adapter has a dedicated on-board CPU that is always busy (probably without a real OS), and we don't run nmon on these adapter CPUs to find out what they are really doing.
