SDD (subsystem device driver):
It is designed to support multipath configurations in the ESS.
The software balances ESS I/O traffic across all adapters and provides multiple paths to the data from the host.
When using SDD, cfgmgr is run 3 times (cfgmgr -l fcs0, cfgmgr -l fcs1, cfgmgr - the third one builds the vpaths)
3 policies exist for load balancing:
-default: selects the path with the least number of current I/O operations
-round robin: chooses the path that was not used for the last operation (alternating if 2 paths exist)
-failover: all I/O is sent over the most preferred path, until a failure is detected.
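As a toy illustration of the round robin policy above (this is not SDD code, and the path names are made up), a selector that alternates between two paths could be sketched like this:

```shell
# Toy illustration only: alternate between two paths, the way the
# round robin policy does when 2 paths exist. Path names are made up.
last=""
pick_path() {
    if [ "$last" = "path0" ]; then
        last="path1"
    else
        last="path0"
    fi
    echo "$last"
}
pick_path    # first pick
pick_path    # alternates to the other path
```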
SDDSRV:
SDD has a server daemon running in the background: lssrc/stopsrc/startsrc -s sddsrv
If sddsrv is stopped, the feature that automatically recovers failed paths is disabled.
vpath:
A logical disk defined in ESS and recognized by AIX. AIX uses vpath instead of hdisk as a unit of physical storage.
root@aix40: /dev # lsattr -El vpath0
active_hdisk hdisk20/00527461/fscsi1 Active hdisk False
active_hdisk hdisk4/00527461/fscsi0 Active hdisk False
policy df Scheduling Policy True <-path selection policy
pvid 0056db9a77baebb90000000000000000 Physical volume identifier False
qdepth_enable yes Queue Depth Control True
serial_number 00527461 LUN serial number False
unique_id 1D080052746107210580003IBMfcp Device Unique Identification False
policy:
fo: failover only - all I/O operations are sent over the same path until the path fails
lb: load balancing - the path is chosen by the number of I/O operations currently in progress
lbs: load balancing sequential - same as before, with optimization for sequential I/O
rr: round robin - a path is chosen at random from the paths not used for the last operation
rrs: round robin sequential - same as before, with optimization for sequential I/O
df: default - the default policy, which is load balancing
datapath set device N policy changes the SDD path selection policy dynamically
DPO (Data Path Optimizer):
it is a pseudo device (lsdev | grep dpo), which is the pseudo parent of the vpaths
root@aix40: / # lsattr -El dpo
SDD_maxlun 1200 Maximum LUNS allowed for SDD False
persistent_resv yes Subsystem Supports Persistent Reserve Command False
--------------------------------
software requirements for SDD:
-host attachment for SDD (ibm2105.rte, devices.fcp.disk.ibm.rte) - this is the ODM extension
The host attachments for SDD add 2105 (ESS)/2145 (SVC)/1750 (DS6000)/2107 (DS8000) device information to allow AIX to properly configure 2105/2145/1750/2107 hdisks.
The 2105/2145/1750/2107 device information allows AIX to:
- Identify the hdisk(s) as a 2105/2145/1750/2107 hdisk.
- Set default hdisk attributes such as queue_depth and timeout values.
- Indicate to the configure method to configure 2105/2145/1750/2107 hdisk as non-MPIO-capable devices
ibm2105.rte: for 2105 devices
devices.fcp.disk.ibm.rte: for DS8000, DS6000 and SAN Volume Controller
-devices.sdd.53.rte - this is the driver (sdd)
it provides the multipath configuration environment support
--------------------------------
addpaths dynamically adds more paths to SDD vpath devices (before addpaths, run cfgmgr)
(running cfgmgr alone does not add new paths to SDD vpath devices)
cfgdpo configures dpo
cfgvpath configures vpaths
cfallvpath configures dpo+vpaths
dpovgfix <vgname> fixes a vg that has mixed vpath and hdisk physical volumes
extendvg4vp this can be used instead of extendvg (it will move the pvid from the hdisk to the vpath)
datapath query version shows sdd version
datapath query essmap shows vpaths and their hdisks in a list
datapath query portmap shows vpaths and ports
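To see which hdisks belong to which vpath in one line each, the essmap listing can be grouped with awk. The sample output below is fabricated and column positions can differ between SDD versions, so treat this as a sketch and adjust the field numbers to your output:

```shell
# Sketch: group 'datapath query essmap' output by vpath. The stub holds
# fabricated sample output -- on a real host pipe the command's output
# in instead, and adjust the awk field numbers to your SDD version.
essmap_sample() {
cat <<'EOF'
Disk    Path     Location  Adapter  LUN SN
vpath0  hdisk4   10-68-01  fscsi0   00527461
vpath0  hdisk20  20-60-01  fscsi1   00527461
EOF
}
essmap_by_vpath() {
    essmap_sample | awk 'NR > 1 { map[$1] = map[$1] " " $2 }
                         END    { for (v in map) print v ":" map[v] }'
}
essmap_by_vpath
```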
---------------------------------------
datapath query adapter information about the adapters
State:
-Normal adapter is in use.
-Degraded one or more paths are not functioning.
-Failed the adapter is no longer being used by SDD.
datapath query device information about the devices (datapath query device 0)
State:
-Open path is in use
-Close path is not being used
-Failed due to errors path has been removed from service
-Close_Failed path was detected to be broken and failed to open when the device was opened
-Invalid path failed to open, but the MPIO device is opened
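A quick health summary can be made by counting the paths that are not in Open state. The sample output below is fabricated (real `datapath query device` output has more header lines and fields), so this is only a sketch:

```shell
# Sketch: count paths not in OPEN state from 'datapath query device'
# output. The sample below is fabricated; adjust fields to your output.
dpquery_sample() {
cat <<'EOF'
DEV#:   0  DEVICE NAME: vpath0  TYPE: 2105800  POLICY: Optimized
Path#    Adapter/Hard Disk    State    Mode      Select   Errors
    0    fscsi0/hdisk4        OPEN     NORMAL      1240        0
    1    fscsi1/hdisk20       CLOSE    NORMAL         0        2
EOF
}
count_nonopen() {
    # path rows start with a numeric Path#; column 3 is the State
    dpquery_sample | awk '$1 ~ /^[0-9]+$/ && $3 != "OPEN" { n++ }
                          END { print "non-open paths: " n+0 }'
}
count_nonopen
```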
---------------------------------------
datapath remove device X path Y removes path# Y from device# X (datapath query device, will show X and Y)
datapath set device N policy changes the SDD path selection policy dynamically
datapath set adapter 1 offline
lsvpcfg list vpaths and their hdisks
lsvp -a displays vpath, vg, disk information
lquerypr reads and releases the persistent reservation key
lquerypr -h/dev/vpath30 queries the persistent reservation on the device (0: if it is reserved by the current host, 1: if by another host)
lquerypr -vh/dev/vpath30 queries and displays the persistent reservation on a device
lquerypr -rh/dev/vpath30 releases the persistent reservation if the device is reserved by the current host
(0: if the command succeeds or the device is not reserved, 2: if the command fails)
lquerypr -ch/dev/vpath30 resets any persistent reserve and clears all reservation key registrations
lquerypr -ph/dev/vpath30 removes the persistent reservation if the device is reserved by another host
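The exit codes above can drive a small guard script. This is a hedged sketch (the messages and function name are made up); verify lquerypr's exit-code behaviour on your SDD level before relying on it:

```shell
# Hedged sketch: clear a persistent reserve only when it is held by
# another host, based on the lquerypr exit codes listed above
# (0: reserved by this host or not reserved, 1: reserved by another host).
clear_foreign_reserve() {
    dev=$1
    if lquerypr -h"$dev"; then
        echo "no foreign reservation on $dev"
    else
        echo "reserved by another host, clearing $dev"
        lquerypr -ph"$dev"
    fi
}
# usage: clear_foreign_reserve /dev/vpath30
```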
---------------------------------------
Removing SDD (before installing a new version):
-umount fs on ESS
-varyoffvg
(if HACMP and the RG is online on the other host: vp2hd <vgname> <--it converts vpaths to hdisks)
-rmdev -dl dpo -R <--removes all the SDD vpath devices
-stopsrc -s sddsrv <--stops SDD server
-if needed: rmdev -dl hdiskX <--removes hdisks
(lsdev -C -t 2105* -F name | xargs -n1 rmdev -dl)
-smitty remove -- devices.sdd.52.rte
-smitty install -- devices.sdd.53.rte (/mnt/Storage-Treiber/ESS/SDD-1.7)
-cfgmgr
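The removal steps above can be collected into an echo-only dry run that just prints the commands for review (the unmount/varyoff targets are placeholders; the real filesystem and VG names depend on your system):

```shell
# Echo-only dry run of the SDD removal sequence above -- it only prints
# the commands. Filesystem/VG names are placeholders, not real targets.
sdd_removal_plan() {
    echo "umount <ESS filesystems>"
    echo "varyoffvg <ESS volume groups>"
    echo "rmdev -dl dpo -R"
    echo "stopsrc -s sddsrv"
    echo "lsdev -C -t 2105* -F name | xargs -n1 rmdev -dl"
    echo "installp -u devices.sdd.52.rte"
}
sdd_removal_plan
```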
---------------------------------------
Removing SDD Host Attachment:
-lsdev -C -t 2105* -F name | xargs -n1 rmdev -dl <--removes hdisk devices
-smitty remove -- ibm2105.rte (devices.fcp.disk.ibm)
---------------------------------------
Change adapter settings (Un/re-configure paths):
-datapath set adapter 1 offline
-datapath remove adapter 1
-rmdev -Rl fcs0
(if needed: for i in `lsdev -Cc disk | grep -i defined | awk '{ print $1 }'`; do rmdev -Rdl $i; done)
-chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail
-chdev -l fcs0 -a init_link=pt2pt
-cfgmgr; addpaths
---------------------------------------
Reconfigure vpaths:
-datapath remove device 2 path 0
-datapath remove device 1 path 0
-datapath remove device 0 path 0
-cfgmgr; addpaths
-rmdev -Rdl vpath0
-cfgmgr;addpaths
---------------------------------------
Cannot set a pvid on a vpath:
root@aix40: / # chdev -l vpath6 -a pv=yes
Method error (/usr/lib/methods/chgvpath):
0514-047 Cannot access a device.
in errpt: DEVICE LOCKED BY ANOTHER USER
RELEASE DEVICE PERSISTENT RESERVATION
# lquerypr -Vh /dev/vpath6 <--it will show the host key
# lquerypr -Vph /dev/vpath6 <--it will clear the reservation lock
# lquerypr -Vh /dev/vpath6 <--checking again will show it is OK now
Practical Guide to AIX (and PowerVM, PowerHA, PowerVC, HMC, DevOps ...)
STORAGE - MPIO
MPIO (Multipath I/O):
--------------------------------------------------------
MPIO disk settings
https://developer.ibm.com/articles/au-aix-mpio/
# lsattr -El hdisk15
PCM PCM/friend/fcpother Path Control Module False
PR_key_value none Persistant Reserve Key Value True+
algorithm round_robin Algorithm True+
clr_q no Device CLEARS its Queue on error True
dist_err_pcnt 0 Distributed Error Percentage True
dist_tw_width 50 Distributed Error Sample Time True
hcheck_cmd inquiry Health Check Command True+
hcheck_interval 60 Health Check Interval True+
hcheck_mode nonactive Health Check Mode True+
...
...
timeout_policy retry_path Timeout Policy True+
unique_id 2611200173800005102BE072810XIV03IBMfcp Unique device identifier False
ww_name 0x500173800051019 FC World Wide Name False
--------------------
algorithm: (determines how I/O should be distributed across the paths)
- fail_over: I/O is distributed to one path at a time; if it fails, the next enabled path is selected, depending on path priority.
VSCSI disks use fail_over; if SCSI-2 reserve is used (reserve_policy=single_path), fail_over is the only possible algorithm.
By default, priority is set to 1 (highest priority) and it can range from 1 to 255 (which is the lowest priority).
set priority: # chpath -l hdisk1 -a priority=255 -p fscsi0 -w 20080022a10bb2d5,1000000000000
check priority # lspath -l hdisk1 -a priority -F value -p fscsi0 -w 20080022a10bb2d5,1000000000000
- round_robin: I/O is distributed to all enabled paths. Paths with the same priority get equal I/O, otherwise higher priority paths are utilized more
To achieve optimal performance during failure, make sure the ordered path list alternates paths between separate fabrics.
- shortest_queue: Similar to round_robin, but when load increases it favors path with fewest I/O and path priority is ignored.
If one path is slow (congestion in the SAN), other less-congested paths are used for more I/O
If using SCSI-2 reserves or vSCSI disks, then fail_over must be used. For other situations, shortest_queue (if available) or round_robin enable maximum use of the SAN resources.
--------------------
hcheck_cmd: (the command that will be sent to the disk for healthcheck)
- test_unit_rdy: Test Unit Ready (TUR) command, this is the default setting
- inquiry: in clustered environments this gives greater control of SCSI reserve and release
In general the default setting should be used, but if there are reservation locks on the disks, use the inquiry option. If test unit ready is used and the backing device is reserved, then test unit ready fails and logs an error on the client.
--------------------
hcheck_mode: (determines which paths will be tested by the health checker. Disabled or Missing paths are never checked.)
(Disabled paths must be recovered manually with chpath, Missing paths with cfgmgr first. If the disk is not open (VG is varied off), no health checking is done.)
- nonactive: paths with no active I/O are checked (or paths in failed state).
(With round_robin and shortest_queue all paths are busy, so only failed paths are checked. If the disk is idle, then any path can be checked (without a pending I/O).)
- enabled: The PCM checks all enabled paths, even paths that have other active I/O
- failed: PCM checks paths that are marked as failed.
The default value is nonactive, and in most cases this should not be changed.
--------------------
hcheck_interval: (interval, in seconds, at which MPIO checks path availability).
- hcheck_interval=0: disables health check mechanism, any failed paths need to be recovered manually
- hcheck_interval=(small value): the health check commands use the disk’s queue_depth and they receive a higher priority than normal I/O, so a small health check interval can quickly use up a lot of bandwidth on the SAN if there are a large number of disks.
If a device has only one non-failed path and an error is detected on that last path, AIX will send a health check command on all failed paths before retrying the I/O. If there is a good path to use, AIX will discover it and use it before failing user I/O, regardless of the health check interval setting.
So, for most cases, the default value is appropriate, and the general rule for hcheck_interval is that it should be greater than or equal to rw_timeout. It is much more likely to be a good idea to increase the health check interval than to decrease it. Better performance is achieved when hcheck_interval is slightly greater than the rw_timeout value on the disks.
https://support.purestorage.com/Solutions/IBM/AIX/AIX_Recommended_Settings
The PureStorage ODM definition sets hcheck_interval to 10, as opposed to the IBM recommendation of 30. hcheck_interval is set to a lower value than rw_timeout because active paths are not being checked, so a lower hcheck_interval will not have any SAN performance impact.
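The rule of thumb above (hcheck_interval >= rw_timeout) can be checked per disk. In this sketch the lsattr call is stubbed with fabricated sample output; on a live host replace the stub with `lsattr -El hdiskX -a hcheck_interval -a rw_timeout`:

```shell
# Sketch: warn when hcheck_interval is below rw_timeout (the general
# rule above). lsattr is stubbed with fabricated sample output here.
lsattr_sample() {
cat <<'EOF'
hcheck_interval 20 Health Check Interval True+
rw_timeout      30 READ/WRITE time out value True
EOF
}
check_hcheck() {
    lsattr_sample | awk '
        $1 == "hcheck_interval" { h = $2 }
        $1 == "rw_timeout"      { t = $2 }
        END { if (h + 0 < t + 0)
                  print "warning: hcheck_interval (" h ") < rw_timeout (" t ")" }'
}
check_hcheck
```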
--------------------
timeout_policy: (when an I/O operation fails to complete within the rw_timeout value, what action PCM should take)
- retry_path: command is retried on the same path. This is likely to lead to delays in the I/O recovery.
(Only after several failures will AIX fail the path and try the I/O on another path.)
- fail_path: AIX will fail the path after a single command timeout. Failing the path forces the I/O to be retried on a different path.
With this setting recovery can be quicker and also the detection of a situation where all paths have failed is also quicker.
A path that is failed due to timeout policy can later be recovered automatically by the AIX health check commands.
- disable_path: path will be disabled after a single command timeout. Disabled path can be recovered only manually (chpath)
The recommended setting is fail_path.
--------------------------------------------------------
Active/Passive paths
A Storage Processor or Storage Controller manages or controls the disk array at the storage side. Many storage arrays have two or more controllers (Storage Processors), and connecting HBA ports to each controller protects against controller failures. A device that has multiple controllers can designate one controller as the active or preferred controller. For such a device, the PCM uses just the paths to the active or preferred controller, as long as there is at least one such path that is enabled and not failed.
Disk arrays can be:
- active/active: it allows access to the LUNs simultaneously through all the storage processors. All the paths are active at all times.
- active/passive: one SP is actively servicing a given LUN. The other SP acts as backup for the LUN (and may be actively serving other LUN I/O).
--------------------------------------------------------
manage_disk_drivers
manage_disk_drivers lists storage models and their drivers. These drivers can be AIX MPIO drivers (like AIX_AAPCM or AIX_APPCM) or non-MPIO drivers, when a third-party multipath driver is installed. In this case the AIX MPIO feature can be disabled by selecting the AIX_non_MPIO option. (AAPCM means Active/Active and APPCM means Active/Passive.)
# manage_disk_drivers -l
Device Present Driver Driver Options
2810XIV AIX_AAPCM AIX_AAPCM,AIX_non_MPIO
DS4100 AIX_APPCM AIX_APPCM,AIX_fcparray
DS4200 AIX_APPCM AIX_APPCM,AIX_fcparray
DS4300 AIX_APPCM AIX_APPCM,AIX_fcparray
DS4500 AIX_APPCM AIX_APPCM,AIX_fcparray
DS4700 AIX_APPCM AIX_APPCM,AIX_fcparray
DS4800 AIX_APPCM AIX_APPCM,AIX_fcparray
DS3950 AIX_APPCM AIX_APPCM
DS5020 AIX_APPCM AIX_APPCM
DCS3700 AIX_APPCM AIX_APPCM
DS5100/DS5300 AIX_APPCM AIX_APPCM
DS3500 AIX_APPCM AIX_APPCM
XIVCTRL MPIO_XIVCTRL MPIO_XIVCTRL,nonMPIO_XIVCTRL
2107DS8K NO_OVERRIDE NO_OVERRIDE,AIX_AAPCM,AIX_non_MPIO
IBMFlash NO_OVERRIDE NO_OVERRIDE,AIX_AAPCM,AIX_non_MPIO
IBMSVC NO_OVERRIDE NO_OVERRIDE,AIX_AAPCM,AIX_non_MPIO
NO_OVERRIDE:
The "NO_OVERRIDE" option (as a Present Driver) indicates that the configuration has not been overridden by using "manage_disk_drivers -d (device) -o...". If SDDPCM is installed, it substitutes for the AIX MPIO driver for SVC and DS8k (that is why it is not listed by manage_disk_drivers), and by default SDDPCM takes precedence. In this case NO_OVERRIDE means SDDPCM will be used (we did not override it by a manual change). If SDDPCM is not installed, then the AIX default PCM is used.
# manage_disk_drivers -d IBMSVC -o AIX_AAPCM <---use the AIX default PCM even if SDDPCM is installed
# manage_disk_drivers -l <--after reboot, AIX PCM is in control (not SDDPCM, which can be uninstalled now)
Device Present Driver Driver Options
IBMSVC AIX_AAPCM NO_OVERRIDE,AIX_AAPCM,AIX_non_MPIO
Changing the driver will reset disk attributes (queue_depth, algorithm, reserve_policy..) to the new default values of the new driver.
--------------------------------------------------------
lsmpio
The command lsmpio was introduced in AIX 7.1 TL3, and it displays information about the MPIO storage devices (those devices which are controlled by PCM). It is similar to lspath, but with much more details.
# lsmpio -l hdisk1234
name path_id status path_status parent connection
===============================================================================
hdisk1234 0 Enabled Opt,Sel,Deg,Rsv fscsi0 500a098186a7d4ca,0008000000000000
hdisk1234 1 Enabled Non fscsi0 500a098196a7d4ca,0008000000000000
hdisk1234 2 Enabled Opt,Sel fscsi1 500a098186a7d4ca,0008000000000000
hdisk1234 3 Enabled Non fscsi1 500a098196a7d4ca,0008000000000000
status: (same as with lspath)
enabled: path is configured and operational. It will be considered when paths are selected for I/O.
disabled: path has been manually disabled and won't be considered when paths are selected for I/O. (set back to enabled with 'chpath')
failed: path is unusable due to I/O failures. It won't be selected for I/O. (after the problem is fixed, remove the path and rediscover it)
defined: path is not configured and not used for I/O operations (rmpath -l will explicitly turn a path off)
missing: path was not detected after reboot, but it was there earlier in the system (these can be recovered with 'cfgmgr')
detected: path was detected during boot, but it was not configured (this status should never appear, except during boot)
It is best to manually disable paths before storage maintenance (rmpath). AIX MPIO stops using any disabled or Defined paths, so no error detection or recovery will be done. This ensures that the AIX host does not go into extended error recovery during a scheduled maintenance. After the maintenance is complete, the paths can be re-enabled with cfgmgr. (When disabling multiple paths for multiple LUNs, rmpath is simpler than chpath, as it does not have to be run on a per-disk basis.)
path_status: (more detailed path status)
Opt - optimized path: preferred path, which is attached to a preferred controller. PCM selects preferred paths whenever possible.
Non - non-optimized path: this path is not considered as preferred path. PCM avoids this path, unless all preferred paths fail.
Act - active path: (on a device that has active and passive controllers) the PCM selects active paths for I/O operations.
Pas - passive path: (on a device that has active and passive controllers) the PCM avoids the selection of passive paths.
Sel - selected path: this path was selected for I/O operations when the lsmpio command was issued.
Rsv - reservation conflict: possibly multiple hosts are accessing the same disk
Fai - failure: path experienced a failure. (If status is Enabled and Fai: MPIO leaves one path in Enabled state, even when all paths have errors.)
Deg - degraded: there were errors, which caused this path to be temporarily avoided. (Additional errors can cause the path to fail)
Clo - closed: if all paths are closed, the device is closed. If only some paths are closed, then those had errors and MPIO periodically tries to recover them.
--------------------------------------------------------
smitty mpio
genkex | grep pcm show loaded pcm (like /usr/lib/drivers/sddpcmke)
lspath lists paths (lspath -l hdisk46)
lspath -l hdisk0 -HF "name path_id parent connection path_status status" more detailed info about a device (like lsdev for devices)
lspath -AHE -l hdisk0 -p vscsi0 -w "810000000000" display attrib. for given path and connection (-w) (-A is like lsattr for devices)
(if only 1 path exist to parent device connection can be omitted: lspath -AHE -l hdisk0 -p vscsi0)
lspath -l hdisk1 -a priority -F value -p fscsi0 -w 20080022a10bb2d5,1000000000000 check priority
chpath changes the path state (enabled, disabled)
chpath -s enabled -l hdiskX -p vscsi0 it will set the path to enabled status
chpath -l hdisk1 -a priority=255 -p fscsi0 -w 20080022a10bb2d5,1000000000000 change priority
rmpath -l hdiskX -p vscsi0 -w 870000000000 put path in defined state (-w can be omitted if only 1 path exist to parent device)
rmpath -dl hdiskX -p fscsiY dynamically removes all paths under a parent adapter from a supported storage MPIO device
(-d: deletes, without it puts it to define state)
(The last path cannot be removed, the command will fail if you try to remove the last path)
mkpath ... makes a path into Available state
lsmpio lists additional info about paths (which path is selected)
lsmpio -q shows disks with its size
lsmpio -ql hdiskX shows disk serial number (LUN ID)
lsmpio -Sl hdisk0 | grep Path shows path statistics (which path was used mostly in the past)
lsmpio -ar list parent adapter and remote port information (-a: adapter (local), -r: remote port)
lsmpio -are list error statistics for local and remote (-e: error)
lsmpio -z reset statistics
While checking errors in lsmpio -are:
If 1 HBA shows error on all remote storage ports (but other HBA has no errors), then it is a problem with the adapter, cable or switch.
If 2 HBAs show errors on 1 particular storage port, then the error is at the storage port side (or the cable or the switch port.)
If 2 HBAs show errors on the same number of storage ports, then most likely the switch has errors (probably at the ISL (inter-switch link) between switches)
--------------------------------------------------
Failed path handling:
(there were Hitachi disks in Offline (E) state, but they were not unconfigured earlier)
-lspath | grep -v Enab
-rmpath -p fscsiX -d
-cfgmgr -l fcsX
-lspath | grep -v Enab
-dlnkmgr view -lu -item
--------------------------------------------------
Change adapter setting online:
rmpath -d -p vscsi0 <--removes all paths from adapt. (rmpath -dl hdisk0 -p vscsi0, it removes only specified path)
rmdev -l vscsi0 <--puts adapter into defined state
chdev -l vscsi0 -a vscsi_err_recov=fast_fail <--change adapter setting (if -P is used it will be activated after reboot)
cfgmgr -l vscsi0 <--configure back adapter
--------------------------------------------------------
SDDPCM migration to AIX PCM
https://www.ibm.com/support/pages/migrate-aixpcm-using-managediskdrivers-command
The steps below should be done during downtime, when no application is using the disks.
If the disks are shared among multiple nodes in a cluster or VIOS configuration, the reservation policy must be checked and changed before the disk is opened. If this is not changed, other nodes in the cluster or VIOS configuration will lose access to the shared disks until the reservation policy is changed.
lsattr -El hdiskX <--check current disk attributes (save output) (lsmpio should not work at this point)
manage_disk_drivers -l <--list disk drivers
manage_disk_drivers -d IBMSVC -o AIX_AAPCM <--switching disk driver to MPIO (manage_disk_drivers -l will show new value)
shutdown -Fr <--reboot
lsattr -El hdiskX <--check new value (lsmpio should work now)
chdev -P -l hdisk0 -a queue_depth=X -a reserve_policy=no_reserve <--set back needed values for disks
chdef -a reserve_policy=no_reserve -c disk -s fcp -t mpioosdisk <--set predefined values in ODM (or chdef -a reserve_policy=no_reserve -c disk -s fcp -t aixmpiods8k)
shutdown -Fr <--reboot
installp -u devices.fcp.disk.ibm.mpio.rte devices.sddpcm.72.rte <--remove host attachment and sddpcm filesets (if sddpcm is removed both should be removed!!)
shutdown -Fr <--reboot
Check again current and ODM settings to be sure SDDPCM removal did not overwrite something, because removing SDDPCM will set the queue to the default value of 20
If needed to roll back to SDDPCM: # manage_disk_drivers -d IBMSVC -o NO_OVERRIDE
Multipathing
A path describes a route from the HBA port in the host (through the switches in the fabric) to a storage port on the storage array. When more than one HBA port is cabled, a host can access a LUN through more than one path, which is called multipathing. If any of the components along the path fails, the server selects another path. The process of detecting a failed path and switching to another path is called path failover.
In general 4 or 8 paths per disk is recommended (or up to 16 paths for rare situations). Extra, unnecessary redundancy can have some negative effects (each path takes extra memory, error recovery could take longer).
Path Control Module (PCM): It is responsible for controlling multiple paths. Each storage device requires a PCM. A PCM is storage vendor supplied code that handles path management. It can be a separate (3rd party) software driver, or AIX has a native PCM package, which comes with the base operating system and is sometimes called AIXPCM, MPIOPCM or just MPIO.
--------------------------------------------------------
AIX multipathing and MPIO history
At the end of the 90s and early 2000s, SAN and FC technology became more popular, and storage vendors (EMC, Hitachi, IBM...) came up with their own multipathing solutions. IBM developed SDD. With SDD each path was called an hdisk, and on top of these a "super device" (vpath) was created to do path management. It was capable of path failover and load balancing, but with the increased number of hdisk and vpath devices the maintenance was a bit complex. Around 2002, AIX 5.2 introduced a new feature called MPIO. Without installing any 3rd party or vendor filesets and without creating extra hdisk devices, the new AIX MPIO feature was capable of discovering multiple paths. It recognized that these paths belonged to the same LUN (hdisk), so only one hdisk device was created. This made administration easier, but it was lacking some capabilities, for example load balancing. To solve this problem IBM created the SDDPCM package (for IBM storage), which was based on MPIO and was capable of load balancing. SDDPCM was very popular, but over the years the default AIX MPIO also got better and better, until it reached all the capabilities of SDDPCM, so IBM decided to withdraw SDDPCM from the market in 2020.
Before AIX 5.2, storage vendor code was needed to use multipathing; for example, IBM provided the SDD package. Since AIX 5.2, a default AIX installation is capable of path management, and this feature is called MPIO (MPIO is an acronym for Multipath I/O, which is a multipathing solution). MPIO allows a disk to have multiple paths, but it has only 1 entry per hdisk in the ODM. With the MPIO design, path management is off-loaded from the disk driver to an AIX kernel extension called PCMKE (Path Control Module Kernel Extension). The reason to have a separate PCMKE is to make it easier for disk vendors, such as EMC, Hitachi or IBM, to adopt the AIX MPIO solution. Additionally, 4 new AIX commands were added in AIX 5.2 to manage the device paths: mkpath, rmpath, lspath, chpath. In AIX 7.1 TL3 a new command, lsmpio, was introduced. This command is similar to lspath, but displays additional information about the MPIO storage devices.
UDID (Unique device identifier)
Every MPIO-capable device must provide a unique identifier that allows the device to be distinguished from any other device in the system. It is called the UDID. When the cfgmgr command runs, it requests the UDID for the device, which is compared with the UDIDs stored in the ODM to determine whether a newly discovered device needs to be defined, or the device already exists and only a new path needs to be defined.
--------------------------------------------------------
ODM Update
The AIX MPIO infrastructure allows IBM or third-party storage vendors to supply ODM definitions, which are often referred to as a host attachment kit. A host attachment fileset from the disk vendor is needed to update the ODM to support a specific storage, then AIX can recognize and appropriately configure the disk. Without this, disks are configured using generic ODM definitions. Based on the host attachment filesets disks will be configured as MPIO or non-MPIO devices.
For example, if we have a 3rd party storage (like Hitachi or EMC) and we don't install any additional packages, then AIX will probably detect the disks as "Other FC SCSI Disk Drive". A device will be discovered as an "MPIO other FC device" only if the device has been certified with one of the AIX default PCMs and no vendor-provided ODM definitions have been installed. Certification does not guarantee that all device capabilities can be used. (Usually only fail_over is possible, but no round_robin or load balancing.) If the host attachment contains MPIO definitions, then after the ODM update the device will be discovered as an MPIO device (for example, EMC storage could look like "EMC MPIO FC disks") and one of the AIX PCMs can do the path management. If the host attachment contains non-MPIO definitions, then the disks will be recognized as non-MPIO (like "EMC CLARiiON FCP RAID 1/0 Disk") and the vendor-supplied driver needs to be installed for path management. After the ODM update, the newly discovered disks will have vendor-configured default values (like queue_depth, reserve_policy etc.)
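A quick way to see how disks were recognized is to classify the lsdev listing. The sample output below is fabricated; on a live host use `lsdev -Cc disk` instead of the stub:

```shell
# Sketch: classify disks as MPIO vs generic ("Other FC SCSI Disk Drive",
# usually fail_over only). The lsdev output below is a fabricated sample.
lsdev_sample() {
cat <<'EOF'
hdisk1 Available 10-68-01 Other FC SCSI Disk Drive
hdisk2 Available 10-68-01 MPIO Other FC SCSI Disk Drive
hdisk3 Available 20-60-01 EMC CLARiiON FCP RAID 1/0 Disk
EOF
}
classify_disks() {
    lsdev_sample | awk '
        / MPIO /          { mpio++; next }
        / Other FC SCSI / { generic++ }
        END { print "MPIO: " mpio+0 ", generic: " generic+0 }'
}
classify_disks
```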
MPIO disk settings
https://developer.ibm.com/articles/au-aix-mpio/
# lsattr -El hdisk15
PCM PCM/friend/fcpother Path Control Module False
PR_key_value none Persistant Reserve Key Value True+
algorithm round_robin Algorithm True+
clr_q no Device CLEARS its Queue on error True
dist_err_pcnt 0 Distributed Error Percentage True
dist_tw_width 50 Distributed Error Sample Time True
hcheck_cmd inquiry Health Check Command True+
hcheck_interval 60 Health Check Interval True+
hcheck_mode nonactive Health Check Mode True+
...
...
timeout_policy retry_path Timeout Policy True+
unique_id 2611200173800005102BE072810XIV03IBMfcp Unique device identifier False
ww_name 0x500173800051019 FC World Wide Name False
--------------------
algorithm: (determines how I/O should be distributed across the paths)
- fail_over: I/O is distributed to one path at a time, if it fails next enabled path is selected, depending on path priority.
VSCSI disks use fail_over, if SCSI-2 reserves is used (reserve_policy=single_path) fail_over is the only possible algorithm.
By default, priority is set to 1 (highest priority) and it can range from 1 to 255 (which is the lowest priority).
set priority: # chpath -l hdisk1 -a priority=255 -p fscsi0 -w 20080022a10bb2d5,1000000000000
check priority # lspath -l hdisk1 -a priority -F value -p fscsi0 -w 20080022a10bb2d5,1000000000000
- round_robin: I/O is distributed to all enabled paths. Paths with same prio. has equal I/O, otherwise higher prio. path utilized more
To achieve optimal performance during failure, make sure the ordered path list alternate paths between separate fabrics.
- shortest_queue: Similar to round_robin, but when load increases it favors path with fewest I/O and path priority is ignored.
If one path is slow (congestion in the SAN), other less-congested paths are used for more I/O
If using SCSI-2 reserves or vSCSI disks, then fail_over must be used. For other situations, shortest_queue (if available) or round_robin enable maximum use of the SAN resources.
--------------------
hcheck_cmd: (the command that will be sent to the disk for healthcheck)
- test_unit_rdy: Test Unit Ready (TUR) command, this is the default setting
- inquiry: In clustered environments with this can have greater control of SCSI reserve and release
In general the default setting should be used, but if there are reservation locks on the disks, use the inquiry option. If test unit ready is used and the backing device is reserved, then test unit ready fails and log an error on the client.
--------------------
hcheck_mode: (determines which paths will be tested by the health checker. Disabled or Missing paths are never checked
(Disabled paths must be recovered manually with chpath, Missing paths with cfgmgr first. If disk is not open (VG is varied off), nohealth checking is done.)
- nonactive: paths with no active I/O are checked (or paths in failed state).
(At round_robin and shortest_queue all paths are busy, so only failed paths are checked. If disk is idle then any paths can be checked (without a pending I/O.))
- enabled: The PCM checks all enabled paths, even paths that have other active I/O
- failed: PCM checks paths that are marked as failed.
The default value is nonactive, and most of the cases this should not be changed.
--------------------
hcheck_interval: (interval, in seconds, at which MPIO checks path availability).
- hcheck_interval=0: disables health check mechanism, any failed paths need to be recovered manually
- hcheck_interval=(small value): the health check commands use the disk’s queue_depth and they receive a higher priority than normal I/O, so a small health check interval can quickly use up a lot of bandwidth on the SAN if there are a large number of disks.
If a device has only one non-failed path and an error is detected on that last path, AIX sends a health check command on all failed paths before retrying the I/O. If there is a good path to use, AIX discovers and uses it before failing user I/O, regardless of the health check interval setting.
So for most cases the default value is appropriate. The general rule is that hcheck_interval should be greater than or equal to rw_timeout; it is much more likely to be a good idea to increase the health check interval than to decrease it. The best performance is achieved when hcheck_interval is slightly greater than the rw_timeout value of the disks.
https://support.purestorage.com/Solutions/IBM/AIX/AIX_Recommended_Settings
The PureStorage ODM definition sets hcheck_interval to 10, as opposed to the IBM recommendation of 30. hcheck_interval is set lower than rw_timeout because active paths are not checked, so a lower hcheck_interval has no SAN performance impact.
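The rule of thumb above (hcheck_interval >= rw_timeout) can be verified per disk. A minimal sketch on canned lsattr-style sample data; on a live system the sample_lsattr helper (a name invented here) would be replaced by `lsattr -El hdiskX -a hcheck_interval -a rw_timeout`:

```shell
# Canned sample data standing in for real lsattr output (assumption,
# not captured from a live system).
sample_lsattr() {
cat <<'EOF'
hcheck_interval 60 Health Check Interval True
rw_timeout      30 READ/WRITE time out value True
EOF
}

# Compare the two attributes following the hcheck_interval >= rw_timeout rule
check_hcheck() {
    hc=$(sample_lsattr | awk '$1=="hcheck_interval" {print $2}')
    rw=$(sample_lsattr | awk '$1=="rw_timeout" {print $2}')
    if [ "$hc" -ge "$rw" ]; then
        echo "OK: hcheck_interval ($hc) >= rw_timeout ($rw)"
    else
        echo "WARN: hcheck_interval ($hc) < rw_timeout ($rw)"
    fi
}
check_hcheck
```

Looping the same check over `lsdev -Cc disk -F name` would cover every disk on the host.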
--------------------
timeout_policy: (when an I/O operation fails to complete within the rw_timeout value, what action PCM should take)
- retry_path: the command is retried on the same path. This is likely to delay I/O recovery.
(Only after several failures will AIX fail the path and retry the I/O on another path.)
- fail_path: AIX fails the path after a single command timeout, which forces the I/O to be retried on a different path.
With this setting recovery can be quicker, and a situation where all paths have failed is also detected more quickly.
A path failed due to the timeout policy can later be recovered automatically by the AIX health check commands.
- disable_path: the path is disabled after a single command timeout. A disabled path can be recovered only manually (chpath).
The recommended setting is fail_path.
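The recommended attributes from the sections above can be applied to many disks in one pass. A dry-run sketch that only prints the chdev commands for review (the disk list, the gen_chdev_cmds helper name and the attribute values are examples; verify values against your storage vendor's documentation before running anything):

```shell
# Example disk list; on a live system: lsdev -Cc disk -F name
DISKS="hdisk10 hdisk11 hdisk12"

gen_chdev_cmds() {
    for d in $1; do
        # -P defers the change until the next reboot if the disk is open
        echo "chdev -P -l $d -a algorithm=shortest_queue -a timeout_policy=fail_path -a hcheck_interval=30"
    done
}

gen_chdev_cmds "$DISKS"
```

Piping the output to a reviewed script (or to `sh` after inspection) keeps the change auditable.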
--------------------------------------------------------
Active/Passive paths
A Storage Processor or Storage Controller manages or controls the disk array on the storage side. Many storage arrays have two or more controllers (Storage Processors), and connecting HBA ports to each controller protects against controller failures. A device that has multiple controllers can designate one controller as the active or preferred controller. For such a device, the PCM uses only the paths to the active or preferred controller, as long as there is at least one such path that is enabled and not failed.
Disk arrays can be:
- active/active: it allows access to the LUNs simultaneously through all the storage processors. All the paths are active at all times.
- active/passive: one SP is actively servicing a given LUN. The other SP acts as backup for the LUN (and may be actively serving other LUN I/O).
--------------------------------------------------------
manage_disk_drivers
manage_disk_drivers lists storage models and their drivers. These drivers can be AIX MPIO drivers (like AIX_AAPCM or AIX_APPCM) or non-MPIO drivers, when a third-party multipath driver is installed. In that case the AIX MPIO feature can be disabled by selecting the AIX_non_MPIO option. (AAPCM means Active/Active and APPCM means Active/Passive.)
# manage_disk_drivers -l
Device Present Driver Driver Options
2810XIV AIX_AAPCM AIX_AAPCM,AIX_non_MPIO
DS4100 AIX_APPCM AIX_APPCM,AIX_fcparray
DS4200 AIX_APPCM AIX_APPCM,AIX_fcparray
DS4300 AIX_APPCM AIX_APPCM,AIX_fcparray
DS4500 AIX_APPCM AIX_APPCM,AIX_fcparray
DS4700 AIX_APPCM AIX_APPCM,AIX_fcparray
DS4800 AIX_APPCM AIX_APPCM,AIX_fcparray
DS3950 AIX_APPCM AIX_APPCM
DS5020 AIX_APPCM AIX_APPCM
DCS3700 AIX_APPCM AIX_APPCM
DS5100/DS5300 AIX_APPCM AIX_APPCM
DS3500 AIX_APPCM AIX_APPCM
XIVCTRL MPIO_XIVCTRL MPIO_XIVCTRL,nonMPIO_XIVCTRL
2107DS8K NO_OVERRIDE NO_OVERRIDE,AIX_AAPCM,AIX_non_MPIO
IBMFlash NO_OVERRIDE NO_OVERRIDE,AIX_AAPCM,AIX_non_MPIO
IBMSVC NO_OVERRIDE NO_OVERRIDE,AIX_AAPCM,AIX_non_MPIO
NO_OVERRIDE:
The "NO_OVERRIDE" option (as a Present Driver) indicates that the configuration has not been overridden with "manage_disk_drivers -d (device) -o ...". If SDDPCM is installed, it substitutes its own driver for the AIX MPIO driver for SVC and DS8K (that is why SDDPCM is not listed by manage_disk_drivers), and by default SDDPCM takes precedence. In this case NO_OVERRIDE means SDDPCM is used (it was not overridden by a manual change). If SDDPCM is not installed, the AIX default PCM is used.
# manage_disk_drivers -d IBMSVC -o AIX_AAPCM <---use the AIX default PCM even if SDDPCM is installed
# manage_disk_drivers -l <--after reboot, AIX PCM is in control (not SDDPCM, which can be uninstalled now)
Device Present Driver Driver Options
IBMSVC AIX_AAPCM NO_OVERRIDE,AIX_AAPCM,AIX_non_MPIO
Changing the driver will reset disk attributes (queue_depth, algorithm, reserve_policy..) to the new default values of the new driver.
--------------------------------------------------------
lsmpio
The lsmpio command was introduced in AIX 7.1 TL3, and it displays information about MPIO storage devices (devices that are controlled by a PCM). It is similar to lspath, but with much more detail.
# lsmpio -l hdisk1234
name path_id status path_status parent connection
===============================================================================
hdisk1234 0 Enabled Opt,Sel,Deg,Rsv fscsi0 500a098186a7d4ca,0008000000000000
hdisk1234 1 Enabled Non fscsi0 500a098196a7d4ca,0008000000000000
hdisk1234 2 Enabled Opt,Sel fscsi1 500a098186a7d4ca,0008000000000000
hdisk1234 3 Enabled Non fscsi1 500a098196a7d4ca,0008000000000000
status: (same as with lspath)
enabled: path is configured and operational. It will be considered when paths are selected for IO.
disabled: path has been manually disabled and won't be considered when paths are selected for IO. (set back to enabled with 'chpath')
failed: path is unusable due to IO failures. It won't be selected for IO. (after problem is fixed, remove path and rediscover it)
defined: path is not configured and is not used for I/O operations (rmpath -l will explicitly turn a path off)
missing: path was not detected after reboot, but it was there earlier in the system (these can be recovered with 'cfgmgr')
detected: path was detected during boot, but it was not configured (this status should only appear during boot)
It is best to manually disable paths before storage maintenance (rmpath). AIX MPIO stops using any disabled or Defined paths, so no error detection or recovery will be done. This ensures that the AIX host does not go into extended error recovery during a scheduled maintenance. After the maintenance is complete, the paths can be re-enabled with cfgmgr. (When disabling multiple paths for multiple LUNs, rmpath is simpler than chpath, as it does not have to be run on a per-disk basis.)
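The maintenance workflow above can be sketched as a dry-run plan that only prints the commands for review (the maintenance_plan helper and the adapter name are examples, nothing is executed):

```shell
maintenance_plan() {
    adapter=$1
    fcs=$(echo "$adapter" | sed 's/scsi/cs/')   # fscsi0 -> fcs0
    echo "rmpath -p $adapter      # put all paths on $adapter into Defined state"
    echo "# ... perform the fabric / storage maintenance ..."
    echo "cfgmgr -l $fcs          # recover the paths afterwards"
}

maintenance_plan fscsi0
```

Printing the plan first avoids accidentally removing the last path to a disk during a maintenance window.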
path_status: (more detailed path status)
Opt - optimized path: preferred path, which is attached to a preferred controller. PCM selects preferred paths whenever possible.
Non - non-optimized path: this path is not considered as preferred path. PCM avoids this path, unless all preferred paths fail.
Act - active path: (on a device that has active and passive controllers) the PCM selects active paths for I/O operations.
Pas - passive path: (on a device that has active and passive controllers) the PCM avoids the selection of passive paths.
Sel - selected path: this path was selected for I/O operations at the time the lsmpio command was issued.
Rsv - reservation conflict: possibly multiple hosts are accessing the same disk
Fai - failure: the path experienced a failure. (If status is Enabled and Fai: MPIO leaves one path in Enabled state, even when all paths have errors.)
Deg - degraded: the path had errors, which cause it to be temporarily avoided. (Additional errors can cause the path to fail.)
Clo - closed: if all paths are closed, the device is closed. If only some paths are closed, those paths had errors and MPIO periodically tries to recover them.
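A quick way to surface paths carrying a warning flag (Deg, Fai or Rsv) is to filter the path_status column. A sketch on the canned sample output shown above (sample_lsmpio is an invented helper; on a live system pipe in `lsmpio -l hdiskX` instead, skipping the header lines):

```shell
sample_lsmpio() {
cat <<'EOF'
hdisk1234 0 Enabled Opt,Sel,Deg,Rsv fscsi0 500a098186a7d4ca,0008000000000000
hdisk1234 1 Enabled Non fscsi0 500a098196a7d4ca,0008000000000000
hdisk1234 2 Enabled Opt,Sel fscsi1 500a098186a7d4ca,0008000000000000
hdisk1234 3 Enabled Non fscsi1 500a098196a7d4ca,0008000000000000
EOF
}

# Column 4 is path_status; print any path with a warning flag in it
flag_paths() {
    sample_lsmpio | awk '$4 ~ /Deg|Fai|Rsv/ {print $1, "path_id", $2, "->", $4}'
}
flag_paths
```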
--------------------------------------------------------
smitty mpio
genkex | grep pcm show loaded pcm (like /usr/lib/drivers/sddpcmke)
lspath lists paths (lspath -l hdisk46)
lspath -l hdisk0 -HF "name path_id parent connection path_status status" more detailed info about a device (like lsdev for devices)
lspath -AHE -l hdisk0 -p vscsi0 -w "810000000000" display attrib. for given path and connection (-w) (-A is like lsattr for devices)
(if only 1 path exist to parent device connection can be omitted: lspath -AHE -l hdisk0 -p vscsi0)
lspath -l hdisk1 -a priority -F value -p fscsi0 -w 20080022a10bb2d5,1000000000000 check priority
chpath                                  changing path state (enabled, disabled)
chpath -s enabled -l hdisk -p vscsi0 it will set the path to enabled status
chpath -l hdisk1 -a priority=255 -p fscsi0 -w 20080022a10bb2d5,1000000000000 change priority
rmpath -l hdiskX -p vscsi0 -w 870000000000 put path in defined state (-w can be omitted if only 1 path exist to parent device)
rmpath -dl hdiskX -p fscsiY             dynamically remove all paths under a parent adapter from a supported storage MPIO device
(-d: deletes, without it puts it to define state)
(The last path cannot be removed, the command will fail if you try to remove the last path)
mkpath ...                              brings a path into Available state
lsmpio                                  lists additional info about paths (which path is selected)
lsmpio -q shows disks with its size
lsmpio -ql hdiskX shows disk serial number (LUN ID)
lsmpio -Sl hdisk0 | grep Path shows path statistics (which path was used mostly in the past)
lsmpio -ar list parent adapter and remote port information (-a: adapter (local), -r: remote port)
lsmpio -are list error statistics for local and remote (-e: error)
lsmpio -z reset statistics
While checking errors in lsmpio -are:
If one HBA shows errors on all remote storage ports (but the other HBA has no errors), the problem is with the adapter, cable, or switch.
If both HBAs show errors on one particular storage port, the error is on the storage port side (or its cable or switch port).
If both HBAs show errors on the same number of storage ports, the switch most likely has errors (probably on the ISL (inter-switch link) between switches).
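The triage logic above amounts to tallying errors per local HBA and per remote port. A sketch on invented sample data (the real `lsmpio -are` layout differs, so the sample_errors helper and its three-column format are assumptions; adapt the parsing to the actual output):

```shell
# Each sample line: <local HBA> <remote storage port WWPN> <error count>
sample_errors() {
cat <<'EOF'
fscsi0 500a098186a7d4ca 12
fscsi0 500a098196a7d4ca 9
fscsi1 500a098186a7d4ca 0
fscsi1 500a098196a7d4ca 0
EOF
}

# Sum errors per HBA; one HBA erroring on all ports while the other is
# clean points at that adapter, its cable, or its switch port.
errors_per_hba() {
    sample_errors | awk '{sum[$1]+=$3} END {for (h in sum) print h, sum[h]}' | sort
}
errors_per_hba
```

The same awk with `sum[$2]` instead of `sum[$1]` gives the per-storage-port view for the second rule.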
Failed path handling:
(there were Hitachi disks in Offline (E) state, but they were not unconfigured earlier)
-lspath | grep -v Enab
-rmpath -p fscsiX -d
-cfgmgr -l fcsX
-lspath | grep -v Enab
-dlnkmgr view -lu -item
--------------------------------------------------
Change adapter setting online:
rmpath -d -p vscsi0 <--removes all paths from adapt. (rmpath -dl hdisk0 -p vscsi0, it removes only specified path)
rmdev -l vscsi0 <--puts adapter into defined state
chdev -l vscsi0 -a vscsi_err_recov=fast_fail <--change adapter setting (if -P is used it will be activated after reboot)
cfgmgr -l vscsi0 <--configure back adapter
--------------------------------------------------------
SDDPCM migration to AIX PCM
https://www.ibm.com/support/pages/migrate-aixpcm-using-managediskdrivers-command
Below steps should be done during downtime when no application is using the disks.
If the disks are shared among multiple nodes in a cluster or VIOS configuration, the reservation policy must be checked and changed before the disk is opened. If this is not changed, other nodes in the cluster or VIOS configuration will lose access to the shared disks until the reservation policy is changed.
lsattr -El hdiskX <--check current disk attributes (save output) (lsmpio should not work at this point)
manage_disk_drivers -l <--list disk drivers
manage_disk_drivers -d IBMSVC -o AIX_AAPCM <--switching disk driver to MPIO (manage_disk_drivers -l will show new value)
shutdown -Fr <--reboot
lsattr -El hdiskX <--check new value (lsmpio should work now)
chdev -P -l hdisk0 -a queue_depth=X -a reserve_policy=no_reserve <--set back needed values for disks
chdef -a reserve_policy=no_reserve -c disk -s fcp -t mpioosdisk <--set predefined values in ODM (or chdef -a reserve_policy=no_reserve -c disk -s fcp -t aixmpiods8k)
shutdown -Fr <--reboot
installp -u devices.fcp.disk.ibm.mpio.rte devices.sddpcm.72.rte <--remove host attachment and sddpcm filesets (if sddpcm is removed both should be removed!!)
shutdown -Fr <--reboot
Check the current and ODM settings again to be sure the SDDPCM removal did not overwrite anything, because removing SDDPCM sets the queue depth back to the default value of 20.
If you need to roll back to SDDPCM: # manage_disk_drivers -d IBMSVC -o NO_OVERRIDE
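Since the driver switch resets disk attributes to the new driver's defaults, the values worth preserving can be captured up front and turned into a ready-made chdev command. A minimal sketch on canned lsattr-style data (sample_lsattr and saved_attrs are invented helper names, the values are illustrative; on a live system feed in real `lsattr -El hdiskX` output per disk):

```shell
sample_lsattr() {
cat <<'EOF'
queue_depth    64         Queue DEPTH              True
reserve_policy no_reserve Reserve Policy           True
algorithm      load_balance Algorithm              True
EOF
}

# Pick out the attributes to re-apply after the driver change
saved_attrs() {
    sample_lsattr | awk '$1=="queue_depth" || $1=="reserve_policy" {printf "-a %s=%s ", $1, $2}'
}

echo "chdev -P -l hdiskX $(saved_attrs)"
```

Saving such a line per disk before the reboot makes step "set back needed values" a copy-paste exercise instead of a reconstruction.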
STORAGE - HITACHI
HBA: The cable port on the host is the host bus adapter (HBA)
CHA: The cable port on the storage subsystem is a port (P) on a channel adapter (CHA)
------------------------------------
Product : 9500V
SerialNumber : 6123
LUs : 12
iLU SLPR HDevName VG OSPathID PathID PathName ChaPort CLPR Status Type IO-Count IO-Errors DNum IEP
0062 - hdisk11 - 00000 000000 08.06.0000000000210300.0000 1A - Online Own 0 0 0 -
00001 000006 09.06.0000000000211300.0000 1B - Online Own 0 0 0 -
00002 000007 08.07.00000000000E0600.0000 0C - Online Non 0 0 0 -
00003 000009 09.07.00000000000E1600.0000 0D - Online Non 0 0 0 -
Product : USP
SerialNumber : 0022137
LUs : 9
iLU SLPR HDevName VG OSPathID PathID PathName ChaPort CLPR Status Type IO-Count IO-Errors DNum IEP
016F 0 hdisk46 - 00000 000024 08.06.00000000000B8800.0000 3A 0 Online Own 3201989 0 0 -
00001 000025 09.06.00000000000BA600.0000 3E 0 Online Own 3201945 0 0 -
00002 000092 08.07.0000000000240A00.0000 4A 0 Online Own 3198945 0 0 -
00003 000106 09.07.0000000000242A00.0000 4E 0 Online Own 78042543 0 0 -
PathID: HDLM manages a path by assigning an ID to it.
OSPathID: The ID that AIX assigns to a path (lspath shows it as well)
Product:9500V - HDLM performs load balancing between owner paths or between non-owner paths.
Owner paths are coming from the same CHA. Non-owner paths are coming from the other CHA.
Load is balanced first between the owner paths, if all the owner paths are lost, then between the non owner paths.
Product:USP - All the paths are owner paths.
Load is balanced among all the paths.
Status:
Online I/O can be issued normally
Online (E)	An error occurred on the path, and none of the other paths are in Online status (it can still be used)
Offline (C)	Path was placed offline by using a command (manually)
Offline (E)	I/O cannot be performed, because an error occurred on the path
------------------------------------
The location of the scripts: /usr/DynamicLinkManager/bin/
DLMManager logs: /var/DynamicLinkManager/log
Operation logs (command history): /var/opt/hitachi/HNTRLib2/spool/hntr2[1-16].log
The location of config file: /usr/D*/drv/dlmfdrv.conf
dlnkmgr view -help
dlnkmgr view -hba shows info about the hbas
dlnkmgr view -lu shows the status of the path (online is OK)
dlnkmgr view -path shows that which device uses which path
dlnkmgr view -path -srt lu useful command as: dlnkmgr view -lu -item
dlnkmgr view -lu -item all gives a full list
dlnkmgr view -cha shows storage channel port, serial number of storage box
dlnkmgr view -sys -lic shows license information
dlnkmgr view -drv                       shows storage box serial and iLU numbers
dlnkmgr online -help
dlnkmgr online -pathid 000023 puts a path to online state
dlnkmgr offline -hba <adapter> puts an hba to offline state (dlnkmgr offline -hba 08.08)
SETTINGS:
dlnkmgr view -sys                       shows system settings (HDLM version, load balance, health checking...)
root@aix41: / # dlnkmgr view -sys
HDLM Version : 05-94
Service Pack Version :
Load Balance : on(rr)
Support Cluster :
Elog Level : 3
Elog File Size (KB) : 9900
Number Of Elog Files : 2
Trace Level : 0
Trace File Size (KB) : 1000
Number Of Trace Files : 4
Path Health Checking : on(30)
Auto Failback : off
Intermittent Error Monitor : off
dlnkmgr set -help
dlnkmgr set -lb on -lbtype rr           sets load balancing on (rr: round robin; rr is recommended, not exrr: extended round robin)
dlnkmgr set -pchk on -intvl 10          sets path health checking on (intvl: checking interval (in minutes))
dlnkmgr set -afb on -intvl 10 sets automatic failback on
dlnkmgr set -iem on -intvl 20 -iemnum 2 sets intermittent error monitoring on (interval 20 min, number of times the error is to occur: 2)
dlnkmgr set -ellv 3 sets the error log collection level (3 is recommended, 0:no log, 1:errors only, 2:1+warnings, 3:2+infos)
dlnkmgr set -systflv 0 sets trace level (0 is recommended, after an error could be set to higher (1) to collect logs)
dlnkmgr set -elfs 9900 sets the error log file size in KB (9900 is recommended)
dlnkmgr set -elfn 2 sets the number of error log files (2 is recommended)
dlnkmgr set -systfs 1000 sets the trace file size (1000 is recommended)
dlnkmgr set -systfn 4 sets the number of trace files (4 is recommended)
dlnkmgr set -rsv on 2                   sets the reservation level (2 is recommended; it means persistent reservation level, 0: ignore reservation)
                                        (important for reserving disks; if a LUN does not support it, the disk will not be reserved)
dlnkmgr clear -pdst this clears the statistics (I/O counts and I/O errors)
dlmcfgmgr configures dlm devices
dlmrmdev removes dlm devices (/usr/D*/bin/dlmrmdev) (dlmrmdev -A makes umount, varyoff, rmdev)
dlmpr it clears persistent reservation in clustered environment
dlmpr -k hdiskX shows the reservation key
dlmpr -c hdiskX clears the reservation key
------------------------------------
lssrc -a | grep -i dlm shows if HDLM Manager running (startsrc/stopsrc -s DLMManager)
------------------------------------
changing FC adapter settings (for dualpath):
dlnkmgr offline -hba 08.08 (full delete: dlmhbadel fscsi0) (dlnkmgr offline -pathid 000001)
rmpath -p fscsiX <--sets define (rmpath -p fscsiX -d <--it will delete)
rmdev -Rl fcsX (rmdev -Rdl fscsiX <--careful with raw devices, as it will delete)
chdev -l fcs1 -a init_link=pt2pt
chdev -l fscsi1 -a dyntrk=yes -a fc_err_recov=fast_fail
cfgmgr (dlmcfgmgr if needed)
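The adapter change sequence above can be printed as a dry-run plan before anything is executed (the hdlm_adapter_change_plan helper is invented here; the HBA port id is site-specific, check it with `dlnkmgr view -hba` first):

```shell
hdlm_adapter_change_plan() {
    n=$1      # adapter number (fcs$n / fscsi$n)
    hba=$2    # HDLM HBA port id, e.g. 08.08 (site-specific)
    echo "dlnkmgr offline -hba $hba"
    echo "rmpath -p fscsi$n"
    echo "rmdev -Rl fcs$n"
    echo "chdev -l fcs$n -a init_link=pt2pt"
    echo "chdev -l fscsi$n -a dyntrk=yes -a fc_err_recov=fast_fail"
    echo "cfgmgr"
}

hdlm_adapter_change_plan 1 08.08
```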
------------------------------------
Changing ODM parameters of the HDLM driver:
# odmget PdAt > /tmp/PdAtbackup_091119
# odmget -q"uniquetype=disk/fcp/Hitachi and attribute=reserve_policy" PdAt > /tmp/pdatreserve_policy
Edit /tmp/pdatreserve_policy
PdAt:
uniquetype = "disk/fcp/Hitachi"
attribute = "reserve_policy"
deflt = "no_reserve"
values = "no_reserve,single_path,PR_exclusive,PR_shared"
width = ""
type = "R"
generic = "DU"
rep = "sl"
nls_index = 96
# odmchange -o PdAt -q"uniquetype=disk/fcp/Hitachi and attribute=reserve_policy" /tmp/pdatreserve_policy
STORAGE - EMC
EMC:
EMC comes with default settings, which are set by EMC technicians for optimized performance.
It is not recommended to change these configurations:
queue_depth=16
init_link al
dyntrk yes
fc_err_recov fast_fail
Because the queue_depth attribute is 16, you may need to change the num_cmd_elems attribute on the FC card too:
num_cmd_elems >= queue_depth * number of LUNs.
The maximum num_cmd_elems can be 1024 or 2048, depending on the FC card type.
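A worked example of the sizing rule, capped by the adapter maximum (the required_cmd_elems helper is just an illustration of the arithmetic):

```shell
required_cmd_elems() {
    qd=$1; luns=$2; max=$3
    need=$((qd * luns))               # queue_depth * number of LUNs
    if [ "$need" -gt "$max" ]; then   # cannot exceed the adapter maximum
        need=$max
    fi
    echo "$need"
}

# 16 queue_depth * 50 LUNs = 800, within a 2048 adapter maximum
required_cmd_elems 16 50 2048
```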
------------------
installed filesets:
root@aix14: /etc/emc # lslpp -L | grep EMC
EMC.Symmetrix.aix.rte 5.3.0.3 C F EMC Symmetrix AIX Support
EMC.Symmetrix.fcp.rte 5.3.0.3 C F EMC Symmetrix FCP Support
EMCpower.base 5.3.1.0 C F PowerPath Base Driver and
EMCpower.encryption 5.3.1.0 C F PowerPath Encryption with RSA
EMCpower.migration_enabler
EMCpower.mpx 5.3.1.0 C F PowerPath Multi_Pathing
------------------
EMCPOWERRESET:
EMC has developed a binary called emcpowerreset for removing disk reservations held by PowerPath devices in the event that a node crashes. This binary is required for any HACMP installation on AIX 5.1 and higher when running PowerPath version 3.0.3 or higher.
To determine the different emcpowerreset versions, run the following command:
cksum emcpowerreset (or cksum /usr/lpp/EMC/Symmetrix/bin/emcpowerreset)
Version 1 = 1108394902 7867
Version 2 = 1955156125 10311
The emcpowerreset binary takes two parameters as options, and these two parameters are automatically passed to the binary whenever it is invoked within the HACMP script logic.
Reset reservation bit:
If you run into not being able to access an hdiskpowerX disk, you may need to reset the reservation bit on it:
# /usr/lpp/EMC/Symmetrix/bin/emcpowerreset fscsiX hdiskpowerX
------------------
smitty powerpath
powermt version shows installed powerpath version
powermt display shows short overview of hba, paths, managed classes
powermt display options shows some settings for PowerPath
powermt display paths shows the paths and dead paths to the storage port
powermt display ports shows the storage ports information
powermt display dev=all shows detailed info of all devices
powermt display dev=hdiskpowerX shows detailed info of the specified device
powermt display every=<x seconds> shows io stats on each adapter in the specified interval (good for checking load balancing)
/usr/lpp/EMC/Symmetrix/bin # ./inq.aix64_51 shows detailed info about the disks (type, serial numbers...)
powermt remove hba=1 dev=all            removes all paths from an hba (it asks before removing the last active path)
powermt config                          configures back the paths
powermt disable hba=0                   disables specified hba (it will show failed, an errpt entry will also be generated) - enable is needed later
powermt enable hba=0 enables the specified hba
powermt remove hba=2 dev=0 remove only 1 path from the specified device (powermt remove hba=2 dev=hdisk11)
powermt set mode=standby hba=1 dev=all sets the adapter to standby mode (it will be activated only if the other path failed)
powermt unmanage class=hitachi powermt will not manage any hitachi devices
powermt manage class=hitachi powermt will manage hitachi devices (not recommended, can cause trouble)
powermt save file=<path> saves powerpath configuration settings
powermt load file=<path> loads back powerpath configuration settings
powermt restore dev=all it does an I/O path check, if a previously dead path is alive, it will be marked as alive
(If you have dead I/O paths, and you fixed something, you can request PowerPath to re-check the paths.)
powermt check check the I/O Paths
(for example, if you manually removed an I/O path, it will detect a dead path and remove it from the list.)
lsdev -Ct power shows the powerpath0 device (which is the pseudo parent of the hdiskpowerX devices)
rmdev -dl powerpath0 this is needed for the full removing of powerpath
powermt watch every=5                   shows FC adapter statistics (IO/sec) at 5-second intervals. Good for checking whether load balancing works.
--------------------------
removing a LUN:
powermt display dev=19
powermt remove dev=19
rmdev -dl hdiskpower19
rmdev -dl hdisk59
rmdev -dl hdisk77
--------------------------
"powermt check" checks specified paths and, if desired, removes from the PowerPath configuration any paths marked dead.
Warning: storage_system I/O path path_name is dead
Do you want to remove it (y/n/a/q)?
Valid responses are:
y Remove the dead path, and continue checking remaining paths.
n Do not remove the path, but continue checking remaining paths.
a Remove the dead path and any subsequent paths marked dead.
q Do not remove the dead path, and exit the command. Any paths that were already removed remain removed.
--------------------------
Install:
0. preparations:
-set FC adapter settings if needed: (init_link, dyntrk, fc_err_recov, num_cmd_elems)
-remove already assigned EMC disks (if you see Other MPIO... (which is EMC) remove them, so cfgmgr will bring up correctly later)
1. ODM extension and Powerpath:
-ODM_def_EMC.AIX.5.3.0.3:
install only these:
-EMC Symmetrix AIX Support Software
-EMC Symmetrix FCP Support Software
(Symmetrix MPIO and Symmetrix PowerPath can't be installed on the same system together.)
-Powerpath_AIX.5.3.SP1:
EMCpower: everything can be installed
2. add licence key (/mnt/Storage-Treiber/EMC/PowerPath_license)
emcpreg -add XXXX-XXXX-XXXX-XXXX-XXXX-XXXX
you can check if it is added:
emcpreg -list or powermt check_registration
(license key can be copied from an already installed system)
3. specifics for RAC,VIO (dual) and HACMP
RAC and VIO (dual):
set the reservation setting to no reserve in the ODM
we have a script: /mnt/Storage-Treiber/EMC/emc_reserve.sh
you will be asked:
1. Set reserve_lock=no in the PdAt for Symmetrix FCP devices.
2. Set reserve_lock=yes in the PdAt for Symmetrix FCP devices.
choose #1
you can check the settings in the ODM: odmget PdAt | grep -p reserve | grep -p SYMMETRIX
(deflt. should be on no)
(reserve should be removed only for dual vio environment)
HACMP:
(full instruction: Host Connectivity Guide pg. 348)
The emcpowerreset utility should be run so that HACMP can handle disk reservations correctly.
It can be found here: /usr/lpp/EMC/Symmetrix/bin
(make sure it exists and root is the owner)
smitty hacmp -> Ext. Conf. -> Ext. Res. Conf. -> HACMP Ext. Resources Conf. -> Conf. Custom Disk Methods -> Add Cust. Disk
* Disk Type (PdDvLn field from CuDv) [disk/pseudo/power]
* Method to identify ghost disks [SCSI3] +
* Method to determine if a reserve is held [SCSI_TUR] +
* Method to break a reserve [/usr/lpp/EMC/Symmetrix/bin/emcpowerreset] +
Break reserves in parallel true +
* Method to make the disk available [MKDEV] +
Configure the same custom disk processing method on each node in the cluster and synchronize the cluster resources.
(if you want NFS exports, the same major numbers are needed for the hdiskpowerX devices)
4. Configure Powerpath to not manage other disks (ESS, Hitachi)
(this is very important, otherwise Powerpath try to manage them, and can cause problems)
change this (with vi) in /etc/emc/mpaa.lams
root@aix14: /etc/emc # cat mpaa.lams
global:version:5.3.0
managed:symm
managed:clariion
unmanaged:hitachi
unmanaged:invista
unmanaged:hpxp
unmanaged:ess
unmanaged:hphsx
(this file will be used when configuring the paths)
5. Run config manager
cfgmgr -S
(this configures the devices sequentially; recommended to avoid the problems of parallel configuration)
(normal cfgmgr is OK as well)
if needed, run : powermt config
you can setup this link as well:
ln -s /usr/lpp/EMC/Symmetrix/bin/inq.aix64_51 /usr/sbin/inq
--------------------------
problem with bosboot after pprootdev fix:
"pprootdev fix" can help if rootvg is on hdiskpower devices and there are problems with bosboot.
"bosboot -a" wants to use the disk that is used during boot (bootinfo -b); this disk should be in the output of "lsvg -p" after "pprootdev fix" is done. It can happen that "pprootdev fix" puts rootvg on a different disk, so bosboot will fail.
Solution: put the unneeded paths into failed state, then do "pprootdev fix", so rootvg is only on the disk that is available (the one used during boot). Then the other paths can be put back to Available.
Commands for determining which disks are needed:
-odmget -q"name=rootvg and attribute=pv" CuAt | grep value
value = "00080e82a47d2e5d0000000000000000"
value = "00080e828106fa470000000000000000"
-odmget -q "value=00080e828106fa470000000000000000 and attribute=pvid" CuAt <--this disks list is important
CuAt:
name = "hdisk12"
attribute = "pvid"
value = "00080e828106fa470000000000000000"
type = "R"
generic = "D"
rep = "s"
nls_index = 2
CuAt:
name = "hdisk8"
attribute = "pvid"
value = "00080e828106fa470000000000000000"
type = "R"
generic = "D"
rep = "s"
nls_index = 2
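The odmget lookups above boil down to extracting the hdisk names that share one PVID. A sketch on canned sample stanzas in the same shape as those above (sample_odmget and disks_for_pvid are invented names; on a live system pipe in the real `odmget -q "value=<pvid> and attribute=pvid" CuAt` output):

```shell
sample_odmget() {
cat <<'EOF'
CuAt:
        name = "hdisk12"
        attribute = "pvid"
        value = "00080e828106fa470000000000000000"
CuAt:
        name = "hdisk8"
        attribute = "pvid"
        value = "00080e828106fa470000000000000000"
EOF
}

# Split on double quotes; the 'name = "..."' lines yield the disk names
disks_for_pvid() {
    sample_odmget | awk -F'"' '$1 ~ /name/ {print $2}'
}
disks_for_pvid
```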
--------------------------
emcp_xcryptd has high CPU usage:
This daemon is doing encryption for EMC. It is usually not in use in AIX.
You can check whether encryption is being used and disable this process with these steps:
# powervt xcrypt -info -dev all <--lists all hdiskpower devices, if it is encrypted or not
# /etc/rc.emcp_xcryptd stop              <--stops emcp_xcryptd
# rmitab rcxcrypt <--remove from inittab
EMC comes with the default settings. These settings are set by EMC technicians for the optimized performance.
It is not recommended to change these configurations:
queue_depth=16
init_link al
dyntrk yes
fc_err_recov fast_fail
Because of queue_depth attributes which is 16 you may need to change num_cmd_elems attributes on FC card too.
num_cmd_elems >= queue_depth * LUN numbers.
Maximum num_cmd_elems could be 1024 OR 2048 which depend FC card type.
------------------
installed filesets:
root@aix14: /etc/emc # lslpp -L | grep EMC
EMC.Symmetrix.aix.rte 5.3.0.3 C F EMC Symmetrix AIX Support
EMC.Symmetrix.fcp.rte 5.3.0.3 C F EMC Symmetrix FCP Support
EMCpower.base 5.3.1.0 C F PowerPath Base Driver and
EMCpower.encryption 5.3.1.0 C F PowerPath Encryption with RSA
EMCpower.migration_enabler
EMCpower.mpx 5.3.1.0 C F PowerPath Multi_Pathing
------------------
EMCPOWERRESET:
EMC has developed a binary called emcpowerreset for removing disk reservations, held by PowerPath devices, in the event that a node crashes. This binary is required for any HACMP installations on AIX 5.1, and higher when running PowerPath version 3.0.3 and higher.
To determine the different emcpowerreset versions, run the following command:
cksum emcpowerreset (or cksum /usr/lpp/EMC/Symmetrix/bin/emcpowerreset)
Version 1 = 1108394902 7867
Version 2 = 1955156125 10311
The emcpowerreset binary takes as options two parameters, and these two parameters are automatically passed to the binary whenever it is invoked within the HACMP script logic.
Reset reservation bit:
If you run into not being able to access an hdiskpowerX disk, you may need to reset the reservation bit on it:
# /usr/lpp/EMC/Symmetrix/bin/emcpowerreset fscsiX hdiskpowerX
------------------
smitty powerpath
powermt version shows installed powerpath version
powermt display shows short overview of hba, paths, managed classes
powermt display options shows some settings for PowerPath
powermt display paths shows the paths and dead paths to the storage port
powermt display ports shows the storage ports information
powermt display dev=all shows detailed info of all devices
powermt display dev=hdiskpowerX shows detailed info of the specified device
powermt display every=<x seconds> shows io stats on each adapter in the specified interval (good for checking load balancing)
/usr/lpp/EMC/Symmetrix/bin # ./inq.aix64_51 shows detailed info about the disks (type, serial numbers...)
powermt remove hba=1 dev=all removes all pathes from an hba (it asks before reomving the last active path)
powermt config configures back the pathes
powermt disable hba=0 disables specified hba (it will show failed, errpt entry will also generated) - enable is needed later
powermt enable hba=0 enables the specified hba
powermt remove hba=2 dev=0 remove only 1 path from the specified device (powermt remove hba=2 dev=hdisk11)
powermt set mode=standby hba=1 dev=all sets the adapter to standby mode (it will be activated only if the other path failed)
powermt unmanage class=hitachi powermt will not manage any hitachi devices
powermt manage class=hitachi powermt will manage hitachi devices (not recommended, can cause trouble)
powermt save file=<path> saves powerpath configuration settings
powermt load file=<path> loads back powerpath configuration settings
powermt restore dev=all it does an I/O path check, if a previously dead path is alive, it will be marked as alive
(If you have dead I/O paths, and you fixed something, you can request PowerPath to re-check the paths.)
powermt check check the I/O Paths
(for example, if you manually removed an I/O path, it will detect a dead path and remove it from the list.)
lsdev -Ct power shows the powerpath0 device (which is the pseudo parent of the hdiskpowerX devices)
rmdev -dl powerpath0 this is needed for the full removing of powerpath
powermt watch every=5 it shows in 5 seconds interval fc adapter statistics (IO/sec). It is good for checkng if load balancing works.
--------------------------
removing a LUN:
powermt display dev=19
powermt remove dev=19
rmdev -dl hdiskpower19
rmdev -dl hdisk59
rmdev -dl hdisk77
--------------------------
"powermt check" checks specified paths and, if desired, removes from the PowerPath configuration any paths marked dead.
Warning: storage_system I/O path path_name is dead
Do you want to remove it (y/n/a/q)?
Valid responses are:
y Remove the dead path, and continue checking remaining paths.
n Do not remove the path, but continue checking remaining paths.
a Remove the dead path and any subsequent paths marked dead.
q Do not remove the dead path, and exit the command. Any paths that were already removed remain removed.
--------------------------
Install:
0. preparations:
-set FC adapter settings if needed: (init_link, dyntrk, fc_err_recov, num_cmd_elems)
-remove already assigned EMC disks (if you see Other MPIO... (which is EMC), remove them, so cfgmgr can bring them up correctly later)
1. ODM extension and Powerpath:
-ODM_def_EMC.AIX.5.3.0.3:
install only these:
-EMC Symmetrix AIX Support Software
-EMC Symmetrix FCP Support Software
(Symmetrix MPIO and Symmetrix Powerpath can't be installed on 1 system together.)
-Powerpath_AIX.5.3.SP1:
EMCpower: everything can be installed
2. add licence key (/mnt/Storage-Treiber/EMC/PowerPath_license)
emcpreg -add XXXX-XXXX-XXXX-XXXX-XXXX-XXXX
you can check if it is added:
emcpreg -list or powermt check_registration
(license key can be copied from an already installed system)
3. specifics for RAC,VIO (dual) and HACMP
RAC and VIO (dual):
set the reservation setting to no reserve in the ODM
we have a script: /mnt/Storage-Treiber/EMC/emc_reserve.sh
you will be asked:
1. Set reserve_lock=no in the PdAt for Symmetrix FCP devices.
2. Set reserve_lock=yes in the PdAt for Symmetrix FCP devices.
choose #1
you can check the settings in the ODM: odmget PdAt | grep -p reserve | grep -p SYMMETRIX
(default should be no)
(reserve should be removed only for dual vio environment)
HACMP:
(full instruction: Host Connectivity Guide pg. 348)
The emcpowerreset utility must be in place for HACMP to be able to handle disk reservations correctly.
It can be found here: /usr/lpp/EMC/Symmetrix/bin
(make sure it exists and root is the owner)
smitty hacmp -> Ext. Conf. -> Ext. Res. Conf. -> HACMP Ext. Resources Conf. -> Conf. Custom Disk Methods -> Add Cust. Disk
* Disk Type (PdDvLn field from CuDv) [disk/pseudo/power]
* Method to identify ghost disks [SCSI3] +
* Method to determine if a reserve is held [SCSI_TUR] +
* Method to break a reserve [/usr/lpp/EMC/Symmetrix/bin/emcpowerreset] +
Break reserves in parallel true +
* Method to make the disk available [MKDEV] +
Configure the same custom disk processing method on each node in the cluster and synchronize the cluster resources.
(if you want NFS exports, the same major numbers are needed for the hdiskpowerX devices)
4. Configure Powerpath to not manage other disks (ESS, Hitachi)
(this is very important, otherwise Powerpath will try to manage them, which can cause problems)
change this (with vi) in /etc/emc/mpaa.lams
root@aix14: /etc/emc # cat mpaa.lams
global:version:5.3.0
managed:symm
managed:clariion
unmanaged:hitachi
unmanaged:invista
unmanaged:hpxp
unmanaged:ess
unmanaged:hphsx
(this file will be used when configuring the paths)
5. Run config manager
cfgmgr -S
(this configures the devices sequentially, which is recommended to avoid the problems of parallel configuration)
(normal cfgmgr is OK as well)
if needed, run : powermt config
you can setup this link as well:
ln -s /usr/lpp/EMC/Symmetrix/bin/inq.aix64_51 /usr/sbin/inq
--------------------------
problem with bosboot after pprootdev fix:
pprootdev fix can help if rootvg is on hdiskpower devices and we have problems with bosboot.
"bosboot -a" wants to use the disk that is used during boot (bootinfo -b); this disk should be in the output of "lsvg -p" after we run pprootdev fix. It can happen that pprootdev fix puts rootvg on a different disk, so bosboot will fail.
Solution: put the unneeded paths into failed state, then run pprootdev fix, so rootvg will be only on the disk that is available (the one used during boot). Then the other paths can be put back to available.
Commands for determining which disks are needed:
-odmget -q"name=rootvg and attribute=pv" CuAt | grep value
value = "00080e82a47d2e5d0000000000000000"
value = "00080e828106fa470000000000000000"
-odmget -q "value=00080e828106fa470000000000000000 and attribute=pvid" CuAt <--this disks list is important
CuAt:
name = "hdisk12"
attribute = "pvid"
value = "00080e828106fa470000000000000000"
type = "R"
generic = "D"
rep = "s"
nls_index = 2
CuAt:
name = "hdisk8"
attribute = "pvid"
value = "00080e828106fa470000000000000000"
type = "R"
generic = "D"
rep = "s"
nls_index = 2
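The second odmget query can be reduced to just the disk names; a minimal sketch, where the here-doc stands in for the real odmget output (on a live system you would pipe `odmget -q "value=... and attribute=pvid" CuAt` into the awk line):

```shell
# Simulated output of: odmget -q "value=<pvid> and attribute=pvid" CuAt
cat <<'EOF' >/tmp/cuat.out
CuAt:
        name = "hdisk12"
        attribute = "pvid"
        value = "00080e828106fa470000000000000000"

CuAt:
        name = "hdisk8"
        attribute = "pvid"
        value = "00080e828106fa470000000000000000"
EOF
# list only the hdisks that carry this PVID
awk -F'"' '/name =/ {print $2}' /tmp/cuat.out
```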
--------------------------
emcp_xcryptd has high CPU usage:
This daemon is doing encryption for EMC. It is usually not in use in AIX.
You can check whether encryption is being used and disable this process with these steps:
# powervt xcrypt -info -dev all <--lists all hdiskpower devices and whether each is encrypted
# /etc/rc.emcp_xcryptd stop <--stops emcp_xcryptd
# rmitab rcxcrypt <--remove from inittab
NIM - SPOT
SPOT:
Essentially the SPOT is a /usr filesystem, just like the one on your NIM master. Everything that a machine requires in a /usr file system, such as the AIX kernel, executable commands, libraries, and applications, is included in the SPOT. During a client install, the client needs to run commands (mkvg, mklv, ...); these commands are available in the SPOT.
During the installation, the client machine NFS mounts this resource in order to access the code needed for the installation process. Device drivers, the BOS install program, and other necessary code needed to perform a base operating system installation are found inside the SPOT.
SPOT is responsible for
- Creating a boot image to send to the client machine over the network.
- Running the commands needed to install the NIM client.
You can think of it as having multiple "mini-systems" on your NIM master, because each SPOT is its own /usr filesystem. You can upgrade it, add fixes to it, use it to boot a client system....etc.
You can also create a SPOT from a NIM mksysb resource. This SPOT, however, is not as versatile as one created from an lpp_source: it cannot be upgraded with fixes and can only be used with the mksysb resource it was created from.
When a SPOT is created, network boot images are constructed in the /tftpboot directory using code from the newly created SPOT. When a client performs a network boot, it uses tftp to obtain a boot image from the server. After the boot image is loaded into memory at the client, the SPOT is mounted in the client's RAM file system to provide all additional software support required to complete the operation.
root@aixnim1: / # lsnim -l spot_5300_09
spot_5300_09:
class = resources
type = spot
plat_defined = chrp
Rstate = ready for use
prev_state = ready for use
location = /nim/spot/spot_5300_09/usr <--shows the location
...
operations:
reset = reset an object's NIM state
cust = perform software customization
showres = show contents of a resource
maint = perform software maintenance
lslpp = list LPP information about an object
fix_query = perform queries on installed fixes
showlog = display a log in the NIM environment
check = check the status of a NIM object
lppchk = verify installed filesets
update_all = update all currently installed filesets
creating a SPOT (only the top directory should be specified, the SPOT directory will be created automatically):
nim -o define -t spot -a server=master -a location=/nim/spot -a source=5300-09-03 -a installp_flags=-aQg spot_5300-09-03
resetting a SPOT (if an operation failed, with this the resource state (Rstate) will be updated, and SPOT is ready to use):
nim -Fo reset spot_5300-09-03
preferable however to run a force check on the SPOT instead:
checking a SPOT (verifies the usability of a SPOT, rebuilds the network boot image if necessary, and changes its state to "ready for use"):
nim -Fo check spot_5300-09-03
checking the contents of the spot (verifies that software was installed successfully on a spot resource):
nim -o lppchk -a show_progress=yes spot_5200_08
Creating a SPOT from an mksysb (created spot can be used only for this mksysb):
smitty nim_mkres -> spot -> enter the values needed (the Source of Install Image should be the mksysb)
checking if a SPOT contains a fileset:
nim -o showres 'spot_5300-11-04_bb1' | grep bos.alt_disk_install.rte
nim -o lslpp -a filesets="bos.alt_disk_install.rte" spot_5300-11-04_bb1
checking if a SPOT contains a specific driver:
e.g. lsdev -Cc adapter displayed this driver "2514310025140100"
nim -o lslpp AIX_6100-06_SPOT |grep 2514310025140100
checking a SPOT level (similar to instfix -i | grep ML):
root@aixnim1: / # nim -o fix_query spot_5200-08 | grep ML
All filesets for 5.2.0.0_AIX_ML were found.
All filesets for 5200-01_AIX_ML were found.
All filesets for 5200-02_AIX_ML were found.
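A hedged one-liner to pull the highest ML out of that output; the here-doc simulates the fix_query lines above, and it assumes every relevant line has the ML name as its fourth word:

```shell
# Simulated output of: nim -o fix_query spot_5200-08 | grep ML
cat <<'EOF' >/tmp/fixq.out
All filesets for 5.2.0.0_AIX_ML were found.
All filesets for 5200-01_AIX_ML were found.
All filesets for 5200-02_AIX_ML were found.
EOF
# 4th field is the ML name; with a byte-wise sort the last one is the highest
awk '/were found/ {print $4}' /tmp/fixq.out | LC_ALL=C sort | tail -1
```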
update a spot with an lpp_source:
nim -o cust -a fixes=update_all -a lpp_source=5305_lpp 5305_spot
A SPOT is an installed entity, like any other AIX system, so it can run into broken filesets, broken links, or missing/corrupt files. These are fixed in the same manner as on any other system:
nim -o lppchk -a lppchk_flags="v" 5305_spot <--use the "Force Overwrite" or "Force Reinstall" options for -v errors
nim -o lppchk -a lppchk_flags="l" 5305_spot <--using the "-ul" flags for missing links from "-l" errors
nim -o lppchk -a lppchk_flags="c" 5305_spot <--replacing bad files for any "-c" output
----------------------------------
Spot creation with SMITTY:
smitty nim -> perform nim administration -> manage resources -> define a resource (spot)
Resource Name [spot_TL7_SP3]
Resource Type spot
Server of Resource [master]
Source of Install Images [TL7_SP3]
Location of Resource [/nim/spots]
...
COMMIT software updates? yes
----------------------------------
Spot update with SMITTY:
(bos.alt_disk_install.rte fileset will be added to a spot)
smitty nim -> perform nim softw. inst. -> inst. and upd. softw. -> Inst. softw. (spot -> lpp_source)
Installation Target spot_TL7_SP3
LPP_SOURCE TL7_SP3
Software to Install [+ 6.1.7.2 Alt. Disk Inst. Runt.] <--after F4 -> bos.alt_disk_install.rte with F7
...
installp Flags
COMMIT software updates? [yes]
SAVE replaced files? [no]
----------------------------------
Creating a SPOT from an mksysb file:
(this SPOT is valid only for that specific LPAR, to do maintenance boot for example)
1. check mksysb level:
# lsmksysb -lf <mksysb file> | grep SERVICEPACK
(output like: SERVICEPACK LEVEL: 7200-03-01-1838)
2. define mksysb resource from mksysb file
# nim -o define -t mksysb -a server=master -a location=<path to mksysb file> <mksysb resource name>
3. define spot from mksysb resource:
# nim -o define -t spot -a server=master -a source=<mksysb resource> -a location=<where to create the spot resource> <spot name>
----------------------------------
NIM - MKSYSB
MKSYSB:
This resource is a file containing the image of the root volume group (generated with the AIX mksysb command) of a machine. It is used to restore a machine, or to install it from scratch (also known as “cloning” a client).
creating mksysb from a nim client:
nim -o define -t mksysb -a server=master -a source=LPAR5 -a mk_image=yes -a location=/export/images/mksysb.lpar5 mksysb_lpar5
-o define Specifies the operation (defining the object).
-t mksysb Specifies that the object type is mksysb.
-a server=master Specifies that the server to hold this object (the mksysb image file) is the NIM master itself.
-a source=LPAR5 Specifies the NIM client to be used as the source the mksysb image, in this case, LPAR5.
-a mk_image=yes Specifies that the mksysb image file should be created.
-a location=/export/images/mksysb.lpar5 Specifies the path and filename for the mksysb image file.
mksysb_lpar5 Specifies the NIM object name for this mksysb image.
additional attributes:
-a verbose=2 it will show more detailed progress view
-a size_preview=yes shows the size
-a mksysb_flags=T specifies mksysb options described in the mksysb man page. (lsnim -Pa mksysb_flags)
Here: [-T] use jfs2 filesystem snapshots.
checking the size of an mksysb before creating it:
nim -o define -t mksysb -a server=master -a source=LPAR5 -a size_preview=yes -a mk_image=yes -a location=/nim/mksysb.lpar5 mksysb_lpar5
------------------
mkszfile and image.data:
The image.data file is responsible for rebuilding the system structure during an mksysb restore. If you added some new filesystems to the system but image.data has not been updated, it has no knowledge of the added filesystems. During an mksysb restore, all of that extra data will then be put into your / (root) filesystem; it will likely fill up to 100%, and your mksysb restore will fail.
If you create your mksysb backups in smit, the default option is already set to run the mkszfile command, so your image.data file will be updated. When running mksysb from the command line, use the '-i' flag to make sure the /image.data file gets updated. Alternatively, you can run mkszfile manually.
------------------
remove an mksysb image smitty nim --> Perform NIM Admin. --> Manage resources --> Remove a resource (remove mksysb image: yes)
nim -o remove aix40    removes a resource (no mksysbs of that system will be made in the future)
(smitty nim-> manage machines ->change show machines ->remove machine)
lsmksysb -l -f /path/to/nim_file verifying an mksysb (or smitty nim_verify_res )
listvgbackup lists or restores the content (or a file) of an mksysb
(listvgbackup -lf <mksysbname>, is the same as lsmksysb -lf <mksysbname>)
extracting a file from an mksysb:
cd /tmp <--the file will be extracted here
restore -xvqf /nim/mksysb/mksysb_aix21.1224 ./bosinst.data <--bosinst.data file will be extracted from mksysb_aix21...
with other command:
listvgbackup -f /nim/mksysb/m_aix10_5200-08_110412 -r -s -d /tmp/bb ./sbin/rc.d <--copy from mksysb the directory /sbin/rc.d to /tmp/bb
------------------
Excluding files from mksysb:
If you want to exclude certain files from the backup, create the /etc/exclude.rootvg file on the client and enter the patterns of file names that you do not want included in your system backup image.
for example:
^./tmp/    backs up /tmp itself, but not its contents (at restore an empty /tmp will be created)
(this excludes the contents of /tmp without excluding other directories that have /tmp in the pathname)
^./tmp    won't back up /tmp at all (at restore an empty /tmp won't be created)
/temp/    excludes the contents of every /temp/ directory on the system ("^." makes the exclude start from / and match your path)
old$    "$" indicates that the match should end at the end of the line
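mksysb treats these entries as egrep-style patterns matched against the relative ("./") paths it archives; a quick way to sanity-check a pattern is to grep it against a sample file list (the paths below are invented):

```shell
# Invented sample of the relative paths mksysb would archive
printf '%s\n' ./tmp ./tmp/scratch.log ./var/tmp/keep.log ./data/file.old >/tmp/filelist
grep -E '^./tmp/' /tmp/filelist   # matches only the contents of /tmp
grep -E 'old$' /tmp/filelist      # matches anything ending in "old"
```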
or on the master:
1. edit exclude file (on master):
root@aixnim1: /nim/lppsource # vi exclude.aix31
"exclude_aix31" [New file]
^./usr/local/stat/
(in this way the empty filesystems will be backed up, just the files under it will not be backed up)
(the file should be created in a dir which is exported, so client can mount it)
2. creating the exclude_files NIM object
nim -o define -a verbose=2 -t exclude_files -a server=master -a location=/nim/lppsource/exclude.aix31 exclude_aix31
3. using exclude_files when creating mksysb image
nim -o define -t mksysb -a server=master -a source=aix31 -a mk_image=yes -a location=/nim/mksysb/mksysb_aix31_test -a exclude_files=exclude_aix31 mksysb_aix31_test
------------------
0042-304 m_mkbosi: Unable to obtain the space allocation lock...
This type of error usually occurs when a NIM define operation has been halted abruptly such as with a control-C.
In my case the mksysb creation was halted abruptly, and after I killed the stuck mksysb processes the lock was cleared automatically and I was able to do a new mksysb.
I have found this on the net:
"When NIM defines objects such as a mksysb, SPOT, or LPP, NIM puts a lock in /var/adm/nim/.
To fix, on the NIM master, make sure there are no NIM operations running other than nimesis.
Run lsnim -a locked to see which NIM object has the locks.
Remove lock in /var/adm/nim and run /usr/lpp/bos.sysmgt/nim/methods/m_chattr -a locked="" <NIM object> to clean up the NIM db entry."
------------------
0042-207 m_mkbosi: Unable to allocate the exclude_nim resource to machine
Always check the following:
# lsnim -l [machine]
if you see already allocated resources like spot or mksysb image, then you must [reset] the machine and remove the resources if those are not currently used:
# nim -F -o reset [machine]
# nim -o deallocate -a exclude_files=exclude_nim [machine] <--with the correct resource
If that does not help check the following things:
- rsh connection from NIM master to client (in case of errors check [.rhosts] [/etc/hosts.allow] [/etc/hosts.deny] files)
- can the client reach the exported NIM master NFS directory
- on the client: is the NIM master defined correctly in /etc/hosts?
- check route settings on NIM master
- in case you have multiple LPARs and only one produces this error, compare the failed node to the others.
------------------
0042-274 m_mkbosi: The '-e' flag in the mksysb_flags attribute and the exclude_files attribute cannot be specified together.
Specify the '-e' flag with the mksysb_flags attribute to
exclude the files in /etc/exclude.rootvg on aix41 from
the backup, or specify an exclude_files attribute.
Always check the following:
# lsnim -l [machine]
if you see already allocated exclude_files resource, then you must [reset] the machine and remove the resource if that is not currently used.
# nim -F -o reset [machine]
# nim -o deallocate -a exclude_files=exclude_nim [machine]
------------------
mksysb restore from VIO server with use of image.data
We want to change the PP size of rootvg; for this we will create an image.data file and an mksysb that will be restored.
1. create image.data file and change PPSIZE line:
root@bb_lpar: / # mkszfile
root@bb_lpar: / # vi /image.data
PPSIZE= 128
(it was PPSIZE= 64, which we changed to 128)
2. create an mksysb (and use the updated image.data)
root@bb_lpar: / # mksysb /bb/bb_lpar.mksysb
Creating list of files to back up.
Backing up 76740 files.............................
0512-003 mksysb may not have been able to archive some files.
The messages displayed on the Standard Error contained additional
information.
(if you omit the -i parameter, mksysb will use our updated image.data and won't create a new one)
(I received this output, but it did not cause any problems for me; you can check the mksysb with the lsmksysb command)
3. creating iso image from mksysb:
root@bb_lpar: /bb # mkcd -L -S -I /bb/image -m /bb/bb_lpar.mksysb
Initializing mkcd log: /var/adm/ras/mkcd.log...
Verifying command parameters...
Creating temporary file system: /mkcd/cd_fs...
Populating the CD or DVD file system...
Building chrp boot image...
Copying backup to the CD or DVD file system...
...........
-L creates DVD sized images (up to 4.38 GB).
-S stops mkcd before writing to DVD without removing the created images
-I specifies the directory where to create the image
-m specifies a previously created mksysb image (-m will not create a new mksysb file)
Other useful flags:
-M specifies the target dir for the new mksysb file when not using -m
-T use jfs2 external filesystem snapshot to create the mksysb
4. renaming the image to a more meaningful name and copying it to the VIOS virtual media library
root@bb_lpar: /bb/image # mv cd_image_8323206 bb_lpar_mksysb.iso
root@bb_lpar: /bb/image # scp bb_lpar_mksysb.iso aix-vios1:/var/vio/VMLibrary
5. (creating a virtual optical device if needed) and loading the iso image into it
padmin@aix-vios1 : /home/padmin # loadopt -vtd vtopt0 -disk bb_lpar_mksysb.iso
After rebooting the AIX server, the system can be restored from this mksysb iso image.
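Step 1 above edits /image.data by hand; if you prefer to script it, the PPSIZE line can be changed with sed. A minimal sketch, where the vg_data stanza is a shortened, hypothetical sample of a real /image.data:

```shell
# Shortened, hypothetical vg_data stanza from /image.data
cat <<'EOF' >/tmp/image.data
vg_data:
        VGNAME= rootvg
        PPSIZE= 64
        VG_SOURCE_DISK_LIST= hdisk0
EOF
# change PPSIZE from 64 to 128 (printed to stdout here; use sed -i-style
# in-place editing, or redirect to a temp file, on the real /image.data)
sed 's/^\([[:space:]]*PPSIZE= \)64$/\1128/' /tmp/image.data
```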
------------------