Shared Storage Pools (SSP):
Shared Storage Pool is a storage virtualization technique. The VIO servers form a cluster and share access to the same disks assigned from the SAN (for example a 1TB LUN). With special commands we can then create smaller disks (logical units) from this pool and assign them to AIX LPARs as usual vscsi disks. With this technique we have more control over the storage of each AIX LPAR, and we depend less on the storage team. (As the VIO servers share the same storage resources, LPM is possible by default.)
Only Virtual I/O Servers can be part of an SSP cluster, which is based on Cluster Aware AIX (CAA) and RSCT technology. The Virtual I/O Servers in the cluster communicate with each other over Ethernet. They share the repository disk and the disks of the storage pool through the SAN. (The pool data is kept in a cluster filesystem developed specifically for storage virtualization, which is located at /var/vio/SSP/bb_cluster/D_E_F_A_U_L_T_061310.)
Maximum cluster size by VIOS version:
VIOS version 2.2.0.11, Fix Pack 24, Service Pack 1 <--1 node
VIOS version 2.2.1.3 <--4 nodes
VIOS version 2.2.2.0 <--16 nodes
VIOS version 2.2.5.0 <--24 nodes
On the Virtual I/O Server, the poold daemon handles group services and runs in user space. The vio_daemon daemon monitors the health of the cluster nodes and the pool, as well as the pool capacity:
poold - handles group services
vio_daemon - monitors the health of the cluster nodes and the pool
# lssrc -ls vio_daemon
Node ID: 2aab290eb80911e9801c98be9402197c
Log File: /home/ios/logs/viod.log
VKE Kernel Socket: 4
VKE Daemon Socket: 5
Bound to : /home/ios/socks/vioke_unix
API Socket: 8
Bound to : /home/ios/socks/api_eve_unix
Cluster Name: SSP_Cluster_1
Cluster ID: e9ebb214090711e8800298be946fc362
PNN NODE ID: 00000000000000000000000000000000
DBN NODE ID: dabfd57c1f1d11e9801698be9402197c
Pool Label: SSP_1
Pool VIO Name: D_E_F_A_U_L_T_061310
Pool ID: FFFFFFFFAC17583D000000005A75F1BF
Pool State: UP
Pool Sync Status: COMPLETED
Repository Cluster Mode: EVENT
Repository Disk State: UP
DBN Role: Other
PNN Role: Other
------------------------------------------------------------------------------
Logical Units
When using shared storage pools, the Virtual I/O Server provides storage to AIX LPARs through logical units. A logical unit is a file-backed storage device that resides in the cluster filesystem (/var/vio/SSP/bb_cluster/D_E_F_A_U_L_T_061310) of the shared storage pool. It appears as a virtual SCSI disk in the client partition.
The physical volumes in the shared storage pool are managed as an aggregation of physical blocks, and user data is stored in these blocks. After physical blocks are allocated to a logical unit to write actual data, they are not released from the logical unit until the logical unit is removed from the shared storage pool. Deleting files, file systems or logical volumes that reside on the virtual disk of a client partition does not increase the free space of the shared storage pool.
The system reserves a small amount of each physical volume in the shared storage pool to record meta-data.
------------------------------------------------------------------------------
Thin provisioning
A thin-provisioned device represents a larger disk space than the actual physical disk space. It is not fully backed by physical storage as long as the blocks are not in use. A thin-provisioned logical unit is defined with a user-specified size when it is created. It appears in the client partition as a virtual SCSI disk with that user-specified size. However, blocks on the physical disks in the shared storage pool are only allocated when they are used.
Consider a shared storage pool that has a size of 20 GB. If you create a logical unit with a size of 15 GB, the client partition will see a virtual disk with a size of 15 GB. But as long as the client partition does not write to the disk, only a small portion of that space will initially be used from the shared storage pool. If you create a second logical unit also with a size of 15 GB, the client partition will see two virtual SCSI disks, each with a size of 15 GB. So although the shared storage pool has only 20 GB of physical disk space, the client partition sees 30 GB of disk space in total.
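The overcommit arithmetic in this example can be sketched as a tiny shell function. overcommit_pct is a hypothetical helper for illustration, not a VIOS command; a result above 100 means the LUs together promise more space than the pool physically has.

```shell
#!/bin/sh
# Hypothetical helper, not a VIOS command: overcommit percentage of a pool.
# Arguments: pool size, then the size of each LU (all in the same unit).
# Prints the total LU size as a whole-number percentage of the pool size.
overcommit_pct() {
    pool=$1
    shift
    total=0
    for lu in "$@"; do
        total=$((total + lu))
    done
    echo $((total * 100 / pool))
}

# Example from the text: 20 GB pool, two 15 GB LUs
overcommit_pct 20 15 15    # prints 150
```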
After the client partition starts writing to the disks, physical blocks are allocated in the shared storage pool and the amount of free space in the pool decreases. Deleting files or logical volumes on a client partition does not increase the free space of the shared storage pool.
When the shared storage pool is full, client partitions see an I/O error on the virtual SCSI disk. Therefore, even though the client partition may report free space on a disk, that information might not be accurate if the shared storage pool is full.
To prevent such a situation, the shared storage pool provides a threshold that, if reached, writes an event in the errorlog of the Virtual I/O Server.
(If you use the -thick flag with the mkbdsp command, a regular (thick) disk is created instead of a thin-provisioned one, and all of its space is allocated to the client up front.)
------------------------------------------------------------------------------
Creating SSP
When a cluster is created, we must specify one physical volume for the repository disk and one for the usual storage pool data, which provides storage to the client partitions. The repository disk is used to perform cluster communication and store the cluster configuration.
If you need to increase the free space in the shared storage pool, you can either add another physical volume or replace an existing volume with a bigger one. Removing a physical volume fails if there is not enough free space in the shared storage pool to accommodate the data from the physical volume being removed.
Requirements:
-each VIO Server must correctly resolve the other VIO Servers in the cluster (DNS or /etc/hosts must contain all VIO Servers)
-the hostname command should show the FQDN (with domain.com)
-VLAN tagged interfaces are not supported for cluster communication in earlier VIOS versions
-the fibre channel SCSI devices (fscsiX) should be set to dyntrk=yes, fc_err_recov=fast_fail
-the disk reserve policy should be set to no_reserve, and all VIO Servers must have these disks in Available state
-1 disk is needed for the repository (min 10GB) and 1 or more for data (min 10GB) (these should be SAN FC LUNs)
-Active Memory Sharing paging space cannot be on an SSP disk
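Some of these requirements can be checked with a short script before running cluster -create. This is only a sketch, assuming it runs from the root shell (oem_setup_env) where the AIX lsattr command is available; is_fqdn and check_attr are hypothetical helper names, and the device names in the usage lines are examples. Depending on the multipath driver, the disk reservation attribute may be named differently, so check lsattr -El on your disks first.

```shell
#!/bin/sh
# Sketch of pre-flight checks before "cluster -create".
# Run from the root shell (oem_setup_env); device names are examples only.

# Succeeds if the name looks like an FQDN (contains at least one dot).
is_fqdn() {
    case "$1" in
        *.*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Compares one device attribute with the expected value and prints OK/FAIL.
# "lsattr -E -l DEV -a ATTR -F value" prints only the attribute's value.
check_attr() {
    actual=$(lsattr -E -l "$1" -a "$2" -F value 2>/dev/null)
    if [ "$actual" = "$3" ]; then
        echo "OK   $1 $2=$actual"
        return 0
    fi
    echo "FAIL $1 $2=${actual:-<unset>} (expected $3)"
    return 1
}

# Usage:
#   is_fqdn "$(hostname)" || echo "hostname is not an FQDN"
#   check_attr fscsi0 dyntrk yes
#   check_attr fscsi0 fc_err_recov fast_fail
#   check_attr hdisk2 reserve_policy no_reserve
```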
# cluster -create -clustername bb_cluster -spname bb_pool -repopvs hdiskpower1 -sppvs hdiskpower2 -hostname bb_vio1
-clustername bb_cluster <--name of the cluster
-spname bb_pool <--storage pool name
-repopvs hdiskpower1 <--repository disk
-sppvs hdiskpower2 <--storage pool disk
-hostname bb_vio1 <--hostname of the VIO Server where the cluster is created
(This command will create cluster, start CAA daemons and create shared storage pool)
(On HMC v9 it is possible to create SSP in GUI, which makes it very easy.)
------------------------------------------------------------------------------
General commands:
cluster ... <--for cluster create, add/remove node, list, status
lssp ... <--for the pool free space
lu ... <--for virtual disk create and control
pv ... <--for controlling the LUNs in the LUN pool
failgrp ... <--for creating the pool mirror
lscluster ... <--for high level view of the hdisk / LUN names
(The mkbdsp and rmbdsp commands are also available, but their syntax is complicated, so IBM later added simpler commands such as lu and pv.)
/var/vio/SSP cluster related directory (and files) will be created in this path
cluster -list display cluster name and ID
cluster -status -clustername bb_cluster display cluster state and pool state on each node
cluster -addnode -clustername bb_cluster -hostname bb_vio2 adding node to the cluster
cluster -rmnode -clustername bb_cluster -hostname bb_vios1 remove node from cluster
cluster -delete -clustername bb_cluster remove cluster completely
lssp -clustername bb_cluster list storage pool details (pool size, free space...)
lssp -clustername bb_cluster -sp bb_pool -bd list created LUNs in the storage pool (backing devices in lsmap -all)
lspv -clustername bb_cluster -sp bb_pool list physical volumes of shared storage pool (disk size, id)
lspv -clustername bb_cluster -capable list which disk can be added to the cluster
lsmap -clustername SSP_Cluster_1 -all list disk mappings to vscsi devices
storage commands:
pv -list
pv -add hdisk3 hdisk4 add 2 disks to ssp
pv -remove -pv hdisk4 remove a pv from ssp
(fails if there is not enough free space in the ssp to accommodate the data of the pv)
pv -replace -clustername bb -sp mysp -oldpv hdisk1 -newpv hdisk2 replace hdisk1 with hdisk2 in ssp
lu -list list lu name, size, ID (lu commands are available since VIOS 2.2.3.1)
lu -list -verbose lists all details in stanza (thin, tier, snapshot…)
lu -list -fmt : -field LU_SIZE LU_USED_PERCENT LU_USED_SPACE LU_UNUSED_SPACE lists spec. fields (good for scripting)
lu -list -attr provisioned=true list lus that are mapped to an LPAR
lu -list -attr provisioned=false list lus that are not mapped to an LPAR
lu -resize -lu <disk> -size 128G change the size of the logical unit
lu -create -lu vdisk1 -size 10G create a lun with 10GB
lu -remove -lu vdisk1 remove a lun (newer command than rmbdsp)
lu -map -lu vdisk1 -vadapter vhost1 maps a lun to a vscsi adapter
lu -unmap ... unmaps a lun from a vhost adapter
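The -fmt option of lu -list is handy for scripting. The sketch below flags LUs above a usage limit; it assumes (not verified here) that lu -list -fmt : -field LU_NAME LU_USED_PERCENT prints one LU per line with the two fields separated by ":". lus_above is a hypothetical helper name, and running it from the root shell through /usr/ios/cli/ioscli is an assumption about where awk and pipes are available.

```shell
#!/bin/sh
# Hypothetical helper: print LU names whose used percentage is >= a limit.
# Assumed input: one LU per line, "NAME:PERCENT", as (assumption) produced by
#   lu -list -fmt : -field LU_NAME LU_USED_PERCENT
# awk converts "78%" to 78; a non-numeric header line converts to 0,
# so it is skipped for any limit above 0.
lus_above() {
    awk -F: -v lim="$1" '$2 + 0 >= lim + 0 { print $1 }'
}

# Usage from the root shell (padmin commands via ioscli):
#   /usr/ios/cli/ioscli lu -list -fmt : -field LU_NAME LU_USED_PERCENT | lus_above 80
```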
failgrp commands: (A failure group (a mirror) is a set of physical volumes that are treated as a single point of failure by the system.)
failgrp -create ... create failure group
failgrp -remove ... remove failure group
failgrp -list (-verbose) ... list failure group
failgrp -modify ... modify failure group
CAA commands:
lscluster -c list cluster configuration
lscluster -d list disk details of the cluster
lscluster -m list info about nodes (interfaces) of the cluster
lscluster -s list network statistics of the local node (packets sent...)
lscluster -i -n bb_cluster list interface information of the cluster
odmget -q "name=hdiskpower2 and attribute=unique_id" CuAt checking LUN ID (as root)
cleandisk -r hdiskX clean cluster repository disk signature from hdisk
cleandisk -s hdiskX clean storage pool signature (after this disk can be added to SSP)
chrepos -n globular -r +hdisk16 move/rebuild repo. disk on hdisk16 (repo. issues won't bring down SSP)
chrepos -n -r +hdisk5 -hdisk1 replace repository disk (hdisk1 is replaced with new hdisk5)
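When scripting LUN ID checks, the value can be pulled out of the odmget stanza instead of reading it by eye. A minimal sketch; odm_value is a hypothetical helper, and it relies only on the usual odmget output layout of attribute = "value" lines.

```shell
#!/bin/sh
# Hypothetical helper: extract the value field from odmget CuAt output, e.g.
#   odmget -q "name=hdiskpower2 and attribute=unique_id" CuAt | odm_value
# odmget prints stanzas with lines like:  value = "..."
odm_value() {
    awk '$1 == "value" {
        v = $0
        sub(/^[^"]*"/, "", v)   # drop everything up to the opening quote
        sub(/".*$/, "", v)      # drop the closing quote and anything after it
        print v
    }'
}
```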
-----------------------
storage commands (old):
mkbdsp -clustername bb_cluster -sp bb_pool 10G -bd bb_disk2 creating a 10G LUN
mkbdsp -clustername bb_cluster -sp bb_pool -bd bb_disk2 -vadapter vhost0 assigning LUN to a vhost adapter (lsmap will show)
mkbdsp -clustername bb_cluster -sp bb_pool -luudid c7ef7a2 -vadapter vhost0 same as above just with LUN ID
rmbdsp -clustername bb_cluster -sp bb_pool -bd bb_disk1 remove LUN (backing device will be deleted from vhost)
chsp -add -clustername bb_cluster -sp bb_pool hdiskpower2 adding disk to a shared storage pool
-----------------------
------------------------------------------------------------------------------
Create cluster and Shared Storage Pool:
1. create a cluster and pool: cluster -create ...
2. adding additional nodes to the cluster: cluster -addnode
3. checking which physical volumes can be added: lspv -clustername clusterX -capable
4. adding physical volumes: chsp -add
5. creating and mapping LUNs to clients: mkbdsp -clustername...
------------------------------------------------------------------------------
Add new LUN to SSP and PowerVC (via HMC)
1. request a new LUN from the storage team, assigned to all VIO servers
2. cfgmgr on VIO servers (you should see new disks)
3. on HMC: Shared Storage Pool --> click on our SSP --> on the new page click on the check button --> Action --> Add capacity
------------------------------------------------------------------------------
Replacing disks
Replacing a pv
The chsp command replaces the original disk (oldpv) with the new disk (newpv). The command can be run from any node in the cluster. (Before replacing a disk, make sure the new disk is available on all nodes. If the lspv output is the same on all nodes, the disk replacement happens automatically on all nodes.)
replacing oldpv/hdisk6 with newpv/hdisk5:
$ chsp -replace -clustername mycluster -sp mysp -oldpv hdisk6 -newpv hdisk5
Current request action progress: % 5
Current request action progress: % 11
Current request action progress: % 99
Current request action progress: % 100
Current request action progress: % 100
same as above just using newer pv command:
$ pv -replace -clustername mycluster -sp mysp -oldpv hdisk6 -newpv hdisk5
If the lspv output is the same on all nodes, the SSP disk replacement is automatically propagated to all nodes in the cluster. At this point, the oldpv is released and can be removed. This procedure can be done while SSP storage is provisioned to an AIX client. In the above example, an AIX LPAR used the disk for a user volume group with a file system on which I/O was running, and the AIX client errlog showed no new entries during or after the SSP disk replacement.
Replacing repository disk
The replace operation works on a functional or failed repository disk. When the repository disk fails, the cluster remains operational. While the repository disk is in a failed state, all requests for cluster configuration fail. After you replace the failed disk, the cluster is fully functional again.
the hdisk1 repository disk is replaced with the hdisk5 repository disk:
$ chrepos -n -r +hdisk5 -hdisk1
------------------------------------------------------------------------------
Removing LUN:
$ lu -list -attr provisioned=false
POOL_NAME: SSP_1
TIER_NAME: System
LU_NAME SIZE(MB) UNUSED(MB) UDID
bb-aix-110-NovaLink-d1 51200 0 7d2750b08a772e1dbeb9adbcdc98f76c
volume-bb-aix-pci61m-1~ 153600 0 505efe7dd1b204bf112563d820a44df2
volume-my-openstack--2~ 153600 0 590de5e7109ce9087e2c521904466241
$ lu -remove -clustername SSP_Cluster_1 -lu bb-aix-110-NovaLink-d1
Logical unit bb-aix-110-NovaLink-d1 with udid "7d2750b08a772e1dbeb9adbcdc98f76c" is removed.
$ lu -remove -clustername SSP_Cluster_1 -luudid 590de5e7109ce9087e2c521904466241
Logical unit with udid "590de5e7109ce9087e2c521904466241" is removed.
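Because lu -list truncates long LU names (note the ~ at the end of some names above), removing by UDID is the safer option. The sketch below turns the provisioned=false listing into lu -remove commands without running them, so they can be reviewed first. gen_lu_remove is a hypothetical helper, and it assumes data lines have exactly four columns ending in a 32-character UDID, as in the example output.

```shell
#!/bin/sh
# Hypothetical helper: turn "lu -list -attr provisioned=false" output (stdin)
# into lu -remove commands, one per unmapped LU. Nothing is executed.
# Argument: cluster name.
gen_lu_remove() {
    awk -v cl="$1" '
        # data lines: 4 columns, the last one a 32-character hex UDID
        NF == 4 && $4 ~ /^[0-9a-fA-F]+$/ && length($4) == 32 {
            printf "lu -remove -clustername %s -luudid %s    # %s\n", cl, $4, $1
        }'
}

# Usage (review the output before running any of the commands):
#   lu -list -attr provisioned=false | gen_lu_remove SSP_Cluster_1
```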
------------------------------------------------------------------------------
Managing snapshots:
Snapshots of an LU can be created, and the LU can later be rolled back to a snapshot in case of problems:
# snapshot -create bb_disk1_snap -clustername bb_cluster -spname bb_pool -lu bb_disk1 <--create a snapshot
# snapshot -list -clustername bb_cluster -spname bb_pool <--list snapshots of a storage pool
Lu Name Size(mb) ProvisionType Lu Udid
bb_disk1 10240 THIN 4aafb883c949d36a7ac148debc6d4ee7
Snapshot
bb_disk1_snap
# snapshot -rollback bb_disk1_snap -clustername bb_cluster -spname bb_pool -lu bb_disk1 <--rollback a snapshot to a LUN
$ snapshot -delete bb_disk1_snap -clustername bb_cluster -spname bb_pool -lu bb_disk1 <--delete a snapshot
------------------------------------------------------------------------------
Checking if an LU is a clone from other LU (or not)
If the LU_UDID_DERIVED_FROM field has a value, this LU is a clone of the LU with that LU_UDID; "N/A" means it is not a clone.
$ lu -list -verbose | grep -p 775f0715fe27ef06914cccb9194607a0
...
LU_PROVISION_TYPE:THIN
LU_UDID_DERIVED_FROM:775f0715fe27ef06914cccb9194607a0 <-- clone of the below
LU_MOVE_STATUS:N/A
LU_SNAPSHOTS:N/A
...
LU_PROVISION_TYPE:THIN
LU_UDID_DERIVED_FROM:N/A <-- master copy
LU_MOVE_STATUS:N/A
LU_SNAPSHOTS:0d932572c08ca5523a43c64a209d7832IMSnap
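To check many LUs at once, the same field can be processed with awk. This sketch assumes that each stanza of lu -list -verbose contains an LU_NAME: line before its LU_UDID_DERIVED_FROM: line (the stanzas above are shortened with ..., so this is an assumption); lu_clone_report is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: classify LUs from "lu -list -verbose" output (stdin).
# Assumes every LU stanza has an LU_NAME:<name> line before its
# LU_UDID_DERIVED_FROM:<udid or N/A> line.
lu_clone_report() {
    awk -F: '
        { sub(/^[ \t]+/, "") }   # strip indentation; awk re-splits the fields
        $1 == "LU_NAME"              { name = $2 }
        $1 == "LU_UDID_DERIVED_FROM" {
            if ($2 == "N/A") print name, "master"
            else             print name, "clone-of", $2
        }'
}

# Usage:
#   lu -list -verbose | lu_clone_report
```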
----------------------------------------------------------------------------
Migrating an LPAR to PowerVC on SSP (with "dd")
We had to move an old Linux LPAR from an old Power Server with vSCSI to a new Power Server with SSP. This new Power server was managed by PowerVC. The old LPAR was using a vSCSI LUN (as a VIO client), and we copied this LUN with "dd" into the SSP.
1. # shutdown -h now <--stop the Linux server we want to migrate (to avoid any io errors)
2. # dd if=/dev/rh5.d1 of=/home/rh5.d1.dd bs=1M <--on the VIO server, save the disk of the Linux server with dd
(as root on the VIO, check ulimit and free space)
3. create a new VM (rh5_new) in PowerVC with an empty boot disk <--boot disk should have exactly the same size as the original LPAR
4. # mount nim01:/migrate /mnt <--copy or nfs mount the dd file to the VIO server which has the SSP
5. # ls -ltr /var/vio/SSP/SSP_Cluster_1/D_E_F_A_U_L_T_061310/VOL1 <--find the LUN on VIOS with SSP where we need to do the dd again
(all SSP data is under this directory)
--w------- 1 root system 253 Apr 24 12:46 .volume-rh5_new.d1-936e9092-fb3c.7e6745bce57e4b5c01452e91f0322feb
-rwx------ 1 root system 75161927680 Apr 24 12:56 volume-rh5_new.d1-936e9092-fb3c.7e6745bce57e4b5c01452e91f0322feb
-rwx------ 1 root system 75161927680 Apr 24 12:56 volume-rh5_new-692aa120-0000007a-boot-0-565121dd-87d3.8fa2e2c8b1c6ca63f059a86f32389544
--w------- 1 root system 327 Apr 24 12:57 .volume-rh5_new-692aa120-0000007a-boot-0-565121dd-87d3.8fa2e2c8b1c6ca63f059a86f32389544
(the files starting with "." are not important. Of the two large files we need only the one containing "boot": it belongs to the RedHat VM, the other one is just the volume used by the general image)
6. dd to this file:
# dd if=/mnt/rh5.d1.dd of=/var/vio/SSP/SSP_Cluster_1/D_E_F_A_U_L_T_061310/VOL1/volume-rh5_new-692aa120-0000007a-boot-0-565121dd-87d3.8fa2e2c8b1c6ca63f059a86f32389544 bs=1M
71680+0 records in.
71680+0 records out.
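The sizes in this example line up: 71680 records of 1 MiB is 75161927680 bytes (70 GiB), exactly the size of the volume files in the ls listing, which is why step 3 insists on the same disk size. A small sketch for checking this; records_to_bytes and file_size are hypothetical helpers, and IMAGE and LU_FILE in the usage comment are placeholder variables.

```shell
#!/bin/sh
# Sketch of a size sanity check around the dd copy (bs=1M).
# records_to_bytes: "NNN+0 records" count -> bytes (1M = 1048576 bytes).
# awk is used because the product can exceed 32-bit shell arithmetic.
records_to_bytes() {
    awk -v n="$1" 'BEGIN { printf "%.0f\n", n * 1048576 }'
}

# file_size: size in bytes of a regular file (5th column of ls -l).
file_size() {
    ls -l "$1" | awk '{ print $5 }'
}

# 71680 records of 1M, as in the dd output above
records_to_bytes 71680    # prints 75161927680

# Before the dd, compare the image with the target LU file (placeholders):
#   [ "$(file_size "$IMAGE")" = "$(file_size "$LU_FILE")" ] || echo "size mismatch"
```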
After that the VM can be started. We need to go to the SMS menu to manually choose the boot disk:
5. Select Boot Options
2. Configure Boot Device Order
1. Select 1st Boot Device
6. List All Devices
2. - SCSI 69 GB Harddisk, part=1 ()
2. Set Boot Sequence: Configure as 1st Boot Device
After that during boot, the boot device was found, but we had an error: can't allocate kernel memory
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
/
Elapsed time since release of system processors: 121069 mins 33 secs
Config file read, 1024 bytes
Welcome
Welcome to yaboot version 1.3.13 (Red Hat 1.3.13-14.el5)
Enter "help" to get some basic usage information
boot: linux
Please wait, loading kernel...
Claim error, can't allocate 900000 at 0xc00000
Claim error, can't allocate kernel memory
boot:
The solution was to change the real-base address from c00000 to 2000000.
To do this, reboot again, go to the Open Firmware prompt and do the following steps:
8 = Open Firmware Prompt
Memory Keyboard Network Speaker ok
0 > printenv real-base
-------------- Partition: common -------- Signature: 0x70 ---------------
real-base c00000 c00000
ok
0 > setenv real-base 2000000 ok
0 > printenv real-base
-------------- Partition: common -------- Signature: 0x70 ---------------
real-base 2000000 c00000
ok
0 > reset-all
After that reboot was successful and old Linux server was running nicely on the new hardware with SSP.
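Why moving real-base helps can be illustrated with some hex arithmetic: yaboot wanted 0x900000 bytes (9 MiB) at 0xc00000 (12 MiB), so the claim ends at 0x1500000 (21 MiB). With real-base at 0xc00000 that range collides with it, while 0x2000000 (32 MiB) is above it. The sketch below is a simplified model, assuming the firmware area starts at real-base and the claim must end below it; the real Open Firmware memory layout is more involved. hex2dec and claim_fits are hypothetical helpers.

```shell
#!/bin/sh
# Simplified model (assumption): the kernel claim at ADDR of SIZE bytes
# must end at or below real-base. Values are hex, with or without 0x,
# as printed by yaboot ("900000 at 0xc00000") and printenv ("c00000").
hex2dec() {
    case "$1" in
        0x*|0X*) printf '%d\n' "$1" ;;
        *)       printf '%d\n' "0x$1" ;;
    esac
}

claim_fits() {
    end=$(( $(hex2dec "$1") + $(hex2dec "$2") ))
    [ "$end" -le "$(hex2dec "$3")" ]
}

claim_fits 0xc00000 900000 c00000  || echo "claim overlaps real-base c00000"
claim_fits 0xc00000 900000 2000000 && echo "claim fits below real-base 2000000"
```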
----------------------------------------------------------------------------
Setting alerts for Shared Storage Pools:
As thin provisioning is in place, the real free space of the storage is not visible from the clients. If the storage pool gets 100% full, I/O errors occur on the client LPARs. To avoid this, alerts can be configured:
$ alert -list -clustername bb_cluster -spname bb_pool
PoolName PoolID Threshold%
bb_pool 000000000A8C1517000000005150C18D 35 <--it shows the free percentage
# alert -set -clustername bb_cluster -spname bb_pool -type threshold -value 25 <--if free space goes below 25% it will alert
# alert -list -clustername bb_cluster -spname bb_pool
PoolName PoolID Threshold%
bb_pool 000000000A8C1517000000005150C18D 25 <--new value can be seen here
$ alert -unset -clustername bb_cluster -spname bb_pool <--unset an alert
In the errlog the warning looks like this:
0FD4CF1A 0424082818 I O VIOD_POOL Informational Message
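The threshold logic can be sketched as follows, assuming (as the annotation above says) that the threshold is a free-space percentage. pool_free_check is a hypothetical helper; the pool size and free space could be taken from lssp -clustername output, assuming its POOL_SIZE and FREE_SPACE fields are in the same unit.

```shell
#!/bin/sh
# Hypothetical helper: compare pool free space with the alert threshold.
# Arguments: pool size, free space (same unit, e.g. MB), threshold in %.
# awk avoids overflowing 32-bit shell arithmetic on large pools.
pool_free_check() {
    pct=$(awk -v s="$1" -v f="$2" 'BEGIN { printf "%d\n", f * 100 / s }')
    if [ "$pct" -lt "$3" ]; then
        echo "ALERT: ${pct}% free, below threshold $3%"
        return 1
    fi
    echo "OK: ${pct}% free"
}

# Usage (sizes in MB, hypothetical values):
#   pool_free_check 102400 20480 25
```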
----------------------------------------
VIO + SSP Backups
https://www.ibm.com/developerworks/community/blogs/cgaix/entry/Automatically_backup_VIOS_configuration_changes?lang=en
on VIO, under the crontab of root:
0 * * * * /usr/ios/sbin/autoviosbr -start 1>/dev/null 2>/dev/null
Every hour this takes a node backup (VIOS configuration), and on the database node (DBN) of the SSP cluster also a cluster backup (SSP configuration).
The location is in home dir under cfgbackups (/home/padmin/cfgbackups)
Checking the status:
$ viosbr -autobackup status -type node
Node configuration changes:Complete. <--complete means: we have a backup already
pending means: there are changes which will be saved by the next hourly backup
viosbr -autobackup status -type cluster <--same for the cluster backup
Checking the SSP database node:
$ cluster -list
CLUSTER_NAME: SSP_Cluster_1
CLUSTER_ID: e9ebb214090711e8800298be946fc362
$ cluster -status -clustername SSP_Cluster_1 -verbose | grep -p DBN
Node Name: ls-aix-h01.mgmt.lab.dynatrace.org
Node Id: e9f36c52090711e8800298be946fc362
Node MTM: 8284-22A02783844X
Node Partition Num: 1
Node State: OK
Node Repos State: OK
Node Upgrade Status: 2.2.6.21 ON_LEVEL
Node Roles: DBN
Pool Name: SSP_1
Pool Id: FFFFFFFFAC17583D000000005A75F1BF
Pool State: OK
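grep -p is AIX-specific paragraph mode. A sketch that prints only the DBN node name, which is easier to use in scripts (for example to decide on which node to look for the cluster backup); find_dbn is a hypothetical helper, and it assumes the Node Name: and Node Roles: lines shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the node(s) holding the DBN role
# from "cluster -status -clustername ... -verbose" output on stdin.
find_dbn() {
    awk -F': *' '
        /^[ \t]*Node Name:/                { name = $2 }
        /^[ \t]*Node Roles:/ && $2 ~ /DBN/ { print name }'
}

# Usage:
#   cluster -status -clustername SSP_Cluster_1 -verbose | find_dbn
```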
----------------------------------------
Hello can you please update more on thin and thick provisioning please
I'd like to see a topic on troubleshooting a VIO cluster. I'm not yet convinced of the stability of this setup due to the network dependency. Why was this not designed with disk communication rather than ( or in conjunction with ) network like PowerHA.
ABEND FATAL AWK IN MUXPROC
Is there any command to display the usage of a virtual LUN over all VIOs ?
To see if a LUN is mapped in one VIO before giving it to another LPAR ?
Will it be possible to extend LUN ( thin) which is already presented to client machine in VSSP
How would I reliably relate a disk on the LPAR back to its storage pool backing device on the VIOS please?
For physical VSCSI assigned disks you can use the PVID but I am not sure how to do so with SSP allocated storage
Solution found :
On LPAR:
user/> lspv -u | grep hdisk3
hdisk3 00f68d5a61122458 datavg active 412173194C707447A2AB56DF8EB93F320FB2F103303 NVDISK03IBMvscsi f410292-f9e1-5a24-7503-34d6a282
user/>
(Removed excess spaces to make it easier to read)
Take the number in the 5th column and remove the first 5 and last 6 digits :
3194C707447A2AB56DF8EB93F320FB2F
On the VIOS :
padmin/> lssp -clustername MyClusterName -sp MySSPName -bd | grep -i 3194C707447A2AB56DF8EB93F320FB2F
v3a_vhost18_dvg00 256000 THIN 78% 55804 3194c707447a2ab56df8eb93f320fb2f
padmin/>
The first column is the storage pool lun.
Note: the number is upper case on LPAR and lower case on VIOS.
This number is consistent across all VIOS in the storage pool so you can reliably identify the disk from anywhere in the storage pool.
Thanks a lot Michael for the solution :)
lspv -u | grep hdiskXXX | awk '{print tolower($0)}' | awk '{print $5}' | cut -c 6-37
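The same method written as a function. lpar_lu_udid is a hypothetical helper; it assumes the lspv -u layout shown above (the ID in field 5, with 5 leading and 6 trailing characters around the 32-character LU UDID) and matches the disk name exactly, so hdisk3 does not also match hdisk30.

```shell
#!/bin/sh
# Sketch of the method from the comments: on the client LPAR, take field 5
# of "lspv -u" for one disk, drop the first 5 and the last 6 characters
# and lowercase the rest (the VIOS shows the UDID in lower case).
lpar_lu_udid() {
    awk -v d="$1" '$1 == d { print tolower(substr($5, 6, length($5) - 11)) }'
}

# Usage on the client LPAR, then look the result up on the VIOS:
#   lspv -u | lpar_lu_udid hdisk3
#   lssp -clustername MyClusterName -sp MySSPName -bd | grep -i <udid>
```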