HA - SSP

Shared Storage Pools (SSP):

Shared Storage Pool is a storage virtualization technique. The concept is that VIO servers form a cluster and own the same disks assigned from the SAN (for example a 1TB LUN). With special commands we can then create smaller disks (logical units), which we can assign to AIX LPARs as usual vscsi disks. With this technique we have more control over the storage of each AIX LPAR, and we have less dependency on the storage team. (As the VIO servers share the same storage resources, LPM is possible by default.)

Only Virtual I/O Servers can be part of the SSP cluster, which is based on Cluster Aware AIX (CAA) and RSCT technology. The Virtual I/O Servers in the cluster communicate with each other using Ethernet connections. They share the repository disk and the disks for the storage pool through the SAN. (The repository is contained in a cluster filesystem that has been developed specifically for the purpose of storage virtualization, which is located at /var/vio/SSP/bb_cluster/D_E_F_A_U_L_T_061310)



Maximum number of nodes in a cluster, by VIOS version:
VIOS version 2.2.0.11, Fix Pack 24, Service Pack 1         <--1 node
VIOS version 2.2.1.3                                       <--4 nodes
VIOS version 2.2.2.0                                       <--16 nodes
VIOS version 2.2.5.0                                       <--24 nodes

On the Virtual I/O Server, the poold daemon handles group services and runs in user space. The vio_daemon daemon is responsible for monitoring the health of the cluster nodes and the pool, as well as the pool capacity:
poold      - handles group services
vio_daemon - monitors the health of the cluster nodes and the pool

# lssrc -ls vio_daemon
Node ID:                      2aab290eb80911e9801c98be9402197c
Log File:                     /home/ios/logs/viod.log
VKE Kernel Socket:            4
VKE Daemon Socket:            5
Bound to :                    /home/ios/socks/vioke_unix
API Socket:                   8
Bound to :                    /home/ios/socks/api_eve_unix
Cluster Name:                 SSP_Cluster_1
Cluster ID:                   e9ebb214090711e8800298be946fc362
PNN NODE ID:                  00000000000000000000000000000000
DBN NODE ID:                  dabfd57c1f1d11e9801698be9402197c
Pool Label:                   SSP_1
Pool VIO Name:                D_E_F_A_U_L_T_061310
Pool ID:                      FFFFFFFFAC17583D000000005A75F1BF
Pool State:                   UP
Pool Sync Status:             COMPLETED
Repository Cluster Mode:      EVENT
Repository Disk State:        UP
DBN Role:                     Other
PNN Role:                     Other


------------------------------------------------------------------------------

Logical Units

When using shared storage pools, the Virtual I/O Server provides storage to AIX LPARs through logical units. A logical unit is a file backed storage device that resides in the cluster filesystem (/var/vio/SSP/bb_cluster/D_E_F_A_U_L_T_061310) in the shared storage pool. It appears as a virtual SCSI disk in the client partition.

The physical volumes in the shared storage pool are managed as an aggregation of physical blocks, and user data is stored in these blocks. After the physical blocks are allocated to a logical unit to write actual data, the physical blocks are not released from the logical unit until the logical unit is removed from the shared storage pool. Deleting files, file systems or logical volumes that reside on the virtual disk of a client partition does not increase the free space of the shared storage pool.

The system reserves a small amount of each physical volume in the shared storage pool to record meta-data.


------------------------------------------------------------------------------

Thin provisioning

A thin-provisioned device represents a larger disk space than the actual physical disk space. It is not fully backed by physical storage as long as the blocks are not in use. A thin-provisioned logical unit is defined with a user-specified size when it is created. It appears in the client partition as a virtual SCSI disk with that user-specified size. However blocks on the physical disks in the shared storage pool are only allocated when they are used. 

Consider a shared storage pool that has a size of 20 GB. If you create a logical unit with a size of 15 GB, the client partition will see a virtual disk with a size of 15 GB. But as long as the client partition does not write to the disk, only a small portion of that space will initially be used from the shared storage pool. If you create a second logical unit also with a size of 15 GB, the client partition will see two virtual SCSI disks, each with a size of 15 GB. So although the shared storage pool has only 20 GB of physical disk space, the client partition sees 30 GB of disk space in total. 
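
The arithmetic of this example can be sketched in a few lines of shell. The function name overcommit_pct is hypothetical, and the sizes are simply the numbers from the paragraph above; nothing here queries a real pool.

```shell
#!/bin/sh
# Hypothetical helper: provisioned (virtual) space as a percentage of the
# physical pool size. Values above 100 mean the pool is overcommitted.
overcommit_pct() {
    pool_gb=$1
    shift
    total=0
    for lu_gb in "$@"; do
        total=$((total + lu_gb))
    done
    echo $((total * 100 / pool_gb))
}

# 20 GB pool, two thin logical units of 15 GB each
overcommit_pct 20 15 15
```

For the 20 GB pool with two 15 GB logical units this prints 150, meaning the clients see 1.5 times the physical capacity.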

After the client partition starts writing to the disks, physical blocks will be allocated in the shared storage pool and the amount of free space in the shared storage pool will decrease. Deleting files or logical volumes on a client partition does not increase the free space of the shared storage pool.

When the shared storage pool is full, client partitions will see an I/O error on the virtual SCSI disk. Therefore even though the client partition will report free space to be available on a disk, that information might not be accurate if the shared storage pool is full. 

To prevent such a situation, the shared storage pool provides a threshold that, if reached, writes an event in the errorlog of the Virtual I/O Server.

(If you use the -thick flag with the mkbdsp command, a usual (thick) disk is created instead of a thin-provisioned one, and the client gets all of the disk space.)

------------------------------------------------------------------------------

Creating SSP

When a cluster is created, we must specify one physical volume for the repository disk and one for the usual storage pool data, which provides storage to the client partitions. The repository disk is used to perform cluster communication and store the cluster configuration.

If you need to increase the free space in the shared storage pool, you can either add an additional physical volume or you can replace an existing volume with a bigger one. Removing a physical volume will fail if there is not enough free space in the shared storage pool to accommodate the data from the physical volume that is removed.


Requirements:
-each VIO Server must correctly resolve the other VIO Servers in the cluster (DNS or /etc/hosts must contain all VIO Servers)
-hostname command should show the FQDN (with domain.com)
-VLAN tagging interfaces are not supported in earlier VIO versions for cluster communications
-fibre channel adapters should be set to dyntrk=yes, fc_err_recov=fast_fail
-disk reserve policy should be set to no_reserve, and all VIO Servers must have these disks in Available state
-1 disk is needed for the repository (min 10GB) and 1 or more for data (min 10GB) (these should be SAN FC LUNs)
-Active Memory Sharing paging space cannot be on an SSP disk
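
The adapter and disk attribute requirements can be checked with a small filter. This is only a sketch: check_ssp_attrs is a hypothetical name, and the input is assumed to look like lsattr -El output ("attribute value description ..."), which is run as root rather than as padmin. The sample lines are illustrative, not taken from a real system.

```shell
#!/bin/sh
# Reads "attribute value ..." lines (the shape of lsattr -El output) on stdin
# and reports attributes that do not have the value listed in the requirements.
# Exit status is 1 if at least one attribute is wrong, 0 otherwise.
check_ssp_attrs() {
    awk '
        BEGIN {
            want["dyntrk"] = "yes"
            want["fc_err_recov"] = "fast_fail"
            want["reserve_policy"] = "no_reserve"
        }
        ($1 in want) && ($2 != want[$1]) {
            printf "%s is %s, expected %s\n", $1, $2, want[$1]
            bad = 1
        }
        END { exit bad }
    '
}

# Illustrative input; on a real VIOS (as root) it could come from e.g.
#   lsattr -El fscsi0 -a dyntrk -a fc_err_recov
#   lsattr -El hdisk2 -a reserve_policy
check_ssp_attrs <<'EOF' || echo "some attributes need to be changed"
dyntrk         yes          Dynamic Tracking of FC Devices        True
fc_err_recov   delayed_fail FC Fabric Event Error RECOVERY Policy True
reserve_policy no_reserve   Reserve Policy                        True
EOF
```

Keeping the check as a stdin filter means the same function can be fed output collected from every VIO Server in the cluster.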


# cluster -create -clustername bb_cluster -spname bb_pool -repopvs hdiskpower1 -sppvs hdiskpower2 -hostname bb_vio1
        -clustername bb_cluster                                                  <--name of the cluster
        -spname bb_pool                                                          <--storage pool name
        -repopvs hdiskpower1                                                     <--disk of repository
        -sppvs hdiskpower2                                                       <--storage pool disk
        -hostname bb_vio1                                                        <--VIO Server hostname (where to create cluster)

(This command will create cluster, start CAA daemons and create shared storage pool)
(On HMC v9 it is possible to create SSP in GUI, which makes it very easy.)


------------------------------------------------------------------------------

General commands:

cluster ...        <--for cluster create, add/remove node, list, status
lssp ...           <--for the pool free space
lu ...             <--for virtual disk create and control
pv ...             <--for controlling the LUNs in the LUN pool
failgrp ...        <--for creating the pool mirror
lscluster ...      <--for high level view of the hdisk / LUN names
(mkbdsp, rmbdsp commands are available but they have complicated syntax, so IBM later created new commands for easier usage, like lu and pv)

/var/vio/SSP                                                       cluster related directory (and files) will be created in this path

cluster -list                                                      display cluster name and ID
cluster -status -clustername bb_cluster                            display cluster state and pool state on each node
cluster -addnode -clustername bb_cluster -hostname bb_vio2         adding node to the cluster
cluster -rmnode -clustername bb_cluster -hostname bb_vio1          remove node from cluster
cluster -delete -clustername bb_cluster                            remove cluster completely

lssp -clustername bb_cluster                                       list storage pool details (pool size, free space...)
lssp -clustername bb_cluster -sp bb_pool -bd                       list created LUNs in the storage pool (backing devices in lsmap -all)

lspv -clustername bb_cluster -sp bb_pool                           list physical volumes of shared storage pool (disk size, id)
lspv -clustername bb_cluster -capable                              list which disk can be added to the cluster
lsmap -clustername SSP_Cluster_1 -all                              list disk mappings to vscsi devices

                                                                                 
storage commands:
pv -list
pv -add hdisk3 hdisk4                                              add 2 disks to ssp
pv -remove -pv hdisk4                                              remove a pv from ssp 
                                                                   (fails if not enough free space in ssp to accommodate data of the pv)
pv -replace -clustername bb -sp mysp -oldpv hdisk1 -newpv hdisk2   replace hdisk1 with hdisk2 in ssp

lu -list                                                           list lu name, size, ID (lu commands are since VIOS 2.2.3.1)
lu -list -verbose                                                  lists all details in stanza (thin, tier, snapshot…)
lu -list -fmt : -field LU_SIZE LU_USED_PERCENT LU_USED_SPACE LU_UNUSED_SPACE   lists spec. fields (good for scripting)
lu -list -attr provisioned=true                                    list lus that are mapped to an LPAR
lu -list -attr provisioned=false                                   list lus that are not mapped to an LPAR
lu -resize -lu <disk> -size 128G                                   change the size of the logical unit
lu -create -lu vdisk1 -size 10G                                    create a lun with 10GB
lu -remove -lu vdisk1                                              remove a lun (this is a newer command than rmbdsp)
lu -map -lu vdisk1 -vadapter vhost1                                maps a lun to a vscsi adapter
lu -unmap ...                                                      unmaps a lun from a vhost adapter
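
The -fmt/-field form of lu -list lends itself to scripting. The sketch below assumes (this was not verified against a live VIOS) that each logical unit is printed as one line with the requested fields joined by the delimiter, here NAME:SIZE_MB:USED_PERCENT. The function name lus_over and the sample data are made up for illustration.

```shell
#!/bin/sh
# Reads lines of the assumed form NAME:SIZE_MB:USED_PERCENT on stdin and
# prints the logical units whose used percentage is at or above the limit.
lus_over() {
    limit=$1
    awk -F: -v limit="$limit" '$3 + 0 >= limit { print $1, $3 "%" }'
}

# Illustrative data only; on a VIOS the input might come from
#   lu -list -fmt : -field LU_NAME LU_SIZE LU_USED_PERCENT
lus_over 75 <<'EOF'
vdisk1:10240:12
vdisk2:131072:78
EOF
```

With the sample input only vdisk2 is reported, because it is the only logical unit at or above the 75% limit.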


failgrp commands: (A failure group (a mirror) is a set of physical volumes that are treated as a single point of failure by the system.)
failgrp -create ...                                                create failure group
failgrp -remove ...                                                remove failure group
failgrp -list (-verbose) ...                                       list failure group
failgrp -modify ...                                                modify failure group


CAA commands:
lscluster -c                                                       list cluster configuration
lscluster -d                                                       list disk details of the cluster
lscluster -m                                                       list info about nodes (interfaces) of the cluster
lscluster -s                                                       list network statistics of the local node (packets sent...)
lscluster -i -n bb_cluster                                         list interface information of the cluster

odmget -q "name=hdiskpower2 and attribute=unique_id" CuAt          checking LUN ID (as root)

cleandisk -r hdiskX                                                clean cluster repository disk signature from hdisk
cleandisk -s hdiskX                                                clean storage pool signature (after this disk can be added to SSP)
chrepos -n globular -r +hdisk16                                    move/rebuild repo. disk on hdisk16 (repo. issues won't bring down SSP)
chrepos -n -r +hdisk5 -hdisk1                                      replace repository disk (hdisk1 is replaced with new hdisk5)

-----------------------
storage commands (old):
mkbdsp -clustername bb_cluster -sp bb_pool 10G -bd bb_disk2                      creating a 10G LUN
mkbdsp -clustername bb_cluster -sp bb_pool -bd bb_disk2 -vadapter vhost0         assigning LUN to a vhost adapter (lsmap will show)
mkbdsp -clustername bb_cluster -sp bb_pool -luudid c7ef7a2 -vadapter vhost0      same as above just with LUN ID
rmbdsp -clustername bb_cluster -sp bb_pool -bd bb_disk1                          remove LUN (backing device will be deleted from vhost)
chsp -add -clustername bb_cluster -sp bb_pool hdiskpower2                        adding disk to a shared storage pool
-----------------------

------------------------------------------------------------------------------

Create cluster and Shared Storage Pool:

1. create a cluster and pool: cluster -create ...
2. adding additional nodes to the cluster: cluster -addnode
3. checking which physical volume can be added: lspv -clustername clusterX -capable
4. adding physical volume: chsp -add
5. create and map LUNs to clients: mkbdsp -clustername...
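
The five steps can be strung together as a dry-run script. This is a sketch, not a tested procedure: the run and create_ssp helpers are hypothetical, the cluster, pool, host and disk names are the example names used on this page, and hdiskpower3 is a made-up additional disk. By default the script only prints the commands.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 on a
# VIOS, as padmin, to really execute them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

create_ssp() {
    # 1. create the cluster and the pool on the first node
    run cluster -create -clustername bb_cluster -spname bb_pool \
        -repopvs hdiskpower1 -sppvs hdiskpower2 -hostname bb_vio1
    # 2. add a second node
    run cluster -addnode -clustername bb_cluster -hostname bb_vio2
    # 3. list the disks that could be added to the pool
    run lspv -clustername bb_cluster -capable
    # 4. add one more physical volume (hdiskpower3 is a made-up name)
    run chsp -add -clustername bb_cluster -sp bb_pool hdiskpower3
    # 5. create a 10G logical unit, then map it to a vhost adapter
    run mkbdsp -clustername bb_cluster -sp bb_pool 10G -bd bb_disk2
    run mkbdsp -clustername bb_cluster -sp bb_pool -bd bb_disk2 -vadapter vhost0
}

create_ssp
```

Printing the commands first makes it easy to review disk and host names before anything is changed on the cluster.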

------------------------------------------------------------------------------

Add new LUN to SSP and PowerVC (via HMC)

1. request new LUN from Storage team to all VIO servers
2. cfgmgr on VIO servers (you should see new disks)
3. on HMC: Shared Storage Pool --> click on our SSP --> on the new page click on the check button --> Action --> Add capacity

------------------------------------------------------------------------------

Replacing disks


Replacing a pv
The chsp command replaces the original disk (oldpv) with the new disk (newpv). The command can be run from any node in the cluster. (Before replacing a disk, make sure the disk is available on all nodes; if the lspv output is the same on all nodes, the disk replacement happens automatically on all nodes.)

replacing oldpv/hdisk6 with newpv/hdisk5:
$ chsp -replace -clustername mycluster -sp mysp -oldpv hdisk6 -newpv hdisk5
Current request action progress: % 5
Current request action progress: % 11
Current request action progress: % 99
Current request action progress: % 100
Current request action progress: % 100

same as above just using newer pv command:
$ pv -replace -clustername mycluster -sp mysp -oldpv hdisk6 -newpv hdisk5

If the lspv output is the same on all nodes, the SSP disk replacement is automatically propagated to all nodes in the cluster. At this point, the oldpv is released and can be removed. This procedure can be done while SSP storage is provisioned to an AIX client. In the above example an AIX LPAR used the disk for a user volume group with a file system where file system I/O was happening. The AIX client errlog showed no new entries during and after the SSP disk replacement.


Replacing repository disk
The replace operation works on a functional or failed repository disk. When the repository disk fails, the cluster remains operational. While the repository disk is in a failed state, all requests for cluster configuration fail. After you replace the failed disk, the cluster will be fully functional again.

the hdisk1 repository disk is replaced with the hdisk5 repository disk:
$ chrepos -n -r +hdisk5 -hdisk1

------------------------------------------------------------------------------

Removing LUN:

$ lu -list -attr provisioned=false
POOL_NAME: SSP_1
TIER_NAME: System
LU_NAME                 SIZE(MB)    UNUSED(MB)  UDID
bb-aix-110-NovaLink-d1  51200       0           7d2750b08a772e1dbeb9adbcdc98f76c
volume-bb-aix-pci61m-1~ 153600      0           505efe7dd1b204bf112563d820a44df2
volume-my-openstack--2~ 153600      0           590de5e7109ce9087e2c521904466241

$ lu -remove -clustername SSP_Cluster_1 -lu bb-aix-110-NovaLink-d1
Logical unit bb-aix-110-NovaLink-d1 with udid "7d2750b08a772e1dbeb9adbcdc98f76c" is removed.

$ lu -remove -clustername SSP_Cluster_1 -luudid 590de5e7109ce9087e2c521904466241
Logical unit  with udid "590de5e7109ce9087e2c521904466241" is removed.
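
Some names in the listing above end with "~", which looks like the name was shortened in the output (an assumption based on the listing, not a documented fact), so removing by UDID is the less ambiguous route. The sketch below turns such a listing into lu -remove commands without executing them; remove_cmds is a hypothetical helper and the input is the sample output shown above.

```shell
#!/bin/sh
# Reads "lu -list" output on stdin and prints one "lu -remove" command per
# logical unit, addressed by UDID (the last column). Nothing is executed.
remove_cmds() {
    cluster=$1
    awk -v c="$cluster" '
        /^POOL_NAME:/ || /^TIER_NAME:/ || /^LU_NAME/ || NF == 0 { next }
        { print "lu -remove -clustername " c " -luudid " $NF }
    '
}

remove_cmds SSP_Cluster_1 <<'EOF'
POOL_NAME: SSP_1
TIER_NAME: System
LU_NAME                 SIZE(MB)    UNUSED(MB)  UDID
bb-aix-110-NovaLink-d1  51200       0           7d2750b08a772e1dbeb9adbcdc98f76c
volume-bb-aix-pci61m-1~ 153600      0           505efe7dd1b204bf112563d820a44df2
volume-my-openstack--2~ 153600      0           590de5e7109ce9087e2c521904466241
EOF
```

The generated lines should be reviewed before being pasted into a padmin session, since removing a logical unit cannot be undone.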

------------------------------------------------------------------------------

Managing snapshots:

Snapshots of a LUN can be created, and the LUN can later be rolled back to a snapshot in case of any problems.

# snapshot -create bb_disk1_snap -clustername bb_cluster -spname bb_pool -lu bb_disk1    <--create a snapshot

# snapshot -list -clustername bb_cluster -spname bb_pool                                 <--list snapshots of a storage pool
Lu Name          Size(mb)    ProvisionType    Lu Udid
bb_disk1         10240       THIN             4aafb883c949d36a7ac148debc6d4ee7
Snapshot
bb_disk1_snap

# snapshot -rollback bb_disk1_snap -clustername bb_cluster -spname bb_pool -lu bb_disk1  <--rollback a snapshot to a LUN
$ snapshot -delete bb_disk1_snap -clustername bb_cluster -spname bb_pool -lu bb_disk1    <--delete a snapshot

------------------------------------------------------------------------------

Checking if an LU is a clone from other LU (or not)

The LU_UDID_DERIVED_FROM field shows whether an LU is a clone: if it has a value, the LU is a clone of the LU with that LU_UDID; "N/A" means it is not a clone.

$ lu -list -verbose | grep -p 775f0715fe27ef06914cccb9194607a0
...
LU_PROVISION_TYPE:THIN
LU_UDID_DERIVED_FROM:775f0715fe27ef06914cccb9194607a0         <-- clone of the below
LU_MOVE_STATUS:N/A
LU_SNAPSHOTS:N/A

...
LU_PROVISION_TYPE:THIN
LU_UDID_DERIVED_FROM:N/A                                      <-- master copy
LU_MOVE_STATUS:N/A
LU_SNAPSHOTS:0d932572c08ca5523a43c64a209d7832IMSnap

----------------------------------------------------------------------------

Migrating an LPAR to PowerVC on SSP (with "dd")

We had to move an old Linux LPAR from an old Power Server with vSCSI to a new Power Server with SSP. This new Power server was managed by PowerVC. The old LPAR was using a vSCSI LUN (as a VIO client), and we copied this LUN with "dd" into the SSP.

1. # shutdown -h now                                              <--stop the Linux server we want to migrate (to avoid any io errors)

2. # dd if=/dev/rh5.d1 of=/home/rh5.d1.dd bs=1M                   <--save with dd the disk of Linux server on VIO
                                                                  (as root on the VIO, check ulimit and free space)

3. create a new VM (rh5_new) in PowerVC with an empty boot disk   <--boot disk should have exactly the same size as the original LPAR

4. # mount nim01:/migrate /mnt                                    <--copy or nfs mount the dd file to the VIO server which has the SSP

5. # ls -ltr /var/vio/SSP/SSP_Cluster_1/D_E_F_A_U_L_T_061310/VOL1 <--find the LUN on VIOS with SSP where we need to do the dd again
                                                                  (all SSP data is under this directory)                   
--w-------    1 root     system          253 Apr 24 12:46 .volume-rh5_new.d1-936e9092-fb3c.7e6745bce57e4b5c01452e91f0322feb
-rwx------    1 root     system   75161927680 Apr 24 12:56 volume-rh5_new.d1-936e9092-fb3c.7e6745bce57e4b5c01452e91f0322feb
-rwx------    1 root     system   75161927680 Apr 24 12:56 volume-rh5_new-692aa120-0000007a-boot-0-565121dd-87d3.8fa2e2c8b1c6ca63f059a86f32389544
--w-------    1 root     system          327 Apr 24 12:57 .volume-rh5_new-692aa120-0000007a-boot-0-565121dd-87d3.8fa2e2c8b1c6ca63f059a86f32389544
(The files starting with . are not important. Of the two large files we need only the one whose name contains "boot"; it belongs to the RedHat VM. The other one is just the volume used in the general image.)

6. dd to this file:
# dd if=/mnt/rh5.d1.dd of=/var/vio/SSP/SSP_Cluster_1/D_E_F_A_U_L_T_061310/VOL1/volume-rh5_new-692aa120-0000007a-boot-0-565121dd-87d3.8fa2e2c8b1c6ca63f059a86f32389544 bs=1M
71680+0 records in.
71680+0 records out.
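
Because the target boot disk has to be exactly as large as the source, it is worth checking that the dd record count matches the file size seen in the ls output. A minimal sketch of that arithmetic (dd_bytes is a hypothetical helper; bs=1M is taken as 1048576 bytes):

```shell
#!/bin/sh
# Bytes written by dd = full records * block size (bs=1M is 1048576 bytes).
dd_bytes() {
    records=$1
    bs_bytes=${2:-1048576}
    echo $((records * bs_bytes))
}

# 71680 records of 1 MiB against the 75161927680-byte file from the ls output
if [ "$(dd_bytes 71680)" = "75161927680" ]; then
    echo "sizes match"
else
    echo "size mismatch"
fi
```

71680 records of 1 MiB are 75161927680 bytes (70 GiB), which is the size of the volume file listed in step 5.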

After that VM can be started. We need to go to the SMS menu to manually choose disk device:
5.   Select Boot Options
2.   Configure Boot Device Order
1.   Select 1st Boot Device
6.   List All Devices
2.        -      SCSI 69 GB Harddisk, part=1 ()
2.   Set Boot Sequence: Configure as 1st Boot Device

After that during boot, the boot device was found, but we had an error:  can't allocate kernel memory

IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
/
Elapsed time since release of system processors: 121069 mins 33 secs

Config file read, 1024 bytes
Welcome
Welcome to yaboot version 1.3.13 (Red Hat 1.3.13-14.el5)
Enter "help" to get some basic usage information
boot: linux
Please wait, loading kernel...
Claim error, can't allocate 900000 at 0xc00000
Claim error, can't allocate kernel memory
boot:

The solution was to change the real-base address from c00000 to 2000000.
To do this, reboot again, go to the Open Firmware prompt, and do the following steps:
 8 = Open Firmware Prompt           

     Memory      Keyboard     Network     Speaker  ok
0 > printenv real-base
-------------- Partition: common -------- Signature: 0x70 ---------------
real-base                c00000              c00000
 ok
0 > setenv real-base 2000000  ok
0 > printenv real-base
-------------- Partition: common -------- Signature: 0x70 ---------------
real-base                2000000             c00000
 ok
0 > reset-all

After that reboot was successful and old Linux server was running nicely on the new hardware with SSP.

----------------------------------------------------------------------------

Setting alerts for Shared Storage Pools:

Because thin provisioning is in place, the free space reported on the clients does not reflect the real free space in the pool. If the storage pool gets 100% full, I/O errors will occur on the client LPARs. To avoid this, alerts can be configured:

$ alert -list -clustername bb_cluster -spname bb_pool
PoolName                 PoolID                             Threshold%
bb_pool                  000000000A8C1517000000005150C18D   35                        <--it shows the free percentage

# alert -set -clustername bb_cluster -spname bb_pool -type threshold -value 25        <--if free space goes below 25% it will alert

# alert -list -clustername bb_cluster -spname bb_pool
PoolName                 PoolID                             Threshold%
bb_pool                  000000000A8C1517000000005150C18D   25                        <--new value can be seen here

$ alert -unset -clustername bb_cluster -spname bb_pool                                <--unset an alert

in errlog you can see the warning:
0FD4CF1A   0424082818 I O VIOD_POOL      Informational Message
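
The threshold logic itself is simple percentage arithmetic. The sketch below uses made-up pool numbers and a hypothetical function name (pool_below_threshold); it does not call alert or lssp, it only shows when a free-space threshold such as the 25% above would be crossed.

```shell
#!/bin/sh
# Succeeds (exit 0) when the free share of the pool is below the threshold,
# i.e. when the alert described above would be expected to fire.
pool_below_threshold() {
    size_mb=$1
    free_mb=$2
    threshold_pct=$3
    free_pct=$((free_mb * 100 / size_mb))
    echo "free: ${free_pct}%"
    [ "$free_pct" -lt "$threshold_pct" ]
}

# Made-up numbers: 20480 MB pool with 4096 MB free, threshold 25%
if pool_below_threshold 20480 4096 25; then
    echo "below threshold, add capacity"
fi
```

With these numbers the pool is 20% free, which is below the 25% threshold, so the second message is printed as well.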

----------------------------------------

VIO + SSP Backups
https://www.ibm.com/developerworks/community/blogs/cgaix/entry/Automatically_backup_VIOS_configuration_changes?lang=en

on VIO, under the crontab of root:
0 * * * * /usr/ios/sbin/autoviosbr -start 1>/dev/null 2>/dev/null

Every hour this takes a node backup (VIOS config) and, on the database node of the SSP cluster, a cluster backup (SSP config).
The backups are stored in the home directory under cfgbackups (/home/padmin/cfgbackups).

Checking the status:
$ viosbr -autobackup status -type node
Node configuration changes:Complete.                    <--complete means: we have a backup already
                                                           pending means: we have changes which will be saved by the next hourly backup
$ viosbr -autobackup status -type cluster               <--same for the cluster backup

Checking the SSP database node:
$ cluster -list
CLUSTER_NAME:    SSP_Cluster_1
CLUSTER_ID:      e9ebb214090711e8800298be946fc362

$  cluster -status -clustername SSP_Cluster_1 -verbose | grep -p DBN
    Node Name:            ls-aix-h01.mgmt.lab.dynatrace.org
    Node Id:              e9f36c52090711e8800298be946fc362
    Node MTM:             8284-22A02783844X
    Node Partition Num:   1
    Node State:           OK
    Node Repos State:     OK
    Node Upgrade Status:  2.2.6.21 ON_LEVEL
    Node Roles:           DBN
        Pool Name:        SSP_1
        Pool Id:          FFFFFFFFAC17583D000000005A75F1BF
        Pool State:       OK
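
grep -p is specific to AIX. Where the cluster status output is processed elsewhere (for example on a monitoring host), awk in paragraph mode does a similar job. This sketch assumes node stanzas are separated by blank lines, as the grep -p usage above suggests; dbn_node is a hypothetical helper and the host names in the sample are placeholders.

```shell
#!/bin/sh
# Prints the "Node Name" of every node stanza whose "Node Roles" line
# contains DBN. Stanzas are assumed to be separated by blank lines.
dbn_node() {
    awk '
        BEGIN { RS = ""; FS = "\n" }
        {
            name = ""; is_dbn = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /Node Name:/) {
                    name = $i
                    sub(/^[ \t]*Node Name:[ \t]*/, "", name)
                }
                if ($i ~ /Node Roles:/ && $i ~ /DBN/) is_dbn = 1
            }
            if (is_dbn && name != "") print name
        }
    '
}

# Placeholder stanzas in the shape of "cluster -status -verbose" output
dbn_node <<'EOF'
    Node Name:            vio1.example.com
    Node State:           OK
    Node Roles:           DBN
        Pool Name:        SSP_1

    Node Name:            vio2.example.com
    Node State:           OK
    Node Roles:
        Pool Name:        SSP_1
EOF
```

The role is checked line by line, so a pool or host name that happens to contain "DBN" does not produce a false match.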

----------------------------------------


9 comments:

  1. Hello, can you please update more on thin and thick provisioning please

  2. I'd like to see a topic on troubleshooting a VIO cluster. I'm not yet convinced of the stability of this setup due to the network dependency. Why was this not designed with disk communication rather than ( or in conjunction with ) network like PowerHA.

  3. ABEND FATAL AWK IN MUXPROC

  4. Is there any command to display the usage of a virtual LUN over all VIOs ?
    To see if a LUN is mapped in one VIO before giving it to another LPAR ?

  5. Will it be possible to extend LUN ( thin) which is already presented to client machine in VSSP

  6. How would I reliably relate a disk on the LPAR back to its storage pool backing device on the VIOS please?
    For physical VSCSI assigned disks you can use the PVID but I am not sure how to do so with SSP allocated storage

    Replies
    1. Solution found :

      On LPAR:
      user/> lspv -u | grep hdisk3
      hdisk3 00f68d5a61122458 datavg active 412173194C707447A2AB56DF8EB93F320FB2F103303 NVDISK03IBMvscsi f410292-f9e1-5a24-7503-34d6a282
      user/>
      (Removed excess spaces to make it easier to read)

      Take the number in the 5th column and remove the first 5 and last 6 digits :
      3194C707447A2AB56DF8EB93F320FB2F

      On the VIOS :
      padmin/> lssp -clustername MyClusterName -sp MySSPName -bd | grep -i 3194C707447A2AB56DF8EB93F320FB2F
      v3a_vhost18_dvg00 256000 THIN 78% 55804 3194c707447a2ab56df8eb93f320fb2f
      padmin/>

      The first column is the storage pool lun.
      Note: the number is upper case on LPAR and lower case on VIOS.

      This number is consistent across all VIOS in the storage pool so you can reliably identify the disk from anywhere in the storage pool.

    2. Thanks a lot Michael for the solution :)

    3. lspv -u | grep hdiskXXX | awk '{print tolower($0)}' | awk '{print $5}' | cut -c 6-37
