AIX for System Administrators: HA

HA - Diskheartbeat

!!! In PowerHA 7.1 and later there is no disk heartbeat anymore. !!!
!!! Below document is valid only for old HACMP configurations. !!!

Disk Heartbeat

Heartbeat disks should be used in enhanced concurrent mode. Enhanced concurrent mode disks use RSCT group services to control locking, thus freeing up a sector on the disk that can now be used for communication. This sector, which was formerly used for SSA Concurrent mode disks, is now used for writing heartbeat information.

Any disk that is part of an enhanced concurrent volume group can be used for a diskhb network, including those used for data storage. Also, the volume group that contains the disk used for a diskhb network does not have to be varied on.

An enhanced concurrent volume group is not the same as a concurrent volume group (which is part of a concurrent resource group). The term refers to the locking mode, which uses RSCT.

DISK HEARTBEAT:

lspv | grep hb          <--shows the actual state of heartbeat disks
cltopinfo -i | grep hb    <--shows what has been saved in the configuration (it has to be changed to reflect the actual state)

Configuring from scratch

After the disk heartbeat VGs are ready (enhanced concurrent and varied off, created with mkvg -C ...), there are 2 methods for configuring the diskhb network.
(The Discovery method can cause problems, so the Pre-Defined method is suggested.)
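The volume group preparation mentioned above could be sketched as follows. This is only a minimal sketch, not necessarily the procedure used here: the names hb_vg and hdisk2 are hypothetical, and it assumes the standard AIX mkvg flags -C (enhanced concurrent capable), -n (no automatic varyon at system restart) and -y (volume group name). By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch: prepare an enhanced concurrent VG for disk heartbeat.
# hb_vg and hdisk2 are hypothetical names; adjust to your environment.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# $1 = volume group name, $2 = disk name
prepare_hb_vg() {
    vg="$1"
    disk="$2"
    run mkvg -C -n -y "$vg" "$disk"   # create the VG as enhanced concurrent capable
    run varyoffvg "$vg"               # leave it varied off for the diskhb network
}

prepare_hb_vg hb_vg hdisk2
```

The dry-run default is a deliberate choice, so that the commands can be reviewed before anything is changed on the disk.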

-PRE-DEFINED DEVICES METHOD: manually create a diskhb network first, then assign the disk-node pair devices to the network

    1. Create diskhb network
    Extended Configuration->Extended Topology->Configure HACMP Networks->Add a Network...
        choose:diskhb

        * Network Name                                       [anything you want]
        * Network Type                                        diskhb

    2. Add device
    Extended Configuration->Extended Topology->Configure HACMP Comm. Interfaces/Dev.->Add ...
        Add Pre-defined...
        Communication Devices
        Choose your diskhb Network Name

        * Device Name                                         [aix41_diskhb2] <--choose a unique name
        * Network Type                                        diskhb
        * Network Name                                        net_diskhb_aix41_aix42
        * Device Path                                         [/dev/vpath4]
        * Node Name                                           [aix41]

    Repeat this process for the other node and its device. This completes both devices for the diskhb network.

-VERIFICATION AND SYNCHRONIZATION: after configuring, you can run: Extended Config.->Extended Verification ... (with default settings)

or:
-DISCOVERY METHOD: let HACMP find the devices for us

    1. Run Discovery
    Extended Configuration->Discover HACMP-related Information from Configured Nodes
    (This will run automatically and create a clip_config file that contains the information it has discovered)

    2. Add device
    Extended Configuration->Extended Topology->Configure HACMP Comm. Interfaces/Dev.->Add ...
        Add discovered ...
        Communication Devices

        Choose appropriate devices (ex. vpath0 and vpath3)
        Select Point-to-Point Pair of Discovered Communication Devices to Add (one or more items can be selected (F7))

-----------------------------

Testing disk heartbeat:

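DO NOT PERFORM THIS TEST WHILE HACMP IS RUNNING (the PowerHA for AIX Cookbook states that disk heartbeat testing can only be run when PowerHA is not running on the nodes).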

dhb_read -p devicename        <--dump diskhb sector contents
dhb_read -p devicename -r     <--receive data over diskhb network
dhb_read -p devicename -t     <--transmit data over diskhb network

1. on one node set receive:
/usr/sbin/rsct/bin/dhb_read -p hdisk2 -r

2. on the other node set transmit:
/usr/sbin/rsct/bin/dhb_read -p hdisk2 -t

dhb_read -p rvpath0 -r <--Note: the device name is the raw device, as designated by the "r" preceding the device name.

If everything is OK, the output shows: Link operating normally
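The check for the success message could be scripted roughly like this. It is a minimal sketch: the function name diskhb_link_ok is hypothetical, and the only thing it relies on is the "Link operating normally" text quoted above. Whether dhb_read prints that message on stdout or stderr is not confirmed here, so the usage example merges both streams.

```shell
#!/bin/sh
# Sketch: decide whether a dhb_read run reported a healthy link.
# Reads the command output on stdin and succeeds (exit 0) only if the
# "Link operating normally" message is present.
diskhb_link_ok() {
    grep -q "Link operating normally"
}

# Example usage on the receiving node, with HACMP stopped
# (hdisk2 is the device from the steps above):
#   /usr/sbin/rsct/bin/dhb_read -p hdisk2 -r 2>&1 | diskhb_link_ok \
#       && echo "diskhb link OK" || echo "diskhb link NOT OK"
```

The transmit side still has to be started on the other node, as in step 2, otherwise the receiver has nothing to read.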

-----------------------------

Monitoring disk heartbeat:

root@aix41: / # lssrc -ls topsvcs
Subsystem         Group            PID     Status
topsvcs          topsvcs          921638 active
Network Name   Indx Defd Mbrs St   Adapter ID      Group ID
VLAN200_10_20_ [ 0] 2     2     S    10.10.10.2      10.10.10.2
VLAN200_10_20_ [ 0] en11             0x41c64107      0x41c64108
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
...
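Because "lssrc -ls topsvcs" prints one section per network, a small filter can reduce it to the missed-heartbeat counters. The following is a minimal sketch that assumes the line layout shown in the excerpt above (a network line whose second field starts with "[", followed later by a "Missed HBs:" line). The function name missed_hb_report is hypothetical.

```shell
#!/bin/sh
# Sketch: summarize missed heartbeats per network from "lssrc -ls topsvcs".
# Assumes the layout shown in this article; other RSCT levels may differ.
missed_hb_report() {
    awk '
        $2 ~ /^\[/ { net = $1 }              # remember the current network name
        $1 == "Missed" && $2 == "HBs:" {     # Missed HBs: Total: N Current group: M
            print net, "total=" $4, "current=" $7
        }
    '
}

# Demo with the excerpt from above; on a live node use:
#   lssrc -ls topsvcs | missed_hb_report
missed_hb_report <<'EOF'
Network Name   Indx Defd Mbrs St   Adapter ID      Group ID
VLAN200_10_20_ [ 0] 2     2     S    10.10.10.2      10.10.10.2
VLAN200_10_20_ [ 0] en11             0x41c64107      0x41c64108
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
EOF
# prints: VLAN200_10_20_ total=0 current=0
```

A diskhb network shows up in the same output as its own section, so it gets its own line in the report.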

27 comments:

Anonymous said...: hi,
What is a concurrent VG, what are its advantages, and what is the difference between it and an enhanced concurrent VG?; September 21, 2012 at 10:54 AM
aix said...: Hi,
I have found a very good description here:
http://unix.worldiswelcome.com/non-concurrent-concurrent-and-enhanced-concurrent-volume-groups

Non Concurrent VG:
------------------
In a Non Concurrent VG, the application typically runs on a single node only and the data of the VG is accessible to that node only. If that node fails, the application moves to another node and the VG is varied on there. Only the node currently running the application can access the data on the VG.

Concurrent VG:
--------------
In concurrent VG, the application runs on all nodes of cluster simultaneously. All of the nodes can access the data at the same time. All of the nodes can read and write the data simultaneously. The data integrity is then the responsibility of the application.

Enhanced Concurrent VG:
-----------------------
In enhanced concurrent VG, access to VG is allowed on all of the nodes of cluster. Certain restrictions apply in this case. The VG can be varied on different nodes in active or passive states.

-Active State Varyon:
In active state varyon of VG the following operations are allowed:
- Filesystem operations and mounting of filesystems.
- Operations on Logical Volumes.
- Operations on application.
- Syncing of VGs.

-Passive State Varyon
In passive state varyon of VG certain restrictions apply:
- Filesystems mounting and operations on filesystems are not allowed.
- No operations on Lvs are allowed.
- Syncing of VGs is not allowed.

The following are allowed:
- LVM read only access to Vgs special file is allowed.
- LVM read only access to first 4k of all Lvs under VG is allowed.; September 21, 2012 at 11:23 AM
Siva said...: Hi,

Thanks for the prompt response. Why do we use a concurrent VG rather than a non-concurrent VG in an HACMP configuration?

Regards,
Siva; September 21, 2012 at 12:54 PM
aix said...: Hi,
OK, ConcurrentVG is good for concurrent resource groups, where more than 1 node is in online state.

Regards,
Balazs; September 21, 2012 at 1:45 PM
Siva said...: Hi,

Could you please explain the workflow of disk heartbeat polling: how does it identify a failure and move the RG from one node to the other?

Rgds,
Siva; September 21, 2012 at 5:38 PM
aix said...: With disk heartbeat, the 1st node writes to the disk and the 2nd node reads it. In case of problems there is also the network heartbeat to determine whether a node is up or not. Once HACMP has decided that, for example, a node is down, it initiates switching the resource group: on the first node it stops the application, removes the IP label, unmounts the filesystems and varies off the VG, then on the 2nd node it varies on the VG, mounts the filesystems, adds the IP label and starts the application.; September 21, 2012 at 9:19 PM
Unknown said...: Gr8 ....; October 20, 2012 at 11:54 AM
Anonymous said...: root@aix41: / # lssrc -ls topsvcs
Subsystem Group PID Status
topsvcs topsvcs 921638 active
Network Name Indx Defd Mbrs St Adapter ID Group ID
VLAN200_10_20_ [ 0] 2 2 S 10.10.10.2 10.10.10.2
VLAN200_10_20_ [ 0] en11 0x41c64107 0x41c64108
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0

Why would the IP's be same ?; November 12, 2012 at 6:02 PM
aix said...: "lssrc -ls topsvcs" gives a long output, I put there only a small part. Every network, which is configured in the cluster has a section in the output. What you see as "VLAN200_10_20" that is my network name and the details of the network (IP, interface name) are next to it. So the IP's are not the same, that is the network name.; November 13, 2012 at 10:05 AM
Anonymous said...: Thanks for the valuable information; December 19, 2012 at 7:21 AM
aix said...: welcome :); December 19, 2012 at 9:17 PM
Anonymous said...: How to test disk heartbeat while cluster is running? What are precautions need to take care before this.; February 3, 2013 at 7:04 PM
aix said...: dhb_read is the command that can be used for testing, but this is written in the PowerHA for AIX Cookbook: "Disk heartbeat testing can only be run when PowerHA is not running on the nodes"

As IBM does not suggest running dhb_read while the cluster is running, I don't know of any other method....
(Once I tried the dhb_read command on an online cluster and it worked successfully for me...); February 3, 2013 at 8:26 PM
Anonymous said...: Hi, Do you have any procedure for replacing the hdisk which is part under hb_vg.; February 26, 2013 at 10:45 PM
aix said...: Hi, I would remove the disk heartbeat network, and after that the VG and the disk. Then I would configure it again from scratch. It is an online action, so the Resource Group can stay online. When this is done, a synchronization and verification could be run as well.; February 27, 2013 at 8:55 AM
RedmiR said...: Hi, Can you help, how to check heartbeat for tcp network.; April 27, 2013 at 1:47 PM
aix said...: Hi, 'cltopinfo -m' is showing heartbeat statistics, missed heartbeats...; April 27, 2013 at 5:15 PM
Anonymous said...: We are looking to Migrate from 1 EMC Array to a new one. We have a Disk Heartbeat Setup. Would it make sense to create a new Disk Heartbeat and then remove the old one? Also can this be performed with the HA Cluster Up?; May 15, 2013 at 5:26 PM
aix said...: Yes, it can be done while the cluster is up. The cluster is sending heartbeats via the network interfaces as well, so if it is configured correctly you can remove the old Disk Heartbeat and create a new one, and the cluster will survive without problems. If you want to be on the safe side, you can do it the way you described :-); May 15, 2013 at 9:03 PM
Anonymous said...: Hi

when will use enhanced concurrent vg

when will use concurrent vg

plz help me ..

Regards,
Narendra; July 14, 2013 at 9:55 AM
aix said...: Hi, you can find some info about these here: http://aix4admins.blogspot.hu/2011/05/rsct-based-shared-storage-protection.html; July 14, 2013 at 5:52 PM
Anonymous said...: how will create concurrent vg without hacmp file sets ?it is possible or not?; July 16, 2013 at 4:23 PM
Anonymous said...: Will I be able to add disk Heartbeat while HACMP is running?
Configured on one node, but miss out the configuration on another node...; February 26, 2014 at 11:02 AM
Unknown said...: why the disk heart beat applicable only for enhanced concurrent vg ?????; October 19, 2014 at 5:06 PM
Unknown said...: i think its not possible..because without installing the hacmp filesets smitty hacmp will not work.....then its not possible to create it......; October 20, 2014 at 4:54 AM
Anonymous said...: how can we remove heart beat ?; March 13, 2017 at 12:41 PM
Anonymous said...: how to remove missed heart beat in cluster . check once in below output and suggest me ?

root$lssrc -ls topsvcs
Subsystem Group PID Status
topsvcs topsvcs 13762710 active
Network Name Indx Defd Mbrs St Adapter ID Group ID
net_ether_01_0 [ 0] 2 2 S 192.168.100.15 192.168.100.16
net_ether_01_0 [ 0] en0 0x47498a9f 0x47498ad3
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 1 Current group: 1
Packets sent : 24934310 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 32420167 ICMP 0 Dropped: 0
NIM's PID: 13500470
diskhb_0 [ 1] 2 2 S 255.255.10.1 255.255.10.1
diskhb_0 [ 1] rhdisk5 0x87498a9e 0x8749f4b7
HB Interval = 2.000 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 4 Current group: 4
Packets sent : 12508603 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 11871643 ICMP 0 Dropped: 0
NIM's PID: 13369558
2 locally connected Clients with PIDs:
haemd(13238320) hagsd(13959246)
Fast Failure Detection available but off.
Dead Man Switch Enabled:
reset interval = 1 seconds
trip interval = 20 seconds
Client Heartbeating Disabled.
Configuration Instance = 19
Daemon employs no security
Segments pinned: Text Data.
Text segment size: 869 KB. Static data segment size: 1507 KB.
Dynamic data segment size: 6849. Number of outstanding malloc: 130
User time 1594 sec. System time 1064 sec.
Number of page faults: 118. Process swapped out 0 times.
Number of nodes up: 2. Number of nodes down: 0.
root$; March 13, 2017 at 2:16 PM
