
HA - Diskheartbeat

!!! In PowerHA 7.1 and later there is no disk heartbeat anymore. !!!
!!!  Below document is valid only for old HACMP configurations.  !!!




Disk Heartbeat

Heartbeat disks should be used in enhanced concurrent mode. Enhanced concurrent mode disks use RSCT group services to control locking, thus freeing up a sector on the disk that can now be used for communication. This sector, which was formerly used for SSA Concurrent mode disks, is now used for writing heartbeat information.

Any disk that is part of an enhanced concurrent volume group can be used for a diskhb network, including those used for data storage. Also, the volume group that contains the disk used for a diskhb network does not have to be varied on.

An enhanced concurrent volume group is not the same as a concurrent volume group (which is part of a concurrent resource group). Rather, the term refers to the mode of locking, which is done by RSCT.


DISK HEARTBEAT:

lspv | grep hb            <--shows the actual state of heartbeat disks
cltopinfo -i | grep hb    <--shows what has been saved in the configuration (it has to be changed if it does not show the actual state)


Configuring from scratch

After the disk heartbeat VGs are ready (enhanced concurrent, created with mkvg -C ..., and they should be varied off), there are 2 methods for configuring the diskhb network.
(The Discovery method can cause problems, so the Pre-Defined method is suggested.)
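
The VG preparation mentioned above could look roughly like the sketch below. It is only a dry run: it prints the commands instead of executing them. The names hb_vg and hdisk2 are examples, the hdisk number may differ on the second node, and it is assumed that the disk is already visible with the same PVID on both nodes. The function name print_hb_vg_steps is made up for this illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for preparing a heartbeat VG.
# Nothing is executed; hb_vg and hdisk2 are example names (assumptions).
print_hb_vg_steps() {
    vg=$1
    disk=$2
    echo "# on the first node"
    # -C: enhanced concurrent capable, -n: no automatic varyon at restart
    echo "mkvg -C -n -y $vg $disk"
    echo "varyoffvg $vg"
    echo "# on the second node"
    echo "importvg -y $vg $disk"
    # disable automatic varyon after the import, then vary the VG off again
    echo "chvg -a n $vg"
    echo "varyoffvg $vg"
}

print_hb_vg_steps hb_vg hdisk2
```

Review the printed commands before running any of them by hand on the nodes.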

-PRE-DEFINED DEVICES METHOD: You have to manually create a diskhb network first, then assign the disk-node pair devices to the network

    1. Create diskhb network
    Extended Configuration->Extended Topology->Configure HACMP Networks->Add a Network...
        choose:diskhb

        * Network Name                                       [anything you want]
        * Network Type                                        diskhb


    2. Add device
    Extended Configuration->Extended Topology->Configure HACMP Comm. Interfaces/Dev.->Add ...
        Add Pre-defined...
        Communication Devices
        Choose your diskhb Network Name

        * Device Name                                         [aix41_diskhb2] <--choose a unique name
        * Network Type                                        diskhb
        * Network Name                                        net_diskhb_aix41_aix42
        * Device Path                                         [/dev/vpath4]
        * Node Name                                           [aix41]

    Repeat this process for the other node and the other device. This completes both devices for the diskhb network.

(-VERIFICATION AND SYNCHRONIZATION: After configuring you can do: Extended Config.->Extended Verification ... (with default settings))


or:
-DISCOVERY METHOD: let the HA find it for us

    1. Run Discovery
    Extended Configuration->Discover HACMP-related Information from Configured Nodes
    (This will run automatically and create a clip_config file that contains the information it has discovered)

    2. Add device
     Extended Configuration->Extended Topology->Configure HACMP Comm. Interfaces/Dev.->Add ...
        Add discovered ...
        Communication Devices

        Choose appropriate devices (ex. vpath0 and vpath3)
        Select Point-to-Point Pair of Discovered Communication Devices to Add (one or more items can be selected (F7))


-----------------------------

Testing disk heartbeat:

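DO NOT PERFORM THIS TEST WHILE HACMP IS RUNNING (the PowerHA for AIX Cookbook says that disk heartbeat testing can only be run when PowerHA is not running on the nodes)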

dhb_read -p devicename        <--dump diskhb sector contents
dhb_read -p devicename -r     <--receive data over diskhb network
dhb_read -p devicename -t     <--transmit data over diskhb network


1. on one node set receiving:
/usr/sbin/rsct/bin/dhb_read -p hdisk2 -r

2. on the other node set transmit:
/usr/sbin/rsct/bin/dhb_read -p hdisk2 -t


dhb_read -p rvpath0 -r <--Note: the device name is the raw device, as designated by the "r" preceding the device name.

If everything is OK: Link operating normally
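
The check for the success message can be scripted. Below is a minimal sketch that reads the dhb_read output from standard input and looks only for the "Link operating normally" line quoted above. The function name check_dhb_link and the OK/FAILED texts are made up for this illustration.

```shell
#!/bin/sh
# check_dhb_link: reads dhb_read output on stdin and reports whether the
# success line is present. The function name and messages are illustrative.
check_dhb_link() {
    if grep -q "Link operating normally"; then
        echo "diskhb OK"
    else
        echo "diskhb FAILED"
        return 1
    fi
}
```

A possible use on the transmitting node, with the receiver already started on the other node: /usr/sbin/rsct/bin/dhb_read -p hdisk2 -t | check_dhb_link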

-----------------------------

Monitoring disk heartbeat:


root@aix41: / # lssrc -ls topsvcs
Subsystem         Group            PID     Status
 topsvcs          topsvcs          921638  active
Network Name   Indx Defd  Mbrs  St   Adapter ID      Group ID
VLAN200_10_20_ [ 0] 2     2     S    10.10.10.2      10.10.10.2
VLAN200_10_20_ [ 0] en11             0x41c64107      0x41c64108
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
...
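
Since the full output is long, the missed heartbeat counters can be pulled out per network. This is a sketch only: it assumes that every network section starts with a "name [ index]" line and contains a "Missed HBs:" line, as in the output above. The function name missed_hbs is made up for this illustration.

```shell
#!/bin/sh
# missed_hbs: reads "lssrc -ls topsvcs" output on stdin and prints one line
# per network with its missed heartbeat counters.
# Assumes the layout shown above: "name [ idx] ..." followed by "Missed HBs:".
missed_hbs() {
    awk '
        # remember the network name from lines like "diskhb_0 [ 1] ..."
        /^ *[A-Za-z0-9_]+ +\[ *[0-9]+\]/ { net = $1 }
        # "Missed HBs: Total: N Current group: M" -> fields 4 and 7
        /^ *Missed HBs:/ { print net, "total=" $4, "current=" $7 }
    '
}
```

Usage: lssrc -ls topsvcs | missed_hbs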

27 comments:

Anonymous said...

hi,
What is concurrentvg and its advantages also difference between this and enhanced concurrentvg

aix said...

Hi,
I have found a very good description here:
http://unix.worldiswelcome.com/non-concurrent-concurrent-and-enhanced-concurrent-volume-groups

Non Concurrent VG:
------------------
In a Non Concurrent VG, the application typically runs on a single node only and the data of the VG is accessible to that node only. If that node fails, the application moves to another node and the VG is varied on there. Then only that node can access the data on the VG.

Concurrent VG:
--------------
In concurrent VG, the application runs on all nodes of cluster simultaneously. All of the nodes can access the data at the same time. All of the nodes can read and write the data simultaneously. The data integrity is then the responsibility of the application.

Enhanced Concurrent VG:
-----------------------
In enhanced concurrent VG, access to VG is allowed on all of the nodes of cluster. Certain restrictions apply in this case. The VG can be varied on different nodes in active or passive states.

-Active State Varyon:
In active state varyon of VG the following operations are allowed:
- Filesystem operations and mounting of filesystems.
- Operations on Logical Volumes.
- Operations on application.
- Syncing of VGs.

-Passive State Varyon
In passive state varyon of VG certain restrictions apply:
- Mounting filesystems and operations on filesystems are not allowed.
- No operations on LVs are allowed.
- Syncing of VGs is not allowed.

The following are allowed:
- LVM read-only access to the VG's special file is allowed.
- LVM read-only access to the first 4k of all LVs under the VG is allowed.

Siva said...

Hi,

Thanks for the prompt response. Why are we using a concurrent VG rather than a non-concurrent VG in an HACMP configuration?

Regards,
Siva

aix said...

Hi,
OK, ConcurrentVG is good for concurrent resource groups, where more than 1 node is in online state.

Regards,
Balazs

Siva said...

Hi,

Could you please explain the workflow of disk heartbeat polling: how it identifies a failure and moves the RG from one node to the other?

Rgds,
Siva

aix said...

With disk heartbeat, the 1st node writes to the disk and the 2nd node reads it. In case of problems there is also the network heartbeat to determine whether a node is up or not. Once HACMP has decided that, for example, a node is down, it initiates moving the resource group: it stops the application, removes the IP label, unmounts the filesystems and varies off the VG on the first node, then on the 2nd node it varies on the VG, mounts the filesystems, adds the IP label and starts the application.

Unknown said...

Gr8 ....

Anonymous said...

root@aix41: / # lssrc -ls topsvcs
Subsystem Group PID Status
topsvcs topsvcs 921638 active
Network Name Indx Defd Mbrs St Adapter ID Group ID
VLAN200_10_20_ [ 0] 2 2 S 10.10.10.2 10.10.10.2
VLAN200_10_20_ [ 0] en11 0x41c64107 0x41c64108
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0

Why would the IP's be same ?

aix said...

"lssrc -ls topsvcs" gives a long output, I put there only a small part. Every network, which is configured in the cluster has a section in the output. What you see as "VLAN200_10_20" that is my network name and the details of the network (IP, interface name) are next to it. So the IP's are not the same, that is the network name.

Anonymous said...

Thanks for the valuable information

aix said...

welcome :)

Anonymous said...

How to test disk heartbeat while cluster is running? What are precautions need to take care before this.

aix said...

dhb_read is the command that can be used for testing, but this is written in the PowerHA for AIX Cookbook: "Disk heartbeat testing can only be run when PowerHA is not running on the nodes"

As IBM does not recommend running dhb_read while the cluster is running, I don't know of any other method...
(Once I tried the dhb_read command on an online cluster and it worked successfully for me...)

Anonymous said...

Hi, Do you have any procedure for replacing the hdisk which is part under hb_vg.

aix said...

Hi, I would remove the disk heartbeat network, then the VG and the disk, and then I would configure everything again from scratch. It is an online action, so the Resource Group can stay online. Once this is done, a verification and synchronization could be run as well.

RedmiR said...

Hi, Can you help, how to check heartbeat for tcp network.

aix said...

Hi, 'cltopinfo -m' is showing heartbeat statistics, missed heartbeats...

Anonymous said...

We are looking to Migrate from 1 EMC Array to a new one. We have a Disk Heartbeat Setup. Would it make sense to create a new Disk Heartbeat and then remove the old one? Also can this be performed with the HA Cluster Up?

aix said...

Yes, it can be done while the cluster is up. The cluster is sending heartbeats via the network interfaces as well, so if it is configured correctly you can remove the old Disk Heartbeat and create a new one, and the cluster will survive without problems. If you want to be on the safe side, you can do it the way you wrote :-)

Anonymous said...

Hi


when will use enhanced concurrent vg

when will use concurrent vg

plz help me ..

Regards,
Narendra

aix said...

Hi, you can find some info about these here: http://aix4admins.blogspot.hu/2011/05/rsct-based-shared-storage-protection.html

Anonymous said...

how will create concurrent vg without hacmp file sets ?it is possible or not?

Anonymous said...

Will I be able to add disk Heartbeat while HACMP is running?
Configured on one node, but miss out the configuration on another node...

Unknown said...

why the disk heart beat applicable only for enhanced concurrent vg ?????

Unknown said...

i think its not possible..because without installing the hacmp filesets smitty hacmp will not work.....then its not possible to create it......

Anonymous said...

how can we remove heart beat ?

Anonymous said...

how to remove missed heart beat in cluster . check once in below output and suggest me ?

root$lssrc -ls topsvcs
Subsystem Group PID Status
topsvcs topsvcs 13762710 active
Network Name Indx Defd Mbrs St Adapter ID Group ID
net_ether_01_0 [ 0] 2 2 S 192.168.100.15 192.168.100.16
net_ether_01_0 [ 0] en0 0x47498a9f 0x47498ad3
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 1 Current group: 1
Packets sent : 24934310 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 32420167 ICMP 0 Dropped: 0
NIM's PID: 13500470
diskhb_0 [ 1] 2 2 S 255.255.10.1 255.255.10.1
diskhb_0 [ 1] rhdisk5 0x87498a9e 0x8749f4b7
HB Interval = 2.000 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 4 Current group: 4
Packets sent : 12508603 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 11871643 ICMP 0 Dropped: 0
NIM's PID: 13369558
2 locally connected Clients with PIDs:
haemd(13238320) hagsd(13959246)
Fast Failure Detection available but off.
Dead Man Switch Enabled:
reset interval = 1 seconds
trip interval = 20 seconds
Client Heartbeating Disabled.
Configuration Instance = 19
Daemon employs no security
Segments pinned: Text Data.
Text segment size: 869 KB. Static data segment size: 1507 KB.
Dynamic data segment size: 6849. Number of outstanding malloc: 130
User time 1594 sec. System time 1064 sec.
Number of page faults: 118. Process swapped out 0 times.
Number of nodes up: 2. Number of nodes down: 0.
root$