HA - Disk Heartbeat

!!! In PowerHA 7.1 and later there is no disk heartbeat anymore. !!!
!!!  The document below is valid only for old HACMP configurations.  !!!




Disk Heartbeat

Heartbeat disks should be used in enhanced concurrent mode. Enhanced concurrent mode disks use RSCT group services to control locking, thus freeing up a sector on the disk that can now be used for communication. This sector, which was formerly used for SSA Concurrent mode disks, is now used for writing heartbeat information.

Any disk that is part of an enhanced concurrent volume group can be used for a diskhb network, including those used for data storage. Also, the volume group that contains the disk used for a diskhb network does not have to be varied on.

An enhanced concurrent volume group is not the same as a concurrent volume group (which is part of a concurrent resource group); rather, it refers to the mode of locking, which uses RSCT.
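
To check whether a given VG is already enhanced concurrent capable, lsvg can be used (a quick sketch; hb_vg is just an example VG name, and the exact field names can vary slightly between AIX levels):

lsvg hb_vg | grep -i concurrent        <--an enhanced concurrent VG shows "Concurrent: Enhanced-Capable"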


DISK HEARTBEAT:

lspv | grep hb            <--shows the actual state of heartbeat disks
cltopinfo -i | grep hb    <--shows what has been saved in the configuration (if it differs from the actual state, the configuration has to be synchronized)
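
The heartbeat disk must be the same physical disk on both nodes; a quick sanity check is to compare PVIDs with lspv (vpath4 and the node names below are only examples, substitute your own device):

root@aix41: / # lspv | grep vpath4              <--note the PVID in the 2nd column
root@aix42: / # lspv | grep <PVID from aix41>   <--the same PVID must show up on the other node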


Configuring from scratch

After the disk heartbeat VGs are ready (enhanced concurrent, should be varied off; created with mkvg -C ...), there are 2 methods of configuring them.
(There could be problems with the Discovery method, so the Pre-Defined method is suggested.)
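
A minimal preparation sequence could look like this (a sketch; hb_vg and vpath4 are example names, and the bos.clvm.enh fileset must be installed for enhanced concurrent VGs):

mkvg -C -y hb_vg vpath4       <--on the 1st node: -C creates an enhanced concurrent capable VG
varyoffvg hb_vg               <--leave it varied off, the cluster will manage it
importvg -y hb_vg vpath4      <--on the 2nd node (same disk), then run varyoffvg there as well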

-PRE-DEFINED DEVICES METHOD: you have to create a diskhb network manually first, then assign the disk-node pair devices to the network

    1. Create diskhb network
    Extended Configuration->Extended Topology->Configure HACMP Networks->Add a Network...
        choose:diskhb

        * Network Name                                       [anything you want]
        * Network Type                                        diskhb


    2. Add device
    Extended Configuration->Extended Topology->Configure HACMP Comm. Interfaces/Dev.->Add ...
        Add Pre-defined...
        Communication Devices
        Choose your diskhb Network Name

        * Device Name                                         [aix41_diskhb2] <--choose a unique name
        * Network Type                                        diskhb
        * Network Name                                        net_diskhb_aix41_aix42
        * Device Path                                         [/dev/vpath4]
        * Node Name                                           [aix41]

    Repeat this process for the other node and the other device; this completes both devices for the diskhb network.

(VERIFICATION AND SYNCHRONIZATION: after configuring, you can run Extended Configuration->Extended Verification and Synchronization (with default settings))


or:
-DISCOVERY METHOD: let HACMP discover the devices for us

    1. Run Discovery
    Extended Configuration->Discover HACMP-related Information from Configured Nodes
    (This will run automatically and create a clip_config file that contains the information it has discovered; see the check after these steps)

    2. Add device
     Extended Configuration->Extended Topology->Configure HACMP Comm. Interfaces/Dev.->Add ...
        Add discovered ...
        Communication Devices

        Choose appropriate devices (ex. vpath0 and vpath3)
        Select Point-to-Point Pair of Discovered Communication Devices to Add (one or more items can be selected (F7))
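
If you want to see what discovery has found, you can also look into the clip_config file directly (a quick sketch; the path below is the usual HACMP location, adjust it if your installation differs):

grep -i diskhb /usr/es/sbin/cluster/etc/config/clip_config    <--lists the discovered diskhb candidate devices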


-----------------------------

Testing disk heartbeat:

!!! DO NOT PERFORM THIS TEST WHILE HACMP IS RUNNING !!!

dhb_read -p devicename        <--dump diskhb sector contents
dhb_read -p devicename -r     <--receive data over diskhb network
dhb_read -p devicename -t     <--transmit data over diskhb network


1. On one node, start the receiver:
/usr/sbin/rsct/bin/dhb_read -p hdisk2 -r

2. On the other node, start the transmitter:
/usr/sbin/rsct/bin/dhb_read -p hdisk2 -t


dhb_read -p rvpath0 -r <--note: the device name is the raw device, as designated by the "r" preceding the device name

If everything is OK, both sides report: Link operating normally

-----------------------------

Monitoring disk heartbeat:


root@aix41: / # lssrc -ls topsvcs
Subsystem         Group            PID     Status
 topsvcs          topsvcs          921638  active
Network Name   Indx Defd  Mbrs  St   Adapter ID      Group ID
VLAN200_10_20_ [ 0] 2     2     S    10.10.10.2      10.10.10.2
VLAN200_10_20_ [ 0] en11             0x41c64107      0x41c64108
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
...
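
To watch only the heartbeat counters instead of the whole output, a simple filter is enough (a sketch; it prints the sensitivity and missed-HB lines of every network, including the diskhb network):

lssrc -ls topsvcs | grep -i missed    <--heartbeat sensitivity and missed HB counters per network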

27 comments:

  1. hi,
    What is a concurrent VG and what are its advantages? Also, what is the difference between it and an enhanced concurrent VG?

    Replies
    1. Hi,
      I have found a very good description here:
      http://unix.worldiswelcome.com/non-concurrent-concurrent-and-enhanced-concurrent-volume-groups

      Non Concurrent VG:
      ------------------
      In a non-concurrent VG, the application typically runs on a single node only, and the data of the VG is accessible to that node only. If the present node fails, the application moves to another node and the VG is varied on there; then only that node can access the data on the VG.

      Concurrent VG:
      --------------
      In a concurrent VG, the application runs on all nodes of the cluster simultaneously. All of the nodes can access the data at the same time and can read and write it simultaneously. Data integrity is then the responsibility of the application.

      Enhanced Concurrent VG:
      -----------------------
      In an enhanced concurrent VG, access to the VG is allowed on all of the nodes of the cluster, but certain restrictions apply. The VG can be varied on in active or passive state on the different nodes.

      -Active State Varyon:
      When the VG is varied on in active state, the following operations are allowed:
      - Filesystem operations and mounting of filesystems.
      - Operations on Logical Volumes.
      - Operations on application.
      - Syncing of VGs.

      -Passive State Varyon
      When the VG is varied on in passive state, certain restrictions apply:
      - Mounting of filesystems and operations on filesystems are not allowed.
      - No operations on LVs are allowed.
      - Syncing of VGs is not allowed.

      The following are allowed:
      - LVM read-only access to the VG's special file is allowed.
      - LVM read-only access to the first 4 KB of all LVs in the VG is allowed.
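
      Normally HACMP switches between these states for you. Just as an illustration, an active concurrent varyon could be done manually like this (hb_vg is an example name, and RSCT group services must be running):

      varyonvg -c hb_vg              <--active concurrent varyon
      lsvg hb_vg | grep "VG Mode"    <--shows "Concurrent" while it is varied on this way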

  2. Hi,

    Thanks for the prompt response. Why do we use a concurrent VG rather than a non-concurrent VG in a HACMP configuration?

    Regards,
    Siva

    Replies
    1. Hi,
      OK, a concurrent VG is good for concurrent resource groups, where more than one node is in online state.

      Regards,
      Balazs

  3. Hi,

    Could you please explain the workflow of disk heartbeat polling: how does it identify a failure and move the RG from one node to the other?

    Rgds,
    Siva

    Replies
    1. The disk heartbeat works by the 1st node writing to the disk and the 2nd node reading it. In case of problems there is also the network heartbeat to help determine whether a node is up or not. Once HACMP decides that a node is down, it initiates moving the resource group: on the first node it stops the application, removes the IP label, unmounts the filesystems and varies off the VG; then on the 2nd node it varies on the VG, mounts the filesystems, adds the IP label and starts the application.

  4. root@aix41: / # lssrc -ls topsvcs
    Subsystem         Group            PID     Status
     topsvcs          topsvcs          921638  active
    Network Name   Indx Defd  Mbrs  St   Adapter ID      Group ID
    VLAN200_10_20_ [ 0] 2     2     S    10.10.10.2      10.10.10.2
    VLAN200_10_20_ [ 0] en11             0x41c64107      0x41c64108
    HB Interval = 1.000 secs. Sensitivity = 10 missed beats
    Missed HBs: Total: 0 Current group: 0

    Why would the IPs be the same?

    Replies
    1. "lssrc -ls topsvcs" gives a long output, I put there only a small part. Every network, which is configured in the cluster has a section in the output. What you see as "VLAN200_10_20" that is my network name and the details of the network (IP, interface name) are next to it. So the IP's are not the same, that is the network name.

  5. Thanks for the valuable information

  6. How can you test the disk heartbeat while the cluster is running? What precautions need to be taken before this?

    Replies
    1. dhb_read is the command that can be used for testing, but the PowerHA for AIX Cookbook states: "Disk heartbeat testing can only be run when PowerHA is not running on the nodes"

      As IBM does not suggest running dhb_read while the cluster is running, I don't know of any other method...
      (Once I tried the dhb_read command on an online cluster and it worked successfully for me...)

    2. Hi


      When do we use an enhanced concurrent VG, and when do we use a concurrent VG?

      Please help me.

      Regards,
      Narendra

    3. Hi, you can find some info about these here: http://aix4admins.blogspot.hu/2011/05/rsct-based-shared-storage-protection.html

  7. Hi, do you have a procedure for replacing an hdisk which is part of the hb_vg?

    Replies
    1. Hi, I would remove the disk heartbeat network, then the VG and the disk, and then I would configure everything again from scratch. It is an online action, so the Resource Group can stay online. After this is done, a synchronization and verification should be run as well.

  8. Hi, can you help: how do you check the heartbeat of a TCP/IP network?

    Replies
    1. Hi, 'cltopinfo -m' shows heartbeat statistics, including missed heartbeats...

  9. We are looking to migrate from one EMC array to a new one. We have a disk heartbeat setup. Would it make sense to create a new disk heartbeat and then remove the old one? Also, can this be performed with the HA cluster up?

    Replies
    1. Yes, it can be done while the cluster is up. The cluster sends heartbeats via the network interfaces as well, so if it is configured correctly you can remove the old disk heartbeat and create a new one, and the cluster will survive without problems. If you want to be on the safe side, you can do it the way you wrote :-)

  10. How can you create a concurrent VG without the HACMP filesets? Is it possible or not?

    Replies
    1. I think it's not possible, because without installing the HACMP filesets smitty hacmp will not work, so it is not possible to create it.

  11. Will I be able to add a disk heartbeat while HACMP is running?
    I configured it on one node, but missed the configuration on the other node...

  12. Why is the disk heartbeat applicable only to enhanced concurrent VGs?

  13. How can we remove the heartbeat?

  14. How do you clear the missed heartbeat count in a cluster? Please check the output below and advise.

    root$ lssrc -ls topsvcs
    Subsystem         Group            PID       Status
     topsvcs          topsvcs          13762710  active
    Network Name   Indx Defd  Mbrs  St   Adapter ID      Group ID
    net_ether_01_0 [ 0] 2     2     S    192.168.100.15  192.168.100.16
    net_ether_01_0 [ 0] en0              0x47498a9f      0x47498ad3
    HB Interval = 1.000 secs. Sensitivity = 10 missed beats
    Missed HBs: Total: 1 Current group: 1
    Packets sent    : 24934310 ICMP 0 Errors: 0 No mbuf: 0
    Packets received: 32420167 ICMP 0 Dropped: 0
    NIM's PID: 13500470
    diskhb_0       [ 1] 2     2     S    255.255.10.1    255.255.10.1
    diskhb_0       [ 1] rhdisk5          0x87498a9e      0x8749f4b7
    HB Interval = 2.000 secs. Sensitivity = 4 missed beats
    Missed HBs: Total: 4 Current group: 4
    Packets sent    : 12508603 ICMP 0 Errors: 0 No mbuf: 0
    Packets received: 11871643 ICMP 0 Dropped: 0
    NIM's PID: 13369558
    2 locally connected Clients with PIDs:
    haemd(13238320) hagsd(13959246)
    Fast Failure Detection available but off.
    Dead Man Switch Enabled:
    reset interval = 1 seconds
    trip interval = 20 seconds
    Client Heartbeating Disabled.
    Configuration Instance = 19
    Daemon employs no security
    Segments pinned: Text Data.
    Text segment size: 869 KB. Static data segment size: 1507 KB.
    Dynamic data segment size: 6849. Number of outstanding malloc: 130
    User time 1594 sec. System time 1064 sec.
    Number of page faults: 118. Process swapped out 0 times.
    Number of nodes up: 2. Number of nodes down: 0.
    root$
