
DARE - Snapshot

HACMP provides a facility that allows changes to cluster topology and resources to be made while the cluster is active. This facility is known as DARE, or to give it its full name, Dynamic Automatic Reconfiguration Event. It relies on three copies of the HACMP ODM.


Keep in mind: all SMIT actions affect only the DCD. Invoking a synchronization at this point causes the DCD to be copied to all cluster nodes and triggers a dynamic reconfiguration event. (The ACD is never changed directly by the user.)


An example:
When you configure an HACMP cluster, configuration data is stored in HACMP specific object classes in the Configuration Database (ODM). The AIX ODM object classes are stored in the default system configuration directory (DCD), /etc/es/objrepos. At cluster startup, HACMP copies HACMP-specific ODM classes into a separate directory called the Active Configuration Directory (ACD). If you synchronize the cluster (topology or resources) while the cluster manager is running on the local node, this action triggers a dynamic reconfiguration event. During Dynamic Reconfiguration the Default Configuration Directories (DCDs) on all cluster nodes are updated, and the data in the ACD is overwritten with the new configuration data, then the HACMP daemons are refreshed.
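The startup copy described above can be sketched with ordinary directories. This is a simulation only: it uses temporary directories instead of the real DCD (/etc/es/objrepos) and ACD (/usr/es/sbin/cluster/etc/objrepos/active is the documented default location), and `cp` stands in for what HACMP does internally.

```shell
#!/bin/sh
# Sketch of the startup copy: DCD -> ACD (simulated with temp dirs;
# on a real node the DCD is /etc/es/objrepos and the ACD is
# /usr/es/sbin/cluster/etc/objrepos/active).
WORK=$(mktemp -d)
DCD="$WORK/dcd"; ACD="$WORK/acd"
mkdir -p "$DCD" "$ACD"

# A stand-in for an HACMP-specific ODM class file, e.g. HACMPcluster.
echo "cluster config v1" > "$DCD/HACMPcluster"

# At cluster startup, HACMP copies the HACMP ODM classes DCD -> ACD,
# so the running daemons work from the active copy, not the default one.
cp "$DCD"/* "$ACD"/

RESULT=$(cat "$ACD/HACMPcluster")
echo "$RESULT"
rm -rf "$WORK"
```

The point of the separate ACD is that SMIT edits to the DCD never touch what the cluster manager is actually running with until you synchronize.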



HOW DARE WORKS:

1. Change topology or resources in smitty hacmp (this changes the DCD).

    ROLLBACK:
    If you want to discard the changes you made (in the DCD) (only if you have not synchronized yet):
    Problem Determ. Tools --> Restore HACMP Conf. Database from Active Conf.
    (during this, a snapshot of the DCD is taken in case it is needed in the future)
   
2. Synchronize topology or resources in smitty:
    -Verifying and Synchronizing (Standard): (this is under Initialization and Standard Conf.)
    (If you choose this, there is no opportunity to modify the process in any way.)

    or

    -Verifying and Synchronizing (Extended): (this is under Extended Conf.) Here you can choose:
    emulate or actual: emulate the change before actually making it
    force: this is dangerous and should be left at "no"
    verify changes only: does not synchronize, only verifies, so you can test whether a change is valid
    logging: verbose can be used if standard logging does not show why verification fails.

   During the synchronization:
   The DCD is copied to the SCD on all nodes -> a snapshot of the ACD is taken* -> the contents of the SCD are copied to the ACD -> clstrmgr is refreshed -> the SCD is deleted
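The synchronization flow above can be simulated with plain directories. This is a sketch only: real paths would be the DCD (/etc/es/objrepos), the staging SCD and active ACD under /usr/es/sbin/cluster/etc/objrepos, and the pre-change snapshot is produced by the cluster software, not by `cp`.

```shell
#!/bin/sh
# Sketch of the DARE synchronization flow, simulated with temp dirs:
# DCD -> SCD, snapshot of ACD, SCD -> ACD, clstrmgr refresh, SCD removed.
WORK=$(mktemp -d)
DCD="$WORK/dcd"; SCD="$WORK/scd"; ACD="$WORK/acd"; SNAP="$WORK/snapshots"
mkdir -p "$DCD" "$ACD" "$SNAP"
echo "new config"     > "$DCD/HACMPcluster"   # changed via SMIT
echo "running config" > "$ACD/HACMPcluster"   # current active configuration

# 1. The DCD is copied to a staging directory (SCD) on each node.
mkdir -p "$SCD" && cp "$DCD"/* "$SCD"/
# 2. A snapshot of the current ACD is taken (kept for rollback;
#    active.0.odm is the naming convention mentioned in the text).
cp "$ACD/HACMPcluster" "$SNAP/active.0.odm"
# 3. The SCD contents replace the ACD; clstrmgr would be refreshed here.
cp "$SCD"/* "$ACD"/
# 4. The SCD is deleted. While it exists, it acts as a DARE lock:
#    no further synchronization is permitted.
rm -rf "$SCD"

ACTIVE=$(cat "$ACD/HACMPcluster")
ROLLBACK=$(cat "$SNAP/active.0.odm")
echo "active=$ACTIVE rollback=$ROLLBACK"
rm -rf "$WORK"
```

Step 4 also explains the failure mode in rollback case 2 below: if the flow dies before the SCD is removed, the leftover staging directory blocks any further DARE until the lock is released.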

    ROLLBACK:
    1. If the DARE change is successful after synchronization but does not give the desired result, use the snapshot taken beforehand*
    snapshot path: /usr/es/sbin/cluster/snapshots (e.g. active.0.odm; 0 is the most recent)
    Extended Configuration --> Snapshot Conf. --> Apply Snapshot

    2. If a DARE change fails and the SCD is not cleared, the SCD acts as a lock and prevents further configuration changes.
    If an SCD exists on any cluster node, no further synchronization is permitted until it is deleted in smitty hacmp:
    Problem Determ. Tools --> Release Locks Set by Dyn. Reconf.


---------------------------------
Except for the cluster name, the add and delete functions can be used while the cluster is running.
---------------------------------

CLUSTER SNAPSHOT:

It records the cluster configuration.
Adding a snapshot: Extended Conf. --> Snapshot Conf. ...

The SNAPSHOTPATH environment variable controls the path where snapshots are stored.
Default path: /usr/es/sbin/cluster/snapshots
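A minimal sketch of redirecting snapshots to a different directory, assuming the SNAPSHOTPATH variable behaves as described above (the default applies when it is unset; the custom path below is hypothetical):

```shell
#!/bin/sh
# Sketch: choosing the snapshot directory via SNAPSHOTPATH.
# Exporting SNAPSHOTPATH before creating a snapshot redirects the
# .odm/.info files; when unset, the default directory is used.
SNAPSHOTPATH=/bigfs/hacmp_snapshots   # hypothetical custom location
export SNAPSHOTPATH

# Fall back to the documented default if the variable is not set.
TARGET="${SNAPSHOTPATH:-/usr/es/sbin/cluster/snapshots}"
echo "snapshots will be written to: $TARGET"
```

This also answers the question in comment 4 below: set and export the variable in the shell (or profile) of the user creating the snapshot before taking it.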

Applying a snapshot:
When a cluster snapshot is applied, it overwrites the existing HACMP ODM classes on all cluster nodes with the HACMP ODM classes contained in the snapshot.

A snapshot may be applied from any cluster node.

If a snapshot is applied to a running cluster, it triggers a cluster-wide DARE operation. In addition to modifying the default configuration directory (DCD), it also replaces the configuration data stored in the active configuration directory (ACD), making those changes the active configuration of the system.

"Un/Configure Cluster Resources?": If this option is set to yes, HACMP changes the definition of the resource in the ODM and performs any necessary configuration changes as well. (e.g. if a filesystem is removed, it is removed from the ODM and unmounted as well.)

9 comments:

  1. "Un/Configure Cluster Resources?": If this option set to yes, HACMP changes the definition of the resource in the ODM and it performs any configuration necessary as well. (e.g: if removing a filesystem, it will be removed from the ODM, and it will be unmounted as well.)

    Question is... Filesystem Information is not stored in ODM... Can you provide some more example on it ?

    Replies
    1. Please take into account that it is not a regular filesystem, but a filesystem which has been added to the cluster configuration. When you configure an HACMP cluster, configuration data is stored in HACMP-specific object classes in the Configuration Database (ODM). So, when you remove something which belongs to the cluster, the ODM will be updated too.

  2. Any idea how to solve the below issue without bringing down the cluster...

    An early response is highly appreciated :-)

    FYI -- while doing verif. & sync. I'm getting the below error ..

    Command: failed stdout: yes stderr: no

    Before command completion, additional instructions may appear below.



    cldare: Node(s) Us12Test in the cluster is(are) forced down. Dare cannot be run
    as long as any of the nodes in the cluster are forced down.

    Replies
    1. It says that one of the nodes in your cluster is in a forced-down state. If you check 'lssrc -ls clstrmgrES', it should show that a node is down. I guess that node has no Resource Groups, otherwise it would be more obvious. You should bring that node up. (But be aware: if your cluster version is old and it has a POL setting, it can happen that your RG will be moved to the other node....)

    2. Hi Aix, thanks for your reply ..
      When I check 'lssrc -ls clstrmgrES', it shows node Us12Test as forced down ... even though I have checked the cluster services and they are running ..
      When I do verif. & sync., it shows the error as in my earlier post.

      Cluster version 6.1.0.6

      any help..

    3. I think you should do a start cluster services on Us12Test. As far as I know, clstrmgrES is always running on PowerHA 6.1. Someone probably did an "unmanage resource group" on Us12Test earlier, and your cluster has been in this state since then.

      Some info about the relation between forced down and unmanage resource group: "Unmanage resource groups. The cluster services are stopped immediately. Resources that are online on the node are not stopped. Applications continue to run. This option is equivalent to the forced down option in previous releases. "

  3. Hi Balazs,

    Nice article keep it up !

    I found a typo on the blog .. the DCD path is incorrect ..
    An example:
    When you configure an HACMP cluster, configuration data is stored in HACMP specific object classes in the Configuration Database (ODM). The AIX ODM object classes are stored in the default system configuration directory (DCD), /etc/es/objrepos.

    Replies
    1. Hi Manoj, I'm not sure it is a typo; probably /etc/objrepos and /etc/es/objrepos are both valid:
      # ls /etc/es/objrepos
      HACMPcommadapter
      HACMPcommadapterOLD
      HACMPcommand
      HACMPcommandOLD
      ...

  4. help me to change snapshot default location
