AIX for System Administrators: HA

Resource, Resource Group

Resources: File systems, service IPs, applications, and so on, which are made highly available (they can be moved from one node to another).
Resource Group (RG): A set of resources that are grouped together and moved together during a fallover.
Default node priority: The order in which the nodes are defined in the RG. An RG with default attributes will move from node to node in this order as each node fails.
Home node: The highest priority node in the default node list. The RG will initially be activated there. (It is not necessarily the node where the RG is currently active.)
Fallover: The process of moving a RG that is online on one node to another node in the cluster in response to an event.
Fallback: The process of moving a RG that is currently online on a node that is not its home node, to a re-integrating node.
Node failure: If a node fails, the RGs that were active on that node are distributed among the other nodes in the cluster, depending on their fallover policies.
Node recovery: When a node recovers and is reintegrated into the cluster, RGs can be reacquired depending on their fallback policies.

------------------------------------

Resource group (RG)

Resource groups allow PowerHA to manage resources as a single entity. For example, an application can consist of start and stop scripts, a database, and an IP address. These resources are then included in a resource group for PowerHA to control as a single entity. PowerHA ensures that resource groups remain highly available by moving them from node to node.

Resource group states:
- Online: The RG is currently operating properly on one or more nodes in the cluster.
- Offline: The RG is not operating and is currently not in an error condition (the user may have requested this state, or dependencies were not met).
- Acquiring: The RG is currently coming up on a node. Under normal conditions, the status then changes to Online.
- Releasing: The RG is in the process of being released (going down). Under normal conditions, once released, the status changes to Offline.
- Error: The resource group has reported an error condition. User interaction is required.
- Unknown: The RG's current status cannot be obtained, possibly because of a loss of communication or because a resource group dependency is not met.

Each node that joins the cluster automatically attempts to bring online any of the resource groups that are in the ERROR state.

Start up options:
- Online on home node only: The RG is brought online when its home node joins the cluster. If the home node is not available, the RG stays offline.
- Online on first available node: The RG is brought online when the first node in its node list joins the cluster.
- Online on all available nodes: The RG is brought online on all nodes in its node list as they join the cluster.
- Online using distribution policy: The RG is brought online only if the node has no other resource group of this type already online.
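
The startup options above can be read as a decision that is made each time a node joins the cluster. The Python sketch below only illustrates that logic and is not PowerHA code: the function name should_start, the policy strings, and the parameters are all hypothetical.

```python
def should_start(policy, joining_node, node_list,
                 rg_online_elsewhere=False, rgs_online_on_node=0):
    """Return True if the RG should be brought online on joining_node.

    node_list is the RG's node list in priority order (first entry = home node).
    rgs_online_on_node counts RGs with the distribution policy that are
    already online on the joining node. All names here are illustrative.
    """
    if joining_node not in node_list:
        return False
    if policy == "home_node_only":
        return joining_node == node_list[0]
    if policy == "first_available_node":
        return not rg_online_elsewhere
    if policy == "all_available_nodes":
        return True  # the RG is activated on every node of its list
    if policy == "distribution":
        return not rg_online_elsewhere and rgs_online_on_node == 0
    raise ValueError(f"unknown startup policy: {policy}")


nodes = ["nodeA", "nodeB"]  # nodeA is the home node
print(should_start("home_node_only", "nodeB", nodes))        # False
print(should_start("first_available_node", "nodeB", nodes))  # True
```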

Fallover options:
- Fall over to next priority node in list: The RG falls over to the next node in the resource group node list.
- Fallover using dynamic node priority: The RG is acquired by the node that best meets a chosen criterion, for example the most free memory or the most free CPU. A user-defined script is also possible.
- Bring offline (on error node only): The RG is brought offline in the event of an error. This option is designed for RGs that are online on all available nodes.
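
As a rough illustration of the three fallover options, the sketch below picks a takeover node. It assumes that "next priority node" means the highest priority node that is still active, and it uses free memory as the only dynamic criterion. The function fallover_target, the policy strings, and the free_memory mapping are hypothetical and not part of PowerHA.

```python
def fallover_target(policy, node_list, failed_node, active_nodes, free_memory=None):
    """Return the node that should acquire the RG, or None if it goes offline."""
    if policy == "bring_offline":
        return None  # the RG is simply taken offline on the failing node
    # Candidates keep the priority order of the RG's node list.
    candidates = [n for n in node_list if n in active_nodes and n != failed_node]
    if not candidates:
        return None
    if policy == "next_priority":
        return candidates[0]  # assumption: highest priority surviving node
    if policy == "dynamic":
        memory = free_memory or {}
        return max(candidates, key=lambda n: memory.get(n, 0))
    raise ValueError(f"unknown fallover policy: {policy}")


nodes = ["nodeA", "nodeB", "nodeC"]
alive = {"nodeB", "nodeC"}
print(fallover_target("next_priority", nodes, "nodeA", alive))                    # nodeB
print(fallover_target("dynamic", nodes, "nodeA", alive, {"nodeB": 2, "nodeC": 8}))  # nodeC
```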

Fallback options (when a node rejoins the cluster):
- Fall back to higher priority node in list: The RG falls back to a higher priority node when that node joins the cluster.
- Never fall back: The RG does not move if a higher priority node joins the cluster. RGs with "online on all available nodes" must be configured with this option.
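
The two fallback options can be sketched in the same way. This is a minimal illustration with hypothetical names (fallback_target and the policy strings), where a lower index in node_list means a higher priority.

```python
def fallback_target(policy, node_list, current_node, joining_node):
    """Return the node on which the RG should run after joining_node integrates."""
    if policy == "never":
        return current_node
    if policy == "higher_priority":
        # Move only if the joining node ranks above the node currently hosting the RG.
        if node_list.index(joining_node) < node_list.index(current_node):
            return joining_node
        return current_node
    raise ValueError(f"unknown fallback policy: {policy}")


nodes = ["nodeA", "nodeB", "nodeC"]
print(fallback_target("higher_priority", nodes, "nodeB", "nodeA"))  # nodeA
print(fallback_target("never", nodes, "nodeB", "nodeA"))            # nodeB
```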

------------------------------------

Resource Group attributes during Startup, Fallover, Fallback

Settling time
If an RG uses the "online on first available node" startup option, the settling time ensures that the RG does not start on an early-integrating node that is low in its priority list and then keep moving to higher priority nodes as they integrate. If a settling time is set for a resource group and the node that integrates into the cluster is its highest priority node, the RG goes online immediately; otherwise, it waits for the settling time to see whether a higher priority node joins.
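
A small simulation may make the settling time easier to follow. It assumes the settling window is counted from the moment the first node of the RG's node list joins, which is a simplification. The function startup_placement and its arguments are hypothetical, and times are plain seconds.

```python
def startup_placement(node_list, join_times, settling_time):
    """Return (node, start_time) for the RG, or None if no node has joined.

    node_list: RG node list in priority order (first entry = home node).
    join_times: dict mapping node name to the time (seconds) it joined.
    """
    joined = {n: t for n, t in join_times.items() if n in node_list}
    if not joined:
        return None
    home = node_list[0]
    deadline = min(joined.values()) + settling_time
    # Nodes that integrated before the settling time expired, in priority order.
    in_window = [n for n in node_list if n in joined and joined[n] <= deadline]
    chosen = in_window[0]
    # The highest priority node starts the RG at once; any other node waits.
    start_time = joined[home] if chosen == home else deadline
    return chosen, start_time


nodes = ["nodeA", "nodeB", "nodeC"]
print(startup_placement(nodes, {"nodeC": 0, "nodeB": 30}, 60))  # ('nodeB', 60)
print(startup_placement(nodes, {"nodeC": 0, "nodeA": 20}, 60))  # ('nodeA', 20)
```

Without a settling time (0 seconds) the first example would start the RG on nodeC immediately.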

Delayed fallback timers
Configure when a fallback operation takes place: daily, weekly, monthly, or yearly. The fallback happens at the configured time.

Distribution policy
This node-based distribution policy ensures that on cluster startup, each node will acquire only one resource group with this policy set.

Resource group processing order
If a node is attempting to bring more than one resource group online, the default behavior is to merge all the resources into one large resource group and then process them as one "resource group". This is called parallel processing, although it is not true parallel processing because it is single-threaded. This default behavior can be altered: serial processing can be specified for particular resource groups by defining a serial acquisition list. This list defines only the order of processing on a particular node, not across nodes. If serial processing is set, the specified RGs are processed in order, while RGs containing only NFS mounts are processed in parallel. The reverse order is used on release.
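
The relation between the serial acquisition list and the release order can be shown with a few lines of Python. This is only a sketch: processing_order and the RG names are hypothetical, and RGs that are not in the serial list are simply reported as "parallel".

```python
def processing_order(rgs_on_node, serial_acquisition_list):
    """Split the RGs handled by one node into serial and parallel sets."""
    # Only RGs that this node actually handles are relevant.
    serial = [rg for rg in serial_acquisition_list if rg in rgs_on_node]
    parallel = sorted(set(rgs_on_node) - set(serial))
    return {
        "acquire_serial": serial,
        "release_serial": list(reversed(serial)),  # reverse order on release
        "parallel": parallel,
    }


order = processing_order(["rgA", "rgB", "rgC", "rgD"], ["rgC", "rgA"])
print(order["acquire_serial"])  # ['rgC', 'rgA']
print(order["release_serial"])  # ['rgA', 'rgC']
print(order["parallel"])        # ['rgB', 'rgD']
```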

------------------------------------

Resource group dependencies

An example of an RG dependency is a database that must be online before the application server is started. If the database goes down and falls over to a different node, the RG that contains the application server is also brought down and then back up on any of the available cluster nodes. If the fallover of the database RG is not successful, both RGs (database and application) are put offline.

Resource group dependencies (combination of two out of three types of RG dependency can be set):
- Parent/child dependency: an RG cannot be started until a particular RG is already active
- Location dependency: certain RGs will be always online on the same node or on different nodes
- Start/stop after dependency: similar to parent/child dependency, but based on the setting during start or stop RGs can be processed together

Parent/Child dependency:
A parent/child dependency allows binding resource groups in a hierarchical manner. A child resource group depends on a parent resource group. The parent resource group must be online before any of its children can be brought online. If the parent resource group is to be taken offline, the children must be taken offline first. There can be only three levels of dependency for resource groups. A resource group can act both as a parent and as a child. You cannot specify circular dependencies among resource groups. It is important to have startup application monitors for the parents: once the startup application monitor has confirmed that the application started successfully, processing of the child resource groups can continue.
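
The rules above (parents first, no circular dependencies, at most three levels) amount to a topological sort with a depth check. The sketch below uses Python's standard graphlib module (Python 3.9 or later) to illustrate this. It is not how PowerHA implements the check, and dependency_levels and the RG names are hypothetical.

```python
from graphlib import TopologicalSorter, CycleError

MAX_LEVELS = 3  # limit stated in the text above


def dependency_levels(parent_child_pairs):
    """Map each RG to its level (1 = top-level parent).

    Raises CycleError for circular dependencies and ValueError if the
    hierarchy is deeper than MAX_LEVELS.
    """
    parents = {}
    for parent, child in parent_child_pairs:
        parents.setdefault(parent, set())
        parents.setdefault(child, set()).add(parent)
    levels = {}
    # static_order() yields every RG after all of its parents.
    for rg in TopologicalSorter(parents).static_order():
        levels[rg] = 1 + max((levels[p] for p in parents[rg]), default=0)
    if max(levels.values(), default=0) > MAX_LEVELS:
        raise ValueError("more than three levels of dependency")
    return levels


levels = dependency_levels([("rg1", "rg2"), ("rg1", "rg3")])
start_order = sorted(levels, key=lambda rg: (levels[rg], rg))
print(start_order)                  # ['rg1', 'rg2', 'rg3']
print(list(reversed(start_order)))  # stop order: children first
```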

Location dependency
It ensures that RGs are always online on the same node or on different nodes (or sites).
- Online on same node: An RG can be brought online on the node where other RGs in the same set are already online
- Online on different nodes: The specified RGs will be distributed on different nodes.
- Online on same site: A RG can only be brought online on a site where other RGs with this dependency are currently in an online state
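
To make the node-based location dependencies more concrete, the following sketch checks a given placement of RGs against "same node" and "different nodes" sets. The function location_violations, the RG names, and the data layout are hypothetical, and site dependencies are left out.

```python
def location_violations(placement, same_node_sets=(), different_node_sets=()):
    """Return a list of violated location dependencies.

    placement maps each online RG to the node it runs on; offline RGs
    are simply absent from the mapping.
    """
    violations = []
    for group in same_node_sets:
        nodes = {placement[rg] for rg in group if rg in placement}
        if len(nodes) > 1:  # members of the set are spread over several nodes
            violations.append(("same_node", tuple(group)))
    for group in different_node_sets:
        nodes = [placement[rg] for rg in group if rg in placement]
        if len(nodes) != len(set(nodes)):  # two members share a node
            violations.append(("different_nodes", tuple(group)))
    return violations


placement = {"rg_db": "nodeA", "rg_app": "nodeA", "rg_test": "nodeA"}
print(location_violations(placement,
                          same_node_sets=[["rg_db", "rg_app"]],
                          different_node_sets=[["rg_db", "rg_test"]]))
# [('different_nodes', ('rg_db', 'rg_test'))]
```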

Start/Stop after dependency
- Start after dependency: The target RG must be online before a source (dependent) RG can be activated. There is no dependency when releasing RGs; they are released in parallel.
- Stop after dependency: The target RG must be offline before a source (dependent) RG can be brought offline. There is no dependency when acquiring RGs; they are acquired in parallel.
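
The two rules can be expressed as simple checks against the set of RGs that are currently online. This is an illustrative sketch only: can_start, can_stop, and the mappings from source RG to target RGs are hypothetical names.

```python
def can_start(rg, online, start_after):
    """A source RG may start only when all its start-after targets are online."""
    return all(target in online for target in start_after.get(rg, ()))


def can_stop(rg, online, stop_after):
    """A source RG may stop only when all its stop-after targets are offline."""
    return all(target not in online for target in stop_after.get(rg, ()))


start_after = {"rg_app": ["rg_db"]}  # rg_app starts after rg_db
stop_after = {"rg_db": ["rg_app"]}   # rg_db stops after rg_app

print(can_start("rg_app", set(), start_after))              # False
print(can_start("rg_app", {"rg_db"}, start_after))          # True
print(can_stop("rg_db", {"rg_db", "rg_app"}, stop_after))   # False
```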

Set or display the RG dependencies (clrgdependency):
# clrgdependency -t [PARENT_CHILD | NODECOLLOCATION | ANTICOLLOCATION | SITECOLLOCATION] -sl
# clrgdependency -t PARENT_CHILD -sl
#Parent Child
rg1 rg2
rg1 rg3

Another way to check is by using the odmget HACMPrg_loc_dependency command.
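
For scripting, the PARENT_CHILD listing can be turned into a simple mapping. The sketch below parses text in the two-column layout shown in the example output above. It assumes that header lines start with "#" and that the columns are separated by whitespace. The helper parse_parent_child is hypothetical, and the sample string stands in for real command output.

```python
def parse_parent_child(output):
    """Return {parent: [children]} from a two-column parent/child listing."""
    children = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "#Parent Child" header
        parent, child = line.split()
        children.setdefault(parent, []).append(child)
    return children


sample = """#Parent Child
rg1 rg2
rg1 rg3
"""
print(parse_parent_child(sample))  # {'rg1': ['rg2', 'rg3']}
```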

------------------------------------
