HA - RSCT

RSCT (Reliable Scalable Cluster Technology)

RSCT, as its name says, is a cluster technology. It ships with AIX by default (no additional installation is needed) and consists of several low-level components (daemons and subsystems). These components create a basic cluster environment (nodes, heartbeats between the nodes, and so on), which RSCT monitors. If a node crashes, an event is generated and RSCT informs its RSCT-aware clients. (PowerHA, or more precisely its cluster manager, clstrmgrES, is itself an RSCT-aware client.) Historically RSCT was a separate product, but starting with AIX 5.1 it has been shipped with the operating system. On AIX 7.2 the RSCT fileset version is 3.2. The RSCT filesets, whose names look like rsct..., can be listed (lslpp -l | grep rsct) and removed. By comparison, CAA is built into AIX so deeply that it is part of the Base Operating System (CAA is contained in the bos.cluster... filesets).

The key point is that RSCT provides services, such as cluster monitoring, that PowerHA uses, and PowerHA in turn provides "high availability services" to applications. For example, to respond to an unexpected event, PowerHA must first know that it has occurred; monitoring for such failures is the job of RSCT. Besides PowerHA, other RSCT-aware clients include GPFS, SSP and the HMC.

RSCT’s role in a PowerHA cluster is to provide:
- Failure detection and diagnosis for topology components (nodes, networks, and network adapters)
- Notification to the cluster manager of events
- Coordination of the recovery actions (fallovers, fallbacks and dealing with individual NIC failures by moving or swapping IP addresses)

We can use the ctversion command (or lslpp) to find out which version of RSCT is running on a particular AIX system:
# /opt/rsct/install/bin/ctversion
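
As a small sketch of the check above, the following shell function prints the RSCT level with ctversion and, where lslpp is available, lists the installed rsct filesets. The function name rsct_level is made up for this example, and the guards are only there so that the snippet fails quietly on a system without RSCT.

```shell
# Sketch only: rsct_level is a hypothetical helper name, not an RSCT command.
rsct_level() {
    ctv=/opt/rsct/install/bin/ctversion
    if [ -x "$ctv" ]; then
        "$ctv"                          # prints the RSCT level
    else
        echo "ctversion not found at $ctv (is RSCT installed?)"
    fi
    # List the installed RSCT filesets, if lslpp exists on this system.
    if command -v lslpp >/dev/null 2>&1; then
        lslpp -l "rsct.*"
    fi
    return 0
}

rsct_level
```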


==================================

RSCT components



The main RSCT components are:
Resource:  A resource is the fundamental concept of the RSCT architecture; it is an instance of a physical or logical entity. Examples of resources include lv01 on node A, Ethernet device en0 on node B, and IP address 9.117.7.21. A set of resources that have similar characteristics is called a resource class.
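
Resources and resource classes can be inspected with the lsrsrc command. The sketch below assumes a node where the IBM.FileSystem resource class is available and uses /tmp purely as an illustration; the function name show_resources is hypothetical.

```shell
# Sketch only: show_resources is a hypothetical helper name.
show_resources() {
    if ! command -v lsrsrc >/dev/null 2>&1; then
        echo "lsrsrc not found; RSCT does not appear to be installed"
        return 0
    fi
    lsrsrc                                  # names of all resource classes
    lsrsrc IBM.FileSystem Name              # one resource per file system
    # Dynamic attributes (such as space usage) of a single resource;
    # /tmp is just an example file system.
    lsrsrc -A d -s 'Name == "/tmp"' IBM.FileSystem
    return 0
}

show_resources
```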

Resource Monitoring and Control (RMC): This is the main component in RSCT. RMC can be configured to monitor resources (disk space, CPU usage, processes etc.) and perform an action in response to a defined condition. For example, using the RMC API or CLI it is possible to create conditions or events which will automatically expand a file system if its usage exceeds 95 percent. These events are created based on the messages received from the Resource Managers. RMC also coordinates between the various RSCT components.
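
The file system example can be sketched with the RMC command line (mkcondition, mkresponse, startcondresp). Everything named here is an assumption made for illustration: the condition and response names, the 85 percent rearm threshold, the choice of /tmp, and above all the action script /usr/local/bin/grow_fs.sh, which is hypothetical and would have to be written separately. The snippet only prints the commands unless DRY_RUN is set to 0.

```shell
# Sketch only. With DRY_RUN=1 (the default) the commands are printed, not run.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_fs_monitor() {
    cond="tmp usage above 95"            # hypothetical condition name
    resp="grow tmp"                      # hypothetical response name
    script="/usr/local/bin/grow_fs.sh"   # hypothetical script, not part of RSCT

    # Condition: fires when /tmp is more than 95% full and rearms
    # once usage drops below 85% (the rearm value is an assumption).
    run mkcondition -r IBM.FileSystem \
        -e "PercentTotUsed > 95" -E "PercentTotUsed < 85" \
        -s 'Name == "/tmp"' "$cond"

    # Response: a single action that runs the script when the event occurs.
    run mkresponse -n "expand file system" -s "$script" "$resp"

    # Link the response to the condition and start monitoring.
    run startcondresp "$cond" "$resp"
}

setup_fs_monitor
```

Running it with DRY_RUN=0 on a test node would create the condition and response; lscondresp can then be used to check that monitoring is active.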

Resource Managers (RMs): Resource Managers are software layers between a resource (for example a file system) and RMC. RMs are managed by RMC. They run the actual commands against each resource and send the resulting data to RMC. Examples include the File System Resource Manager, Host Resource Manager, Audit Log Resource Manager, Event Response Resource Manager ...

Security Services: This provides the security infrastructure that enables RSCT components to authenticate the identity of other parties. (These days only RMC and the Resource Managers use the RSCT security services.)

Group Services: Group Services is responsible for coordinating and monitoring changes across all cluster nodes and for ensuring that all of them finish properly. Group Services is a client of RMC and CAA (or Topology Services). In a PowerHA setup, CAA (Topology Services in earlier PowerHA versions) sends information to Group Services, which reports failures to the PowerHA Cluster Manager (clstrmgrES). From the Group Services point of view, the Cluster Manager is the "application running on multiple nodes". The Cluster Manager then makes a cluster-wide coordinated response to the failure. (The PowerHA Cluster Manager is an RSCT client and registers itself with both RMC and Group Services. After an event has been reported to it, the Cluster Manager responds with recovery commands and event scripts. These scripts are coordinated through Group Services.)

Topology Services: This provides node and network monitoring and failure detection (heartbeats). It builds heartbeat rings in order to detect important events and report them to Group Services, which then reports them to the Cluster Manager. In a heartbeat ring, each Topology Services daemon sends a heartbeat message to one of its neighbors and expects to receive a heartbeat from another, so each member monitors one of its neighbors. If that neighbor stops responding, the member monitoring it sends a message to the "group leader". Topology Services is also responsible for transmitting any RSCT-related messages between cluster nodes. From PowerHA 7.1.0 onward, RSCT Topology Services is deactivated and all of its functions are performed by the CAA topology services.

==================================

RSCT domains

RSCT can provide two types of "clusters", which RSCT terminology calls domains. Depending on the relationship between the nodes (whether all of them are on an equal level or one of them is a special control node), there are two kinds of RSCT domain: the management domain and the peer domain.

Management Domain: (set of nodes that is configured for manageability or monitoring)
An RSCT management domain is a set of nodes that can be managed and monitored from one of the nodes, which is designated as the management control point (MCP). All nodes other than the MCP are considered managed nodes. Topology Services and Group Services are not used in a management domain.

Peer Domain: (set of nodes that is configured for high availability)
An RSCT peer domain is a set of nodes that know about each other and share resources with each other. On each node within the peer domain, RMC depends on Topology Services, Group Services, and cluster security services. If PowerHA V7 is installed, Topology Services is not used, and CAA is used instead.

The general difference between them is the relationship between the nodes. In a peer domain, all nodes are considered equal and any node can monitor and control, or be monitored and controlled by, any other node. In a management domain, the management node is aware of all the nodes it is managing, but the managed nodes themselves know nothing of each other.

Combination of management and peer domains
We can have a combination of management domains and peer domains. As an example, consider one Hardware Management Console (HMC) that is managing three LPARs. The HMC together with Node A, Node B and Node C forms a management domain. In addition, PowerHA is installed on Node B and Node C, so these two nodes form a peer domain as well. In a Power Systems environment, the HMC is always the management server (MCP) in the RSCT management domain. LPARs are automatically configured as managed nodes.
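
On a node that is in both kinds of domain, each membership can be looked at separately. This is a rough sketch, assuming an RSCT level that provides the IBM.MCP resource class on managed nodes; the function name show_domains is hypothetical.

```shell
# Sketch only: show_domains is a hypothetical helper name.
show_domains() {
    if ! command -v lsrpdomain >/dev/null 2>&1; then
        echo "RSCT commands not found on this system"
        return 0
    fi
    echo "== peer domain =="
    lsrpdomain          # peer domain defined on this node, with its state
    echo "== management domain =="
    # Management control points known to this managed node
    # (assumption: the IBM.MCP class exists at this RSCT level).
    lsrsrc IBM.MCP
    return 0
}

show_domains
```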



==================================

RSCT and CAA


Cluster Aware AIX (CAA) introduces clustering capabilities to AIX (setup of a cluster, detecting the state of nodes and interfaces). When RSCT operates on nodes in a CAA cluster, a peer domain is created that is equivalent to the CAA cluster, and can be used to manage the cluster by using peer domain commands. 

Only one CAA cluster can be defined on a set of nodes. Therefore, if a CAA cluster is defined then the peer domain that represents it is the only peer domain which can exist there. If no CAA cluster is configured, then existing and new peer domains can also be used. 

A CAA cluster and the equivalent RSCT peer domain operate hand in hand: a change made to the CAA cluster with CAA commands is reflected automatically in the corresponding peer domain, and the existing peer domain commands result in equivalent changes to the CAA cluster. So, for example, when you create a CAA cluster with the mkcluster command, the equivalent peer domain is created as well, just as if the RSCT mkrpdomain command had been used. Similarly, node add and delete operations made with either peer domain or cluster commands are applied to both the CAA cluster and the peer domain.
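
One way to see this equivalence is to compare the CAA view of the cluster with the RSCT view of the peer domain on the same node. This is a minimal sketch; the function name compare_caa_and_rpd is made up, and no sample output is shown because it depends on the cluster.

```shell
# Sketch only: compare_caa_and_rpd is a hypothetical helper name.
compare_caa_and_rpd() {
    if ! command -v lscluster >/dev/null 2>&1; then
        echo "lscluster not found; this does not look like a CAA-capable AIX node"
        return 0
    fi
    echo "== CAA view =="
    lscluster -m        # cluster nodes and their state, as seen by CAA
    echo "== RSCT view =="
    lsrpdomain          # the peer domain that corresponds to the CAA cluster
    lsrpnode            # nodes of that peer domain
    return 0
}

compare_caa_and_rpd
```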

Starting with RSCT version 3.1.0.0, the Group Services subsystem can operate in a Cluster Aware AIX (CAA) environment. In this environment, Group Services relies on CAA to provide node and adapter liveness information and node-to-node communication, which removes its dependency on RSCT Topology Services. Instead of connecting to the Topology Services daemon, it gets its information directly from the low-level cluster services in the CAA environment.

RSCT version 3.1.2.0, or later, can be installed on the nodes and can coexist with prior RSCT releases. Because CAA delivers fundamental node and interface liveness information, the Topology Services subsystem is not active in a peer domain based on CAA. 
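
Which of these subsystems are actually running can be checked through the System Resource Controller. The sketch below assumes the usual peer domain subsystem names: ctrmc for RMC, cthags for Group Services and cthats for Topology Services. In a CAA-based peer domain one would expect cthats not to be active, but the exact status text is not reproduced here. The function name show_rsct_subsystems is hypothetical.

```shell
# Sketch only: show_rsct_subsystems is a hypothetical helper name.
show_rsct_subsystems() {
    if ! command -v lssrc >/dev/null 2>&1; then
        echo "lssrc not found; the AIX System Resource Controller is not available here"
        return 0
    fi
    # ctrmc = RMC, cthags = Group Services, cthats = Topology Services
    for s in ctrmc cthags cthats; do
        lssrc -s "$s"   # short status line for one subsystem
    done
    return 0
}

show_rsct_subsystems
```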

===========================

Commands:

lssrc -ls cthags                         shows details of the RSCT Group Services subsystem (cthags)
lssrc -ls IBM.StorageRM                  shows details of the Storage Resource Manager (IBM.StorageRM)
lssrc -ls IBM.ConfigRM                   shows details of the Configuration Resource Manager (IBM.ConfigRM)

/opt/rsct/install/bin/ctversion          checking RSCT version
