
Chapter 5: Ensuring Application Availability


This chapter describes how the HACMP software ensures application availability by ensuring the availability of cluster components. HACMP eliminates single points of failure for all key system components, and eliminates the need for scheduled downtime for most routine cluster maintenance tasks.

This chapter covers the following topics:

  • Eliminating Single Points of Failure in an HACMP Cluster
  • Minimizing Scheduled Downtime with HACMP
  • Starting Cluster Services without Stopping Applications
  • Minimizing Takeover Time: Fast Disk Takeover
  • Maximizing Disaster Recovery
  • Cluster Events.
    Overview

    The key facet of a highly available cluster is its ability to detect and respond to changes that could interrupt the essential services it provides. The HACMP software allows a cluster to continue to provide application services critical to an installation even though a key system component—a network interface card, for example—is no longer available. When a component becomes unavailable, the HACMP software is able to detect the loss and shift the workload from that component to another component in the cluster. In planning a highly available cluster, you attempt to ensure that key components do not become single points of failure.

    In addition, HACMP software allows a cluster to continue providing application services while routine maintenance tasks are performed using a process called dynamic reconfiguration. In dynamic reconfiguration, you can change components in a running cluster, such as adding or removing a node or network interface, without having to stop and restart cluster services. The changed configuration becomes the active configuration dynamically. You can also dynamically replace a failed disk.

    The following sections describe conceptually how to use the HACMP software to:

  • Eliminate single points of failure in a cluster.
  • Minimize scheduled downtime in an HACMP cluster with the dynamic reconfiguration, resource group management, and cluster management (C-SPOC) utilities.
  • Minimize unscheduled downtime with the fast recovery feature, and by specifying a delayed fallback timer policy for resource groups.
  • Minimize the time it takes to perform disk takeover.
  • Interpret and emulate cluster events.
    Note: You may need to monitor cluster activity while a key component fails and the cluster continues to provide application availability. For more information on the monitoring and diagnostic tools you can use, see Chapter 7: HACMP Configuration Process and Facilities.

    Eliminating Single Points of Failure in an HACMP Cluster

    The HACMP software enables you to build clusters that are both highly available and scalable by eliminating single points of failure (SPOF). A single point of failure exists when a critical cluster function is provided by a single component. If that component fails, the cluster has no other way to provide that function and essential services become unavailable.

    For example, if all the data for a critical application resides on a single disk that is not mirrored, and that disk fails, the disk is a single point of failure for the entire system. Client nodes cannot access that application until the data on the disk is restored.

    Potential Single Points of Failure in an HACMP Cluster

    HACMP provides recovery options for the following cluster components:

  • Nodes
  • Applications
  • Networks and network interfaces
  • Disks and disk adapters.
    To be highly available, a cluster must have no single point of failure. While the goal is to eliminate all single points of failure, compromises may have to be made. There is usually a cost associated with eliminating a single point of failure. For example, redundant hardware increases cost. The cost of eliminating a single point of failure should be compared to the cost of losing services should that component fail. The purpose of the HACMP software is to provide a cost-effective, highly available computing environment that can grow to meet future processing demands.

    Eliminating Nodes as a Single Point of Failure

    Nodes leave the cluster either through a planned transition (a node shutdown or stopping cluster services on a node), or because of a failure.

    Node failure begins when a node monitoring a neighbor node ceases to receive heartbeat traffic for a defined period of time. If the other cluster nodes agree that the failure is a node failure, the failing node is removed from the cluster and its resources are taken over by the nodes configured to do so. An active node may, for example, take control of the shared disks configured on the failed node. Or, an active node may masquerade as the failed node (by acquiring its service IP address) and run the processes of the failed node while still maintaining its own processes. Thus, client applications can switch over to a surviving node for shared-disk and processor services.

    The HACMP software provides the following facilities for processing node failure:

  • Disk takeover
  • IP Address Takeover via IP Aliases
  • IP Address Takeover via IP Replacement (with or without Hardware Address Takeover).
    Disk Takeover

    In an HACMP environment, shared disks are physically connected to multiple nodes.

    Disk Takeover in Concurrent Environments

    In concurrent access configurations, the shared disks are actively connected to multiple nodes at the same time. Therefore, disk takeover is not required when a node leaves the cluster. The following figures illustrate disk takeover in concurrent environments.

    Concurrent Access Configuration before Disk Takeover 
     



    Concurrent Access Configuration after Disk Takeover

    Fast Disk Takeover

    In the case of a cluster failure, enhanced concurrent volume groups are taken over faster than in previous releases of HACMP due to the improved disk takeover mechanism. HACMP automatically detects enhanced concurrent volume groups and ensures that the faster option for volume group takeover is launched in the event of a node failure. For more information, see Minimizing Takeover Time: Fast Disk Takeover in this chapter.

    Disk Takeover in Non-Concurrent Environments

    In non-concurrent environments, only one connection is active at any given time, and the node with the active connection owns the disk. Disk takeover occurs when the node that currently owns the disk leaves the cluster and an active node assumes control of the shared disk so that it remains available. Note, however, that shared filesystems can be exported and NFS cross-mounted by other cluster nodes that are under the control of HACMP.

    The cl_export_fs utility can use the optional /usr/es/sbin/cluster/etc/exports file instead of the standard /etc/exports file for determining export options.
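    For example (the filesystem, node, and client names below are placeholders), an entry in this optional exports file uses the same syntax as a standard /etc/exports entry:

        # Hypothetical entry for a shared filesystem: root access for both
        # cluster nodes, read-write access for two client hosts
        /sharedfs1 -root=nodea:nodeb,access=clienta:clientb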

    IP Address Takeover

    IP address takeover (IPAT) is a networking capability that allows a node to acquire the network address of a node that has left the cluster. IP address takeover is necessary in an HACMP cluster when a service being provided to clients is bound to a specific IP address, that is, when a service IP label through which services are provided to the clients is included as a resource in a cluster resource group. If, instead of performing an IPAT, a surviving node simply did a disk and application takeover, clients would not be able to continue using the application at the specified server IP address.

    HACMP uses two types of IPAT:

  • IPAT via IP Aliases (the default)
  • IPAT via IP Replacement.
    For more information on each type, see IP Address Takeover via IP Aliases and IP Address Takeover via IP Replacement in Chapter 2: HACMP Cluster Nodes, Sites, Networks, and Heartbeating.
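    As an illustration of the difference only (HACMP performs these operations itself during takeover; the interface name and addresses below are placeholder values, and you would not normally run these commands by hand on an interface that HACMP manages):

        # IPAT via IP Aliases: the service IP address is added as an alias,
        # alongside the base address already configured on the interface
        ifconfig en1 alias 192.168.10.50 netmask 255.255.255.0 up

        # IPAT via IP Replacement: the takeover interface's base address is
        # replaced with the service IP address
        ifconfig en1 192.168.10.50 netmask 255.255.255.0 up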

    The following figures illustrate IP address takeover via IP Replacement.

    Configuration before IP Address Takeover via IP Replacement 
     



    Configuration after IP Address Takeover via IP Replacement
    Note: In HACMP on the RS/6000 SP, special considerations apply to IP address takeover on the SP Switch network. For more information, see the section Planning for IPAT with the SP Switch in Chapter 3: Planning Cluster Network Connectivity in the Planning Guide.

    Hardware Address Swapping and IP Address Takeover via IP Replacement

    Hardware address swapping works in conjunction with IP address takeover via IP Replacement. With hardware address swapping enabled, a node also assumes the hardware network address (in addition to the IP address) of a node that has failed so that it can provide the service that the failed node was providing to the client nodes in the cluster. Hardware address swapping is also referred to as hardware address takeover (HWAT).

    Without hardware address swapping, TCP/IP clients and routers that reside on the same subnet as the cluster nodes must have their Address Resolution Protocol (ARP) cache updated. The ARP cache contains a mapping of IP addresses to hardware addresses. The use of hardware address swapping is highly recommended for clients that cannot run the Clinfo daemon (machines not running AIX 5L) or that cannot easily update their ARP cache.
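    As an illustration, an administrator on such a client could verify and clear a stale entry manually; the service IP label name below is a placeholder:

        # Display the client's ARP cache (IP-to-hardware-address mappings)
        arp -a
        # Delete the stale entry so the address is re-resolved on next use
        arp -d appsvc1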

    Note: SP Switch networks do not support hardware address swapping. However, note that the SP switch networks can be configured so that their IP network interface cards update their ARP caches automatically when IP address takeover occurs. IP aliases are used in such cases.

    Keep in mind that when an IP address takeover occurs, the netmask of the physical network interface card on which a service IP label is configured is obtained by the network interface card on another node; thus, the netmask follows the service IP address.

    This means that with IPAT via IP Replacement, the netmask for all network interfaces in an HACMP network must be the same to avoid communication problems between network interfaces after an IP address takeover via IP Replacement, and during the subsequent release of the IP address acquired during takeover. The reasoning behind this requirement is as follows:

    Communication problems occur when the network interface card (NIC) on another node releases the service IP address. This NIC assumes its original address, but retains the netmask of the service IP address. This address reassignment causes the NIC on another node to function on a different subnet from other backup NICs in the network. This netmask change can cause changes in the broadcast address and the routing information such that other backup NICs may now be unable to communicate on the same logical network.

    Eliminating Applications as a Single Point of Failure

    The primary reason to create HACMP clusters is to provide a highly available environment for mission-critical applications. For example, an HACMP cluster could run a database server program that services client applications. The clients send queries to the server program that responds to their requests by accessing a database, stored on a shared external disk.

    In an HACMP cluster, these critical applications can be a single point of failure. To ensure the availability of these applications, the node configured to take over the resources of the node leaving the cluster should also restart these applications so that they remain available to client processes.

    You can make an application highly available by using:

  • An application server
  • Cluster control
  • Application monitors
  • Application Availability Analysis Tool.
    To put the application under HACMP control, you create an application server cluster resource that associates a user-defined server name with the names of user-provided scripts that start and stop the application. By defining an application server, HACMP can start another instance of the application on the takeover node when a fallover occurs.
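    The start and stop scripts themselves are entirely user-provided; HACMP only requires that they exist (and behave identically) on every node that can acquire the application and that they return a meaningful exit status. A minimal sketch, using hypothetical paths and a hypothetical database command:

        #!/bin/ksh
        # /usr/local/hacmp/start_appdb -- hypothetical application server start script
        su - dbadmin -c "/opt/appdb/bin/db_start"
        exit $?                     # a non-zero exit status indicates the start failed

        #!/bin/ksh
        # /usr/local/hacmp/stop_appdb -- hypothetical application server stop script
        su - dbadmin -c "/opt/appdb/bin/db_stop"
        exit $?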

    Certain applications can be made highly available without application servers. You can place such applications under cluster control by configuring an aspect of the application as part of a resource group. For example, Fast Connect services can all be added as resources to a cluster resource group, making them highly available in the event of node or network interface failure.

    Note: Application takeover is usually associated with IP address takeover. If the node restarting the application also acquires the IP service address on the failed node, the clients only need to reconnect to the same server IP address. If the IP address was not taken over, the client needs to connect to the new server to continue accessing the application.

    Additionally, you can use the AIX 5L System Resource Controller (SRC) to monitor for the presence or absence of an application daemon and to respond accordingly.
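    For example (the subsystem name, daemon path, and signal choices below are placeholders to adapt), a daemon can be registered with the SRC so that AIX 5L restarts it automatically if it terminates abnormally:

        # Define the daemon as an SRC subsystem: run as root (-u 0), restart it
        # automatically (-R), stop it with SIGTERM and force-stop with SIGKILL
        mkssys -s appdbd -p /opt/appdb/bin/appdbd -u 0 -R -S -n 15 -f 9

        # Start the subsystem and check its status under SRC control
        startsrc -s appdbd
        lssrc -s appdbd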

    Application Monitors

    You can also configure an application monitor to check for process failure or other application failures and automatically take action to restart the application.

    In HACMP 5.2 and up, you can configure multiple application monitors and associate them with one or more application servers. By supporting multiple monitors per application, HACMP can support more complex configurations. For example, you can configure one monitor for each instance of an Oracle parallel server in use. Or, you can configure a custom monitor to check the health of the database, and a process termination monitor to instantly detect termination of the database process.
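    A custom monitor method is a script or command whose exit status tells HACMP whether the application is healthy. A minimal sketch, assuming a hypothetical database health-check command:

        #!/bin/ksh
        # /usr/local/hacmp/monitor_appdb -- hypothetical custom application monitor
        if su - dbadmin -c "/opt/appdb/bin/db_ping" >/dev/null 2>&1
        then
            exit 0      # zero: application is healthy
        else
            exit 1      # non-zero: HACMP treats the application as failed
        fi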

    Application Availability Analysis Tool

    The Application Availability Analysis tool measures the exact amount of time that any of your applications have been available. The HACMP software collects, time-stamps, and logs extensive information about the applications you choose to monitor with this tool. Using SMIT, you can select a time period and the tool displays uptime and downtime statistics for a specific application during that period.

    Eliminating Communication Interfaces as a Single Point of Failure

    The HACMP software handles failures of network interfaces on which a service IP label is configured. Two types of such failures are:

  • Out of two network interfaces configured on a node, the network interface with a service IP label fails, but an additional “backup” network interface card remains available on the same node. In this case, the Cluster Manager swaps the roles of these two interface cards on that node. Such a network interface failure is transparent to you except for a small delay while the system reconfigures the network interface on a node.
  • Out of two network interfaces configured on a node, an additional, or a “backup” network interface fails, but the network interface with a service IP label configured on it remains available. In this case, the Cluster Manager detects a “backup” network interface failure, logs the event, and sends a message to the system console. If you want additional processing, you can customize the processing for this event.
    The following figures illustrate network interface swapping that occurs on the same node:

    Configuration before Network Adapter Swap 
     



    Configuration after Network Adapter Swap

    Hardware Address Swapping and Adapter Swapping

    Hardware address swapping works in conjunction with adapter swapping (as well as IP address takeover via IP Replacement). With hardware address swapping enabled, the “backup” network interface assumes the hardware network address (in addition to the IP address) of the failed network interface that had the service IP label configured on it so that it can provide the service that the failed network interface was providing to the cluster clients.

    Without hardware address swapping, TCP/IP clients and routers that reside on the same subnet as the cluster nodes must have their Address Resolution Protocol (ARP) cache updated. The ARP cache contains a mapping of IP addresses to hardware addresses. The use of hardware address swapping is highly recommended for clients that cannot run the Clinfo daemon (machines not running AIX 5L), or that cannot easily update their ARP cache.

    Note: SP Switch networks do not support hardware address swapping. However, note that the SP switch networks can be configured such that their IP network interfaces update their ARP caches automatically when IP Address Takeover via IP Aliases occurs. For more information, see the Administration Guide.

    Eliminating Networks as a Single Point of Failure

    Network failure occurs when an HACMP network fails for all the nodes in a cluster. This type of failure occurs when none of the cluster nodes can access each other using any of the network interface cards configured for a given HACMP network.

    The following figure illustrates a network failure:

    Network Failure 
    

    The HACMP software’s first line of defense against a network failure is to have the nodes in the cluster connected by multiple networks. If one network fails, the HACMP software uses a network that is still available for cluster traffic and for monitoring the status of the nodes.

    You can specify additional actions to process a network failure—for example, re-routing through an alternate network. Having at least two networks to guard against network failure is highly recommended.

    When a local network failure event occurs, the Cluster Manager takes selective recovery actions for resource groups containing a service IP label connected to that network. The Cluster Manager attempts to move only the resource groups affected by the local network failure event, rather than all resource groups on a particular node.

    Node Isolation and Partitioned Clusters

    Node isolation occurs when all networks connecting two or more parts of the cluster fail. Each group (one or more) of nodes is completely isolated from the other groups. A cluster in which certain groups of nodes are unable to communicate with other groups of nodes is a partitioned cluster.

    In the following illustration of a partitioned cluster, Node A and Node C are on one side of the partition and Node B and Node D are on the other side of the partition.

    Partitioned Cluster 
    

    The problem with a partitioned cluster is that the nodes on one side of the partition interpret the absence of heartbeats from the nodes on the other side of the partition to mean that those nodes have failed and then generate node failure events for those nodes. Once this occurs, nodes on each side of the cluster (if so configured) attempt to take over resources from a node that is still active and, therefore, still legitimately owns those resources. These attempted takeovers can cause unpredictable results in the cluster—for example, data corruption due to a disk being reset.

    Using Device-Based Networks to Prevent Partitioning

    To guard against the TCP/IP subsystem failure causing node isolation, each node in the cluster should be connected by a point-to-point non-IP-based network to its neighboring nodes, forming a logical “ring.” This logical ring of point-to-point networks reduces the chance of node isolation by allowing neighboring Cluster Managers to communicate even when all TCP/IP-based networks fail.

    You can configure two kinds of point-to-point, non-IP-based networks in HACMP:

  • Point-to-point networks, which use serial network interface cards and RS232 connections. Not all serial ports can be used for this function. For more information, see the Planning Guide.
  • Disk networks, which use a shared disk and a disk bus as a point-to-point network. Any disk that is included in an HACMP enhanced concurrent volume group can be used. (You can also use TM SSA or SCSI disks that are not included in an enhanced concurrent volume group).
    Point-to-point, device-based networks are especially important in concurrent access configurations so that data does not become corrupted when TCP/IP traffic among nodes is lost. Device-based networks do not carry TCP/IP communication between nodes; they only allow nodes to exchange heartbeats and control messages so that Cluster Managers have accurate information about the status of peer nodes.

    Using Global Networks to Prevent Partitioning

    You can also configure a “logical” global network that groups multiple networks of the same type. Global networks help to avoid node isolation when an HACMP cluster network fails.

    Eliminating Disks and Disk Adapters as a Single Point of Failure

    The HACMP software does not itself directly handle disk and disk adapter failures. Rather, AIX 5L handles these failures through LVM mirroring on disks and by internal data redundancy on the IBM 2105 ESS and SSA disks.

    For example, by configuring the system with multiple SCSI-3 chains, serial adapters, and then mirroring the disks across these chains, any single component in the disk subsystem (adapter, cabling, disks) can fail without causing unavailability of data on the disk.

    If you are using the IBM 2105 ESS and SSA disk arrays, the disk array itself is responsible for providing data redundancy.

    AIX Error Notification Facility

    The AIX Error Notification facility allows you to detect an event not specifically monitored by the HACMP software—a disk adapter failure, for example—and to program a response to the event.

    Permanent hardware errors on disk drives, controllers, or adapters can affect the fault resiliency of data. By monitoring these errors through error notification methods, you can assess the impact of a failure on the cluster’s ability to provide high availability. A simple implementation of error notification would be to send a mail message to the system administrator to investigate the problem further. A more complex implementation could include logic to analyze the failure and decide whether to continue processing, stop processing, or escalate the failure to a node failure and have the takeover node make the volume group resources available to clients.
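    For example, the simple mail-based approach can be implemented by adding an object to the AIX 5L errnotify object class. The stanza below is a sketch: the object name, resource name, and error class are assumptions to replace with values taken from your own error log entries.

        # notify_disk.add -- sketch of an error notification object
        errnotify:
                en_name = "diskadapter_mail"
                en_persistenceflg = 1
                en_class = "H"          # hardware error class
                en_resource = "ssa0"    # resource name as reported in the error log
                en_method = "errpt -a -l $1 | mail -s 'Disk subsystem error' root"

        # Add the object to the errnotify object class:
        #   odmadd notify_disk.add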

    It is strongly recommended that you implement an error notification method for all errors that affect the disk subsystem. Doing so ensures that degraded fault resiliency does not remain undetected.

    AIX error notification methods are automatically used in HACMP to monitor certain recoverable LVM errors, such as volume group loss errors.

    Automatic Error Notification

    You can automatically configure error notification for certain cluster resources using a specific option in SMIT. If you select this option, error notification is turned on automatically on all nodes in the cluster for particular devices.

    Certain non-recoverable error types are supported by automatic error notification: disk, disk adapter, and SP switch adapter errors. This feature does not support media errors, recovered errors, or temporary errors. One of two error notification methods is assigned for all error types supported by automatic error notification.

    In addition, if you add a volume group to a resource group, HACMP creates an AIX 5L Error Notification method for it. In the case where a volume group loses quorum, HACMP uses this method to selectively move the affected resource group to another node. Do not edit or alter the error notification methods that are generated by HACMP.

    Error Emulation

    The Error Emulation utility allows you to test your error notification methods by simulating an error. When the emulation is complete, you can check whether your customized notification method was exercised as intended.

    Minimizing Scheduled Downtime with HACMP

    The HACMP software enables you to perform most routine maintenance tasks on an active cluster dynamically—without having to stop and then restart cluster services to make the changed configuration the active configuration. Several features contribute to this:

  • Starting Cluster Services without Stopping Applications
  • Dynamic Automatic Reconfiguration (DARE)
  • Resource Group Management
  • Cluster Single Point of Control (C-SPOC)
  • Dynamic Adapter Swap
  • Automatic Verification and Synchronization.
    Starting Cluster Services without Stopping Applications

    In HACMP 5.4, you can start the HACMP cluster services on the node(s) without stopping your applications. For more information on configuring application monitoring and steps needed to start cluster services without stopping the applications, see the chapter on Starting and Stopping Cluster Services in the Administration Guide.

    Dynamic Automatic Reconfiguration (DARE)

    This process, called dynamic automatic reconfiguration or dynamic reconfiguration (DARE), is triggered when you synchronize the cluster configuration after making changes on an active cluster. Applying a cluster snapshot using SMIT also triggers a dynamic reconfiguration event.

    For example, to add a node to a running cluster, you simply connect the node to the cluster, add the node to the cluster topology on any of the existing cluster nodes, and synchronize the cluster. The new node is added to the cluster topology definition on all cluster nodes and the changed configuration becomes the currently active configuration. After the dynamic reconfiguration event completes, you can start cluster services on the new node.

    HACMP verifies the modified configuration before making it the currently active configuration to ensure that the changes you make result in a valid configuration.

    How Dynamic Reconfiguration Works

    To support dynamic reconfiguration of a running cluster, HACMP creates, whenever it starts, a private copy of the HACMP-specific object classes stored in the system default Object Data Model (ODM). In the remainder of this discussion, the ODM is referred to as the HACMP Configuration Database.

    Two directories store configuration database data:

  • The Active Configuration Directory (ACD), a private directory, stores the HACMP Configuration Database data for reference by all the HACMP daemons, scripts, and utilities on a running node.
  • The Default Configuration Directory (DCD), the system default directory, stores the HACMP configuration database data.
    Note: The operation of DARE is described here for completeness. No manual intervention is required to ensure that HACMP carries out these operations. HACMP correctly manages all dynamic reconfiguration operations in the cluster.

    The DCD is the directory named /etc/objrepos. This directory contains the default system object classes, such as the customized device database (CuDv) and the predefined device database (PdDv), as well as the HACMP-specific object classes. The ACD is /usr/es/sbin/cluster/etc/objrepos/active.

    Note: When you configure a cluster, you modify the HACMP configuration database data stored in the DCD—not data in the ACD. SMIT and other HACMP configuration utilities all modify the HACMP configuration database data in the DCD. In addition, all user commands that display HACMP configuration database data, such as the cllsif command, read data from the DCD.
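    For example, the same configuration database class can be read from either directory by pointing ODMDIR at it (HACMPcluster is used here only as a representative class name):

        # Read the stored default configuration from the DCD
        ODMDIR=/etc/objrepos odmget HACMPcluster

        # Read the configuration the running daemons reference from the ACD
        ODMDIR=/usr/es/sbin/cluster/etc/objrepos/active odmget HACMPcluster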

    The following figure illustrates how the HACMP daemons, scripts, and utilities all reference the ACD when accessing configuration information.

    Relationship of HACMP to ACD at Cluster Start-Up 
    

    Reconfiguring a Cluster Dynamically

    The HACMP software depends on the location of certain HACMP configuration database repositories to store configuration data. The presence or absence of these repositories is sometimes used to determine steps taken during cluster configuration and operation. The ODMPATH environment variable allows HACMP configuration database commands and subroutines to query locations other than the default location (held in the ODMDIR environment variable) if the queried object does not exist in the default location. You can set this variable, but it must not be set to include the /etc/objrepos directory or you will lose the integrity of the HACMP configuration information.

    To change the configuration of an active cluster, you modify the cluster definition stored in the HACMP-specific configuration database classes stored in the DCD using SMIT. When you change the cluster configuration in an active cluster, you use the same SMIT paths to make the changes, but the changes do not take effect immediately; therefore, you can make several changes in one operation. When you synchronize your configuration across all cluster nodes, a cluster-wide dynamic reconfiguration event occurs. When HACMP processes a dynamic reconfiguration event, it updates the HACMP configuration database object classes stored in the DCD on each cluster node and replaces the HACMP configuration database data stored in the ACD with the new data from the DCD, in a coordinated, cluster-wide transition. It also refreshes the cluster daemons so that they reference the new configuration data.

    During this processing, cluster heartbeating is briefly suspended and the cluster is in an unstable state. When the processing completes, the changed configuration becomes the active configuration. After cluster services are started on a newly added node, that node is automatically integrated into the cluster.

    The following figure illustrates the processing involved with adding a node to an active cluster using dynamic reconfiguration.

    Dynamic Reconfiguration Processing 
    

    The node to be added is connected to a running cluster, but cluster services are inactive on this node. The configuration is redefined on NodeA. When the changes to the configuration are synchronized, the HACMP configuration database data stored in the DCD on NodeA is copied to the DCDs on other cluster nodes and a dynamic reconfiguration event is triggered. HACMP copies the new HACMP configuration database data in the DCD into a temporary location on each node, called the Staging Configuration Directory (SCD). The location of the SCD is /usr/es/sbin/cluster/etc/objrepos/stage. By using this temporary location, HACMP allows you to start making additional configuration changes while a dynamic reconfiguration is in progress. Before copying the new HACMP configuration database data in the SCD over the current HACMP configuration database data in the ACD, HACMP verifies the new configuration.

    Note: You can initiate a second reconfiguration while a dynamic reconfiguration is in progress, but you cannot synchronize it. The presence of an SCD on any cluster node acts as a lock, preventing the initiation of a new dynamic reconfiguration.

    Resource Group Management

    You can use the Resource Group Management (clRGmove) utility to move resource groups to other cluster nodes (or sites) or take them online or offline without stopping cluster services. This gives you the flexibility for managing resource groups and their applications. You can also use this utility to free the node of any resource groups to perform system maintenance on a particular cluster node.

    In HACMP 5.4, the HACMP Resource Group Management utility, clRGmove, is significantly improved, making it easier to move resource groups around for cluster management. You also have a clear and easy way to understand the consequences of manually moving resource groups; for example, you can predict whether a group will stay on the node to which it was moved.

    HACMP follows a simple principle: when you move a resource group, it stays on the node to which you moved it until you move it again. (HACMP may still move it when it needs to recover it.)

    To help you determine whether a resource group is currently hosted on the highest priority node now available, HACMP presents intelligent picklist choices for destination nodes and sites. For instance, if the group is currently hosted on one node and HACMP finds an available node with a higher priority, the SMIT picklist of destination nodes indicates which node that is, so you can choose to move the group to that node.

    When you move groups to other nodes, these rules apply:

  • For resource groups with a fallback policy of Never Fallback, moving a group will have no effect on the behavior of that group during the future cluster events. The same is true for resource groups with the site fallback policy Online on Either Site.
  • For resource groups with a fallback policy other than Never Fallback (and other than Prefer Primary Site, if sites are defined), moving a group makes the destination node the “acting” highest priority node until you move the group again, at which point the new destination node takes over that role.
  • One important consideration applies to resource groups with the Fallback to Highest Priority Node policy (or the Prefer Primary Site policy). When you move such a resource group to a node other than its highest priority node (or primary site), the node to which it was moved becomes its temporary “preferred” node, even though it is not the configured highest priority node. Such groups stay on the nodes to which they were moved until you move them again, and they also fall back to these nodes (or sites).
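    SMIT is the documented interface for these operations, but clRGmove can also be run directly. The sketch below illustrates the idea only; the flags for selecting the group, the destination node, and the move/online/offline operation are assumptions to verify against the Administration Guide for your release.

        # Assumed syntax -- verify before use
        # Move resource group rg_db to node nodeb
        /usr/es/sbin/cluster/utilities/clRGmove -g rg_db -n nodeb -m

        # Bring resource group rg_db online on node nodea
        /usr/es/sbin/cluster/utilities/clRGmove -g rg_db -n nodea -u

        # Take resource group rg_db offline on node nodea
        /usr/es/sbin/cluster/utilities/clRGmove -g rg_db -n nodea -d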

    For more information about resource group management, see the following:

  • Overview in Chapter 7: HACMP Configuration Process and Facilities
  • Chapter on planning resource groups in the Planning Guide
  • Chapter on changing resources and resource groups in the Administration Guide for complete information and instructions on performing resource group management through SMIT.
    User-Requested Resource Group Management vs. Automatic Resource Group Management

    In general, to keep applications highly available, HACMP automatically manages (and sometimes moves) resource groups and applications included in them. For instance, when it is necessary to recover a resource group, HACMP may attempt to recover it automatically on another node during fallover or fallback operations. While moving a group, HACMP adheres to the resource group policies that you specified, and other settings (for instance, rather than automatically recovering a failed resource group on another node, you can tell HACMP to just notify you of the group’s failure).

    When you request HACMP to perform resource group management, it uses the clRGmove utility, which moves resource groups by calling an rg_move event.

    Note: When troubleshooting log files, it is important to distinguish between an rg_move event that in some cases is triggered automatically by HACMP, and an rg_move event that occurs when you request HACMP to manage resource groups for you. To identify the causes of operations performed on the resource groups in the cluster, look for the command output in SMIT and for information in the hacmp.out file.

    Resource Group Management Operations

    Use resource group management to:

  • Move a resource group from the node on one site to the node on another site.
  • Move a resource group from one node to another.
  • In a working cluster, temporarily move a non-concurrent resource group from a node it currently resides on to any destination node. Resource groups that you move continue to behave consistently with the way you configured them, that is, they follow the startup, fallover and fallback policies specified for them. The SMIT user interface lets you clearly specify and predict the resource group’s behavior, if you decide to move it to another node.
    If you use SMIT to move a resource group to another node, it remains on its new destination node until you manually move it again. Note that HACMP may need to move it during a fallover.
  • Move the resource group back to the node that was originally its highest priority.
    The resource group may or may not have a fallback policy. If a resource group has a fallback policy of Fallback to Highest Priority Node, after you move it, the group assumes that the “new” node is now its preferred temporary location, and falls back to this node. To change this behavior, you can always move the group back to the node that was originally its highest priority node.

    Similarly, if you have a resource group that has a fallback policy Never Fallback, once you move this resource group, it will not move back to the node from which it was moved but will remain on its new destination node, until you move it again to another node. This way, you can be assured that the group always follows the Never Fallback policy that you specified for it.

  • Bring a resource group online or offline on one or all nodes in the cluster. See the Administration Guide for detailed information on what kinds of online and offline operations you can perform on concurrent and non-concurrent resource groups.
    Cluster Single Point of Control (C-SPOC)

    With the C-SPOC utility, you can make changes to the whole cluster from a single cluster node. Instead of performing administrative tasks on each cluster node, you can use the SMIT interface to issue a C-SPOC command once, on a single node, and the change is propagated across all cluster nodes.

    For more information about C-SPOC, see the section HACMP System Management with C-SPOC in Chapter 7: HACMP Configuration Process and Facilities.

    Dynamic Adapter Swap

    The dynamic adapter swap functionality lets you swap the IP address of an active network interface card (NIC) with the IP address of a user-specified active, available “backup” network interface card on the same node and network. Cluster services do not have to be stopped to perform the swap.

    This feature can be used to move an IP address off a network interface card that is behaving erratically, to another NIC without shutting down the node. It can also be used if a hot pluggable NIC is being replaced on the node. Hot pluggable NICs can be physically removed and replaced without powering off the node. When the (hot pluggable) NIC to be replaced is pulled from the node, HACMP makes the NIC unavailable as a backup.

    You can configure adapter swap using SMIT. The service IP address is moved from its current NIC to a user-specified NIC. The service IP address then becomes an available “backup” address. When the new card is placed in the node, the NIC is incorporated into the cluster as an available “backup” again. You can then swap the IP address from the backup NIC to the original NIC.

    Note: This type of dynamic adapter swap can only be performed within a single node. You cannot swap the IP address with the IP address on a different node with this functionality. To move a service IP address to another node, move its resource group using the Resource Group Management utility.
    Note: The dynamic adapter swap feature is not supported on the SP switch network.

    Automatic Verification and Synchronization

    Automatic verification and synchronization minimizes downtime when you add a node to your cluster. This process runs prior to starting cluster services and checks to make sure that nodes joining a cluster are synchronized appropriately. This process checks nodes entering either active or inactive configurations.

    Automatic verification and synchronization ensures that typical configuration inconsistencies are corrected as follows:

  • RSCT numbers are consistent across the cluster
  • IP addresses are configured on the network interfaces that RSCT expects
  • Shared volume groups are not set to be automatically varied on
  • Filesystems are not set to be automatically mounted.
    If any additional configuration errors are found, cluster services are not started on the node, and detailed error messages enable you to resolve the inconsistencies.
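    The last two checks correspond to settings you can inspect directly on each node. A quick manual spot-check might look like the following, with the volume group and filesystem names as placeholders:

        # A shared volume group should report AUTO ON: no
        lsvg sharedvg | grep "AUTO ON"

        # A shared filesystem should have mount = false in its /etc/filesystems
        # stanza so that it is not mounted automatically at boot
        grep -p "^/sharedfs1:" /etc/filesystems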

    For more information about automatic verification and synchronization, see Chapter 7: Verifying and Synchronizing a Cluster Configuration in the Administration Guide.

    Minimizing Unscheduled Downtime

    Another important goal with HACMP is to minimize unscheduled downtime in response to unplanned cluster component failures. The HACMP software provides the following features to minimize unscheduled downtime:

  • Fast recovery to speed up the fallover in large clusters
  • A delayed fallback timer to allow a custom resource group to fall back at a specified time
  • IPAT via IP Aliases to speed up the processing during recovery of service IP labels
  • Automatic recovery of resource groups that are in the ERROR state, whenever a cluster node comes up. For more information, see the following section.
    Recovering Resource Groups on Node Startup

    Prior to HACMP 5.2, when a node joined the cluster, it did not acquire any resource groups that had previously gone into an ERROR state on any other node. Such resource groups remained in the ERROR state and required use of the Resource Group Migration utility, clRGmove, to manually bring them back online.

    Starting with HACMP 5.2, the Cluster Manager tries to bring the resource groups that are currently in the ERROR state into the online (active) state on the joining node. This further increases the chances of bringing the applications back online. When a node starts up, if a resource group is in the ERROR state on any node in the cluster, this node attempts to acquire the resource group. Note that the node must be included in the nodelist for the resource group.

    The resource group recovery on node startup is different for non-concurrent and concurrent resource groups:

  • If the starting node fails to activate a non-concurrent resource group that is in the ERROR state, the resource group continues to fall over to another node in the nodelist, if a node is available. The fallover action continues until all available nodes in the nodelist have been tried.
  • If the starting node fails to activate a concurrent resource group that is in the ERROR state on the node, the concurrent resource group is left in the ERROR state on that node. Note that the resource group might still remain online on other nodes.
    Fast Recovery

    The HACMP fast recovery feature speeds up fallover in large clusters.

    Fast recovery lets you select a filesystems consistency check and a filesystems recovery method:

  • If you configure a filesystem to use a consistency check and a recovery method, it saves time by running logredo rather than fsck on each filesystem. If the subsequent mount fails, then it runs a full fsck.
  • If a filesystem suffers damage in a failure but can still be mounted, logredo may not succeed in fixing the damage, producing an error during data access.

  • In addition, it saves time by acquiring, releasing, and falling over all resource groups and filesystems in parallel, rather than serially.
    Do not set the system to run these commands in parallel if you have shared, nested filesystems. These must be recovered sequentially. (Note that the cluster verification utility does not report filesystem and fast recovery inconsistencies.)

    The varyonvg and varyoffvg commands always run on volume groups in parallel, regardless of the setting of the recovery method.
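    Conceptually, the consistency-check option follows the sequence sketched below. This is an illustration of the logic only, not the actual HACMP recovery code; the logical volume and mount point names are placeholders.

        # Replay the JFS log instead of checking the entire filesystem
        logredo /dev/sharedlv01
        mount /sharedfs1
        if [ $? -ne 0 ]
        then
            # Fall back to a full filesystem check only if the mount fails
            fsck -y /sharedfs1 && mount /sharedfs1
        fi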

    Delayed Fallback Timer for Resource Groups

    The Delayed Fallback Timer lets a resource group fall back to the higher priority node at a time that you specify. The resource group that has a delayed fallback timer configured and that currently resides on a non-home node falls back to the higher priority node at the recurring time (daily, weekly, monthly or yearly), or on a specified date.

    For more information on the delayed fallback timer, see the Planning Guide.

    Minimizing Takeover Time: Fast Disk Takeover

    In the case of a cluster failure, enhanced concurrent volume groups are taken over faster than in previous releases of HACMP due to the improved disk takeover mechanism.

    HACMP automatically detects enhanced concurrent volume groups and ensures that the faster option for volume group takeover is launched in the event of a node failure, if:

  • You have installed AIX 5L 5.2 or 5.3 and HACMP.
  • You include in your non-concurrent resource groups the enhanced concurrent mode volume groups (or convert the existing volume groups to enhanced concurrent volume groups).
    This functionality is especially useful for fallover of volume groups made up of a large number of disks.

    During fast disk takeover, HACMP skips the extra processing needed to break the disk reserves and to update and synchronize the LVM information by running lazy update. As a result, the disk takeover mechanism used for enhanced concurrent volume groups is faster than the disk takeover used for standard volume groups included in non-concurrent resource groups.

    Maximizing Disaster Recovery

    HACMP can be an integral part of a comprehensive disaster recovery plan for your enterprise. The following options distribute backup copies of data to different sites, for possible disaster recovery operations:

  • HACMP/XD for Geographic LVM (GLVM)
  • HACMP/XD for Metro Mirror (synchronous PPRC with ESS and DS systems)
  • HACMP/XD for HAGEO (IP Mirroring)
  • Cross-Site LVM Mirroring.
    For more information on the disaster recovery solutions included in HACMP/XD, see About This Guide for the documentation and the location of Release Notes.

    Cross-Site LVM Mirroring

    Starting with HACMP 5.2, you can set up disks located at two different sites for remote LVM mirroring, using a Storage Area Network (SAN), for example. Cross-site LVM mirroring replicates data between the disk subsystem at each site for disaster recovery.

    A SAN is a high-speed network that allows the establishment of direct connections between storage devices and processors (servers) within the distance supported by Fibre Channel. Thus, two or more servers (nodes) located at different sites can access the same physical disks, which can be separated by some distance as well, through the common SAN. The disks can be combined into a volume group via the AIX 5L Logical Volume Manager, and this volume group can be imported to the nodes located at different sites. The logical volumes in this volume group can have up to three mirrors. Thus, you can set up at least one mirror at each site. The information stored on this logical volume is kept highly available, and in case of certain failures, the remote mirror at another site will still have the latest information, so the operations can be continued on the other site.
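    The mirroring itself is ordinary AIX 5L LVM administration (in an HACMP cluster you would normally perform it through C-SPOC). As a sketch, with the logical volume and disk names as placeholders and hdisk5 assumed to be the disk at the remote site:

        # Add a second physical copy of the logical volume on the remote-site disk
        mklvcopy datalv01 2 hdisk5

        # Synchronize the new copy and confirm that no partitions remain stale
        syncvg -l datalv01
        lslv datalv01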

    HACMP automatically synchronizes mirrors after a disk or node failure and subsequent reintegration. HACMP handles the automatic mirror synchronization even if one of the disks is in the PVREMOVED or PVMISSING state. Automatic synchronization is not possible for all cases, but you can use C-SPOC to synchronize the data manually from the surviving mirrors to stale mirrors after a disk or site failure and subsequent reintegration.

    Cluster Events

    This section describes how the HACMP software responds to changes in a cluster to maintain high availability.

    The HACMP cluster software monitors all the components that make up the highly available application including disks, network interfaces, nodes and the applications themselves. The Cluster Manager uses different methods for monitoring different resources:

  • The RSCT subsystem is responsible for monitoring networks and nodes.
  • The AIX 5L LVM subsystem produces error notifications for volume group quorum loss.
  • The Cluster Manager itself dispatches application monitors.
    An HACMP cluster environment is event-driven. An event is a change of status within a cluster that the Cluster Manager recognizes and processes. A cluster event can be triggered by a change affecting a network interface card, network, or node, or by the cluster reconfiguration process exceeding its time limit. When the Cluster Manager detects a change in cluster status, it executes a script designated to handle the event and its subevents.

    Note: The logic of cluster events is described here for completeness. No manual intervention is required to ensure that HACMP carries out cluster events correctly.

    The following examples show some events the Cluster Manager recognizes:

  • node_up and node_up_complete events (a node joining the cluster)
  • node_down and node_down_complete events (a node leaving the cluster)
  • Local or global network_down event (a network has failed)
  • network_up event (a network has connected)
  • swap_adapter event (a network adapter failed and a new one has taken its place)
  • Dynamic reconfiguration events.
    When a cluster event occurs, the Cluster Manager runs the corresponding event script for that event. As the event script is being processed, a series of subevent scripts may be executed. The HACMP software provides a script for each event and subevent. The default scripts are located in the /usr/es/sbin/cluster/events directory.

    By default, the Cluster Manager calls the corresponding event script supplied with the HACMP software for a specific event. You can specify additional processing to customize event handling for your site if needed. For more information, see the section Customizing Event Processing.

    Processing Cluster Events

    The two primary cluster events that HACMP software handles are fallover and reintegration:

  • Fallover refers to the actions taken by the HACMP software when a cluster component fails or a node leaves the cluster.
  • Reintegration refers to the actions that occur within the cluster when a component that had previously left the cluster returns to the cluster.
    Event scripts control both types of actions. During event script processing, cluster-aware application programs see the state of the cluster as unstable.

    Fallover

    A fallover occurs when a resource group moves from its home node to another node because its home node leaves the cluster.

    Nodes leave the cluster either by a planned transition (a node shutdown or stopping cluster services on a node), or by failure. In the former case, the Cluster Manager controls the release of resources held by the exiting node and the acquisition of these resources by nodes still active in the cluster. When necessary, you can override the release and acquisition of resources (for example, to perform system maintenance). You can also postpone the acquisition of the resources by integrating nodes (by setting the delayed fallback timer for custom resource groups).

    Node failure begins when a node monitoring a neighboring node ceases to receive keepalive traffic for a defined period of time. If the other cluster nodes agree that the failure is a node failure, the failing node is removed from the cluster and its resources are taken over by the active nodes configured to do so.

    If other components, such as a network interface card, fail, the Cluster Manager runs an event script to switch network traffic to a backup network interface card (if present).

    Reintegration

    A reintegration, or fallback, occurs when a resource group moves to a node that has just joined the cluster.

    When a node joins a running cluster, the cluster becomes temporarily unstable. The member nodes coordinate the beginning of the join process and then run event scripts to release any resources the joining node is configured to take over. The joining node then runs an event script to take over these resources. Finally, the joining node becomes a member of the cluster. At this point, the cluster is stable again.

    Emulating Cluster Events

    HACMP provides an emulation utility to test the effects of running a particular event without modifying the cluster state. The emulation runs on every active cluster node, and the output is stored in an output file on the node from which the emulation was launched.

    For more information on the Event Emulator utility, see Chapter 7: HACMP Configuration Process and Facilities.

    Customizing Event Processing

    The HACMP software has an event customization facility you can use to tailor event processing. The Cluster Manager’s ability to recognize a specific series of events and subevents permits a very flexible customization scheme. Customizing event processing allows you to provide the most efficient path to critical resources should a failure occur.

    You can define multiple pre- and post-events for a list of events that appears in the picklist in the Change/Show Pre-Defined HACMP Events SMIT panel.

    Customization for an event could include notification to the system administrator before and after the event is processed, as well as user-defined commands or scripts before and after the event processing, as shown in the list:

  • Notification to system administrator of event to be processed
  • Pre-event script or command
  • HACMP for AIX 5L event script
  • Post-event script or command
  • Notification to system administrator that event processing is complete.

    Use this facility for the following types of customization:

  • Pre- and post-event processing
  • Event notification
  • Event recovery and retry.
    Note: In HACMP, the event customization information stored in the HACMP configuration database is synchronized across all cluster nodes when the cluster resources are synchronized. Thus, pre- and post-notification, and recovery event script names must be the same on all nodes, although the actual processing done by these scripts can be different.

    Cluster verification includes a function to monitor cluster configuration automatically by means of a new event called cluster_notify. You can use this event to configure an HACMP remote notification method (numeric or alphanumeric page, or text messaging) to send out a message if errors in cluster configuration are found. The output of this event is also logged in hacmp.out on each cluster node that is running cluster services.

    You may also send email notification to cell phones through the event notification scripts; however, using remote notification has advantages. If the person responsible for responding to event notifications changes, you must manually change the address in each event notification script. With remote notification, you instead define for each responder a method that contains all the relevant events and nodes, so you can switch notification methods as a unit when responders change.

    Defining New Events

    In HACMP, it is possible to define new events as well as to tailor the existing ones.

    Pre- and Post-Event Processing

    To tailor event processing to your environment, specify commands or user-defined scripts that execute before and after a specific event is generated by the Cluster Manager. For pre-processing, for example, you may want to send a message to specific users, informing them to stand by while a certain event occurs. For post-processing, you may want to disable login for a specific group of users if a particular network fails.
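    A pre- or post-event command is simply an executable you supply on every node. The following sketch of a post-event script for a network_down event uses hypothetical paths; the arguments HACMP passes to the script are noted as an assumption to verify for your release.

        #!/bin/ksh
        # /usr/local/hacmp/post_network_down -- hypothetical post-event script
        # $* is assumed to hold the arguments of the network_down event; verify
        echo "network_down processed on $(hostname): $*" | mail -s "HACMP event" root

        # Site-specific action: block new non-root logins while the network is down
        # touch /etc/nologin
        exit 0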

    Event Notification

    You can specify a command or user-defined script that provides notification (for example, mail) that an event is about to happen and that an event has just occurred, along with the success or failure of the event. You can also define a notification method through the SMIT interface to issue a customized remote notification method in response to a cluster event.

    Event Recovery and Retry

    You can specify a command that attempts to recover from an event command failure. If the retry count is greater than zero and the recovery command succeeds, the event script command is run again. You can also specify the number of times to attempt to execute the recovery command.

    Customizing Event Duration

    HACMP software issues a system warning each time a cluster event takes more time to complete than a specified timeout period.

    Using the SMIT interface, you can customize the time period allowed for a cluster event to complete before HACMP issues a system warning for it.

