
Chapter 2: HACMP Cluster Nodes, Sites, Networks, and Heartbeating


This chapter introduces major cluster topology-related concepts and definitions that are used throughout the documentation and in the HACMP user interface.

The information in this chapter is organized as follows:

  • Cluster Nodes and Cluster Sites
  • Cluster Networks
  • Subnet Routing Requirements in HACMP
  • IP Address Takeover
  • IP Address Takeover via IP Aliases
  • IP Address Takeover via IP Replacement
  • Heartbeating over Networks and Disks.

    Cluster Nodes and Cluster Sites

    A typical HACMP cluster environment consists of nodes that can serve as clients or servers. If you are using the HACMP/XD software or LVM cross-site mirroring, sites or groups of nodes become part of the cluster topology.

    Nodes

    A node is a processor that runs both AIX 5L and the HACMP software. Nodes may share a set of resources—disks, volume groups, filesystems, networks, network IP addresses, and applications.

    The HACMP software supports from two to thirty-two nodes in a cluster. In an HACMP cluster, each node is identified by a unique name. In HACMP, a node name and a hostname can usually be the same.

    Nodes serve as core physical components of an HACMP cluster. For more information on nodes and hardware, see the section Nodes in Chapter 1: HACMP for AIX.

    Two types of nodes are defined:

  • Server nodes form the core of an HACMP cluster. Server nodes run services or back-end applications that access data on the shared external disks.
  • Client nodes run front-end applications that retrieve data from the services provided by the server nodes. Client nodes can run HACMP software to monitor the health of the nodes and to react to failures.

    Server Nodes

    A cluster server node usually runs an application that accesses data on the shared external disks. Server nodes run HACMP daemons and keep resources highly available. Typically, applications are run, storage is shared between these nodes, and clients connect to the server nodes through a service IP address.

    Client Nodes

    A full high availability solution typically includes the client machine that uses services provided by the servers. Client nodes can be divided into two categories: naive and intelligent.

  • A naive client views the cluster as a single entity. If a server fails, the client must be restarted, or at least must reconnect to the server.
  • An intelligent client is cluster-aware. A cluster-aware client reacts appropriately in the face of a server failure, connecting to an alternate server, perhaps masking the failure from the user. Such an intelligent client must have knowledge of the cluster state.

    HACMP extends the cluster paradigm to clients by providing both dynamic cluster configuration reporting and notification of cluster state changes, such as changes in subsystems or node failure.

    Sites

    You can define a group of one or more server nodes as belonging to a site. The site becomes a component, like a node or a network, that is known to the HACMP software. HACMP supports clusters divided into two sites.

    Using sites, you can configure cross-site LVM mirroring. You configure logical volume mirrors between physical volumes in separate storage arrays, and specify to HACMP which physical volumes are located at each site. Later, when you use C-SPOC to create new logical volumes, HACMP automatically displays the site location of each defined physical volume, making it easier to select volumes from different sites for LVM mirrors. For more information on cross-site LVM mirroring, see the Planning Guide.

    In addition, the HACMP/XD (Extended Distance) feature provides three distinct software solutions for disaster recovery. These solutions enable an HACMP cluster to operate over extended distances at two sites.

  • HACMP/XD for Metro Mirror increases data availability for IBM TotalStorage Enterprise Storage Server (ESS) volumes that use Peer-to-Peer Remote Copy (PPRC) to copy data to a remote site for disaster recovery purposes. HACMP/XD for Metro Mirror takes advantage of the PPRC fallover/fallback functions and HACMP cluster management to reduce downtime and recovery time during disaster recovery.

    When PPRC is used for data mirroring between sites, the physical distance between sites is limited to the capabilities of the ESS hardware.
  • HACMP/XD for Geographic Logical Volume Manager (GLVM) increases data availability for IBM volumes that use GLVM to copy data to a remote site for disaster recovery purposes. HACMP/XD for GLVM takes advantage of the following components to reduce downtime and recovery time during disaster recovery:
  • AIX 5L and HACMP/XD for GLVM data mirroring and synchronization. Both standard and enhanced concurrent volume groups can be made geographically mirrored with the GLVM utilities.
  • TCP/IP-based unlimited distance network support (up to four XD_data data mirroring networks can be configured).
  • HACMP cluster management. HACMP ensures that in case of component failures, a mirrored copy of the data remains accessible at either the local or the remote site. Both concurrent and non-concurrent resource groups can be configured in an HACMP cluster with GLVM; however, the inter-site policy should not be concurrent.
  • HACMP/XD for HAGEO Technology uses the TCP/IP network to enable unlimited distance for data mirroring between sites. (Note that although the distance is unlimited, practical restrictions exist on the bandwidth and throughput capabilities of the network).

    This technology is based on the IBM High Availability Geographic Cluster for AIX (HAGEO) v2.4 product. HACMP/XD for HAGEO Technology extends an HACMP cluster to encompass two physically separate data centers. Data entered at one site is sent across a point-to-point IP network and mirrored at a second, geographically distant location.

    Each site can be a backup data center for the other, maintaining an updated copy of essential data and running key applications. If a disaster disables one site, the data is available within minutes at the other site. The HACMP/XD software solutions thus increase the level of availability provided by the HACMP software by enabling it to recognize and handle a site failure, to continue processing even though one of the sites has failed, and to reintegrate the failed site back into the cluster.

    For information on HACMP/XD for Metro Mirror, HACMP/XD for GLVM, and HACMP/XD for HAGEO, see the documentation for each of those solutions.

    Cluster Networks

    Cluster nodes communicate with each other over communication networks. If one of the physical network interface cards on a node on a network fails, HACMP preserves the communication to the node by transferring the traffic to another physical network interface card on the same node. If a “connection” to the node fails, HACMP transfers resources to another node to which it has access.

    In addition, RSCT sends heartbeats between the nodes over the cluster networks to periodically check the health of the cluster nodes themselves. If HACMP detects no heartbeats from a node, the node is considered failed and its resources are automatically transferred to another node. For more information, see Heartbeating over Networks and Disks in this chapter.

    We highly recommend configuring multiple communication paths between the nodes in the cluster. Having multiple communication networks prevents cluster partitioning, in which the nodes within each partition form their own entity. In a partitioned cluster, it is possible that nodes in each partition could allow simultaneous non-synchronized access to the same data. This can potentially lead to different views of data from different nodes.

    Physical and Logical Networks

    A physical network connects two or more physical network interfaces. There are many types of physical networks, and HACMP broadly categorizes them as those that use the TCP/IP protocol, and those that do not:

  • TCP/IP-based, such as Ethernet or Token Ring
  • Device-based, such as RS232 or Target Mode SSA.

    As stated in the previous section, configuring multiple TCP/IP-based networks helps to prevent cluster partitioning. Multiple device-based networks also help to prevent partitioned clusters by providing additional communications paths in cases when the TCP/IP-based network connections become congested or severed between cluster nodes.

    Note: If you are considering a cluster where the physical networks use external networking devices to route packets from one network to another, consider the following: When you configure an HACMP cluster, HACMP verifies the connectivity and access to all interfaces defined on a particular physical network. However, HACMP cannot determine the presence of external network devices such as bridges and routers in the network path between cluster nodes. If the networks have external networking devices, ensure that you are using devices that are highly available and redundant so that they do not create a single point of failure in the HACMP cluster.

    A logical network is a portion of a physical network that connects two or more logical network interfaces/devices. A logical network interface/device is the software entity that is known to the operating system. There is a one-to-one mapping between a physical network interface/device and a logical network interface/device. Each logical network interface can exchange packets with each logical network interface on the same logical network.

    If a subset of logical network interfaces on the logical network needs to communicate with each other (but with no one else) while sharing the same physical network, subnets are used. A subnet mask defines the part of the IP address that determines whether one logical network interface can send packets to another logical network interface on the same logical network.
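    As an illustration of the subnet-mask rule above, the following Python sketch (using hypothetical addresses; AIX 5L and HACMP perform this logic internally) shows how a mask determines whether two logical network interfaces share a subnet:

```python
import ipaddress

def same_subnet(addr_a: str, addr_b: str, mask: str) -> bool:
    """Return True if the network portions of the two addresses,
    under the given subnet mask, are identical."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{mask}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{mask}").network
    return net_a == net_b

# Same /24 subnet: the interfaces can exchange packets directly
print(same_subnet("192.168.10.5", "192.168.10.9", "255.255.255.0"))  # True
# Different subnets: packets must be routed between them
print(same_subnet("192.168.10.5", "192.168.11.9", "255.255.255.0"))  # False
```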

    Logical Networks in HACMP

    HACMP has its own, similar concept of a logical network. All logical network interfaces in an HACMP network can communicate HACMP packets with each other directly. Each logical network is identified by a unique name. If you use an automatic discovery function for HACMP cluster configuration, HACMP assigns a name to each HACMP logical network it discovers, such as net_ether_01.

    An HACMP logical network may contain one or more subnets. RSCT takes care of routing packets between logical subnets.

    For more information on RSCT, see Chapter 4: HACMP Cluster Hardware and Software.

    Global Networks

    A global network is a combination of multiple HACMP networks. The HACMP networks may be composed of any combination of physically different networks and/or different logical networks (subnets), as long as they share the same network type (for example, Ethernet). HACMP treats the combined global network as a single network. RSCT handles the routing between the networks defined in a global network.

    Global networks cannot be defined for all IP-based networks; they can be defined only for those IP-based networks that are used for heartbeating.

    Having multiple heartbeat paths between cluster nodes reduces the chance that the loss of any single network will result in a partitioned cluster. For example, multiple heartbeat paths between cluster nodes would be useful in a typical configuration of the SP Administrative Ethernet on two separate SP systems.

    Local and Global Network Failures

    When a failure occurs on a cluster network, HACMP uses network failure events to manage such cases. HACMP watches for and distinguishes between two types of network failure events: local network failure and global network failure events.

    Local Network Failure

    A local network failure is an HACMP event in which packets cannot be sent or received by one node over an HACMP logical network. This may occur, for instance, if all of the node’s network interface cards participating in the particular HACMP logical network fail. Note that in the case of a local network failure, the network is still in use by other nodes.

    To handle local network failures, HACMP selectively moves the resources (on that network) from one node to another. This operation is referred to as selective fallover.

    Global Network Failure

    A global network failure is an HACMP event in which packets cannot be sent or received by any node over an HACMP logical network. This may occur, for instance, if the physical network is damaged.

    Note: It is important to distinguish between these two terms in HACMP: a “global network” and a “global network failure event.” A global network is a combination of HACMP networks; a global network failure event refers to a failure that affects all nodes connected to any logical HACMP network, not necessarily a global network.

    HACMP Communication Interfaces

    An HACMP communication interface is a grouping of a logical network interface, a service IP address, and a service IP label that you define in HACMP. HACMP communication interfaces combine to create IP-based networks.

    An HACMP communication interface is a combination of:

  • A logical network interface is the name to which AIX 5L resolves a port (for example, en0) of a physical network interface card.
  • A service IP address is an IP address (for example, 129.9.201.1) over which services, such as an application, are provided, and over which client nodes communicate.
  • A service IP label is a label (for example, a hostname in the /etc/hosts file, or a logical equivalent of a service IP address, such as node_A_en_service) that maps to the service IP address.

    Communication interfaces in HACMP are used in the following ways:

  • A communication interface refers to IP-based networks and NICs. The NICs that are connected to a common physical network are combined into logical networks that are used by HACMP.
  • Each NIC is capable of hosting several TCP/IP addresses. When configuring a cluster, you define to HACMP the IP addresses that HACMP monitors (base or boot IP addresses), and the IP addresses that HACMP keeps highly available (the service IP addresses).
  • Heartbeating in HACMP occurs over communication interfaces. HACMP uses the heartbeating facility of the RSCT subsystem to monitor its network interfaces and IP addresses. HACMP passes the network topology you create to RSCT, while RSCT provides failure notifications to HACMP.

    HACMP Communication Devices

    HACMP also monitors network devices that are not capable of IP communications. These devices include RS232 connections and Target Mode (disk-based) connections.

    Device-based networks are point-to-point connections that are free of IP-related considerations such as subnets and routing—each device on a node communicates with only one other device on a remote node.

    Communication devices make up device-based networks. The devices have names defined by the operating system (such as tty0). HACMP allows you to name them as well (such as TTY1_Device1).

    For example, an RS232 or a point-to-point connection would use a device name of /dev/tty2 as the device configured to HACMP on each end of the connection. Two such devices need to be defined—one on each node.

    Note: The local and global network failures described in the previous sections apply to TCP/IP-based HACMP logical networks. For device-based HACMP logical networks, these concepts do not apply; however, the heartbeating process does occur on device-based networks.

    Subnet Routing Requirements in HACMP

    A subnet route defines a path, determined by a subnet, for sending packets through the logical network to an address on another logical network. AIX 5L lets you add multiple routes for the same destination in the kernel routing table. If multiple matching routes have equal criteria, AIX 5L can route packets alternately over any of those subnet routes.

    It is important to consider subnet routing in HACMP because of the following considerations:

  • HACMP does not distinguish between logical network interfaces that share the same subnet route. If a logical network interface shares a route with another interface, HACMP has no means to determine its health. For more information on network routes, please see the AIX 5L man page for the route command.
  • Various constraints are often imposed on the IP-based networks by a network administrator or by TCP/IP requirements. The subnets and routes are also constraints within which HACMP must be configured for operation.

    Note: We recommend that each communication interface on a node belong to a unique subnet, so that HACMP can monitor each interface. This is not a strict requirement in all cases, and depends on several factors; where it is a requirement, HACMP enforces it. Ask your network administrator about the class and subnets used at your site.
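    The unique-subnet recommendation can be checked with a short Python sketch (the interface names and addresses are hypothetical; HACMP's own verification performs an equivalent check):

```python
import ipaddress
from collections import Counter

def duplicate_subnets(interfaces):
    """Return the subnets shared by more than one interface. HACMP cannot
    monitor an interface individually when it shares a subnet route with
    another interface, so this list should ideally be empty."""
    nets = Counter(str(ipaddress.ip_interface(cidr).network)
                   for cidr in interfaces.values())
    return sorted(net for net, count in nets.items() if count > 1)

# Hypothetical boot addresses for two NICs on one node
print(duplicate_subnets({"en0": "192.168.10.1/24", "en1": "192.168.11.1/24"}))
# []  (each interface is individually monitorable)
print(duplicate_subnets({"en0": "192.168.10.1/24", "en1": "192.168.10.2/24"}))
# ['192.168.10.0/24']  (shared subnet route)
```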

    Service IP Label/Address

    A service IP label is a label that maps to the service IP address and is used to establish communication between client nodes and the server node. Services, such as a database application, are provided using the connection made over the service IP label.

    A service IP label can be placed in a resource group as a resource, which allows HACMP to monitor its health and keep it highly available, either within a node or, if IP Address Takeover is configured, between the cluster nodes by transferring it to another node in the event of a failure.

    Note: A service IP label/address is configured as part of configuring cluster resources, not as part of topology.

    IP Alias

    An IP alias is an IP label/address that is configured onto a network interface card in addition to the originally-configured IP label/address on the NIC. IP aliases are an AIX 5L function that is supported by HACMP. AIX 5L supports multiple IP aliases on a NIC. Each IP alias on a NIC can be configured on a separate subnet.

    IP aliases are used in HACMP both as service and non-service addresses for IP address takeover, as well as for the configuration of the heartbeating method.

    See the following sections for information on how HACMP binds a service IP label with a communication interface depending on which mechanism is used to recover a service IP label.

    IP Address Takeover

    If the physical network interface card on one node fails, and if there are no other accessible physical network interface cards on the same network on the same node (and, therefore, swapping IP labels of these NICs within the same node cannot be performed), HACMP may use the IP Address Takeover (IPAT) operation.

    IP Address Takeover is a mechanism for recovering a service IP label by moving it to another NIC on another node, when the initial NIC fails. IPAT is useful because it ensures that an IP label over which services are provided to the client nodes remains available.

    HACMP supports two methods for performing IPAT:

  • IPAT via IP Aliases (this is the default)
  • IPAT via IP Replacement (this method was known in previous releases as IPAT, or traditional IPAT).

    Both methods are described in the sections that follow.

    IPAT and Service IP Labels

    The following list summarizes how IPAT manipulates the service IP label:

    When IPAT via IP Aliases is used, the service IP label/address is aliased onto the same network interface as an existing communication interface. That is, multiple IP addresses/labels are configured on the same network interface at the same time. In this configuration, all IP addresses/labels that you define must be configured on different subnets. This method can save hardware, but requires additional subnets.

    When IPAT via IP Replacement is used, the service IP label/address replaces the existing IP label/address on the network interface. That is, only one IP label/address is configured on a network interface at a time. In this configuration, two IP addresses/labels on a node can share a subnet, while a backup IP label/address on the node must be on a different subnet. This method can save subnets, but requires additional hardware.
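    The two subnet rules summarized above can be expressed as a small Python sketch (the addresses are hypothetical; HACMP's configuration verification applies equivalent rules):

```python
import ipaddress

def _subnets(addrs):
    return [ipaddress.ip_interface(a).network for a in addrs]

def valid_for_ipat_via_aliases(boot_addrs, service_addr):
    """IPAT via IP Aliases: the service subnet must differ from every
    boot (base) subnet, since the service label is added as an alias."""
    return ipaddress.ip_interface(service_addr).network not in _subnets(boot_addrs)

def valid_for_ipat_via_replacement(boot_addrs, service_addr):
    """IPAT via IP Replacement: the service address shares a subnet with
    exactly one boot interface (the one whose address it replaces)."""
    svc = ipaddress.ip_interface(service_addr).network
    return sum(net == svc for net in _subnets(boot_addrs)) == 1

boot = ["192.168.10.1/24", "192.168.11.1/24"]
print(valid_for_ipat_via_aliases(boot, "192.168.12.10/24"))      # True
print(valid_for_ipat_via_replacement(boot, "192.168.10.10/24"))  # True
```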

    IP Address Takeover via IP Aliases

    You can configure IP Address Takeover on certain types of networks using the IP aliasing network capabilities of AIX 5L. Defining IP aliases to network interfaces allows creation of more than one IP label and address on the same network interface. IPAT via IP Aliases utilizes the gratuitous ARP capabilities available on many types of networks.

    In a cluster with a network configured with IPAT via IP Aliases, when the resource group containing the service IP label falls over from the primary node to the target node, the initial IP labels that are used at boot time are added (and removed) as alias addresses on that NIC, or on other NICs that are available. Unlike in IPAT via IP Replacement, this allows a single NIC to support more than one service IP label placed on it as an alias. Therefore, the same node can host more than one resource group at the same time.

    If the IP configuration mechanism for an HACMP network is via IP Aliases, the communication interfaces for that HACMP network must use routes that are different from the one used by the service IP address.

    IPAT via IP Aliases provides the following advantages over the IPAT via IP Replacement scheme:

  • Running IP Address Takeover via IP Aliases is faster than running IPAT via IP Replacement, because IP Replacement must move both the IP address and the hardware address, which takes considerably longer than simply aliasing the IP address onto another interface.
  • IP aliasing allows co-existence of multiple service labels on the same network interface— you can use fewer physical network interface cards in your cluster. Note that upon fallover, HACMP equally distributes aliases between available network interface cards.
  • IPAT via IP Aliases is the default mechanism for keeping a service IP label highly available.

    Distribution Preference for Service IP Label Aliases

    By default, HACMP uses the IP Address Takeover (IPAT) via IP Aliases method for keeping the service IP labels in resource groups highly available.

    At cluster startup, by default HACMP distributes all service IP label aliases across all available boot interfaces on a network using the principle of the “least load.” HACMP assigns any new service address to the interface that has the least number of aliases or persistent IP labels already assigned to it.
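    The "least load" principle can be sketched in Python as follows (the label and interface names are hypothetical, and this is only an illustration of the idea, not HACMP's actual implementation):

```python
def assign_service_aliases(service_labels, preassigned):
    """Least-load placement: each new service alias goes to the boot
    interface currently hosting the fewest aliases or persistent labels."""
    load = {nic: len(labels) for nic, labels in preassigned.items()}
    placement = {}
    for label in service_labels:
        nic = min(load, key=load.get)  # interface with the least load
        placement[label] = nic
        load[nic] += 1
    return placement

# en0 already hosts a persistent label, en1 hosts nothing
print(assign_service_aliases(["svc1", "svc2", "svc3"],
                             {"en0": ["persistent_A"], "en1": []}))
# {'svc1': 'en1', 'svc2': 'en0', 'svc3': 'en1'}
```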

    However, in some cases, it may be desirable to specify other types of allocation, or to ensure that the labels continue to be allocated in a particular manner, not only during startup but also during the subsequent cluster events.

    For instance, you may want to allocate all service IP label aliases to the same boot interface as the one currently hosting the persistent IP label for that node. This option may be useful in VPN firewall configurations where only one interface is granted external connectivity and all IP labels (persistent and service IP label aliases) must be placed on the same interface to enable the connectivity.

    You can configure a distribution preference for the aliases of the service IP labels that are placed under HACMP control.

    A distribution preference for service IP label aliases is a network-wide attribute used to control the placement of the service IP label aliases on the physical network interface cards on the nodes in the cluster. Configuring a distribution preference for service IP label aliases does the following:

  • Lets you customize the load balancing for service IP labels in the cluster, taking into account the persistent IP labels previously assigned on the nodes. See Persistent Node IP Labels in Chapter 7: HACMP Configuration Process and Facilities.
  • Enables HACMP to redistribute the alias service IP labels according to the preference you specify.
  • Allows you to configure the type of distribution preference suitable for the VPN firewall external connectivity requirements.

    Although the service IP labels may move to another network interface, HACMP ensures that the labels continue to be allocated according to the specified distribution preference. That is, the distribution preference is maintained during startup and subsequent cluster events, such as a fallover, a fallback, or a change of interface on the same node. For instance, if you specified that the labels be mapped to the same interface, the labels remain mapped to that interface even if the initially configured service IP label moves to another node.

    The distribution preference is exercised as long as acceptable network interfaces are available in the cluster. HACMP always keeps service IP labels active, even if the preference cannot be satisfied.

    For information on the types of distribution preference you can specify in HACMP, see the Planning Guide.

    For information on configuring the distribution preference for service IP labels, see the Administration Guide.

    IP Address Takeover via IP Replacement

    The IP Address Takeover via IP Replacement facility moves the service IP label (along with the IP address associated with it) off a NIC on one node to a NIC on another node, should the NIC on the first node fail. IPAT via IP Replacement ensures that the service IP label that is included as a resource in a resource group in HACMP is accessible through its IP address, no matter which physical network interface card this service IP label is currently placed on.

    If the IP address configuration mechanism is IP Replacement, exactly one communication interface for that HACMP network must use the same route as the service IP address.

    In conjunction with IPAT via IP Replacement (also, previously known as traditional IPAT), you may also configure Hardware Address Takeover (HWAT) to ensure that the mappings in the ARP cache are correct on the target adapter.

    Heartbeating over Networks and Disks

    A heartbeat is a type of a communication packet that is sent between nodes. Heartbeats are used to monitor the health of the nodes, networks and network interfaces, and to prevent cluster partitioning.

    Heartbeating in HACMP: Overview

    In order for an HACMP cluster to recognize and respond to failures, it must continually check the health of the cluster. Some of these checks are provided by the heartbeat function. Each cluster node sends heartbeat messages at specific intervals to other cluster nodes, and expects to receive heartbeat messages from the nodes at specific intervals. If messages stop being received, HACMP recognizes that a failure has occurred.
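    The failure-detection idea can be sketched as follows (the interval and threshold values are illustrative only, not the actual RSCT tunables):

```python
def peer_failed(last_seen, now, interval, miss_threshold):
    """Declare a peer failed once more than miss_threshold heartbeat
    intervals have elapsed since its last message was received."""
    return (now - last_seen) > interval * miss_threshold

# 1-second heartbeats, failure declared after 10 missed intervals
print(peer_failed(last_seen=100.0, now=105.0, interval=1.0, miss_threshold=10))  # False
print(peer_failed(last_seen=100.0, now=111.0, interval=1.0, miss_threshold=10))  # True
```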

    Heartbeats can be sent over:

  • TCP/IP networks
  • Point-to-point networks
  • Shared disks.

    The heartbeat function is configured to use specific paths between nodes. This allows heartbeats to monitor the health of all HACMP networks and network interfaces, as well as the cluster nodes themselves.

    The TCP/IP heartbeat paths are set up automatically by RSCT; you have the option to configure point-to-point and disk paths as part of HACMP configuration.

    HACMP passes the network topology you create to RSCT. RSCT Topology Services provides the actual heartbeat service, setting up the heartbeat paths, then sending and receiving the heartbeats over the defined paths. If heartbeats are not received within the specified time interval, Topology Services informs HACMP.

    Heartbeating over TCP/IP Networks

    RSCT Topology Services uses the HACMP network topology to dynamically create a set of heartbeat paths that provide coverage for all TCP/IP interfaces and networks. These paths form heartbeat rings, so that all components can be monitored without requiring excessive numbers of heartbeat packets.
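    The ring idea can be sketched in Python: with n nodes, n heartbeat paths cover every node, so monitoring cost grows linearly rather than with the square of the node count (node names are hypothetical):

```python
def heartbeat_ring(nodes):
    """Arrange nodes in a ring: each node sends heartbeats to its successor
    and expects heartbeats from its predecessor, so n nodes are fully
    monitored with only n heartbeat paths."""
    return [(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))]

print(heartbeat_ring(["nodeA", "nodeB", "nodeC"]))
# [('nodeA', 'nodeB'), ('nodeB', 'nodeC'), ('nodeC', 'nodeA')]
```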

    In order for RSCT to reliably determine where a failure occurs, it must send and receive heartbeat packets over specific interfaces. This means that each NIC configured in HACMP must have an IP label on a separate subnet. There are two ways to accomplish this:

  • Configure heartbeating over IP interfaces. If this method is used, you configure all service and non-service IP labels on separate subnets.
  • Configure heartbeating over IP Aliases. If this method is used, you specify a base address for the heartbeat paths. HACMP then configures a set of IP addresses and subnets for heartbeating, which are totally separate from those used as service and non-service addresses. With this heartbeating method, all service and non-service IP labels can be configured on the same subnet or on different subnets. Since HACMP automatically generates the proper addresses required for heartbeating, all other addresses are free of any constraints.

    Heartbeating over IP Aliases provides the greatest flexibility for configuring boot (base) and service IP addresses at the cost of reserving a unique address and subnet range that is used specifically for heartbeating.

    Note: Although heartbeating over IP Aliases bypasses the subnet requirements for HACMP to perform the heartbeating function correctly, the existence of multiple routes to the same subnet (outside of HACMP) may produce undesired results for your application. For information on subnet requirements, see Subnet Routing Requirements in HACMP.

    Heartbeating over Point-to-Point Networks

    You can also configure non-IP point-to-point network connections that directly link cluster nodes. These connections can provide an alternate heartbeat path for a cluster that uses a single TCP/IP-based network. They also prevent the TCP/IP software itself from being a single point of failure in the cluster environment.

    Point-to-point networks that you plan to use for heartbeating should carry no other traffic; reserve them for the exclusive use of HACMP.

    You can configure non-IP point-to-point heartbeat paths over the following types of networks:

  • Serial (RS232)
  • Target Mode SSA
  • Target Mode SCSI
  • Disk heartbeating (over an enhanced concurrent mode disk).

    Heartbeating over Disks

    Heartbeating is supported on any shared disk that is part of an enhanced concurrent mode volume group.

    Note: The volume group does not need to be configured as an HACMP resource.

    Heartbeating over an enhanced concurrent mode disk operates with any type of disk—including those that are attached by fibre channel. This avoids the distance limitations (especially when using fibre channel connections) associated with RS232 links, making this solution more cost effective.

    A single common disk serves as the heartbeat path between two cluster nodes. Enhanced concurrent mode supports concurrent read and write access to the non-data portion of the disk. Nodes use this part of the disk to periodically write heartbeat messages and read heartbeats written by the other node.
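    The write/read exchange on the reserved disk area can be simulated with a small Python sketch in which a temporary file stands in for the shared disk sector (the slot layout is an assumption for illustration, not the actual on-disk format used by HACMP):

```python
import os
import struct
import tempfile
import time

SLOT_SIZE = 16  # per-node slot: 8-byte sequence number + 8-byte timestamp

def write_heartbeat(path, slot, seq):
    """A node writes an increasing sequence number into its own slot."""
    with open(path, "r+b") as f:
        f.seek(slot * SLOT_SIZE)
        f.write(struct.pack("<qd", seq, time.time()))

def read_heartbeat(path, slot):
    """The peer polls the other node's slot for a fresh heartbeat."""
    with open(path, "rb") as f:
        f.seek(slot * SLOT_SIZE)
        seq, ts = struct.unpack("<qd", f.read(SLOT_SIZE))
        return seq, ts

# A temporary file stands in for the reserved, non-data disk area
path = tempfile.mkstemp()[1]
with open(path, "wb") as f:
    f.write(b"\0" * (2 * SLOT_SIZE))  # one slot per node

write_heartbeat(path, slot=0, seq=41)  # node A, one interval
write_heartbeat(path, slot=0, seq=42)  # node A, next interval
seq, _ = read_heartbeat(path, slot=0)  # node B observes node A's progress
print(seq)  # 42
os.remove(path)
```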

