
Chapter 13: Managing the Cluster Topology


This chapter describes how to reconfigure the cluster topology.

The main topics include:

  • Reconfiguring a Cluster Dynamically
  • Dynamic Cluster Topology Changes
  • Viewing the Cluster Topology
  • Configuring Communication Interfaces/Devices to the Operating System on a Node
  • Swapping IP Addresses between Communication Interfaces Dynamically
  • Replacing a PCI Hot-Pluggable Network Interface Card
  • Changing a Cluster Name
  • Changing the Configuration of Cluster Nodes
  • Changing the Configuration of an HACMP Network
  • Changing the Configuration of Communication Interfaces
  • Managing Persistent Node IP Labels
  • Changing the Configuration of a Global Network
  • Changing the Configuration of a Network Module
  • Changing the Configuration of a Site
  • Synchronizing the Cluster Configuration.

    Reconfiguring a Cluster Dynamically

    When you configure an HACMP cluster, configuration data is stored in HACMP-specific object classes in the Configuration Database (ODM). The AIX 5L ODM object classes are stored in the default system configuration directory (DCD), /etc/es/objrepos.

    You can make certain changes to both the cluster topology and to the cluster resources while the cluster is running. This is called a dynamic reconfiguration (DARE). You can make a combination of resource and topology changes via one dynamic reconfiguration operation.

    All dynamic topology configuration changes allowed in an HACMP configuration are now supported in HACMP/XD configurations that include clusters with sites defined. This includes changes to XD-type (HAGEO or GLVM) networks, interfaces, sites, nodes, and NIM values. HACMP handles the resource groups that have primary and secondary instances (running on nodes at different sites) properly during these dynamic reconfiguration changes. See Chapter 14: Managing the Cluster Resources for information on supported dynamic changes to resources and resource groups.

    If you have dependent resource groups in the cluster, see the section Reconfiguring Resources in Clusters with Dependent Resource Groups in Chapter 14: Managing the Cluster Resources for information on making dynamic reconfiguration changes to the cluster topology.

    Note: No automatic corrective actions take place during a DARE.

    At cluster startup, HACMP copies the HACMP-specific ODM classes into a separate directory called the Active Configuration Directory (ACD). While a cluster is running, the HACMP daemons, scripts, and utilities reference the HACMP Configuration Database data stored in the ACD.

    If you synchronize the cluster topology or cluster resources definition while the Cluster Manager is running on the local node, this action triggers a dynamic reconfiguration event. In a dynamic reconfiguration event, the HACMP Configuration Database data in the Default Configuration Directories (DCDs) on all cluster nodes is updated and the HACMP Configuration Database data in the ACD is overwritten with the new configuration data. The HACMP daemons are refreshed so that the new configuration becomes the currently active configuration.

    The dynamic reconfiguration operation (that changes both resources and topology) progresses in the following order:

  • Releases any resources affected by the reconfiguration
  • Reconfigures the topology
  • Acquires and reacquires any resources affected by the reconfiguration operation.
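
    The ordering above can be sketched as a small Python model. This is only an illustration of the sequence, not HACMP code: the function name, the callback parameters, and the resource group name rg_db are all hypothetical.

```python
from typing import Callable, List


def dynamic_reconfiguration(
    affected_groups: List[str],
    release: Callable[[str], None],
    reconfigure_topology: Callable[[], None],
    acquire: Callable[[str], None],
) -> None:
    """Toy model of the three DARE phases, in the order listed above."""
    for group in affected_groups:   # 1. release affected resources
        release(group)
    reconfigure_topology()          # 2. apply the topology change
    for group in affected_groups:   # 3. acquire or reacquire resources
        acquire(group)


# Record the order in which the phases run for one hypothetical group.
log: List[str] = []
dynamic_reconfiguration(
    ["rg_db"],
    release=lambda g: log.append(f"release {g}"),
    reconfigure_topology=lambda: log.append("reconfigure topology"),
    acquire=lambda g: log.append(f"acquire {g}"),
)
print(log)
```

    The point of the model is only that resources are released before the topology changes and are brought back afterward.
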

    Requirements before Reconfiguring

    Before making changes to a cluster definition, ensure that:

  • HACMP is installed on all nodes.
  • All nodes are up and running HACMP and able to communicate with each other: no node may be in a forced down state.
  • The cluster is stable; no recent event errors or config_too_long messages exist.
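
    As a rough illustration, the checklist above can be expressed as a pre-flight check. The NodeStatus fields and the function name below are assumptions made for this sketch; HACMP does not expose such an interface, and you would have to gather the status values yourself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeStatus:
    """Hypothetical per-node status record, filled in by the administrator."""
    name: str
    hacmp_installed: bool = True
    services_running: bool = True
    forced_down: bool = False
    reachable: bool = True


def reconfiguration_blockers(
    nodes: List[NodeStatus],
    recent_event_errors: int = 0,
    config_too_long_seen: bool = False,
) -> List[str]:
    """Return the reasons a cluster definition change should be postponed."""
    problems: List[str] = []
    for node in nodes:
        if not node.hacmp_installed:
            problems.append(f"{node.name}: HACMP is not installed")
        if node.forced_down:
            problems.append(f"{node.name}: node is in a forced down state")
        elif not node.services_running:
            problems.append(f"{node.name}: HACMP is not running")
        if not node.reachable:
            problems.append(f"{node.name}: cannot communicate with other nodes")
    if recent_event_errors:
        problems.append("recent event errors exist")
    if config_too_long_seen:
        problems.append("config_too_long messages exist")
    return problems


print(reconfiguration_blockers([NodeStatus("nodeA"), NodeStatus("nodeB", forced_down=True)]))
```

    An empty list means none of the listed requirements is violated.
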

    Synchronizing Configuration Changes

    When you change the topology or the resources of a cluster, you update the data stored in the HACMP Configuration Database in the DCD. For example, when you add an additional network interface to a cluster node, you must add the interface to the cluster definition so that the cluster nodes can recognize and use it.

    When you change the cluster definition on one cluster node, you must also update the HACMP Configuration Databases on the other cluster nodes, a process called synchronization. Synchronization causes the information stored in the DCD on the local cluster node to be copied to the HACMP Configuration Database object classes in the DCD on the other cluster nodes.

    When synchronizing the cluster triggers a dynamic reconfiguration event, HACMP verifies that both cluster topology and cluster resources are correctly configured, even though you may have only changed an element of one of these. Since a change in topology may invalidate the resource configuration, and vice versa, the software checks both.
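
    The relationship between the DCD and the ACD during synchronization can be sketched as follows. This is a simplified model using plain dictionaries; the data layout and the function name are hypothetical, and the sketch assumes that the ACD on every node is refreshed during a dynamic reconfiguration event.

```python
import copy
from typing import Dict

# Each node is modeled as {"dcd": {...}, "acd": {...}} (hypothetical layout).
Cluster = Dict[str, Dict[str, dict]]


def synchronize(local: str, cluster: Cluster, cluster_manager_running: bool) -> bool:
    """Propagate the local DCD; return True if a dynamic reconfiguration occurs."""
    new_config = cluster[local]["dcd"]
    # Synchronization copies the local DCD to the DCD on the other nodes.
    for name, node in cluster.items():
        if name != local:
            node["dcd"] = copy.deepcopy(new_config)
    # With the Cluster Manager running locally, the ACD is overwritten too
    # (assumption: modeled here as happening on every node).
    if cluster_manager_running:
        for node in cluster.values():
            node["acd"] = copy.deepcopy(new_config)
    return cluster_manager_running


cluster: Cluster = {
    "nodeA": {"dcd": {"interfaces": ["en0", "en1"]}, "acd": {"interfaces": ["en0"]}},
    "nodeB": {"dcd": {"interfaces": ["en0"]}, "acd": {"interfaces": ["en0"]}},
}
print(synchronize("nodeA", cluster, cluster_manager_running=True))
print(cluster["nodeB"])
```

    With cluster_manager_running set to False, only the DCDs change and the active configuration stays as it was until cluster services start.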

    Dynamic Cluster Topology Changes

    DARE (Dynamic Reconfiguration) supports resource and topology changes done in one operation. For information on DARE operations in clusters with dependent resource groups, see Reconfiguring Resources in Clusters with Dependent Resource Groups.

    You can make the following changes to the cluster topology in an active cluster, dynamically:

  • Adding or removing nodes
  • Adding or removing network interfaces
  • Swapping a network interface card
  • Changing network module tuning parameters
  • Adding a new Heartbeating over Aliasing network
  • Changing an active network to (but not from) Heartbeating over Aliasing
  • Changing the address offset for a Heartbeating over Aliasing network
  • Adding, changing or removing a network interface or a node configured in a Heartbeating over Aliasing network.

    All topology configuration changes allowed in an HACMP configuration are now supported in HACMP/XD configurations. Supported changes include changes for XD-type networks, interfaces, sites, nodes, and NIM values. During the dynamic reconfiguration changes, HACMP properly handles resource groups that contain replicated resources (groups that have primary and secondary instances running on nodes at different sites).

    To avoid unnecessary processing of resources, use clRGmove to move resource groups that will be affected by the change before you make the change. When dynamically reconfiguring a cluster, HACMP will release resource groups if this is found to be necessary, and they will be reacquired later. For example, HACMP will release and reacquire the resource group that is using the associated service IP address on a network interface affected by the change to topology.

    Viewing the Cluster Topology

    When you view the cluster topology, you are viewing the HACMP Configuration Database data stored in the DCD, not the data stored in the ACD.

    Note: You can use WebSMIT to view graphical displays of sites, networks, nodes and resource group dependencies.

    For more information on WebSMIT, see Chapter 2: Administering a Cluster Using WebSMIT.

    Before making changes to a cluster topology, view the current configuration.

    To view the cluster topology:

      1. Enter smit hacmp
      2. In SMIT, select Extended Configuration > Extended Topology Configuration > Show HACMP Topology and press Enter.
    SMIT displays the panel with the following options. Each option provides a different view of the cluster.
      3. Select the appropriate option for the task at hand:
  • Show Cluster Topology: Provides complete information about the cluster topology including the nodes in the cluster, their interfaces, and the networks that connect them.
  • Show Cluster Definitions: Lists the names of all clusters accessible from this node.
  • Show Topology Information by Node: Provides information about cluster nodes and their interfaces.
  • Show Topology Information by Networks: Provides information about the networks that connect the cluster nodes.
  • Show Topology Information by Communication Interface: Lists the network interfaces defined in the cluster.
  • Extended Configuration > Extended Topology Configuration > Configure HACMP Sites > Change/Show a Site: Provides information about the sites defined in the cluster.

    Using the cltopinfo Command

    You can also use the /usr/es/sbin/cluster/utilities/cltopinfo command to view the cluster topology configuration. See the man page or the description in Appendix C: HACMP for AIX Commands. The command shows all the topology information and you can choose to see it organized by node, network, or interface.
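
    If you script topology checks, a thin wrapper around cltopinfo might look like the sketch below. The flag mapping (-n for nodes, -w for networks, -i for interfaces) is an assumption that you should confirm against the man page on your system; the function names are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Optional

CLTOPINFO = "/usr/es/sbin/cluster/utilities/cltopinfo"

# Assumed flag mapping; confirm against the cltopinfo man page.
VIEW_FLAGS: Dict[str, Optional[str]] = {
    "all": None,
    "node": "-n",
    "network": "-w",
    "interface": "-i",
}


def cltopinfo_argv(view: str = "all") -> List[str]:
    """Build the argument list for the requested topology view."""
    if view not in VIEW_FLAGS:
        raise ValueError(f"unknown view: {view!r}")
    flag = VIEW_FLAGS[view]
    return [CLTOPINFO] + ([flag] if flag else [])


def show_topology(view: str = "all") -> Optional[str]:
    """Run cltopinfo and return its output, or None if it is not installed."""
    if not os.path.exists(CLTOPINFO):
        return None
    result = subprocess.run(
        cltopinfo_argv(view), capture_output=True, text=True, check=True
    )
    return result.stdout


print(cltopinfo_argv("node"))
```

    Keeping the argument construction separate from the call makes the wrapper easy to check on a machine where HACMP is not installed.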

    Managing Communication Interfaces in HACMP

    This section describes the options under the System Management (C-SPOC) > HACMP Communication Interface Management SMIT menu:

  • Configuring Communication Interfaces/Devices to the Operating System on a Node
  • Updating HACMP Communication Interfaces/Devices with AIX 5L Settings
  • Swapping IP Addresses between Communication Interfaces Dynamically
  • Hot-Replacing a PCI Network Interface Card.

    Configuring Communication Interfaces/Devices to the Operating System on a Node

    You can configure communication interfaces/devices to AIX 5L without leaving HACMP SMIT, by using the System Management (C-SPOC) > HACMP Communication Interface Management SMIT path.

    To configure communication interfaces/devices to the operating system on a node:

      1. Enter the fastpath smit hacmp
      2. In SMIT, select System Management (C-SPOC) > HACMP Communication Interface Management > Configure Communication Interfaces/Devices to the Operating System on a Node and press Enter.
    A picklist with node names appears.
      3. Select a node on which to configure a network interface or device from the picklist.
      4. Select a communication interface or a device type and press Enter:
  • Network Interfaces: This option leads to the AIX 5L configuration SMIT menus for a particular node. Each network interface must be defined to the operating system before it can be used by HACMP. It is equivalent to running smitty mktcpip.
  • RS232 Devices: This option leads to the AIX 5L configuration SMIT menus for a particular node. Each TTY device must be defined to the operating system before it can be used by HACMP. It is equivalent to running smitty tty.
  • Target-Mode SCSI Devices: This option leads to the AIX 5L configuration SMIT menus for a particular node. Each target-mode SCSI device must be defined to the operating system before it can be used by HACMP. It is equivalent to running smitty scsia.
  • Target-Mode SSA Devices: This option leads to the AIX 5L configuration SMIT menus for a particular node. Each target-mode SSA device must be defined to the operating system before it can be used by HACMP. It is equivalent to running smitty ssaa.
  • X.25 Communication Interfaces: This option leads to the AIX 5L configuration SMIT menus for a particular node. Each X.25 Communication Interface device must be defined to the operating system before it can be used by HACMP.
  • SNA Communication Links: This option leads to the AIX 5L configuration SMIT menus for a particular node. Each SNA Communication Link device must be defined to the operating system before it can be used by HACMP.
  • Physical Disk Devices: This option leads to the AIX 5L configuration SMIT menus for a particular node. Each physical disk device must be defined to the operating system before it can be used by HACMP.
      5. To finish configuring the communication interface or device on a node, fill in the fields in the corresponding AIX 5L SMIT panel that will open. For instructions, see the AIX 5L System Administration Guide.

    Updating HACMP Communication Interfaces/Devices with AIX 5L Settings

    When you define communication interfaces/devices by entering or selecting an HACMP IP label or device, HACMP discovers the associated AIX 5L network interface name. HACMP expects this relationship to remain unchanged. If you change the IP Label/Address associated with the AIX 5L network interface after configuring and synchronizing the cluster, HACMP will not function correctly.

    If this problem occurs, you can reset the network interface IP Label/Address with the AIX 5L settings using the SMIT HACMP System Management (C-SPOC) menu.

    Use this SMIT selection to update HACMP after you make any changes to the underlying AIX 5L configuration of the mapping of a network interface to an IP Label/Address. For example, you should update HACMP after modifying the nameserver or /etc/hosts.

    You must stop cluster services, make the change, and then restart cluster services to apply it to the active configuration. You cannot make these changes dynamically.

    To update HACMP with new AIX 5L settings:

      1. Stop cluster services on the node where you are running the update.
      2. Enter smit hacmp
      3. In SMIT, select System Management (C-SPOC) > HACMP Communication Interface Management > Update HACMP Communication Interface with Operating System Settings and press Enter.
    A picklist with node names appears.
      4. Select a node on which to run the utility and press Enter.
    The update automatically calls commands that repopulate the HACMPadapter Configuration Database class with the updated entries and then resynchronize only the HACMPadapter class.
      5. Start cluster services.

    Swapping IP Addresses between Communication Interfaces Dynamically

    As a systems administrator, you may at some point experience a problem with a network interface card on one of the HACMP cluster nodes. If this occurs, you can use the dynamic communications interface swap feature to swap the IP address of an active service communication interface with the IP address of another active, available communication interface on the same node and network. Cluster services do not have to be stopped to perform the swap.

    You can use this feature to move an IP address off of a NIC that is behaving erratically without shutting down the node. It can also be used if a hot pluggable communication device is being replaced on the node. Hot pluggable NICs can be physically removed and replaced without powering off the node.

    This feature can also be used to move the persistent IP label to another network interface.

    If hardware address swapping is enabled, the hardware address will be swapped along with the IP address.

    Restrictions on IP Address Swapping

    Note the following restrictions:

  • The dynamic communications interface swap is not allowed for service IP labels that are configured on networks using IP aliases.
  • The dynamic communications swap feature is not supported on the SP switch network.
  • The dynamic IP address swap can only be performed within a single node. To move an IP address to another node, move its resource group using the clRGmove Resource Group Management utility. See Chapter 15: Managing Resource Groups in a Cluster.
  • The dynamic IP address swap function is not supported for IP addresses on XD_data networks (these networks are used in clusters with sites that run HACMP/XD for GLVM or HAGEO). If you perform this action on an XD_data network, the cluster may run an incorrect network_down event, which could generate an incorrect rg_move event. If no takeover node is available for the resource group, the group may be moved into the OFFLINE state, making its resources unavailable.
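
    The restrictions above can be summarized as an eligibility check. The sketch below is a hypothetical model: the Interface record, the network type labels other than XD_data, and the function name are illustrative and do not correspond to an HACMP interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interface:
    """Hypothetical description of a communication interface."""
    name: str
    node: str
    network: str
    network_type: str       # illustrative labels: "ether", "sp_switch", "XD_data"
    uses_ip_aliases: bool   # True if the network uses IP aliases


def swap_blockers(service: Interface, target: Interface) -> List[str]:
    """Return the restrictions that rule out a dynamic IP address swap."""
    reasons: List[str] = []
    if service.node != target.node:
        reasons.append("different nodes: move the resource group with clRGmove instead")
    if service.network != target.network:
        reasons.append("interfaces are on different networks")
    if service.uses_ip_aliases:
        reasons.append("service IP label is on a network using IP aliases")
    if service.network_type == "sp_switch":
        reasons.append("SP switch network is not supported")
    if service.network_type == "XD_data":
        reasons.append("XD_data network is not supported")
    return reasons


en0 = Interface("en0", "nodeA", "net_ether_01", "ether", uses_ip_aliases=False)
en1 = Interface("en1", "nodeA", "net_ether_01", "ether", uses_ip_aliases=False)
print(swap_blockers(en0, en1))
```

    An empty list means none of the listed restrictions applies; it does not guarantee that the swap will succeed.
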

    Procedure for Swapping an IP Address Dynamically

    Make sure that no other HACMP events are running before swapping a network interface.

    To dynamically swap an IP address between communication interfaces:

      1. Enter smit hacmp
      2. In SMIT, select System Management (C-SPOC) > HACMP Communication Interface Management > Swap IP Addresses Between Communication Interfaces and press Enter.
    SMIT displays a list of available service interfaces. It also displays those interfaces that have persistent labels placed on them, but are not hosting service IP labels. This allows you to move the persistent label to another interface.
      3. Select the service communication interface you want to remove from cluster use, and press Enter.
    SMIT displays a list of available non-service interfaces.
      4. Select a non-service interface and press Enter.
    The Swap IP Addresses Between Communication Interfaces menu appears.
      5. Verify the service IP label, and the non-service IP label you have chosen. If this is correct, press Enter.
    SMIT prompts you to confirm that you want to do this operation.
      6. Press Enter only if you are sure you want to swap the communication interface.

    After the IP addresses are swapped, the interface that originally hosted the service address becomes an available non-service interface. At this point, you can take action to repair the faulty network interface card. If you have a hot pluggable network interface card, you can replace it while the node and cluster services are up. Otherwise, you will have to stop cluster services and power down the node to replace it.

    If you have a hot pluggable network interface card, HACMP makes the interface unavailable when you pull it from the node. When the new card is placed in the node, the network interface is incorporated into the cluster as an available non-service IP label again. You can then use the dynamic network interface swap feature again to swap the IP address back to the original network interface.

    If you need to power down the node to replace the faulty network interface card, HACMP will configure the service and non-service addresses on their original communication interfaces when cluster services are restarted. You do not need to use the dynamic network interface swap feature again to swap the interfaces. HACMP does not record the swapped interface information in the AIX 5L Configuration Database (ODM). Therefore, the changes are not persistent across system reboots or cluster restarts.

    Replacing a PCI Hot-Pluggable Network Interface Card

    This section takes you through the process of replacing a PCI hot plug network interface card.

    Special Considerations

    Keep the following in mind before you replace a hot-pluggable PCI network interface card:

  • If the network interface you are hot-replacing is the only available keepalive path on the node where it resides, you must shut down HACMP on this node to prevent a partitioned cluster while the interface is being replaced.
  • SMIT gives you the option of stopping cluster services on this node with resource groups brought offline. From this point, you can manually hot-replace the network interface card.
  • Hot-replacement of Ethernet, Token-Ring, FDDI, and ATM network interface cards is supported. This process is not supported for non-IP communication devices.
  • You should manually record the IP address settings of the network interface being replaced to prepare for unplanned failures.
  • You should not attempt to change any configuration settings while the hot replacement is in progress.
  • To avoid a network failure when using multiple dual-port Ethernet adapter cards on the same node for a particular network, you must configure the interfaces on different physical dual-port Ethernet adapter cards.

    Note: Hot-replacement of the dual-port Ethernet adapter used to configure two interfaces for one HACMP IP network is currently not supported.

    Hot-Replacing a PCI Network Interface Card

    The SMIT interface simplifies the process of replacing a hot-pluggable PCI network interface card. HACMP supports only one PCI hot plug network interface card replacement via SMIT at one time per node.

    Note: If the network interface was alive before the replacement process began, the interface is in maintenance mode from the initiation to the completion of the hot-replacement. Network connectivity monitoring is suspended on the interface during this time.

    Scenario 1 (Live NICs Only)

    Follow the procedure below when hot-replacing the following:

  • A live PCI network service interface in a resource group and with an available non-service interface
  • A live PCI network service interface not in a resource group and with an available non-service interface
  • A live PCI network boot interface with an available non-service interface.

      1. Go to the node on which you want to replace a hot-pluggable PCI network interface card.
      2. Type smit hacmp
      3. In SMIT, select System Management (C-SPOC) > HACMP Communication Interface Management > PCI Hot Plug Replace a Network Interface Card and press Enter.

    SMIT displays a list of available PCI network interfaces that are hot-pluggable.

      4. Select the network interface you wish to hot-replace. Press Enter. The service address of the PCI interface is moved to the available non-service interface.
      5. SMIT prompts you to physically replace the network interface card. After you have replaced the card, you are asked to confirm that replacement has occurred.

    If you select yes, the service address will be moved back to the network interface that has been hot-replaced. On aliased networks, the service address will not move back to the original network interface, but will remain as an alias on the same network interface. The hot-replacement is complete.

    If you select no, you must manually reconfigure the interface settings to their original values:

      a. Run the drslot command to take the PCI slot out of the removed state.
      b. Run mkdev on the physical interface.
      c. Use ifconfig manually as opposed to smit chinet, cfgmgr, or mkdev in order to avoid configuring duplicate IP addresses or an unwanted boot address.

    Scenario 2 (Live NICs Only)

    Follow the procedure below when hot-replacing a live PCI network service interface in a resource group but with no available non-service interface:

      1. Go to the node on which you want to replace a hot-pluggable PCI network interface card.
      2. Enter smit hacmp
      3. Select System Management (C-SPOC) > HACMP Communication Interface Management > PCI Hot Plug Replace a Network Interface Card and press Enter.

    SMIT displays a list of available PCI network interfaces that are hot-pluggable.

      4. Select the network interface you wish to hot-replace and press Enter.
    SMIT prompts you to choose whether to move the resource group to another node during the replacement process in order to ensure its availability.
      5. If you choose to do this, SMIT gives you the option of moving the resource group back to the node on which the hot-replacement took place after completing the replacement process.

    If you do not move the resource group to another node, it will be offline for the duration of the replacement process.

      6. SMIT prompts you to physically replace the card. After you have replaced the network interface card, you are asked to confirm that replacement has occurred.

    If you select Yes, the hot-replacement is complete.

    If you select No, you must manually reconfigure the interface settings to their original values:

      a. Run the drslot command to take the PCI slot out of the removed state.
      b. Run mkdev on the physical interface.
      c. Use ifconfig manually as opposed to smit chinet, cfgmgr, or mkdev in order to avoid configuring duplicate IP addresses or an unwanted boot address.
      d. (If applicable) Move the resource group back to the node from which you moved it in step 5.

    Scenario 3 (Non-alive NICs Only)

    Follow the procedure below when hot-replacing the following:

  • A non-alive PCI network service interface in a resource group and with an available non-service interface
  • A non-alive PCI network service interface not in a resource group and with an available non-service interface
  • A non-alive PCI network boot interface with an available non-service interface.

      1. Go to the node on which you want to replace a hot-pluggable PCI network interface card.
      2. Enter smit hacmp
      3. Select System Management (C-SPOC) > HACMP Communication Interface Management > PCI Hot Plug Replace a Network Interface Card and press Enter.

    SMIT displays a list of available PCI network interfaces that are hot-pluggable.

      4. Select the network interface you wish to hot-replace. Press Enter.
    SMIT prompts you to physically replace the network interface card.
      5. After you have replaced it, SMIT prompts you to confirm that replacement has occurred.

    If you select yes, the hot-replacement is complete.

    If you select no, you must manually reconfigure the interface settings to their original values:

      a. Run the drslot command to take the PCI slot out of the removed state.
      b. Run mkdev on the physical interface.
      c. Use ifconfig manually as opposed to smit chinet, cfgmgr, or mkdev in order to avoid configuring duplicate IP addresses or an unwanted boot address.

    Service Interface Failure During Hot-Replacement

    While an interface is unavailable during its replacement, HACMP continues processing events that occur during this time.

    Consider, for example, a node in a cluster that has a service interface (Interface A) and an available non-service interface (Interface B) on the same network. If you hot-replace Interface A, the service network address is first swapped to Interface B.

    [Figure: Behavior of Interface B while Interface A is being hot-replaced]

    Now consider that Interface B (now the service interface) fails while the hot-replace of Interface A is in progress. If there is another available non-service interface (C), HACMP does a swap of Interface B to Interface C. When the hot-replacement is finished, the service network settings are swapped from Interface C back to Interface A (the newly replaced interface), and Interface C is reconfigured to non-service settings.

    [Figure: Behavior of Interface C while Interface A is being hot-replaced and Interface B fails]

    If there are no extra available non-service interfaces, then between the time Interface B (the service interface) fails and the replacement of Interface A is complete, the node has no network connectivity on that network. In this case, if there are no other sufficient network paths alive on the node for keepalive traffic, a partitioned cluster results. If there are sufficient other network paths alive for keepalive traffic, then a local network failure event is generated for the network to which Interfaces A and B belong.

    Any resource group dependent on a service interface in that same network moves to another node, thus the service address moves with the resource group. Following hot plug replacement, Interface A (the newly replaced interface) is reconfigured to a non-service address not currently used on that node and network.
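
    The failure cases described in this section can be traced with a toy model. The function names below are hypothetical and the model covers only the outcomes named above; it does not simulate HACMP event processing.

```python
from typing import List, Optional


def next_service_interface(spares: List[str]) -> Optional[str]:
    """Pick the interface that takes over the service address, or None
    if no available non-service interface is left."""
    return spares[0] if spares else None


def outcome_without_spare(other_keepalive_paths: bool) -> str:
    """Result when the service interface fails and no spare interface exists."""
    if other_keepalive_paths:
        return "local network failure event"
    return "partitioned cluster"


# Interface A is being hot-replaced, so its service address sits on B.
holder = "B"
# B fails while A is still out; C is an available non-service interface.
holder = next_service_interface(["C"])
print(holder)
# When the replacement finishes, the service settings move back to A.
holder = "A"
print(holder)
# With no spare interface, the result depends on the other keepalive paths.
print(outcome_without_spare(other_keepalive_paths=False))
```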

    Hot-Replacing an ATM Network Interface Card

    ATM network interface cards support multiple logical interfaces on one network interface card. An ATM network interface hot-replacement is managed the same as other network interface cards, with the following exceptions:

  • All logical interfaces configured on the card being replaced that are not configured for and managed by HACMP are lost during the replacement process. They will not be reconfigured on the newly replaced ATM network interface card. All other logical interfaces configured for and managed by HACMP on the ATM network interface card being replaced are restored when the replacement is complete.
  • Since it is possible to have more than one service interface configured on an ATM network interface card—thus multiple resource groups on one ATM network interface—when you hot-replace an ATM network interface card, SMIT leads you through the process of moving each resource group on the ATM interface, one at a time.

    Recovering from PCI Hot Plug Network Interface Card Failure

    If an unrecoverable error causes the hot-replacement process to fail, HACMP may be left in a state where your network interface is unconfigured and still in maintenance mode. To recover from this, manually fix the script, then run smit clruncmd to remove any maintenance modes that are still set. You can also use ifconfig to reconfigure the network settings of the interface.

    Changing a Cluster Name

    When changing the name of a cluster, you must stop cluster services, make the change, and then restart cluster services to apply it to the active configuration. You cannot make these changes dynamically.

    To change a cluster’s name:

      1. Enter smit hacmp
      2. In SMIT, select Extended Configuration > Extended Topology Configuration > Configure an HACMP Cluster > Add/Change/Show an HACMP Cluster, and press Enter.
    SMIT displays the cluster definition with the current value for the cluster name filled in.
      3. Enter the name change. A cluster name can include alphabetic and numeric characters and underscores; it cannot begin with a numeric. Use no more than 32 characters.
      4. After the command completes, return to the HACMP SMIT menus to perform further topology reconfiguration or to synchronize the changes you made. To synchronize the cluster topology, return to the Extended Configuration panel and select the Extended Verification and Synchronization option.
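
    The naming rule in step 3 can be checked before you enter a new name in SMIT. The sketch below is a minimal validator, assuming that "alphabetic" means ASCII letters and that a leading underscore is acceptable because only a leading numeric is ruled out; the function name is hypothetical.

```python
import re

# One letter or underscore, then up to 31 letters, digits, or underscores
# (32 characters in total). ASCII-only letters are an assumption.
_CLUSTER_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def is_valid_cluster_name(name: str) -> bool:
    """Check a proposed cluster name against the rule stated in step 3."""
    return _CLUSTER_NAME.fullmatch(name) is not None


for candidate in ("prod_cluster1", "1cluster", "my-cluster"):
    print(candidate, is_valid_cluster_name(candidate))
```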

    Changing the Configuration of Cluster Nodes

    As the system administrator of an HACMP cluster, you may need to perform any of the following tasks relating to cluster nodes:

  • Adding one or more nodes to the cluster
  • Removing a node from the cluster
  • Changing the attributes of a cluster node.

    Adding a Cluster Node to the HACMP Configuration

    You can add a node to an active cluster dynamically. You do not need to stop and restart cluster services on the already-participating cluster nodes for the new node to become part of the cluster.

    Take the following steps on any active cluster node (called the local node from here on), to add the new node to the cluster topology definition:

      1. Enter smit hacmp.
      2. In SMIT, select Extended Configuration > Extended Topology Configuration > Configure HACMP Nodes > Add a Node to the HACMP Cluster and press Enter.
    SMIT displays the Add a Node to the HACMP Cluster panel.
      3. Enter the name of the node (or nodes) that you want to add to the cluster. A node name can include alphabetic and numeric characters and underscores, but cannot have a leading numeric. Use no more than 32 characters. Separate multiple names with spaces. If you specify a duplicate node name, the operation fails. Press Enter to add the node or nodes to the cluster definition.
      4. Optionally, you can add a Communication Path. Press the F4 key to see the picklist that displays the contents of /etc/hosts. Enter one resolvable IP Label/Address (may be the hostname), IP address, or Fully Qualified Domain Name for the node. This path will be taken to initiate communication with the node. Examples are: “NodeA”, “10.11.12.13”, and “NodeC.ibm.com”.
      5. After the command completes, return to the HACMP SMIT menus to perform further topology reconfiguration or to synchronize the changes you made. To synchronize the cluster, return to the Extended Configuration panel and select the Extended Verification and Synchronization option (you can wait and synchronize after doing the resource configuration, if you prefer).
      6. On the newly added node, start cluster services to integrate it into the cluster.
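
    Step 3 accepts several space-separated node names and fails on a duplicate. A pre-check along those lines might look like the following sketch; the function name is hypothetical, and the name pattern assumes ASCII letters and allows a leading underscore.

```python
import re
from typing import Iterable, List

# Letters, digits, and underscores; no leading digit; at most 32 characters.
_NODE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def parse_new_nodes(entry: str, existing: Iterable[str]) -> List[str]:
    """Split a space-separated entry and reject invalid or duplicate names."""
    seen = set(existing)
    names = entry.split()
    for name in names:
        if not _NODE_NAME.fullmatch(name):
            raise ValueError(f"invalid node name: {name!r}")
        if name in seen:
            raise ValueError(f"duplicate node name: {name!r}")
        seen.add(name)
    return names


print(parse_new_nodes("nodeC nodeD", existing=["nodeA", "nodeB"]))
```

    A name that repeats within the entry is rejected in the same way as one that already exists in the cluster.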

    Adding Nodes to a Resource Group

    Once you have added the new node to the cluster topology, you can continue by adding the new node (or nodes) to the list of participating nodes in a resource group.

    In a non-concurrent resource group with the startup policy of either Online on Home Node Only or Online on First Available Node, if you give the new node the highest priority by specifying it first in the list of participating nodes, the newly added node will acquire control of the resource group when you start up cluster services on this node. This can be useful when you want the new node to take over a specific resource. For example, you may be adding a high-powered node to a cluster that runs a heavily used database application and you want this application to run on the newly added node.

    Warning: When adding a node to a cluster with a resource group that has disk fencing enabled, add the node to the concurrent resource group immediately. All nodes in a concurrent access cluster must participate in the concurrent access resource group. Include the new node in this resource group immediately to avoid the possibility of unrecoverable data loss.

    When you are finished adding the node to a resource group:

      1. Synchronize the cluster. Return to the Extended Configuration panel and select the Extended Verification and Synchronization option.
      2. When you press Enter, the cluster resources are dynamically reconfigured.

    Removing a Cluster Node from the HACMP Configuration

    You can remove a node from an active cluster dynamically. However, before removing a node from the cluster, you must remove the node from any resource groups it participates in and synchronize resources.

    To remove a cluster node:

      1. Stop cluster services on the node to be removed (usually this is done by stopping cluster services with the Move Resource Groups option).
      2. On another active node, enter smit hacmp
      3. In SMIT, select Extended Configuration > Extended Topology Configuration > Configure HACMP Nodes > Remove a Node in the HACMP Cluster. SMIT displays a list of all cluster nodes.
      4. Select the node you want to remove and press Enter. SMIT prompts you to confirm that you want to proceed. Press Enter again to remove the node from the cluster.
    Note: When you remove a node from the cluster topology, all communication path information associated with the node is also removed, its resources are released and reacquired, and the node is removed from the resource configuration.
      5. On the local node, return to the Extended Configuration panel and select the Extended Verification and Synchronization option to synchronize the cluster. When the synchronization completes, the node is removed from the cluster definition.

    Changing the Name of a Cluster Node

    When changing the name of a cluster node, you must stop cluster services, make the change, and then restart cluster services to apply it to the active configuration.

    To change the name of a cluster node:

      1. Enter smit hacmp
      2. Select the following options: Extended Configuration > Extended Topology Configuration > Configure HACMP Nodes > Change/Show a Node in the HACMP Cluster and press Enter.
    SMIT displays a picklist of cluster nodes.
      3. Make your selection and press Enter.
    SMIT displays the current node name.
      4. Enter the new name for the node in the New Node Name field. A node name can include alphabetic and numeric characters and underscores, but cannot have a leading numeric. Use no more than 32 characters. When you finish entering data, press Enter. SMIT makes the changes you specified.
      5. After the command completes, return to the HACMP SMIT menus to perform further topology reconfiguration or to synchronize the changes you made. To synchronize the cluster topology, return to the Extended Configuration panel and select the Extended Verification and Synchronization option.
    The change is propagated through both the cluster topology and resource configuration.

    Changing the Configuration of an HACMP Network

    As the system administrator of an HACMP cluster, you may need to perform any of the following tasks relating to cluster networks:

  • Adding a Network
  • Changing Network Attributes
  • Removing an HACMP Network
  • Converting an HACMP Network to use IP Aliasing
  • Establishing Default and Static Routes on Aliased Networks
  • Converting an SP Switch Network to an Aliased Network
  • Disabling IPAT via IP Aliases
  • Controlling Distribution Preferences for Service IP Label Aliases.
    Adding a Network

    See Configuring HACMP Networks and Heartbeat Paths in Chapter 4: Configuring HACMP Cluster Topology and Resources (Extended).

    Changing Network Attributes

    You can change attributes of both IP and non-IP networks. You cannot change network attributes dynamically.

    Changing the Network Attribute to Private for Oracle Inter-Node Communication

    Oracle uses the private network attribute setting to select networks for Oracle inter-node communications. This attribute is not used by HACMP and will not affect HACMP in any way. The default attribute is public.

    Changing the network attribute to private makes the network Oracle-compatible by changing all interfaces to service (as well as changing the attribute in HACMPnetwork ODM).

    To configure private networks for use by Oracle:

      1. Configure the network and add all interfaces. You cannot change the attribute if the network has no interfaces.
      2. Change the network attribute to private. See Steps for Changing an IP-Based Network below.
      3. Private networks must have either all boot or all service interfaces. If the network has all boot interfaces (the default when using discovery) HACMP converts these interfaces to service. (Oracle only looks at service interfaces.)
      4. Synchronize the cluster after changing the attribute.
    Note: Once you define the network attribute as private you cannot change it back to public. You have to delete the network and then redefine it to HACMP. (It defaults to public.)
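
    The interface rules in the steps above can be summarized in a short model. This is an illustrative sketch only: the function name and the dictionary layout are hypothetical and do not correspond to any HACMP command or ODM structure.

```python
def make_network_private(interfaces):
    """Model the interface rules for changing a network to private.

    interfaces -- dict mapping an interface label to "boot" or "service"
                  (a hypothetical layout used only for this sketch)
    Returns the interface roles as they would be after the change.
    """
    if not interfaces:
        # Step 1: the attribute cannot be changed on an empty network.
        raise ValueError("network has no interfaces")
    roles = set(interfaces.values())
    if roles == {"boot"}:
        # Step 3: all-boot networks are converted to service.
        return {label: "service" for label in interfaces}
    if roles == {"service"}:
        return dict(interfaces)
    raise ValueError("private networks need all boot or all service interfaces")


if __name__ == "__main__":
    print(make_network_private({"nodeA_en1": "boot", "nodeB_en1": "boot"}))
```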

    Steps for Changing an IP-Based Network

    To change the name or an attribute of an HACMP network:

      1. Enter smit hacmp
      2. In SMIT, select Extended Configuration > Extended Topology Configuration > Configure HACMP Networks > Change/Show a Network in the HACMP Cluster and press Enter.
      3. Select the IP-based network to change. (The following panels depend on the type of network you selected to change.)
      4. SMIT displays the Change/Show a Network in the HACMP Cluster panel. You can change the name and the mechanism for configuring the IP address on the network interface assigned to this network in the corresponding fields:
    Network Name
    The current name of the network is displayed.
    New Network Name
    The new name for this network. Use no more than 32 alphanumeric characters and underscores; do not use a leading numeric.
    Network Type
    Listed according to the chosen network.
    Netmask
    The netmask of the selected network is displayed, for example, 255.255.255.0.
    Enable IP Address Takeover via IP Aliases
    The value in this field determines the mechanism by which an IP Address will be configured onto a network interface.
    By default, if the network and selected configuration supports adding an IP Alias to a network interface, it is set to Yes. Otherwise, it is No.
    If you explicitly want to use IPAT via IP Replacement, set this field to No. IP Replacement is the mechanism by which one IP address is first removed from, and then another IP address is added to, a single network interface.
    Note that this field is set to No by default after a migration of an SP Switch network that was configured to use IPAT via IP Replacement in HACMP.
    IP Address Offset for Heartbeating over IP Aliases
    The base address of a private address range for heartbeat addresses, for example 10.10.10.1. HACMP will use this address to automatically generate IP addresses for heartbeating for each boot interface in the configuration.
    Refer to the Planning Guide and your planning worksheet for more information on selecting a base address for use by Heartbeating over IP Aliases.
    Clear this entry to use the default heartbeating method.
    Network Attribute
    public is the default. Use private for Oracle.
      5. Press Enter to change the definition of the network.
      6. On the same node, synchronize the cluster configuration.

    Steps for Changing Serial Devices

    To change an attribute of a serial device:

      1. Enter smit hacmp
      2. In SMIT, select Extended Configuration > Extended Topology Configuration > Configure HACMP Networks > Change/Show a Network in the HACMP Cluster and press Enter.
    SMIT displays a list of serial devices.
      3. Select the serial device to change. (The following panels depend on the type of network you selected to change.)
      4. Make the changes in the fields on the Change/Show a Serial Device in the HACMP Cluster panel as follows:
    Network Name
    The current name of the network is displayed.
    New Network Name
    The new name for the network.
    Network Type
    Valid types are RS232, tmssa, tmscsi, diskhb.
      5. Press Enter to change the definition of the network.
      6. On the same node, synchronize the cluster configuration.

    Removing an HACMP Network

    Note: Deleting all network interfaces associated with a network deletes the network definition from HACMP.

    To remove a network from the HACMP cluster definition:

      1. Enter smit hacmp
      2. In SMIT, select Extended Configuration > Extended Topology Configuration > Configure HACMP Networks > Remove a Network from the HACMP Cluster and press Enter.
    SMIT displays the Select a Network to Remove panel.
      3. Select the network to remove.
    SMIT displays Are you sure?
      4. Press Enter to remove the network. All of this network’s subnets and their interfaces are removed from the HACMP configuration.
      5. On the same node, synchronize the cluster configuration.
    If the Cluster Manager is running on the local node, the synchronization triggers a dynamic reconfiguration event. See Synchronizing the Cluster Configuration for more information.

    Converting an HACMP Network to use IP Aliasing

    If you want to change the cluster network configuration to use IPAT via Aliases instead of the previous IPAT via IP Replacement scheme for a specific network in the cluster, you should stop the cluster services on all nodes to make the change. This change is not allowed during a dynamic reconfiguration (DARE) of cluster resources.

    Note: If you have an SP Switch network that has been configured in your cluster in HACMP prior to version 5.1, and want to convert the SP Switch to use the IP aliasing in HACMP, see the section Converting an SP Switch Network to an Aliased Network.

    To convert an HACMP network to use IP Aliasing:

      1. Stop the cluster services on all cluster nodes.
      2. Verify that no HACMP interfaces are defined with HWAT on that network.
      3. Verify that the network is configured to support gratuitous ARP in HACMP, by checking the Extended Configuration > Extended Topology Configuration > Configure an HACMP Network Module > Show a Network Module SMIT panel for the Gratuitous ARP setting for that network type.
      4. To change the cluster network to use IPAT via IP Aliases instead of IPAT via IP Replacement, see the steps in this chapter in the section Changing Network Attributes.
      5. Verify and synchronize the cluster.
      6. Restart the cluster services.

    For more information on IPAT via IP Aliases see the relevant chapters in the Concepts and Facilities Guide and in the Planning Guide.

    Establishing Default and Static Routes on Aliased Networks

    If you are setting up or converting to an IP aliased network and need a default route, and possibly other static routes, established on the IP aliased service subnet, these routes will not be established automatically when the rc.net file runs at boot time. This is because there is no address on that subnet in the Configuration Database.

    To ensure that these routes are established at boot time, we recommend that you also configure a persistent address on that subnet. After configuring the persistent address, HACMP configures the routes.

    If you do not configure persistent addresses, then you should use your own scripts that will configure routes on aliased service subnets. For more information on the rc.net file see Chapter 1: Administering an HACMP Cluster and Chapter 9: Starting and Stopping Cluster Services.

    Converting an SP Switch Network to an Aliased Network

    When you migrate your cluster to HACMP 5.4 from a version prior to HACMP 5.1, your previously configured SP Switch network configuration remains valid. However, after migration, HACMP by default treats your network as non-aliased, although in reality it functions as an aliased network. Therefore, you may consider reconfiguring the existing SP Switch network configuration.

    To change the cluster configuration to use IPAT via IP Aliases instead of the standard IPAT via IP Replacement scheme for the SP Switch network, stop the cluster services on all nodes and make the following changes to the communication interface definitions. Such changes are not allowed during a dynamic reconfiguration (DARE) of cluster resources.

    To convert the SP Switch network to an aliased network, perform the following steps on all cluster nodes:

      1. Stop the cluster services.
      2. Enter smit hacmp
      3. In SMIT, select Extended Configuration > Extended Topology Configuration > Configure HACMP Communication Interfaces/Devices > Remove Communication Interfaces/Devices and press Enter.
      4. Remove the non-service interfaces on the SP Switch network.
      5. Remove the boot label that you had previously configured within HACMP for your SP switch network.
      6. Go back to the Extended HACMP Verification and Synchronization option and synchronize the cluster. If the cluster network meets all the requirements of an aliased network, the following message appears: “Setting attribute for network <name> to use IP Aliasing for IP address takeover”.
      7. In SMIT, select the Extended Configuration > Extended Topology Configuration > Configure HACMP Communication Interfaces/Devices and configure the base interface address on the SP Switch network as a boot IP label/address in HACMP.
      8. Put the service IP-based communication interface on a different subnet than the boot interface to avoid errors during the verification process. If you have multiple service addresses they should all be on a different subnet than the boot interface.
      9. Verify that HWAT is disabled for all communication interfaces on this network.
      10. Verify that the network is configured to support gratuitous ARP in HACMP, by checking the Gratuitous ARP setting for that network type. See instructions in the section Changing the Tuning Parameters to Custom Values.
      11. In the Change/Show a Network in the HACMP Cluster SMIT panel, set the Enable IP Address Takeover via IP Aliases field to Yes for this network.
      12. Synchronize the cluster. HACMP verifies the configuration.
      13. Restart the cluster services.
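
    The subnet requirement in step 8 can be checked ahead of verification with the Python standard library. This is a sketch under stated assumptions: the function names and the sample addresses are hypothetical, and it assumes that the boot and service addresses share one netmask.

```python
import ipaddress


def same_subnet(addr_a, addr_b, netmask):
    """Return True if both addresses fall in the same subnet."""
    network = ipaddress.ip_network(f"{addr_a}/{netmask}", strict=False)
    return ipaddress.ip_address(addr_b) in network


def service_addrs_on_boot_subnet(boot_addr, service_addrs, netmask):
    """List the service addresses that share the boot interface subnet.

    Step 8 asks for every service address to be on a different subnet
    than the boot interface, so an empty list is the desired result.
    """
    return [s for s in service_addrs if same_subnet(boot_addr, s, netmask)]


if __name__ == "__main__":
    conflicts = service_addrs_on_boot_subnet(
        "192.168.10.1", ["192.168.10.50", "192.168.20.50"], "255.255.255.0"
    )
    print(conflicts)
```

    In this example the first service address shares the boot subnet and would need to be moved before synchronization.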

    For more information on the SP Switch considerations, see Chapter 3: Planning Cluster Network Connectivity in the Planning Guide.

    For more information on IPAT via IP Aliases see Concepts and Facilities and Chapter 3 in the Planning Guide.

    Disabling IPAT via IP Aliases

    If the network supports gratuitous ARP, you can configure the network in the HACMP cluster to use IPAT via IP Aliases during fallover.

    There are subtle differences between the operation of a network using IP aliasing and one that does not. If you need to troubleshoot problems with external network equipment, clients, or applications, you may want to disable IPAT via IP Aliases on the network and use IPAT via IP Replacement instead.

    To disable the IPAT via IP Aliases facility for the entire network type:

      1. In the SMIT Change/Show a Network in the HACMP Cluster panel, set the Enable IP Address Takeover via IP Aliases field to No for this network.
      2. Change the service IP labels to be on different subnets.
      3. Press Enter to accept the changes.
      4. Synchronize the cluster configuration.

    Controlling Distribution Preferences for Service IP Label Aliases

    To control the placement of the service IP label aliases on the cluster node physical network interface cards, you can configure a distribution preference for the aliases of the service IP labels that are placed under HACMP control.

    See the section Distribution Preference for Service IP Label Aliases: Overview in Chapter 4: Configuring HACMP Cluster Topology and Resources (Extended).

    Changing the Configuration of Communication Interfaces

    As a system administrator, you may need to perform any of the following tasks relating to cluster network interfaces:

  • Configuring Multiple Logical Interfaces on the Same ATM NIC
  • Adding HACMP Communication Interfaces/Devices
  • Removing a Communications Interface from a Cluster Node.
    Configuring Multiple Logical Interfaces on the Same ATM NIC

    You can configure multiple logical network interfaces as HACMP communication interfaces, where all logical interfaces belong to the same physical ATM NIC, and each is defined as Classic IP or LANE. The cluster behaves as if the same set of logical interfaces were configured with each interface type defined on a separate ATM NIC.

    For more information on this functionality, refer to Chapter 3 in the Planning Guide.

    Adding HACMP Communication Interfaces/Devices

    You can add a network communication interface to an active cluster dynamically. You do not need to stop and restart cluster services for the network communication interface to become part of the cluster.

      1. On the node getting the new network interface card, complete the prerequisite tasks:
  • Install the new network interface card.
  • Configure the new logical network interface to AIX 5L.
      2. On all cluster nodes, update the /etc/hosts file to include the IP address of the new network interface.
      3. On any cluster node, add the HACMP communication interface to the cluster topology definition.
      4. Synchronize the cluster.
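
    Step 2 has to be repeated on every cluster node, so a quick check of each node's /etc/hosts content can catch an omission. The following is a minimal sketch: the function names, the sample file text, and the label nodeA_en2 are hypothetical, and the parser handles only the common "address name [alias ...]" layout with # comments.

```python
def hosts_entries(text):
    """Parse hosts-file text into a dict of address -> list of names."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        entries.setdefault(fields[0], []).extend(fields[1:])
    return entries


def has_entry(text, address, label):
    """Return True if the address is listed with the given IP label."""
    return label in hosts_entries(text).get(address, [])


if __name__ == "__main__":
    sample = (
        "127.0.0.1   loopback localhost   # loopback\n"
        "10.11.12.13 NodeA\n"
        "10.11.12.20 nodeA_en2            # new interface\n"
    )
    print(has_entry(sample, "10.11.12.20", "nodeA_en2"))
```

    To use the sketch against a real node, read that node's /etc/hosts file and pass its contents as the text argument.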

    Adding a Communication Interface to an IP-Based Network

    See Configuring Communication Interfaces/Devices to HACMP in Chapter 4: Configuring HACMP Cluster Topology and Resources (Extended).

    Adding a Communication Device to a Non IP-Based Network

    See Configuring Communication Interfaces/Devices to HACMP in Chapter 4: Configuring HACMP Cluster Topology and Resources (Extended).

    Changing Communication Interface/Device Attributes

    You cannot change the attributes of a communication interface or device dynamically. You must stop and restart cluster services to make the changed configuration the active configuration.

    To change a communication interface or serial device for the cluster:

      1. Enter smit hacmp
      2. In SMIT, select Extended Configuration > Extended Topology Configuration > Configure HACMP Communication Interfaces/Devices > Change/Show HACMP Communication Interfaces/Devices and press Enter.
      3. Select the IP communication interface or the serial device from the picklist.
    Attributes for a communication interface include:
    Node Name
    The name of the node on which this network interface physically exists.
    Network Interface
    The network interface associated with the communication interface.
    IP Label/Address
    The IP label/address associated with this communication interface that will be configured on the network interface when the node boots. The picklist filters out IP labels/addresses already configured to HACMP.
    Network Type
    The type of network media/protocol (Ethernet, Token Ring, fddi, ATM). Select the type from the predefined list of network types.
    Network Name
    A unique name for this logical network.

    Attributes for a serial device are as follows:
    Node Name
    Define a node name for all serial service devices.
    Device Name
    Enter a device file name.
    • RS232 serial devices must have the device file name /dev/ttyn.
    • Target mode SCSI serial devices must have the device file name /dev/tmscsin.
    • Target mode SSA devices must have the device file name /dev/tmssan.
    For disk heartbeating, any disk device in an enhanced concurrent volume group is supported. It could be an hdisk or vpath, for example /dev/hdiskn.
    n = the number of the device.
    Device Path
    For example, /dev/tty0
    Network Type
    This field is automatically filled in (RS232, tmssa, tmscsi, or diskhb) depending on the device name.
    Network Name
    This field is automatically filled in.

      4. Press Enter after making the change (such as a new network name). HACMP now checks the validity of the configuration. You may receive warnings if a node cannot be reached.
      5. Return to the Extended Configuration menu and select the Extended Verification and Synchronization option. If the configuration is verified and synchronized, proceed to the next step.
      6. Restart cluster services.
    The change is propagated through the cluster. Cluster resources are modified, as specified.
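
    The Device Name rules listed above determine the value that appears in the Network Type field for a serial device. The mapping can be sketched as follows. This is an illustration only: the function name is hypothetical, the patterns follow the file names as given in this section, and the /dev/vpathn form is an assumption by analogy with /dev/hdiskn.

```python
import re

# Device file name patterns from the Device Name field description,
# where n is the number of the device.
DEVICE_PATTERNS = [
    (re.compile(r"/dev/tty\d+"), "RS232"),
    (re.compile(r"/dev/tmscsi\d+"), "tmscsi"),
    (re.compile(r"/dev/tmssa\d+"), "tmssa"),
    # Assumption: vpath devices follow the same /dev/<name>n form.
    (re.compile(r"/dev/(hdisk|vpath)\d+"), "diskhb"),
]


def network_type_for(device):
    """Return the network type implied by a device file name, or None."""
    for pattern, network_type in DEVICE_PATTERNS:
        if pattern.fullmatch(device):
            return network_type
    return None


if __name__ == "__main__":
    for device in ("/dev/tty0", "/dev/tmssa1", "/dev/hdisk3", "/dev/en0"):
        print(device, network_type_for(device))
```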

    Removing a Communications Interface from a Cluster Node

    You can remove an HACMP communications interface from an active cluster dynamically; you do not need to stop and restart cluster services.

    Note: Deleting all communications interfaces associated with a network deletes the network from HACMP.

    To remove a communications interface from a cluster node:

      1. Enter smit hacmp
      2. In SMIT, select Extended Configuration > Extended Topology Configuration > Configure HACMP Communication Interfaces/Devices > Remove HACMP Communication Interfaces/Devices and press Enter.
      3. Select the IP communication interface(s) or the serial device(s) from the picklist and press Enter.

    When you remove a communications interface/device, all information associated with the interface/device is removed from the Configuration Database. SMIT prompts you to confirm that you want to do this operation. Press Enter again only if you are sure you want to remove the interface/device and its associated information.

      4. On the same node, synchronize the cluster. If the Cluster Manager is running on the local node, the synchronization triggers a dynamic reconfiguration event. See Synchronizing the Cluster Configuration for more information.
    When the synchronization completes, the selected communications interfaces/devices are removed from the cluster topology definition.

    Managing Persistent Node IP Labels

    This section describes the following tasks:

  • Configuring Persistent Node IP Labels/Addresses
  • Changing Persistent Node IP Labels
  • Deleting Persistent Node IP Labels.
    Configuring Persistent Node IP Labels/Addresses

    To configure persistent node IP labels on a specified node:

      1. Enter smit hacmp
      2. In SMIT, select Extended Configuration > Extended Topology Configuration > Configure HACMP Persistent Node IP Labels/Addresses > Add a Persistent Node IP Label/Address and press Enter.
      3. Select a cluster node.
      4. Enter the field values as follows:
    Node Name
    The name of the node on which the IP Label/Address will be bound.
    Network Name
    The name of the network on which the IP Label/Address will be bound.
    Node IP Label/Address
    The IP Label/Address to keep bound to the specified node.
      5. Press Enter. The resulting SMIT panel displays the current node name and persistent node IP labels defined on IP networks on that node.

    Changing Persistent Node IP Labels

    To change or view persistent node IP labels configured on a specified node:

      1. Enter smit hacmp
      2. In SMIT, select Extended Configuration > Extended Topology Configuration > Configure HACMP Persistent Node IP Labels/Addresses > Change/Show a Persistent Node IP Label/Address and press Enter.
      3. Enter field values as follows:
    Node Name
    The name of the node on which the IP Label/Address will be bound.
    New Node Name
    The new node name for binding the IP Label/Address.
    Network Name
    The name of the network on which the IP Label/Address will be bound.
    Node IP Label/Address
    The IP Label/Address to keep bound to the specified node.
    New Node IP Label/Address
    The new IP Label/Address to be bound to the specified node.
      4. Press Enter. The resulting SMIT panel displays the current node name and persistent node IP labels defined on IP networks on that node.

    Deleting Persistent Node IP Labels

    To delete persistent node IP labels configured on a specified node:

      1. Enter smit hacmp
      2. In SMIT, select Extended Configuration > Extended Topology Configuration > Configure HACMP Persistent Node IP Labels/Addresses > Remove a Persistent Node IP Label/Address.
      3. Press Enter.
    HACMP deletes the persistent node IP label from the node.

    Changing the Configuration of a Global Network

    Configuring a global network informs HACMP how different HACMP networks are connected to one another. This is commonly required when defining SP Ethernet networks that span subnets. You can group multiple HACMP networks of the same type under one logical global network name. This reduces the probability of network partitions that can cause the cluster nodes on one side of the partition to go down.

    Networks combined into a global network (such as the SP Ethernet) cannot use IP Address Takeover.

    The definition of a global network changes when you add or remove existing HACMP networks to or from the global network.

    Adding an HACMP Network to a Global Network

    To add a network to the global network definition:

      1. Enter smit hacmp
      2. Select Extended Configuration > Extended Topology Configuration > Configure HACMP Global Networks and press Enter.
    SMIT displays a picklist of defined HACMP networks.
      3. Select an HACMP network and press Enter.
    SMIT displays the Change/Show a Global Network panel. The name of the network you selected is entered as the local network name.
      4. Enter the name of the global network (character string) and press Enter.
      5. Repeat these steps to define any new HACMP networks to be included in each global network.

    Removing an HACMP Network from a Global Network

    To remove a network from the global network definition, complete the following steps:

      1. Enter smit hacmp
      2. Select Extended Configuration > Extended Topology Configuration > Configure Global Networks and press Enter.
    SMIT displays a picklist of defined HACMP networks.
      3. Select the network to remove and press Enter.
    SMIT displays the Change/Show a Global Network panel. The name of the network you selected is entered as the local network name, along with the name of the global network where it currently belongs.
      4. Remove the name of the global network and press Enter.
      5. Repeat these steps to remove any other HACMP networks from a global network.

    Changing the Configuration of a Network Module

    The HACMP SMIT interface allows you to change the configuration of an HACMP network module. You may want to tune the parameters of the topology services by changing the failure detection rate of a network module.

    This section contains the following topics:

  • Understanding Network Module Settings
  • Resetting the Network Module Tunable Values to Defaults
  • Behavior of Network Down on Serial Networks
  • Changing the Failure Detection Rate of a Network Module
  • Showing a Network Module
  • Removing a Network Module
  • Changing an RS232 Network Module Baud Rate.
    Understanding Network Module Settings

    The normal detection rate is usually optimal. Speeding up or slowing down failure detection rate is a small, but potentially significant area where you can adjust cluster fallover behavior. However, the amount and type of customization you add to event processing has a much greater impact on the total fallover time. You should test the system for some time before deciding to change the failure detection rate of any network module.

    Be sure you have tuned the AIX 5L performance parameters for I/O pacing and syncd frequency before changing tuning parameters for a network module. See the section on Configuring AIX 5L for HACMP in the Installation Guide.

    Warning: I/O pacing and other tuning parameters should only be set to values other than defaults after a system performance analysis indicates that doing so will produce the desired result with acceptable side effects. In addition, make sure you read the Setting I/O Pacing section in Chapter 1: Troubleshooting HACMP Clusters in the Troubleshooting Guide for a more detailed description of tuning I/O pacing.

    If you decide to change the failure detection rate of a network module, keep the following considerations in mind:

  • Failure detection rate is dependent on the fastest network linking two nodes.
  • Faster heartbeat rates may lead to false failure detections, particularly on busy networks. For example, bursts of high network traffic may delay heartbeats and this may result in nodes being falsely ejected from the cluster. Faster heartbeat rates also place a greater load on networks.
  • If your networks are very busy and you experience false failure detections, you can change the failure detection rate on the network modules to slow to avoid this problem.
  • For instance, in a mixed-version cluster with Token Ring networks, to allow enough time for any type of fallover to be handled properly by HACMP, you may want to adjust the Failure Detection Rate for this network module from normal to slow.

    Note: In rare cases, it is necessary to slow the Failure Detection Rate to even longer than the slow option SMIT offers. In this case, you may change the Failure Detection Rate of a network module to a custom value by changing the tuning parameters from their predefined values to custom values.

    Resetting the Network Module Tunable Values to Defaults

    For troubleshooting purposes, you or IBM support personnel assisting you with cluster administration may optionally reset the HACMP tunable values (such as the tuning parameters for the network module) to their installation-time defaults.

    For more information on how to configure resetting the tunables in SMIT, see Resetting HACMP Tunable Values section in Chapter 1: Troubleshooting HACMP Clusters in the Troubleshooting Guide.

    Behavior of Network Down on Serial Networks

    Because of the point-to-point nature of serial networks, when there is any problem with the connection—such as the cable being unplugged—there is no possibility for other traffic to be visible on the other endpoint as there is with an IP network (like Ethernet). So when a serial interface loses heartbeats, it first declares its neighbor down, after the Failure Detection Rate has expired for that network interface type. HACMP waits the same interval again before declaring the local interface down (if no heartbeat is received from the neighbor).

    Thus, the regular Failure Detection Rate formula applies to the detection of the remote interface down, and twice the Failure Detection Rate (Failure Cycle * Heartbeat Rate * 4) applies to the detection of the local interface down. HACMP does not run a network_down event until both the local and remote interfaces have failed. Therefore, for serial networks, the time from the actual failure to the execution of the network_down event is double the Failure Detection Rate value.

    However, if the serial network is the last network left connecting this node to another, the node_down event is triggered by the first error.

    In summary, for detecting a remote node down, the serial networks behave the same way as IP networks, and the time to detect a remote node down is still the longest Failure Detection Rate of the networks involved.
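
    The serial network timing described above can be illustrated with a minimal sketch. The function names are hypothetical and chosen only for this example; the arithmetic follows the formulas given in the text (Failure Cycle * Heartbeat Rate * 2 for the remote interface, and twice that for the local interface and the network_down event).

```python
def detection_time(failure_cycle, heartbeat_rate):
    """Failure Detection Rate in seconds: Failure Cycle * Heartbeat Rate * 2."""
    return failure_cycle * heartbeat_rate * 2


def serial_network_down_times(failure_cycle, heartbeat_rate):
    """Return (remote_down, local_down) detection times in seconds.

    The remote interface is declared down after one Failure Detection Rate
    interval. The local interface is declared down after the same interval
    has passed again, which is when network_down can run.
    """
    remote_down = detection_time(failure_cycle, heartbeat_rate)
    local_down = 2 * remote_down
    return remote_down, local_down


# Example: a failure cycle of 5 with 2 seconds between heartbeats.
remote, local = serial_network_down_times(failure_cycle=5, heartbeat_rate=2)
print(f"remote interface down after {remote} s, network_down after {local} s")
```

    With these example values, the remote interface is declared down after 20 seconds and the network_down event can run after 40 seconds.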

    In addition, you can use the clstat -s command to display the service IP labels for serial networks that are currently down on a network.

    Note: The RSCT topsvcs daemon logs messages whenever an interface changes state. These errors are visible in the errpt.

    Disk Heartbeating Networks and Failure Detection

    Disk heartbeating networks are identical to other non-IP based networks in terms of the operation of the failure detection rate. However, there is a subtle difference that affects the state of the network endpoints and the events run:

  • Disk heartbeating networks work by exchanging heartbeat messages on a reserved portion of a shared disk. As long as the node can access the disk the network endpoint will be considered up, even if heartbeat messages are not being sent between nodes. The disk heartbeating network itself will still be considered down.
  • All other non-IP networks mark the network and both endpoints as down when either endpoint fails.
  • This difference makes it easier to diagnose problems with disk heartbeating networks: if the problem is only in the connection of one node with the shared disk, then only that part of the network is marked as down.

    Disk Heartbeating and Fast Detection of Node Failures

    HACMP 5.4 reduces the time it takes for a node failure to be realized throughout the cluster.

    When a node fails, HACMP uses disk heartbeating to place a departing message on the shared disk so neighboring nodes are aware of the node failure within one heartbeat period (hbrate). Topology Services then distributes the information about the node failure throughout the cluster and each Topology Services daemon sends a node_down event to any concerned client.

    You can turn on the fast method for node failure detection when you configure disk heartbeating networks and specify a parameter for the disk heartbeating NIM.

    For procedure information, see the section Reducing the Node Failure Detection Rate: Enabling Fast Detection for Node Failures in this chapter.

    For disk heartbeating information, see Configuring Heartbeating over Disk in Chapter 4: Configuring HACMP Cluster Topology and Resources (Extended).

    Changing the Failure Detection Rate of a Network Module

    Two parameters are involved in determining the Failure Detection Rate. They are:

  • Heartbeat rate (in seconds)—frequency at which heartbeats (keepalives) are sent between the nodes.
  • Failure detection cycle—the number of consecutive heartbeats that must be missed before failure is assumed.

    The following two tables show the actual values of the Heartbeat Rate and the Failure Detection Cycle for IP and non-IP networks, depending on the predefined Failure Detection Rate (Slow, Normal, or Fast).

    IP Network Setting    Seconds between Heartbeats    Failure Cycle    Failure Detection Rate (seconds)
    Slow                  2                             12               48
    Normal                1                             10               20
    Fast                  1                             5                10

    Failure Detection and Heartbeat Parameters for IP Networks 
    

    Non-IP Network Setting    Seconds between Heartbeats    Failure Cycle    Failure Detection Rate (seconds)
    Slow                      3                             8                48
    Normal                    2                             5                20
    Fast                      1                             5                10

    Failure Detection and Heartbeat Parameters for Non-IP Networks 
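
    As a cross-check of the two tables above, the following sketch recomputes the Failure Detection Rate from the heartbeat rate and failure cycle, using the formula (heartbeat rate) * (failure cycle) * 2 given in this chapter. The dictionary layout and function name are illustrative assumptions, not part of HACMP.

```python
# Predefined settings copied from the two tables above:
# (seconds between heartbeats, failure cycle)
PREDEFINED = {
    "IP": {"Slow": (2, 12), "Normal": (1, 10), "Fast": (1, 5)},
    "non-IP": {"Slow": (3, 8), "Normal": (2, 5), "Fast": (1, 5)},
}


def failure_detection_rate(heartbeat_rate, failure_cycle):
    """Failure Detection Rate in seconds."""
    return heartbeat_rate * failure_cycle * 2


for network_kind, settings in PREDEFINED.items():
    for name, (hb, fc) in settings.items():
        rate = failure_detection_rate(hb, fc)
        print(f"{network_kind:7} {name:7} {rate:3d} s")
```

    For both network kinds, the computed values are 48, 20, and 10 seconds for Slow, Normal, and Fast, matching the last column of each table.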
    

    Before changing the default heartbeat settings for IP and non-IP networks, consult Chapter 3: Planning Cluster Network Connectivity in the Planning Guide for information on how these settings interact with the deadman switch.

    Network Grace Period is the time period during IPAT via IP Replacement operations that node reachability is not computed for the network. The grace period value needs to be long enough for the network interface to be reconfigured with the new address and to rejoin its network interface membership group. When IPAT is used with HWAT, it usually takes longer for the operation to complete, so larger values of the grace period may be necessary. The default Grace Period value for Token Ring and ATM is 90 seconds. It is 60 seconds for all other network types.

    SMIT provides two different panels for changing the attributes of a network module. You can either change the tuning parameters of a network module to predefined values of Fast, Normal and Slow, or you can set these attributes to custom values.

    Changing the Tuning Parameters to Predefined Values

    To change the tuning parameters of a network module to the predefined values of Slow, Normal or Fast:

      1. Enter smit hacmp
      2. Select Extended Configuration > Extended Topology Configuration > Configure HACMP Network Modules and press Enter.
      3. Select the Change a Network Module Using Predefined Values option and press Enter. SMIT displays a list of defined network modules.
      4. Select the network module you want to change and press Enter. SMIT displays the attributes of the network module, with their current values.
    Network Module Name
    Name of network type, for example, ether.
    Description
    For example, Ethernet Protocol
    Failure Detection Rate
    Select from Normal, Fast or Slow. This tunes the interval between heartbeats for the selected network module. The time needed to detect a failure can be calculated using this formula: (heartbeat rate) * (failure cycle) * 2 seconds.
      5. Make the selections you need for your configuration.
    HACMP will detect a network interface failure in the time specified by the formula Failure Detection Rate = Failure Cycle * Heartbeat Rate * 2, or very close to it. The software may not take action on this event immediately: due to event processing overhead, the actual cluster event may not start for another few seconds.
      6. Return to the SMIT Extended Configuration menu and select the Extended Verification and Synchronization option to synchronize the cluster.

    Reducing the Node Failure Detection Rate: Enabling Fast Detection for Node Failures

    Failure detection rates of Fast, Normal and Slow contain hbrates of 1, 2, or 3 seconds respectively. The time for the neighbor nodes to determine the node is down through disk heartbeating would be at most 1, 2, or 3 seconds, followed by the other cluster nodes becoming immediately aware of the failure.

    Starting with HACMP 5.4, you can reduce the time it takes to detect a node failure. With the fast failure detection function, node failures are realized among the nodes in the cluster within one missed heartbeat period.

    This method requires that you configure a disk heartbeating network. To enable this method, change the NIM parameter for the disk heartbeating network while the cluster services are stopped on the nodes.

    The fast failure detection method is supported on all disks that work with HACMP, except SSA disks. For information on disk heartbeating, see Configuring Heartbeating over Disk.

    To enable the fast method of detecting node failures:

      1. Stop the cluster services on all nodes.
      2. Enter smit hacmp
      3. In SMIT, go to Extended Configuration > Extended Topology Configuration > Configure HACMP Network Modules > Change a Network Module using Custom Values and press Enter. A list of network modules appears.
      4. Select a network module that is used for the disk heartbeating network and press Enter.
      5. Type or select values in entry fields as follows:
    Network Module Name
    Filled in with diskhb. This is the name of the network module for the disk heartbeating network.
    Description
    Disk heartbeating network
    Address Type
    Device. This specifies that the adapter associated with this network uses a device file.
    Path
    This is the path to the network executable file, such as /usr/sbin/rsct/hats_diskhb_
    Parameters
    This field lists the parameters passed to the network executable file.
    For the disk heartbeating NIM, enter FFD_ON in this field. This enables HACMP to use the fast method of node failure detection. This value in the NIM cannot be changed dynamically in a running cluster.
      6. Leave the remaining fields in this SMIT screen unchanged for the disk heartbeating network.
      7. Press Enter after making all desired changes and synchronize the cluster.

    Changing the Tuning Parameters to Custom Values

    If the cluster needs more customization than the predefined tuning parameters offer, you can change the Failure Detection Rate of a network module to a custom value. You can always return to the original settings by using the SMIT panel for setting the tuning parameters to predefined values.

    Note: The failure detection rate of the network module affects the deadman switch time-out. The deadman switch time-out is triggered one second before the failure is detected on the slowest network in your cluster.

    Also, use this SMIT panel to change the baud rate for TTYs if you are using RS232 networks that might not handle the default baud rate of 38400.
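
    The relationship stated in the note above, between the failure detection rate and the deadman switch time-out, can be sketched as follows. This is only an illustration of the stated rule (one second before the failure is detected on the slowest network); the function names and the network names in the example are hypothetical.

```python
def failure_detection_rate(failure_cycle, heartbeat_rate):
    """Failure Detection Rate in seconds: Failure Cycle * Heartbeat Rate * 2."""
    return failure_cycle * heartbeat_rate * 2


def deadman_switch_timeout(network_settings):
    """Estimate the deadman switch time-out in seconds.

    network_settings maps a network name to a tuple of
    (failure cycle, seconds between heartbeats). The time-out is taken as
    one second less than the detection time of the slowest network.
    """
    slowest = max(
        failure_detection_rate(fc, hb) for fc, hb in network_settings.values()
    )
    return slowest - 1


# Hypothetical cluster: an Ethernet network at Normal and an RS232 network at Slow.
networks = {"net_ether_01": (10, 1), "net_rs232_01": (8, 3)}
print(deadman_switch_timeout(networks))
```

    In this example the slowest network has a detection time of 48 seconds, so the sketch reports 47 seconds.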

    Setting Sub-Second Heartbeating Values

    HACMP 5.2 and up lets you set sub-second heartbeating tunable values. These allow faster failure detection and therefore faster takeover operations.

    This capability requires AIX 5L 5.2 or greater and RSCT 2.3.3 or greater on all cluster nodes.

    Choose fast detection tunables with care, since the lower the detection time is, the greater the chance for false failures, that is, situations where a node or network interface is not really down, but appears to be because of a temporary problem.

    For example, if the failure detection time for a network is set to 5 seconds, and the network or a node suffers a period of high load or a period where packets are lost, this may result in a node falsely detecting that a remote node is down.

    Before setting fast detection tunable values, take the following into account:

  • The application load on the system should be such that it does not over-commit the amount of physical memory on the nodes. Having some amount of paging activity is acceptable, but the more paging activity exists on the system, the higher the probability that false failures may be seen because of processes being blocked while waiting for memory pages to be brought to memory.
  • The rate of I/O interrupts on the node should not be such that processes in the system are prevented from getting timely access to the CPU.
  • The traffic on the networks being used should be controlled, to avoid prolonged periods where cluster traffic cannot be reliably transmitted.

    In cases where the restrictions above cannot be followed, using low detection time values is not recommended.

    To achieve a detection time of five seconds, use the following values:

    NIM type       Failure Cycle    Interval Between Heartbeats (seconds)
    All IP NIMs    5                0.5
    RS232          3                0.8
    Disk HB        3                0.8
    TMSSA          3                0.8
    TMSCSI         3                0.8

    NIM Settings for 5 Second Detection Time 
    

    The failure detection time formula is: 2 x failure cycle x interval between heartbeats.

    Still lower detection times may be used, but not with Disk HB and RS232 devices, since the minimum values for Failure Cycle and Interval Between Heartbeats for such devices are 3 and 0.75, respectively.

    To achieve three seconds of detection time, use the following values:

    NIM type       Failure Cycle    Interval between Heartbeats (seconds)
    All IP NIMs    5                0.3
    TMSSA          3                0.5
    TMSCSI         3                0.5

    NIM Settings for 3 Second Detection Time 
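
    The values in the two tables above can be checked against the failure detection time formula (2 x failure cycle x interval between heartbeats) with a short sketch. It also models the one restriction stated in the text, that Disk HB and RS232 devices cannot go below a Failure Cycle of 3 and an interval of 0.75 seconds. The names used here are illustrative assumptions, and no minimum values are assumed for other NIM types because the text does not state them.

```python
# Minimum values stated in the text for Disk HB and RS232 devices.
MIN_FAILURE_CYCLE = 3
MIN_INTERVAL = 0.75
RESTRICTED_NIMS = {"RS232", "Disk HB"}

# (failure cycle, interval between heartbeats in seconds), from the tables.
FIVE_SECOND = {
    "All IP NIMs": (5, 0.5),
    "RS232": (3, 0.8),
    "Disk HB": (3, 0.8),
    "TMSSA": (3, 0.8),
    "TMSCSI": (3, 0.8),
}
THREE_SECOND = {"All IP NIMs": (5, 0.3), "TMSSA": (3, 0.5), "TMSCSI": (3, 0.5)}


def detection_time(failure_cycle, interval):
    """Failure detection time in seconds."""
    return 2 * failure_cycle * interval


def is_allowed(nim_type, failure_cycle, interval):
    """Apply only the Disk HB / RS232 minimums that the text states."""
    if nim_type in RESTRICTED_NIMS:
        return failure_cycle >= MIN_FAILURE_CYCLE and interval >= MIN_INTERVAL
    return True


for label, table in (("5 s target", FIVE_SECOND), ("3 s target", THREE_SECOND)):
    for nim, (fc, interval) in table.items():
        print(f"{label}, {nim}: {detection_time(fc, interval):.1f} s")
```

    The non-IP rows of the first table work out to about 4.8 seconds rather than exactly 5. Applying the 3-second settings (failure cycle 3, interval 0.5) to RS232 or Disk HB would fail the is_allowed check, which is why those devices are absent from the second table.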
    

    Steps for Changing the Tuning Parameters of a Network Module to Custom Values

    To change the tuning parameters of a Network Module to custom values:

      1. Enter smit hacmp
      2. Select Extended Configuration > Extended Topology Configuration > Configure an HACMP Network Module > Change a Network Module Using Custom Values.
    SMIT displays a list of defined network modules.
      3. Select the network module for which you want to change parameters and press Enter.
    SMIT displays the attributes of the network module, with their current values.
    Network Module Name
    Name of network type, for example, ether.
    Description
    For example, Ethernet Protocol
    Address Type
    Select an option: Device or Address.
    The Address option specifies that the network interface that is associated with this network module uses an IP-typed address.
    The Device option specifies that the network interface that is associated with this network module uses a device file.
    Path
    Specifies the path to the network executable file.
    Parameters
    Specifies the parameters passed to the network interface module (NIM) executable.
    For the RS232 NIM, this field specifies the baud rate. Allowable values are 38400 (the default), 19200, and 9600.
    For the disk heartbeating NIM, this field specifies the parameter that is passed to RSCT and that enables HACMP to use the fast method of node failure detection.
    Allowable values are FFD_ON and FFD_OFF (the default). To enable fast detection of node failures, specify FFD_ON in this field. This value in the NIM cannot be changed dynamically in a running cluster.
    Grace Period
    The current setting is the default for the network module selected. This is the time period in which, after a network failure was detected, further network failures of the same type would be ignored. This is 60 seconds for all networks except ATM and Token Ring, which are 90 seconds.
    Failure Cycle
    The current setting is the default for the network module selected. (Default for Ethernet is 10). This is the number of successive heartbeats that can be missed before the interface is considered to have failed. You can enter a number from 1 to 75.
    Interval between Heartbeats (seconds)
    The current setting is the default for the network module selected and is a heartbeat rate. This parameter tunes the interval (in seconds) between heartbeats for the selected network module. You can enter a number from less than 1 to 5.
    Supports Gratuitous ARP
    This field is displayed only for those networks that generally support gratuitous ARP.
    Set this field to true if this network supports gratuitous ARP. Setting this field to true enables HACMP to use IPAT via IP Aliases.
    If you set this field to false for a specific network, you will disable the IPAT via IP Aliases function of HACMP for this network type. Since HACMP relies on the entry in this field to set up the fallover policies for cluster resources, do not change this field for the networks configured in your cluster.
    Entry Type
    This field specifies the type of the network interface. It is either a network interface card (for a NIM specific to a network interface card), or a network interface type (for a NIM to use with a specific type of network device).
    Next Generic Type
    This field specifies the next type of NIM to use if a more suitable NIM cannot be found.
    Next Generic Name
    This field specifies the next generic NIM to use if a more suitable NIM cannot be found.
    Supports Source Routing
    Set this field to true if this network supports IP loose source routing.

    Note: Whenever a change is made to any of the values that affect the failure detection time—failure cycle (FC), heartbeat rate (HB) or failure detection rate—the new value of these parameters is sent as output to the panel in the following message:
    SUCCESS: Adapter Failure Detection time is now FC * HB* 2 or SS seconds
      4. Make the changes for your configuration.
    HACMP will detect a network interface failure in the time specified by the formula Failure Detection Rate = Failure Cycle * Heartbeat Rate * 2, or very close to it. The software may not take action on this event immediately: due to event processing overhead, the actual cluster event may not start for another few seconds.
      5. Synchronize the cluster.
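
    Before entering custom values in the panel described above, it can help to check them against the ranges listed in the field descriptions. The following is a minimal sketch of such a check, not an HACMP tool: the function name is hypothetical, the module name rs232 is an assumption (the text names only diskhb and ether explicitly), and the Parameters field is checked only when a value is supplied.

```python
RS232_BAUD_RATES = {"38400", "19200", "9600"}
DISKHB_PARAMETERS = {"FFD_ON", "FFD_OFF"}


def validate_custom_values(module, failure_cycle, interval, parameters=""):
    """Return a list of problems; an empty list means no problem was found.

    Ranges follow the field descriptions: Failure Cycle from 1 to 75,
    Interval between Heartbeats greater than 0 and at most 5 seconds.
    """
    problems = []
    if not (isinstance(failure_cycle, int) and 1 <= failure_cycle <= 75):
        problems.append("Failure Cycle must be a whole number from 1 to 75")
    if not 0 < interval <= 5:
        problems.append("Interval between Heartbeats must be above 0 and at most 5")
    name = module.lower()
    if name == "rs232" and parameters and parameters not in RS232_BAUD_RATES:
        problems.append("RS232 baud rate must be 38400, 19200, or 9600")
    if name == "diskhb" and parameters and parameters not in DISKHB_PARAMETERS:
        problems.append("diskhb parameter must be FFD_ON or FFD_OFF")
    return problems


print(validate_custom_values("ether", 10, 1))
print(validate_custom_values("rs232", 0, 6, "57600"))
```

    The first call returns an empty list. The second call reports three problems: the failure cycle, the interval, and the baud rate are all outside the documented values.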

    After changing the tty baud rate and synchronizing the cluster, you can check the change by executing the following command on a running cluster (assuming tty1 is the name of the HACMP heartbeat network):

    stty -a < /dev/tty1

    Showing a Network Module

    To show the current values of a network module:

      1. Enter smit hacmp
      2. Select Extended Configuration > Extended Topology Configuration > Configure an HACMP Network Module > Show a Network Module and press Enter.
    SMIT displays a list of defined network modules.
      3. Select the name of the network module for which you want to see current settings and press Enter.
    After the command completes, a panel appears that shows the current settings for the specified network module.

    Removing a Network Module

    To remove a network module:

      1. Enter smit hacmp
      2. Select Extended Configuration > Extended Topology Configuration > Configure an HACMP Network Module > Remove a Network Module and press Enter.
    SMIT displays a list of defined network modules.
      3. Select the name of the network module you want to remove and press Enter. You will be asked Are you sure?
      4. Press Enter again.

    Changing an RS232 Network Module Baud Rate

    All RS232 networks used by HACMP are brought up by RSCT with a default baud rate of 38400. However, there may be times when you need to lower that baud rate to a slower speed. To lower the baud rate for an already defined RS232 network, take these steps:

      1. Enter smit hacmp
      2. Select Extended Configuration > Extended Topology Configuration > Configure an HACMP Network Module > Change a Network Module Using Custom Values.
      3. SMIT displays a list of Network Modules. Select RS232 and press Enter.
      4. SMIT displays the Change a Network Module using Custom Values panel. Select the desired value in the Parameters field. 9600, 19200, or 38400 are the only acceptable baud rate values.
      5. Press Enter.
      6. Synchronize the cluster.

    Changing the Configuration of a Site

    If you are using one of the HACMP/XD solutions, be sure to consult the documentation for that software before changing any attributes of a site. Making changes to sites also affects cross-site LVM mirroring.

    All dynamic topology configuration changes allowed in an HACMP configuration are now supported in HACMP/XD configurations. This includes changes to XD-type networks (XD_data used in HACMP/XD for GLVM), interfaces, sites, nodes, and NIM values. HACMP handles the resource group with replicated resources and its primary and secondary instances properly during these operations.

    To avoid unnecessary processing of resources, use the Resource Group Migration utility, clRGmove, (HACMP Resource Group and Application Management in SMIT) to move resource groups that will be affected before you make the cluster topology change.

    When dynamically reconfiguring a cluster, HACMP releases resource groups if this is found to be necessary, and then reacquires them later. For example, HACMP will release and reacquire the resource group that is using the associated service address on a network interface that is affected by the change to topology.

    To change or show a site definition:

      1. Enter smit hacmp
      2. In SMIT, select Extended Configuration > Extended Topology Configuration > Configure HACMP Sites > Change/Show a Site and press Enter.
      3. Select the site to change from the picklist.
      4. Enter the information as follows:
    Site Name
    The current name is displayed.
    New Site Name
    Enter a name for this site using alphanumeric characters and underscores. Use no more than 32 characters.
    Site Nodes
    Add or remove names from the list of the cluster nodes that currently belong to the site.
    Dominance
    Select yes or no to indicate whether the current site is dominant or not. This only applies to HAGEO.
    Backup Communications Type
    Select the type of backup communication for your HAGEO cluster (DBFS for telephone line, SGN for a Geo_Secondary network, or NONE). HACMP/XD for Metro Mirror and HACMP/XD for GLVM use only NONE.
      5. Press Enter to change the definition in the Configuration Database.
    Note: If you change a site name that has an associated IP label, the IP label will change to associate with the new name for this site.
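
    The naming rule given for the New Site Name field (alphanumeric characters and underscores, no more than 32 characters) can be expressed as a small check. This is a sketch with a hypothetical function name; it additionally assumes that an empty name is not acceptable, which the text does not state explicitly.

```python
import re

# Letters, digits, and underscores only, between 1 and 32 characters.
SITE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,32}")


def is_valid_site_name(name):
    """Return True if name satisfies the site naming rule described above."""
    return SITE_NAME_PATTERN.fullmatch(name) is not None


for candidate in ("site_A1", "site-A", "x" * 33):
    print(candidate[:12], is_valid_site_name(candidate))
```

    Here site_A1 passes, while site-A (hyphen) and the 33-character name are rejected.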

    Removing a Site Definition

    To remove a site definition:

      1. Enter smit hacmp
      2. In SMIT, select Extended Configuration > Extended Topology Configuration > Configure HACMP Sites > Remove a Site and press Enter.
      3. Select the site to remove from the picklist.
    SMIT displays Are you sure?
      4. Press Enter to remove the site definition.
    Note: If you remove a site definition that has an associated IP label, the IP label remains, but is no longer associated with any site.

    Synchronizing the Cluster Configuration

    Whenever you modify the cluster definition in the Configuration Database on one node, you must synchronize the change with the Configuration Database data on all cluster nodes. You perform a synchronization by choosing the Verification and Synchronization option from either the Standard or the Extended HACMP Configuration SMIT panel, or from the Problem Determination Tools menu.

    See Chapter 7: Verifying and Synchronizing an HACMP Cluster for complete information on this procedure.

    Dynamic Reconfiguration Issues and Synchronization

    This section is relevant for dynamic reconfiguration of both topology and resources.

    Releasing a Dynamic Reconfiguration Lock

    During a dynamic reconfiguration, HACMP creates a temporary copy of the HACMP-specific Configuration Database classes and stores them in the Staging Configuration Directory (SCD). This allows you to modify the cluster configuration while a dynamic reconfiguration is in progress. You cannot, however, synchronize the new configuration until the first is finished. The presence of an SCD on any cluster node prevents dynamic reconfiguration. If, because of a node failure or other reason, an SCD remains on a node after a dynamic reconfiguration is finished, it will prevent any further dynamic reconfiguration. Before you can perform further reconfiguration, you must remove this lock.

    To remove a dynamic reconfiguration lock:

      1. Enter smit hacmp
      2. In SMIT, select Problem Determination Tools and press Enter.
      3. Select the Release Locks Set By Dynamic Reconfiguration option and press Enter. SMIT displays a panel asking if you want to proceed. If you want to remove the SCD, press Enter.

    Processing Configuration Database Data During Dynamic Reconfiguration

    When you synchronize the cluster topology, the processing performed by HACMP varies depending on the status of the Cluster Manager.

    The following sections describe the variations that may occur:

    Cluster Manager Is Not Running on Any Cluster Node

    If the Cluster Manager is not running on any cluster node (typically the case when a cluster is first configured), synchronizing the topology causes the configuration data stored on each node reachable from the local node to be updated.

    Cluster Manager Is Running on the Local Node

    If the Cluster Manager is running on the local node, synchronizing the topology triggers a dynamic reconfiguration event. While processing this event, HACMP updates the configuration data stored on each cluster node that is reachable. Further processing makes the new configuration the currently active configuration.

    Cluster Manager Is Running on Some Cluster Nodes but Not on the Local Node

    If the Cluster Manager is running on some cluster nodes but not on the local node, synchronizing the topology causes the configuration data stored on each node that is reachable from the local node to be updated. However, the processing performed during a dynamic reconfiguration to make the new configuration the active configuration is not performed.
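
    The three cases above can be summarized as a simple decision function. This is only a restatement of the text in code form; the function name and the returned strings are illustrative and do not correspond to any HACMP interface.

```python
UPDATE_ONLY = "update configuration data on reachable nodes"
DYNAMIC_RECONFIGURATION = (
    "trigger a dynamic reconfiguration event, update configuration data on "
    "reachable nodes, and make the new configuration active"
)


def synchronization_effect(running_on_local_node, running_on_other_nodes):
    """Describe what synchronizing the topology does for a Cluster Manager state."""
    if running_on_local_node:
        # Case: Cluster Manager is running on the local node.
        return DYNAMIC_RECONFIGURATION
    # Cases: Cluster Manager is not running anywhere, or is running only on
    # other nodes. In both, the data is updated but the new configuration is
    # not made active by dynamic reconfiguration processing.
    return UPDATE_ONLY


print(synchronization_effect(False, False))
print(synchronization_effect(True, True))
print(synchronization_effect(False, True))
```

    The second argument does not change the result. It is kept in the signature to make explicit that the outcome depends only on whether the Cluster Manager is running on the local node.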

    Undoing a Dynamic Reconfiguration

    Before HACMP overwrites the configuration defined in the ACD, it saves a record of the configuration in a cluster snapshot. Only the .odm portion of a cluster snapshot is created; the .info file is not created. (For more information about cluster snapshots, see Chapter 18: Saving and Restoring Cluster Configurations.) If you want to undo the dynamic reconfiguration, you can use this cluster snapshot to restore the previous configuration.

    HACMP saves snapshots of the last ten configurations in the default cluster snapshot directory, /usr/es/sbin/cluster/snapshots, with the name active.x.odm, where x is a digit between 0 and 9, with 0 being the most recent.
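
    The naming scheme described above can be used to locate the most recent saved configuration. The following sketch builds the ten possible file names in order from most recent to oldest and returns the first one that exists. The helper names are hypothetical, and the sketch assumes it runs on a cluster node where the default snapshot directory is present.

```python
import os
import posixpath

# Default cluster snapshot directory, as given in the text.
SNAPSHOT_DIR = "/usr/es/sbin/cluster/snapshots"


def active_snapshot_paths(directory=SNAPSHOT_DIR):
    """Return paths for active.0.odm through active.9.odm, most recent first."""
    return [posixpath.join(directory, f"active.{x}.odm") for x in range(10)]


def most_recent_existing(directory=SNAPSHOT_DIR):
    """Return the most recent snapshot file that exists, or None if none do."""
    for path in active_snapshot_paths(directory):
        if os.path.exists(path):
            return path
    return None


print(active_snapshot_paths()[0])
print(most_recent_existing())
```

    On a system without HACMP installed, most_recent_existing returns None.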

    Restoring the Configuration Database Data in the DCD

    If a dynamic reconfiguration operation fails or is interrupted, you may want to restore the configuration in the DCD with the current active configuration, which is stored in the ACD. HACMP allows you to save in a snapshot the changes you made to the configuration in the DCD before you overwrite it.

    To replace the Configuration Database data stored in the DCD with the Configuration Database data in the ACD, perform the following procedure.

      1. Enter smit hacmp
      2. In SMIT, select Problem Determination Tools and press Enter.
      3. Select Restore HACMP Configuration Database from Active Configuration and press Enter.
      4. Enter field values as follows:
    Cluster Snapshot Name of System Default HACMP ODMs
    In this field, specify the name you want assigned to the cluster snapshot HACMP creates before it overwrites the ODM data stored in the DCD with the ODM data from the ACD. You can use this snapshot to save the configuration changes you made.
    Cluster Snapshot Description of System Default HACMP ODMs
    Enter any text string you want stored at the beginning of the snapshot.
      5. Press Enter. SMIT displays the results.
