
Chapter 7: Verifying and Synchronizing an HACMP Cluster


Verifying and synchronizing your HACMP cluster assures you that all resources used by HACMP are configured appropriately and that rules regarding resource ownership and resource takeover are in agreement across all nodes. You should verify and synchronize your cluster configuration after making any change within a cluster (for example, any change to the hardware, operating system, node configuration, or cluster configuration).

The main sections of this chapter include the following:

  • Overview
  • Automatic Verification and Synchronization
  • Verifying the HACMP Configuration Using SMIT
  • Managing HACMP File Collections
  • Adding a Custom Verification Method
  • List of Reserved Words.

    Overview

    Whenever you configure, reconfigure, or update a cluster, run the cluster verification procedure to ensure that all nodes agree on the cluster topology, network configuration, and the ownership and takeover of HACMP resources. If the verification succeeds, the configuration can be synchronized. Synchronization takes effect immediately on an active cluster. A dynamic reconfiguration event is run and the changes are committed to the active cluster.

    Note: If you are using the SMIT Initialization and Standard Configuration path, synchronization automatically follows a successful verification. If you are using the Extended Configuration path, you have more options for types of verification. If you are using the Problem Determination Tools path, you can choose whether to synchronize or not.

    The messages output from verification indicate where the error occurred (for example, the node, device, or command). The utility uses verbose logging to write to the /var/hacmp/clverify/clverify.log file.

    Note: Verification is not supported on a mixed-version HACMP cluster.

    HACMP 5.4 includes an additional verification check to ensure that each node can reach every other node in the cluster through non-IP network connections. If a node cannot, a message is displayed.

    Error conditions result when information is not properly configured on all cluster nodes. This information may be important for the operation of HACMP but is not part of the HACMP software itself; for example, an AIX 5L volume group may not exist on every cluster node. In some of these situations, you can authorize a corrective action before verification continues. When verification detects certain conditions, such as mismatched HACMP shared volume group time stamps or missing required entries in /etc/services on a node, HACMP fixes the problem. For a list of all conditions for which HACMP issues automatic corrective actions, see Running Corrective Actions during Verification.

    On the node where you run the utility, detailed information is collected into log files, which contain a record of all data collected and the tasks performed.

    You can add your own custom verification methods to ensure that specific components within your cluster are properly configured. You can change or remove these methods from the verification process depending on the level of cluster verification you want. See the section Adding a Custom Verification Method later in this chapter.

    Note: Verification requires 4 MB of disk space in the /var filesystem in order to run; 18 MB of disk space is recommended for a four-node cluster. Typically, the /var/hacmp/clverify/clverify.log files require 1–2 MB of disk space.
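
    For example, you can confirm that /var has enough free space before running verification with standard AIX commands (a minimal check; the thresholds come from the note above):

    # Show free space in /var in MB; compare against the 4 MB minimum
    # (18 MB recommended for a four-node cluster)
    df -m /var

    # Show how much space the verification logs currently use
    du -sm /var/hacmp/clverify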

    Running Cluster Verification

    After making a change to the cluster, you can perform cluster verification in the following ways:

  • Automatic verification. You can automatically verify your cluster:
      • Each time you start cluster services on a node
      • Each time a node rejoins the cluster
      • Every 24 hours.
    By default, automatic verification is enabled to run at midnight.

    For detailed instructions, see Automatic Verification and Synchronization.

  • Manual verification. Using the SMIT interface, you can either verify the complete configuration, or only the changes made since the last time the utility was run.
    Typically, you should run verification whenever you add or change anything in your cluster configuration. For detailed instructions, see Verifying the HACMP Configuration Using SMIT.

    Automatic Verification and Synchronization

    During automatic verification and synchronization, HACMP discovers and corrects several common configuration issues prior to starting cluster services. This automatic behavior ensures that if you had not manually verified and synchronized your cluster prior to starting cluster services, HACMP will do so. Throughout this section, automatic verification and synchronization is often simply referred to as verification.

    Understanding the HACMP Cluster Verification Process

    By default, verification runs automatically without any configuration required. We recommend that you do not disable verification, but if necessary, you can disable it using the Extended Configuration > Extended Cluster Service Settings path.

    Verification occurs on both active and inactive clusters. In order for automatic verification to work, more than one node must exist in the cluster, since HACMP compares the configuration of one node against the configuration of another node.

    Verification ensures an error-free cluster startup and poses a negligible impact on performance, which is directly related to the number of nodes, volume groups, and filesystems in the cluster configuration.

    The phases of the verification and synchronization process are as follows:

      1. Verification
      2. Snapshot (optional)
      3. Synchronization.

    For details on these phases, see the Understanding the Detailed Phases of Verification section. After verification, cluster services start.

    Cluster Verification during a Dynamic Cluster Reconfiguration Event

    If a node is down during a dynamic reconfiguration event and later it attempts to join the cluster, cluster verification and synchronization run prior to starting services on the joining node, and the joining node receives its configuration updates from an active cluster node.

    If verification fails on the joining node, the node does not start cluster services. Likewise, if a node has been dynamically removed from the active cluster, that node is not allowed to rejoin the cluster or start cluster services.

    Parameters Automatically Corrected

    Automatic verification and synchronization ensure that typical configuration inconsistencies are automatically corrected as follows:

  • RSCT versions are congruent across the cluster.
  • IP addresses (that RSCT expects) are configured on the network interfaces.
  • Shared volume groups are not set to automatically varyon.
  • Filesystems are not set to automatically mount.
    Verifying RSCT Versions

    The activity state of your nodes determines which RSCT number is used for synchronization. The number from the active nodes is used to populate the inactive nodes; if cluster services are currently running, it is assumed that all RSCT numbers are correct, so they are not verified.

    If there are no active nodes and the number is inconsistent across the cluster, then verification uses the local node RSCT number to synchronize to all other cluster nodes—except if the local node RSCT number is zero (0), then HACMP uses 1 on all other cluster nodes.
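
    One way to confirm that RSCT levels match before synchronization is to compare the installed RSCT filesets on each node (a minimal sketch; the fileset names shown are the common RSCT base filesets and may differ on your system):

    # Run on each cluster node and compare the reported levels
    lslpp -L "rsct.*" | grep -E "rsct.basic.rte|rsct.core.utils"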

    Verifying Service IP Address Aliases

    At cluster startup, RSCT expects the IP address label to be defined on the interfaces with the same value that has been defined in the HACMP configuration database. The HACMP automatic verification and synchronization process ensures nodes not currently running cluster services are verified and corrected; nodes currently running cluster services are not automatically corrected.

    Note: Only aliased IP interfaces that are used by HACMP are verified and corrected.

    If a node has an interface that is not defined as it appears in the HACMP configuration database, automatic verification detects this and issues an error message.

    Verifying Shared Volume Groups

    Shared volume groups that are configured as part of an HACMP resource group must have their automatic varyon attribute set to No. If the verification phase determines that the automatic varyon attribute is set to Yes, verification notifies you about nodes on which the error occurs and prompts you to correct the situation.
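
    You can also check and correct the attribute manually with standard AIX LVM commands (a sketch; sharedvg is a hypothetical volume group name):

    # "AUTO ON:" should report no for a shared volume group
    lsvg sharedvg | grep "AUTO ON"

    # Disable automatic varyon if it is set to yes
    chvg -a n sharedvg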

    Verifying Filesystems

    Any filesystems participating in a resource group with AIX 5L attributes that allow the filesystem to be automatically mounted at system restart will raise errors. This includes standard journaled filesystems (JFS) and enhanced journaled filesystems (JFS2). If the filesystem has been set to mount automatically at boot time, verification displays an error.
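
    A quick manual check and fix, assuming a hypothetical shared mount point /sharedfs:

    # The Auto column should show no for filesystems managed by HACMP
    lsfs /sharedfs

    # Turn off mounting at system restart
    chfs -A no /sharedfs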

    Understanding the Detailed Phases of Verification

    This section describes the phases of verification and cluster services startup. These events occur in the following order:

  • Phase One: Verification
  • Phase Two: (Optional) Snapshot
  • Phase Three: Synchronization.
    After verification, cluster services start up. If cluster services do not start, it is because HACMP has discovered errors. You can resolve these errors by correcting inconsistencies. For information about correcting these inconsistencies, see the section Monitoring Verification and Resolving Configuration Inconsistencies in this chapter.

    Phase One: Verification

    During the verification process the default system configuration directory (DCD) is compared with the active configuration. On an inactive cluster node, the verification process compares the local DCD across all nodes. On an active cluster node, verification propagates a copy of the active configuration to the joining nodes.

    If a node that was previously synchronized has a DCD that does not match the active configuration directory (ACD) of an already active cluster node, the ACD of an active node is propagated to the joining node. This new information does not replace the DCD of the joining node; it is stored in a temporary directory for the purpose of running verification against it.

    Note: When you attempt to start a node that has an invalid cluster configuration, HACMP transfers a valid configuration database data structure to it, which may consume 1–2 MB of disk space.

    If the verification phase fails, cluster services will not start. In this situation, see the section Monitoring Verification and Resolving Configuration Inconsistencies.

    Phase Two: (Optional) Snapshot

    A snapshot is taken only if a node requesting to start cluster services requires an updated configuration. During the snapshot phase of verification, HACMP records the current cluster configuration to a snapshot file for backup purposes. HACMP names this snapshot file according to the date of the snapshot and the name of the cluster. Only one snapshot is created per day: if a snapshot file exists and its filename contains the current date, it is not overwritten.

    This snapshot is written to the /usr/es/sbin/cluster/snapshots/ directory.

    The snapshot filename uses the syntax MM-DD-YYYY-ClusterName-autosnap.odm. For example, a snapshot taken on April 2, 2006 on a cluster hacluster01 would be named /usr/es/sbin/cluster/snapshots/04-02-2006-hacluster01-autosnap.odm.

    Phase Three: Synchronization

    During the synchronization phase of verification, HACMP propagates information to all cluster nodes. For an inactive cluster node, the DCD is propagated to the DCD of the other nodes. For an active cluster node, the ACD is propagated to the DCD.

    If the process succeeds, all nodes are synchronized and cluster services start. If synchronization fails, cluster services do not start and HACMP issues an error.

    Monitoring Verification and Resolving Configuration Inconsistencies

    You can monitor the automatic verification and synchronization progress as it occurs by tracking messages as they appear on the SMIT console. In addition, you can examine any prior runs by reviewing the smit.log file or /var/hacmp/clverify/clverify.log.
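
    For example, to watch verification progress and scan for problems from the command line:

    # Follow verification messages as they are written
    tail -f /var/hacmp/clverify/clverify.log

    # Afterwards, list any errors or warnings that were recorded
    grep -iE "error|warning" /var/hacmp/clverify/clverify.log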

    Verification Completion

    When cluster verification completes on the selected cluster node, this node supplies the following information to the other cluster nodes:

  • Name of the node where verification had been run
  • Date and time of the last verification
  • Results of the verification.
    This information is stored on every available cluster node in the /var/hacmp/clverify/clverify.log file. If the selected node became unavailable or could not complete cluster verification, you can detect this by the lack of a report in the /var/hacmp/clverify/clverify.log file. If the log file does not indicate a specific node, then the error applies to all nodes and cluster services do not start.

    If cluster verification completes and detects some configuration errors, you are notified about the potential problems:

  • The exit status of verification is published across the cluster along with the information about cluster verification process completion.
  • Broadcast messages are sent across the cluster and displayed on the console. These messages inform you of any detected configuration errors.
  • A cluster_notify event runs on the cluster and is logged in hacmp.out (if cluster services are running).
  • Information about the node where you ran the cluster verification is written to the /var/hacmp/clverify/clverify.log file. If a failure occurs during processing, error messages and warnings indicate the node affected and reasons for the verification failure.
  • A configuration snapshot is written to the /usr/es/sbin/cluster/snapshots/ directory.
    Ongoing Automatic Verification

    Once a valid configuration is defined, the verification process runs once every 24 hours. By default, the first node in alphabetical order runs the verification at midnight; however, you can change these defaults by selecting a node and a time that suits your needs. If the selected node is unavailable (powered off), automatic verification does not run.

    For information on changing the default configuration see the Automatic Cluster Configuration Monitoring section in Chapter 1: Troubleshooting HACMP Clusters in the Troubleshooting Guide.

    Verifying the HACMP Configuration Using SMIT

    After reconfiguring or updating a cluster, run the cluster verification procedure. For a list of the types of verification performed, see Verifying and Synchronizing a Cluster Configuration.

    Note: If you are investigating a problem with the cluster and want to run verification procedures without synchronizing the cluster, use the cluster verification SMIT panels found under the Problem Determination Tools menu. See Chapter 1: Troubleshooting HACMP Clusters in the Troubleshooting Guide.

    Verifying and Synchronizing a Cluster Configuration

    Verification performs many automatic checks. This section provides overviews of the following verifications performed; it is not an exhaustive description of all verifications. HACMP documentation lists the verification checks for each function in the sections describing these functions.

    This section includes the following topics:

  • Verifying the Topology Configuration
  • Verifying the Network Configuration
  • Verifying Disk and Filesystem Configuration
  • Verifying Resource Group Information
  • Verifying Individual Resources
  • Verifying Automatic Error Notification Methods
  • Verifying the Security Configuration
  • Verifying Custom Configurations
  • Verifying HACMP/XD Configurations
  • Verifying Service IP labels.
    Verifying the Topology Configuration

    Verification ensures that all nodes agree on the topology of the cluster. For example, it checks for invalid characters in cluster names, node names, network names, network interface names, and resource group names. It checks to ensure that interfaces are properly configured, nodes are reachable, and networks have the required number of interfaces.

    It also checks for the reserved words used as cluster names, node names, network names, network interface names and resource group names. These names are listed in the /usr/es/sbin/cluster/etc/reserved_words file. See the List of Reserved Words in this chapter.

    Verifying the Network Configuration

    Verification ensures that the networks are configured correctly and that all nodes agree on the ownership of all defined resources, such as the following:

  • Configuration of network information, such as addresses on all nodes in the cluster or whether multiple non-IP networks exist on the same tty device.
  • No network interfaces configured on unsupported network types (for example, IP, socc, slip and fcs).
    Verifying Disk and Filesystem Configuration

    Verification ensures that disks and filesystems are in agreement and configured according to the following:

  • Agreement among all nodes on the ownership of defined resources (for example, filesystems, volume groups, disks, and application servers). The verification utility checks for the existence and defined ownership of the filesystems to be taken over, and then checks the volume group and disks where the filesystems reside.
  • Agreement among nodes on the major and minor device numbers for NFS-exported filesystems.
  • If disk fencing is enabled, verification sends an error if all nodes are not included in the concurrent access resource group.
    Verifying Resource Group Information

    Verification ensures that the resource group information supplied is in agreement and configured according to the following:

  • Verification issues warnings in cases when the startup, fallover or fallback preferences that you choose for resource groups may put the high availability of resources at risk in the case of a cluster failure.
  • The verification utility checks the choices you made for the distribution of resources in case of a takeover (node priorities) to confirm that the takeover information matches the owned resources information.
    Verifying Individual Resources

    Verification checks individual resources, such as the following:

  • Event customization.
  • Application server start and stop scripts exist and are executable.
    Verifying Automatic Error Notification Methods

    Verification ensures that automatic error notification (AEN) methods exist and are properly configured for the following:

  • Root volume groups
  • HACMP-defined volume groups or HACMP-defined disks
  • HACMP-defined filesystems (the underlying disks that support the file system)
  • SP switch network interface cards.
    Verifying the Security Configuration

    If you have configured Kerberos on your system, verification also verifies that:

  • Kerberos is installed on all nodes in the cluster
  • All IP labels listed in the configuration have the appropriate service principals in the .klogin file on each node in the cluster
  • All nodes have the proper service principals
  • All nodes have the same security mode setting.
    Verifying Custom Configurations

    If you have configured custom cluster snapshot methods or AIX 5L Fast Connect services, verification checks their existence and consistency.

    Verifying HACMP/XD Configurations

    If you are using HACMP/XD configurations, verification confirms that the HACMP/XD cluster configuration for sites and its replicated resources are consistent with your HACMP cluster configuration.

    Verifying Service IP labels

    If a service IP label is configured on the interface instead of the boot label, verification issues an error reminding you to run the sample utility clchipdev before starting cluster services. If that service IP label is an alias, verification has a corrective action to reverse it. The sample utility clchipdev helps configure the application service interface correctly in HACMP. For information on clchipdev, see Configuring an Application Service Interface in Chapter 4: Configuring HACMP Cluster Topology and Resources (Extended).

    Verifying and Synchronizing the Cluster Configuration

    You can verify and synchronize your cluster from either SMIT cluster configuration path:

  • Initialization and Standard Configuration
  • Extended Configuration.
    Verifying the Cluster Using the Initialization and Standard Configuration Path

    If you use the Initialization and Standard Configuration path, when you select the option Verify and Synchronize HACMP Configuration, the command executes immediately. Messages appear in the SMIT command status screen as the configuration is checked.

    Verifying the Cluster Using the Extended Configuration Path

    If you use the Extended Configuration path, you can set parameters for the command before it runs. These parameters differ depending on whether or not the cluster is active.

    To verify and synchronize the HACMP cluster configuration:

      1. Enter smit hacmp
      2. In SMIT, select Extended Configuration > Extended Verification and Synchronization and press Enter.
    The software checks whether cluster services are running on any cluster node and displays one of the following screens:
    If the cluster is active, the following options appear.

    Emulate or Actual
    Actual is the default.
    Verify changes only?
    No is the default. (Run the full check on resource and topology configuration.) Select Yes to verify only resource or topology configurations that have changed since the last time the cluster was verified.
    Note: If you have changed the AIX 5L configuration, do not use this mode; it only applies to HACMP configuration changes.
    Logging
    Standard is the default. You can also select Verbose. Verification messages are logged to /var/hacmp/clverify/clverify.log.


    If the cluster is inactive, the following options appear:

    Verify Synchronize or Both
    Both is the default. You can also select Verify only or Synchronize only.
    Automatically correct errors found during verification?
    No is the default. HACMP will not perform corrective actions.
    If you select Interactively, during verification you will be prompted when it finds a problem it can correct, for example:
    • Importing a volume group
    • Re-importing shared volume groups (mount points and filesystems issues).
    You then choose to have the action taken or not. For more information, see the section Conditions That Can Trigger a Corrective Action in this chapter.
    Force synchronization if verification fails?
    No is the default. If you select Yes, cluster verification runs but verification errors are ignored and the cluster is synchronized.
    Use the Yes option with caution. Correct functioning of the cluster at runtime cannot be guaranteed if you synchronize without verification. Cluster topology errors may lead to an abnormal exit of the Cluster Manager. Resource configuration errors may lead to resource group acquisition errors.
    Verify changes only?
    No is the default. (Run the full check on resource and topology configuration.) Yes opts to verify only resource or topology configurations that have changed since the last time the cluster was verified.
    Note: If you have changed the AIX 5L configuration, do not use this mode; it only applies to HACMP configuration changes.
    Logging
    Standard is the default. You can also select Verbose. All verification messages (including Verbose messages) are logged to /var/hacmp/clverify/clverify.log.

      3. Press Enter and SMIT starts the verification process. The verification output appears in the SMIT Command Status window.
      4. If any error messages appear, make the necessary changes and run the verification procedure again. You may see Warnings if the configuration has a limitation on its availability; for example, only one interface per node per network is configured, or Workload Manager is configured but there is no application server assigned to use it.

    Running Corrective Actions during Verification

    You can run automatic corrective actions during cluster verification on an inactive cluster. By default, automatic corrective action is enabled for Initialization and Standard path and disabled for Extended path.

    Automatic corrective actions can be disabled for the Extended path (from the System Management (C-SPOC) > Manage HACMP Services menu) but cannot be disabled for the Standard path. You can run verification with corrective actions in one of two modes:

  • Interactively. If you select Interactively, when verification detects a correctable condition related to importing a volume group or to re-importing mount points and filesystems, you are prompted to authorize a corrective action before verification continues.
  • Automatically. If you select Yes, when verification detects that any of the error conditions exists, as listed in section Conditions That Can Trigger a Corrective Action, it takes the corrective action automatically without a prompt.
    If an error discovered during verification has a corrective action, the item is corrected and the run continues. For situations when the correction involves importing a shared volume group, re-importing a shared volume group, or updating the /etc/hosts file, the utility runs all verification checks again after it corrects one of the above conditions. If the same error condition is triggered again, the associated corrective action is not executed. The error is logged and verification fails. If the original condition is a warning, verification succeeds.

    HACMP 5.4 detects active service IP labels and active volume groups on nodes regardless of whether or not the nodes are running cluster services. HACMP looks at the resource group state instead of cluster services. When verification detects active resources on a node whose resource group is neither in the ONLINE state nor in the UNMANAGED state, verification gives you the option to bring these resources OFFLINE according to the following table.

    Verification cannot tell which node will actually acquire the resource group that has active resources. Thus, the warning messages mentioned in the following table are printed every time active resources are found on a node, whether or not that node is stopped, and the resource group to which the active resources belong is in the UNMANAGED, OFFLINE, or ERROR state.

     
    Interactively correct errors
        Manage Resource Group Automatically: Display message with option to bring resources offline.
        Manage Resource Group Manually: Display message with option to bring resources offline.
    Automatically correct errors
        Manage Resource Group Automatically: Reset the startup attribute Manage Resource Group to Manually and display a warning message.
        Manage Resource Group Manually: Print reminder/warning and steps to take.
    No corrective actions
        Manage Resource Group Automatically: Print reminder/warning and steps to take.
        Manage Resource Group Manually: Print reminder/warning and steps to take.
    Cluster services are running
        Manage Resource Group Automatically: N/A
        Manage Resource Group Manually: N/A

    Conditions That Can Trigger a Corrective Action

    HACMP shared volume group time stamps are not up-to-date on a node

    If the shared volume group time stamp file does not exist on a node, or the time stamp files do not match on all nodes, the corrective action ensures that all nodes have the latest up-to-date VGDA time stamp for the volume group and imports the volume group on all cluster nodes where the shared volume group was out of sync with the latest volume group changes. The corrective action ensures that volume groups whose definitions have changed will be properly imported on a node that does not have the latest definition.
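
    If you prefer to refresh a stale volume group definition yourself rather than let the corrective action do it, the usual approach looks like the following (a sketch; sharedvg and hdisk3 are hypothetical names):

    # The volume group must not be varied on while its definition is re-learned
    varyoffvg sharedvg

    # Update this node's definition from the latest VGDA on one of its disks
    importvg -L sharedvg hdisk3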

    The /etc/hosts file on a node does not contain all HACMP-managed IP addresses

    If an IP label is missing, the corrective action modifies the file to add the entry and saves a copy of the old version to /etc/hosts.date. If a backup file already exists for that day, no additional backups are made for that day.

  • If the /etc/hosts entry exists but is commented out, verification adds a new entry; comment lines are ignored.
  • If the label specified in the HACMP Configuration does not exist in /etc/hosts, but the IP address is defined in /etc/hosts, the label is added to the existing /etc/hosts entry. If the label is different between /etc/hosts and the HACMP configuration, then verification reports a different error message; no corrective action is taken.
  • If the entry does not exist, meaning both the IP address and the label are missing from /etc/hosts, then the entry is added. This corrective action takes place on a node-by-node basis. If different nodes report different IP labels for the same IP address, verification catches these cases and reports an error. However, this error is unrelated to this corrective action. Inconsistent definitions of an IP label defined to HACMP are not corrected.
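
    A manual equivalent of this corrective action, using a hypothetical service IP label app_svc and address 192.168.10.10:

    # Check whether the label already resolves on this node
    grep -w app_svc /etc/hosts

    # If it is missing, back up the file and append the entry
    cp /etc/hosts /etc/hosts.bak
    echo "192.168.10.10   app_svc" >> /etc/hosts
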
    SSA concurrent volume groups need unique SSA node numbers

    If verification finds that the SSA node numbers are not unique, the corrective action changes the number of one of the nodes where the number is not unique. See the Installation Guide for more information on SSA configuration.

    Note: The SSA node number check is not performed for enhanced concurrent volume groups that reside on SSA hdisks. Disks that make up enhanced concurrent volume groups do not have any SSA-specific numbers assigned to them.

    A filesystem is not created on a node, although disks are available

    If a filesystem has not been created on one of the cluster nodes, but the volume group is available, the corrective action creates the mount point and filesystem. The filesystem must be part of a resource group for this action to take place. In addition, the following conditions must be met:

  • This is a shared volume group.
  • The volume group must already exist on at least one node.
  • One or more node(s) that participate in the resource group where the filesystem is defined must already have the filesystem created.
  • The filesystem must already exist within the logical volume on the volume group in such a way that simply re-importing that volume group would acquire the necessary filesystem information.
  • The mount point directory must already exist on the node where the filesystem does not exist.
    The corrective action handles only those mount points that are on a shared volume group, such that exporting and re-importing of the volume group will make the missing filesystems on that volume group available. Prior to executing this corrective action, the volume group is varied off on the remote node(s), or, if the cluster is down, the volume group is varied off if it is currently varied on.

    If Mount All Filesystems is specified in the resource group, the node with the latest time stamp is used to compare the list of filesystems that exists on that node with other nodes in the cluster. If any node is missing a filesystem, then HACMP imports the filesystem.

    Disks are available, but the volume group has not been imported to a node

    If the disks are available but the volume group has not been imported to a node that participates in a resource group where the volume group is defined, then the corrective action imports the volume group.

    The corrective action gets the information regarding the disks and the volume group major number from a node that already has the volume group available. If the major number is unavailable on a node, the next available number is used. The corrective action is only performed under the following conditions:

  • The cluster is down.
  • The volume group is varied off if it is currently varied on.
  • The volume group is defined as a resource in a resource group.
  • The major number and associated PVIDS for the disks can be acquired from a cluster node that participates in the resource group where the volume group is defined.
    Note: This functionality will not turn off the auto varyon flag if the volume group has the attribute set. A separate corrective action handles auto varyon.
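
    A manual version of this corrective action might look like the following (a sketch; sharedvg, hdisk3, and the major number 100 are hypothetical, and the major number should match the one already in use on the other nodes):

    # Show which major numbers are free on this node
    lvlstmajor

    # Import the volume group with the agreed major number
    importvg -y sharedvg -V 100 hdisk3

    # Keep the HACMP-required settings: no automatic varyon, leave it varied off
    chvg -a n sharedvg
    varyoffvg sharedvg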

    Shared volume groups configured as part of an HACMP resource group have their automatic varyon attribute set to Yes.

    If verification finds that a shared volume group inadvertently has the auto varyon attribute set to Yes on any node, the corrective action automatically sets the attribute to No on that node.

    Required /etc/services entries are missing on a node.

    If a required entry is commented out, missing, or invalid in /etc/services on a node, the corrective action adds it. Required entries are:

    Note: Starting with HACMP 5.3, the software no longer uses the clsmuxpd daemon; the SNMP server functions are included in the Cluster Manager—the clstrmgr daemon.
    Name              Port    Protocol
    topsvcs           6178    udp
    grpsvcs           6179    udp
    clinfo_deadman    6176    udp
    clcomd            6191    tcp
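
    To confirm that these entries are present and not commented out on a node:

    grep -E "^(topsvcs|grpsvcs|clinfo_deadman|clcomd)[[:space:]]" /etc/services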


    Required HACMP snmpd entries are missing on a node

    If a required entry is commented out, missing, or invalid on a node, the corrective action adds it.

    Note: The default version of the snmpd.conf file for AIX 5L v.5.2 and v. 5.3 is snmpdv3.conf.

    In /etc/snmpdv3.conf or /etc/snmpd.conf, the required HACMP snmpd entry is:

    smux 	1.3.6.1.4.1.2.3.1.2.1.5 	"clsmuxpd_password" # HACMP clsmuxpd 
     

    In /etc/snmpd.peers, the required HACMP snmpd entry is:

    clsmuxpd 1.3.6.1.4.1.2.3.1.2.1.5 "clsmuxpd_password" # HACMP clsmuxpd

    If changes are required to the /etc/snmpd.peers or snmpd[v3].conf file, HACMP creates a backup of the original file. A copy of the pre-existing version is saved prior to making modifications in the file /etc/snmpd.{peers | conf}.date. If a backup has already been made of the original file, then no additional backups are made.

    HACMP makes one backup per day for each snmpd configuration file. As a result, running verification a number of times in one day only produces one backup file for each file modified. If no configuration files are changed, HACMP does not make a backup.
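
    To check whether the required entries are already in place on a node (which file applies depends on the snmpd version in use):

    grep clsmuxpd_password /etc/snmpdv3.conf /etc/snmpd.conf /etc/snmpd.peers 2>/dev/null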

    Required RSCT Network Options Settings

    HACMP requires that the nonlocsrcroute, ipsrcroutesend, ipsrcrouterecv, and ipsrcrouteforward network options be set to 1; these are set by RSCT’s topsvcs startup script. The corrective action run on inactive cluster nodes ensures these options are not disabled and are set correctly.
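
    To display the current values on a node (each option should report 1):

    for opt in nonlocsrcroute ipsrcroutesend ipsrcrouterecv ipsrcrouteforward
    do
        no -o $opt
    done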

    Required HACMP Network Options Settings

    The corrective action ensures that the value of each of the following network options is consistent across all nodes in a running cluster (out-of-sync setting on any node is corrected):

  • tcp_pmtu_discover
  • udp_pmtu_discover
  • ipignoreredirects
    Required routerevalidate Network Option Setting

    Changing hardware and IP addresses within HACMP changes and deletes routes. Because AIX 5L caches routes, setting the routerevalidate network option is required as follows:

    no -o routerevalidate=1  
    

    This setting ensures the maintenance of communication between cluster nodes. Verification run with corrective action automatically adjusts this setting for nodes in a running cluster.

    Note: No corrective actions take place during a dynamic reconfiguration event.

    clverify.log File

    During verification, HACMP collects configuration data from all the nodes as it runs through a series of checks. The verbose output is saved to the /var/hacmp/clverify/clverify.log file. The log file is rotated; this helps you and IBM Support obtain a history of what configuration changes have been made when you need to determine the root cause of a problem.

    Ten copies of the log are saved, as follows:

    drwxr-xr-x   3 root     system         1024 Mar 13 00:02 . 
    drwxr-xr-x   6 root     system          512 Mar 11 10:03 .. 
    -rw-------   1 root     system       165229 Mar 13 00:02 clverify.log 
    -rw-------   1 root     system       165261 Mar 12 17:31 clverify.log.1 
    -rw-------   1 root     system       165515 Mar 12 15:22 clverify.log.2 
    -rw-------   1 root     system       163883 Mar 12 15:04 clverify.log.3 
    -rw-------   1 root     system       164781 Mar 12 14:54 clverify.log.4 
    -rw-------   1 root     system       164459 Mar 12 14:36 clverify.log.5 
    -rw-------   1 root     system       160194 Mar 12 09:27 clverify.log.6 
    -rw-------   1 root     system       160410 Mar 12 09:20 clverify.log.7 
    -rw-------   1 root     system       160427 Mar 12 09:16 clverify.log.8 
    -rw-------   1 root     system       160211 Mar 12 09:06 clverify.log.9 
    

    You can redirect the clverify.log file to write to a different location using the standard HACMP logfile redirection mechanism. If the clverify.log file is redirected to a different location, the location of all the data saved in the subdirectories in the path /var/hacmp/clverify moves along with it. However, pre-existing data under /var/hacmp/clverify is not automatically moved if the clverify.log is redirected.

    For information on this procedure see the Steps for Redirecting a Cluster Log File section in Chapter 1: Using Cluster Log Files in the Troubleshooting Guide.

    Archived Configuration Databases

    All verification checks use HACMP Configuration Database data supplied by the common communication infrastructure, which is designed to provide efficient access to configuration databases from the other nodes. When the verification runs, it stores copies of the following:

  • All HACMP Configuration Databases (ODMs) used during verification
  • All AIX 5L ODMs (Custom Attributes, Device Definitions, and so forth) collected from the remote nodes.
    The verification utility manages these files by storing the copies in various directories depending on the success or failure of the verification.

    Managing HACMP File Collections

    HACMP requires that event scripts, application scripts, AIX 5L files, and HACMP configuration files be identical on each cluster node. The HACMP File Collections facility automatically synchronizes these files among cluster nodes and warns you if there are any unexpected results (for example, if one or more files in a collection have been deleted or have a length of zero on one or more cluster nodes).

    Default HACMP File Collections

    When you install HACMP, it sets up the following file collections:

  • Configuration_Files
  • HACMP_Files
    HACMP Configuration_Files Collection

    Configuration_Files is a container for the following essential system files:

  • /etc/hosts
  • /etc/services
  • /etc/snmpd.conf
  • /etc/snmpdv3.conf
  • /etc/rc.net
  • /etc/inetd.conf
  • /usr/es/sbin/cluster/netmon.cf
  • /usr/es/sbin/cluster/etc/clhosts
  • /usr/es/sbin/cluster/etc/rhosts
  • /usr/es/sbin/cluster/etc/clinfo.rc
    For more information on the netmon.cf file configuration see the Planning Guide, and for information about the clhosts file during an upgrade, see the Installation Guide.

    HACMP_Files Collection

    HACMP_Files is a container for user-configurable files in the HACMP configuration. HACMP uses this file collection to reference all of the user-configurable files in the HACMP Configuration Database classes.

    The HACMP_Files collection references the following Configuration Database fields:

    Configuration Database Class    Configuration Database Field        Description
    HACMPevent                      notify                              Event notify script
    HACMPevent                      pre                                 Pre-event script
    HACMPevent                      post                                Post-event script
    HACMPevent                      recv                                Recovery script
    HACMPserver                     start                               Application server start script
    HACMPserver                     stop                                Application server stop script
    HACMPmonitor                    value, when name=NOTIFY_METHOD      Application monitor notify script
    HACMPmonitor                    value, when name=CLEANUP_METHOD     Application monitor cleanup script
    HACMPmonitor                    value, when name=RESTART_METHOD     Application monitor restart script
    HACMPpager                      filename                            Pager text message file
    HACMPsna                        app_svc_file                        SNA link start and stop scripts
    HACMPx25                        app_svc_file                        X.25 link start and stop scripts
    HACMPtape                       start_script_name                   Tape start script
    HACMPtape                       stop_script_name                    Tape stop script
    HACMPude                        recovery_prog_path                  User-defined event recovery program
    HACMPcustom                     value                               Custom snapshot method script

    Note: This collection excludes the HACMPevent:cmd event script. Do not modify or rename the HACMP event script files. Also, do not include HACMP event scripts in any HACMP file collection.
    Note: When copying a file to a remote node, the local node’s owner, group, modification time stamp, and permission settings are maintained on the remote node. That is, the remote node inherits these settings from the local node.
    Permissions for all files in the HACMP_Files collection are set to execute, which helps to prevent problems if you have not yet set execute permission for scripts on all nodes. (This is often the cause of an event failure.)

    You cannot rename or delete the HACMP_Files collection. You cannot add or remove files from the collection.

    You can add a file that is already included in the HACMP_Files collection (for example, an application start script) to another file collection. However, in any other case, a file can only be included in one file collection, and you receive the following error message, where XXX_Files is the name of the previously defined collection:

    This file is already included in the <XXX_Files> collection.

    You can add and remove files or delete the Configuration_Files collection.

    Neither of these file collections is enabled by default. If you prefer to include some user-configurable files in another collection instead of propagating all of them, leave the HACMP_Files collection disabled.

    Options for Propagating an HACMP File Collection

    Propagating a file collection copies the files in a file collection from the current node to the other cluster nodes. Use one of the following methods to propagate an HACMP file collection:

  • Propagate the file collection at any time manually. You can propagate files in a file collection from the HACMP File Collection SMIT menu on the local node (the node that has the files you want to propagate).
  • Set the option to propagate the file collection whenever cluster verification and synchronization is executed. The node from which verification is run is the propagation node. (This is set to No by default.)
  • Set the option to propagate the file collection automatically after a change to one of the files in the collection. HACMP checks the file collection status on each node (every 10 minutes by default) and propagates any changes. (This is set to No by default.)
    One timer is set for all file collections. You can change the timer. The maximum is 1440 minutes (24 hours) and the minimum is 10 minutes.

    You can set up and change file collections on a running cluster. However, note that if you add a node dynamically, the file collection on that node may have files that are not in sync with the files on the other cluster nodes. If the file collection on the node being added is set for automatic propagation upon cluster verification and synchronization, the files on the node just added are updated properly. If this flag is not set, you must manually run the file collection propagation from one of the other nodes.

    Backup Files and Error Handling

    During file propagation, before HACMP copies a file to a remote node, the remote node makes a backup copy of the original file if it exists and its size is greater than zero, with the original time stamp. The copy is kept in the /var/hacmp/filebackup/ directory.

    Only the most recent backup is kept for each file that is overwritten. When another propagation replaces the file, the new backup overwrites the old one. You cannot customize these backups. If you need to use a backup file, you must manually copy the file back to its original location.
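
    For example, to inspect the saved copies and restore one by hand (a sketch; the layout of saved copies under /var/hacmp/filebackup/ is assumed here and may differ on your system):

    # See which backup copies the file collection facility has kept
    ls -lR /var/hacmp/filebackup/

    # Copy a saved file back to its original location (hypothetical example)
    cp /var/hacmp/filebackup/hosts /etc/hosts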

    If the local (propagation) node has a zero-length or non-existent file in a file collection, then an error message is logged and the file is not copied during the propagation process. The zero-length or non-existent file remains until you run a manual propagation from another node, or until an automatic propagation from another node detects a change to the file and propagates it.

    All errors during file propagation are reported in SMIT if the propagation happens during cluster verification and synchronization or during a manual propagation. Errors are also written to the /var/hacmp/log/clutils.log file.

    It is your responsibility to ensure that the file on the local (propagation) node is the latest copy and is not corrupt. HACMP only checks for the existence and length of the file on this node.

    Tracking HACMP File Collection Operations

    Whenever the HACMP File Collections utility replaces a file on a node, the following information about it is saved in the /var/hacmp/log/clutils.log file:

  • Date and time of replacement
  • Propagation type
  • File name and file collection name
  • Name of the remote and local nodes.
    For example:

    Wed Jan 07 11:08:55 2006: clfileprop: Manual file collection propagation called. 
    Wed Jan 07 11:08:55 2006: clfileprop: The following file collections will be processed: 
    Wed Jan 07 11:08:55 2006: clfileprop: Test_Files 
    Wed Jan 07 11:08:55 2006: clfileprop:  
    Wed Jan 07 11:08:55 2006: clfileprop: Starting file propagation to remote node riga. 
    Wed Jan 07 11:08:55 2006: clfileprop: Successfully propagated file /tmp/kris to node riga. 
    Wed Jan 07 11:08:55 2006: clfileprop: Successfully propagated file /tmp/k2 to node riga. 
    Wed Jan 07 11:08:55 2006: clfileprop: Total number of files propagated to node riga: 2 
    

    Using SMIT to Manage HACMP File Collections

    The SMIT interface enables you to perform the following actions:

  • Creating an HACMP File Collection
  • Setting the Automatic Timer for File Collections
  • Changing a File Collection
  • Removing Files from a File Collection
  • Removing a File Collection
  • Verifying and Synchronizing File Collections.
    Creating an HACMP File Collection

    To create an HACMP File Collection, at least one working IP communications path defined to HACMP must exist between the node running the file propagation and each remote node defined to the cluster. The clcomd daemon must be running on all nodes.

    To create an HACMP file collection:

      1. Enter smit hacmp
      2. In SMIT, select System Management (C-SPOC) > HACMP File Collection Management > Manage File Collections > Add a File Collection and press Enter.
      3. Enter field values as follows:
    File Collection name
    The name can include alphabetic and numeric characters and underscores. Use no more than 32 characters. Do not use reserved names. For a list of reserved names, see List of Reserved Words.
    File Collection Description
    A description of the file collection. Use no more than 100 characters.
    Propagate files during cluster synchronization?
    No is the default. If you select Yes, HACMP propagates files listed in the current collection before every cluster verification and synchronization process.
    Propagate changes to files automatically?
    No is the default. If you select Yes, HACMP propagates files listed in the current collection across the cluster when a change is detected on any file in the collection. HACMP checks for changes every ten minutes by default. You can adjust the timer on the Manage File Collections panel.
      4. In SMIT, select HACMP File Collection Management > Manage Files in File Collections > Add Files to a File Collection and press Enter.
      5. Select the File Collection where you want to add the files.
      6. Enter the file names in the New Files field:
    File Collection name
    The name of the selected file collection is displayed.
    File Collection Description
    The current description is displayed.
    Propagate files during cluster synchronization?
    The current choice is displayed.
    Propagate changes to files automatically?
    The current choice is displayed.
    Collection Files
    Any files already in the collection are displayed.
    New Files
    Add the full pathname of the new file. The name must begin with a forward slash. A file cannot be a symbolic link, a directory, a pipe, a socket, or any file in /dev or /proc. It cannot begin with /etc/objrepos/* or /etc/es/objrepos/*. The file cannot be in another file collection (except for HACMP_Files).
      7. When you finish creating the file collection(s), synchronize the cluster using SMIT Extended Configuration > Extended Verification and Synchronization.

    Setting the Automatic Timer for File Collections

    The default timer for automatic checks on file collections is ten minutes. You can change the amount of time as needed.

    Note: The periodic check for changes to a file in a file collection runs on each node. However, these checks are not coordinated to run simultaneously on every node. Make changes to a file only on one node within the general time limit.

    To customize the file collection time interval:

      1. Enter smit hacmp
      2. In SMIT, select System Management (C-SPOC) > HACMP File Collection Management > Manage File Collections > Change/Show Automatic Update Time and press Enter.
      3. Enter the amount of time (in minutes) that you want HACMP to pause before performing file collection synchronization. The maximum is 1440 minutes (24 hours) and the minimum is 10 minutes. Press Enter.
      4. Synchronize the cluster using SMIT (Extended Configuration > Extended Verification and Synchronization).

    Changing a File Collection

    You can modify a file collection as follows:

  • Change the attributes of a file collection (name, description, propagation parameters).
  • Add or remove files in the collection.
  • Remove a file collection.
  • Change the automatic timer for all file collections, as described in Setting the Automatic Timer for File Collections.
    To change an attribute of a particular file collection:

      1. Enter smit hacmp
      2. In SMIT, select System Management (C-SPOC) > HACMP File Collection Management > Manage File Collections > Change/Show a File Collection and press Enter.
      3. Select the file collection.
      4. Change the name, description, and synchronization parameters on this panel:
    File Collection name
    The current name appears here.
    New File Collection name
    Enter the new name.
    Propagate files during cluster synchronization?
    No is the default. If you select Yes, HACMP propagates files listed in the current collection before every cluster verification and synchronization process.
    Propagate changes to files automatically?
    No is the default. If you select Yes, HACMP propagates files listed in the current collection across the cluster automatically when a change is detected on any file in the collection. HACMP checks for changes every ten minutes by default. You can adjust the timer on the Manage File Collections panel.
    Collection Files
    Any files already in the collection are displayed. Press F4 to see the list. You cannot change this field.
      5. Synchronize the cluster. In SMIT, select Extended Configuration > Extended Verification and Synchronization and press Enter.

    Removing Files from a File Collection

    To remove files from a file collection:

      1. Enter smit hacmp
      2. In SMIT, select System Management (C-SPOC) > HACMP File Collection Management > Manage Files in File Collections > Remove Files from a File Collection and press Enter.
      3. Select the File Collection from which you want to remove the files.
      4. Select one or more files to remove from the file collection and press Enter.
      5. Synchronize the cluster to update Configuration Databases. In SMIT, select Extended Configuration > Extended Verification and Synchronization and press Enter.

    Removing a File Collection

    To remove a file collection from the HACMP configuration:

      1. Enter smit hacmp
      2. In SMIT, select System Management (C-SPOC) > HACMP File Collection Management > Manage File Collections > Remove a File Collection and press Enter.
      3. Select the file collection to remove and press Enter.
      4. SMIT displays Are you sure? Press Enter again.
      5. In SMIT, select Extended Configuration > Extended Verification and Synchronization and press Enter to synchronize the cluster.

    Verifying and Synchronizing File Collections

    If file collections exist, HACMP checks and propagates the file collections with the flag set to yes for “propagate during verify and synchronize” before running the rest of the cluster verification and synchronization process. Before the files in each collection are propagated to all the cluster nodes, HACMP performs the following verification checks:

  • Verifies that no files are listed twice in any file collection. If a file is listed twice, a warning displays and verification continues.
  • Verifies that each file listed in each collection is a real file on the local node (the node from which cluster synchronization is being run). A file cannot be a symbolic link, a directory, a pipe, a socket, or any file in /dev or /proc. It cannot begin with /etc/objrepos/* or /etc/es/objrepos/*. If a file in a file collection is one of these, HACMP displays an error and verification fails.
  • Verifies that each file exists on the local node and has a file size greater than zero. If a file does not exist on the local node or has a size of zero, HACMP displays an error and verification fails.
  • Verifies that each file has a full path name that begins with a forward slash. If a file's pathname does not begin with a forward slash, HACMP displays an error and verification fails.
    Adding a Custom Verification Method

    You may want to add a custom verification method to check for a particular issue on your cluster. For example, you could add a script to check for the version of an application. You can include an error message that is displayed and written to the clverify.log file.
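
    For example, a custom method could be a small ksh script along the following lines (a hypothetical sketch; the application path and version string are examples only):

    #!/bin/ksh
    # Hypothetical check: confirm the installed application version
    EXPECTED="2.1"
    INSTALLED=$(/usr/local/app/bin/app -version 2>/dev/null)

    if [ "$INSTALLED" != "$EXPECTED" ]
    then
        MSG="ERROR: application version is '$INSTALLED', expected '$EXPECTED'"
        echo "$MSG"
        echo "$(date): $MSG" >> /var/hacmp/clverify/clverify.log
        exit 1
    fi
    exit 0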

    Note: During node startup, automatic verification and synchronization does not include any custom verification methods.

    To add a custom verification method:

      1. Enter smit hacmp
      2. In SMIT, select Problem Determination Tools > HACMP Verification > Configure Custom Verification Method > Add a Custom Verification Method and press Enter.
      3. Enter the field values as follows:
    Verification Method Name
    Enter a name for the verification method. Method names can be up to 32 alphanumeric characters. Do not use the word “all,” as this is a keyword indicating that all custom verification methods are to be run.
    Verification Method Description
    Enter a short description of the verification method.
    Verification Method
    Enter a filename for the verification method (executable). The method name can be different from the filename.
      4. Press Enter. The method is added to the list of verification methods you can use when you select the HACMP Verification option under the Problem Determination Tools menu.

    Changing or Showing a Custom Verification Method

    To change or show a custom verification method:

      1. Enter smit hacmp
      2. From the Problem Determination Tools menu, select HACMP Verification > Define Custom Verification Method > Change/Show a Custom Verification Method and press Enter. SMIT displays a popup list of verification methods.
      3. Select the verification method you want to change or show and press Enter.
      4. Enter a new name, new verification method description, and/or new filename as desired for the verification method and press Enter.

    Removing a Custom Verification Method

    To remove a custom verification method:

      1. Enter smit hacmp
      2. In SMIT, select Problem Determination Tools > HACMP Verification > Define Custom Verification Method > Remove a Custom Verification Method and press Enter. SMIT displays a popup list of custom verification methods.
      3. Select the verification method you want to remove and press Enter. SMIT prompts you to confirm that you want to remove the specified verification method.
      4. Press Enter to remove the verification method.

    List of Reserved Words

    Do not use the following words as names in a cluster. However, you may use these words when combined with numerals or another word (for example, my_network or rs232_02).

    adapter             false       nim         socc
    alias               FBHPN       node        subnet
    all                 fcs         nodename    tmscsi
    ALL                 fddi        OAAN        tmssa
    ANY                 FNPN        OFAN        token
    atm                 fscsi       OHN         true
    BO                  FUDNP       OTHER       tty
    cluster             grep        OUDP        volume
    command             group       private     vpath
    CROSS_SITE_RG_MOVE  hps         public      vscsi
    custom              ib          resource    XD_data
    daemon              ip          RESTORE     XD_ip
    disk                IP          root        XD_rs232
    diskhb              name        rs232
    ether               network     serial
    event               NFB         slip

