Chapter 7: HACMP Configuration Process and Facilities
This chapter provides an overview of the HACMP cluster configuration process and of the administrative tools supplied with the HACMP software.
Information You Provide to HACMP
Prior to configuring a cluster, make sure the building blocks are planned and configured, and the initial communication path exists for HACMP to reach each node. This section covers the basic tasks you need to perform to configure a cluster.
Information on Physical Configuration of a Cluster
Physical configuration of a cluster consists of the following planning and configuration tasks:
- Ensure the TCP/IP network support for the cluster.
- Ensure the point-to-point network support for the cluster.
- Ensure the heartbeating support for the cluster.
- Configure the shared disk devices for the cluster.
- Configure the shared volume groups for the cluster.
- Consider the mission-critical applications for which you are using HACMP. Also consider the application servers and what type of resource group management is best for each application.
- Examine issues relating to HACMP clients.
- Ensure physical redundancy by using multiple circuits or uninterruptible power supplies, redundant physical network interface cards, multiple networks to connect nodes, and disk mirroring.

These tasks are described in detail in the Planning Guide.
AIX 5L Configuration Information
Cluster components must be properly configured on the AIX level. For this task, ensure that:
- Basic communication to cluster nodes exists.
- Volume groups, logical volumes, mirroring and filesystems are configured and set up.

To ensure logical redundancy, consider different types of resource groups, and plan how you will group your resources in resource groups. For the specifics of configuring volume groups, logical volumes and filesystems, refer to the AIX manuals and to the Installation Guide.
Establishing the Initial Communication Path
The initial communication path is a path to a node that you are adding to a cluster. To establish the initial communication path, you provide the name of the node, or other information that can serve as the name of the node.
In general, a node name and a hostname can be the same. When configuring a new node, you can enter any of the following identifiers, each of which can serve as an initial communication path to the node:
- An IP address of a physical network interface card (NIC) on that node, such as 1.2.3.4. In this case, the address is used as a communication path for contacting the node.
- An IP label associated with an IP address of a NIC on that node, such as servername. In this case, the name is used to determine the communication path for contacting the node, based on the assumption that the local TCP/IP configuration (Domain Nameserver or Hosts Table) supplies domain qualifiers and resolves the IP label to an IP address.
- A Fully Qualified Domain Name (FQDN), such as "servername.thecompanyname.com". In this case the communication path is "servername.thecompanyname.com", based on the assumption that the local TCP/IP configuration (Domain Nameserver or Hosts Table) supplies domain qualifiers and resolves the IP label to an IP address.

When you enter any of these names, HACMP ensures unique name resolution and uses the hostname as the node name, unless you explicitly specify otherwise.
Note: In HACMP, node names and hostnames have to be different in some cases, for example when the application you are using requires that the AIX “hostname attribute” move with the application after a cluster component failure. This is done by setting up special event scripts.
If the nodes and physical network interface cards have been properly configured in AIX, HACMP can use this information to assist you in the configuration process by running the automatic discovery process discussed in the following section.
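For example, before supplying an IP label or FQDN as the initial communication path, you can confirm from the configuring node that the name resolves and the target answers. A minimal sketch using standard AIX commands; the label servername and address 1.2.3.4 are the placeholder values used above, not real entries:

    # Verify that the IP label used as the communication path resolves locally
    host servername                 # should return an address, e.g. 1.2.3.4
    grep servername /etc/hosts      # or confirm the hosts-table entry directly

    # Confirm the node answers over that path
    ping -c 3 servername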
Information Discovered by HACMP
You can define the basic cluster components in just a few steps. To assist you in the cluster configuration, HACMP can automatically retrieve the information necessary for configuration from each node.
Note: For easier and faster cluster configuration, you can also use a cluster configuration assistant. For more information, see Two-Node Cluster Configuration Assistant.
For the automatic discovery process to work, the following conditions should be met in HACMP:
- You have previously configured the physical components and performed all the necessary AIX configurations.
- Working communications paths exist to each node. This information will be used to automatically configure the cluster TCP/IP topology when the standard configuration path is used.

Once these tasks are done, HACMP automatically discovers predefined physical components within the cluster, and selects default behaviors. In addition, HACMP performs discovery of cluster information if there are any changes made during the configuration process.
Running discovery retrieves current AIX configuration information from all cluster nodes. This information appears in picklists to help you make accurate selections of existing components.
The HACMP automatic discovery process is easy, fast, and does not place a "waiting" burden on you as the cluster administrator.
Cluster Configuration Options: Standard and Extended
The cluster configuration process has been significantly simplified. While the details of the configuration process are covered in the Administration Guide, this section provides a brief overview of the two ways to configure an HACMP cluster.
Configuring an HACMP Cluster Using the Standard Configuration Path
You can add the basic components of a cluster to the HACMP configuration database in a few steps. The standard cluster configuration path simplifies and speeds up the configuration process, because HACMP automatically launches discovery to collect the information and to select default behaviors.
If you use this path:
- Automatic discovery of cluster information runs by default. Before starting the HACMP configuration process, you need to configure network interfaces/devices in AIX. In HACMP, you establish initial communication paths to other nodes. Once this is done, HACMP collects this information and automatically configures the cluster nodes and networks based on physical connectivity. All discovered networks are added to the cluster configuration.
- IP aliasing is used as the default mechanism for binding IP labels/addresses to network interfaces.
- You can configure the most common types of resources. However, customizing of resource group fallover and fallback behavior is limited.

Configuring an HACMP Cluster Using the Extended Configuration Path
In order to configure the less common cluster elements, or if connectivity to each of the cluster nodes is not established, you can manually enter the information in a way similar to previous releases of the HACMP software.
When using the HACMP extended configuration SMIT paths, if any components are on remote nodes, you must manually initiate the discovery of cluster information. That is, discovery is optional (rather than automatic, as it is when using the standard HACMP configuration SMIT path).
Using the options under the extended configuration menu, you can add the basic components of a cluster to the HACMP configuration database, as well as many additional types of resources. Use the extended configuration path to customize the cluster for all the components, policies, and options that are not included in the standard configuration menus.
Overview: HACMP Administrative Facilities
The HACMP software provides you with the administrative facilities described in the following sections.
Cluster Security
All communication between nodes is sent through the Cluster Communications daemon, clcomd, which runs on each node. The clcomd daemon manages the connection authentication between nodes and any message authentication or encryption configured. HACMP’s Cluster Communications daemon uses the trusted /usr/es/sbin/cluster/etc/rhosts file, and removes reliance on an /.rhosts file. In HACMP 5.2 and up, the daemon provides support for message authentication and encryption.
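As an illustration, you can confirm on a node that the Cluster Communications daemon is active and inspect its trusted rhosts file. The subsystem name clcomdES shown below is an assumption that may vary by HACMP level; the file path is the one named above:

    # Check whether the Cluster Communications daemon is running under the SRC
    lssrc -s clcomdES              # subsystem name assumed; try lssrc -a | grep clcomd if unsure

    # Review the addresses the daemon trusts for inter-node communication
    cat /usr/es/sbin/cluster/etc/rhosts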
Installation, Configuration and Management Tools
HACMP includes the tools described in the following sections for installing, configuring, and managing clusters.
Two-Node Cluster Configuration Assistant
HACMP provides the Two-Node Cluster Configuration Assistant to simplify the process for configuring a basic two-node cluster. The wizard-like application requires the minimum information to define an HACMP cluster and uses discovery to complete the cluster configuration. The application is designed for users with little knowledge of HACMP who want to quickly set up a basic HACMP configuration. The underlying AIX configuration must be in place before you run the Assistant.
Smart Assists for Integrating Specific Applications with HACMP
The Smart Assist for a given application examines the configuration on the system to determine the resources HACMP needs to monitor (Service IP label, volume groups). The Smart Assist then configures one or more resource groups to make applications and their resources highly available.
The Smart Assist takes the following actions:
- Discovers the installation of the application and, if necessary, the currently configured resources such as service IP address, file systems and volume groups
- Provides a SMIT interface for getting or changing configuration information from the user, including a new service IP address
- Defines the application to HACMP and supplies custom start and stop scripts for it
- Supplies an application monitor for the application
- Configures a resource group to contain:
  - Primary and takeover nodes
  - The application
  - The service IP address
  - Shared volume groups
- Configures resource group temporal and location dependencies, should the application solution require this
- Specifies files that need to be synchronized using the HACMP File Collections feature
- Modifies previously configured applications as necessary
- Verifies the configuration
- Tests the application's cluster configuration.

Supported Applications
HACMP 5.4 supplies Smart Assists for the following applications and configuration models:
- DB2
  - DB2 - Hot Standby
  - DB2 - Mutual Takeover
- WebSphere 6.0
  - WebSphere Application Server 6.0
  - WebSphere Cluster Transaction Log recovery
  - Deployment Manager
  - Tivoli Directory Server
  - IBM HTTP Server
- Oracle 10G

General Application Smart Assist
The General Application Smart Assist helps users to configure installed applications that do not have their own Smart Assist. The user supplies some basic information such as:
- Primary node - by default, the local node
- Takeover node(s) - by default, all configured nodes except the local node
- Application Name
- Application Start Script
- Application Stop Script
- Service IP label.

The General Smart Assist then completes the cluster configuration in much the same way as the Two-Node Cluster Configuration Assistant (but the configuration can have more than two nodes). The user can modify, test, or remove the application when using the General Application Smart Assist.
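The start and stop scripts you supply are ordinary executable scripts that must exist and behave the same on the primary node and every takeover node. A minimal, hypothetical sketch; the name appsrv and all paths are placeholders, not HACMP names:

    #!/bin/ksh
    # Hypothetical /usr/local/scripts/start_appsrv, supplied as the
    # Application Start Script; must be executable on all participating nodes.
    /opt/appsrv/bin/appsrv start
    exit $?

The Application Stop Script you supply is typically the mirror image: it calls the application's shutdown command and exits 0 on success.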
Smart Assist API
HACMP 5.4 includes a Smart Assist Developers Guide so that OEMs can develop Smart Assists to integrate their own applications with HACMP.
Planning Worksheets
Along with your HACMP software and documentation set, you have two types of worksheets to aid in planning your cluster topology and resource configuration: online or paper.
Online Planning Worksheets
HACMP provides the Online Planning Worksheets application, which enables you to:
- Plan a cluster.
- Create a cluster definition file.
- Examine the configuration for an HACMP cluster. You can review information about a cluster configuration in an easy-to-view format for use in testing and troubleshooting situations.

After you save an HACMP cluster definition file, you can open that file in an XML editor or in Online Planning Worksheets running on a node, a laptop, or other computer running the application. This enables you to examine the cluster definition on a non-cluster node or share the file with a colleague.
Besides providing an easy-to-view format, the XML structure enables your configuration information to be quickly converted from one format to another, which eases data exchange between applications. For example, you can save a cluster snapshot and then import it into your Online Planning Worksheets (OLPW) configuration.
For more information on the requirements and instructions for using the Online Planning Worksheets application, see the Planning Guide.
Paper Worksheets
The HACMP documentation includes a set of planning worksheets to guide your entire cluster planning process, from cluster topology to resource groups and application servers. You can use these worksheets as guidelines when installing and configuring your cluster. You may find these paper worksheets useful in the beginning stages of planning. The planning worksheets are found in the Planning Guide.
Starting, Stopping and Restarting Cluster Services
Once you install HACMP and configure your cluster, you can start cluster services. In HACMP 5.4, your options for starting, stopping and restarting cluster services have been streamlined and improved. HACMP handles your requests to start and stop cluster services without disrupting your applications, allowing you to have full control.
In HACMP 5.4, you can:
- Start and restart cluster services. When you start cluster services, or restart them after a shutdown, HACMP by default automatically activates the resources according to how you defined them, taking into consideration application dependencies, application start and stop scripts, dynamic attributes and other parameters. That is, HACMP automatically manages (and activates, if needed) resource groups and the applications in them. You can also start HACMP cluster services and tell it not to start up any resource groups (and applications) automatically for you. If an application is already running, you no longer need to stop it before starting the cluster services. HACMP relies on the application monitor and application startup script to verify whether it needs to start the application for you or whether the application is already running (HACMP attempts not to start a second instance of the application).
Note: HACMP relies on the configured application monitors to detect application failures. Application monitors must be configured for HACMP to detect a running application during startup so that it does not start duplicate instances of that application. The alternative approach is to run startup scripts that ensure duplicate instances of the application server are not started (a sketch of such a script follows this list).
- Shut down the cluster services. During an HACMP shutdown, you may select one of the following three actions for the resource groups:
  - Bring Offline.
  - Move to other node(s).
  - Place resource groups in an UNMANAGED state.

The Cluster Manager "remembers" the state of all the nodes and responds appropriately when users attempt to restart the nodes.
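A startup script written along the lines the note above describes might first check whether the application is already active before launching it. This is only a sketch; the process name appsrv and its paths are hypothetical:

    #!/bin/ksh
    # Hypothetical start script that avoids launching a second instance.
    if ps -ef | grep -v grep | grep appsrv >/dev/null; then
        echo "appsrv is already running; nothing to start"
        exit 0
    fi
    /opt/appsrv/bin/appsrv start
    exit $?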
For information on how to configure application monitors as well as HACMP cluster startup and shutdown options, see the Administration Guide.
SMIT Interface
You can use the SMIT panels supplied with the HACMP software to perform the following tasks:
- Configure clusters, nodes, networks, resources, and events.
- Capture and restore snapshots of cluster configurations.
- Read log files.
- Diagnose cluster problems.
- Manage a cluster using the C-SPOC utility.
- Perform resource group management tasks.
- Configure Automatic Error Notification.
- Perform dynamic adapter swap.
- Configure cluster performance tuning.
- Configure custom disk methods.

Web-Based SMIT Interface
WebSMIT is a Web-based user interface that provides consolidated access to the SMIT functions of configuration and management, display of interactive cluster status, and the HACMP documentation. Starting with HACMP 5.4, you can use WebSMIT to navigate and view the status of the running cluster, configure and manage the cluster, and view graphical displays of sites, networks, nodes and resource group dependencies.
The WebSMIT interface is similar to the ASCII SMIT interface. Because WebSMIT runs in a Web browser, it can be accessed from any platform.
To use the WebSMIT interface, you must configure and run a Web server process on at least one of the cluster node(s) to be administered. The /usr/es/sbin/cluster/wsm/README file contains information on basic Web server configuration, the default security mechanisms in place when HACMP is installed, and the configuration files available for customization.
For more information on installing and configuring WebSMIT, see the Installation Guide.
For more information on using WebSMIT, see Using WebSMIT for Configuring, Managing, and Monitoring a Cluster in Chapter 2: Administering an HACMP Cluster using WebSMIT in the Administration Guide.
Cluster Status Display Linked to Management Functions
When using the WebSMIT interface to see the cluster status display, you have links to the related WebSMIT management functions. Therefore, HACMP provides a consolidated user interface for cluster status with management capabilities.
For example, the node status display has a link to (among other options) the SMIT panels for starting and stopping Cluster Services. Now you can manipulate entities in the status display interactively rather than having to go to an ASCII SMIT interface on the node.
Specifying Read-Only User Access
In HACMP 5.4, you can specify a group of users that have read-only access to WebSMIT. Users with read-only access may view the configuration and cluster status, and may navigate through the WebSMIT screens, but cannot execute commands or make any changes to the configuration. For more information about configuring read-only access to WebSMIT, see the section on WebSMIT Security Considerations and WebSMIT Prerequisites in the Installation Guide.
HACMP System Management with C-SPOC
To facilitate management of a cluster, HACMP provides a way to run commands from one node and then verify and synchronize the changes to all the other nodes. You can use the HACMP System Management tool, the Cluster Single Point of Control (C-SPOC) to add users, files, and hardware automatically without stopping mission-critical jobs.
C-SPOC lets you perform the following tasks:
- Start/Stop HACMP Services
- HACMP Communication Interface Management
- HACMP Resource Group and Application Management
- HACMP File Collection Management
- HACMP Log Viewing and Management
- HACMP Security and Users Management
- HACMP Logical Volume Management
- HACMP Concurrent Logical Volume Management
- HACMP Physical Volume Management
- GPFS Filesystem Support
- Open a SMIT Session on a Node.

The C-SPOC utility simplifies maintenance of shared LVM components in clusters of up to 32 nodes. C-SPOC commands provide comparable functions in a cluster environment to the standard AIX commands that work on a single node. By automating repetitive tasks, C-SPOC eliminates a potential source of errors, and speeds up the process.
Without C-SPOC functionality, the system administrator must execute administrative tasks individually on each cluster node. For example, to add a user you usually must perform this task on each cluster node. Using the C-SPOC utility, a command executed on one node is also executed on other cluster nodes. Thus C-SPOC minimizes administrative overhead and reduces the possibility of inconsistent node states. Using C-SPOC, you issue a C-SPOC command once on a single node, and the user is added to all specified cluster nodes.
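For example, adding a user without C-SPOC means repeating the same AIX command on every node, while with C-SPOC a single operation on one node is propagated cluster-wide. The sketch below uses the standard AIX mkuser command; the C-SPOC SMIT fast path shown is typical, but the underlying cl_ command names and options depend on your HACMP level, so treat them as assumptions:

    # Without C-SPOC: repeat on every cluster node
    mkuser appadmin        # run on nodeA
    mkuser appadmin        # run on nodeB, and so on

    # With C-SPOC: run once on any node and let HACMP propagate the change
    smitty cl_admin        # HACMP System Management (C-SPOC) menus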
C-SPOC also makes managing logical volume components and controlling cluster services more efficient. You can use the C-SPOC utility to start or stop cluster services on nodes from a single node. The following figure illustrates a two-node configuration and the interaction of commands, scripts, and nodes when starting cluster services from a single cluster node. Note that all C-SPOC commands begin with the prefix cl_.
[Figure: flow of C-SPOC commands and scripts between two cluster nodes when cluster services are started from a single node]
C-SPOC provides this functionality through its own set of cluster administration commands, accessible through SMIT menus and panels. To use C-SPOC, select the Cluster System Management option from the HACMP SMIT menu.
Cluster Snapshot Utility
The Cluster Snapshot utility allows you to save cluster configurations you would like to restore later. You also can save additional system and cluster information that can be useful for diagnosing system or cluster configuration problems. You can create your own custom snapshot methods to store additional information about your cluster.
When creating a cluster snapshot, you can skip saving the cluster log files. Cluster snapshots record the cluster configuration information, whereas cluster logs only record the operation of the cluster and not the configuration information. By default, HACMP no longer collects cluster log files when you create the cluster snapshot, although you can still specify collecting the logs in SMIT. Skipping the logs collection speeds up the running time of the snapshot utility and reduces the size of the snapshot.
Customized Event Processing
You can define multiple pre- and post-events to tailor your event processing for your site’s unique needs. For more information about writing your own scripts for pre- and post-events, see the Administration Guide.
Resource Group Management Utility
The resource group management utility, clRGmove, provides a means for managing resource groups in the cluster, and enhances failure recovery capabilities of HACMP. It allows you to move any type of resource group (along with its resources—IP addresses, applications, and disks) online, offline or to another node, without stopping cluster services.
Resource group management helps you to manage your cluster more effectively, giving you better use of your cluster hardware resources. Resource group management also lets you perform selective maintenance without rebooting the cluster or disturbing operational nodes. For instance, you can use this utility to free the node of any resource groups to perform system maintenance on a particular cluster node.
Using the resource group management utility does not affect other resource groups currently owned by a node. The node that currently owns the moved resource group releases it, and the destination node acquires it just as it would during a node fallover. (If you have location dependencies configured between resource groups, HACMP verifies and ensures that they are honored).
Use resource group management to:
- Temporarily move a non-concurrent resource group from one node to another (and from one site to another) in a working cluster.
- Bring a resource group online or offline on one or all nodes in the cluster.

When you move a group, it stays on the node to which it was moved, until you move it again.
If you move a group that has Fallback to Highest Priority Node fallback policy, the group falls back or returns to its “new” temporary highest priority node (in cases when HACMP has to recover it on other nodes during subsequent cluster events).
If you want to move the group again, HACMP intelligently informs you (in the picklists with destination nodes) if it finds that a node with a higher priority exists that can host a group. You can always choose to move the group to that node.
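As a command-line sketch, a move is performed with the clRGmove utility named above. The flags shown here (-g for the resource group, -n for the destination node, -m to move) are typical but should be treated as assumptions and checked against your HACMP level; rg_app and node2 are placeholder names:

    # Move resource group rg_app to node2 without stopping cluster services
    /usr/es/sbin/cluster/utilities/clRGmove -g rg_app -n node2 -m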
HACMP File Collection Management
Like volume groups, certain files located on each cluster node need to be kept in sync in order for HACMP (and other applications) to behave correctly. Such files include event scripts, application scripts, and some AIX and HACMP configuration files.
HACMP File Collection management provides an easy way to request that a list of files be kept in sync across the cluster. Using HACMP file collection, you do not have to manually copy an updated file to every cluster node, verify that the file is properly copied, and confirm that each node has the same version of it.
Also, if one or more of these files is inadvertently deleted or damaged on one or more cluster nodes, it can take time and effort to determine the problem. HACMP file collection mitigates this scenario: HACMP detects when a file in a file collection is deleted or when its size is changed to zero, and logs a message to inform the administrator.
Two predefined HACMP file collections are installed by default:
- Configuration_Files. A container for essential system files, such as /etc/hosts and /etc/services.
- HACMP_Files. A container for all the user-configurable files in the HACMP configuration. This is a special file collection that the underlying file collection propagation utility uses to reference all the user-configurable files in the HACMP configuration database (ODM) classes.

For a complete list of configuration files and user-configurable HACMP files, see the Installation Guide.
For information on configuring file collections in SMIT, see the Administration Guide.
Monitoring Tools
HACMP supplies the monitoring tools described in the following sections:
Many of the utilities described in this section use the clhosts file to enable communication among HACMP cluster nodes. For information about the clhosts file, see Understanding the clhosts File. For detailed information about using each of these monitoring utilities, see the Administration Guide.
Cluster Manager
The Cluster Manager provides SNMP information and traps for SNMP clients. It gathers cluster information relative to cluster state changes of nodes and interfaces. Cluster information can be retrieved using SNMP commands or by SNMP-based client programs such as HATivoli. For more information, see the section Cluster Manager and SNMP Monitoring Programs in Chapter 4: HACMP Cluster Hardware and Software.
Cluster Information Program
The Cluster Information Program (Clinfo) gathers cluster information from SNMP and enables clients communicating with this program to be aware of changes in a cluster state. For information about Clinfo, see the section Cluster Information Program in Chapter 4: HACMP Cluster Hardware and Software.
Application Monitoring
Application Monitoring enables you to configure multiple monitors for an application server to monitor specific applications and processes, and to define the action to take upon detection of an unexpected termination of a process or other application failure. See the section Application Monitors in Chapter 5: Ensuring Application Availability.
Show Cluster Applications SMIT Option
The Show Cluster Applications SMIT option provides an application-centric view of the cluster configuration. This utility displays existing interfaces and information in an “application down” type of view. You can access it from both ASCII SMIT and WebSMIT.
Cluster Status Utility (clstat)
The Cluster Status utility, /usr/es/sbin/cluster/clstat, monitors cluster status. The utility reports the status of key cluster components: the cluster itself, the nodes in the cluster, the network interfaces connected to the nodes, and the resource groups on each node. It reports whether the cluster is up, down, or unstable. It also reports whether a node is up, down, joining, leaving, or reconfiguring, and the number of nodes in the cluster. The clstat utility provides ASCII, Motif, X Windows, and HTML interfaces. You can run clstat from either ASCII SMIT or WebSMIT.
For the cluster as a whole, clstat indicates the cluster state and the number of cluster nodes. For each node, clstat displays the IP label and address of each service network interface attached to the node, and whether that interface is up or down. clstat also displays resource group state.
You can view cluster status information in ASCII or X Window display mode or through a web browser.
Note: The clstat utility uses the Clinfo API to retrieve information about the cluster. Therefore, ensure Clinfo is running on the client system to view the clstat display.
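For instance, you can run the ASCII version of clstat directly on a node or client where Clinfo is running. The -a flag for forcing ASCII output is commonly available, but verify the supported flags for your release:

    # Display cluster, node, interface, and resource group status
    /usr/es/sbin/cluster/clstat           # uses the best available display mode
    /usr/es/sbin/cluster/clstat -a        # force the ASCII (terminal) display; flag assumed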
HAView Cluster Monitoring Utility
The HAView utility extends Tivoli NetView services so you can monitor HACMP clusters and cluster components across a network from a single node. Using HAView, you can also view the full cluster event history in the /usr/es/sbin/cluster/history/cluster.mmddyyyy file.
The HAView cluster monitoring utility makes use of the Tivoli TME 10 NetView for AIX graphical interface to provide a set of visual maps and submaps of HACMP clusters. HAView extends NetView services to allow you to monitor HACMP clusters and cluster components across a network from a single node. HAView creates symbols that reflect the state of all nodes, networks, and network interface objects associated in a cluster. You can also monitor resource groups and their resources through HAView.
HAView monitors cluster status using the Simple Network Management Protocol (SNMP). It combines periodic polling and event notification through traps to retrieve cluster topology and state changes from the HACMP Management Information Base (MIB). The MIB is maintained by the Cluster Manager, the HACMP management agent. HAView allows you to:
- View maps and submaps of cluster symbols showing the location and status of nodes, networks, and addresses, and monitor resource groups and resources.
- View detailed information in NetView dialog boxes about a cluster, network, IP address, and cluster events.
- View cluster event history using the HACMP Event Browser.
- View node event history using the Cluster Event Log.
- Open a SMIT HACMP session for an active node and perform cluster administration functions from within HAView, using the HAView Cluster Administration facility.

Cluster Monitoring and Administration with Tivoli Framework
The Tivoli Framework enterprise management system enables you to monitor the state of an HACMP cluster and its components and perform cluster administration tasks. Using various windows of the Tivoli Desktop, you can monitor the following aspects of your cluster:
- Cluster state and substate
- Configured networks and network state
- Participating nodes and node state
- Configured resource groups and resource group state
- Resource group location.

In addition, you can perform the following cluster administration tasks through Tivoli:
- Start cluster services on specified nodes.
- Stop cluster services on specified nodes.
- Bring a resource group online.
- Bring a resource group offline.
- Move a resource group to another node.

For complete information about installing, configuring, and using the cluster monitoring through Tivoli functionality, see the Administration Guide.
Application Availability Analysis Tool
The Application Availability Analysis tool measures uptime statistics for applications with application servers defined to HACMP. The HACMP software collects, time-stamps, and logs extensive information about the applications you choose to monitor with this tool. Using SMIT, you can select a time period and the tool displays uptime and downtime statistics for a given application during that period.
Persistent Node IP Labels
A persistent node IP label is a useful administrative “tool” that lets you contact a node even if the HACMP cluster services are down on that node. (In this case, HACMP attempts to put an IP address on the node). Assigning a persistent node IP label to a network on a node allows you to have a node-bound IP address on a cluster network that you can use for administrative purposes to access a specific node in the cluster.
A persistent node IP label is an IP alias that can be assigned to a specific node on a cluster network and that:
- Always stays on the same node (is node-bound)
- Co-exists on a network interface card that already has a service IP label defined
- Does not require installing an additional physical network interface card on that node
- Is not part of any resource group.

There can be one persistent node IP label per network per node.
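Because a persistent node IP label is implemented as an IP alias, you can see it alongside the base and service addresses with standard AIX commands; en0 below is just an example interface name:

    # List all addresses, including aliases, configured on the node's interfaces
    netstat -in

    # Show the aliases present on a specific interface, for example en0
    ifconfig en0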
HACMP Verification and Synchronization
The HACMP verification and synchronization process verifies that HACMP-specific modifications to AIX 5L system files are correct, that the cluster and its resources are configured correctly, that security (if set up) is configured correctly, that all nodes agree on the cluster topology, network configuration, and the ownership and takeover of HACMP resources, among other things. Verification also indicates whether custom cluster snapshot methods exist and whether they are executable on each cluster node.
Whenever you have configured, reconfigured, or updated a cluster, you should then run the cluster verification procedure. If the verification succeeds, the configuration is automatically synchronized. Synchronization takes effect immediately on an active cluster.
The verification utility keeps a detailed record of the information in the HACMP configuration database on each of the nodes after it runs. Subdirectories for each node contain information for the last successful verification (pass), the next-to-last successful verification (pass.prev), and the last unsuccessful verification (fail).
Messages output by the utility indicate where the error occurred (for example, the node, device, command, and so forth).
Verification with Automatic Cluster Configuration Monitoring
HACMP 5.4 provides automatic cluster configuration monitoring. By default, HACMP automatically runs verification on the node that is first in alphabetical order once every 24 hours at midnight. The cluster administrator is notified if the cluster configuration has become invalid.
When cluster verification completes on the selected cluster node, this node notifies the other cluster nodes. Every node stores the information about the date, time, which node performed the verification, and the results of the verification in the /var/hacmp/log/clutils.log file. If the selected node becomes unavailable or cannot complete cluster verification, you can detect this by the lack of a report in the /var/hacmp/log/clutils.log file.
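For example, to confirm that the nightly automatic verification ran and to see its result, you can inspect the log named above on any node; the grep keyword is only a suggestion, since message formats vary between releases:

    # Show the most recent automatic cluster verification entries
    tail -n 50 /var/hacmp/log/clutils.log

    # Or search for verification results (keyword assumed)
    grep -i verification /var/hacmp/log/clutils.log | tail -n 5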
If cluster verification completes and detects some configuration errors, you are notified about the potential problems:
- The exit status of verification is published across the cluster along with the information about cluster verification process completion.
- Broadcast messages are sent across the cluster and displayed on stdout. These messages inform you about detected configuration errors.
- A general_notification event runs on the cluster and is logged in hacmp.out (if cluster services is running).

Verification with Corrective Actions
Cluster verification consists of a series of checks performed against various user-configured HACMP server components. Each check attempts to detect either a cluster consistency issue or a configuration error. Some error conditions result when information important to the operation of HACMP, but not part of the HACMP software itself, is not propagated properly to all cluster nodes.
By default, verification runs with the automatic corrective actions mode enabled for both the Standard and Extended configuration. This is the recommended mode for running verification. If necessary, the automatic corrective actions mode can be disabled for the Extended configuration. However, note that running verification in automatic corrective action mode enables you to automate many configuration tasks, such as creating a client-based clhosts file, which is used by many of the monitors described in this chapter. For more information about both the client-based and server-based clhosts file, see the section Understanding the clhosts File.
When verification detects any of the following conditions, you can authorize a corrective action before error checking continues:
- HACMP shared volume group time stamps do not match on all nodes.
- The /etc/hosts file on a node does not contain all HACMP-managed labels/IP addresses.
- SSA concurrent volume groups need SSA node numbers.
- A filesystem is not created on a node that is part of the resource group, although disks are available.
- Disks are available, but a volume group has not been imported to a node.
- Required /etc/services entries are missing on a node.
- Required HACMP snmpd entries are missing on a node.

If an error found during verification triggers any corrective actions, then the utility runs all checks again after it finishes the first pass. If the same check fails again and the original problem is an error, the error is logged and verification fails. If the original condition is a warning, verification succeeds.
Custom Verification Methods
Through SMIT you also can add, change, or remove custom-defined verification methods that perform specific checks on your cluster configuration.
You can perform verification from the command line or through the SMIT interface to issue a customized remote notification method in response to a cluster event.
Understanding the clhosts File
Many of the monitors described in this section, including Clinfo, HAView, and clstat, rely on the use of a clhosts file. The clhosts file contains IP address information that helps enable communications among HACMP cluster nodes. The clhosts file resides on all HACMP cluster servers and clients. There are differences, depending on where the file resides, as described below.
When a monitor daemon starts up, it reads in the local /usr/es/sbin/cluster/etc/clhosts file to determine which nodes are available for communication as follows:
- For daemons running on an HACMP server node, the local server-based clhosts file only requires the loopback address (127.0.0.1), which is automatically added to the server-based clhosts file when the server portion of HACMP is installed.
- For daemons running on an HACMP client node, the local client-based clhosts file should contain a list of the IP addresses for the HACMP server nodes. In this way, if a particular HACMP server node is unavailable (for example, powered down), the daemon on the client node can still communicate with other HACMP server nodes.

The HACMP verification utility assists in populating the client-based clhosts file in the following manner:
When you run cluster verification with automatic corrective actions enabled, HACMP finds all available HACMP server nodes, creates a /usr/es/sbin/cluster/etc/clhosts.client file on the server nodes, and populates the file with the IP addresses of those HACMP server nodes.
After you finish verifying and synchronizing HACMP on your cluster, you must manually copy this clhosts.client file to each client node as /usr/es/sbin/cluster/etc/clhosts (rename it by removing the .client extension).
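A sketch of that final copy step, assuming remote copy (scp or rcp) is available between the server node and the client; clientA is a placeholder host name:

    # On an HACMP server node, after verification with corrective actions:
    ls -l /usr/es/sbin/cluster/etc/clhosts.client

    # Copy it to each client node and drop the .client extension
    scp /usr/es/sbin/cluster/etc/clhosts.client \
        clientA:/usr/es/sbin/cluster/etc/clhosts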
For more information about verification, see HACMP Verification and Synchronization in this chapter.
Troubleshooting Tools
Typically, a functioning HACMP cluster requires minimal intervention. If a problem occurs, however, diagnostic and recovery skills are essential. Thus, troubleshooting requires that you identify the problem quickly and apply your understanding of the HACMP software to restore the cluster to full operation.
HACMP supplies the troubleshooting tools described in the following sections. For more detailed information on each of these utilities, see the Administration Guide. For testing and emulation tools, see Cluster Test Tool and Emulation Tools later in this chapter.
Log Files
The HACMP software writes the messages it generates to the system console and to several log files. Because each log file contains a different level of detail, system administrators can focus on different aspects of HACMP processing by viewing different log files. The main log files include:
- The /usr/es/adm/cluster.log file tracks cluster events.
- The /tmp/hacmp.out file records the output generated by configuration scripts as they execute. Event summaries appear after the verbose output for events initiated by the Cluster Manager, making it easier to scan the hacmp.out file for important information. In addition, event summaries provide HTML links to the corresponding events within the hacmp.out file.
- The /usr/es/sbin/cluster/history/cluster.mmddyyyy log file logs the daily cluster history.
- The /var/hacmp/clverify/clverify.log file contains the verbose messages output during verification. Cluster verification consists of a series of checks performed against various HACMP configurations. Each check attempts to detect either a cluster consistency issue or an error. The messages output by the verification utility indicate where the error occurred (for example, the node, device, command, and so forth).

HACMP lets you view, redirect, save and change parameters of the log files, so you can tailor them to your particular needs.
You can also collect log files for problem reporting. For more information, see the Administration Guide.
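As an example, the two log files you consult most often can be read directly with standard AIX commands; the EVENT keyword in the grep is typical of hacmp.out output but treat the exact wording as an assumption:

    # Follow event script output while a cluster event is processed
    tail -f /tmp/hacmp.out

    # Review the high-level cluster event messages
    more /usr/es/adm/cluster.log

    # Scan for event markers in hacmp.out (keyword assumed)
    grep -i "EVENT" /tmp/hacmp.out | tail -n 20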
Resetting HACMP Tunable Values
While configuring and testing a cluster, you may change a value for one of the HACMP tunable values that affects the cluster performance. Or, you may want to reset tunable values to their default settings without changing any other aspects of the configuration. A third-party cluster administrator or a consultant may be asked to take over the administration of a cluster that they did not configure and may need to reset the tunable values to their defaults.
You can reset cluster tunable values using the SMIT interface. Prior to resetting, HACMP takes a cluster snapshot. After the values have been reset to defaults, if you want to return to your customized cluster settings, you can apply that cluster snapshot.
Resetting the cluster tunable values resets information in the cluster configuration database. The information that is reset or removed comprises two categories:
- Information supplied by the users (for example, pre- and post-event scripts and network parameters, such as netmasks). Note that resetting cluster tunable values does not remove the pre- and post-event scripts that you already have configured. However, if you reset the tunable values, HACMP’s knowledge of pre- and post-event scripts is removed from the configuration, and these scripts are no longer used by HACMP to manage resources in your cluster. You can reconfigure HACMP to use these scripts again, if needed.
- Information automatically generated by HACMP during configuration and synchronization. This includes node and network IDs, and information discovered from the operating system, such as netmasks. Typically, users cannot see generated information.

For a complete list of tunable values that you can restore to their default settings, see the Installation Guide.
For instructions on how to reset the tunable values using SMIT, see the Administration Guide.
Cluster Status Information File
When you use the HACMP Cluster Snapshot utility to save a record of a cluster configuration (as seen from each cluster node), you can optionally have the utility run many standard AIX commands and HACMP commands to obtain status information about the cluster. This information is stored in a file, identified by the .info extension, in the snapshots directory. The snapshots directory is defined by the value of the SNAPSHOTPATH environment variable. By default, the cluster snapshot utility includes the output from commands such as cllsif, cllsnw, df, ls, and netstat. You can create custom snapshot methods to specify additional information you would like stored in the .info file.
When creating a cluster snapshot, you can skip saving the cluster log files. Cluster snapshots record the cluster configuration information, whereas cluster logs only record the operation of the cluster and not the configuration information. By default, HACMP no longer collects cluster log files when you create the cluster snapshot, although you can still specify collecting the logs in SMIT. Skipping the logs collection reduces the size of the snapshot and speeds up running the snapshot utility. The size of the cluster snapshot depends on the configuration. For instance, a basic two-node configuration requires roughly 40KB.
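To locate the .info files produced by the snapshot utility, list the snapshot directory. The default path shown below is the usual location when SNAPSHOTPATH is not set, but confirm it on your system:

    # SNAPSHOTPATH overrides the default snapshot directory (default path assumed)
    echo ${SNAPSHOTPATH:-/usr/es/sbin/cluster/snapshots}

    # List saved snapshots and their associated status information files
    ls -l ${SNAPSHOTPATH:-/usr/es/sbin/cluster/snapshots}/*.info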
Automatic Error Notification
You can use the AIX Error Notification facility to detect events not specifically monitored by the HACMP software—a disk adapter failure, for example—and specify a response to take place if the event occurs.
Normally, you define error notification methods manually, one by one. HACMP provides a set of pre-specified notification methods for important errors that you can automatically “turn on” in one step through the SMIT interface, saving considerable time and effort by not having to define each notification method manually.
Custom Remote Notification
You can define a notification method through the SMIT interface to issue a customized notification method in response to a cluster event. In HACMP 5.4, you can also send text messaging notification to any address including a cell phone, or mail to an email address.
After configuring a remote notification method, you can send a test message to confirm that the configuration is correct.
You can configure any number of notification methods, for different events and with different text messages and telephone numbers to dial. The same notification method can be used for several different events, as long as the associated text message conveys enough information to respond to all of the possible events that trigger the notification.
User-Defined Events
You can define your own events for which HACMP can run your specified recovery programs. This adds a new dimension to the predefined HACMP pre- and post-event script customization facility.
Note: HACMP 5.2 and up interact with the RSCT Resource Monitoring and Control (RMC) subsystem instead of with the RSCT Event Management subsystem. (The Event Management subsystem continues to be used for interaction with Oracle 9i). Only a subset of Event Management user-defined event definitions is automatically converted to the corresponding RMC event definitions, upon migration to HACMP 5.2 and up. After migration is complete, all user-defined event definitions must be manually reconfigured with the exception of seven UDE definitions defined by DB2. For more information, see the Administration Guide.
You specify the mapping between events that you define and recovery programs defining the event recovery actions through the SMIT interface. This lets you control both the scope of each recovery action and the number of event steps synchronized across all nodes. For details about registering events, see the RSCT documentation.
You must put all the specified recovery programs on all nodes in the cluster, and make sure they are executable, before starting the Cluster Manager on any node.
You can define events based on the following RMC resource monitors:

- AIX resource monitor. This monitor generates events for OS-related occurrences, such as the percentage of CPU that is idle or the percentage of disk space in use. The attribute names start with IBM.Host., IBM.Processor., and IBM.PhysicalVolume.
- Program resource monitor. This monitor generates events for process-related occurrences, such as the unexpected termination of a process. It uses the resource attribute IBM.Program.ProgramName.

Event Preambles and Summaries
Details of cluster events are recorded in the hacmp.out file. The verbose output of this file contains many lines of event information; you see a concise summary at the end of each event’s details. For a quick and efficient check of what has happened in the cluster lately, you can view a compilation of only the event summary portions of current and previous hacmp.out files, by using the Display Event Summaries panel in SMIT. You can also select to save the compiled event summaries to a file of your choice. Optionally, event summaries provide HTML links to the corresponding events in the hacmp.out file.
The Cluster Manager also prints out a preamble that tells you which resource groups are enqueued for processing for each event; you can see the processing order that will be followed.
For details on viewing event preambles and summaries, see the Troubleshooting Guide.
Trace Facility
If the log files have no relevant information and the component-by-component investigation does not yield concrete results, you may need to use the HACMP trace facility to attempt to diagnose the problem. The trace facility provides a detailed look at selected system events.
Note that both the HACMP and AIX software must be running in order to use HACMP tracing.
For details on using the trace facility, see the Troubleshooting Guide.
Cluster Test Tool
The Cluster Test Tool is a utility that lets you test an HACMP cluster configuration to evaluate how a cluster behaves under a set of specified circumstances, such as when a node becomes inaccessible, a network becomes inaccessible, a resource group moves from one node to another, and so forth. You can start the test, let it run unattended, and return later to evaluate the results of your testing.
If you want to run an automated suite of basic cluster tests for topology and resource group management, you can run the automated test suite from SMIT. If you are an experienced HACMP administrator and want to tailor cluster testing to your environment, you can also create custom tests that can be run from SMIT.
It is recommended to run the tool after you initially configure HACMP and before you put your cluster into a production environment; after you make cluster configuration changes while the cluster is out of service; or at regular intervals even though the cluster appears to be functioning well.
Emulation Tools
HACMP includes the Event Emulator for running cluster event emulations and the Error Emulation functionality for testing notification methods.
HACMP Event Emulator
The HACMP Event Emulator is a utility that emulates cluster events and dynamic reconfiguration events by running event scripts that produce output but do not affect the cluster configuration or status. Emulation allows you to predict a cluster’s reaction to a particular event just as though the event actually occurred.
The Event Emulator follows the same procedure used by the Cluster Manager given a particular event, but does not execute any commands that would change the status of the Cluster Manager. For descriptions of cluster events and how the Cluster Manager processes these events, see the Administration Guide.
You can run the Event Emulator through SMIT or from the command line. The Event Emulator runs the event scripts on every active node of a stable cluster, regardless of the cluster’s size. The output from each node is stored in an output file on the node from which the Event Emulator is invoked. You can specify the name and location of the output file using the environment variable EMUL_OUTPUT, or use the default output file, /tmp/emuhacmp.out.
Note: The Event Emulator requires that both the Cluster Manager and the Cluster Information Program (clinfo) be running on your cluster.
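For example, to send the emulation output to a file of your choosing before starting an emulation from SMIT or the command line:

    # Redirect Event Emulator output; otherwise it goes to /tmp/emuhacmp.out
    export EMUL_OUTPUT=/tmp/emu_node_down.out

    # ...run the emulation, then review the result
    more $EMUL_OUTPUT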
The events emulated are categorized in two groups:
- Cluster events
- Dynamic reconfiguration events.

Emulating Cluster Events
The cluster events that can be emulated are:
Emulating Dynamic Reconfiguration Events
The dynamic reconfiguration event that can be emulated is Synchronize the HACMP Cluster.
Restrictions on Event Emulation
Note: If any of the following restrictions prevent you from using event emulation in your cluster, you can use the Cluster Test Tool as an alternative to executing cluster events in emulation mode. The Cluster Test Tool performs real cluster events and pre- and post-event customizations.
The Event Emulator has the following restrictions:
- You can run only one instance of the event emulator at a time. If you attempt to start a new emulation in a cluster while an emulation is already running, the integrity of the results cannot be guaranteed.
- clinfo must be running.
- You cannot run successive emulations. Each emulation is a standalone process; one emulation cannot be based on the results of a previous emulation.
- When you run an event emulation, the Emulator’s outcome may be different from the Cluster Manager’s reaction to the same event under certain conditions:
  - The Event Emulator will not change the configuration of a cluster device. Therefore, if your configuration contains a process that makes changes to the Cluster Manager (disk fencing, for example), the Event Emulator will not show these changes. This could lead to a different output, especially if the hardware devices cause a fallover.
  - The Event Emulator runs customized scripts (pre- and post-event scripts) associated with an event, but does not execute commands within these scripts. Therefore, if these customized scripts change the cluster configuration when actually run, the outcome may differ from the outcome of an emulation.
  - When emulating an event that contains a customized script, the Event Emulator uses the ksh flags -n and -v. The -n flag reads commands and checks them for syntax errors, but does not execute them. The -v flag indicates verbose mode. When writing customized scripts that may be accessed during an emulation, be aware that the other ksh flags may not be compatible with the -n flag and may cause unpredictable results during the emulation. See the ksh man page for flag descriptions.
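Because the emulator reads customized scripts with ksh -n -v rather than executing them, you can run the same check yourself on a pre- or post-event script before emulating; the script path below is a placeholder:

    # Syntax-check a custom pre-event script the way the Event Emulator reads it:
    # -n reads commands without executing them, -v echoes each line as it is read
    ksh -n -v /usr/local/cluster/events/pre_node_down.sh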
Emulation of Error Log Driven Events

Although the HACMP software does not monitor the status of disk resources, it does provide a SMIT interface to the AIX Error Notification facility.
HACMP uses the following utilities for monitoring purposes:
- RSCT
- AIX Error Notification
- RMC
- User-defined events
- Application monitoring.

The AIX Error Notification facility allows you to detect an event not specifically monitored by the HACMP software—a disk adapter failure, for example—and to program a response (notification method) to the event. In addition, if you add a volume group to a resource group, HACMP automatically creates an AIX Error Notification method for it. In the case where the loss of quorum error occurs for a mirrored volume group, HACMP uses this method to selectively move the affected resource group to another node. Do not edit or alter the error notification methods that are generated by HACMP.
HACMP provides a utility for testing your error notification methods. After you add one or more error notification methods with the AIX Error Notification facility, you can test your methods by emulating an error. By inserting an error into the AIX error device file (/dev/error), you cause the AIX error daemon to run the appropriate pre-specified notification method. This allows you to determine whether your pre-defined action is carried through, without having to actually cause the error to occur.
When the emulation is complete, you can view the error log by typing the errpt command to be sure the emulation took place. The error log entry has either the resource name EMULATOR, or a name as specified by the user in the Resource Name field during the process of creating an error notification object.
You will then be able to determine whether the specified notification method was carried out.
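After running an error emulation, a quick check of the AIX error log confirms that the emulated entry was recorded. EMULATOR is the default resource name mentioned above; adjust the filter if you supplied your own resource name:

    # Summary view of recent error log entries
    errpt | head

    # Detailed view of only the emulated entries (default resource name EMULATOR)
    errpt -a -N EMULATOR | more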