Chapter 2: Using Cluster Log Files
This chapter explains how to use the HACMP cluster log files to troubleshoot the cluster. It also includes sections on managing parameters for some of the logs.
Major sections of the chapter include:
- Viewing HACMP Cluster Log Files
- Collecting Cluster Log Files for Problem Reporting
- Tracking Resource Group Parallel and Serial Processing in the hacmp.out File
- Managing a Node's HACMP Log File Parameters
Viewing HACMP Cluster Log Files
Your first approach to diagnosing a problem affecting your cluster should be to examine the cluster log files for messages output by the HACMP subsystems. These messages provide valuable information for understanding the current state of the cluster. The following sections describe the types of messages output by the HACMP software and the log files into which the system writes these messages.
For most troubleshooting, the /tmp/hacmp.out file will be the most helpful log file. Resource group handling has been enhanced in recent releases and the hacmp.out file has been expanded to capture more information on the activity and location of resource groups after cluster events. For instance, the hacmp.out file captures details of resource group parallel processing that other logs (such as the cluster history log) cannot report. The event summaries included in this log make it easier to see quickly what events have occurred recently in the cluster.
Reviewing Cluster Message Log Files
The HACMP software writes the messages it generates to the system console and to several log files. Each log file contains a different subset of messages generated by the HACMP software. When viewed as a group, the log files provide a detailed view of all cluster activity.
The following list describes the log files into which the HACMP software writes messages and the types of cluster messages they contain. The list also provides recommendations for using the different log files. Note that the default log directories are listed here; you have the option of redirecting some log files to a chosen directory. For more information about how to redirect cluster log files see the section Steps for Redirecting a Cluster Log File. If you have redirected any logs, check the appropriate location.
/tmp/clinfo.debug
The /tmp/clinfo.debug file records the output generated by the event scripts as they run. This information supplements and expands upon the information in the /usr/var/hacmp/log file. You can install Client Information (Clinfo) services on both client and server systems; client systems (cluster.es.client) will not have any HACMP ODMs (for example, HACMPlogs) or utilities (for example, clcycle), so Clinfo logging will not take advantage of cycling or redirection. The default debug level is 0, or "off". You can enable logging using command line flags; use the clinfo -l flag to change the log file name.

/tmp/clstrmgr.debug
Contains time-stamped, formatted messages generated by the clstrmgrES daemon. The default messages are verbose and are typically adequate for troubleshooting most problems; however, IBM support may direct you to enable additional debugging.
Recommended Use: Information in this file is for IBM Support personnel.

/tmp/cspoc.log
Contains time-stamped, formatted messages generated by HACMP C-SPOC commands. The /tmp/cspoc.log file resides on the node that invokes the C-SPOC command.
Recommended Use: Use the C-SPOC log file when tracing a C-SPOC command's execution on cluster nodes.

/tmp/emuhacmp.out
Contains time-stamped, formatted messages generated by the HACMP Event Emulator. The messages are collected from output files on each node of the cluster and cataloged together into the /tmp/emuhacmp.out log file. In verbose mode (recommended), this log file contains a line-by-line record of every event emulated. Customized scripts within the event are displayed, but commands within those scripts are not executed.

/tmp/hacmp.out
Contains time-stamped, formatted messages generated by HACMP scripts on the current day. In verbose mode (recommended), this log file contains a line-by-line record of every command executed by scripts, including the values of all arguments to each command. An event summary of each high-level event is included at the end of each event's details. For information about viewing this log and interpreting its messages, see the section Understanding the hacmp.out Log File.
Recommended Use: Because the information in this log file supplements and expands upon the information in the /usr/es/adm/cluster.log file, it is the primary source of information when investigating a problem.
Note: With recent changes in the way resource groups are handled and prioritized in fallover circumstances, the hacmp.out file and its event summaries have become even more important in tracking the activity and resulting location of your resource groups. In HACMP releases prior to 5.2, non-recoverable event script failures result in the event_error event being run on the cluster node where the failure occurred, and the remaining cluster nodes do not indicate the failure. With HACMP 5.2 and up, all cluster nodes run the event_error event if any node has a fatal error; all nodes log the error and call out the failing node name in the hacmp.out log file.

/usr/es/adm/cluster.log
Contains time-stamped, formatted messages generated by HACMP scripts and daemons. For information about viewing this log file and interpreting its messages, see the following section Understanding the cluster.log File.
Recommended Use: Because this log file provides a high-level view of current cluster status, check this file first when diagnosing a cluster problem.

/usr/es/sbin/cluster/history/cluster.mmddyyyy
Contains time-stamped, formatted messages generated by HACMP scripts. The system creates a cluster history file every day, identifying each file by its file name extension, where mm indicates the month, dd the day, and yyyy the year. For information about viewing this log file and interpreting its messages, see the section Understanding the Cluster History Log File.
Recommended Use: Use the cluster history log files to get an extended view of cluster behavior over time. Note that this log is not a good tool for tracking resource groups processed in parallel: certain steps formerly run as separate events are processed differently in parallel processing and will not be evident in the cluster history log. Use the hacmp.out file to track parallel processing activity.

/usr/es/sbin/cluster/snapshots/clsnapshot.log
Contains logging information from the snapshot utility of HACMP, and information about errors found and/or actions taken by HACMP for resetting cluster tunable values.

/var/adm/clavan.log
Contains the state transitions of applications managed by HACMP; for example, when each application managed by HACMP is started or stopped, and when the node on which an application is running stops. Each node has its own instance of the file. Each record in the clavan.log file consists of a single line containing a fixed portion and a variable portion.
Recommended Use: By collecting the records in the clavan.log file from every node in the cluster, a utility program can determine how long each application has been up, as well as compute other statistics describing application availability time.

/var/ha/log/grpglsm
Contains time-stamped messages in ASCII format. These track the execution of internal activities of the RSCT Group Services Globalized Switch Membership daemon. IBM support personnel use this information for troubleshooting. The file gets trimmed regularly; save it promptly if there is a chance you may need it.

/var/ha/log/grpsvcs
Contains time-stamped messages in ASCII format. These track the execution of internal activities of the RSCT Group Services daemon. IBM support personnel use this information for troubleshooting. The file gets trimmed regularly; save it promptly if there is a chance you may need it.

/var/ha/log/topsvcs
Contains time-stamped messages in ASCII format. These track the execution of internal activities of the RSCT Topology Services daemon. IBM support personnel use this information for troubleshooting. The file gets trimmed regularly; save it promptly if there is a chance you may need it.

/var/hacmp/clcomd/clcomddiag.log
Contains time-stamped, formatted, diagnostic messages generated by clcomd.
Recommended Use: Information in this file is for IBM Support personnel.

/var/hacmp/clcomd/clcomd.log
Contains time-stamped, formatted messages generated by Cluster Communications daemon (clcomd) activity. The log shows information about incoming and outgoing connections, both successful and unsuccessful. It also displays a warning if the file permissions for /usr/es/sbin/cluster/etc/rhosts are not set correctly; users on the system should not be able to write to the file.
Recommended Use: Use information in this file to troubleshoot inter-node communications, and to obtain information about attempted connections to the daemon (and therefore to HACMP).

/var/hacmp/clverify/clverify.log
Contains the verbose messages output by the cluster verification utility. The messages indicate the node(s), devices, command, and so on, in which any verification error occurred. For complete information, see Chapter 7: Verifying and Synchronizing a Cluster Configuration in the Administration Guide.

/var/hacmp/log/clutils.log
Contains information about the date, time, results, and which node performed an automatic cluster configuration verification. It also contains information for the file collection utility, the Two-Node Cluster Configuration Assistant, the Cluster Test Tool, and the OLPW conversion tool.

/var/hacmp/utilities/cl_configassist.log
Contains debugging information for the Two-Node Cluster Configuration Assistant. The Assistant stores up to ten copies of the numbered log files to assist with troubleshooting activities.

/var/hacmp/utilities/cl_testtool.log
Includes excerpts from the hacmp.out file. The Cluster Test Tool saves up to three log files and numbers them so that you can compare the results of different cluster tests; the tool rotates the files, with the oldest file being overwritten.

system error log
Contains time-stamped, formatted messages from all AIX 5L subsystems, including scripts and daemons. For information about viewing this log file and interpreting the messages it contains, see the section Understanding the System Error Log.
Recommended Use: Because the system error log contains time-stamped messages from many other system components, it is a good place to correlate cluster events with system events.

/tmp/clconvert.log
Contains a record of the conversion progress when upgrading to a recent HACMP release. The installation process runs the cl_convert utility and creates the /tmp/clconvert.log file.
Recommended Use: View the clconvert.log file to gauge conversion success when running cl_convert from the command line. For detailed information on the cl_convert utility, see the chapter on Upgrading an HACMP Cluster in the Installation Guide.

/usr/es/sbin/cluster/wsm/logs/wsm_smit.log
All operations of the WebSMIT interface are logged to the wsm_smit.log file and are equivalent to the logging done with smitty -v. Script commands are also captured in the wsm_smit.script log file. The wsm_smit log files are created by the CGI scripts using a relative path of <../logs>; if you copy the CGI scripts to the default location for the IBM HTTP Server, the final path to the logs is /usr/IBMIHS/logs. The WebSMIT logs are not subject to manipulation (redirection, backup) by HACMP; just like smit.log and smit.script, the files grow indefinitely. The snap -e utility captures the WebSMIT log files in the default location (/usr/es/sbin/cluster/wsm/logs). There is no default logging of the cluster status display, although logging can be enabled through the wsm_clstat.com configuration file.
Understanding the cluster.log File
The /usr/es/adm/cluster.log file is a standard text file. When checking this file, first find the most recent error message associated with your problem. Then read back through the log file to the first message relating to that problem. Many error messages cascade from an initial error that usually indicates the problem source.
Format of Messages in the cluster.log File
The entries in the /usr/es/adm/cluster.log file use the following format: each entry contains a date and time stamp, the name of the node on which the message originated, the name of the HACMP subsystem or script that generated the message, and the message text.
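For illustration only (the exact field layout and message text vary by release, and the process ID shown is hypothetical), an entry of the kind interpreted in the next paragraph might look like this:

Mar 3 17:25:00 nodeA clinfoES[4123]: clinfoES daemon stopped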
The entry in the previous example indicates that the Cluster Information program (clinfoES) stopped running on the node named nodeA at 5:25 P.M. on March 3.
Because the /usr/es/adm/cluster.log file is a standard ASCII text file, you can view it using standard AIX 5L file commands, such as the more or tail commands. However, you can also use the SMIT interface. The following sections describe each of the options.
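For example, to page through the file, or to watch new messages as they are appended:

more /usr/es/adm/cluster.log
tail -f /usr/es/adm/cluster.log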
Viewing the cluster.log File Using SMIT
To view the /usr/es/adm/cluster.log file using SMIT:
1. Enter smit hacmp
2. In SMIT, select Problem Determination Tools > HACMP Log Viewing and Management > View Detailed HACMP Log Files and press Enter.
3. Select Scan the HACMP for AIX System Log and press Enter. This option references the /usr/es/adm/cluster.log file.
Note: You can select to either scan the contents of the /usr/es/adm/cluster.log file as it exists, or you can watch an active log file as new events are appended to it in real time. Typically, you scan the file to try to find a problem that has already occurred; you watch the file as you test a solution to a problem to determine the results.
Understanding the hacmp.out Log File
The /tmp/hacmp.out file is a standard text file. The system cycles the hacmp.out log file seven times; each copy is identified by a number appended to the file name. The most recent log file is named /tmp/hacmp.out; the oldest version of the file is named /tmp/hacmp.out.7.
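For example, to list the current log and its cycled copies:

ls -l /tmp/hacmp.out*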
Given the recent changes in the way resource groups are handled and prioritized in fallover circumstances, the hacmp.out file and its event summaries have become even more important in tracking the activity and resulting location of your resource groups.
You can customize the wait period before a warning message appears. Since this affects how often the config_too_long message is posted to the log, the config_too_long console message may not be evident in every case where a problem exists. See details below in the Config_too_long Message in the hacmp.out File section.
In HACMP releases prior to 5.2, non-recoverable event script failures result in the event_error event being run on the cluster node where the failure occurred. The remaining cluster nodes do not indicate the failure. With HACMP 5.2 and up, all cluster nodes run the event_error event if any node has a fatal error. All nodes log the error and call out the failing node name in the hacmp.out log file.
When checking the /tmp/hacmp.out file, search for EVENT FAILED messages. These messages indicate that a failure has occurred. Then, starting from the failure message, read back through the log file to determine exactly what went wrong. The /tmp/hacmp.out log file provides the most important source of information when investigating a problem.
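For example, to locate failures quickly from the command line:

grep -n "EVENT FAILED" /tmp/hacmp.out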
Note: With HACMP 5.2 and up, EVENT_FAILED_NODE is set to the name of the node where the event failed.
Event Preambles
When an event processes resource groups with dependencies or with HACMP/XD replicated resources, an event preamble is included in the hacmp.out file. This preamble shows you the logic the Cluster Manager will use to process the event in question. See the sample below.
HACMP Event Preamble
------------------------------------------------------------------
Node Down Completion Event has been enqueued.
------------------------------------------------------------------

HACMP Event Preamble
Action: Resource:
------------------------------------------------------------------
Enqueued rg_move acquire event for resource group rg3.
Enqueued rg_move release event for resource group rg3.
Enqueued rg_move secondary acquire event for resource group 'rg1'.
Node Up Completion Event has been enqueued.
------------------------------------------------------------------

Event Summaries
Event summaries that appear at the end of each event’s details make it easier to check the hacmp.out file for errors. The event summaries contain pointers back to the corresponding event, which allow you to easily locate the output for any event. See the section Verbose Output Example with Event Summary for an example of the output.
You can also view a compilation of only the event summary sections pulled from current and past hacmp.out files. The option for this display is found on the Problem Determination Tools > HACMP Log Viewing and Management > View/Save/Remove Event Summaries > View Event Summaries SMIT panel. For more detail, see the section Viewing Compiled hacmp.out Event Summaries later in this chapter.
hacmp.out in HTML Format
You can view the hacmp.out log file in HTML format by setting formatting options on the Problem Determination Tools > HACMP Log Viewing and Management > Change/Show HACMP Log File Parameters SMIT panel. For instructions see the section Setting the Level and Format of Information Recorded in the hacmp.out File.
Resource Group Acquisition Failures and Volume Group Failures in hacmp.out
Reported resource group acquisition failures (failures indicated by a non-zero exit code returned by a command) are tracked in hacmp.out. This information includes:
- The start and stop times for the event
- Which resource groups were affected (acquired or released) as a result of the event
- In the case of a failed event, an indication of which resource action failed

You can track the path the Cluster Manager takes as it tries to keep resources available.
In addition, the automatically configured AIX 5L Error Notification method that runs in the case of a volume group failure writes the following information in the hacmp.out log file:
- The AIX 5L error label and ID for which the method was launched
- The name of the affected resource group
- The name of the node on which the error occurred

Messages for Resource Group Recovery Upon node_up
The hacmp.out file, event summaries, and clstat include information and messages about resource groups in the ERROR state that attempted to get online on a joining node, or on a node that is starting up.
Similarly, you can trace cases in which the acquisition of such a resource group fails and HACMP launches an rg_move event to move the resource group to another node in the nodelist. If, as a result of consecutive rg_move events through the nodes, a non-concurrent resource group still fails to be acquired, HACMP adds a message to the hacmp.out file.
“Standby” Events Reported for Networks Using Aliases
When you add a network interface on a network using aliases, the actual event that runs is called join_interface, and it is reflected as such in the hacmp.out file. However, such networks by definition do not have standby interfaces defined, so the event simply indicates that a network interface has joined the cluster. Similarly, when a network interface fails, the actual event that runs is called fail_interface, which is also reflected in the hacmp.out file; again, the event simply indicates that a network interface on the given network has failed.
Resource Group Processing Messages in the hacmp.out File
The hacmp.out file allows you to fully track how resource groups have been processed in HACMP. This section provides a brief description; for detailed information and examples of event summaries with job types, see the section Tracking Resource Group Parallel and Serial Processing in the hacmp.out File later in this chapter.
For each resource group that has been processed by HACMP, the software sends the following information to the hacmp.out file:
- The resource group name
- The script name
- The name of the command that is being executed

The general pattern of the output is a tag consisting of the resource group name and the script name, followed by the command being executed.
In cases where an event script does not process a specific resource group, for instance, in the beginning of a node_up event, a resource group’s name cannot be obtained. In this case, the resource group’s name part of the tag is blank.
For example, the hacmp.out file may contain either of the following lines:
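cascrg1:process_resources[1476] clRGPA
:process_resources[1476] clRGPA

The first line is tagged with the resource group name cascrg1; in the second, the resource group portion of the tag is blank. (These lines follow the pattern of the excerpts shown later in this chapter; the bracketed script line numbers vary.)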
In addition, references to the individual resources in the event summaries in the hacmp.out file contain reference tags to the associated resource groups. For instance:
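Acquiring resource: hdisk1 cl_disk_available
Search on: Wed.May.8.11:06:40.EDT.2002.cl_disk_available.hdisk1.cascrg1

Here the trailing cascrg1 in the Search on tag identifies the resource group associated with the hdisk1 resource. (This line is taken from the parallel-processing example later in this chapter.)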
Config_too_long Message in the hacmp.out File
You can customize the waiting period before a config_too_long message is sent.
For each cluster event that does not complete within the specified event duration time, config_too_long messages are logged in the hacmp.out file and sent to the console according to the following pattern:
- The first five config_too_long messages appear in the hacmp.out file at 30-second intervals.
- The next set of five messages appears at an interval that is double the previous interval, until the interval reaches one hour.
- These messages are logged every hour until the event completes or is terminated on that node.

For more information on customizing the event duration time before receiving a config_too_long warning message, see the chapter on Planning for Cluster Events in the Planning Guide.
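The posting schedule can be sketched as follows (an illustration only, not HACMP code; it prints the elapsed times at which the messages described above would appear):

#!/bin/ksh
# Illustration of the config_too_long posting schedule described above.
interval=30     # first interval: 30 seconds
elapsed=0
while [ $interval -lt 3600 ]; do
    count=0
    while [ $count -lt 5 ]; do       # five messages per interval
        elapsed=$(( elapsed + interval ))
        echo "config_too_long posted at ${elapsed}s"
        count=$(( count + 1 ))
    done
    interval=$(( interval * 2 ))     # then the interval doubles
done
echo "... posted hourly thereafter, until the event completes"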
Non-Verbose and Verbose Output of the hacmp.out Log File
You can select either verbose or non-verbose output.
Non-Verbose Output
In non-verbose mode, the hacmp.out log contains the start, completion, and error notification messages output by all HACMP scripts. Each entry contains the following information:
Verbose Output
In verbose mode, the hacmp.out file also includes the values of arguments and flag settings passed to the scripts and commands.
Verbose Output Example with Event Summary
Some events (those initiated by the Cluster Manager) are followed by event summaries, as shown in these excerpts:
....
Mar 25 15:20:30 EVENT COMPLETED: network_up alcuin tmssanet_alcuin_bede

HACMP Event Summary
Event: network_up alcuin tmssanet_alcuin_bede
Start time: Tue Mar 25 15:20:30 2003
End time: Tue Mar 25 15:20:30 2003

Action: Resource: Script Name:
------------------------------------------------------------------------
No resources changed as a result of this event
------------------------------------------------------------------------

Event Summary for the Settling Time
CustomRG has a settling time configured. A lower priority node joins the cluster:
Mar 25 15:20:30 EVENT COMPLETED: node_up alcuin

HACMP Event Summary
Event: node_up alcuin
Start time: Tue Mar 25 15:20:30 2003
End time: Tue Mar 25 15:20:30 2003

Action: Resource: Script Name:
----------------------------------------------------------------
No action taken on resource group 'CustomRG'.
The Resource Group 'CustomRG' has been configured to use 20 Seconds Settling Time.
This group will be processed when the timer expires.
----------------------------------------------------------------------

Event Summary for the Fallback Timer
CustomRG has a daily fallback timer configured to fall back at 22 hrs 10 minutes. The resource group is on a lower priority node (bede); therefore, the timer is ticking. The higher priority node (alcuin) joins the cluster:
The message on bede:

...
Mar 25 15:20:30 EVENT COMPLETED: node_up alcuin

HACMP Event Summary
Event: node_up alcuin
Start time: Tue Mar 25 15:20:30 2003
End time: Tue Mar 25 15:20:30 2003

Action: Resource: Script Name:
----------------------------------------------------------------
No action taken on resource group 'CustomRG'.
The Resource Group 'CustomRG' has been configured to fallback on Mon Mar 25 22:10:00 2003
----------------------------------------------------------------------

The message on alcuin:

...
Mar 25 15:20:30 EVENT COMPLETED: node_up alcuin

HACMP Event Summary
Event: node_up alcuin
Start time: Tue Mar 25 15:20:30 2003
End time: Tue Mar 25 15:20:30 2003

Action: Resource: Script Name:
----------------------------------------------------------------
The Resource Group 'CustomRG' has been configured to fallback using daily1 Timer Policy
----------------------------------------------------------------------

Viewing the hacmp.out File Using SMIT
To view the /tmp/hacmp.out file using SMIT:
1. Enter smit hacmp
2. In SMIT, select Problem Determination Tools > HACMP Log Viewing and Management > View Detailed HACMP Log Files and press Enter.
3. On the View Detailed HACMP Log Files menu, you can select to either scan the contents of the /tmp/hacmp.out file or watch as new events are appended to the log file. Typically, you will scan the file to try to find a problem that has already occurred and then watch the file as you test a solution to the problem. In the menu, the /tmp/hacmp.out file is referred to as the HACMP Script Log File.
4. Select Scan the HACMP Script Log File and press Enter.
5. Select a script log file and press Enter.
Setting the Level and Format of Information Recorded in the hacmp.out File
To set the level of information recorded in the /tmp/hacmp.out file:
1. Enter smit hacmp
2. In SMIT, select Problem Determination Tools > HACMP Log Viewing and Management > Change/Show HACMP Log File Parameters.
SMIT prompts you to specify the name of the cluster node you want to modify. Runtime parameters are configured on a per-node basis.
3. Type the node name and press Enter.
4. To obtain verbose output, set the value of the Debug Level field to high.
5. To change the hacmp.out display format, select Formatting options for hacmp.out. Select a node and set the formatting to HTML (Low), HTML (High), Default (None), or Standard.
Note: If you set your formatting options for hacmp.out to Default (None), then no event summaries will be generated. For information about event summaries, see the section Viewing Compiled hacmp.out Event Summaries.
6. To change the level of debug information, set the value of New Cluster Manager debug level field to either Low or High.
Viewing Compiled hacmp.out Event Summaries
In the hacmp.out file, event summaries appear after those events that are initiated by the Cluster Manager, for example, node_up and node_up_complete and related subevents such as node_up_local and node_up_remote_complete. Note that event summaries do not appear for all events; for example, no summary appears when you move a resource group through SMIT.
The View Event Summaries option displays a compilation of all event summaries written to a node’s hacmp.out file. This utility can gather and display this information even if you have redirected the hacmp.out file to a new location. You can also save the event summaries to a file of your choice instead of viewing them via SMIT.
Note: Event summaries pulled from the hacmp.out file are stored in the /usr/es/sbin/cluster/cl_event_summary.txt file. This file continues to accumulate as hacmp.out cycles, and is not automatically truncated or replaced. Consequently, it can grow too large and crowd your /usr directory. You should clear event summaries periodically, using the Remove Event Summary History option in SMIT.
This feature is node-specific. Therefore, you cannot access one node’s event summary information from another node in the cluster. Run the View Event Summaries option on each node for which you want to gather and display event summaries.
The event summaries display is a good way to get a quick overview of what has happened in the cluster lately. If the event summaries reveal a problem event, you will probably want to examine the source hacmp.out file to see full details of what happened.
Note: If you have set your formatting options for hacmp.out to Default (None), then no event summaries will be generated. The View Event Summaries command will yield no results.
How Event Summary View Information Is Gathered
The Problem Determination Tools > HACMP Log Viewing and Management > View Event Summaries option gathers information from the hacmp.out log file, not directly from HACMP while it is running. Consequently, you can access event summary information even when HACMP is not running. The summary display is updated once per day with the current day’s event summaries.
In addition, at the bottom of the display the resource group location and state information is shown. This information reflects output from the clRGinfo command.
Note that clRGinfo displays resource group information more quickly when the cluster is running. If the cluster is not running, wait a few minutes and the resource group information will eventually appear.
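You can also run the clRGinfo utility directly to see resource group location and state; for example (the utility resides in the standard HACMP utilities directory):

/usr/es/sbin/cluster/utilities/clRGinfo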
Viewing Event Summaries
To view a compiled list of event summaries on a node:
1. Enter smit hacmp
2. In SMIT, select View Event Summaries and press Enter. SMIT displays a list of event summaries generated on the node. SMIT will notify you if no event summaries were found.
Saving Event Summaries to a Specified File
To store the compiled list of a node’s event summaries to a file:
1. Enter smit hacmp
2. In SMIT, select View/Save/Remove HACMP Event Summaries.
3. Select Save Event Summaries to a file.
4. Enter the path/file name where you wish to store the event summaries.
Depending on the format you select (for example, .txt or .html), you can then open the file in a text editor or browser.
Removing Event Summaries
When you select the Remove Event Summary History option, HACMP deletes all event summaries compiled from hacmp.out files. A new list is then started.
Note: You should clear the event summary history periodically to keep the /usr/es/sbin/cluster/cl_event_summary.txt file from crowding your /usr directory.
Follow the steps below to delete the list of summaries:
1. Enter smit hacmp
2. In SMIT, select View/Save/Remove HACMP Event Summaries.
3. Select Remove Event Summary History. HACMP deletes all event summaries from the file.
Understanding the System Error Log
The HACMP software logs messages to the system error log whenever a daemon generates a state message.
Format of Messages in the System Error Log
The HACMP messages in the system error log follow the same format used by other AIX 5L subsystems. You can view the messages in the system error log in short or long format.
In short format, also called summary format, each message in the system error log occupies a single line. The summary fields include the error identifier, the time stamp, the error type and class, the resource name, and a short description.
In long format, a page of formatted information is displayed for each error.
Unlike the HACMP log files, the system error log is not a text file.
Using the AIX 5L Error Report Command
The AIX 5L errpt command generates an error report from entries in the system error log. For information on using this command see the errpt man page.
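For example, the following invocations produce the short and long formats described above:

errpt       # summary (short) format, one line per entry
errpt -a    # detailed (long) format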
Viewing the System Error Log Using SMIT
To view the AIX 5L system error log, you must use the AIX 5L SMIT:
1. Enter smit
2. In SMIT, select Problem Determination Tools > HACMP Log Viewing and Management > View Detailed HACMP Log Files > Scan the HACMP for AIX System Log and press Enter.
For more information on this log file, refer to your AIX 5L documentation.
Understanding the Cluster History Log File
The cluster history log file is a standard text file with the system-assigned name /usr/es/sbin/cluster/history/cluster.mmddyyyy, where mm indicates the month, dd indicates the day in the month and yyyy indicates the year. You should decide how many of these log files you want to retain and purge the excess copies on a regular basis to conserve disk storage space. You may also decide to include the cluster history log file in your regular system backup procedures.
Format of Messages in the Cluster History Log File
The fields in the cluster history log file messages are:

Date and Time stamp: The date and time at which the event occurred.
Message: Text of the message.
Description: Name of the event script.
Note: This log reports specific events. Note that when resource groups are processed in parallel, certain steps previously run as separate events are now processed differently, and therefore do not show up as events in the cluster history log file. You should use the hacmp.out file, which contains greater detail on resource group activity and location, to track parallel processing activity.
Viewing the Cluster History Log File
Because the cluster history log file is a standard text file, you can view its contents using standard AIX 5L file commands, such as cat, more, and tail. You cannot view this log file using SMIT.
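For example, to view the history file for March 25, 2003 (an illustrative file name following the mmddyyyy convention):

more /usr/es/sbin/cluster/history/cluster.03252003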
Understanding the Cluster Manager Debug Log File
The /tmp/clstrmgr.debug file is a standard text file that contains the debug messages generated by the Cluster Manager. IBM Support uses this file. In terse mode, the default debug levels are recorded. In verbose mode, all debug levels are recorded.
Format of Messages in the Cluster Manager Debug Log File
The clstrmgr.debug log file contains time-stamped, formatted messages generated by HACMP clstrmgrES activity.
Viewing the Cluster Manager Debug Log File
Because the clstrmgr.debug log file is a standard text file, you can view its contents using standard AIX 5L file commands, such as cat, more, and tail. You cannot view this log file using SMIT.
Understanding the cspoc.log File
The /tmp/cspoc.log file is a standard text file that resides on the source node—the node on which the C-SPOC command is invoked. Many error messages cascade from an underlying AIX 5L error that usually indicates the problem source and success or failure status.
Format of Messages in the cspoc.log File
Each /tmp/cspoc.log entry contains a command delimiter to separate C-SPOC command output. The first line of the command’s output, which contains arguments (parameters) passed to the command, follows this delimiter. Additionally, each entry contains the following information:
Viewing the cspoc.log File
The /tmp/cspoc.log file is a standard text file that can be viewed in any of the following ways:
- Using standard AIX 5L file commands, such as the more or tail commands
- Using the SMIT interface

Using the SMIT Interface to View the cspoc.log File
To view the /tmp/cspoc.log file using SMIT:
1. Enter smit hacmp
2. In SMIT, select Problem Determination Tools > HACMP Log Viewing and Management > View Detailed HACMP Log Files > Scan the C-SPOC System Log File.
Note: You can select to either scan the contents of the /tmp/cspoc.log file as it exists, or watch an active log file as new events are appended to it in real time. Typically, you scan the file to find a problem that has already occurred; you watch the file while duplicating a problem to help determine its cause, or as you test a solution to a problem to determine the results.
Understanding the emuhacmp.out File
The /tmp/emuhacmp.out file is a standard text file that resides on the node from which the HACMP Event Emulator was invoked. The file contains information from log files generated by the Event Emulator on all nodes in the cluster. When the emulation is complete, the information in these files is transferred to the /tmp/emuhacmp.out file on the node from which the emulation was invoked, and all other files are deleted.
Using the EMUL_OUTPUT environment variable, you can specify another name and location for this output file. The format of the file does not change.
Format of Messages in the emuhacmp.out File
The entries in the /tmp/emuhacmp.out file use the following format:
**********************************************************************
******************START OF EMULATION FOR NODE buzzcut****************
**********************************************************************

Jul 21 17:17:21 EVENT START: node_down buzzcut graceful

+ [ buzzcut = buzzcut -a graceful = forced ]
+ [ EMUL = EMUL ]
+ cl_echo 3020 NOTICE >>>> The following command was not executed <<<< \n
NOTICE >>>> The following command was not executed <<<<
+ echo /usr/es/sbin/cluster/events/utils/cl_ssa_fence down buzzcut\n
/usr/es/sbin/cluster/events/utils/cl_ssa_fence down buzzcut
+ [ 0 -ne 0 ]
+ [ EMUL = EMUL ]
+ cl_echo 3020 NOTICE >>>> The following command was not executed <<<< \n
NOTICE >>>> The following command was not executed <<<<
+ echo /usr/es/sbin/cluster/events/utils/cl_ssa_fence down buzzcut graceful\n
/usr/es/sbin/cluster/events/utils/cl_ssa_fence down buzzcut graceful

**************** END OF EMULATION FOR NODE BUZZCUT *********************

The output of emulated events is presented as in the /tmp/hacmp.out file described earlier in this chapter. The /tmp/emuhacmp.out file also contains notices, as shown above, identifying commands that the emulation did not actually execute.
Viewing the /tmp/emuhacmp.out File
You can view the /tmp/emuhacmp.out file using standard AIX 5L file commands. You cannot view this log file using the SMIT interface.
Collecting Cluster Log Files for Problem Reporting
If you encounter a problem with HACMP and report it to IBM support, you may be asked to collect log files pertaining to the problem. In HACMP 5.2 and up, the Collect HACMP Log Files for Problem Reporting SMIT panel aids in this process.
Warning: Use this panel only if requested by the IBM support personnel. If you use this utility without direction from IBM support, be careful to fully understand the actions and the potential consequences.
To collect cluster log files for problem reporting:
1. Enter smit hacmp
2. In SMIT, select Problem Determination Tools > HACMP Log Viewing and Management > Collect Log Files for Problem Reporting.
3. Type or select values in entry fields:
Tracking Resource Group Parallel and Serial Processing in the hacmp.out File
Output to the hacmp.out file lets you isolate details related to a specific resource group and its resources. Based on the content of the hacmp.out event summaries, you can determine whether or not the resource groups are being processed in the expected order.
Depending on whether resource groups are processed serially or in parallel, you will see different output in the event summaries and in the log files. In HACMP, parallel processing is the default method. If you migrated the cluster from an earlier version of HACMP, serial processing is maintained.
Note: If you configured dependent resource groups and specified the serial order of processing, the rules for processing dependent resource groups override the serial order. To avoid this, the serial order of processing that you specify should not contradict the configured dependencies between resource groups.
This section contains detailed information on the following:
- Serial Processing Order Reflected in Event Summaries
- Parallel Processing Order Reflected in Event Summaries
- Job Types: Parallel Resource Group Processing
- Disk Fencing with Serial or Parallel Processing
- Processing in Clusters with Dependent Resource Groups or Sites
- Processing Replicated Resource Groups
Serial Processing Order Reflected in Event Summaries
If you have defined customized serial processing lists for some of the resource groups, you can determine whether or not the resource groups are being processed in the expected order based on the content of the hacmp.out file event summaries.
The following example shows an event summary for two serially-processed resource groups named cascrg1 and cascrg2:
HACMP Event Summary
Event: node_up electron
Start time: Wed May 8 11:06:30 2002
End time: Wed May 8 11:07:49 2002

Action: Resource: Script Name:
-------------------------------------------------------------
Acquiring resource group: cascrg1    node_up_local
Search on: Wed.May.8.11:06:33.EDT.2002.node_up_local.cascrg1.
Acquiring resource: 192.168.41.30    cl_swap_IP_address
Search on: Wed.May.8.11:06:36.EDT.2002.cl_swap_IP_address.192.168.
Acquiring resource: hdisk1    cl_disk_available
Search on: Wed.May.8.11:06:40.EDT.2002.cl_disk_available.hdisk1.ca
Resource online: hdisk1    cl_disk_available
Search on: Wed.May.8.11:06:42.EDT.2002.cl_disk_available.hdisk1.ca
. . .
Acquiring resource group: cascrg2    node_up_local
Search on: Wed.May.8.11:07:14.EDT.2002.node_up_local.cascrg2.ref
Acquiring resource: hdisk2    cl_disk_available
Search on: Wed.May.8.11:07:20.EDT.2002.cl_disk_available.hdisk2.ca
Resource online: hdisk2    cl_disk_available
Search on: Wed.May.8.11:07:23.EDT.2002.cl_disk_available.hdisk2.ca

As shown here, each resource group appears with all of its accounted resources below it.
Parallel Processing Order Reflected in Event Summaries
The following features, listed in the hacmp.out file and in the event summaries, help you to follow the flow of parallel resource group processing:
- Each line in the hacmp.out file flow includes the name of the resource group to which it applies
- The event summary information includes details about all resource types
- Each line in the event summary indicates the related resource group

The following example shows an event summary for resource groups named cascrg1 and cascrg2 that are processed in parallel:
HACMP Event Summary
Event: node_up electron
Start time: Wed May 8 11:06:30 2002
End time: Wed May 8 11:07:49 2002

Action: Resource: Script Name:
-------------------------------------------------------------
Acquiring resource group: cascrg1    process_resources
Search on: Wed.May.8.11:06:33.EDT.2002.process_resources.cascrg1.ref
Acquiring resource group: cascrg2    process_resources
Search on: Wed.May.8.11:06:34.EDT.2002.process_resources.cascrg2.ref
Acquiring resource: 192.168.41.30    cl_swap_IP_address
Search on: Wed.May.8.11:06:36.EDT.2002.cl_swap_IP_address.192.168.41.30
Acquiring resource: hdisk1    cl_disk_available
Search on: Wed.May.8.11:06:40.EDT.2002.cl_disk_available.hdisk1.cascrg1
Acquiring resource: hdisk2    cl_disk_available
Search on: Wed.May.8.11:06:40.EDT.2002.cl_disk_available.hdisk2.cascrg2
Resource online: hdisk1    cl_disk_available
Search on: Wed.May.8.11:06:42.EDT.2002.cl_disk_available.hdisk1.cascrg1
Resource online: hdisk2    cl_disk_available
Search on: Wed.May.8.11:06:43.EDT.2002.cl_disk_available.hdisk2.cascrg2

As shown here, all processed resource groups are listed first, followed by the individual resources that are being processed.
Job Types: Parallel Resource Group Processing
The process_resources event script uses different JOB_TYPES that are launched during parallel processing of resource groups.
If resource group dependencies or sites are configured in the cluster, it is also useful to check the event preamble which lists the plan of action the Cluster Manager will follow to process the resource groups for a given event.
Job types are listed in the hacmp.out log file and help you identify the sequence of events that take place during acquisition or release of different types of resources. Depending on the cluster's resource group configuration, you may see many specific job types that take place during parallel processing of resource groups.
There is one job type for each resource type: DISKS, FILESYSTEMS, TAKEOVER_LABELS, TAPE_RESOURCES, AIX_FAST_CONNECTIONS, APPLICATIONS, COMMUNICATION_LINKS, CONCURRENT_VOLUME_GROUPS, EXPORT_FILESYSTEMS, and MOUNT_FILESYSTEMS. There are also a number of job types that help capitalize on the benefits of parallel processing: SETPRKEY, TELINIT, SYNC_VGS, LOGREDO, and UPDATESTATD. The related operations are run once per event, rather than once per resource group; this is one of the primary areas of benefit from parallel resource group processing, especially for small clusters. The following sections describe some of the most common job types in more detail and provide excerpts from the events in the hacmp.out log file that include these job types.
JOB_TYPE=ONLINE
In the complete phase of an acquisition event, after all resources for all resource groups have been successfully acquired, the ONLINE job type is run. This job ensures that all successfully acquired resource groups are set to the online state. The RESOURCE_GROUPS variable contains the list of all groups that were acquired.
:process_resources[1476] clRGPA
:clRGPA[48] [[ high = high ]]
:clRGPA[48] version=1.16
:clRGPA[50] usingVer=clrgpa
:clRGPA[55] clrgpa
:clRGPA[56] exit 0
:process_resources[1476] eval JOB_TYPE=ONLINE RESOURCE_GROUPS="cascrg1 cascrg2 conc_rg1"
:process_resources[1476] JOB_TYPE=ONLINE RESOURCE_GROUPS=cascrg1 cascrg2 conc_rg1
:process_resources[1478] RC=0
:process_resources[1479] set +a
:process_resources[1481] [ 0 -ne 0 ]
:process_resources[1700] set_resource_group_state UP

JOB_TYPE=OFFLINE
In the complete phase of a release event, after all resources for all resource groups have been successfully released, the OFFLINE job type is run. This job ensures that all successfully released resource groups are set to the offline state. The RESOURCE_GROUPS variable contains the list of all groups that were released.
conc_rg1:process_resources[1476] clRGPA
conc_rg1:clRGPA[48] [[ high = high ]]
conc_rg1:clRGPA[48] version=1.16
conc_rg1:clRGPA[50] usingVer=clrgpa
conc_rg1:clRGPA[55] clrgpa
conc_rg1:clRGPA[56] exit 0
conc_rg1:process_resources[1476] eval JOB_TYPE=OFFLINE RESOURCE_GROUPS="cascrg2 conc_rg1"
conc_rg1:process_resources[1476] JOB_TYPE=OFFLINE RESOURCE_GROUPS=cascrg2 conc_rg1
conc_rg1:process_resources[1478] RC=0
conc_rg1:process_resources[1479] set +a
conc_rg1:process_resources[1481] [ 0 -ne 0 ]
conc_rg1:process_resources[1704] set_resource_group_state DOWN

JOB_TYPE=ERROR
If an error occurred during the acquisition or release of any resource, the ERROR job type is run. The variable RESOURCE_GROUPS contains the list of all groups where acquisition or release failed during the current event. These resource groups are moved into the error state. When this job is run during an acquisition event, HACMP uses the Recovery from Resource Group Acquisition Failure feature and launches an rg_move event for each resource group in the error state. For more information, see the Handling of Resource Group Acquisition Failures section in Appendix B: Resource Group Behavior During Cluster Events in the Administration Guide.

conc_rg1:process_resources[1476] clRGPA
conc_rg1:clRGPA[50] usingVer=clrgpa
conc_rg1:clRGPA[55] clrgpa
conc_rg1:clRGPA[56] exit 0
conc_rg1:process_resources[1476] eval JOB_TYPE=ERROR RESOURCE_GROUPS="cascrg1"
conc_rg1:process_resources[1476] JOB_TYPE=ERROR RESOURCE_GROUPS=cascrg1
conc_rg1:process_resources[1478] RC=0
conc_rg1:process_resources[1479] set +a
conc_rg1:process_resources[1481] [ 0 -ne 0 ]
conc_rg1:process_resources[1712] set_resource_group_state ERROR

JOB_TYPE=NONE
After all processing is complete for the current process_resources script, the final job type of NONE is used to indicate that processing is complete and the script can return. When exiting after receiving this job, the process_resources script always returns 0 for success.
conc_rg1:process_resources[1476] clRGPA
conc_rg1:clRGPA[48] [[ high = high ]]
conc_rg1:clRGPA[48] version=1.16
conc_rg1:clRGPA[50] usingVer=clrgpa
conc_rg1:clRGPA[55] clrgpa
conc_rg1:clRGPA[56] exit 0
conc_rg1:process_resources[1476] eval JOB_TYPE=NONE
conc_rg1:process_resources[1476] JOB_TYPE=NONE
conc_rg1:process_resources[1478] RC=0
conc_rg1:process_resources[1479] set +a
conc_rg1:process_resources[1481] [ 0 -ne 0 ]
conc_rg1:process_resources[1721] break
conc_rg1:process_resources[1731] exit 0

JOB_TYPE=ACQUIRE
The ACQUIRE job type occurs at the beginning of any resource group acquisition event. Search hacmp.out for JOB_TYPE=ACQUIRE and view the value of the RESOURCE_GROUPS variable to see the list of resource groups being acquired in parallel during the event.
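For example, from the command line:

grep "JOB_TYPE=ACQUIRE" /tmp/hacmp.out

The following excerpt shows the corresponding entries: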
:process_resources[1476] clRGPA
:clRGPA[48] [[ high = high ]]
:clRGPA[48] version=1.16
:clRGPA[50] usingVer=clrgpa
:clRGPA[55] clrgpa
:clRGPA[56] exit 0
:process_resources[1476] eval JOB_TYPE=ACQUIRE RESOURCE_GROUPS="cascrg1 cascrg2"
:process_resources[1476] JOB_TYPE=ACQUIRE RESOURCE_GROUPS=cascrg1 cascrg2
:process_resources[1478] RC=0
:process_resources[1479] set +a
:process_resources[1481] [ 0 -ne 0 ]
:process_resources[1687] set_resource_group_state ACQUIRING

JOB_TYPE=RELEASE
The RELEASE job type occurs at the beginning of any resource group release event. Search hacmp.out for JOB_TYPE=RELEASE and view the value of the RESOURCE_GROUPS variable to see the list of resource groups being released in parallel during the event.
:process_resources[1476] clRGPA
:clRGPA[48] [[ high = high ]]
:clRGPA[48] version=1.16
:clRGPA[50] usingVer=clrgpa
:clRGPA[55] clrgpa
:clRGPA[56] exit 0
:process_resources[1476] eval JOB_TYPE=RELEASE RESOURCE_GROUPS="cascrg1 cascrg2"
:process_resources[1476] JOB_TYPE=RELEASE RESOURCE_GROUPS=cascrg1 cascrg2
:process_resources[1478] RC=0
:process_resources[1479] set +a
:process_resources[1481] [ 0 -ne 0 ]
:process_resources[1691] set_resource_group_state RELEASING

JOB_TYPE=SSA_FENCE
The SSA_FENCE job type is used to handle fencing and unfencing of SSA disks. The variable ACTION indicates what should be done to the disks listed in the HDISKS variable. All resource groups (whether processed in parallel or serially) use this method for disk fencing.
:process_resources[1476] clRGPA FENCE
:clRGPA[48] [[ high = high ]]
:clRGPA[55] clrgpa FENCE
:clRGPA[56] exit 0
:process_resources[1476] eval JOB_TYPE=SSA_FENCE ACTION=ACQUIRE HDISKS="hdisk6" RESOURCE_GROUPS="conc_rg1 " HOSTS="electron"
:process_resources[1476] JOB_TYPE=SSA_FENCE ACTION=ACQUIRE HDISKS=hdisk6 RESOURCE_GROUPS=conc_rg1 HOSTS=electron
:process_resources[1478] RC=0
:process_resources[1479] set +a
:process_resources[1481] [ 0 -ne 0 ]
:process_resources[1675] export GROUPNAME=conc_rg1
conc_rg1:process_resources[1676] process_ssa_fence ACQUIRE

Note: Disk fencing uses the process_resources script; therefore, when disk fencing occurs, the log may mislead you into assuming that resource processing is taking place when, in fact, only disk fencing is taking place. If disk fencing is enabled, you will see in the hacmp.out file that the disk fencing operation occurs before any resource group processing.
Although the process_resources script handles SSA disk fencing, the resource groups are processed serially: cl_ssa_fence is called once for each resource group that requires disk fencing. The hacmp.out content indicates which resource group is being processed.
conc_rg1:process_resources[8] export GROUPNAME
conc_rg1:process_resources[10] get_list_head hdisk6
conc_rg1:process_resources[10] read LIST_OF_HDISKS_FOR_RG
conc_rg1:process_resources[11] read HDISKS
conc_rg1:process_resources[11] get_list_tail hdisk6
conc_rg1:process_resources[13] get_list_head electron
conc_rg1:process_resources[13] read HOST_FOR_RG
conc_rg1:process_resources[14] get_list_tail electron
conc_rg1:process_resources[14] read HOSTS
conc_rg1:process_resources[18] cl_ssa_fence ACQUIRE electron hdisk6
conc_rg1:cl_ssa_fence[43] version=1.9.1.2
conc_rg1:cl_ssa_fence[44]
conc_rg1:cl_ssa_fence[44]
conc_rg1:cl_ssa_fence[46] STATUS=0
conc_rg1:cl_ssa_fence[48] (( 3 < 3
conc_rg1:cl_ssa_fence[56] OPERATION=ACQUIRE

JOB_TYPE=SERVICE_LABELS
The SERVICE_LABELS job type handles the acquisition or release of service labels. The variable ACTION indicates what should be done to the service IP labels listed in the IP_LABELS variable.
conc_rg1:process_resources[1476] clRGPA
conc_rg1:clRGPA[55] clrgpa
conc_rg1:clRGPA[56] exit 0
conc_rg1:process_resources[1476] eval JOB_TYPE=SERVICE_LABELS ACTION=ACQUIRE IP_LABELS="elect_svc0:shared_svc1,shared_svc2" RESOURCE_GROUPS="cascrg1 rotrg1" COMMUNICATION_LINKS=":commlink1"
conc_rg1:process_resources[1476] JOB_TYPE=SERVICE_LABELS ACTION=ACQUIRE IP_LABELS=elect_svc0:shared_svc1,shared_svc2 RESOURCE_GROUPS=cascrg1 rotrg1 COMMUNICATION_LINKS=:commlink1
conc_rg1:process_resources[1478] RC=0
conc_rg1:process_resources[1479] set +a
conc_rg1:process_resources[1481] [ 0 -ne 0 ]
conc_rg1:process_resources[1492] export GROUPNAME=cascrg1

This job type launches an acquire_service_addr event. Within the event, each individual service label is acquired. The content of the hacmp.out file indicates which resource group is being processed. Within each resource group, the event flow is the same as it is under serial processing.
cascrg1:acquire_service_addr[251] export GROUPNAME
cascrg1:acquire_service_addr[251] [[ true = true ]]
cascrg1:acquire_service_addr[254] read SERVICELABELS
cascrg1:acquire_service_addr[254] get_list_head electron_svc0
cascrg1:acquire_service_addr[255] get_list_tail electron_svc0
cascrg1:acquire_service_addr[255] read IP_LABELS
cascrg1:acquire_service_addr[257] get_list_head
cascrg1:acquire_service_addr[257] read SNA_CONNECTIONS
cascrg1:acquire_service_addr[258] export SNA_CONNECTIONS
cascrg1:acquire_service_addr[259] get_list_tail
cascrg1:acquire_service_addr[259] read _SNA_CONNECTIONS
cascrg1:acquire_service_addr[270] clgetif -a electron_svc0

JOB_TYPE=VGS
The VGS job type handles the acquisition or release of volume groups. The variable ACTION indicates what should be done to the volume groups being processed, and the names of the volume groups are listed in the VOLUME_GROUPS and CONCURRENT_VOLUME_GROUPS variables.
conc_rg1:process_resources[1476] clRGPA
conc_rg1:clRGPA[55] clrgpa
conc_rg1:clRGPA[56] exit 0
conc_rg1:process_resources[1476] eval JOB_TYPE=VGS ACTION=ACQUIRE CONCURRENT_VOLUME_GROUP="con_vg6" VOLUME_GROUPS="casc_vg1:casc_vg2" RESOURCE_GROUPS="cascrg1 cascrg2 " EXPORT_FILESYSTEM=""
conc_rg1:process_resources[1476] JOB_TYPE=VGS ACTION=ACQUIRE CONCURRENT_VOLUME_GROUP=con_vg6 VOLUME_GROUPS=casc_vg1:casc_vg2 RESOURCE_GROUPS=cascrg1 cascrg2 EXPORT_FILESYSTEM=""
conc_rg1:process_resources[1478] RC=0
conc_rg1:process_resources[1481] [ 0 -ne 0 ]
conc_rg1:process_resources[1529] export GROUPNAME=cascrg1 cascrg2

This job type runs the cl_activate_vgs event utility script, which acquires each individual volume group. The content of the hacmp.out file indicates which resource group is being processed, and within each resource group, the script flow is the same as it is under serial processing.
cascrg1 cascrg2:cl_activate_vgs[256] 1> /usr/es/sbin/cluster/etc/lsvg.out.21266 2> /tmp/lsvg.err
cascrg1:cl_activate_vgs[260] export GROUPNAME
cascrg1:cl_activate_vgs[262] get_list_head casc_vg1:casc_vg2
cascrg1:cl_activate_vgs[262] read LIST_OF_VOLUME_GROUPS_FOR_RG
cascrg1:cl_activate_vgs[263] get_list_tail casc_vg1:casc_vg2
cascrg1:cl_activate_vgs[263] read VOLUME_GROUPS
cascrg1:cl_activate_vgs[265] LIST_OF_VOLUME_GROUPS_FOR_RG=
cascrg1:cl_activate_vgs[270] fgrep -s -x casc_vg1 /usr/es/sbin/cluster/etc/lsvg.out.21266
cascrg1:cl_activate_vgs[275] LIST_OF_VOLUME_GROUPS_FOR_RG=casc_vg1
cascrg1:cl_activate_vgs[275] [[ casc_vg1 = ]]

Disk Fencing with Serial or Parallel Processing
Disk fencing with either serial or parallel processing uses the process_resources script with the JOB_TYPE=SSA_FENCE as described in the previous section.
Processing in Clusters with Dependent Resource Groups or Sites
Resource groups in clusters with dependent groups or sites configured are handled with dynamic event phasing. These events process one or more resource groups at a time. Multiple non-concurrent resource groups can be processed within one rg_move event.
If you specify a serial order of processing (which causes HACMP to use clsetenvgrp) and have dependent resource groups configured, make sure that the serial order does not contradict the specified dependencies. The resource group dependencies override any customized serial order in the cluster.
Also, see the examples for handling resource groups with location dependencies in the Appendix: Applications and HACMP in the Planning Guide.
Processing Replicated Resource Groups
HACMP uses rg_move events for dynamic processing of replicated resources.
JOB_TYPE=SIBLINGS provides the interface variables to the HACMP/XD product in the event script's environment and prints the appropriate SIBLING variables:
SIBLING_GROUPS; (example: rg1 rg2)
SIBLING_NODES_BY_GROUP; (example: n1 : n2) Note: colon separator
SIBLING_RELEASING_GROUPS; (example: rg4 rg5)
SIBLING_RELEASING_NODES_BY_GROUP; (example: n3 : n4) Note: colon separator
SIBLING_ACQUIRING_GROUPS; (example: rg4 rg5)
SIBLING_ACQUIRING_NODES_BY_GROUP; (example: n3 : n4) Note: colon separator
These variables are used only within the process_resources code path. Once the Cluster Manager sends this data to the event scripts, a call to clsetrepenv sets the environment for a specific resource group. The SIBLING variables are printed to the environment even if the local node is not processing any resource groups; they reflect the environment values at the peer site.
For JOB_TYPE=ACQUIRE, along with other variables that are currently set in the environment, the following variables are set on each node (both in node_up and rg_move acquire events):
SIBLING_GROUPS
Every resource group that has a non-ignore site policy appears in this list of group names in the HACMP event if the resource group is in either ONLINE or ONLINE_SECONDARY state on the peer site.
SIBLING_NODES_BY_GROUP
For every resource group listed in SIBLING_GROUPS, the SIBLING_NODES_BY_GROUP variable lists the node that hosts the resource group (in either ONLINE or ONLINE_SECONDARY state).
SIBLING_ACQUIRING_GROUPS
SIBLING_ACQUIRING_NODES_BY_GROUP
These variables provide a picture of resource group actions on the peer site during the acquire phase of the local event.
For JOB_TYPE=RELEASE, the following variables are set (both in node_down and in rg_move release):
SIBLING_GROUPS
SIBLING_NODES_BY_GROUP
SIBLING_RELEASING_GROUPS
SIBLING_RELEASING_NODES_BY_GROUP
On a per-resource-group basis, the following variables are tracked:
SIBLING_NODES
SIBLING_NON_OWNER_NODES
SIBLING_ACQUIRING_GROUPS or SIBLING_RELEASING_GROUPS
SIBLING_ACQUIRING_NODES_BY_GROUP or SIBLING_RELEASING_NODES_BY_GROUP
Sample Event with Siblings Output to hacmp.out
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Mar 28 09:40:42 EVENT START: rg_move a2 1 ACQUIRE
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
:process_resources[1952] eval JOB_TYPE=ACQUIRE RESOURCE_GROUPS="rg3" SIBLING_GROUPS="rg1 rg3" SIBLING_NODES_BY_GROUP="b2 : b2" SIBLING_ACQUIRING_GROUPS="" SIBLING_ACQUIRING_NODES_BY_GROUP="" PRINCIPAL_ACTION="ACQUIRE" AUXILLIARY_ACTION="NONE"
:process_resources[1952] JOB_TYPE=ACQUIRE RESOURCE_GROUPS=rg3 SIBLING_GROUPS=rg1 rg3 SIBLING_NODES_BY_GROUP=b2 : b2 SIBLING_ACQUIRING_GROUPS= SIBLING_ACQUIRING_NODES_BY_GROUP= PRINCIPAL_ACTION=ACQUIRE AUXILLIARY_ACTION=NONE
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
:rg_move_complete[157] eval FORCEDOWN_GROUPS="" RESOURCE_GROUPS="" HOMELESS_GROUPS="" ERRSTATE_GROUPS="" PRINCIPAL_ACTIONS="" ASSOCIATE_ACTIONS="" AUXILLIARY_ACTIONS="" SIBLING_GROUPS="rg1 rg3" SIBLING_NODES_BY_GROUP="b2 : b2" SIBLING_ACQUIRING_GROUPS="" SIBLING_ACQUIRING_NODES_BY_GROUP="" SIBLING_RELEASING_GROUPS="" SIBLING_RELEASING_NODES_BY_GROUP=""
:rg_move_complete[157] FORCEDOWN_GROUPS= RESOURCE_GROUPS= HOMELESS_GROUPS= ERRSTATE_GROUPS= PRINCIPAL_ACTIONS= ASSOCIATE_ACTIONS= AUXILLIARY_ACTIONS= SIBLING_GROUPS=rg1 rg3 SIBLING_NODES_BY_GROUP=b2 : b2 SIBLING_ACQUIRING_GROUPS= SIBLING_ACQUIRING_NODES_BY_GROUP= SIBLING_RELEASING_GROUPS= SIBLING_RELEASING_NODES_BY_GROUP=
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
:process_resources[1952] eval JOB_TYPE=SYNC_VGS ACTION=ACQUIRE VOLUME_GROUPS="vg3,vg3sm" RESOURCE_GROUPS="rg3 "
:process_resources[1952] JOB_TYPE=SYNC_VGS ACTION=ACQUIRE VOLUME_GROUPS=vg3,vg3sm RESOURCE_GROUPS=rg3
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
rg3:process_resources[1952] eval JOB_TYPE=ONLINE RESOURCE_GROUPS="rg3"
rg3:process_resources[1952] JOB_TYPE=ONLINE RESOURCE_GROUPS=rg3
rg3:process_resources[1954] RC=0
rg3:process_resources[1955] set +a
rg3:process_resources[1957] [ 0 -ne 0 ]
rg3:process_resources[2207] set_resource_group_state UP
rg3:process_resources[3] STAT=0
rg3:process_resources[6] export GROUPNAME
rg3:process_resources[7] [ UP != DOWN ]
rg3:process_resources[9] [ REAL = EMUL ]
rg3:process_resources[14] clchdaemons -d clstrmgr_scripts -t resource_locator -n a1 -o rg3 -v UP
rg3:process_resources[15] [ 0 -ne 0 ]
rg3:process_resources[26] [ UP = ACQUIRING ]
rg3:process_resources[31] [ UP = RELEASING ]
rg3:process_resources[36] [ UP = UP ]
rg3:process_resources[38] cl_RMupdate rg_up rg3 process_resources
Reference string: Sun.Mar.27.18:02:09.EST.2005.process_resources.rg3.ref
rg3:process_resources[39] continue
rg3:process_resources[80] return 0
rg3:process_resources[1947] true
rg3:process_resources[1949] set -a
rg3:process_resources[1952] clRGPA
rg3:clRGPA[33] [[ high = high ]]
rg3:clRGPA[33] version=1.16
rg3:clRGPA[35] usingVer=clrgpa
rg3:clRGPA[40] clrgpa
rg3:clRGPA[41] exit 0
rg3:process_resources[1952] eval JOB_TYPE=NONE
rg3:process_resources[1952] JOB_TYPE=NONE
rg3:process_resources[1954] RC=0
rg3:process_resources[1955] set +a
rg3:process_resources[1957] [ 0 -ne 0 ]
rg3:process_resources[2256] break
rg3:process_resources[2267] [[ FALSE = TRUE ]]
rg3:process_resources[2273] exit 0
:rg_move_complete[346] STATUS=0
:rg_move_complete[348] exit 0
Mar 27 18:02:10 EVENT COMPLETED: rg_move_complete a1 2 0

Managing a Node’s HACMP Log File Parameters
Each cluster node supports two log file parameters. These allow you to:
Set the level of debug information output by the HACMP scripts. By default, HACMP sets the debug information parameter to high, which produces detailed output from script execution.
Set the output format for the hacmp.out log file.

To change the log file parameters for a node:
1. Enter the fastpath smit hacmp
2. In SMIT, select Problem Determination Tools > HACMP Log Viewing and Management > Change/Show HACMP Log File Parameters and press Enter.
3. Select a node from the list.
4. Enter field values as follows:
5. Press Enter to add the values into the HACMP for AIX 5L Configuration Database.
6. Return to the main HACMP menu. Select Extended Configuration > Extended Verification and Synchronization.
The software checks whether cluster services are running on any cluster node. If so, there will be no option to skip verification.
7. Select the options you want to use for verification and press Enter to synchronize the cluster configuration and node environment across the cluster. See Chapter 7: Verifying and Synchronizing a Cluster Configuration in the Administration Guide for complete information on this operation.
Logging for clcomd
Logging for the clcomd daemon to clcomd.log and clcomddiag.log is turned on by default. The clcomd.log file provides information about all connections to and from the daemon, including the initial connections established during discovery. Because clcomddiag.log contains diagnostic information for the daemon, you typically do not use this file in troubleshooting situations.
The following example shows the type of output generated in the clcomd.log file. The second and third entries are generated during the discovery process.
Wed May 7 12:43:13 2003: Daemon was successfully started
Wed May 7 12:44:10 2003: Trying to establish connection to node temporarynode0000001439363040
Wed May 7 12:44:10 2003: Trying to establish connection to node temporarynode0000002020023310
Wed May 7 12:44:10 2003: Connection to node temporarynode0000002020023310, success, 192.0.24.4->
Wed May 7 12:44:10 2003: CONNECTION: ACCEPTED: test2: 192.0.24.4->192.0.24.4
Wed May 7 12:44:10 2003: WARNING: /usr/es/sbin/cluster/etc/rhosts permissions must be -rw-------
Wed May 7 12:44:10 2003: Connection to node temporarynode0000001439363040: closed
Wed May 7 12:44:10 2003: Connection to node temporarynode0000002020023310: closed
Wed May 7 12:44:10 2003: CONNECTION: CLOSED: test2: 192.0.24.4->192.0.24.4
Wed May 7 12:44:11 2003: Trying to establish connection to node test1
Wed May 7 12:44:11 2003: Connection to node test1, success, 192.0.24.4->192.0.24.5
Wed May 7 12:44:11 2003: Trying to establish connection to node test3

You can view the content of the clcomd.log or clcomddiag.log file by using the AIX 5L vi or more commands.
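For example, to page through the connection log (the path shown here is an assumption based on the default log locations; check your log redirection settings if the file is not found):

    more /tmp/clcomd/clcomd.log    # assumed default location of clcomd.log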
You can turn off logging to clcomddiag.log temporarily (until the next reboot, or until you enable logging for this component again) by using the AIX 5L tracesoff command. To permanently stop logging to clcomddiag.log, start the daemon from SRC without the -d flag by using the following command:
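The sequence below is a sketch using the standard AIX SRC commands. The subsystem name clcomdES is an assumption; verify it with lssrc -a before running these commands:

    chssys -s clcomdES -a ""    # clear the subsystem arguments so -d is no longer passed (assumed subsystem name)
    stopsrc -s clcomdES         # stop the running daemon
    startsrc -s clcomdES        # restart it without diagnostic logging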
Redirecting HACMP Cluster Log Files
During normal operation, HACMP produces several output log files that you can use to monitor and debug your systems. You can store a cluster log in a location other than its default directory if you choose. If you do, keep in mind that most cluster logs require a minimum of 2 MB of disk space; 14 MB is recommended for hacmp.out.
Note: Logs should be redirected to local filesystems and not to shared or NFS filesystems. Having logs on those filesystems may cause problems if the filesystem needs to unmount during a fallover event. Redirecting logs to NFS filesystems may also prevent cluster services from starting during node reintegration.
The log file redirection function does the following:
Checks the location of the target directory to determine whether it is part of a local or remote filesystem.
Checks whether the target directory is managed by HACMP. If it is, any attempt to redirect a log file will fail.
Checks that the target directory is specified as an absolute path (such as /mylogdir) rather than a relative path (such as mylogdir).

These checks decrease the possibility that the chosen filesystem will become unexpectedly unavailable.
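As a rough illustration, the path and permission checks described above might look like the following shell sketch; this is not the actual HACMP validation code, and the local-versus-remote and HACMP-managed-directory checks are omitted:

    TARGET_DIR="$1"

    # Reject relative paths; only absolute paths such as /mylogdir are accepted.
    case "$TARGET_DIR" in
        /*) : ;;
        *)  echo "ERROR: $TARGET_DIR is not an absolute path" >&2
            exit 1 ;;
    esac

    # The target directory must exist and allow read-write access.
    if [ ! -d "$TARGET_DIR" ] || [ ! -r "$TARGET_DIR" ] || [ ! -w "$TARGET_DIR" ]; then
        echo "ERROR: $TARGET_DIR is missing or not read-write" >&2
        exit 1
    fi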
Note: The target directory must have read-write access.
To avoid a failure of the redirection process, be sure to synchronize the cluster immediately before redirecting a log file.
Steps for Redirecting a Cluster Log File
To redirect a cluster log from its default directory to another destination, take the following steps:
1. Enter smit hacmp
2. In SMIT, select System Management (C-SPOC) > HACMP Log Viewing and Management > Change/Show a Cluster Log Directory. SMIT displays a picklist of cluster log files with a short description of each.
3. Select a log that you want to redirect.
SMIT displays a panel with the selected log’s name, description, default pathname, and current directory pathname. The current directory pathname is the same as the default pathname unless you have changed it. This panel also asks you to specify whether to allow this log on a remote filesystem (mounted locally using AFS, DFS, or NFS). The default value is false.
Note: Use of a non-local filesystem for HACMP logs will prevent log information from being collected if the filesystem becomes unavailable. To ensure that cluster services are started during node reintegration, log files should be redirected to local filesystems, and not to NFS filesystems.
For example, in the panel for the cluster.mmddyyyy log file, edit the fourth field to change the default pathname.
4. Press Enter to add the values to the HACMP for AIX 5L Configuration Database.
5. Return to the panel to select another log to redirect, or return to the Cluster System Management panel to proceed to the panel for synchronizing cluster resources.
6. After you change a log directory, a prompt appears reminding you to synchronize cluster resources from this node (Cluster log Configuration Databases must be identical across the cluster). The cluster log destination directories as stored on this node will be synchronized to all nodes in the cluster.
Log destination directory changes will take effect when you synchronize cluster resources.