Chapter 8: Testing an HACMP Cluster
This chapter describes how to use the Cluster Test Tool to test the recovery capabilities of an HACMP cluster. The Cluster Test Tool is available for you to test a new cluster before it becomes part of your production environment, and to test configuration changes to an existing cluster, when the cluster is not in service.
Prerequisites
The Cluster Test Tool runs only on a cluster that has:
HACMP 5.2 or greater installed. If the cluster is migrated from an earlier version, cluster migration must be complete. If you used the Cluster Test Tool in previous releases, the custom test plans that you created continue to work in HACMP v.5.4.
The cluster configuration verified and synchronized.
Before you run the tool on a cluster node, ensure that:
The node has HACMP installed and is part of the HACMP cluster to be tested.
The node has network connectivity to all of the other nodes in the HACMP cluster.
You have root permissions.
Because log file entries include time stamps, consider synchronizing the clocks on the cluster nodes to make it easier to review the log file entries produced by test processing.
Overview
The Cluster Test Tool utility lets you test an HACMP cluster configuration to evaluate how a cluster operates under a set of specified circumstances, such as when cluster services on a node fail or when a node loses connectivity to a cluster network. You can start a test, let it run unattended, and return later to evaluate the results of your testing. You should run the tool under both low load and high load conditions to observe how system load affects your HACMP cluster.
You run the Cluster Test Tool from SMIT on one node in an HACMP cluster. For testing purposes, this node is referred to as the control node. From the control node, the tool runs a series of specified tests, some of them on other cluster nodes; it gathers information about the success or failure of the tests processed, and stores this information in the Cluster Test Tool log file for evaluation or future reference.
The Cluster Test Tool lets you test an HACMP cluster in two ways, by running:
Automated testing (also known as the Automated Test Tool). In this mode, the Cluster Test Tool runs a predefined series of tests on the cluster.
Custom testing (also known as a Test Plan). In this mode, you create your own test plan, a custom testing routine that includes the tests available in the Cluster Test Tool library.
Automated Testing
Use the automated test procedure (a predefined set of tests) supplied with the tool to perform basic cluster testing on any cluster. No setup is required. You simply run the test from SMIT and view test results from SMIT and the Cluster Test Tool log file.
The automated test procedure runs a predefined set of tests on a node that the tool randomly selects. The tool ensures that the node selected for testing varies from one test to another. For information about automated testing, see the section Running Automated Tests.
Custom Testing
If you are an experienced HACMP administrator and want to tailor cluster testing to your environment, you can create custom tests that can be run from SMIT. You create a custom test plan (a file that lists a series of tests to be run), to meet requirements specific to your environment and apply that test plan to any number of clusters. You specify the order in which tests run and the specific components to be tested. After you set up your custom test environment, you run the test procedure from SMIT and view test results in SMIT and in the Cluster Test Tool log file. For information about customized testing, see the section Setting up Custom Cluster Testing.
Test Duration
Running automated testing on a basic two-node cluster that has a simple cluster configuration takes approximately 30 to 60 minutes to complete. Individual tests can take around three minutes to run. The following conditions affect the length of time to run the tests:
Cluster complexity
Latency on the network. Cluster testing relies on network communication between the nodes. Any degradation in network performance slows the performance of the Cluster Test Tool.
Use of verbose logging for the tool. If you customize verbose logging to run additional commands from which to capture output, testing takes longer to complete. In general, the more commands you add for verbose logging, the longer a test procedure takes to complete.
Manual intervention on the control node. At some points in the test, you may need to intervene. See Recovering the Control Node after Cluster Manager Stops for ways to avoid this situation.
Running custom tests. If you run a custom test plan, the number of tests run also affects the time required to run the test procedure. If you run a long list of tests, or if any of the tests require a substantial amount of time to complete, then the time to process the test plan increases.
Security
The Cluster Test Tool uses the HACMP Cluster Communications daemon to communicate between cluster nodes, which protects the security of your HACMP cluster. For information about the Cluster Communications daemon, see Chapter 16: Managing Users and Groups.
Limitations
The Cluster Test Tool has the following limitations. It does not support testing of the following HACMP cluster-related components:
High Performance Switch (HPS) networks
ATM networks
Sites. You can perform general cluster testing for clusters that support sites, but not testing specific to HACMP sites or any of the HACMP/XD products. HACMP/XD for Metro Mirror, HACMP/XD for GLVM, and HACMP/XD for HAGEO all use sites in their cluster configuration.
Replicated resources. You can perform general cluster testing for clusters that include replicated resources, but not testing specific to replicated resources or any of the HACMP/XD products. HACMP/XD for Metro Mirror, HACMP/XD for HAGEO, and HACMP/XD for GLVM all include replicated resources in their cluster configuration.
Dynamic cluster reconfiguration.
Pre-events and post-events. Pre-events and post-events run in the usual way, but the tool does not verify that the events were run or that the correct action was taken.
In addition, the Cluster Test Tool may not recover from the following situations:
A node that fails unexpectedly, that is, a failure not initiated by testing
The cluster does not stabilize.
Note: The Cluster Test Tool uses the terminology for stopping cluster services that was used in HACMP prior to v.5.4 (graceful stop, graceful with takeover, and forced stop). For information on how this terminology maps to the currently used terms for stopping cluster services, see Chapter 9: Starting and Stopping Cluster Services.
Running Automated Tests
You can run the automated test procedure on any HACMP cluster that is not currently in service. The Cluster Test Tool runs a specified set of tests and randomly selects the nodes, networks, resource groups, and so forth for testing. The tool tests different cluster components during the course of the testing. For a list of the tests that are run, see the section Understanding Automated Testing.
Before you start running an automated test:
Ensure that the cluster is not in service in a production environment.
Stop HACMP cluster services (recommended but optional). Note that if the Cluster Manager is running, some of the tests will be irrational for your configuration, but the Test Tool will continue to run.
Ensure that cluster nodes are attached to two IP networks. One network is used to test a network becoming unavailable and then available; the second network provides network connectivity for the Cluster Test Tool. Both networks are tested, one at a time.
Launching the Cluster Test Tool
To run the automated test procedure:
1. Enter smit hacmp
2. In SMIT, select Initialization and Standard Configuration > HACMP Cluster Test Tool and press Enter.
3. Evaluate the test results.
Modifying Logging and Stopping Processing in the Cluster Test Tool
You can also modify processing for the automated test procedure to:
Turn off verbose logging
Turn off cycling of log files for the tool
Stop processing tests after the first test fails.
To modify processing for an automated test:
1. Enter smit hacmp
2. In SMIT, select either one of the following options:
Extended Configuration
Problem Determination Tools
3. In the HACMP Cluster Test Tool panel, select Execute Automated Test Procedure.
4. In the Execute Automated Test Procedure panel, enter field values as follows:
Verbose Logging
When set to yes, includes additional information in the log file. This information may help to judge the success or failure of some tests. For more information about verbose logging and how to modify it for your testing, see the section Error Logging. Select no to decrease the amount of information logged by the Cluster Test Tool. The default is yes.
Cycle Log File
When set to yes, uses a new log file to store output from the Cluster Test Tool. Select no to append messages to the current log file. The default is yes. For more information about cycling the log file, see the section Log File Rotation.
Abort on Error
When set to no, the Cluster Test Tool continues to run tests after some of the tests being run fail. This may cause subsequent tests to fail because the cluster state is different from the one expected by one of those tests. Select yes to stop processing after the first test fails. For information about the conditions under which the Cluster Test Tool stops running, see the section Cluster Test Tool Stops Running. The default is no.
Note: The tool stops running and issues an error if a test fails and Abort on Error is selected.
5. Press Enter to start running the automated tests.
6. Evaluate the test results.
Understanding Automated Testing
This section lists the sequence that the Cluster Test Tool uses for the automated testing, and describes the syntax of the tests run during automated testing.
The automated test procedure performs sets of predefined tests in the following order:
1. General topology tests
2. Resource group tests on non-concurrent resource groups
3. Resource group tests on concurrent resource groups
4. IP-type network tests for each network
5. Non-IP network tests for each network
6. Volume group tests for each resource group
7. Site-specific tests
8. Catastrophic failure test.
The Cluster Test Tool discovers information about the cluster configuration, and randomly selects cluster components, such as nodes and networks, to be used in the testing.
Which nodes are used in testing varies from one test to another. The Cluster Test Tool may select some nodes for the initial battery of tests, and then, for subsequent tests, it may intentionally select the same nodes or choose from nodes on which no tests were run previously. In general, the logic in the automated test sequence ensures that all components are sufficiently tested in all necessary combinations. The testing follows these rules:
Tests operation of a concurrent resource group on one randomly selected node, not all nodes in the resource group.
Tests only those resource groups that include monitored application servers or volume groups.
Requires at least two active IP networks in the cluster to test non-concurrent resource groups.
The automated test procedure runs a node_up event at the beginning of the test to make sure that all cluster nodes are up and available for testing.
These sections list the tests in each group. For more information about a test, including the criteria to determine the success or failure of a test, see the section Description of Tests. The automated test procedure uses variables for parameters, with values drawn from the HACMP cluster configuration.
The examples in the following sections use variables for node, resource group, application server, stop script, and network names. For information about the parameters specified for a test, see the section Description of Tests.
General Topology Tests
The Cluster Test Tool runs the general topology tests in the following order:
1. Bring a node up and start cluster services on all available nodes
2. Stop cluster services on a node and bring resource groups offline.
3. Restart cluster services on the node that was stopped
4. Stop cluster services and move resource groups to another node
5. Restart cluster services on the node that was stopped
6. Stop cluster services on another node and place resource groups in an UNMANAGED state.
7. Restart cluster services on the node that was stopped.
The Cluster Test Tool uses the terminology for stopping cluster services that was used in HACMP in releases prior to v.5.4. For information on how the methods for stopping cluster services map to the terminology used in v.5.4, see Chapter 9: Starting and Stopping Cluster Services.
When the automated test procedure starts, the tool runs each of the following tests in the order shown:
1. NODE_UP, ALL, Start cluster services on all available nodes
2. NODE_DOWN_GRACEFUL, node1, Stop cluster services gracefully on a node
3. NODE_UP, node1, Restart cluster services on the node that was stopped
4. NODE_DOWN_TAKEOVER, node2, Stop cluster services with takeover on a node
5. NODE_UP, node2, Restart cluster services on the node that was stopped
6. NODE_DOWN_FORCED, node3, Stop cluster services forced on a node
7. NODE_UP, node3, Restart cluster services on the node that was stopped
Resource Group Tests
There are two groups of resource group tests that can be run. Which group of tests runs depends on the startup policy for the resource group: non-concurrent or concurrent.
If a resource of the specified type does not exist in the resource group, the tool logs an error in the Cluster Test Tool log file.
Resource Group Starts on a Specified Node
The following tests run if the cluster includes one or more resource groups that have a startup management policy other than Online on All Available Nodes, that is, the cluster includes one or more non-concurrent resource groups.
The Cluster Test Tool runs each of the following tests in the order shown for each resource group:
1. Bring a resource group offline and online on a node.
RG_OFFLINE, RG_ONLINE
2. Bring a local network down on a node to produce a resource group fallover.
NETWORK_DOWN_LOCAL, rg_owner, svc1_net, Selective fallover on local network down
3. Recover the previously failed network.
NETWORK_UP_LOCAL, prev_rg_owner, svc1_net, Recover previously failed network
4. Move a resource group to another node.
RG_MOVE
5. Bring an application server down and recover from the application failure.
SERVER_DOWN, ANY, app1, /app/stop/script, Recover from application failure
Resource Group Starts on All Available Nodes
If the cluster includes one or more resource groups that have a startup management policy of Online on All Available Nodes, that is, the cluster has concurrent resource groups, the tool runs one test that brings an application server down and recovers from the application failure.
The tool runs the following test:
RG_OFFLINE, RG_ONLINE
SERVER_DOWN, ANY, app1, /app/stop/script, Recover from application failure
Network Tests
The tool runs tests for IP networks and for non-IP networks.
For each IP network, the tool runs these tests:
Bring a network down and up.
NETWORK_DOWN_GLOBAL, NETWORK_UP_GLOBAL
Fail a network interface, then join a network interface. This test is run for the service interface on the network. If no service interface is configured, the test uses a random interface defined on the network.
FAIL_LABEL, JOIN_LABEL
For each non-IP network, the tool runs these tests:
Bring a non-IP network down and up.
NETWORK_DOWN_GLOBAL, NETWORK_UP_GLOBAL
Volume Group Tests
For each resource group in the cluster, the tool runs tests that fail a volume group in the resource group:
VG_DOWN
Site-Specific Tests
If sites are present in the cluster, the tool runs tests for them. The automated testing sequence that the Cluster Test Tool uses contains two site-specific tests:
auto_site. This sequence of tests runs if you have any cluster configuration with sites. For instance, this sequence is used for clusters with cross-site LVM mirroring configured that do not use XD_data networks. The tests in this sequence include:
SITE_DOWN_GRACEFUL  Stop the cluster services on all nodes in a site while taking resources offline
SITE_UP  Restart the cluster services on the nodes in a site
SITE_DOWN_TAKEOVER  Stop the cluster services on all nodes in a site and move the resources to nodes at another site
SITE_UP  Restart the cluster services on the nodes at a site
RG_MOVE_SITE  Move a resource group to a node at another site
auto_site_isolation. This sequence of tests runs only if you configured sites and an XD-type network. The tests in this sequence include:
SITE_ISOLATION  Isolate sites by failing XD_data networks
SITE_MERGE  Merge sites by bringing up XD_data networks.
Catastrophic Failure Test
As a final test, the tool stops the Cluster Manager on a randomly selected node that currently has at least one active resource group:
CLSTRMGR_KILL, node1, Kill the cluster manager on a node
If the tool terminates the Cluster Manager on the control node, you may need to reboot this node.
Setting up Custom Cluster Testing
If you want to extend cluster testing beyond the scope of the automated testing and you are an experienced HACMP administrator with experience planning, implementing, and troubleshooting clusters, you can create a custom test procedure to test the HACMP clusters in your environment. You can specify the tests specific to your clusters, and use variables to specify parameters specific to each cluster. Using variables lets you extend a single custom test procedure to run on a number of different clusters. You then run the custom test procedure from SMIT.
Warning: If you uninstall HACMP, the program removes any files you may have customized for the Cluster Test Tool. If you want to retain these files, make a copy of these files before you uninstall HACMP.
Planning a Test Procedure
Before you create a test procedure, make sure that you are familiar with the HACMP clusters on which you plan to run the test. List the components in your cluster and have this list available when setting up a test. Include the following items in the list:
Nodes
IP networks
Non-IP networks
XD-type networks
Volume groups
Resource groups
Application servers
Sites.
Your test procedure should bring each component offline then online, or cause a resource group fallover, to ensure that the cluster recovers from each failure.
We recommend that your test start by running a node_up event on each cluster node to ensure that all cluster nodes are up and available for testing.
Creating a Custom Test Procedure
To create a custom test procedure:
1. Create a Test Plan, a file that lists the tests to be run.
2. Set values for test parameters.
Creating a Test Plan
A Test Plan is a text file that lists cluster tests to be run in the order in which they are listed in the file. In a Test Plan, specify one test per line. You can set values for test parameters in the Test Plan or use variables to set parameter values.
The tool supports the following tests: NODE_UP, NODE_DOWN_GRACEFUL, NODE_DOWN_TAKEOVER, NODE_DOWN_FORCED, NETWORK_UP_LOCAL, NETWORK_DOWN_LOCAL, NETWORK_UP_GLOBAL, NETWORK_DOWN_GLOBAL, JOIN_LABEL, FAIL_LABEL, RG_ONLINE, RG_OFFLINE, RG_MOVE, RG_MOVE_SITE, VG_DOWN, SITE_ISOLATION, SITE_MERGE, SITE_DOWN_TAKEOVER, SITE_UP, SERVER_DOWN, CLSTRMGR_KILL, and WAIT.
For a full description of these tests, see the section Description of Tests.
Specifying Parameters for Tests
You can specify parameters for the tests in the Test Plan by doing one of the following:
Using a variables file. A variables file defines values for variables assigned to parameters in a test plan. See the section Using a Variables File.
Setting values for test parameters as environment variables. See the section Using Environment Variables.
Identifying values for parameters in the Test Plan. See the section Using the Test Plan.
When the Cluster Test Tool starts, it uses a variables file if you specified the location of one in SMIT. If it does not locate a variables file, it uses values set in an environment variable. If a value is not specified in an environment variable, it uses the value in the Test Plan. If the value set in the Test Plan is not valid, the tool displays an error message.
Using a Variables File
The variables file is a text file that defines the values for test parameters. By setting parameter values in a separate variables file, you can use your Test Plan to test more than one cluster.
The entries in the file have this syntax:
parameter_name=value
For example, to specify a node as node_waltham:
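node=node_waltham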
To provide more flexibility, you can:
1. Set the name for a parameter in the Test Plan.
2. Assign the name to another value in the variables file.
For example, you could specify the value for node as node1 in the Test Plan:
NODE_UP,node1, Bring up node1
In the variables file, you can then set the value of node1 to node_waltham:
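node1=node_waltham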
The following example shows a sample variables file:
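node1=waltham
node2=belmont
(The node names here are the sample names used elsewhere in this chapter; substitute the names of your own cluster nodes.)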
Using Environment Variables
If you do not want to use a variables file, you can assign parameter values by setting environment variables for the parameter values. If a variable file is not specified, but there are parameter_name=values in the cluster environment that match the values in the test plan, the Cluster Test Tool will use the values from the cluster environment.
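For example, before launching the tool you might set the same node1 variable in the shell (a hypothetical ksh session; the variable name must match the parameter name used in your Test Plan):
export node1=node_waltham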
Using the Test Plan
If you want to run a test plan on only one cluster, you can define test parameters in the Test Plan. The associated test can be run only on the cluster that includes those cluster attributes specified. For information about the syntax for parameters for tests, see the section Description of Tests.
Description of Tests
The Test Plan supports the tests listed in this section. The description of each test includes information about the test parameters and the success indicators for a test.
Note: One of the success indicators for each test is that the cluster becomes stable. The definition of cluster stability takes a number of factors into account, beyond the state of the Cluster Manager. The clstat utility, by comparison, uses only the state of the Cluster Manager to assess stability. For information about the factors used to determine cluster stability for the Cluster Test Tool, see the section Evaluating Results.
Test Syntax
The syntax for a test is:
TEST_NAME, parameter1, parametern|PARAMETER, comments
where:
The test name is in uppercase letters.
Parameters follow the test name. Italic text indicates parameters expressed as variables.
Commas separate the test name from the parameters and the parameters from each other. (Note that the HACMP 5.4 Cluster Test Tool supports spaces around commas.) The example syntax line shows parameters as parameter1 and parametern, with n representing the next parameter. Tests typically have from two to four parameters.
A pipe ( | ) indicates parameters that are mutually exclusive alternatives.
(Optional) Comments (user-defined text) appear at the end of the line. The Cluster Test Tool displays this text string when it runs.
In the test plan, the tool ignores:
Lines that start with a pound sign (#)
Blank lines.
Node Tests
The node tests start and stop cluster services on specified nodes.
NODE_UP, node | ALL, comments
Starts cluster services on a specified node that is offline or on all nodes that are offline.
node  The name of a node on which cluster services start
ALL  Cluster services start on any nodes that are offline
comments  User-defined text to describe the configured test.
Example
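A representative entry, using a placeholder node name:
NODE_UP, node1, Start cluster services on node1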
Entrance Criteria
Any node to be started is inactive.
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
The cluster services successfully start on all specified nodes
No resource group enters the error state
No resource group moves from online to offline.
NODE_DOWN_GRACEFUL, node | ALL, comments
Stops cluster services on a specified node and brings resource groups offline.
Example
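A representative entry, using a placeholder node name:
NODE_DOWN_GRACEFUL, node1, Stop cluster services gracefully on node1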
Entrance Criteria
Any node to be stopped is active.
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
Cluster services stop on the specified node(s)
Cluster services continue to run on other nodes if ALL is not specified
Resource groups on the specified node go offline, and do not move to other nodes
Resource groups on other nodes remain in the same state.
NODE_DOWN_TAKEOVER, node, comments
Stops cluster services on a specified node with a resource group acquired by another node as configured, depending on resource availability.
node  The name of a node on which to stop cluster services
comments  User-defined text to describe the configured test.
Example
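A representative entry, using a placeholder node name:
NODE_DOWN_TAKEOVER, node2, Stop cluster services with takeover on node2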
Entrance Criteria
The specified node is active.
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
Cluster services stop on the specified node
Cluster services continue to run on other nodes
All resource groups remain in the same state.
NODE_DOWN_FORCED, node, comments
Stops cluster services on a specified node and places resource groups in an UNMANAGED state. Resources on the node remain online, that is, they are not released.
node  The name of a node on which to stop cluster services
comments  User-defined text to describe the configured test.
Example
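A representative entry, using a placeholder node name:
NODE_DOWN_FORCED, node3, Stop cluster services forced on node3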
Entrance Criteria
Cluster services on another node have not already been stopped with its resource groups placed in an UNMANAGED state.
The specified node is active.
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
The resource groups on the node change to UNMANAGED state
Cluster services stop on the specified node
Cluster services continue to run on other nodes
All resource groups remain in the same state.
Network Tests for an IP Network
This section lists tests that bring network interfaces up or down on an IP network. The Cluster Test Tool requires two IP networks to run any of the tests described in this section. The second network provides network connectivity for the tool to run. The Cluster Test Tool verifies that two IP networks are configured before running the test.
NETWORK_UP_LOCAL, node, network, comments
Brings a specified network up on a specified node by running the ifconfig up command on the node.
node  The name of the node on which to run the ifconfig up command
network  The name of the network to which the interface is connected
comments  User-defined text to describe the configured test.
Example
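A representative entry, using placeholder node and network names:
NETWORK_UP_LOCAL, node1, svc1_net, Bring svc1_net up on node1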
Entrance Criteria
The specified node is active and has at least one inactive interface on the specified network.
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
Cluster services continue to run on the cluster nodes where they were active before the test
Resource groups that are in the ERROR state on the specified node and that have a service IP label available on the network can go online, but should not enter the ERROR state
Resource groups on other nodes remain in the same state.
NETWORK_DOWN_LOCAL, node, network, comments
Brings a specified network down on a specified node by running the ifconfig down command.
Note: If one IP network is already unavailable on a node, the cluster may become partitioned. The Cluster Test Tool does not take this into account when determining the success or failure of a test.
node  The name of the node on which to run the ifconfig down command
network  The name of the network to which the interface is connected
comments  User-defined text to describe the configured test.
Example
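A representative entry, using placeholder node and network names:
NETWORK_DOWN_LOCAL, node1, svc1_net, Bring svc1_net down on node1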
Entrance Criteria
The specified node is active and has at least one active interface on the specified network.
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
Cluster services continue to run on the cluster nodes where they were active before the test
Resource groups on other nodes remain in the same state; however, some may be hosted on a different node
If the node hosts a resource group for which the recovery method is set to notify, the resource group does not move.
NETWORK_UP_GLOBAL, network, comments
Brings a specified network up on all nodes that have interfaces on the network. The network specified may be an IP network or a serial network.
network  The name of the network to which the interface is connected
comments  User-defined text to describe the configured test.
Example
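A representative entry, using a placeholder network name:
NETWORK_UP_GLOBAL, svc1_net, Bring svc1_net up on all nodes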
Entrance Criteria
The specified network is inactive on at least one node.
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
Cluster services continue to run on the cluster nodes where they were active before the test
Resource groups that are in the ERROR state on the specified node and that have a service IP label available on the network can go online, but should not enter the ERROR state
Resource groups on other nodes remain in the same state.
NETWORK_DOWN_GLOBAL, network, comments
Brings the specified network down on all nodes that have interfaces on the network. The network specified may be an IP network or a serial network.
Note: If one IP network is already unavailable on a node, the cluster may become partitioned. The Cluster Test Tool does not take this into account when determining the success or failure of a test.
network  The name of the network to which the interface is connected
comments  User-defined text to describe the configured test.
Example
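A representative entry, using a placeholder network name:
NETWORK_DOWN_GLOBAL, svc1_net, Bring svc1_net down on all nodes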
Entrance Criteria
The specified network is active on at least one node.
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
Cluster services continue to run on the cluster nodes where they were active before the test
Resource groups on other nodes remain in the same state.
Network Interface Tests for IP Networks
JOIN_LABEL, iplabel, comments
Brings up a network interface associated with the specified IP label on a specified node by running the ifconfig up command.
Note: You specify the IP label as the parameter. The interface that is currently hosting the IP label is used as the argument to the ifconfig command. The IP label can be a service, boot, or backup (standby) label. If it is a service label, then that service label must be hosted on some interface, for example, when the resource group is actually online. You cannot specify a service label that is not already hosted on an interface.
The only time you could have a resource group online and the service label hosted on an inactive interface would be when the service interface fails but there was no place to move the resource group, in which case it stays online.
Example
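A representative entry, using a placeholder IP label:
JOIN_LABEL, app_svc1, Bring up the interface hosting app_svc1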
Entrance Criteria
The specified interface is currently inactive on the specified node.
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
Specified interface comes up on specified node
Cluster services continue to run on the cluster nodes where they were active before the test
Resource groups that are in the ERROR state on the specified node and that have a service IP label available on the network can go online, but should not enter the ERROR state
Resource groups on other nodes remain in the same state.
FAIL_LABEL, iplabel, comments
Brings down a network interface associated with a specified label on a specified node by running the ifconfig down command.
Note: You specify the IP label as the parameter. The interface that is currently hosting the IP label is used as the argument to the ifconfig command. The IP label can be a service, boot, or standby (backup) label.
Example
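A representative entry, using a placeholder IP label:
FAIL_LABEL, app_svc1, Bring down the interface hosting app_svc1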
Entrance Criteria
The specified interface is currently active on the specified node.
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
Any service labels that were hosted by the interface are recovered
Resource groups that are in the ERROR state on the specified node and that have a service IP label available on the network can go online, but should not enter the ERROR state
Resource groups remain in the same state; however, the resource group may be hosted by another node.
Network Tests for a Non-IP Network
The testing for non-IP networks is part of the NETWORK_UP_GLOBAL, NETWORK_DOWN_GLOBAL, NETWORK_UP_LOCAL and NETWORK_DOWN_LOCAL test procedures.
Resource Group Tests
RG_ONLINE, rg, node | ALL | ANY | RESTORE, comments
Brings a resource group online in a running cluster.
Example
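A representative entry, using placeholder resource group and node names:
RG_ONLINE, rg1, node1, Bring rg1 online on node1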
Entrance Criteria
The specified resource group is offline, resources are available, and all dependencies can be met.
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
The resource group is brought online successfully on the specified node
No resource groups go offline or into ERROR state.
RG_OFFLINE, rg, node | ALL | ANY, comments
Brings a resource group offline that is already online in a running cluster.
Example
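A representative entry, using placeholder resource group and node names:
RG_OFFLINE, rg1, node1, Bring rg1 offline on node1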
Entrance Criteria
The specified resource group is online on the specified node
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
Resource group, which was online on the specified node, is brought offline successfully
Other resource groups remain in the same state.
RG_MOVE, rg, node | ANY | RESTORE, comments
Moves a resource group that is already online in a running cluster to a specific or any available node.
Example
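A representative entry, using a placeholder resource group name:
RG_MOVE, rg1, ANY, Move rg1 to any available node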
Entrance Criteria
The specified resource group must be non-concurrent and must be online on a node other than the target node.
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
Resource group is moved to the target node successfully
Other resource groups remain in the same state.
RG_MOVE_SITE, rg, site | OTHER, comments
Moves a resource group that is already online in a running cluster to an available node at a specific site.
Example
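A representative entry, using placeholder resource group and site names:
RG_MOVE_SITE, rg1, site_2, Move rg1 to a node at site_2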
Entrance Criteria
The specified resource group is online on a node other than a node at the target site.
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
Resource group is moved to the target site successfully
Other resource groups remain in the same state.
Volume Group Tests
VG_DOWN, vg, node | ALL | ANY, comments
Forces an error for a disk that contains a volume group in a resource group.
Example
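A representative entry, using a placeholder volume group name:
VG_DOWN, vg1, ANY, Fail a disk in vg1 on the node where it is online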
Entrance Criteria
The resource group containing the specified volume groups is online on the specified node.
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
The resource group containing the specified volume group successfully moves to another node, or, if it is a concurrent resource group, it goes into an ERROR state
Resource groups may change state to meet dependencies.
Site Tests
SITE_ISOLATION, comments
Fails all the XD_data networks, causing the site_isolation event.
Example
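A representative entry:
SITE_ISOLATION, Fail all XD_data networks to isolate the sites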
Entrance Criteria
At least one XD_data network is configured and is up on any node in the cluster.
Success Indicators
The following conditions indicate success for this test:
The XD_data network fails, no resource groups change state
The cluster becomes stable.
SITE_MERGE, comments
Runs when at least one XD_data network is up to restore connections between the sites, and remove site isolation. Run this test after running the SITE_ISOLATION test.
Example
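A representative entry:
SITE_MERGE, Restore the XD_data networks and merge the sites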
Entrance Criteria
At least one node must be online.
Success Indicators
The following conditions indicate success for this test:
No resource groups change state
The cluster becomes stable.
SITE_DOWN_TAKEOVER, site, comments
Stops cluster services on all nodes at the specified site and moves the resource groups to other nodes.
site  The site that contains the nodes on which cluster services will be stopped
comments  User-defined text to describe the configured test
Example
SITE_DOWN_TAKEOVER, site_1, Stop cluster services on all nodes at site_1, bringing the resource groups offline and moving the resource groups.
Entrance Criteria
At least one node at the site must be online.
Success Indicators
The following conditions indicate success for this test:
Cluster services are stopped on all nodes at the specified site
All primary instance resource groups move to the other site
All secondary instance resource groups go offline
The cluster becomes stable.
SITE_UP, site, comments
Starts cluster services on all nodes at the specified site.
site  The site that contains the nodes on which cluster services will be started
comments  User-defined text to describe the configured test
Example
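A representative entry, using a placeholder site name:
SITE_UP, site_1, Start cluster services on all nodes at site_1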
Entrance Criteria
At least one node at the site must be offline.
Success Indicators
The following conditions indicate success for this test:
Cluster services are started on all nodes at the specified site
Resource groups remain in the same state
The cluster becomes stable.
General Tests
The other tests available for use in HACMP cluster testing are:
Bring an application server down
Terminate the Cluster Manager on a node
Add a wait time for test processing.
SERVER_DOWN, node | ANY, appserv, command, comments
Runs the specified command to stop an application server. This test is useful when testing application availability.
In the automated test, the test uses the stop script to turn off the application.
Example
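A representative entry (the application server name and stop script path shown here come from the automated-test example earlier in this chapter):
SERVER_DOWN, ANY, app1, /app/stop/script, Recover from application failure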
Entrance Criteria
The resource group is online on the specified node.
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
Cluster nodes remain in the same state
The resource group that contains the application server is online; however, the resource group may be hosted by another node, unless it is a concurrent resource group, in which case the group goes into ERROR state.
CLSTRMGR_KILL, node, comments
Runs the kill command to terminate the Cluster Manager on a specified node.
Note: If CLSTRMGR_KILL is run on the local node, you may need to reboot the node. On startup, the Cluster Test Tool automatically starts again. For information about how to avoid manually rebooting the node, see the section Recovering the Control Node after Cluster Manager Stops.
For the Cluster Test Tool to accurately assess the success or failure of a CLSTRMGR_KILL test, do not perform other activities in the cluster while the Cluster Test Tool is running.
node  The name of the node on which to terminate the Cluster Manager
comments  User-defined text to describe the configured test.
Example
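A representative entry, using a placeholder node name:
CLSTRMGR_KILL, node1, Kill the cluster manager on node1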
Entrance Criteria
The specified node is active.
Success Indicators
The following conditions indicate success for this test:
The cluster becomes stable
Cluster services stop on the specified node
Cluster services continue to run on other nodes
Resource groups that were online on the node where the Cluster Manager fails move to other nodes
All resource groups on other nodes remain in the same state.
For information about potential conditions caused by a CLSTRMGR_KILL test running on the control node, see the section Recovering the Control Node after Cluster Manager Stops.
WAIT, seconds, comments
Generates a wait period for the Cluster Test Tool for a specified number of seconds.
seconds  The number of seconds that the Cluster Test Tool waits before proceeding with processing
comments  User-defined text to describe the configured test
Example
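A representative entry:
WAIT, 20, Pause for 20 seconds before running the next test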
Entrance Criteria
Not applicable.
Success Indicators
Not applicable.
Example Test Plan
The following excerpt from a sample Test Plan includes the tests:
NODE_UP
NODE_DOWN_GRACEFUL
It also includes a WAIT interval. The comment text at the end of the line describes the action to be taken by the test.
NODE_UP,ALL,starts cluster services on all nodes
NODE_DOWN_GRACEFUL,waltham,stops cluster services gracefully on node waltham
WAIT,20
NODE_UP,waltham,starts cluster services on node waltham
Running Custom Test Procedures
Before you start running custom tests, ensure that:
Your Test Plan is configured correctly.
You have specified values for test parameters.
You have logging for the tool configured to capture the information that you want to examine for your cluster. For information about customizing verbose logging for the Cluster Test Tool, see the section Error Logging.
The cluster is not in service in a production environment.
Launching a Custom Test Procedure
To run custom testing:
1. Enter smit hacmp
2. In SMIT, select either one of the following options:
Extended Configuration
Problem Determination Tools
3. In the HACMP Cluster Test Tool panel, select Execute Custom Test Procedure.
4. In the Execute Custom Test Procedure panel, enter field values as follows:
Test Plan
(Required) The full path to the Test Plan for the Cluster Test Tool. This file specifies the tests for the tool to execute.
Variable File
(Using a variables file is optional but recommended.) The full path to the variables file for the Cluster Test Tool. This file specifies the variable definitions used in processing the Test Plan.
Verbose Logging
When set to yes, includes additional information in the log file that may help to judge the success or failure of some tests. For more information about verbose logging, see the section Running Automated Tests. Select no to decrease the amount of information logged by the Cluster Test Tool. The default is yes.
Cycle Log File
When set to yes, uses a new log file to store output from the Cluster Test Tool. Select no to append messages to the current log file. The default is yes. For more information about cycling the log file, see the section Log File Rotation.
Abort on Error
When set to no, the Cluster Test Tool continues to run tests after some of the tests being run fail. This may cause subsequent tests to fail because the cluster state is different from the one expected by one of those tests. Select yes to stop processing after the first test fails. For information about the conditions under which the Cluster Test Tool stops running, see the section Cluster Test Tool Stops Running. The default is no.
Note: The tool stops running and issues an error if a test fails and Abort on Error is selected.
5. Press Enter to start running the custom tests.
6. Evaluate the test results.
Evaluating Results
You evaluate test results by reviewing the contents of the log file created by the Cluster Test Tool. When you run the Cluster Test Tool from SMIT, it displays status messages to the screen and stores output from the tests in the file /var/hacmp/log/cl_testtool.log. Messages indicate when a test starts and finishes and provide additional status information. More detailed information, especially when verbose logging is enabled, is stored in the log file than appears on the screen. Information is also logged to the hacmp.out file. For information about the hacmp.out file, see Chapter 2: Using Cluster Log Files in the Troubleshooting Guide.
Criteria for Test Success or Failure
The following criteria determine the success or failure of cluster tests:
Did the cluster stabilize? The Cluster Manager has a status of stable on each node, or is not running. Nodes that should be online are online. If a node is stopped and that node is the last node in the cluster, the cluster is considered stable when the Cluster Manager is inoperative on all nodes.
No events are in the event queue for HACMP. The Cluster Test Tool also monitors HACMP timers that may be active. The tool waits for some of these timers to complete before determining cluster stability. For more information about how the Cluster Test Tool interacts with HACMP timers, see the section Working with Timer Settings.
Has an appropriate recovery event for the test run?
Is a specific node online or offline as specified?
Are all expected resource groups still online within the cluster?
Did a test that was expected to run actually run?
Every test checks to see if it makes sense to be run; this is called a check for “rationality”. A test returning a NOT RATIONAL status indicates the test could not be run because the entrance criteria could not be met; for example, trying to run the NODE_UP test on a node that is already up. A warning message will be issued along with the exit status to explain why the test was not run. Irrational tests do not cause the Cluster Test Tool to abort.
The NOT RATIONAL status indicates the test was not appropriate for your cluster. When performing automated testing, it is important to understand why the test did not run. For Custom Cluster tests, check the sequences of events and modify the test plan to ensure the test runs. Consider the order of the tests and the state of the cluster before running the test plan. For more information, refer to the section Setting up Custom Cluster Testing.
The tool treats availability as the primary criterion when reporting success or failure for a test. For example, if the resource groups that are expected to be available are available, the test passes.
Keep in mind that the Cluster Test Tool is testing the cluster configuration, not testing HACMP. In some cases the configuration may generate an error that causes a test to fail, even though the error is the expected behavior. For example, if a resource group enters the error state and there is no node to acquire the resource group, the test fails.
Note: If a test generates an error, the Cluster Test Tool interprets the error as a test failure. For information about how the Cluster Test Tool determines the success or failure of a test, see the Success Indicators subsections for each test in the section Description of Tests.
Recovering the Control Node after Cluster Manager Stops
If a CLSTRMGR_KILL test runs on the control node and stops the control node, reboot the control node. No action is taken to recover from the failure. After the node reboots, the testing continues.
To monitor testing after the Cluster Test Tool starts again, review output in the /var/hacmp/log/cl_testtool.log file. To determine whether a test procedure completes, run the tail -f command on /var/hacmp/log/cl_testtool.log file.
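For example:
tail -f /var/hacmp/log/cl_testtool.log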
How to Avoid Manual Intervention
You can avoid manual intervention to reboot the control node during testing by:
Editing the /etc/cluster/hacmp.term file to change the default action after an abnormal exit. The clexit.rc script checks for the presence of this file and, if the file is executable, the script calls it instead of halting the system automatically.
Configuring the node to auto-Initial Program Load (IPL) before running the Cluster Test Tool.
Error Logging
The Cluster Test Tool has several useful functions that enable you to work with logs.
Log Files: Overview
If a test fails, the Cluster Test Tool collects information in the automatically created log files. To collect logs, the Cluster Test Tool creates the directory /var/hacmp/cl_testtool if it does not exist. HACMP never deletes the files in this directory. You evaluate the success or failure of tests by reviewing the contents of the Cluster Test Tool log file, /var/hacmp/utilities/cl_testtool.log.
For each test plan that has any failures, the tool creates a new directory under /var/hacmp/cl_testtool. If the test plan has no failures, the tool does not create a log directory. The directory name is unique and consists of the name of the Cluster Test Tool plan file, and the time stamp when the test plan was run.
Log File Rotation
The Cluster Test Tool saves up to three log files and numbers them so that you can compare the results of different cluster tests. The tool also rotates the files with the oldest file being overwritten. The following list shows the three files saved:
/var/hacmp/utilities/cl_testtool.log
/var/hacmp/utilities/cl_testtool.log.1
/var/hacmp/utilities/cl_testtool.log.2
If you do not want the tool to rotate the log files, you can disable this feature from SMIT. For information about turning off this feature, see the section Running Automated Tests or Setting up Custom Cluster Testing.
Log File Entries
The entries in the log file are in the format:
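DD/MM/YYYY_hh:mm:ss: message text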
where DD/MM/YYYY_hh:mm:ss indicates day/month/year_hour/minutes/seconds.
The following example shows the type of output stored in the log file:
04/02/2006/_13:21:55: -------------------------------------------------------
04/02/2006/_13:21:55: | Initializing Variable Table
04/02/2006/_13:21:55: -------------------------------------------------------
04/02/2006/_13:21:55: Using Variable File: /tmp/sample_variables
04/02/2006/_13:21:55: data line: node1=waltham
04/02/2006/_13:21:55: key: node1 - val: waltham
04/02/2006/_13:21:55: -------------------------------------------------------
04/02/2006/_13:21:55: | Reading Static Configuration Data
04/02/2006/_13:21:55: -------------------------------------------------------
04/02/2006/_13:21:55: Cluster Name: Test_Cluster
04/02/2006/_13:21:55: Cluster Version: 7
04/02/2006/_13:21:55: Local Node Name: waltham
04/02/2006/_13:21:55: Cluster Nodes: waltham belmont
04/02/2006/_13:21:55: Found 1 Cluster Networks
04/02/2006/_13:21:55: Found 4 Cluster Interfaces/Device/Labels
04/02/2006/_13:21:55: Found 0 Cluster Resource Groups
04/02/2006/_13:21:55: Found 0 Cluster Resources
04/02/2006/_13:21:55: Event Timeout Value: 720
04/02/2006/_13:21:55: Maximum Timeout Value: 2880
04/02/2006/_13:21:55: -------------------------------------------------------
04/02/2006/_13:21:55: | Building Test Queue
04/02/2006/_13:21:55: -------------------------------------------------------
04/02/2006/_13:21:55: Test Plan: /tmp/sample_event
04/02/2006/_13:21:55: Event 1: NODE_UP: NODE_UP,ALL,starts cluster services on all nodes
04/02/2006/_13:21:55: -------------------------------------------------------
04/02/2006/_13:21:55: | Validate NODE_UP
04/02/2006/_13:21:55: -------------------------------------------------------
04/02/2006/_13:21:55: Event node: ALL
04/02/2006/_13:21:55: Configured nodes: waltham belmont
04/02/2006/_13:21:55: Event 2: NODE_DOWN_GRACEFUL: NODE_DOWN_GRACEFUL,node1,stops cluster services gracefully on node1
04/02/2006/_13:21:55: -------------------------------------------------------
04/02/2006/_13:21:55: | Validate NODE_DOWN_GRACEFUL
04/02/2006/_13:21:55: -------------------------------------------------------
04/02/2006/_13:21:55: Event node: waltham
04/02/2006/_13:21:55: Configured nodes: waltham belmont
04/02/2006/_13:21:55: Event 3: WAIT: WAIT,20
04/02/2006/_13:21:55: Event 4: NODE_UP: NODE_UP,node1,starts cluster services on node1
04/02/2006/_13:21:55: -------------------------------------------------------
04/02/2006/_13:21:55: | Validate NODE_UP
04/02/2006/_13:21:55: -------------------------------------------------------
04/02/2006/_13:21:55: Event node: waltham
04/02/2006/_13:21:55: Configured nodes: waltham belmont
04/02/2006/_13:21:55: . . .
Log File Example
If a test fails, you will see output similar to the following:
=====================================================================
Test 1 Complete - NETWORK_DOWN_LOCAL: fail service network
Test Completion Status: FAILED
=====================================================================
Copying log files hacmp.out and clstrmgr.debug from all nodes to directory /var/hacmp/cl_testtool/rg_fallover_plan.1144942311 on node prodnode1.
After that, you can examine the directory /var/hacmp/cl_testtool/rg_fallover_plan.1144942311 on node prodnode1.
In the log directory, the tool creates separate files for each test. The names for the specific log files stored in the directory have this structure:
<testnum>.<testname>.<node>.<logfile>
where
testnum is the order in which the test appears in the test plan file
testname is the name of the test that failed
node is the node from which the log was collected
logfile is the source of the logging information, either the hacmp.out or clstrmgr.debug file.
For example, if the NETWORK_DOWN_LOCAL test fails and it is the first test that was run, and later in the test plan the fourth test, named RG_MOVE, also fails, you will see the following files in the /var/hacmp/cl_testtool/rg_fallover_plan.1144942311 directory:
1.NETWORK_DOWN_LOCAL.prodnode1.clstrmgr.debug
1.NETWORK_DOWN_LOCAL.prodnode1.hacmp.out
1.NETWORK_DOWN_LOCAL.prodnode2.clstrmgr.debug
1.NETWORK_DOWN_LOCAL.prodnode2.hacmp.out
4.RG_MOVE.prodnode1.clstrmgr.debug
4.RG_MOVE.prodnode1.hacmp.out
4.RG_MOVE.prodnode2.clstrmgr.debug
4.RG_MOVE.prodnode2.hacmp.out
The hacmp.out File
The hacmp.out file also logs the start of each test that the Cluster Test Tool runs on each cluster node. This log entry has the following format:
TestName: datetimestring1:datetimestring2
where TestName is the name of the test being run and the date and time strings are time stamps recorded when the test starts.
Note: The Cluster Test Tool uses the date and time strings to query the AIX 5L error log when necessary.
Verbose Logging: Overview
By default, the Cluster Test Tool uses verbose logging to provide a wealth of information about the results of cluster testing. You can customize the type of information that the tool gathers and stores in the Cluster Test Tool log file.
Note: The Cluster Snapshot utility does not include the Cluster Test Tool log file because this file is specific to HACMP cluster testing at a specific point in time—not an indication of ongoing cluster status.
With verbose logging enabled, the Cluster Test Tool:
Provides detailed information for each test run Runs the following utilities on the control node between the processing of one test and the next test in the list:
Utility  Type of Information Collected
clRGinfo  The location and status of resource groups
errpt  Errors stored in the system error log file
Processes each line in the following files to identify additional information to be included in the Cluster Test Tool log file. The utilities included are run on each node in the cluster after a test finishes running.
File  Type of Information Specified
cl_testtool_log_cmds  A list of utilities to be run to collect additional status information. See the section Customizing the Types of Information to Collect.
cl_testtool_search_strings  Text strings that may be in the hacmp.out file. The Cluster Test Tool searches for these strings and inserts any lines that match into the Cluster Test Tool log file. See the section Adding Data from hacmp.out to the Cluster Test Tool Log File.
If you want to gather only basic information about the results of cluster testing, you can disable verbose logging for the tool. For information about disabling verbose logging for the Cluster Test Tool, see the section Running Automated Tests or Setting up Custom Cluster Testing.
Customizing the Types of Information to Collect
You can customize the types of logging information to be gathered during testing. When verbose logging is enabled for the Cluster Test Tool, it runs the utilities listed in the /usr/es/sbin/cluster/etc/cl_testtool_log_cmds file, and collects status information that the specified commands generate. The Cluster Test Tool runs each of the commands listed in cl_testtool_log_cmds file after each test completes, gathers output for each node in the cluster, and stores this information in the Cluster Test Tool log file.
You can collect information specific to a node by adding or removing utilities from the list. For example, if you have an application server running on two of the nodes in a four-node cluster, you could add application-specific commands to the list on the nodes running the application servers.
If you want all of the cluster nodes to use the same cl_testtool_log_cmds file, you can add it to a file collection. For information about including files in a file collection, see Chapter 7: Verifying and Synchronizing an HACMP Cluster.
By default, the cl_testtool_log_cmds file includes the following utilities:
The file also contains entries for the following utilities, but they are commented out and not run. If you want to run any of these utilities between each test, open the file and remove the comment character from the beginning of the command line for the utility.
You can also add and remove commands from the cl_testtool_log_cmds file.
Adding Data from hacmp.out to the Cluster Test Tool Log File
You can add messages that include specified text in the hacmp.out file to the Cluster Test Tool log file. With verbose logging enabled, the tool uses the /usr/es/sbin/cluster/etc/cl_testtool/cl_testtool_search_strings file to identify text strings to search for in hacmp.out. For any text string that you specify on a separate line in the cl_testtool_search_strings file, the tool:
Searches the hacmp.out file for a matching string
Logs the line containing that string, accompanied by the line number from the hacmp.out file, to the Cluster Test Tool log file.
You can use the line number to locate the line in the hacmp.out file and then review that line within the context of other messages in the file.
By default, the file contains the following lines:
You can edit the cl_testtool_search_strings file on each node to specify a search string specific to a node. This way, the cl_testtool_search_strings file is different on different nodes.
If you want all of the cluster nodes to use the same cl_testtool_search_strings file, you can add it to a file collection and synchronize the cluster. For information about including files in a file collection, see Chapter 7: Verifying and Synchronizing an HACMP Cluster.
Note: Cluster synchronization does not propagate a cl_testtool_search_strings file to other nodes in a cluster unless the file is part of a file collection.
To edit the cl_testtool_search_strings file:
On each line of the file, specify a single text string that you want the tool to locate in the hacmp.out file.
Fixing Problems when Running Cluster Tests
This section discusses the following issues that you may encounter when testing a cluster: the Cluster Test Tool stops running, the control node becomes unavailable, the cluster does not return to a stable state, testing does not progress as expected, and unexpected test results.
Cluster Test Tool Stops Running
The Cluster Test Tool can stop running under the following conditions:
The Cluster Test Tool fails to initialize
A test fails and Abort on Error is set to yes for the test procedure
The tool times out waiting for cluster stabilization, or the cluster fails to stabilize after a test
An error prohibits the Cluster Test Tool from running a test, such as a configuration problem in AIX 5L or a missing script
A cluster recovery event fails and requires user intervention.
Control Node Becomes Unavailable
If the control node experiences an unexpected failure while the Cluster Test Tool is running, the testing stops. No action is taken to recover from the failure.
To recover from the failure:
1. Bring the node back online and start cluster services in the usual manner.
2. Stabilize the cluster.
3. Run the test again.
Note: The failure of the control node may invalidate the testing that occurred prior to the failure.
If a CLSTRMGR_KILL test runs on the control node, the node and cluster services need to restart. For information about handling this situation, see the section Recovering the Control Node after Cluster Manager Stops.
Cluster Does Not Return to a Stable State
The Cluster Test Tool stops running tests after a timeout if the cluster does not return to a stable state either:
While a test is running
As a result of a test being processed.
The timeout is based on ongoing cluster activity and the cluster-wide event-duration time until warning values. If the tool times out, an error appears on the screen and is logged to the Cluster Test Tool log file before the tool stops running.
After the cluster returns to a stable state, it is possible that the cluster components, such as resource groups, networks, and nodes, are not in a state consistent with the specifications of the list of tests. If the tool cannot run a test due to the state of the cluster, the tool generates an error. The Cluster Test Tool continues to process tests.
If the cluster state does not let you continue a test, you can:
1. Reboot cluster nodes and restart the Cluster Manager.
2. Inspect the Cluster Test Tool log file and the hacmp.out file to get more information about what may have happened when the test stopped.
3. Review the timer settings for the following cluster timers, and make sure that the settings are appropriate to your cluster:
Time until warning
Stabilization interval
Monitor interval.
For information about timers in the Cluster Test Tool, and about how application monitor timers can affect whether the tool times out, see the section Working with Timer Settings.
Working with Timer Settings
The Cluster Test Tool requires a stable HACMP cluster for testing. If the cluster becomes unstable, the time that the tool waits for the cluster to stabilize depends on the activity in the cluster:
No activity. The tool waits for twice the event duration time until warning interval (also referred to as config_too_long), then times out.
Activity present. The tool calculates a timeout value based on the number of nodes in the cluster and the setting for the time until warning interval.
If the time until warning interval is too short for your cluster, testing may time out. To review or change the setting for the time until warning interval, in HACMP SMIT, select HACMP Extended Configuration > Extended Performance Tuning Parameters Configuration and press Enter.
For complete information on tuning event duration time, see the section Tuning Event Duration Time Until Warning in Chapter 5: Configuring Cluster Events.
The settings for the following timers configured for an application monitor can also affect whether testing times out:
Stabilization interval
Monitor interval.
The settling time for resource groups does not affect whether or not the tool times out.
Stabilization Interval for an Application Monitor
If this timer is active, the Cluster Test Tool does not time out when waiting for cluster stability. If the monitor fails, however, and recovery actions are underway, the Cluster Test Tool may time out before the cluster stabilizes.
Make sure the stabilization interval configured in HACMP is appropriate for the application being monitored.
For information about setting the stabilization interval for an application, see Chapter 4: Configuring HACMP Cluster Topology and Resources (Extended).
Monitor Interval for a Custom Application Monitor
When the Cluster Test Tool runs a server_down test, it waits for the length of time specified by the monitor interval before the tool checks for cluster stability. The monitor interval defines how often to poll the application to make sure that the application is running.
The monitor interval should be long enough to allow recovery from a failure. If the monitor interval is too short, the Cluster Test Tool may time out when a recovery is in process.
For information about setting the monitor interval for an application, see Chapter 4: Configuring HACMP Cluster Topology and Resources (Extended).
Testing Does Not Progress as Expected
If the Cluster Test Tool is not processing tests and recording results as expected, use the Cluster Test Tool log file to try to resolve the problem:
1. Ensure that verbose logging for the tool is enabled.
2. View logging information from the Cluster Test Tool log file /var/hacmp/utilities/cl_testtool.log. The tool directs more information to the log file than to the screen.
3. Add other tools to the cl_testtool_log_cmds file to gather additional debugging information. This way you can view this information within the context of the larger log file.
For information about adding commands to the cl_testtool_log_cmds file, see the section Customizing the Types of Information to Collect.
Unexpected Test Results
The basic measure of success for a test is availability. In some instances, you may consider that a test has passed, when the tool indicates that the test failed. Be sure that you are familiar with the criteria that determines whether a test passes or fails. For information about the criteria for a test passing or failing, see the section Evaluating Results.
Also ensure that:
Settings for cluster timers are appropriate to your cluster. See the section Cluster Does Not Return to a Stable State.
Verbose logging is enabled and customized to investigate an issue. See the section Testing Does Not Progress as Expected.