Chapter 9: Starting and Stopping Cluster Services
This chapter explains how to start and stop cluster services on cluster nodes and clients. The following sections describe these tasks in detail:
Overview
HACMP 5.4 includes new features when you start or stop cluster services:
Start cluster services. When you start the cluster services, HACMP by default automatically activates the resources according to how you defined them, taking into consideration application dependencies, application start and stop scripts, dynamic attributes and other parameters. That is, HACMP automatically manages (and activates, if needed) resource groups and applications in them. Note that you can also start HACMP with the option to manage resource groups manually. This tells HACMP not to acquire any resource groups (and applications) automatically for you.
With HACMP 5.4, you can start HACMP cluster services on the node(s) without stopping your applications, by selecting an option from SMIT (System Management (C-SPOC) > Manage HACMP Services > Start Cluster Services).
HACMP relies on the application monitor and application startup script to verify whether it needs to start the application for you or whether the application is already running (HACMP attempts not to start a second instance of the application).
Shut down the cluster services. During an HACMP shutdown, you may select one of the following three actions for the resource groups:
- Bring Resource Groups Offline
- Move Resource Groups to other node(s)
- Unmanage Resource Groups
For more information on resource group states, see Appendix B: Resource Group Behavior during Cluster Events.
Starting Cluster Services
In HACMP 5.4, you can allow your applications that run outside of HACMP to continue running during installation of HACMP and when starting HACMP. There is no need to stop, restart or reboot the system or applications.
A Note on Application Monitors
HACMP 5.4 checks for running applications by using the configured application monitor. If the monitor indicates that the application is already running, HACMP does not start a second instance of the application. If no application monitors are configured in HACMP, you can write an application start script that checks the state of the application before starting it.
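For example, a minimal idempotent start script might look like the following sketch. The application name appserver, its process name, and the /opt/appserver/bin path are hypothetical placeholders for your own application:

#!/bin/ksh
# Hypothetical start script: exit quietly if the application is already
# running, so that HACMP never launches a second instance.
if ps -eo comm= | grep -qx appserver ; then
    echo "appserver is already running; not starting a second instance"
    exit 0
fi
exec /opt/appserver/bin/appserver start   # assumed start command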
Application monitors, configurable in HACMP, are a critical piece of the HACMP cluster configuration; they enable HACMP to keep applications highly available. When HACMP starts an application server on a node, it also periodically monitors the application (using the monitor that you configure) to make sure that the application is up and running.
A faulty application monitor may fail to detect a failed application; as a result, HACMP does not recover it. Conversely, a monitor may erroneously report a running application as failed, causing HACMP to move the application to a takeover node and resulting in unnecessary downtime. To summarize, we highly recommend properly configured and tested application monitors for all applications that you want to keep highly available with HACMP. Use them as follows:
- Use a process monitor if the intent is to monitor whether the process(es) exist on the UNIX system.
- Use a custom monitor if the intent is to check the health of the application, for example, whether the database is still functioning by querying a database table (a minimal sketch of a custom monitor follows this list).
- Use both process and custom monitors when needed.
During verification, HACMP issues a warning if an application monitor is not configured.
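For illustration, the following is a minimal sketch of a custom monitor method, assuming a hypothetical db_ping client command for your database; HACMP interprets a zero exit code as healthy and a non-zero exit code as an application failure:

#!/bin/ksh
# Hypothetical custom application monitor: probe the database with a
# trivial health check and translate the result into an exit code.
if /opt/db/bin/db_ping -t 5 >/dev/null 2>&1 ; then
    exit 0   # application healthy
else
    exit 1   # non-zero exit: HACMP treats the application as failed
fi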
For information on configuring an application monitor, see Configuring Multiple Application Monitors in Chapter 4: Configuring HACMP Cluster Topology and Resources (Extended).
Procedure for Starting Cluster Services
To start HACMP cluster services, as the root user, perform the following steps:
Note: Perform the following only after configuring and synchronizing the cluster. For more information, see Chapter 3: Configuring an HACMP Cluster (Standard).
1. Enter the fastpath smit cl_admin
2. In SMIT, select Manage HACMP Services > Start Cluster Services and press Enter.
3. Enter field values as follows:
Start now, on system restart or both
Indicate how you want to start cluster services: when you commit the values on this panel by pressing Enter (now), when the operating system reboots (on system restart), or on both occasions. Choosing on system restart or both means that the cluster services are always brought up automatically after a system reboot.
Note: When you start the HACMP cluster services with the Manage Resource Group option set to Manually and select the option both, the timing of a power loss or a node reboot may affect whether the node is in the OFFLINE or UNMANAGED state after the system reboot.

Start Cluster Services on these nodes
Enter the name(s) of one or more nodes on which you want to start cluster services. Alternatively, you can select nodes from a picklist. Separate multiple nodes with a comma.

Manage Resource Groups
Automatically (default). HACMP brings resource group(s) online according to the resource groups' configuration settings and the current cluster state, and starts managing the resource group(s) and applications for availability. When you start HACMP cluster services with the Manage Resource Group option set to Automatically, HACMP automatically activates resource groups on the node(s) according to their policies and locations, and also starts applications.
If the application is already running, HACMP does not necessarily start it on the same node on which it is currently running. That is, when this option is selected, HACMP determines on which node to bring a resource group online based on the configured resource group policies, resource group dependency configuration, and available resources on the node. If you select this option while starting the cluster services, it is suggested that you stop the applications and resources so that HACMP can start them on the appropriate node.
Manually. HACMP does not activate resource groups while the cluster services on the selected node are started. After you start cluster services, you can bring any resource groups online or offline, as needed, using the HACMP Resource Group and Application Management SMIT menu (clRGmove). For more information, see Starting HACMP Cluster Services with Manually Managed Resource Groups.

BROADCAST message at startup?
Indicate whether you want to send a broadcast message to all nodes when the cluster services start. The default is true. Alternatively, you can set the broadcast message setting for each local node using the Extended Configuration > Extended Cluster Service Settings path.

Startup Cluster Information Daemon?
Indicate whether you want to start the clinfoES daemon. For example, if your application uses the Cluster Information daemon, if you use the clstat monitor, or if you want to run event emulation, set this field to true. Otherwise, set it to false. The value that you enter in this field works in conjunction with the value in the Start now, on system restart or both field: if you set this field to true and the Start now, on system restart or both field to both, the clinfoES daemon is also started whenever the cluster services are started.

Ignore Verification Errors?
Set this value to false (the default) to stop all selected nodes from starting cluster services if verification finds errors on any node. Set this value to true to start cluster services even if verification finds errors on the specified nodes or in the cluster in general. This setting should be used with caution. For more information, see the section Automatic Verification and Synchronization.

Automatically correct errors found during cluster start?
Note: This field is available only if the automatic verification and synchronization option has been enabled. For more information, see Modifying the Startup of Cluster Services.
- Select Interactively to receive prompts to correct certain errors as they are found during verification. (The Interactively option is not available in WebSMIT.)
- Select No if you do not want HACMP to correct any verification errors automatically. If you select No, you must correct any errors manually.
- Select Yes if you want HACMP to correct cluster verification errors automatically without first prompting you.
Note: Not all verification errors are automatically corrected; some must be corrected manually. For more information, see the section Automatic Verification and Synchronization.
4. Press Enter.
The system performs verification and synchronization as needed, and then starts the cluster services on the nodes specified, activating the cluster configuration that you have defined. The time that it takes the commands and scripts to run depends on your configuration (for example, the number of disks, the number of interfaces to configure, the number of filesystems to mount, and the number of applications being started).
SMIT displays a command status window. Note that when the SMIT panel indicates the completion of the cluster startup, HACMP processing of the resource groups in most cases has not yet completed. To verify that the processing has completed, use /usr/es/sbin/cluster/clstat, described in Chapter 10: Monitoring an HACMP Cluster.
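For example, assuming the default installation paths, you can check the state from the command line (flags as commonly documented; verify them on your release):

/usr/es/sbin/cluster/clstat -a    # ASCII display of cluster and resource group state
lssrc -ls clstrmgrES              # reports the Cluster Manager daemon state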
Starting HACMP Cluster Services with Manually Managed Resource Groups
Set the cluster services Manage Resource Group startup option to Manually when you want more control over the node on which an application should run. This method ensures that the services provided by the application server are not interrupted.
When you choose this option to start the HACMP cluster services, the resource groups on the node remain in the OFFLINE or UNMANAGED state, depending on whether this is a cold startup or a start after the node was stopped with its resource groups placed in an UNMANAGED state.
Note: A resource group in the UNMANAGED state does not mean that the actual resources in the resource group are stopped. From HACMP's point of view, it means only that HACMP is not managing the resources (and the applications) of the resource group for availability.
Note that you must either have an application monitor configured that HACMP can use to check the application, or your application start scripts must be intelligent enough not to start the application if it is already running.
If you want to activate resource groups that are not brought online automatically, use the Resource Group Management utility (clRGmove) to bring the OFFLINE state resource groups to the ONLINE state.
Consider the following example: an application is running on a node that is not its primary node, and you know that during startup HACMP would move the resource group containing the application to another node (according to the specified resource group policy). Starting HACMP cluster services with the Manage Resource Group option set to Manually tells HACMP not to activate the resource groups during startup. You can later use a user-requested rg-move to bring the resource group ONLINE on the node where your application is already running.
To start cluster services on a resource group that is manually managed:
1. Enter smitty hacmp
2. Select System Management (C-SPOC) > Manage HACMP Services > Bring Resource Group Online and press Enter.
3. Select the node where your application is running.
4. Press Enter.
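Equivalently, the clRGmove utility can be invoked directly. The following sketch assumes a resource group named app_rg and a node named nodeA (hypothetical names); verify the flags against your release before use:

/usr/es/sbin/cluster/utilities/clRGmove -g app_rg -n nodeA -u   # bring app_rg ONLINE on nodeA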
Starting Cluster Services on a Node with a Resource Group in the UNMANAGED State
Resource groups may be in the UNMANAGED state on a node if cluster services on that node have been stopped using the Unmanage Resource Groups option. (For more information, see Stopping Cluster Services later in this chapter.)
This Unmanage Resource Groups option causes HACMP to stop providing high availability services to the resource group; that is, the resource groups will not fall over due to resource failures. This option is intended for temporary situations, such as when you want to upgrade HACMP or perform maintenance without bringing your applications offline.
Starting cluster services on a node that was stopped with the Unmanage Resource Groups option therefore returns any resource group in the UNMANAGED state on that node to the state it was in before becoming UNMANAGED. While bringing a resource group ONLINE from the UNMANAGED state, HACMP checks every resource in the resource group and activates any resource it finds inactive. It is therefore critical to configure application monitors so that HACMP can correctly detect a running application and does not try to start a second instance.
In cases where you want to bring a resource group from an UNMANAGED state to an ONLINE state on a different node (because the node that was stopped using UNMANAGED option is unavailable), you should do the following:
1. Bring the resource groups to the OFFLINE state using a user-requested rg-move SMIT panel. Note that during this operation, HACMP will not stop any resources as the node that originally hosted the resource group is no longer available.
2. Ensure that all the resources that are configured in the resource group are OFFLINE, including the application, if any.
3. Bring the resource groups from the OFFLINE state to the ONLINE state using the resource group migration utility clRGmove or the SMIT option, just as in previous releases.
Note: In the case where the node that was stopped is still available, moving a resource group from an UNMANAGED state to OFFLINE state would result in the stopped node actually releasing the resources of the resource group.
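This sequence can be sketched with clRGmove, again using the hypothetical names app_rg, nodeA (the stopped node), and nodeB (the takeover node); the exact flags and node arguments should be verified against your release:

/usr/es/sbin/cluster/utilities/clRGmove -g app_rg -n nodeA -d   # step 1: UNMANAGED to OFFLINE
/usr/es/sbin/cluster/utilities/clRGmove -g app_rg -n nodeB -u   # step 3: OFFLINE to ONLINE on nodeB

Between the two commands, confirm manually that every resource in the group, including the application, is offline (step 2).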
Modifying the Startup of Cluster Services
Typically, you should use the default cluster services startup settings—especially the verification setting, which is automatically enabled to ensure a safe startup. However, you can modify these settings by following the procedure described below.
Procedure for Modifying Startup of Cluster Services
To modify the startup of cluster services:
1. Enter the fastpath smit cl_admin or smitty hacmp.
2. Select Extended Configuration > Extended Cluster Service Settings and press Enter.
3. Enter field values in the SMIT panel as follows:
Stopping Cluster Services
You typically stop cluster services:
- Before making any hardware or software changes or other scheduled node shutdowns or reboots. Failing to do so may cause unintended cluster events to be triggered on other nodes.
- Before certain reconfiguration activity. Some changes to the cluster information stored in the Configuration Database require stopping and restarting the cluster services on all nodes for the changes to become active. For example, if you wish to change the name of the cluster, the name of a node, or the name of a network interface, you must stop and restart cluster services on that node or on all nodes, depending on the cluster setup. For more information about which changes to the cluster require HACMP reconfiguration, see Appendix A: 7x24 Maintenance.
When stopping cluster services, minimize activity on the system. If the node you are stopping is currently providing highly available services, notify users of your intentions if their applications will be unavailable. Let them know when services will be restored.
Procedure for Stopping Cluster Services
The steps below describe the procedure for stopping cluster services on a single node or on all nodes in a cluster by using the C-SPOC utility on one of the cluster nodes.
To stop cluster services:
1. Enter the fastpath smit cl_admin or smitty hacmp.
2. Select System Management (C-SPOC) and press Enter.
3. Select Manage HACMP Services > Stop Cluster Services and press Enter.
4. Enter field values in the SMIT panel as follows:
Select an Action on Resource Groups
Indicate the type of shutdown:
- Bring Resource Groups Offline: HACMP stops all managed resources currently ONLINE on the node being stopped. HACMP will not activate these resources on any other nodes, that is, no fallover.
This option is equivalent to stopping cluster services gracefully in previous releases.
After successfully stopping all managed resources, HACMP stops RSCT services and the Cluster Manager daemon goes into the ST_INIT state.
- Move Resource Groups. HACMP stops all managed resources currently ONLINE on the node being stopped. The resource groups will be moved to a takeover node according to the configured resource group policies (if defined), dependency configurations (if defined) and available resources.
This option is equivalent to the graceful with takeover option in previous releases.
After successfully stopping all managed resources, HACMP stops RSCT services and the Cluster Manager daemon goes into the ST_INIT state.
- Unmanage Resource Groups. The cluster services are stopped immediately. Resources that are online on the node are not stopped. Applications continue to run. This option is equivalent to the forced down option in previous releases.
For more information, see Stopping HACMP Cluster Services without Stopping Applications.
HACMP will not stop the managed resources; applications remain functional.
HACMP does not manage the resources on these nodes.
HACMP continues to run and RSCT remains functional.
Note: In HACMP 5.4, on a node that has Enhanced Concurrent Mode (ECM) volume groups, cluster services can be stopped with the resource groups placed in an unmanaged state. RSCT services will be left running so that ECM remains functional. If you stop cluster services with this option, the resource groups that are active on this node go into the unmanaged state. Once a resource group is in the unmanaged state, HACMP does not process any resource failures. This applies to hardware resources such as disks and adapters as well as to any managed applications. Refer to the section Procedure for Starting Cluster Services for information on reintegrating a node on which the cluster services were stopped back into the cluster.

Stop now, on system restart or both
Indicate whether you want the cluster services to stop now, at restart (when the operating system reboots), or on both occasions. If you select restart or both, the entry in the /etc/inittab file that starts cluster services is removed. Cluster services will no longer come up automatically after a reboot.

BROADCAST cluster shutdown?
Indicate whether you want to send a broadcast message to users before the cluster services stop. If you specify true, a message is broadcast on all cluster nodes.
5. Press Enter. The system stops the cluster services on the nodes specified.
If the stop operation fails, check the /tmp/cspoc.log file for error messages. This file contains the command execution status of the C-SPOC command executed on each cluster node.
Note: After stopping cluster services, you must wait a minimum of two minutes for RSCT to quiesce before restarting cluster services. If you are using HAGEO, wait a minimum of four minutes.
Stopping HACMP Cluster Services without Stopping Applications
In HACMP 5.4, in addition to the other ways of stopping HACMP cluster services, you have a well-defined way to stop cluster services without stopping your services and applications.
Note: Prior to HACMP 5.4, stopping HACMP cluster services so that HACMP no longer reacts to application failures was referred to as forcing down the cluster services. While forcing down the cluster services was desirable in many instances, in some cases HACMP's actions on resource groups after a force down left the applications unnecessarily inactive. In HACMP 5.4, this operation, stopping HACMP cluster services without disrupting the applications, is handled consistently according to the action you choose.
To stop cluster services without stopping your applications:
1. Enter the fastpath smit cl_admin or smitty hacmp.
2. Select System Management (C-SPOC) and press Enter.
3. Select Manage HACMP Services > Stop Cluster Services and press Enter.
4. Choose Unmanage Resource Groups.
No matter what type of resource group you have, if you stop cluster services on the node on which this group is active and do not stop the application that belongs to the resource group, HACMP puts the group into an UNMANAGED state and keeps the application running according to your request.
The resource group that contains the application remains in the UNMANAGED state (until you tell HACMP to start managing it again) and the application continues to run. While in this condition, HACMP and the RSCT services continue to run, providing services to ECM VGs that the application servers may be using.
You can tell HACMP to start managing it again either by restarting Cluster Services on the node, or by using SMIT to move the resource group to a node that is actively managing its resource groups.
If you have instances of replicated resource groups using the Extended Distance capabilities of the HACMP/XD product, the UNMANAGED SECONDARY state is used for resource groups that were previously in the ONLINE SECONDARY state.
You can view the new states of the resource groups using the cluster utilities clstat and clRGinfo.
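For example, running clRGinfo with no arguments lists each resource group and its state on every node:

/usr/es/sbin/cluster/utilities/clRGinfo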
You can dynamically reconfigure (DARE) the cluster configuration while some cluster nodes have resource groups in the unmanaged state.
Warning about Placing Resource Groups in an Unmanaged State
When you stop cluster services on a node and place resource groups in an UNMANAGED state, HACMP stops managing the resources on that node. HACMP does not react to individual resource failures, application failures, or even a node crash.
Because the resources of a system are not highly available when you place resource groups in an unmanaged state, HACMP 5.4 prints a message periodically that the node has suspended managing the resources.
The ability to stop a node and place resource groups in an UNMANAGED state is intended for use during brief intervals for applying updates or for maintenance of the cluster hardware or software.
When You May Want to Stop HACMP Cluster Services without Stopping Applications
In general, HACMP cluster services are rarely the cause of problems in your configuration. However, you may still want to stop HACMP cluster services on one or more nodes, for example, while troubleshooting a problem or performing maintenance work on a node.
Also, you may want to stop HACMP cluster services from running without disrupting your application if you expect that your activities will interrupt or stop applications or services. During this period of time, you do not want HACMP to react to any planned application “failures” and cause a resource group to move to another node. Therefore, you may want to remove HACMP temporarily from the picture.
Abnormal Termination of Cluster Manager Daemon
The AIX System Resource Controller (SRC) subsystem monitors the Cluster Manager daemon process. If the SRC detects that the Cluster Manager daemon has exited abnormally (without being shut down using the clstop command), it executes the /usr/es/sbin/cluster/utilities/clexit.rc script to halt the system. This prevents unpredictable behavior from corrupting the data on the shared disks. See the clexit.rc man page for additional information.
The clexit.rc script creates an AIX 5L error log entry. Here is an example showing the long output:
LABEL:           OPMSG
IDENTIFIER:      AA8AB241
Date/Time:       Fri Jan 7 10:44:46
Sequence Number: 626
Machine Id:      000001331000
Node Id:         ppstest8
Class:           O
Type:            TEMP
Resource Name:   OPERATOR

Description
OPERATOR NOTIFICATION

User Causes
ERRLOGGER COMMAND

Recommended Actions
REVIEW DETAILED DATA

Detail Data
MESSAGE FROM ERRLOGGER COMMAND
clexit.rc : Unexpected termination of clstrmgrES

The clexit.rc error message in short form looks like this:

AA8AB241 0107104400 T O OPERATOR OPERATOR NOTIFICATION

Warning: Never use the kill -9 command on the clstrmgr daemon. Using the kill command causes the clstrmgr daemon to exit abnormally. This causes the System Resource Controller (SRC) facility to run the script /usr/es/sbin/cluster/utilities/clexit.rc, which halts the system immediately and causes the surviving nodes to initiate fallover.
You can modify the file /etc/cluster/hacmp.term to change the default action after an abnormal exit. The clexit.rc script checks for the presence of this file, and if you have made it executable, the instructions there will be followed instead of the automatic halt called by clexit.rc. Please read the caveats contained in the /etc/cluster/hacmp.term file, however, before making any modifications.
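For example, to have clexit.rc follow the instructions in /etc/cluster/hacmp.term instead of halting the node, make the file executable (its contents are site-specific and not shown here):

chmod u+x /etc/cluster/hacmp.term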
AIX 5L Shutdown and Cluster Services
If you prefer to have resources taken over, then prior to issuing the AIX 5L shutdown command, stop HACMP cluster services with the Move Resource Groups option.
When the AIX operating system is shut down on a node where HACMP services are active, the Cluster Manager either recovers the resource groups on a takeover node or simply leaves them in the offline state, depending on the command line flags passed to the shutdown command.
If you issue the shutdown command with the -F or -r flag, or a combination of the two, the resource groups are taken to the OFFLINE state; they do not fall over to the takeover nodes. The intent is that when the node starts back up, it may start the resource groups on the same node.
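For example, the following reboots the node immediately while HACMP services are active; under the behavior described above, the resource groups go OFFLINE rather than falling over (confirm this behavior on your AIX and HACMP levels):

shutdown -Fr   # -F: fast shutdown, -r: reboot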
If the shutdown command is issued with other options (such as -h), the node may not restart. In this case, HACMP will move the resource group to a takeover node.
Note: Using any other method of shutting down the AIX operating system (such as a halt command) or if the AIX operating system crashes results in HACMP recovering the failed application to a takeover node.
Stopping HACMP Cluster Services and RSCT
HACMP 5.4 manages the RSCT services automatically. When you stop cluster services with the Move Resource Groups option, the RSCT services are stopped after all the resources and applications on the node are released. When you stop cluster services with the Unmanage Resource Groups option, the Cluster Manager puts the resource groups into the UNMANAGED state but continues to run in the background, leaving the RSCT services up and running.
HACMP does not stop the RSCT services when you stop cluster services because the Enhanced Concurrent Mode (ECM) volume groups, and not only HACMP itself, use RSCT services. Stopping RSCT services would vary off the ECM volume groups and would affect the applications that use them.
There could be rare cases when you need to stop RSCT, for example, to perform an RSCT upgrade. If you need to upgrade RSCT, you can stop and restart it, using SMIT options under the HACMP Problem Determination Tools menu. For the steps needed to stop, restart and upgrade RSCT, see the Troubleshooting Guide.
Maintaining Cluster Information Services
The cluster services on clients consist solely of the clinfoES daemon, which provides clients with status information about the cluster.
Note that the /etc/inittab file is modified when the HACMP software is installed to start the clinfoES daemon whenever the system is rebooted.
The Cluster Information Daemon (clinfo) retrieves information about the cluster configuration and the state of the cluster, topology and resources from the Management Information Base (MIB) and the Cluster Manager on local or remote nodes. The Cluster Manager updates the MIB with this information.
The clinfo daemon populates internal, dynamically allocated data structures with information for each cluster. The cluster(s) can be any combination of local or remote. The clinfo daemon calls the clinfo.rc script in response to cluster changes.
Starting Clinfo on a Client
Use the /usr/es/sbin/cluster/etc/rc.cluster script or the startsrc command to start clinfo on a client, as shown below:

/usr/es/sbin/cluster/etc/rc.cluster

You can also use the standard AIX 5L startsrc command:

startsrc -s clinfoES
Stopping Clinfo on a Client
Use the standard AIX 5L stopsrc command to stop clinfo on a client machine:

stopsrc -s clinfoES
Enabling Clinfo for Asynchronous Event Notification
In previous versions of HACMP, clinfo periodically polled the SNMP process for information. In HACMP 5.3 and up, clinfo only obtains data from SNMP when it is requested. You can optionally choose to have clinfo receive notification of events as asynchronous messages (traps).
Only one SNMP application can receive traps. If you are running NetView, you cannot enable clinfo to receive traps.
To enable asynchronous event notification:
1. Start clinfo with the -a option, by entering the following:
chssys -s clinfoES -a "-a"

2. Verify that the SRC has the correct command line arguments for clinfo, by entering the following:

lssrc -Ss clinfoES | awk -F: '{print $3}'

3. Edit the /etc/snmpd.conf file on the nodes that will send traps. As installed, traps are directed to the loopback address. (clinfo receives those traps generated by the Cluster Manager on the same node.) See the comments at the beginning of the /etc/snmpd.conf file for a description of all fields.
Note: The default version of the snmpd.conf file for AIX 5L v.5.2 and AIX 5L v. 5.3 is snmpdv3.conf.
See the AIX documentation for full information on the snmpd.conf file. Version 3 has some differences from Version 1.
a. Find the trap line at the end of the file. It looks like this:
view 1.17.2 system enterprises view
trap public 127.0.0.1 1.2.3 fe # loopback

b. Add trap lines as desired. Multiple clinfo processes can receive traps from the Cluster Manager. Make sure that the "1.2.3 fe" field is unique.
An entry may look like the following example, with two more trap lines added:
trap public 127.0.0.1 1.2.3 fe #loopback
trap public 123.456.789.1 #adam
trap public 123.456.789.2 #eve

c. Stop and restart the snmpd process on the hosts where you made the changes in the snmpd.conf file:

stopsrc -s snmpd
startsrc -s snmpd

Gratuitous ARP Support
If you are using IPAT via IP Aliases, make sure all your clients support the gratuitous ARP functionality of TCP/IP. For more information, see Steps for Changing the Tuning Parameters of a Network Module to Custom Values in Chapter 13: Managing the Cluster Topology.