Chapter 6: Configuring Cluster Events
The HACMP system is event-driven. An event is a change of status within a cluster. When the Cluster Manager detects a change in cluster status, it executes the designated script to handle the event and initiates any user-defined customized processing.
To configure cluster events, you indicate the script that handles the event and any additional processing that should accompany an event as described below. You can define multiple customized pre- and post-event scripts (for a particular cluster event). The environment variable EVENT_STAGE will be set to the appropriate value of pre, post, notify, or recovery when the corresponding event command is run.
The SMIT HACMP Extended Event Configuration menu includes:
Considerations for Pre- and Post-Event Scripts
Take into account the following information when planning your pre- and post-event scripts.
Using Shell Environment Variables in Pre- and Post-Event Scripts
When writing your pre- or post-event script, none of the shell environment variables defined in /etc/environment will be available to your program. If you need to use any of these variables you must explicitly source them by including this line in your script:
. /etc/environment
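As a minimal sketch of this pattern, the fragment below wraps the sourcing step in a function and reports the stage the script was called for. The function names load_environment and log_stage and the output format are illustrative assumptions, not part of HACMP; EVENT_STAGE is the variable described earlier in this chapter.

```shell
# Hypothetical helpers for a pre- or post-event script (illustrative only).

# load_environment sources the given file (default: /etc/environment)
# when it is readable, because event scripts do not inherit its variables.
load_environment() {
    envfile=${1:-/etc/environment}
    if [ -r "$envfile" ]; then
        . "$envfile"
    fi
    return 0
}

# log_stage prints which stage (pre, post, notify, or recovery)
# the script was called for, or "unknown" if EVENT_STAGE is not set.
log_stage() {
    printf 'stage=%s\n' "${EVENT_STAGE:-unknown}"
}
```

An event script would typically call load_environment once near the top, before it uses any variable that is defined in /etc/environment.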
event_error Now Indicates Failure on a Remote Node
In releases prior to HACMP 5.2, non-recoverable event script failures resulted in the event_error event being run on the cluster node where the failure occurred. The remaining cluster nodes did not indicate the failure.
With HACMP 5.2 and up, all cluster nodes run the event_error event if any node has a fatal error. All nodes log the error and call out the failing node name in the hacmp.out log file. If you have added pre- or post-event scripts for the event_error event, be aware that they are called on each node, not just on the failing node.
A Korn shell environment variable, EVENT_FAILED_NODE, is set to the name of the node where the event script failed. Use this variable in your pre- or post-event scripts to locate the failure.
The variable LOCALNODENAME identifies the local node; if LOCALNODENAME is not the same as EVENT_FAILED_NODE, then the failure occurred on a remote node.
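The comparison described above could be sketched as follows in a pre- or post-event script for event_error. The function names failure_location and report_failure and the message wording are assumptions made for illustration; only LOCALNODENAME and EVENT_FAILED_NODE come from HACMP.

```shell
# failure_location is a hypothetical helper. It prints "local" when the
# failing node is this node, "remote" when it is another node, and
# "unknown" when EVENT_FAILED_NODE is not set.
failure_location() {
    if [ -z "${EVENT_FAILED_NODE:-}" ]; then
        echo "unknown"
    elif [ "${LOCALNODENAME:-}" = "$EVENT_FAILED_NODE" ]; then
        echo "local"
    else
        echo "remote"
    fi
}

# report_failure turns the result into a log-style message.
report_failure() {
    case "$(failure_location)" in
        local)  echo "event script failed on this node" ;;
        remote) echo "event script failed on remote node $EVENT_FAILED_NODE" ;;
        *)      echo "EVENT_FAILED_NODE is not set" ;;
    esac
}
```

Because event_error scripts run on every node, a check of this kind lets one script take a different action on the failing node than on the others.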
Parallel Processing of Resource Groups Affects Event Processing
When resource groups are processed in parallel, fewer cluster events occur in the cluster. In particular, only node_up and node_down events take place, and events such as node_up_local or get_disk_vg_fs do not occur. (This is because HACMP uses other methods to process resources in parallel.) As a result, parallel processing reduces the number of cluster events for which you can create customized pre- or post-event scripts. If you start using parallel processing for some of the resource groups in your configuration, be aware that your existing event scripts may not work for those resource groups. For more information, see Appendix B: Resource Group Behavior during Cluster Events in this Guide, and Chapter 7: Planning Events in the Planning Guide.
Dependent Resource Groups and the Use of Pre- and Post-Event Scripts
Prior to HACMP 5.2, to achieve resource group and application sequencing, system administrators had to build the application recovery logic in their pre- and post-event processing scripts. Every cluster would be configured with a pre-event script for all cluster events, and a post-event script for all cluster events.
Such scripts could become all-encompassing “case” statements. For instance, if you want to take an action for a specific event on a specific node, you need to edit that individual case, add the required code for pre- and post-event scripts, and also ensure that the scripts are the same across all nodes. (For instance, to ensure that all scripts are the same on all nodes, each script must contain the logic for all nodes and execute the “case” for the node on which it is being run.)
To summarize, even though the logic of such scripts captures the desired behavior of the cluster, they can be difficult to customize and even more difficult to maintain later on, when the cluster configuration changes.
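For illustration only, an all-encompassing handler of the kind described above might look like the following sketch. The node names, the actions, and the function name handle_event are all hypothetical; the point is that every node carries the logic for every other node.

```shell
# Hypothetical all-in-one post-event handler (illustrative only).
#   $1 = event name, $2 = node on which the script is running
handle_event() {
    event=$1
    node=$2
    case "$event:$node" in
        node_up:nodeA)  echo "start reporting jobs on nodeA" ;;
        node_up:nodeB)  echo "mount scratch space on nodeB" ;;
        node_down:*)    echo "notify operators that $node left the cluster" ;;
        *)              : ;;   # no custom action for other combinations
    esac
}
```

Each new node or event adds another branch that must be copied to every node, which is why such scripts become hard to maintain as the cluster changes.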
If you are using pre- and post-event scripts or other methods, such as customized serial resource group processing, to establish dependencies between applications that are supported by your cluster, then these methods may no longer be needed or can be significantly simplified. Instead, you can specify dependencies between resource groups in a cluster. For more information on how to configure resource group dependencies, see Configuring Dependencies between Resource Groups.
If you still want to customize behavior for some applications, consider adding a pre- or post-event script to the resource_state_change event. See Chapter 7: Planning Events in the Planning Guide for more details on this event.
Configuring Pre- and Post-Event Commands
To define your customized cluster event scripts:
1. Enter smit hacmp
2. In SMIT, select Extended Configuration > Extended Event Configuration > Configure Pre/Post-Event Commands > Add a Custom Event Command and press Enter.
3. Enter the field values as follows:
4. Press Enter to add the information to HACMPcustom class in the local HACMP Configuration Database (ODM).
5. Go back to the Extended Configuration menu and select Extended Verification and Synchronization to synchronize your changes across all cluster nodes.
Note: Synchronizing does not propagate the actual new or changed scripts; you must add these to each node manually.
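Because synchronization does not copy the script files themselves, you need some way to place them on each node. The sketch below only prints one copy command per node so that you can review the commands before running them. It assumes that scp is available between the nodes, which may not be true in your cluster; the function name, the script path, and the node names are hypothetical.

```shell
# print_copy_commands is a hypothetical helper. It prints (does not run)
# one scp command per node for the given script path.
#   $1 = absolute path of the event script
#   remaining arguments = names of the other cluster nodes
print_copy_commands() {
    script_path=$1
    shift
    for node in "$@"; do
        # -p preserves modification times and modes of the copied file
        printf 'scp -p %s %s:%s\n' "$script_path" "$node" "$script_path"
    done
}

# Example with made-up names:
print_copy_commands /usr/local/cluster/pre_node_up.sh nodeB nodeC
```

Keeping the same path on every node matters because the custom event command definition that is synchronized refers to a single path.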
Configuring Pre- and Post-Event Processing
Complete the following steps to set up or change the processing for an event. In this step you indicate to the Cluster Manager to use your customized pre- or post-event commands. You only need to complete these steps on a single node. The HACMP software propagates the information to the other nodes when you verify and synchronize the nodes.
To configure pre- and post-events for customized event processing:
1. Enter smit hacmp
2. Select HACMP Extended Configuration > Extended Event Configuration > Change/Show Pre-defined HACMP Events to display a list of cluster events and subevents.
3. Select an event or subevent that you want to configure and press Enter. SMIT displays the panel with the event name, description, and default event command shown in their respective fields.
4. Enter field values as follows:
5. Press Enter to add this information to the HACMP Configuration Database.
6. Return to the HACMP Extended Configuration panel and synchronize your event customization by selecting the Extended Verification and Synchronization option. Note that all HACMP event scripts are maintained in the /usr/es/sbin/cluster/events directory. The parameters passed to a script are listed in the script’s header.
Note: You or a third-party system administrator can reset the HACMP tunable values, such as cluster event customizations, to their installation-time defaults. For more information, see the Resetting HACMP Tunable Values section in Chapter 1: Troubleshooting HACMP Clusters in the Troubleshooting Guide.
See Chapter 10: Monitoring an HACMP Cluster, for a discussion on how to emulate HACMP event scripts without actually affecting the cluster.
Configuring User-Defined Events
Note: Changes to custom user-defined events are not supported in an active cluster. User-defined events are not supported by a dynamic reconfiguration of the cluster. You must manually distribute the HACMPude ODM to all nodes after you make any changes.
To add a user-defined event:
1. Enter smit hacmp
2. In SMIT, select Extended Configuration > Extended Event Configuration > Configure User-Defined Events > Add Custom User Defined Event and press Enter.
3. Enter the field values as follows:
Changing or Showing User-Defined Events
To verify that the existing event definitions are specified as intended after a migration from a previous release of HACMP, use the SMIT panel. Do not use the odmget command for this purpose. The odmget command displays strings for the event definitions that differ from the information that was entered in SMIT. This is the expected behavior.
When defining the selection string in SMIT, you don't need the escape characters for this expression. HACMP will add those necessary to get the event into and out of the ODM. Specify the event as in the following example:
Note that for this event, if you issue the odmget HACMPude command, the output displays the escape characters:
    HACMPude:
        name = "fsfull"
        state = 0
        recovery_prog_path = "/tmp/fsfullevent.rp"
        recovery_type = 2
        recovery_level = 0
        res_var_name = "IBM.FileSystem"
        instance_vector = "Name == \"/tmp\""
        predicate = "PercentTotUsed>65"
        rearm_predicate = "PercentTotUsed<65"
To change a custom user-defined event, or to show a list of the events currently defined:
1. From the Configure User Defined Event panel, select Change/Show Custom User Defined Event. SMIT displays a list of all currently defined custom events.
2. Select the event to change or view and press Enter.
SMIT displays the Change/Show Custom User Defined Event panel, with the currently defined information about the event.
3. Change any information, and then press Enter.
Removing User-Defined Events
To remove a custom user-defined event:
1. From the Configure User Defined Event panel, select Remove Custom User Defined Event. SMIT lists all currently defined custom events.
2. To remove a particular event, select the event and press Enter.
3. Press Enter to remove the event.
Tuning Event Duration Time Until Warning
Depending on the cluster configuration, the speed of the cluster nodes, and the number and types of resources that need to move during cluster events, certain events may take different times to complete. Cluster events run asynchronously and usually call AIX 5L system commands. Since HACMP has no means to detect whether the event script is actually performing useful work at a given period of time, it runs a config_too_long event (which sends messages to the console and to the hacmp.out file) each time the processing of the event exceeds a certain amount of time. For such events, you may want to customize the time period HACMP waits for an event to complete before issuing the config_too_long warning message.
Also, see the section on this topic in Chapter 7: Planning for Cluster Events in the Planning Guide for more information on when to alter the time before receiving a system warning.
Note: Adjust the config_too_long warning timer for node_up to allow more time for processing node_up events with dependent resource groups. node_up processing in clusters with dependencies can take more time than in clusters without dependent resource groups.
Prerequisites and Notes
The following are important to keep in mind when you are working with event duration:
The total duration time is calculated differently for “slow” and “fast” cluster events. “Fast” events are those that do not include acquiring or releasing resources and normally take a shorter time to complete.
For “fast” events, the total duration time during which HACMP waits before issuing a warning is equal to Event Duration Time.
“Slow” cluster events are those that involve acquiring and releasing resources, use application server start and stop scripts, or site events using HAGEO. “Slow” events may take a longer time to complete. Customizing event duration time for “slow” events lets you avoid getting unnecessary system warnings during normal cluster operation.
For “slow” events, the total duration time before receiving a config_too_long warning message is set to the sum of Event-only Duration Time and Resource Group Processing Time.
Remember, you can customize event duration time before receiving a warning for cluster events, not for nodes or specific resource groups in your cluster. Once the Total Event Duration Time is specified, the system waits for the specified period of time and sends a config_too_long message to the node which was affected by this event. For example, you have a cluster with five resource groups. A node_down event (a “slow” event) occurs on Node A, which owns some of the resource groups. You have previously specified the Event-only Duration Time to be 120 seconds, and the Resource Group Processing Time to be 400 seconds.
When a node_down event occurs on Node A, a config_too_long message is sent to Node A according to this formula:
Event-only Duration Time (120 seconds) + Resource Group Processing Time (400 seconds) = 520 seconds (Total Event Duration Time). A config_too_long message appears on Node A after 520 seconds.
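The rule for “fast” and “slow” events can be restated as a small calculation. The function below is a hypothetical helper written for this illustration; it is not an HACMP command, and the class names fast and slow are simply the labels used in this section.

```shell
# total_event_duration is a hypothetical helper that mirrors the rule
# described above.
#   $1 = event class: "fast" or "slow"
#   $2 = Event Duration Time (fast) or Event-only Duration Time (slow), seconds
#   $3 = Resource Group Processing Time, seconds (used for slow events only)
total_event_duration() {
    case "$1" in
        fast) echo "$2" ;;
        slow) echo $(( $2 + $3 )) ;;
        *)    echo "unknown event class: $1" >&2
              return 1 ;;
    esac
}

# The node_down example from the text: 120 + 400 seconds
total_event_duration slow 120 400
```

With the values from the example, the call prints 520, the number of seconds after which the config_too_long message appears on Node A.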
During dynamic reconfiguration events, the Cluster Manager uses the previously specified values of the event duration time until warning. After dynamic reconfiguration is complete and the new values of event duration time get synchronized, the Cluster Manager uses the newly specified values. You can configure Event Duration Time using the HACMP for AIX > Extended Configuration > Extended Event Configuration > Change/Show Time Until Warning panel in SMIT.
Changing Event Duration Time Until Warning
To change the total event duration time before receiving a config_too_long warning message, perform the following procedure on any cluster node:
1. Enter smit hacmp
2. In SMIT, select HACMP Extended Configuration > Extended Event Configuration > Change/Show Time Until Warning and press Enter.
3. Enter data in the fields as follows:
4. Press Enter to change the field values. HACMP changes these values in the HACMP Configuration Database.
5. Synchronize the cluster to propagate the data to the other cluster nodes. HACMP uses the specified total event duration time before issuing config_too_long warning messages.
Configuring a Custom Remote Notification Method
You can configure a remote notification method through SMIT to issue a customized numeric or alphanumeric page in response to a specified cluster event. Starting with HACMP 5.3, you can also send SMS text message notifications to any address, including a cell phone SMS address or mail to an email address. The pager message is sent through the attached dialer modem. Cell phone text messages are sent through email using the TCP/IP connection or an attached GSM wireless modem.
The following sections describe how to configure custom remote notification methods to respond to an event, how cluster verification confirms the remote notification configuration, and how node failure affects the remote notification method.
You can send the following custom remote notifications:
- Numeric and alphanumeric page
- SMS text message to any address including a cell phone or mail to an email address
- SMS text message using a GSM modem to transmit the notification through a wireless connection
Prerequisites
The HACMP remote notification functionality requirements follow:
- A tty port used for paging cannot also be used for heartbeat traffic or for the DBFS function of HAGEO.
- Any tty port specified must be defined to AIX 5L and must be available.
- Each node that may send a page or text messages must have an appropriate modem installed and enabled.
Note: HACMP checks the availability of the tty port when the notification method is configured and before a page is issued. Modem status is not checked.
Note: To send an SMS text message over the dialer modem, your pager provider must offer this service.
- Each node that may send email messages from the SMIT panel using AIX 5L mail must have a TCP/IP connection to the Internet.
- Each node that may send text messages to a cell phone must have an appropriate Hayes-compatible dialer modem installed and enabled.
- Each node that may transmit an SMS message wirelessly must have a Falcom-compatible GSM modem installed in the RS232 port with the password disabled. Ensure that the modem connects to the cell phone system.
Creating a Remote Notification Message File
Before you can issue a message to a pager or cell phone, you must create a file that contains the message text. HACMP provides a template to help you create this file. The template contains default text and instructions for an alphanumeric page or cell phone message. The template is in:
By default, the message contains the following information: the event, the node on which it occurred, the time and date, and the name of the object (node, network, site, etc.) affected by the event. This default message is sent if no message file is found at the time a custom alphanumeric page or cell phone message is triggered.
For numeric pages, the provided sample text is not appropriate; your numeric page file should contain only digits. If no message file is found when a numeric page is triggered, the default message sent is “888.”
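A quick check of a numeric page file before you reference it in a notification method could look like the sketch below. The function name is hypothetical, and the check assumes that line breaks in the file are harmless and that any other non-digit character makes the file unsuitable.

```shell
# is_numeric_page is a hypothetical check, not an HACMP utility.
# It succeeds only when the file exists, is not empty, and contains
# nothing but digits (newlines are ignored).
is_numeric_page() {
    content=$(tr -d '\n' < "$1") || return 1
    case "$content" in
        ''|*[!0-9]*) return 1 ;;
        *)           return 0 ;;
    esac
}
```

For example, a file that contains only 888 passes the check, while a file that contains the alphanumeric sample text does not.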
The sample.txt file contains comments that relate to an alphanumeric pager or cell phone message. A numeric page does not use this file. Shown below is the sample.txt file; there is no need to alter the file unless you want to add additional recipients.
Note: Save the sample.txt file with a new name before modifying it. However, if you do alter the file, the customized file is preserved when you migrate to a new version of HACMP, even though a new default sample.txt file is installed. See the related section in the Installation Guide on upgrading to HACMP 5.4 for details on where your modified sample.txt file is saved after a new installation.
Note: Place a separate copy of each message file on each node listed in the notification method definition. HACMP does not automatically distribute this file to other nodes during cluster synchronization.
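The comments in the template document three notations that you can use in a message: %d for the current time and date, %n for the node that sends the message, and %e for the event name. To preview how a message file would read once those notations are filled in, you could imitate the substitution with sed, as in the sketch below. This is not the code HACMP uses; the function name and its argument order are assumptions.

```shell
# preview_message is a hypothetical helper, not part of HACMP.
#   $1 = message file, $2 = node name, $3 = event name, $4 = date string
# Lines that start with '#' are treated as comments and skipped.
# Assumption: the values contain no '/' or '&', which sed would misread.
preview_message() {
    grep -v '^#' "$1" |
        sed -e "s/%n/$2/g" -e "s/%e/$3/g" -e "s/%d/$4/g"
}
```

Called with a file that contains the line Node %n: Event %e occurred at %d, the node name bazilio, and the event node_up, the helper prints that line with the three notations replaced by the supplied values.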
Contents of the Sample.txt file
The following lists the contents of the sample.txt file:
    # sample file for alphanumeric paging
    # you can use the following notations in your message
    # %d - current time&date
    # %n - node that sends the message
    # %e - eventname
    # '#' is used to comment the line
    # for example "Node %n: Event %e occured at %d"
    # if nodename=bazilio, event=node_up
    # and current date&time = Thu Sep 28 19:41:25 CDT 2006
    # will result in sending the message
    # "Node bazilio: Event node_up occured at Thu Sep 28 19:41:25 CDT 2006"
Defining a Remote Notification Method
To define a pager notification method first define a tty port for each node that might issue the page, and then define the remote notification method. To define a cell phone text message, follow the steps listed in the section Defining a New Remote Notification Method.
Defining a TTY Port to Issue a Page
To define a tty port for each node that might issue a page:
1. Enter smit hacmp
2. In SMIT, select HACMP Extended Configuration > HACMP Extended Event Configuration > Configure Remote Notification Method and press Enter.
3. Select Configure Node/Port Pairs.
4. Select the node that will issue the page from the list of cluster nodes.
5. Press F4 for a list of available ports for the chosen node, and select one port from the list.
6. Repeat steps 3 through 5 for each node that might be called to issue a page.
Defining a New Remote Notification Method
To define a new remote notification method:
1. In SMIT, select HACMP Extended Configuration > HACMP Extended Event Configuration > Configure Remote Notification Method > Add a Custom Remote Notification Method and press Enter.
2. Fill in field values as follows:
Method Name: Assign a name to the notification method. This could also indicate who would get the message.

Description: Add a description, if desired, of the notification method.

Nodename(s): Enter the name(s) of one or more nodes that you want to issue this page or cell phone message. Press F4 to get a list of node names. Each node must have been defined previously in the Define Port/Node Pairs SMIT panel. Separate multiple nodes with a space.
Note: The sequence of nodes in this SMIT field determines their priority for sending pages or cell phone messages. See Remote Notification and Node Failure for more information on node priority for remote notification.

Number to Dial or Cell Phone Address: Indicate the telephone number to dial to reach the pager or the address of the cell phone. The number-to-dial string can contain any characters or sequences supported by a Hayes-compatible modem using the standard Telocator Alphanumeric Protocol (TAP). Your provider must support this service.
- Depending on the type of pager, you will need to enter either the number of the pager alone, or the number of the paging company followed by the pager number.
If you are using a numeric pager, use the form:
18007650102,,,,
The commas create pauses in the dialing sequence. The trailing commas are required because there is always some delay between dialing and the actual sending of the page.
If the pager is alphanumeric, the input should take the form:
18007654321;2119999
where 18007654321 is the paging company number and 2119999 is the actual pager number.
- For cell phone text messaging using email, enter the address of the cell phone. This is in the format: phone_number@provider_address. Consult your provider for the specific provider_address format. It may look like 180007654321@provider.net. To send email to multiple addresses, separate the addresses with a space. Test this by sending an email.
- You can send a text message wirelessly to a cell phone if a GSM modem is used instead of the dialer modem. The format is <cell phone number>#. For example, it may look like 7564321#. The SIM providers may support international calls.

Filename: Specify the path of the text file containing the pager message or cell phone message.
Note: Make sure the path refers to the correct location of the message file on each node specified in the Node Name(s) field.

Cluster Event(s): Specify the event(s) that activate this notification method. Press F4 to get a list of event names. Separate multiple events with a space.

Retry Counter: Specify how many times to reissue the page or cell phone message if it fails. The default is 3 times.

TIMEOUT: Specify how many seconds to wait before considering a page or cell phone message attempt failed. The default is 45 seconds.
3. When you finish entering values in all fields, press Enter.
4. Synchronize the cluster to propagate the configuration information to the other nodes.
Note: The configuration information can be entered on one node and propagated to the others during synchronization, but you must manually make sure that the correct page or cell phone message text files reside in the correct locations on each node in the nodelist.
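The formats accepted in the Number to Dial or Cell Phone Address field differ mainly in their punctuation, so a value can be sanity-checked before you enter it. The classifier below is a hypothetical sketch based only on the formats described in step 2; it is not how HACMP itself parses the field.

```shell
# destination_type is a hypothetical classifier, not HACMP's own parser.
# It prints the kind of destination that a field value appears to describe.
destination_type() {
    case "$1" in
        *@*)  echo "email" ;;          # phone_number@provider_address
        *\#)  echo "gsm" ;;            # <cell phone number>#
        *\;*) echo "alphanumeric" ;;   # paging company number;pager number
        *,)   echo "numeric" ;;        # pager number followed by pause commas
        *)    echo "unknown" ;;
    esac
}

destination_type '18007650102,,,,'
destination_type '7564321#'
```

A result of unknown for a value that you expected to be a numeric page usually means the trailing commas are missing.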
Verification of Remote Notification Methods
When you synchronize or perform cluster verification, HACMP checks the configuration of your remote notification method and issues an error in these cases:
- A specified pager or cell phone message file is missing from a node it should reside on. (The message can still be sent; it will contain the text supplied in the original sample.txt file.)
- The same tty is defined for both heartbeat traffic and paging.
- The same tty is defined for both DBFS and paging.
Sending a Test Remote Notification Message
You can send a test page or cell phone message to make sure everything is configured correctly, and that the expected notification will be issued for a given event, just as if the event actually occurred.
Before sending the test remote message, you must have a notification method already configured. The test remote message must be sent from a node that is configured for the selected method.
To send a test remote notification message:
1. From the Configure Custom Remote Notification Method menu, select Send a Test Remote Message.
2. Select a remote notification method to use for the test.
3. In the Send a Test Remote Message panel, fill in field values as follows:
4. Press Enter. The Command Status window then reports whether the remote message was sent successfully or errors occurred.
The test remote message will be the message file you specified when you configured the notification method. If an object name is included, for the test remote message, it will appear as a pseudo-name such as node_1, adapter_1, site_1, network_1, etc. If the message file cannot be located, a default message will be sent to the pager or cell phone and an error will be displayed. For alphanumeric pages or cell phone messages, the default message is the sample text; for numeric pages, the default message is “888.”
Remote Notification and Node Failure
If a node fails and triggers a page or cell phone message, the remote notification is sent from the node with the next highest priority. (A node’s order of priority is determined by the order in which you listed node names when you defined the method.) If the next highest priority node is up but unable to send the remote notification for some other reason (for instance, the modem is not connected), the system attempts to resend the remote notification message for the number of times specified in the Retry Counter. If the remote notification still cannot be sent, it fails. The remote notification is not passed on to be issued from another node.
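The retry behavior on a single node can be pictured with the sketch below. It is a generic wrapper, not an HACMP interface. It assumes one initial attempt followed by up to the Retry Counter number of resends, and it does not model the TIMEOUT value; the source text does not state whether the first attempt counts toward the Retry Counter.

```shell
# send_with_retries is a hypothetical wrapper, not an HACMP interface.
#   $1 = retry count (the default Retry Counter in SMIT is 3)
#   remaining arguments = the command that attempts the send
# Assumption: one initial attempt, then up to $1 resends.
send_with_retries() {
    retries=$1
    shift
    attempt=0
    while [ "$attempt" -le "$retries" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$(( attempt + 1 ))
    done
    echo "notification failed after $attempt attempts" >&2
    return 1
}
```

When the wrapper gives up, it returns a failure status and does nothing else, which matches the statement above that the notification is not handed to another node.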
Changing or Removing a Custom Remote Notification Method
You can use SMIT to change or remove a notification method that issues a customized remote notification in response to a specified cluster event.
Changing a Remote Notification Method
To change the configuration of a custom remote notification method:
1. Enter smit hacmp
2. In SMIT, select HACMP Extended Configuration > Extended HACMP Event Configuration > Configure Remote Notification Methods > Change/Show Custom Remote Notification Method and press Enter.
3. Select the method you want to change.
4. Make your changes.
5. Press Enter.
Deleting a Remote Notification Method
To delete a custom remote notification method:
1. Enter smit hacmp
2. In SMIT, select HACMP Extended Configuration > Extended HACMP Event Configuration > Configure Remote Notification Methods > Remove Custom Remote Notification Method and press Enter.
3. Specify the name of the method you want to delete.
4. Press Enter to delete the method.