Chapter 3: Upgrading an HACMP Cluster
This chapter provides instructions for upgrading an existing HACMP cluster configuration to HACMP 5.4.
Overview
Before performing an upgrade, read these sections:
To perform an upgrade, refer to these sections:
After performing a migration, read these sections:
Prerequisites
To understand the concepts in this section, you should have a basic knowledge of the following:
HACMP (from high-level concepts to low-level tasks, such as planning, maintenance, and troubleshooting), since the procedures in this chapter build on that knowledge.
The version of HACMP from which you are upgrading. In particular, you should know how to use the cluster snapshot utility.
The version of HACMP you are installing.
It is helpful to have copies of any paper worksheets you filled in as part of the planning process, with information about the cluster nodes, networks, subnets, IP labels, applications, resource groups, and pre- and post-event scripts that are currently used.
Terminology Overview
This chapter uses the following terms and acronyms:
cllockd: Cluster Lock Manager daemon. This daemon was last supported in HACMP 5.1.
HACMP (HAS): HACMP, also known as HACMP classic, is the option of HACMP without Enhanced Scalability. This option exists for all versions of HACMP before HACMP 5.1. Starting with HACMP 5.1, the HACMP classic option was merged with the HACMP/ES option, and both were renamed to HACMP 5.1.
HACMP/ES: HACMP/ES is the option of HACMP with Enhanced Scalability. The Enhanced Scalability option exists for all versions starting with HACMP 4.1.
Hybrid cluster or mixed cluster: Nodes in a cluster running two different versions of HACMP.
Migration: The process of upgrading an existing HACMP cluster to the current HACMP level. (Used interchangeably with the term "upgrade.")
ODM: Object Data Model, also known as the HACMP configuration database.
Offline migration: A type of upgrade in which HACMP is brought offline on all nodes before the upgrade is performed. During this time, resources are not available.
Rolling migration: A type of upgrade from one HACMP version to another during which cluster services are not stopped on all nodes in the cluster. Cluster services are stopped on one node at a time; that node is upgraded and reintegrated into the cluster before the next node is upgraded. Starting with HACMP 5.4, when using rolling migration to upgrade the HACMP software on an individual node, you may choose to keep the applications and resources running continuously on that node, though they will not be highly available during the upgrade.
Snapshot conversion: A type of upgrade from one HACMP version to another during which you take a snapshot of the current cluster configuration, stop cluster services on all nodes, install the next version of HACMP, and then convert the snapshot by running the clconvert_snapshot utility. See also Upgrading Using a Snapshot.
SWVPD: Software Vital Product Data, a set of installable software product filesets.
Identifying Your Upgrade Path
To identify your upgrade path, determine the version of HACMP from which you are migrating and whether cluster services can remain running during the upgrade.
Supported Upgrade Paths
HACMP 5.4 supports the upgrade scenarios from HACMP versions 5.1, 5.2, and 5.3 using either rolling migration or snapshot conversion.
Upgrading from Pre-5.1 Versions of HACMP
To upgrade to HACMP 5.4 from versions earlier than 5.1, you must first upgrade to one of the supported versions and then upgrade to HACMP 5.4.
For example, to upgrade from HACMP/ES 4.5, the following upgrade paths must be executed:
1. Upgrade from HACMP/ES 4.5 to HACMP 5.1
2. Upgrade from HACMP 5.1 to HACMP 5.4.
For up-to-date information about any available APARs, see the IBM Web site. Apply all applicable APARs after you upgrade to the new version of HACMP.
Choosing Your Upgrade Scenario
Next, identify the way in which you want to perform the upgrade: with cluster services running or with them stopped. Both methods are described in this section.
Note: If you are upgrading from a previous release to HACMP 5.4, you must reboot the node after the installation. If you are upgrading HACMP 5.4 with a PTF, you are not required to reboot the node.
Upgrading While Keeping the HACMP Cluster Services Running
You can perform a nondisruptive upgrade of the HACMP software, keeping cluster services running and the application continuously available throughout the upgrade, if you are applying a PTF to HACMP 5.4.
To upgrade the cluster while keeping cluster services running:
1. Stop cluster services on one cluster node and choose the Move Resource Group option.
2. Upgrade the HACMP software on the node.
3. Reintegrate the node into the cluster by restarting cluster services on the node.
4. Repeat steps 1-3 for all nodes in the cluster.
Upgrading While Stopping the HACMP Cluster Services
If you have a maintenance window during which you can temporarily stop cluster services on all nodes, you can upgrade all nodes to HACMP 5.4. Although you can upgrade without using a snapshot, taking a snapshot beforehand is always recommended.
Upgrading Using a Snapshot
An upgrade using a snapshot is also referred to as a snapshot conversion.
To upgrade the cluster using a snapshot:
1. In an active cluster that predates HACMP 5.4, create and save the cluster snapshot.
2. Stop cluster services on all cluster nodes.
3. Remove the current version of the HACMP software on all nodes.
4. Install the HACMP 5.4 cluster software on all nodes.
5. Convert the previously saved snapshot by using the clconvert_snapshot utility. For information on this utility, see Appendix C: HACMP for AIX Commands in the Administration Guide.
6. Apply the snapshot to the cluster with the newly installed version of HACMP.
7. Verify and start cluster services one node at a time.
Upgrading without Using a Snapshot
To upgrade the cluster without using a snapshot, follow the steps described in detail in the section Upgrading to HACMP 5.4 on an Offline Cluster. In summary:
1. Stop the HACMP cluster services on all cluster nodes.
2. Upgrade the HACMP software on each node.
3. Start cluster services on one node at a time.
Planning for an Upgrade
To properly plan for an upgrade to HACMP 5.4, follow the steps listed in the following sections.
Updating the Current Cluster Configuration
To update the current cluster configuration, do the following:
1. Check that all nodes in the cluster are up and running the same and most recent version of the HACMP software. Check the IBM web site for the latest HACMP APARs and PTFs available for the current version.
2. Review the installation prerequisites for HACMP 5.4 and ensure that the system being upgraded meets these requirements. See the IBM web site for the latest software levels.
3. If needed, upgrade the AIX 5L operating system and RSCT before upgrading HACMP.
Checking Cluster Condition, Settings, and Worksheets
Do the following:
1. Use clstat to review the cluster state and to make certain that the cluster is in a stable state (see the example command after this list). For more information on the clstat utility, see the section on Monitoring Clusters with clstat in the Administration Guide.
2. Review the /etc/hosts file on each node to make certain it is correct.
3. Take a snapshot of each node configuration.
4. Save the planning worksheets (paper or online) as a reference. Transfer to new worksheets all information about your existing installation and any changes you plan to make after the upgrade.
5. Ensure that each cluster node has its own HACMP license. Otherwise, contact an IBM representative about licensing HACMP.
6. Ensure you have privileges to perform the installation as the root user, or ask your system administrator to make the appropriate changes.
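For step 1 above, a quick way to review the cluster state from the command line is the clstat utility in ASCII mode. The path shown is the default HACMP installation location and -a is the usual ASCII-display flag; treat both as assumptions for your environment.
# Display cluster, node, and network interface state in ASCII (non-X) mode
/usr/es/sbin/cluster/clstat -a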
Reviewing the Cluster and Node Version in the HACMP Configuration Database
To review the version of your cluster:
1. Run odmget HACMPcluster or odmget HACMPnode.
The versions recorded in the HACMPcluster and HACMPnode classes of the HACMP configuration database (ODM) for previous versions of HACMP are shown in the following table. After migration to HACMP 5.4, the cluster versions noted in the table will be equal to 9.
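As a sketch of the odmget check in step 1, the commands below query the two configuration database classes and filter for the version fields. The attribute names cluster_version and version are the usual ODM field names, but verify them against your own output; after migration to HACMP 5.4 the reported value should be 9.
# Cluster-level version recorded in the HACMPcluster class
odmget HACMPcluster | grep cluster_version
# Per-node version recorded in the HACMPnode class
odmget HACMPnode | grep version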
Checking Types of Networks
Make sure that HACMP 5.4 supports the types of networks that you plan to use. Remove or change unsupported types before you upgrade the HACMP software.
The following network types are not supported in HACMP 5.4:
IP (Generic IP)
SOCC
SLIP
FCS (Fiber Channel Switch)
802_eth (Ethernet Protocol 802.3)
If your previous configuration includes unsupported network types and you attempt to upgrade a node to HACMP 5.4, the installation will fail and an error message will notify you to change the unsupported network type.
Migrating Resource Groups Configured without Sites
Refer to the following table for information on how existing rotating, cascading, and concurrent resource groups are migrated to HACMP 5.4:
New Resource Groups
If you upgraded from pre-5.3 versions of HACMP and plan to add new resource groups to an HACMP 5.4 cluster, refer to the mapping tables in this chapter for information about the combinations of startup, fallover and fallback policies for resource groups.
Resource Group Distribution Policy during Migration
In HACMP 5.3 and above, the resource group distribution policy is a cluster-wide attribute that has only one option, the node-based distribution policy. This ensures that only one resource group that uses this distribution policy is brought online on a node during a node startup.
Note: The network-based distribution policy previously available was deprecated in HACMP 5.3. When you migrate it, it is converted to the node-based policy.
HACMP Configuration Database Security Changes May Affect Migration
The HACMP Configuration Database (ODM) has the following security enhancements:
Ownership. All HACMP ODM files are owned by the root user and the hacmp group. In addition, all HACMP binaries that are intended for use by non-root users are owned by the root user and the hacmp group.
Permissions. The hacmpdisksubsystem file is set with 600 permissions. Most of the other HACMP ODM files are set with 640 permissions (the root user can read and write, while the hacmp group can only read). All HACMP binaries that are intended for use by non-root users are installed with 2555 permissions (readable and executable by all users, with the setgid bit turned on so that the program runs as the hacmp group).
During the installation, HACMP creates the hacmp group on all nodes. By default, the hacmp group has permission to read the HACMP ODMs, but does not have any other special authority. For security reasons, do not expand the authority of the hacmp group.
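To confirm the ownership and permission settings described above after installation, you can list the HACMP ODM files directly. The /etc/es/objrepos path is the customary location of the HACMP configuration database; treat it as an assumption for your system.
# Expect root:hacmp ownership, 640 on most HACMP classes, and 600 on hacmpdisksubsystem
ls -l /etc/es/objrepos/HACMP* /etc/es/objrepos/hacmpdisksubsystem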
If you use programs that access the HACMP ODMs directly, you may need to rewrite them if they are intended to be run by non-root users:
All access to the ODM data by non-root users should be handled via the provided HACMP utilities.
In addition, if you are using the PSSP File Collections facility to maintain the consistency of /etc/group, the new hacmp group that is created at installation time on the individual cluster nodes may be lost when the next file synchronization occurs. To prevent overwriting your hacmp group, before installing HACMP 5.4, do either of the following:
Turn off the PSSP File Collections synchronization of the /etc/group file.
Include the hacmp group in the master /etc/group file and propagate this change to all cluster nodes.
Upgrading Applications Configured Using a Smart Assist
The framework for applications configured with Smart Assists has been enhanced in HACMP 5.4. DB2 and WebSphere applications configured in HACMP 5.3 with the DB2 and WebSphere Smart Assists are converted to use the new infrastructure during an upgrade. After the upgrade, you can manage these applications (change/show/remove) using the new HACMP 5.4 SMIT Standard and Initialization panels.
Applications previously configured with either the Two-Node Configuration Assistant or the Oracle Smart Assist are not converted to the new infrastructure. Continue to use the HACMP Extended SMIT path for managing those applications.
Ensuring Applications Use Proprietary Locking Mechanism
Before upgrading to HACMP 5.4, make sure that your applications provide their own locking mechanism: the Cluster Lock Manager (cllockd, or cllockdES) is not supported in HACMP 5.4. Check with the application vendor about concurrent access support.
Installing HACMP 5.4 removes the Lock Manager files and definitions from a node. After a node is upgraded, the Lock Manager state information in SNMP and clinfo shows that the Lock Manager is inactive on a node, whether or not the Lock Manager is running on a back-level node.
Addressing Concurrent Access (AIX 5L) Migration Issues
Before migrating concurrent volume groups to HACMP 5.4, decide whether or not to convert them to enhanced concurrent mode, the default for volume groups created in AIX 5L v.5.2 and up.
Note that:
Enhanced concurrent mode is supported only on AIX 5L v.5.2 and up.
SSA concurrent mode is not supported on 64-bit kernels.
All nodes in a cluster must use the same form of concurrent mode for a given volume group. This means that if your cluster included SSA disks in concurrent mode on AIX 4.3.3, and then, in preparation for upgrading to HACMP 5.4, you upgrade a node to AIX 5L v.5.2 and choose the 64-bit kernel, you will not be able to bring the concurrent volume group online on all cluster nodes.
If you have SSA disks in concurrent mode, you cannot run 64-bit kernels until you have converted all volume groups to enhanced concurrent mode.
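To check whether an existing concurrent volume group is already enhanced concurrent capable before you upgrade, you can query it with lsvg. The volume group name ccvg1 is hypothetical, and the exact wording of the concurrent-mode fields in the output can vary by AIX level.
# Show the concurrent-mode attributes of the (hypothetical) volume group ccvg1
lsvg ccvg1 | grep -i concurrent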
Priority Override Location and Persistence
If you are upgrading from a previous release, HACMP 5.4 handles Priority Override Location (POL) and persistence differently than earlier releases did:
The Priority Override Location (POL) setting is no longer used for resource groups that you move from one node to another. In general, in HACMP 5.4, if you move a resource group from one node to another, it remains on its new node until you need to move it again.
If you reboot the cluster (which you seldom need to do in HACMP 5.4), the group returns to the node originally configured as its highest priority node in the nodelist (if the group has a fallback policy that tells it to fall back). If you do not reboot the cluster, the group remains on the node to which you moved it, and, if it has a fallback policy, it falls back to its "acting" highest priority node.
Persistence after a reboot is not retained. If a resource group in your cluster has a fallback policy with the option Persist Across Cluster Reboot and resides on the node to which you moved it before an upgrade, then when you upgrade to HACMP 5.4, the resource group remains on its destination node after the upgrade. In this case, you did not reboot.
However, if you reboot the cluster, the group returns to the node that is its originally configured highest priority node.
Note: If you want the group to be permanently hosted on a node other than its originally configured highest priority node, change the highest priority node in the nodelist for the resource group.
Notes on Upgrading from HACMP 5.1
If you are upgrading from HACMP 5.1 to HACMP 5.4, the procedures are the same as those described in the sections Performing a Rolling Migration to HACMP 5.4 and Upgrading to HACMP 5.4 Using a Snapshot. However, some issues and terminology that are specific to upgrading from HACMP 5.1 are described below.
Environment Variables Are Not Carried Over after Migration from Earlier Versions
This problem may affect you if you used environment variables in your pre- and post-event scripts in releases prior to HACMP 5.1. In general, the shell environment variables declared in /etc/environment are much more strictly managed under HACMP in versions 5.1 and higher.
If, for instance, you previously used HACMP 4.5 (also known as HACMP classic) and had pre- and post-event scripts that used environment variables such as LOGNAME, be aware that once you start your application with HACMP/ES (that is, HACMP 5.1 or higher), those environment variables are overwritten (empty) and become unavailable to you. In other words, after an upgrade to HACMP 5.1 or higher, those environment variables will no longer be set in your pre- and post-event scripts.
Note that, in general, when writing new HACMP pre- or post-event scripts or using pre- and post-event scripts written for previous versions of HACMP, none of the shell environment variables defined in /etc/environment are available to your program. If you need to use any of these variables, explicitly source them by including this line in your script: ". /etc/environment".
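As a minimal sketch of the advice above, a pre- or post-event script can source /etc/environment before using any of its variables. The script body, log file, and use of LOGNAME are illustrative only.
#!/bin/ksh
# Hypothetical pre-event script: source /etc/environment explicitly so that
# variables such as LOGNAME and PATH are set when HACMP invokes this script.
. /etc/environment
print "Pre-event script running as ${LOGNAME:-unknown} on $(hostname)" >> /tmp/pre_event.log
exit 0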
See the chapter on Cluster Events in the Planning Guide for more information on pre- and post-event scripts.
Security Mode Name Change
The Enhanced security mode in HACMP has been renamed to Kerberos security. The function remains the same.
Migrating HACMP 5.1 Dynamic Node Priority Fallback Policy
If your configuration includes an HACMP 5.1 resource group with a dynamic node priority policy for fallover, migration or conversion to HACMP 5.4 changes this setting. The cl_convert utility changes the fallover policy for that resource group to Fallover to Next Priority Node when migration or conversion is complete. During migration, the default node fallover policy is used.
Migrating HACMP 5.1 Existing Resource Groups to HACMP 5.4
In HACMP 5.4, all types of groups are referred to as resource groups. During an upgrade, the cascading, rotating and concurrent resource groups that you configured in HACMP 5.1 are converted to resource groups with a specific set of startup, fallover and fallback policies. All functional capabilities of resource groups are retained in HACMP 5.4. For resource groups mapping information, see Migrating Resource Groups Configured without Sites.
Migrating HACMP 5.1 Resource Groups Configured with Sites
Starting with HACMP 5.2, inter-site management policy names changed. This table summarizes the mapping between the inter-site policies in HACMP 5.1 and HACMP 5.4 for resource groups configured with sites:
Cascading (HACMP 5.1): Prefer Primary Site (HACMP 5.4)
Rotating (HACMP 5.1): Online on Either Site (HACMP 5.4)
Concurrent (HACMP 5.1): Online on Both Sites (HACMP 5.4)
HACMP 5.4 Inter-Site Selective Fallover for Resource Group Recovery
Selective fallover of resource groups between sites is disabled by default when you upgrade to HACMP 5.4 from a previous release. This is the pre-5.4 release behavior for non-IGNORE site management policy. A particular instance of a resource group can fall over within one site, but cannot move between sites. If no nodes are available on the site where the affected instance resides, that instance goes into ERROR or ERROR_SECONDARY state. It does not stay on the node where it failed. This behavior applies to both primary and secondary instances.
Note that in HACMP 5.3 and above, though the Cluster Manager does not initiate a selective fallover across sites by default, it still moves the resource group if a node_down or node_up event occurs, and you can manually move a resource group between sites.
For a new install of HACMP 5.4, inter-site resource group recovery is enabled by default.
You can change the default behavior after the migration and installation of HACMP 5.4 is complete, using the HACMP SMIT path:
Extended Configuration > Extended Resource Configuration > HACMP Extended Resources Configuration > Customize Resource Group and Resource Recovery > Customize Inter-Site Resource Group Recovery.
Upgrade Do’s and Don’ts
This section lists the major tasks to do during or before an upgrade, and it also lists what not to do.
Upgrade Do’s
Make sure that you do the following:
1. Take a cluster snapshot and save it to the /tmp directory as well as to another machine and CD (see the example commands after this list).
2. Save a copy of any event script files to the /tmp directory as well as to another machine and CD.
3. Ensure that the same level of cluster software (including PTFs) is installed on all nodes before beginning a migration.
4. Ensure that the cluster software is committed (and not just applied). See Checking That the Software Is Committed.
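For steps 1 and 2 above, the commands below sketch the backups. The snapshot name pre54, the event-script directory, and the host backuphost are hypothetical; the snapshot directory shown is the default $SNAPSHOTPATH location described later in this chapter, and the remote copy assumes OpenSSH (scp) is available.
# Copy the (hypothetical) snapshot files and custom event scripts to /tmp
cp /usr/es/sbin/cluster/snapshots/pre54.odm /usr/es/sbin/cluster/snapshots/pre54.info /tmp/
cp /usr/local/hacmp/scripts/*.sh /tmp/
# Also copy them to another machine (backuphost is an example name)
scp /tmp/pre54.* /tmp/*.sh backuphost:/backups/hacmp/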
Upgrade Don’ts
During any type of upgrade, do not do the following:
Do not save your cluster configuration or customized event scripts under these directory paths: /usr/sbin/cluster, /usr/es/sbin/cluster, or /usr/lpp/cluster. Data in these directories may be lost during an upgrade. Instead, copy files to a separate machine or to a CD.
Do not synchronize the cluster.
When migrating from HACMP 5.3 to 5.4, do not stop a node and place resource groups in an UNMANAGED state.
Do not attempt a DARE or a C-SPOC command. For example, do not change node priority in a resource group, add or remove a network, update an LVM component, or add users to a cluster.
Do not leave the cluster in a hybrid state for an extended period of time.
When migrating an active cluster one node at a time (a rolling migration), the use of commands and functions is restricted as follows while the cluster has mixed versions (is in a hybrid state):
Do not change the cluster topology or configuration.
Do not verify and synchronize the cluster configuration.
Do not use any System Management (C-SPOC) functions except for the Manage HACMP Services functions.
Do not use the Problem Determination Tools > View Current State function.
Do not use the Extended Configuration > Snapshot Configuration > Add a Cluster Snapshot option or run the clsnapshot command.
Do not use the Problem Determination Tools > Recover From HACMP Script Failure option or run the clruncmd command, except when running the command or SMIT option from the target node specified by the command.
Checking That the Software Is Committed
Before upgrading the cluster to HACMP 5.4, ensure that the current software installation is committed (not just applied).
To ensure that the software is already committed:
1. Run the lslpp -h cluster.* command.
2. If the word APPLY displays under the action header, enter smit install_commit before installing the HACMP software.
SMIT displays the Commit Applied Software Updates (Remove Saved Files) panel.
3. Enter field values as follows:
SOFTWARE name: From the picklist, select all cluster filesets.
COMMIT old version if above version used it?: Set this field to yes.
EXTEND filesystem if space needed?: Set this field to yes.
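As a quick non-interactive version of the check in step 2, the following filters the fileset history for entries still in the applied (uncommitted) state; the grep pattern assumes the usual wording of the lslpp -h Action column.
# Any output here indicates cluster filesets that are applied but not yet committed
lslpp -h "cluster.*" | grep -iw apply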
Performing a Rolling Migration to HACMP 5.4
This section describes how to upgrade an HACMP cluster to HACMP 5.4 while keeping cluster services running on the nodes. During the upgrade window, the cluster will be running in a hybrid state where HACMP 5.3 cluster components run in unison with HACMP 5.4 cluster components in processing any specific cluster events that may occur.
Before continuing, review the section Security Mode Name Change.
Step 1: Stop Cluster Services on a Node Hosting the Application
Once all HACMP cluster nodes that have a previous version installed are up and the cluster is stable, stop cluster services on node A using the graceful with takeover option. (Starting with HACMP 5.4, this is known as stopping cluster services and moving the resource groups to other nodes.) If needed, install a new version of RSCT.
1. Enter smit hacmp
2. Use the System Management (C-SPOC) > Manage HACMP Services > Stop Cluster Services SMIT menu to stop cluster services.
3. Select takeover for Shutdown mode.
4. Select local node only and press Enter.
For example, if application X is running on node A, stopping node A gracefully with takeover causes the resource group containing the application to fall over to node B. After fallover is complete, the upgrade of node A to HACMP 5.4 can continue.
Step 2: Install the HACMP 5.4 Software
On node A, install HACMP 5.4, which converts the previous HACMP configuration database (ODMs) to the HACMP 5.4 format. This installation process uses the cl_convert utility and creates the /tmp/clconvert.log file. A previously created version of the file is overwritten.
To install the HACMP software:
1. Enter smit install
2. In SMIT, select Install and Update Software > Update Installed Software to Latest Level (Update All) and press Enter.
3. Enter the values for Preview only? and Accept new license agreements? For all other field values, select the defaults.
4. Press Enter.
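If you prefer the command line to the SMIT update path in steps 1 through 4, installp can apply the update directly. This is a sketch that assumes the installation images are in /dev/cd0 (substitute your media directory) and that you want to apply, rather than commit, the updates.
# Apply all updates found on the assumed media, auto-installing prerequisites,
# extending filesystems if needed, and accepting license agreements
installp -agXY -d /dev/cd0 all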
Step 3: Reboot the Node
Reboot the node with the shutdown -Fr command.
Step 4: Start Cluster Services on the Upgraded Node
Start cluster services on node A. Node A is running HACMP 5.4 while nodes B, C, and others are running HACMP with an earlier version.
To start cluster services on a single upgraded node:
1. Enter smit clstart
2. Enter field values as follows and press Enter
Note: Verification is not supported on a mixed version cluster. Run verification only when all nodes have been upgraded.
Step 5: Repeat Steps for Other Cluster Nodes
Repeat steps 1 through 4 for the remaining cluster nodes, one node at a time.
Step 6: Verify the Upgraded Cluster Definition
Verification provides errors or warnings to ensure that the cluster definition is the same on all nodes. You can verify and synchronize a cluster only when all nodes in the cluster are running the same version of the software.
To verify the cluster:
1. Enter smit hacmp
2. In SMIT, select Extended Configuration > Extended Verification and Synchronization and press Enter.
3. Change Verify Changes Only to yes.
Upgrading to HACMP 5.4 Using a Snapshot
This section describes how to upgrade a cluster with an earlier version of HACMP to HACMP 5.4 using a cluster snapshot that was created on the configuration with an earlier version of HACMP. This upgrade path requires cluster services to be down simultaneously on all nodes; as a result, your applications will not be highly available during the upgrade window.
Step 1: Creating a Snapshot
While all HACMP 5.3 cluster nodes are up and the cluster is stable, create a snapshot.odm file on node A. In addition, copy the snapshot.odm file to the /tmp directory.
For instructions on creating a snapshot, see the chapter on Saving and Restoring Cluster Configurations in the Administration Guide.
Step 2: Stopping All Nodes
Stop all nodes, one at a time, gracefully. (Starting with HACMP 5.4, stopping gracefully is referred to as stopping cluster services and taking resource groups offline.)
1. Enter smit hacmp
2. Use the System Management (C-SPOC) > Manage HACMP Services > Stop Cluster Services SMIT menu to stop cluster services.
Step 3: Removing the Existing Version of the HACMP Software
Before you remove the HACMP software from a system, make sure that HACMP cluster services are stopped. You cannot remove the HACMP software when the cluster is running: The system displays an error message and prevents removal of core filesets.
To remove the existing version:
1. Ensure that cluster services have been stopped on all nodes.
2. Enter smit install_remove
3. Enter field values in the Remove Installed Software SMIT panel as follows and press Enter:
Note: If HAGEO is installed, you must also remove it from your system. Use the same steps as listed above, using hageo as the software name.
Step 4: Installing the HACMP 5.4 Software
To install the HACMP 5.4 software on each cluster node:
1. Enter smit install_all
2. In SMIT, select cluster.es.server.rte and any additional component you would like to install. Press Enter.
3. Enter the values for Preview only? and Accept new license agreements? For all other field values, select the defaults.
4. Press Enter.
Step 5: Converting and Applying a Saved Snapshot
After you install HACMP 5.4 on all cluster nodes, convert and apply the snapshot on the same node where the cluster snapshot was added.
Use the clconvert_snapshot utility to convert the cluster snapshot. The clconvert_snapshot utility converts your existing snapshot.odm file to the HACMP 5.4 format and saves a backup snapshot.odm.old file in the HACMP format that was previously installed.
In the following example, version is the HACMP version number and snapshotfile is the name of your snapshot file. The snapshot file is stored in the directory specified by the $SNAPSHOTPATH variable that by default is /usr/es/sbin/cluster/snapshots:
clconvert_snapshot -v version -s snapshotfile
For example, if converting a snapshot called my530snapshot use:
clconvert_snapshot -v 5.3 -s my530snapshot
For information on the clconvert_snapshot utility, see Saving and Restoring Cluster Configurations in the Administration Guide.
Step 6: Verifying the Upgraded Cluster Definition
After the HACMP software is installed on all nodes, and all nodes have been rebooted, verify and synchronize the cluster configuration. Verification provides errors or warnings to ensure that the cluster definition is the same on all nodes. You can verify and synchronize a cluster only when all nodes in the cluster are running the same version of the software.
To verify the cluster:
1. Enter smit hacmp
2. In SMIT, select Extended Configuration > Extended Verification and Synchronization and press Enter.
3. Change Verify Changes Only to yes.
4. Start nodes one at a time.
Post-Upgrade Check: Troubleshooting a Stalled Snapshot Application
If you upgrade to HACMP 5.4 using the previously created snapshot, upgrading to the new version may stall if the cluster configuration in the snapshot is not 100% accurate according to verification. (Also, dynamic cluster reconfiguration does not work if verification finds any errors).
If you apply a snapshot and see an error, review the log files to check whether the error can be automatically corrected by the HACMP 5.4 verification utility. If it can, then to continue the upgrade process, force apply the snapshot and run the cluster synchronization and verification process with the option Automatically Correct Errors during the Cluster Verification set to Interactively.
Note: Be careful when applying the snapshot forcefully; only use this option if you are sure that the error you encountered can be automatically corrected.
You may see the following warnings and errors:
WARNING: "The NFS mount/Filesystem specified for resource group rg1 is using incorrect syntax for specifying an NFS cross mount: /mnt/fs1".
ERROR: "Disk Heartbeat Networks have been defined, but no Disk Heartbeat devices. You must configure one device for each node in order for a Disk Heartbeat network to function".
In these cases, apply the snapshot forcefully to continue the upgrade process to HACMP 5.4. Although the upgrade process via a snapshot fails (there is no automatic corrective action for these errors), the cluster remains intact; therefore, force applying the snapshot is safe.
Upgrading to HACMP 5.4 on an Offline Cluster
It is possible to bring cluster services down on all of the nodes and then migrate the cluster definitions individually on each node. This process is supported for the following versions:
HACMP 5.1
HACMP 5.2
HACMP 5.3
To bring a cluster offline and upgrade the HACMP software on the cluster nodes, complete the following procedure:
1. Stop cluster services on all nodes.
2. Ensure that cluster services have been stopped on all nodes.
3. Install the new HACMP software on each node.
4. Review the /tmp/clconvert.log file to ensure that a conversion of the HACMP ODMs has occurred.
5. Start cluster services, one node at a time, and ensure that each node successfully joins the cluster.
See also sections on Performing a Rolling Migration to HACMP 5.4, Testing An Upgraded Cluster, and Upgrading to HACMP 5.4 Using a Snapshot.
Applying a PTF
This section describes how to apply a Program Temporary Fix (PTF)—a correction for a software problem reported in an APAR. Starting with HACMP 5.4, you can apply an HACMP 5.4 PTF update on an individual node using rolling migration. During the upgrade your critical applications and resources continue to run on that node, though they will not be highly available.
Before continuing, review the section Security Mode Name Change.
Step 1: Stop Cluster Services on a Node Hosting the Application
Once all HACMP 5.4 cluster nodes are up and the cluster is stable, bring node A offline without bringing its associated resource groups offline by stopping cluster services using the Unmanage Resource Groups option. (Prior to HACMP 5.4, this was known as forced down.)
1. Enter smit hacmp
2. In SMIT, select the System Management (C-SPOC) > Manage HACMP Services > Stop Cluster Services SMIT menu to stop cluster services. Use the Unmanage Resource Groups option.
3. Enter field values. The example below displays only the settings that are different from the default settings.
For more information on stopping cluster services, see Understanding Stopping Cluster Services in the Administration Guide.
Note: During some periods of the rolling migration, HACMP cluster services do not monitor the applications. The applications continue to run while HACMP cluster services are suspended on the node, but they may potentially fail at this time. If your application must be highly available, we recommend that you do not keep the application in the UNMANAGED state for a long period of time.
Step 2: Install the PTF Update
On node A, apply the PTF update, which updates the HACMP 5.4 configuration database (ODMs). This installation process uses the cl_convert utility and creates the /tmp/clconvert.log file. A previously created version of the file is overwritten.
To install the HACMP software:
1. Enter smit install
2. In SMIT, select Install and Update Software > Update Installed Software to Latest Level (Update All) and press Enter.
3. Enter the values for Preview only? and Accept new license agreements? For all other field values, select the defaults.
4. Press Enter.
Step 3: Start Cluster Services on the Upgraded Node
Start cluster services on node A. Node A is running the updated HACMP version while nodes B, C, and others are running the previous version of HACMP 5.4. HACMP starts monitoring the running applications and resources on Node A.
To start cluster services on a single upgraded node:
1. Enter smit hacmp
2. In SMIT, select System Management (C-SPOC) > Manage HACMP Services > Start Cluster Services and press Enter.
3. Enter field values as follows and press Enter
Note: Verification is not supported on a mixed version cluster. Run verification only when all nodes have been upgraded.
Note: After applying a PTF update, a warning may display stating the Cluster Manager daemon did not start. Use the command lssrc -ls clstrmgrES to verify the Cluster Manager started successfully.
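A minimal version of the check described in the note above follows. The ST_STABLE state named in the comment is the value typically reported once the Cluster Manager has stabilized; treat the exact string as an assumption.
# Confirm the Cluster Manager subsystem is active and inspect its internal state
lssrc -ls clstrmgrES
# The long listing normally includes a line such as: Current state: ST_STABLE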
Step 4: Repeat Steps for Other Cluster Nodes
Repeat steps 1–3 for the remaining cluster nodes, one node at a time.
Step 5: Verify the Upgraded Cluster Definition
Once all nodes are up and the cluster is stable, run cluster verification on the upgraded HACMP cluster.
Verification provides errors or warnings to ensure that the cluster definition is the same on all nodes. You can verify and synchronize a cluster only when all nodes in the cluster are running the same version of the software.
To verify the cluster:
1. Enter smit hacmp
2. In SMIT, select Extended Configuration > Extended Verification and Synchronization and press Enter.
3. Change Verify Changes Only to yes.
Additional Migration Tasks
After you complete the upgrade to HACMP 5.4, you may need to complete additional tasks, such as those described in the following sections:
Recompiling Clinfo Clients after Migrating to HACMP 5.4
Recompiling existing Clinfo applications is not required. However, in HACMP 5.4:
CL_MAXNAMELEN is now 256 characters.
There are changes related to resource group information.
If you wish to incorporate these changes in your applications, make the desired modifications, then recompile and link the applications using the Clinfo library. For updated information about the Clinfo C and C++ API routines, see Programming Client Applications.
Using a Web-Based Interface for Configuration and Management
HACMP 5.4 includes a web-based user interface, called WebSMIT, that provides consolidated access to the SMIT functions of configuration and management as well as to interactive cluster status and documentation.
To use the WebSMIT interface, you must configure and run a web server process on the cluster node(s) to be administered. See the /usr/es/sbin/cluster/wsm/README file for full information on basic web server configuration, the default security mechanisms in place when HACMP 5.4 is installed, and the configuration files available for customization.
Also see Installing and Configuring WebSMIT in Chapter 4: Installing HACMP on Server Nodes.
Resetting HACMP Tunable Values
In HACMP 5.4, you can reset the tunable values that were changed during cluster maintenance to their default settings, also referred to as the installation-time cluster settings.
The installation-time cluster settings are equivalent to the values that appear in the cluster after manually installing HACMP.
Note: Resetting the tunable values does not change any other aspects of the configuration, while installing HACMP removes all user-configured configuration information including nodes, networks and resources.
List of Tunable Values
The following values can be reset:
User-supplied information:
Network module tuning parameters, such as failure detection rate, grace period, and heartbeat rate. HACMP resets these parameters to their installation-time default values.
Cluster event customizations, such as all changes to cluster events. Note that resetting changes to cluster events does not remove any files or scripts that the customization used, only HACMP's knowledge of pre- and post-event scripts.
Cluster event rules. Any changes made to the event rules database are reset to their installation-time default values.
HACMP command customizations. Any changes to the default set of HACMP commands are reset to their installation-time defaults.
Automatically generated and discovered information: Typically, you cannot see this information. HACMP rediscovers or regenerates this information when the cluster services are restarted or during the next cluster synchronization.
HACMP resets the following:
Local node names stored in the cluster definition database
Netmasks for all cluster networks
Netmasks, interface names, and aliases for heartbeat (if configured) for all cluster interfaces
SP switch information generated during the latest node_up event (this information is regenerated at the next node_up event)
Instance numbers and default log sizes for the RSCT subsystem
For information about how to reset the tunable values using SMIT, see the Administration Guide.
Verifying the Success of Your Migration
Now that all nodes have been migrated, complete the tasks described in the following sections to ensure that everything is working correctly:
Verifying Software Levels Installed Using AIX 5L Commands
Verify the software installation by using the lppchk AIX 5L command, and check the installed directories to see that expected files are present.
The lppchk command verifies that files for an installable software product (fileset) match the Software Vital Product Data (SWVPD) database information for file sizes, checksum values, or symbolic links.
Run the commands lppchk -v and lppchk -c "cluster.*".
Both commands return nothing if the installation is OK.
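A one-line sketch that runs both checks and prints a confirmation only if each returns a zero exit status:
# Both lppchk commands are silent on success; the echo runs only if both succeed
lppchk -v && lppchk -c "cluster.*" && echo "HACMP fileset verification passed"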
Automatically Saved Files
The following files in the /usr/lpp/save.config directory are automatically saved during the upgrade process:
/usr/lpp/save.config/usr/es/sbin/cluster/events/node_up.rp
/usr/lpp/save.config/usr/es/sbin/cluster/events/node_down.rp
In addition, the following files are saved during the upgrade process and removed from the system at the end of migration:
If you are upgrading from HACMP/ES 4.5, HACMP also saves the following file in the /usr/lpp/save.config directory during the upgrade process:
During the migration process, the SNMP MIBs are inconsistent between the nodes running HACMP 5.4 and the nodes running HACMP/ES 4.5. When migration is complete, all nodes use the same version of the MIBs.
Verifying the Upgraded Cluster Definition
After the HACMP 5.4 software is installed on all of the nodes in the cluster and cluster services restored, verify and synchronize the cluster configuration. Verification ensures that the cluster definition is the same on all nodes. You can verify and synchronize a cluster only when all nodes in the cluster are running the same version of the software.
To verify the cluster:
1. Enter smit hacmp
2. In SMIT, select Extended Configuration > Extended Verification and Synchronization > Verify Changes only and press Enter.
Verifying All Cluster Filesets Have Migrated
Previously installed documentation APARs may not be successfully converted, resulting in the inability to synchronize the cluster. Run the following command to verify that all cluster filesets are at the expected level:
lslpp -l "cluster.*"
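To compare levels across filesets at a glance, the colon-separated listing can be trimmed to the fileset and level columns; the field positions below assume the standard lslpp -Lc output format.
# Print fileset name and level only (fields 2 and 3 of the colon-separated output)
lslpp -Lc "cluster.*" | cut -d: -f2,3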
Testing HACMP on the Upgraded Cluster
You can also use the Cluster Test Tool available in HACMP 5.4 to test your cluster. For information about this tool, see Chapter 8: Testing an HACMP Cluster in the Administration Guide.
Running AIX 5L Commands on the Migrated Cluster
To determine which daemons are active on a cluster node, you can use the options on the SMIT System Management (C-SPOC) > Manage HACMP Services > Show Cluster Services panel, or use the lssrc command as follows:
1. Run lssrc -ls topsvcs. The results show the internal state of the RSCT daemon, as follows:
Subsystem         Group            PID     Status
 topsvcs          topsvcs          24726   active
Network Name   Indx Defd Mbrs St Adapter ID      Group ID
Ethernet_1_0   [ 0]    3    3  S 192.168.245.42  192.168.245.72
Ethernet_1_0   [ 0]   (192.168.245.40 )
Ethernet_1_0   [ 0] en1          0x417fc659      0x417fc93f
HB Interval = 1 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent    : 465057 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 681363 ICMP 0 Dropped: 0
NIM's PID: 26012
Token_Ring_1_0 [ 1]    3    3  S 192.168.248.40  192.168.248.70
Token_Ring_1_0 [ 1] tr0          0x417fc61a      0x417fc940
HB Interval = 1 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent    : 443287 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 661396 ICMP 0 Dropped: 0
NIM's PID: 22400
rs232_0        [ 2]    2    0  D 255.255.0.1
rs232_0        [ 2] tty0
HB Interval = 2 secs. Sensitivity = 5 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent    : 88465 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 0 ICMP 0 Dropped: 0
NIM's PID: 27412
tmssa_1        [ 4]    2    2  S 255.255.2.4     255.255.2.4
tmssa_1        [ 4] ssa1         0x817fc94a      0x817fc951
HB Interval = 2 secs. Sensitivity = 5 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent    : 319657 ICMP 0 Errors: 460 No mbuf: 0
Packets received: 319661 ICMP 0 Dropped: 0
NIM's PID: 27818
tmssa_2        [ 5]    2    2  S 255.255.2.5     255.255.2.5
tmssa_2        [ 5] ssa3         0x817fc620      0x817fc626
HB Interval = 2 secs. Sensitivity = 5 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent    : 319855 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 319864 ICMP 0 Dropped: 0
NIM's PID: 14906
2 locally connected Clients with PIDs:
haemd( 29630) hagsd( 30326)
Dead Man Switch Enabled:
reset interval = 1 seconds
trip interval = 20 seconds
Configuration Instance = 20
Default: HB Interval = 1 secs. Sensitivity = 4 missed beats
Daemon employs no security
Segments pinned: Text Data.
Text segment size: 730 KB. Static data segment size: 639 KB.
Dynamic data segment size: 3849. Number of outstanding malloc: 313
User time 430 sec. System time 526 sec.
Number of page faults: 0. Process swapped out 0 times.
Number of nodes up: 3. Number of nodes down: 0.
2. Execute the /usr/es/sbin/cluster/utilities/clshowsrv -v utility, which produces results similar to the following:
Status of the RSCT subsystems used by HACMP:
Subsystem         Group            PID     Status
 topsvcs          topsvcs          32182   active
 grpsvcs          grpsvcs          33070   active
 grpglsm          grpsvcs          30886   active
 emsvcs           emsvcs           32628   active
 emaixos          emsvcs           31942   active
 ctrmc            rsct             14458   active
Status of the HACMP subsystems:
Subsystem         Group            PID     Status
 clcomdES         clcomdES         15778   active
 clstrmgrES       cluster          32792   active
Status of the optional HACMP subsystems:
Subsystem         Group            PID     Status
 clinfoES         cluster          31210   active
Troubleshooting Your Migration
For help with specific errors during your migration, see the following sections:
Error: “config_too_long”
When the migration process has completed and the HACMP filesets are being uninstalled, you may see a config_too_long message.
This message appears when the cluster manager detects that an event has been processing for more than the specified time. The config_too_long messages continue to be appended to the hacmp.out log until the event completes. If you observe these messages, periodically check that the event is indeed still running and has not failed.
Error: “cldare cannot be run”
Making configuration changes is not supported during the migration process. If you try to change the cluster topology or resources when migration is incomplete, the synchronization process fails. You receive a message similar to the following:
cldare: Migration from HACMP 5.1 to HACMP 5.4 Detected. cldare cannot be run until migration has completed.
When migration is complete, you can apply changes or remove them.
To remove changes, restore the active HACMP configuration database:
1. Enter smit hacmp
2. In SMIT, select Problem Determination Tools > Restore HACMP Configuration Database from Active Configuration.
Recovering Your Previous Version of HACMP
If you want to completely undo your migration, see the following sections:
Recovering from a Conversion Failure
When you install HACMP, the cl_convert command runs automatically to convert the HACMP configuration database from a previous version of HACMP to that of the current version. If the installation fails, run cl_convert to convert the database.
In a failed conversion, run cl_convert using the -F flag. For example, to convert from HACMP/ES 5.1 to HACMP 5.4, use the -F and -v (version) flags as follows:
cl_convert -F -v 5.1.0
To run a conversion utility, the following is required:
Root user privileges
The HACMP version from which you are converting.
The cl_convert utility records the conversion progress to the /tmp/clconvert.log file so that you can gauge conversion success.
Recovering Configuration Information
The post-installation output informs you to merge site-specific configuration information into the newly installed files:
Some configuration files could not be automatically merged into the system during the installation. The previous versions of these files have been saved in a configuration directory as listed below. Compare the saved files and the newly installed files to determine if you need to recover configuration data. Consult your product documentation to determine how to merge the data.
Configuration files that were saved in /usr/lpp/save.config:
/usr/es/sbin/cluster/etc/rc.cluster
/usr/es/sbin/cluster/samples/clinfo.rc
/usr/es/sbin/cluster/samples/pager/sample.txt
/usr/es/sbin/cluster/etc/clinfo.rc
/usr/es/sbin/cluster/utilities/clexit.rc
/usr/es/sbin/cluster/etc/clhosts
/usr/es/sbin/cluster/etc/rc.shutdown
/usr/es/sbin/cluster/diag/clconraid.dat
/usr/es/sbin/cluster/etc/hacmp.term
/etc/cluster/lunreset.lst
/etc/cluster/disktype.lst