Chapter 5: Planning Shared LVM Components
This chapter describes planning shared volume groups for an HACMP cluster.
Note: If you are planning an IBM General Parallel File System (GPFS) cluster, see the appendix on network requirements in GPFS Cluster Configuration in the Installation Guide.
If you are planning to use OEM disks, volume groups, or filesystems in your cluster (including Veritas volumes) see the appendix on OEM Disk, Volume Group, and Filesystems Accommodation in the Installation Guide.
Prerequisites
At this point, you should have completed the planning steps in the previous chapters.
You should also be familiar with how to use the Logical Volume Manager (LVM). For information about AIX 5L LVM, see the AIX 5L System Management Guide.
Overview
Planning shared LVM components for an HACMP cluster depends on both of the following:
- The type of shared disk device
- The method of shared disk access.

To avoid a single point of failure for data storage, use data redundancy as supported by LVM or your storage system.
Planning for LVM Components
The LVM controls disk resources by mapping data between physical and logical storage. Physical storage refers to the actual location of data on a disk. Logical storage controls how data is made available to the user. Logical storage can be discontiguous, expanded, replicated, and can span multiple physical disks. These facilities provide improved availability of data.
The LVM organizes data into the following components:
Physical Volumes
A physical volume is a single physical disk or a logical unit presented by a storage array. The physical volume is partitioned to provide AIX 5L with a way of managing how data is mapped to the volume. The following figure shows a conventional use of physical partitions within a physical volume.
[Figure: Physical partitions within a physical volume]
When planning shared physical volumes, ensure that:
- The list of PVIDs for a volume group is identical on all cluster nodes that have access to the shared physical volume
- The setting for the concurrent attribute of the volume group is consistent across all related cluster nodes.

Volume Groups
A volume group is a set of physical volumes that AIX 5L treats as a contiguous, addressable disk region. You can place from one to 32 physical volumes in the same volume group.
The following figure shows a volume group of three physical volumes:
[Figure: A volume group composed of three physical volumes]
In the HACMP environment, a shared volume group is a volume group that resides entirely on the external disks shared by the cluster nodes. A non-concurrent shared volume group can be varied on by only one node at a time.
When working with a shared volume group:
- Do not include an internal disk in a shared volume group, because it cannot be accessed by other nodes. If you include an internal disk in a shared volume group, the varyonvg command fails.
- Do not activate (vary on) the shared volume groups in an HACMP cluster manually at system boot. Use cluster event scripts to do this.
- Ensure that the automatic varyon attribute in the AIX 5L ODM is set to No for shared volume groups listed within a resource group. The HACMP cluster verification utility automatically corrects this attribute for you upon verification of cluster resources and sets the automatic varyon attribute to No.
- If you define a volume group to HACMP, do not manage it manually on any node outside of HACMP while HACMP is running on other nodes. This can lead to unpredictable results. If you want to perform actions on a volume group independent of HACMP, stop the cluster services, perform a manual volume group management task, leave the volume group varied off, and restart HACMP.
- To ease the planning of HACMP’s use of physical volumes, the verification utility checks for:
  - Volume group consistency
  - Disk availability.

Logical Volumes
A logical volume is a set of logical partitions that AIX 5L makes available as a single storage unit—that is, the logical view of a disk. A logical partition is the logical view of a physical partition. Logical partitions may be mapped to one, two, or three physical partitions to implement mirroring.
In the HACMP environment, logical volumes can be used to support a journaled filesystem or a raw device.
Filesystems
A filesystem is written to a single logical volume. Ordinarily, you organize a set of files as a filesystem for convenience and speed in managing data.
In the HACMP system, a shared filesystem is a journaled filesystem that resides entirely in a shared logical volume.
Plan to place shared filesystems on external disks shared by the cluster nodes. Data resides in filesystems on these external shared disks so that it can be made highly available.
Planning LVM Mirroring
LVM mirroring provides the ability to allocate more than one copy of a physical partition to increase the availability of the data. When a disk fails and its physical partitions become unavailable, you still have access to mirrored data on an available disk. The LVM performs mirroring within the logical volume. Within an HACMP cluster, you mirror:
- Logical volume data in a shared volume group
- The log logical volume for each shared volume group with filesystems.

Note: LVM mirroring does not apply to the IBM 2105 Enterprise Storage Servers, TotalStorage DS4000, 6000, and 8000, and other disk devices that use RAID, which provide their own data redundancy.
Mirroring Physical Partitions
To improve the availability of the logical volume, you allocate one, two, or three copies of a physical partition to mirror data contained in the partition. If a copy is lost due to an error, the other undamaged copies are accessed, and AIX 5L continues processing with an accurate copy. After access is restored to the failed physical partition, AIX 5L resynchronizes the contents (data) of the physical partition with the contents (data) of a consistent mirror copy.
The following figure shows a logical volume composed of two logical partitions with three mirrored copies. In the diagram, each logical partition maps to three physical partitions. Each physical partition should be designated to reside on a separate physical volume within a single volume group. This configuration provides the maximum number of alternative paths to the mirror copies and, therefore, the greatest availability.
[Figure: A logical volume of two logical partitions, each mapped to three mirrored physical partitions]
The mirrored copies are transparent, meaning that you cannot isolate one of these copies. For example, if a user deletes a file from a logical volume with multiple copies, the deleted file is removed from all copies of the logical volume.
The following configurations increase data availability:
- Allocating three copies of a logical partition rather than allocating one or two copies.
- Allocating the copies of a logical partition on different physical volumes rather than allocating the copies on the same physical volume.
- Allocating the copies of a logical partition across different physical disk enclosures instead of the same enclosure, if possible.
- Allocating the copies of a logical partition across different disk adapters rather than using a single disk adapter.

Although using mirrored copies spanning multiple disks (on separate power supplies) together with multiple disk adapters ensures that no disk is a single point of failure for your cluster, these configurations may increase the time for write operations.
Specify the superstrict disk allocation policy for the logical volumes in volume groups for which forced varyon is specified. This configuration:
- Guarantees that copies of a logical volume always reside on separate disks
- Increases the chances that forced varyon will be successful after a failure of one or more disks.

If you plan to use forced varyon for the logical volume, apply the superstrict disk allocation policy for disk enclosures in the cluster.
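For example, the following AIX 5L command creates a 100-partition logical volume with two copies of each logical partition and the superstrict allocation policy (the names sharedvg and sharedlv are placeholders):

mklv -y sharedlv -t jfs2 -c 2 -s s sharedvg 100

The -c flag sets the number of copies and -s s selects the superstrict allocation policy; C-SPOC and SMIT expose the same copy and allocation settings when you create shared logical volumes.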
For more information about forced varyon, see the section Using Quorum and Varyon to Increase Data Availability.
Mirroring Journal Logs
Non-concurrent access configurations support journaled filesystems and enhanced journaled filesystems. AIX 5L uses journaling for its filesystems. In general, this means that the internal state of a filesystem at startup (in terms of the block list and free list) is the same state as at shutdown. In practical terms, this means that when AIX 5L starts up, the extent of any file corruption can be no worse than at shutdown.
Each volume group contains a jfslog or jfs2log, which is itself a logical volume. This log typically resides on a different physical disk in the volume group than the journaled filesystem. However, if access to that disk is lost, changes to filesystems after that point are in jeopardy.
To avoid the possibility of that physical disk being a single point of failure, you can specify mirrored copies of each jfslog or jfs2log. Place these copies on separate physical volumes.
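For example, assuming the log logical volume in a shared volume group is named loglv00 and hdisk3 is a shared disk that does not yet hold a copy (both names are placeholders), the following commands add a second copy of the log on that disk and synchronize it:

mklvcopy loglv00 2 hdisk3
syncvg -l loglv00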
Mirroring across Sites
You can set up disks located at two different sites for remote LVM mirroring, using a Storage Area Network (SAN), for example. Cross-site LVM mirroring replicates data between the disk subsystem at each site for disaster recovery.
A SAN is a high-speed network that allows the establishment of direct connections between storage devices and processors (servers). Thus, two or more servers (nodes) located at different sites can access the same physical disks, which can be separated by some distance as well, through the common SAN. These remote disks can be combined into a volume group via the AIX 5L Logical Volume Manager, and this volume group may be imported to the nodes located at different sites. The logical volumes in this volume group can have up to three mirrors. Thus, you can set up at least one mirror at each site. The information stored on this logical volume is kept highly available, and in case of certain failures, the remote mirror at another site will still have the latest information, so the operations can be continued on the other site.
HACMP automatically synchronizes mirrors after a disk or node failure and subsequent reintegration. HACMP handles the automatic mirror synchronization even if one of the disks is in the PVREMOVED or PVMISSING state. The automatic synchronization is not possible for all cases, but you can use C-SPOC to synchronize the data from the surviving mirrors to stale mirrors after a disk or site failure and subsequent reintegration.
See the Administration Guide for information about configuration. Plan the sites and nodes ahead of time, and include this information on the Shared Volume Group/Filesystem Worksheet.
Note: In HACMP/XD, you can also use mirroring in a cluster that spans two sites, using the Geographic Logical Volume Manager (GLVM) mirroring function. For more information on GLVM, see Planning Cluster Sites or the HACMP/XD for GLVM Planning and Administration Guide.
Planning for Disk Access
You can configure disks to have the following types of access:
Enhanced Concurrent Access. The data on the disks is available to all connected nodes concurrently and all the nodes have access to the metadata on the disks. This access mode allows for fast disk takeover, because the volume group can be brought online before the metadata is read. Typically, all volume groups should be configured for enhanced concurrent mode. In HACMP 5.1 and up, enhanced concurrent mode is the default for creating concurrent volume groups. You can also convert your migrated volume groups to enhanced concurrent mode.
Concurrent access configurations do not support journaled filesystems. Concurrent access configurations that use IBM 7131-405 and 7133 SSA serial disk subsystems should use LVM mirroring.
Concurrent access configurations that use IBM TotalStorage DS Series or IBM 2105 Enterprise Storage Servers do not use LVM mirroring; instead, these systems provide their own data redundancy.
Note: See the IBM website for announcements and information about new storage devices.
Non-Concurrent Access. Only one node at a time can access information on the disks. If the resource group containing those disks moves to another node, the new node can then access the disks, read the metadata (information about the current state of the volume groups and other components), and then vary on the volume groups and mount any associated filesystems.
Non-concurrent access configurations typically use journaled filesystems. (In some cases, a database application running in a non-concurrent environment may bypass the journaled filesystem and access the raw logical volume directly.)
Enhanced Concurrent Access
Any disk supported by HACMP for attachment to multiple nodes can be included in an enhanced concurrent mode volume group, which can be used in either concurrent or non-concurrent environments (as specified by the type of resource group):
Concurrent. An application runs on all active cluster nodes at the same time. To allow such applications to access their data, concurrent volume groups are varied on on all active cluster nodes. The application has the responsibility to ensure consistent data access.
Non-concurrent. An application runs on one node at a time. The volume groups are not concurrently accessed; they are still accessed by only one node at any given time.
When you vary on the volume group in enhanced concurrent mode on all nodes that own the resource group in a cluster, the LVM allows access to the volume group on all nodes. However, it restricts the higher-level connections, such as NFS mounts and JFS mounts, on all nodes, and allows them only on the node that currently owns the volume group in HACMP.
About Enhanced Concurrent Mode
In AIX 5L v.5.2 and up, all concurrent volume groups are created as enhanced concurrent mode volume groups by default. For enhanced concurrent volume groups, the Concurrent Logical Volume Manager (CLVM) coordinates changes between nodes through the Group Services component of the Reliable Scalable Cluster Technology (RSCT) facility in AIX 5L. Group Services protocols flow over the communications links between the cluster nodes.
Enhanced concurrent mode replaces the special facilities provided by concurrent mode for SSA. Also, note that:
- SSA concurrent mode is not supported on operating systems with 64-bit kernels.
- If you are running AIX 5L v.5.2 or greater, you cannot create new SSA concurrent mode volume groups. You can convert these volume groups to enhanced concurrent mode.
- If you are running AIX 5L v.5.2, you can continue to use SSA concurrent mode volume groups created on AIX 5L v.5.1.
- If you are running AIX 5L v.5.3, you must convert all volume groups to enhanced concurrent mode.
Use C-SPOC to convert both SSA and RAID concurrent volume groups to enhanced concurrent mode. For information about converting volume groups to enhanced concurrent mode, see the section Converting Volume Groups to Enhanced Concurrent Mode in Chapter 12: Managing Shared LVM Components in a Concurrent Access Environment in the Administration Guide.
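C-SPOC is the recommended interface; underneath, the conversion corresponds to the AIX 5L chvg -C option. A minimal sketch, assuming a volume group named sharedvg (a placeholder):

chvg -C sharedvg
lsvg sharedvg

Once the conversion is complete, the Concurrent field in the lsvg output reports Enhanced-Capable.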
Partitioned Clusters with Enhanced Concurrent Access
Because Group Services protocols flow over the communications links between cluster nodes and not through the disks themselves, take steps to avoid partitioned clusters that include enhanced concurrent mode volume groups:
- Use multiple IP networks.
- Do not make online changes to an enhanced concurrent mode volume group unless all cluster nodes are online.

When fast disk takeover is used, the SCSI disk reservation functionality is not used. If the cluster becomes partitioned, nodes in each partition could accidentally vary on the volume group in active state. Because active state varyon of the volume group allows mounting of filesystems and changing physical volumes, this situation can result in different copies of the same volume group. For more information about fast disk takeover and using multiple networks, see the section Using Fast Disk Takeover.
Non-Concurrent Access
Journaled filesystems support only non-concurrent access. The JFS and JFS2 filesystems do not coordinate their access between nodes. As a result, if a JFS or JFS2 filesystem was mounted on two or more nodes simultaneously, the two nodes could allocate the same block to different files.
Using Fast Disk Takeover
In HACMP 5.1 and up, HACMP automatically detects failed volume groups and initiates a fast disk takeover for enhanced concurrent mode volume groups that are included as resources in non-concurrent resource groups. Fast disk takeover requires:
- AIX 5L v.5.2 and up
- HACMP 5.1 and up with the Concurrent Resource Manager component installed on all nodes in the cluster
- Enhanced concurrent mode volume groups in non-concurrent resource groups.

For existing volume groups included in non-concurrent resource groups, convert these volume groups to enhanced concurrent volume groups after upgrading your HACMP software. For information about converting volume groups, see the section Converting Volume Groups to Enhanced Concurrent Mode in Chapter 11: Managing Shared LVM Components in a Concurrent Access Environment in the Administration Guide.
Fast disk takeover is especially useful for fallover of enhanced concurrent volume groups made up of a large number of disks. This disk takeover mechanism is faster than disk takeover used for standard volume groups included in non-concurrent resource groups. During fast disk takeover, HACMP skips the extra processing needed to break the disk reserves, or update and synchronize the LVM information by running lazy update.
Fast disk takeover has been observed to take no more than ten seconds for a volume group with two disks. This time is expected to increase very slowly for larger numbers of disks and volume groups. The actual time observed in any configuration depends on factors outside of HACMP control, such as the processing power of the nodes and the amount of unrelated activity at the time of the fallover. The actual time observed for completion of fallover processing depends on additional factors, such as whether or not a filesystem check is required, and the amount of time needed to restart the application.
Note: Enhanced concurrent mode volume groups are not concurrently accessed; they are still accessed by only one node at any given time. The fast disk takeover mechanism works at the volume group level, and is thus independent of the number of disks used.
Fast Disk Takeover and Active and Passive Varyon
An enhanced concurrent volume group can be made active on a node, or varied on, in two ways:
- Active
- Passive

To enable fast disk takeover, HACMP activates enhanced concurrent volume groups in the active and passive states:
Active Varyon
Active varyon behaves the same as ordinary varyon, and makes the logical volumes available. When an enhanced concurrent volume group is varied on in active state on a node, it allows the following:
- Operations on filesystems, such as filesystem mounts
- Operations on applications
- Operations on logical volumes, such as creating logical volumes
- Synchronizing volume groups.

Passive Varyon
When an enhanced concurrent volume group is varied on in passive state, the LVM provides the equivalent of disk fencing for the volume group at the LVM level.
Passive state varyon allows only a limited number of read-only operations on the volume group:
- LVM read-only access to the volume group’s special file
- LVM read-only access to the first 4K of all logical volumes that are owned by the volume group.

The following operations are not allowed when a volume group is varied on in passive state:

- Operations on filesystems, such as filesystem mounts
- Any operations on logical volumes, such as having logical volumes open
- Synchronizing volume groups.

HACMP and Active and Passive Varyon
HACMP correctly varies on the volume group in active state on the node that owns the resource group, and changes active and passive states appropriately as the state and location of the resource group changes.
Upon cluster startup:
- On the node that owns the resource group, HACMP activates the volume group in active state. Note that HACMP activates a volume group in active state on only one node at a time.
- HACMP activates the volume group in passive state on all other nodes in the cluster.

Upon fallover:
- If a node releases a resource group, or if the resource group is being moved to another node for any other reason, HACMP switches the varyon state for the volume group from active to passive on the node that releases the resource group, and activates the volume group in active state on the node that acquires the resource group.
- The volume group remains in passive state on all other nodes in the cluster.

Upon node reintegration, HACMP does the following:
- Changes the varyon state of the volume group from active to passive on the node that releases the resource group
- Varies on the volume group in active state on the joining node
- Activates the volume group in passive state on all other nodes in the cluster.

Note: The switch between active and passive states is necessary to prevent mounting filesystems on more than one node at a time.
Disk Takeover with Breaking Disk Reserves
In HACMP 5.1 and up, processing for regular disk takeover takes place (as opposed to fast disk takeover) in the following cases:
- Concurrent Resource Manager (as part of HACMP 5.1 and up) is not installed on the nodes in the cluster
- You did not convert the volume groups that are included in non-concurrent resource groups to enhanced concurrent mode.

The regular disk takeover processing requires breaking the disk reserves and checking the logical partitions to determine changes made to the volume groups. Also, prior to fallover, HACMP uses lazy update to update the LVM information on cluster nodes: it processes changes made to the volume groups and synchronizes the LVM information across the cluster nodes.
Using Quorum and Varyon to Increase Data Availability
How you configure quorum and varyon for volume groups can increase the availability of mirrored data.
Using Quorum
Quorum ensures that more than half of the physical disks in a volume group are available. It does not keep track of logical volume mirrors, and is therefore not a useful way to ensure data availability. You can lose quorum when you still have all your data. Conversely, you can lose access to some of your data, and not lose quorum.
Quorum is beneficial for volume groups on RAID arrays, such as the ESS and IBM TotalStorage DS Series. Note that the RAID device provides data availability and recovery from loss of a single disk. Mirroring is typically not used for volume groups contained entirely within a single RAID device. If a volume group is mirrored between RAID devices, forced varyon can bring a volume group online despite loss of one of the RAID devices.
Decide whether to enable or disable quorum for each volume group. The following table shows how quorum affects when volume groups vary on and off:
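The summary below reflects standard AIX 5L LVM quorum behavior:

                      Quorum enabled                               Quorum disabled
Vary on               More than half of the disks in the           All of the disks in the volume group
                      volume group must be available               must be available (unless the varyon
                                                                   is forced)
Vary off              The volume group is varied off               The volume group stays varied on until
(automatic)           automatically when it no longer has          access to the last of its disks is lost
                      access to more than half of its disks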
Quorum checking is enabled by default. You can disable quorum by using the chvg -Qn vgname command, or by using the smit chvg fastpath.
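For example, for a hypothetical shared volume group named sharedvg:

lsvg sharedvg
chvg -Qn sharedvg

The QUORUM field in the lsvg output shows the current setting; chvg -Qn turns quorum checking off for that volume group.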
Quorum in Concurrent Access Configurations
Quorum must be enabled for an HACMP concurrent access configuration. Disabling quorum could result in data corruption. Any concurrent access configuration where multiple failures could result in no common shared disk between cluster nodes has the potential for data corruption or inconsistency.
The following figure shows a cluster with two sets of IBM SSA disk subsystems configured for no single point of failure. The logical volumes are mirrored across subsystems and each disk subsystem is connected to each node with separate NICs.
[Figure: A two-node cluster with logical volumes mirrored across two SSA disk subsystems, each subsystem connected to both nodes]
If multiple failures result in a communications loss between each node and one set of disks in such a way that Node A can access subsystem 1 but not subsystem 2, and Node B can access subsystem 2 but not subsystem 1, both nodes continue to operate on the same baseline of data from the mirrored copy they can access. However, neither node sees the modifications made by the other node to data on disk. As a result, the data becomes inconsistent between nodes.
With quorum protection enabled, the communications failure results in one or both nodes varying off the volume group. Although an application does not have access to data on the volume group that is varied off, data consistency is preserved.
Selective Fallover Triggered by Loss of Quorum
HACMP selectively provides recovery for non-concurrent resource groups (with the startup policy not Online on All Available Nodes) that are affected by failures of specific resources. HACMP 4.5 and up automatically reacts to a “loss of quorum” LVM_SA_QUORCLOSE error associated with a volume group going offline on a cluster node. In response to this error, a non-concurrent resource group goes offline on the node where the error occurred.
If the AIX 5L Logical Volume Manager takes a volume group in the resource group offline due to a loss of quorum for the volume group on the node, HACMP selectively moves the resource group to another node. You can change this default behavior by customizing resource recovery to use a notify method instead of fallover. For more information, see Chapter 4: Configuring HACMP Cluster Topology and Resources (Extended) in the Administration Guide.
Note: HACMP launches selective fallover and moves the affected resource group only in the case of the LVM_SA_QUORCLOSE error. This error occurs if you use mirrored volume groups with quorum enabled. However, other types of “volume group failure” errors could occur. HACMP does not react to any other type of volume group errors automatically. In these cases, you still need to configure customized error notification methods, or use AIX 5L Automatic Error Notification methods to react to volume group failures.
For more information about LVM_SA_QUORCLOSE errors, see the section on Error Notification Method Used for Volume Group Loss in the Installation Guide.
For more information about selective fallover triggered by loss of quorum for a volume group on a node, see the section Selective Fallover for Handling Resource Groups in Appendix B: Resource Group Behavior During Cluster Events in the Administration Guide.
Using Forced Varyon
HACMP 5.1 and up provides a forced varyon function to use in conjunction with AIX 5L Automatic Error Notification methods. The forced varyon function enables you to have the highest possible data availability. Forcing a varyon of a volume group lets you keep a volume group online as long as there is one valid copy of the data available. Use a forced varyon only for volume groups that have mirrored logical volumes.
You can use SMIT to force a varyon of a volume group on a node if the normal varyon command fails on that volume group due to a lack of quorum but with one valid copy of the data available. Using SMIT to force a varyon is useful for local disaster recovery—when data is mirrored between two disk enclosures, and one of the disk enclosures becomes unavailable.
For more information about the circumstances under which you can forcefully activate a volume group in HACMP safely, see the section Forcing a Varyon of a Volume Group in Chapter 14: Managing Resource Groups in a Cluster in the Administration Guide.
Note: You can specify a forced varyon attribute for volume groups on SSA or SCSI disks that use LVM mirroring, and for volume groups that are mirrored between separate RAID or ESS devices.
If you want to force the volume group to vary on when disks are unavailable, use varyonvg -f, which will force the volume group to vary on, whether or not there are copies of your data. You can specify forced varyon in SMIT for volume groups in a resource group.
Forced Varyon and Cluster Partitioning
If you have enabled forced varyon in HACMP, ensure that a heartbeating network exists. A heartbeating network ensures that each node always has a communication path to the other nodes—even if a network fails. This prevents your cluster from becoming partitioned. Otherwise, a network failure may cause nodes to attempt to take over resource groups that are still active on other nodes. In this situation, if you have set a forced varyon setting, you may experience data loss or divergence.
Other Ways to Force a Varyon
To achieve a forced varyon of a volume group, you can continue using methods that existed before HACMP 5.1 by using either:
- Pre- or post-event scripts
- Event recovery routines to respond to failure of the activation and acquisition of raw physical volumes and volume groups on a node.

Using HACMP forced varyon eliminates the need for a quorum buster disk, which was added to the cluster to avoid the problems associated with the loss of quorum. A quorum buster disk was a single additional disk added to the volume group, on a separate power source and field replaceable unit (FRU) from either of the mirrors of the data. This disk contained no data; it simply served as a quorum buster so that if one enclosure failed, or connectivity to it was lost, quorum was maintained and the data remained available on the other disk enclosure.
Using NFS with HACMP
The HACMP software provides availability enhancements to NFS handling:
- Reliable NFS server capability that allows a backup processor to recover current NFS activity should the primary NFS server fail, preserving the locks on NFS filesystems and the duplicate request cache. The locking functionality is available only for two-node clusters.
- Ability to specify a network for NFS mounting.
- Ability to define NFS exports and mounts at the directory level.
- Ability to specify export options for NFS-exported directories and filesystems.

For NFS to work as expected on an HACMP cluster, there are specific configuration requirements, so you can plan accordingly for:

- Creating shared volume groups
- Exporting NFS filesystems
- NFS mounting and fallover.

The HACMP scripts address default NFS behavior. You may need to modify the scripts to handle your particular configuration. The following sections provide suggestions for planning for a variety of situations.
You can configure NFS in all resource groups that behave as non-concurrent; that is, they do not have an Online on All Available Nodes startup policy.
For information about using NFS Version 4, see Chapter 2 of the Installation Guide.
Relinquishing Control over NFS filesystems in an HACMP Cluster
Once you configure resource groups that contain NFS filesystems, you relinquish control over NFS filesystems to HACMP.
Once NFS filesystems become part of resource groups that belong to an active HACMP cluster, HACMP takes care of cross-mounting and unmounting the filesystems, during cluster events (such as fallover of a resource group containing the filesystem to another node in the cluster).
If for some reason you stop the cluster services and must manage the NFS filesystems manually, the filesystems must be unmounted before you restart the cluster services. This enables management of NFS filesystems by HACMP once the nodes join the cluster.
Reliable NFS Server Capability
An HACMP two-node cluster can take advantage of AIX 5L extensions to the standard NFS functionality that enable it to handle duplicate requests correctly and restore lock state during NFS server fallover and reintegration.
Although HACMP has no dependency on the hostname, the hostname must resolve to an IP address that is always present on the node and always active on an interface. This means that it cannot be a service IP address that may move to another node, for example. To ensure that the IP address to be used by NFS always resides on the node, you can:

- Use an IP address that is associated with a persistent label
- For an IPAT via Aliases configuration, use the IP address used at boot time
- Use an IP address that resides on an interface that is not controlled by HACMP.

Shared Volume Groups
When creating shared volume groups, typically you can leave the Major Number field blank and let the system provide a default. However, NFS uses volume group major numbers to help uniquely identify exported filesystems. Therefore, all nodes to be included in a resource group containing an NFS-exported filesystem must have the same major number for the volume group on which the filesystem resides.
In the event of node failure, NFS clients attached to an HACMP cluster operate the same way as when a standard NFS server fails and reboots. That is, accesses to the filesystems hang, then recover when the filesystems become available again. However, if the major numbers are not the same, when another cluster node takes over the filesystem and re-exports the filesystem, the client application will not recover, since the filesystem exported by the node will appear to be different from the one exported by the failed node.
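For example, run lvlstmajor on each cluster node to list the free major numbers, choose a number that is free on every node, and use it when creating or importing the shared volume group (the major number 57, volume group name sharedvg, and disk name hdisk3 are placeholders):

lvlstmajor
mkvg -V 57 -y sharedvg hdisk3
importvg -V 57 -y sharedvg hdisk3

Use mkvg on the node where you create the volume group and importvg on the other nodes that share it.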
NFS Exporting Filesystems and Directories
The process of NFS-exporting filesystems and directories in HACMP is different from that in AIX 5L. Keep in mind the following points when planning for NFS-exporting in HACMP:
Filesystems and directories to NFS-export: In AIX 5L, you specify filesystems and directories to NFS-export by using the smit mknfsexp command (which creates the /etc/exports file). In HACMP, you specify filesystems and directories to NFS-export by including them in a resource group in HACMP. For more information, see the section NFS Cross-Mounting in HACMP.
Export options for NFS exported filesystems and directories: If you want to specify special options for NFS-exporting in HACMP, you can create a /usr/es/sbin/cluster/etc/exports file. This file has the same format as the regular AIX 5L /etc/exports file. (A sample entry appears after this list.)
Note: Using this alternate exports file is optional. HACMP checks the /usr/es/sbin/cluster/etc/exports file when NFS-exporting a filesystem or directory. If there is an entry for the filesystem or directory in this file, HACMP uses the options listed. If the filesystem or directory for NFS-export is not listed in the file, or, if the alternate file does not exist, the filesystem or directory will be NFS-exported with the default option of root access for all cluster nodes.
A resource group that specifies filesystems to export: In SMIT, set the Filesystems Mounted before IP Configured field for the resource group to true. This ensures that IP address takeover is performed after exporting the filesystems. If the IP addresses were managed first, the NFS server would reject client requests until the filesystems had been exported.
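For example, an entry in the /usr/es/sbin/cluster/etc/exports file that grants root access to the IP labels of both nodes might look like the following (the filesystem and IP label names are placeholders):

/fs1 -root=nodea_boot:nodeb_boot,access=nodea_boot:nodeb_boot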
NFS and Fallover
For HACMP and NFS to work properly together:
- NFS requires configuring resource groups with IP Address Takeover (IP Replacement or IP Aliases). IPAT via IP Aliases with NFS has specific requirements. For information about these requirements, see the section NFS Cross-Mounting and IP Labels.
- To ensure the best NFS performance, NFS filesystems used by HACMP should include the entry vers = <version number> in the options field in the /etc/filesystems file.
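For example, a hypothetical /etc/filesystems stanza for an NFS mount (the names /mnt1, /fs1, and service1 are taken from the cross-mounting example in the next section) sets the NFS version along with the default mount options:

/mnt1:
        dev             = "/fs1"
        vfs             = nfs
        nodename        = service1
        mount           = false
        options         = vers=3,soft,intr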
NFS Cross-Mounting in HACMP
NFS cross-mounting is an HACMP-specific NFS configuration in which one node is both the NFS server for one filesystem, and the NFS client for another filesystem, while a second node is the NFS client for the first filesystem, and the NFS server for the second filesystem. Essentially, each node is part of a “mutual takeover” or “active-active” cluster configuration, both providing and mounting an NFS filesystem.
By default, resource groups that contain NFS exported filesystems automatically cross-mount these filesystems (if both export and import are configured):
- On the node currently hosting the resource group, all NFS filesystems in the group are NFS exported.
- Each node that may host this resource group NFS mounts all the NFS filesystems in the resource group. This lets applications access the NFS filesystems on any node that is part of the resource group.
With IP address takeover configured for the resource group, on fallover:
- The NFS filesystem is locally mounted by the takeover node and re-exported.
- All other nodes in the resource group maintain their NFS mounts.

Node-to-Node NFS Cross-Mounting Example
In the following example:

- NodeA currently hosts a non-concurrent resource group, RG1, which includes:
  - /fs1 as an NFS-exported filesystem
  - service1 as a service IP label
- NodeB currently hosts a non-concurrent resource group, RG2, which includes:
  - /fs2 as an NFS-exported filesystem
  - service2 as a service IP label

Both resource groups contain both nodes as possible owners of the resource groups.
The two resource groups would be defined in SMIT as follows:
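The exact SMIT field names vary slightly between HACMP releases; based on the resources listed above, the definitions are approximately:

Resource group                           RG1                RG2
Participating node names                 NodeA NodeB        NodeB NodeA
Service IP label                         service1           service2
Filesystems                              /fs1               /fs2
Filesystems/Directories to Export        /fs1               /fs2
Filesystems/Directories to NFS Mount     /mnt1;/fs1         /mnt2;/fs2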
In this scenario:
- NodeA locally mounts and exports /fs1, then NFS-mounts it over /mnt1.
- NodeB NFS-mounts /fs1 over /mnt1 from NodeA.

Setting up a resource group like this ensures the expected default node-to-node NFS behavior described at the beginning of this section.
When NodeA fails, NodeB closes any open files in NodeA:/fs1, unmounts it, mounts it locally, and re-exports it to waiting clients.
After takeover, NodeB has:
- /fs2 locally mounted
- /fs2 NFS-exported
- /fs1 locally mounted
- /fs1 NFS-exported
- service1:/fs1 NFS mounted over /mnt1
- service2:/fs2 NFS mounted over /mnt2

On reintegration, /fs1 is passed back to NodeA, locally mounted and exported. NodeB mounts it over NFS again.
NFS Cross-Mounting and IP Labels
With NFS cross-mounting, each cluster node may act as an NFS client. Each of these nodes must have a valid route to the service IP label of the NFS server node. That is, to enable NFS cross-mounting, an IP label must exist on the client nodes, and this IP label must be configured on the same subnet as the service IP label of the NFS server node.
If the NFS client nodes have service IP labels on the same network, this is not an issue. However, in certain cluster configurations, you need to create a valid route.
The following sections describe these cluster configurations and also include two ways to configure valid routes.
Cluster Configurations That Require Creating a Valid Route
The following cluster configuration may not have a route to the IP label on the NFS server node:
If heartbeating over IP Aliases is not configured, non-service interfaces must be on a different subnet than service interfaces. This creates a situation where the NFS client nodes may not have an interface configured on the subnet used to NFS export the filesystems.
For non-concurrent resource groups with IPAT via IP Aliases to support NFS cross-mounting, you must create a route between the NFS client nodes and the node that is exporting the filesystems. The following section provides options for creating the valid route.
Ways to Create a Route to the NFS Server
The easiest way to ensure access to the NFS server is to have an IP label on the client node that is on the same subnet as the service IP label of the NFS server node.
To create a valid route between the NFS client node and the node that is exporting the filesystem, you can configure either of the following:
- A separate NIC with an IP label configured on the service IP network and subnet, or
- A persistent node IP label on the service IP network and subnet.

Note: If the client node has a non-service IP label on the service IP network, configuring heartbeating over IP Aliases allows the non-service IP label to be on the same subnet as the service IP label. See the section Heartbeating over IP Aliases in Chapter 3: Planning Cluster Network Connectivity.
Be aware that these solutions do not provide automatic root permissions to the filesystems because of the export options for NFS filesystems that are set in HACMP by default.
To enable root level access to NFS mounted filesystems on the client node, add all of the node’s IP labels or addresses to the root= option in the cluster exports file: /usr/es/sbin/cluster/etc/exports. You can do this on one node; synchronizing the cluster resources propagates this information to the other cluster nodes. For more information on the /usr/es/sbin/cluster/etc/exports file, see the section NFS Exporting Filesystems and Directories.
Resource Group Takeover with Cross-Mounted NFS Filesystems
This section describes how to set up non-concurrent resource groups with cross-mounted NFS filesystems so that NFS filesystems are handled correctly during takeover and reintegration. In addition, non-concurrent resource groups support automatic NFS mounting across servers during fallover.
Setting Up NFS Mount Point Different from Local Mount Point
HACMP handles NFS mounting in non-concurrent resource groups as follows:
- The node that currently owns the resource group mounts the filesystem over the filesystem’s local mount point, and this node NFS exports the filesystem.
- All the nodes in the resource group (including the current owner of the group) NFS mount the filesystem over a different mount point.

Therefore, the owner of the group has the filesystem mounted twice: once as a local mount and once as an NFS mount.
Since IPAT is used in resource groups that have NFS mounted filesystems, the nodes will not unmount and remount NFS filesystems during a fallover. When the resource group falls over to a new node, the acquiring node locally mounts the filesystem and NFS exports it. (The NFS mounted filesystem is temporarily unavailable to cluster nodes during fallover.) As soon as the new node acquires the IPAT label, access to the NFS filesystem is restored.
All applications must reference the filesystem through the NFS mount. If the applications used must always reference the filesystem by the same mount point name, you can change the mount point for the local filesystem mount (for example, change it to mountpoint_local and use the previous local mount point as the new NFS mount point).
Default NFS Mount Options for HACMP
The default options used by HACMP when performing NFS mounts are soft, intr.
To set hard mounts or any other options on the NFS mounts:
1. Enter smit mknfsmnt
2. In the MOUNT now, add entry to /etc/filesystems or both? field, select the filesystems option
3. In the /etc/filesystems entry will mount the directory on system RESTART field, accept the default value of no.
This procedure adds the options you have chosen to the /etc/filesystems entry created. The HACMP scripts then read this entry to pick up any options you may have selected.
Creating and Configuring NFS Mount Points on Clients
An NFS mount point is required to mount a filesystem via NFS. In a non-concurrent resource group all the nodes in the resource group NFS mount the filesystem. You create an NFS mount point on each node in the resource group. The NFS mount point must be outside the directory tree of the local mount point.
Once the NFS mount point is created on all nodes in the resource group, configure the NFS Filesystem to NFS Mount attribute for the resource group.
To create NFS mount points and to configure the resource group for the NFS mount:
1. On each node in the resource group, create an NFS mount point by executing the following command:
mkdir /mountpoint
where mountpoint is the name of the local NFS mount point over which the remote filesystem is mounted.
2. In the Change/Show Resources and Attributes for a Resource Group SMIT panel, the Filesystem to NFS Mount field must specify both mount points.
Specify the nfs mount point, then the local mount point, separating the two with a semicolon. For example:
/nfspoint1;/local1 /nfspoint2;/local2
3. (Optional) If there are nested mount points, nest the NFS mount points in the same manner as the local mount points so that they match up properly.
4. (Optional) When cross-mounting NFS filesystems, set the Filesystems Mounted before IP Configured field in SMIT for the resource group to true.
Completing the Shared LVM Components Worksheets
After you identify the physical and logical storage components for your cluster, complete all of the appropriate worksheets from the following list:
- Non-Shared Volume Group Worksheet
- Shared Volume Group/Filesystem Worksheet
- NFS-Exported Filesystem/Directory Worksheet.

Appendix A: Planning Worksheets contains the worksheets referenced in the following procedures.
Refer to the completed worksheets when you define the shared LVM components following the instructions in the chapter on Defining Shared LVM Components in the Installation Guide, and the cluster resource configuration following the instructions in chapters 2–4 of the Administration Guide.
Planning for LVM Components
Consider the following guidelines as you plan shared LVM components:
- In general, planning for logical volumes concerns the availability of your data. However, creating logical volume copies is not a substitute for regularly scheduled backups. Backups protect against loss of data regardless of cause. Logical volume copies protect against loss of data from physical access failure.
- All operating system files should reside in the root volume group (rootvg) and all user data should reside outside that group. This makes it more manageable to update or reinstall the operating system and to back up data.
- Volume groups that contain at least three physical volumes provide the maximum availability when implementing mirroring.
- If you plan to specify the Use Forced Varyon of Volume Groups, if Necessary attribute in SMIT for the volume groups, use the superstrict disk allocation policy for mirrored physical volumes.
- When using copies, each physical volume containing a copy should get its power from a separate source. If one power source fails, separate power sources maintain the no “single point of failure” objective.
- Consider quorum issues when laying out a volume group. With quorum enabled, a two-disk volume group puts you at risk for losing quorum and data access. Either build three-disk volume groups or disable quorum.
- Plan for NFS mounted filesystems and directories.
- Keep in mind the cluster configurations that you have designed. A node whose resources are not taken over should not own critical volume groups.

Completing the Non-Shared Volume Group Worksheet
For each node in the cluster, complete a Non-Shared Volume Group Worksheet for each volume group residing on a local (non-shared) disk:
1. Fill in the node name in the Node Name field.
2. Record the name of the volume group in the Volume Group Name field.
3. List the device names of the physical volumes comprising the volume group in the Physical Volumes field.
In the remaining sections of the worksheet, enter the following information for each logical volume in the volume group. Use additional sheets if necessary.
4. Enter the name of the logical volume in the Logical Volume Name field.
5. If you are using LVM mirroring, indicate the number of logical partition copies (mirrors) in the Number Of Copies Of Logical Partition field. You can specify one or two copies (in addition to the original logical partition, for a total of three).
6. If you are using LVM mirroring, specify whether each copy will be on a separate physical volume in the On Separate Physical Volumes? field.
7. Record the full path mount point of the filesystem in the filesystem Mount Point field.
8. Record the size of the filesystem in 512-byte blocks in the Size field.
Completing the Shared Volume Group and Filesystem Worksheet
Complete a separate Shared Volume Group/Filesystem Worksheet for each volume group that will reside on the shared disks.
To complete the Shared Volume Group and Filesystem Worksheet:
1. Enter the name of each node in the cluster in the Node Names field. You determined the node names in Chapter 2: Initial Cluster Planning. Note that all nodes must participate in a concurrent resource group if disk fencing is enabled. If disk fencing is not enabled, you can include a subset of nodes in the group.
2. Assign a name to the shared volume group and record it in the Shared Volume Group Name field. The name of the shared volume group must be unique within the cluster and distinct from the service IP label/address and resource group names; it should relate to the application it serves, as well as to any corresponding device, such as websphere_service_address.
3. Leave the Major Number field blank for now. You will enter a value in this field when you address NFS issues in the following Chapter 6: Planning Resource Groups.
4. Record the name of the log logical volume (jfslog or jfs2log) in the Log Logical Volume Name field.
5. Pencil-in the planned physical volumes in the Physical Volumes field. You will enter exact values for this field after you have installed the disks following the instructions in the chapter on Configuring Installed Hardware in the Installation Guide.
Physical volumes are known in the AIX 5L operating system by sequential hdisk numbers assigned when the system boots. For example, /dev/hdisk0 identifies the first physical volume in the system, /dev/hdisk1 identifies the second physical volume in the system, and so on.
When sharing a disk in an HACMP cluster, the nodes sharing the disk each assign an hdisk number to that disk. These hdisk numbers may not match, but refer to the same physical volume. For example, each node may have a different number of internal disks, or the disks may have changed since AIX 5L was installed.
The HACMP software does not require that the hdisk numbers match across nodes (although your system is easier to manage if they do). In situations where the hdisk numbers must differ, be sure that you understand each node’s view of the shared disks. Draw a diagram that indicates the hdisk numbers that each node assigns to the shared disks and record these numbers on the appropriate volume group worksheets in Appendix A: Planning Worksheets. When in doubt, use the hdisk’s PVID to identify it on a shared bus.
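For example, lspv lists each disk’s PVID in the second column. On NodeA, the output might include the following (illustrative output; the PVID is a placeholder):

hdisk2          00c4a8b2d6f01234          sharedvg

On NodeB, the same disk might appear as:

hdisk5          00c4a8b2d6f01234          sharedvg

Because the PVIDs match, both entries refer to the same shared physical volume even though the hdisk numbers differ.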
In the remaining sections of the worksheet, enter the following information for each logical volume in the volume group. Use additional sheets as necessary.
6. Assign a name to the logical volume and record it in the Logical Volume Name field.
A shared logical volume must have a unique name within an HACMP cluster. By default, AIX 5L assigns a name to any logical volume that is created as part of a journaled filesystem (for example, lv01). If you rely on the system generated logical volume name, this name could cause the import to fail when you attempt to import the volume group containing the logical volume into another node’s ODM structure, especially if that volume group already exists. The chapter on Defining Shared LVM Components in the Installation Guide describes how to change the name of a logical volume.
7. If you are using LVM mirroring, indicate the number of logical partition copies (mirrors) in the Number Of Copies of Logical Partition field. You can specify that you want one or two copies (in addition to the original logical partition, for a total of three).
8. If you are using LVM mirroring, specify whether each copy will be on a separate physical volume in the On Separate Physical Volumes? field. If you are planning to use a forced varyon option for the volume groups, make sure that each copy will be mirrored on a separate physical volume.
9. Record the full-path mount point of the filesystem in the filesystem Mount Point field.
10. Record the size of the filesystem in 512-byte blocks in the Size field.
11. Record whether this volume group will have cross-site LVM mirroring enabled. When a volume group is enabled for cross-site LVM mirroring, cluster verification ensures that the volume group and logical volume structure is consistent and there is at least one mirror of each logical volume at each site.
The volume group must also be configured as a resource in a resource group. Cross-site LVM mirroring supports two-site clusters where LVM mirroring through a Storage Area Network (SAN) replicates data between disk subsystems at geographically separated sites.
Completing the NFS-Exported Filesystem Worksheet
Print the NFS-Exported Filesystem or Directory Worksheet (Non-Concurrent Access) from Appendix A: Planning Worksheets, and fill it out using the information in this section. Print one copy for each application you want to keep highly available in the cluster.
For filesystems or directories to be NFS-exported from a node, complete an NFS-Exported Filesystem or Directory Worksheet. The information you provide will be used to update the /usr/es/sbin/cluster/etc/exports file.
To complete an NFS-Exported Filesystem or Directory Worksheet:
1. Record the name of the resource group from which the filesystems or directories will be NFS exported in the Resource Group field.
2. In the Network for NFS Mount field record the preferred network to NFS mount the filesystems or directories.
3. In the Filesystem Mounted Before IP Configured field, write true if you want the takeover of filesystems to occur before the takeover of IP address(es). Specify false for the IP address(es) to be taken over first.
4. Record the full pathname of the filesystem or directory to be exported in the Exported Directory field.
5. (Optional) Record the export options you want to assign the directories, filesystems, or both to be NFS exported. See the exports man page for a full list of export options.
6. Repeat steps 4 and 5 for each filesystem or directory to be exported.
Completing Concurrent Access Worksheets
Complete your concurrent access worksheets as explained in the following sections:
Completing the Non-Shared Volume Group Worksheet (Concurrent Access)
For each node, complete a Non-Shared Volume Group Worksheet (Concurrent Access) for each volume group that resides on a local (non-shared) disk:
1. Enter the node name in the Node Name field.
2. Record the name of the volume group in the Volume Group Name field.
3. Enter the name of the logical volume in the Logical Volume Name field.
4. List the device names of the physical volumes that comprise the volume group in the Physical Volumes field.
In the remaining sections of the worksheet, enter the following information for each logical volume in the volume group. Use additional sheets if necessary.
5. Enter the name of the logical volume in the Logical Volume Name field.
6. If you are using LVM mirroring, indicate the number of logical partition copies (mirrors) in the Number Of Copies Of Logical Partition field. You can specify one or two copies (in addition to the original logical partition, for a total of three).
7. If you are using LVM mirroring, specify whether each copy will be on a separate physical volume in the On Separate Physical Volumes? field. Specifying this option is especially important if you plan to force a varyon of volume groups, if a normal varyon operation fails due to a lack of quorum.
8. Record the full path mount point of the filesystem in the filesystem Mount Point field.
9. Record the size of the filesystem in 512-byte blocks in the Size field.
Completing the Shared Volume Group Worksheet (Concurrent Access)
Complete a separate Shared Volume Group and Filesystem Worksheet for each volume group that will reside on the shared disks.
If you plan to create concurrent volume groups on an SSA disk subsystem, assign a unique non-zero node number to the ssar device on each cluster node.
If you specify the use of SSA disk fencing in your concurrent resource group, HACMP verifies that all nodes are included in the resource group and assigns the node numbers when you synchronize the resources.
If you do not specify the use of SSA disk fencing in your concurrent resource group, assign the node numbers with the following command:
chdev -l ssar -a node_number=x
where x is the number to assign to that node. Then reboot the system.
To complete a Shared Volume Group and Filesystem Worksheet (Concurrent Access):
1. Enter the name of each node in the cluster in the Node Names field.
2. Record the name of the shared volume group in the Shared Volume Group Name field.
3. Pencil in the planned physical volumes in the Physical Volumes field. You will enter exact values for this field after you have installed the disks following the instructions in the chapter Installing HACMP on Server Nodes in the Installation Guide.
In the remaining sections of the worksheet, enter the following information for each logical volume in the volume group. Use additional sheets as necessary.
4. Enter the name of the logical volume in the Logical Volume Name field.
5. Identify the number of logical partition copies (mirrors) in the Number Of Copies Of Logical Partition field. You can specify one or two copies (in addition to the original logical partition, for a total of three).
6. Specify whether each copy will be on a separate physical volume in the On Separate Physical Volumes? field. Specifying this option is especially important if you plan to force a varyon of volume groups, if a normal varyon operation fails due to a lack of quorum.
Adding LVM Information to the Cluster Diagram
Add the LVM information to the cluster diagram, including volume group and logical volume definitions. Include the site information if you are using cross-site LVM mirroring.
Where You Go from Here
You have now planned the shared LVM components for your cluster. Use this information when you define the volume groups, logical volumes, and filesystems during the install.
In the next step of the planning process, you address issues relating to planning for your resource groups. Chapter 6: Planning Resource Groups describes this step of the planning process.