
Chapter 6: Planning Resource Groups


This chapter describes how to plan resource groups within an HACMP cluster. This chapter contains the following sections:

  • Prerequisites
  • Overview
  • General Rules for Resources and Resource Groups
  • Two Types of Resource Groups: Concurrent and Non-Concurrent
  • Resource Group Policies for Startup, Fallover and Fallback
  • Resource Group Attributes
  • Moving Resource Groups to Another Node
  • Planning for Replicated Resources
  • Planning Cluster Networks and Resource Groups
  • Planning Parallel or Serial Order for Processing Resource Groups
  • Planning Resource Groups in Clusters with Sites
  • Planning for Workload Manager
  • Completing the Resource Group Worksheet
  • Where You Go from Here.
    Prerequisites

    By now, you should have completed the planning steps in the previous chapters:

  • Chapter 2: Initial Cluster Planning
  • Chapter 3: Planning Cluster Network Connectivity
  • Chapter 4: Planning Shared Disk and Tape Devices
  • Chapter 5: Planning Shared LVM Components.
    Overview

    HACMP organizes resources into resource groups. Each resource group is handled as a unit that contains shared resources such as IP labels, applications, filesystems and volume groups. You define the policies for each resource group that define when and how it will be acquired or released.

    In Chapter 2: Initial Cluster Planning, you made preliminary choices about the resource group policies and the takeover priority for each node in the resource group nodelists. In this chapter you do the following:

  • Identify the individual resources that constitute each resource group.
  • For each resource group, identify which type of group it is: concurrent or non-concurrent.
  • Define the participating nodelist for the resource groups. The nodelist consists of the nodes assigned to participate in the takeover of a given resource group.
  • Identify the resource group startup, fallover, and fallback policy.
  • Identify applications and their resource groups for which you want to set up location dependencies, parent/child dependencies, or both.
  • Identify the inter-site management policies of the resource groups. Is the group site-aware, are there replicated resources to consider.
  • Identify other attributes and runtime policies to refine resource group behavior.
    Note: For information about how resource group policies and attributes from versions of HACMP prior to 5.2 are mapped to the resource group policies in the current version, see the chapter on Upgrading an HACMP Cluster in the Installation Guide.

    The following definitions are used in this section:

  • Participating nodelist. A list of nodes that can host a particular resource group, as defined in the Participating Node Names for a resource group in SMIT. Be aware that the combination of the different resource group policies and the current cluster conditions also affects resource group placement on the nodes in the cluster.
  • Home node (or the highest priority node for this resource group). The first node that is listed in the participating nodelist of a non-concurrent resource group.
  • HACMP resource groups support NFS filesystems. For information about using resource groups with NFS, see the section NFS Cross-Mounting and IP Labels in Chapter 5: Planning Shared LVM Components.

    General Rules for Resources and Resource Groups

    The following rules and restrictions apply to resources and resource groups:

  • In order for HACMP to keep a cluster resource highly available, it must be part of a resource group. If you want a resource to be kept separate, define a group for that resource alone. A resource group may have one or more resources defined.
  • A resource may not be included in more than one resource group.
  • The components of a resource group must be unique. Put the application along with the resources it requires in the same resource group.
  • The service IP labels, volume groups, and resource group names must be both unique within the cluster and distinct from each other. It is recommended that the name of a resource relate to the application it serves, as well as to any corresponding device, such as websphere_service_address.
  • If you include the same node in participating nodelists for more than one resource group, make sure that the node has the memory, network interfaces, etc. necessary to manage all resource groups simultaneously.
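    The membership and naming rules above lend themselves to a quick planning check before you enter the configuration in SMIT. The following Python sketch is purely illustrative (it is not an HACMP utility, and the group and resource names are hypothetical); it flags a resource assigned to more than one resource group and any name that is not unique across the cluster:

        # Illustrative planning check only; not an HACMP utility.
        # The resource group definitions below are hypothetical examples.
        from collections import Counter

        resource_groups = {
            "websphere_rg": {"service_ip": ["websphere_service_address"],
                             "volume_groups": ["wsvg"], "app_servers": ["websphere_app"]},
            "db_rg":        {"service_ip": ["db_service_address"],
                             "volume_groups": ["dbvg"], "app_servers": ["db_app"]},
        }

        # Rule: a resource may not be included in more than one resource group.
        all_resources = [r for rg in resource_groups.values() for v in rg.values() for r in v]
        duplicates = [name for name, n in Counter(all_resources).items() if n > 1]

        # Rule: service IP labels, volume groups, and resource group names must be
        # unique within the cluster and distinct from each other.
        all_names = list(resource_groups) + all_resources
        clashes = [name for name, n in Counter(all_names).items() if n > 1]

        print("resources in more than one group:", duplicates or "none")
        print("name clashes across the cluster:", clashes or "none")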
    Two Types of Resource Groups: Concurrent and Non-Concurrent

    To categorize and describe resource group behavior, we first divide the resource groups into two types: concurrent and non-concurrent:

    Concurrent Resource Groups

    A concurrent resource group may be online on multiple nodes. All nodes in the nodelist of the resource group acquire that resource group when they join the cluster. There are no priorities among nodes. Concurrent resource groups are supported in clusters with up to 32 nodes.

    The only resources included in a concurrent resource group are volume groups with raw logical volumes, raw disks, and application servers that use the disks. The device on which these logical storage entities are defined must support concurrent access.

    Concurrent resource groups have the startup policy Online on All Available Nodes and do not fallover or fallback from one node to another.

    Non-Concurrent Resource Groups

    Non-concurrent resource groups may not be online on multiple nodes. You can define a variety of startup, fallover, and fallback policies for these resource groups.

    You can fine tune the non-concurrent resource group behavior for node preferences during a node startup, resource group fallover to another node in the case of a node failure, or when the resource group falls back to the reintegrating node. See Resource Group Policies for Startup, Fallover and Fallback for more information.

    Resource Group Policies for Startup, Fallover and Fallback

    Resource group behaviors are separated into three kinds of node policies:
  • Startup policy defines on which node the resource group will be activated when a node joins the cluster and the resource group is not active on any node.
  • Fallover policy defines to which node the resource group will fall over when the resource group must leave the node where it is currently online due to a failure condition (or if you stop the cluster services on a node using the fallover option).
  • Fallback policy defines to which node the resource group will fall back when a node joins and the resource group is already active on another node.
    HACMP allows you to configure only valid combinations of startup, fallover, and fallback behaviors for resource groups. The following table summarizes the basic startup, fallover, and fallback behaviors you can configure for resource groups in HACMP 5.4:

    Startup Behavior: Online only on home node (first node in the nodelist)
    Fallover Behavior: Fallover to next priority node in the list, or Fallover using Dynamic Node Priority
    Fallback Behavior: Never fall back, or Fall back to higher priority node in the list

    Startup Behavior: Online using node distribution policy
    Fallover Behavior: Fallover to next priority node in the list, or Fallover using Dynamic Node Priority
    Fallback Behavior: Never fall back

    Startup Behavior: Online on first available node
    Fallover Behavior: Any of these: Fallover to next priority node in the list, Fallover using Dynamic Node Priority, or Bring offline (on error node only)
    Fallback Behavior: Never fall back, or Fall back to higher priority node in the list

    Startup Behavior: Online on all available nodes
    Fallover Behavior: Bring offline (on error node only)
    Fallback Behavior: Never fall back
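    As a planning aid, the valid combinations in the table above can be captured in a small lookup and checked programmatically. This is an illustrative Python sketch only (it is not an HACMP interface, and the shortened policy names are informal labels for the behaviors listed above):

        # Illustrative sketch: the valid policy combinations from the table above.
        # Not an HACMP API; policy names are informal shorthand.
        VALID_COMBINATIONS = {
            "Online only on home node": {
                "fallover": {"Next priority node", "Dynamic Node Priority"},
                "fallback": {"Never fall back", "Higher priority node"},
            },
            "Online using node distribution policy": {
                "fallover": {"Next priority node", "Dynamic Node Priority"},
                "fallback": {"Never fall back"},
            },
            "Online on first available node": {
                "fallover": {"Next priority node", "Dynamic Node Priority",
                             "Bring offline (on error node only)"},
                "fallback": {"Never fall back", "Higher priority node"},
            },
            "Online on all available nodes": {
                "fallover": {"Bring offline (on error node only)"},
                "fallback": {"Never fall back"},
            },
        }

        def is_valid(startup, fallover, fallback):
            combo = VALID_COMBINATIONS.get(startup)
            return bool(combo) and fallover in combo["fallover"] and fallback in combo["fallback"]

        print(is_valid("Online on all available nodes",
                       "Bring offline (on error node only)", "Never fall back"))   # True
        print(is_valid("Online on all available nodes",
                       "Next priority node", "Never fall back"))                   # False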

    In addition to the node policies described in the previous table, other issues may determine the resource groups that a node acquires. For more information about the event processing logic for resource groups, see Chapter 7: Planning for Cluster Events.

    Resource Group Attributes

    This section provides an overview of the following resource group attributes that you can use to fine tune the startup, fallover, and fallback policies of resource groups:

  • How Resource Group Attributes Relate to Startup, Fallover, and Fallback. Each attribute affects resource group startup, resource group fallover to another node in the case of a node failure, or resource group fallback to the reintegrating node.

  • Settling Time for Startup. You can modify a resource group’s startup behavior by specifying a settling time for a resource group that is currently offline. With a settling time specified, you can avoid having a resource group activated on the first available node; a higher priority node for the resource group may join the cluster during this time period.
  • Node Distribution Policy. You can configure a startup behavior of a resource group to use the node distribution policy during startup. This policy ensures that only one resource group with this policy enabled is acquired on a node during startup.
  • Dynamic Node Priority Policy. You can configure a resource group’s fallover behavior to use dynamic node priority. It allows you to use a predefined RSCT resource variable such as “lowest CPU load” to select the takeover node.
  • Delayed Fallback Timer. You can configure a resource group’s fallback behavior to occur at one of the predefined recurring times: daily, weekly, monthly and yearly, or on a specific date and time, by specifying and assigning a delayed fallback timer.
  • Parent and Child Dependent Resource Groups. Related applications in different resource groups are configured to be processed in logical order.
  • Resource Group Location Dependencies. Certain applications in different resource groups stay online together on a node or on a site, or stay online on different nodes.
    How Resource Group Attributes Relate to Startup, Fallover, and Fallback

    The following table summarizes which resource group startup, fallover, or fallback policies are affected by a given attribute or run-time policy.

    Attribute or Run-Time Policy: Policies Affected
    Settling Time: Startup policy
    Node Distribution Policy: Startup policy
    Dynamic Node Priority: Fallover policy
    Delayed Fallback Timer: Fallback policy
    Resource Groups Parent/Child Dependency: Startup, fallover, and fallback policies
    Resource Groups Location Dependency: Startup, fallover, and fallback policies

    See Parent and Child Dependent Resource Groups and Resource Group Location Dependencies for guidelines.

    Settling Time for Startup

    Settling Time lets the Cluster Manager wait for a specified amount of time before activating a resource group. Use this attribute to ensure that a resource group does not bounce among nodes, as nodes with increasing priority for the resource group are brought online.

    If the node that is starting is the first node in the nodelist for this resource group, the settling time period is skipped and HACMP immediately attempts to acquire the resource group on this node.

    The settling time has the following characteristics:

  • Affects only those resource groups that are currently offline, and for which you have specified the startup policy to be Online on First Available Node. You configure one settling time for all such resource groups.
  • Activates when the first node that can acquire the resource group joins the cluster, unless this is the first node in the nodelist (then the settling time is ignored and the group is acquired).
  • If the first node that joins the cluster and can potentially acquire the resource group fails, this condition can either cancel the settling time, or reset it.
  • Delays the activation of the group during node_up events, in case higher priority nodes join the cluster.
  • If a settling time period is specified for a resource group and a resource group is currently in the ERROR state, the Cluster Manager waits for the settling time period before attempting to bring the resource group online during a node_up event.
    Node Reintegration with Settling Time Configured

    In general, when a node joins the cluster, it can acquire resource groups. The following list describes the role of the settling time in this process:

  • If the node is the highest priority node for a specific resource group, the node immediately acquires that resource group and the settling time is ignored. (This is only one circumstance under which HACMP ignores the setting).
  • If the node is able to acquire some resource groups, but is not the highest priority node for those groups, the resource groups do not get acquired on that node. Instead, they wait during the settling time interval to see whether a higher priority node joins the cluster.
  • When the settling time interval expires, HACMP moves the resource group to the highest priority node which is currently available and which can take the resource group. If HACMP does not find appropriate nodes, the resource group remains offline.

    See the Administration Guide for details on configuring settling time.
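    The settling time decision described above can be summarized as a small decision function. The following is an illustrative Python sketch of the logic only (node names are hypothetical; this is not how the Cluster Manager is implemented):

        # Illustrative model of the settling time decision for an offline resource
        # group whose startup policy is Online on First Available Node. Not HACMP code.
        def on_node_join(joining_node, nodelist, settling_time_seconds):
            """Describe what happens to the offline group when joining_node comes up."""
            if joining_node == nodelist[0]:
                # Highest priority (home) node: the settling time is skipped.
                return "acquire immediately on " + joining_node
            # A lower priority node: wait in case a higher priority node also joins.
            return (f"wait {settling_time_seconds}s, then acquire on the highest "
                    "priority node currently available")

        nodelist = ["nodeA", "nodeB", "nodeC"]          # hypothetical participating nodelist
        print(on_node_join("nodeA", nodelist, 120))     # acquire immediately
        print(on_node_join("nodeB", nodelist, 120))     # wait, then pick the best available node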

    Node Distribution Policy

    You can use node distribution policy for cluster startup, to ensure that HACMP activates only one resource group with this policy enabled on each node. This policy helps you distribute your CPU-intensive applications on different nodes.

    If two resource groups with this policy enabled are offline at the time when a particular node joins, only one of the two resource groups is acquired on a node. HACMP gives preference to the resource group that has fewer nodes in the nodelist and then sorts the list of resource groups alphabetically.

    Note: If one of the resource groups is a parent resource group (has a child resource group), HACMP gives preference to the parent resource group and it is activated on a node.
    Note: To ensure that your resource groups are distributed not only at startup but also for recovery events (fallover and fallback), use location dependencies. See Resource Group Dependencies.
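    When several offline resource groups that use node distribution are candidates for the same joining node, the preferences described above (a parent group first, then the group with fewer nodes in its nodelist, then alphabetical order) can be modeled as a sort key. This is an illustrative Python sketch with hypothetical group names, and it assumes the parent preference in the note takes precedence over the nodelist-size rule:

        # Illustrative model of the node distribution tie-break at startup. Not HACMP code.
        # Candidate groups that could be acquired on the joining node (hypothetical data).
        candidates = [
            {"name": "rg_reports", "nodelist_len": 3, "is_parent": False},
            {"name": "rg_db",      "nodelist_len": 3, "is_parent": True},   # has a child group
            {"name": "rg_app",     "nodelist_len": 2, "is_parent": False},
        ]

        def preference(rg):
            # Parent groups first, then fewer nodes in the nodelist, then name order.
            return (not rg["is_parent"], rg["nodelist_len"], rg["name"])

        chosen = min(candidates, key=preference)
        print("acquired on this node:", chosen["name"])   # rg_db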

    Dynamic Node Priority Policy

    Setting a dynamic node priority policy allows you to use a predefined RMC resource variable such as “lowest CPU load” to select the takeover node. With a dynamic priority policy enabled, the order of the takeover nodelist is determined by the state of the cluster at the time of the event, as measured by the selected RMC resource variable. You can set different policies for different groups or the same policy for several groups.

    If you decide to define dynamic node priority policies using RMC resource variables to determine the fallover node for a resource group, consider the following points:

  • Dynamic node priority policy is most useful in a cluster where all the nodes have equal processing power and memory
  • Dynamic node priority policy is irrelevant for clusters of fewer than three nodes
  • Dynamic node priority policy is irrelevant for concurrent resource groups.
  • Remember that selecting a takeover node also depends on such conditions as the availability of a network interface on that node.
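    A dynamic node priority policy orders the takeover candidates by a run-time measurement rather than by the static nodelist. The following illustrative Python sketch shows the idea for a "lowest CPU load" style policy; the node names and load figures are hypothetical, and in HACMP the values come from the predefined RMC resource variables, not from code like this:

        # Illustrative model of fallover using dynamic node priority. Not HACMP or RMC code.
        available_nodes = {          # candidate takeover nodes and their current CPU load
            "nodeB": 0.72,
            "nodeC": 0.31,
            "nodeD": 0.55,
        }
        has_usable_interface = {"nodeB": True, "nodeC": True, "nodeD": False}

        # A takeover node must also meet conditions such as having a usable network interface.
        candidates = [n for n in available_nodes if has_usable_interface[n]]
        takeover_node = min(candidates, key=lambda n: available_nodes[n])
        print("takeover node:", takeover_node)   # nodeC: lowest load among usable candidates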

    Delayed Fallback Timer

    You can use a Delayed Fallback Timer to set the time for a resource group to fall back to a higher priority node. You can configure the fallback to occur at one of the predefined recurring times (daily, weekly, monthly, or yearly), or on a specific date and time.

    The delayed fallback timer has the following characteristics:

  • Specifies the time when a resource group that is online and residing on a non-home or lower priority node falls back to its home node or a higher priority node.
  • Affects the movement of the resource group to another node. For example, if you move the non-concurrent resource group (that has a fallback timer attribute) to another node using the Resource Group Management utility (clRGmove), the group stays on the destination node (unless you reboot the cluster, which is rarely done). If the destination node goes down and then reintegrates, the resource group also falls back to this node at the specified time.
    Node Reintegration with a Delayed Fallback Timer Set

    The resource group does not fall back to its higher priority node immediately when both of the following conditions are met:

  • You have configured a delayed fallback timer for a resource group, and
  • A higher priority node joins the cluster.
    At the time specified in the Delayed Fallback Timer attribute, one of two scenarios takes place:

  • A higher priority node is found. If a higher priority node is available for the resource group, HACMP attempts to move the resource group to this node when the fallback timer expires. If the acquisition is successful, the resource group is acquired on that node.
  • However, if the acquisition of the resource group on the node fails, HACMP attempts to move the resource group to the next higher priority node in the group nodelist, and so on. If the acquisition of the resource group on the last node that is available fails, the resource group goes into an error state. You must take action to fix the error and bring such a resource group back online. For more information, see the chapter on Managing Resource Groups in a Cluster in the Administration Guide.

  • A higher priority node is not found. If there are no higher priority nodes available for a resource group, the resource group remains online on the same node until the fallback timer expires again. For example, if a daily fallback timer expires at 11:00 p.m. and there are no higher priority nodes available for the resource group to fall back to, the fallback timer recurs the next night at 11:00 p.m.
  • A fallback timer that is set to a specific date does not recur.
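    The recurrence behavior described above (a daily timer simply comes around again if no higher priority node is available, while a timer set to a specific date fires once) can be sketched as follows. This is an illustrative Python example, not the HACMP timer implementation:

        # Illustrative model of a delayed fallback timer. Not HACMP code.
        from datetime import datetime, timedelta

        def next_fallback(now, timer):
            """Return the next time a fallback attempt is made, or None if a one-shot
            (specific date) timer has already passed."""
            if timer["type"] == "daily":
                candidate = now.replace(hour=timer["hour"], minute=timer["minute"],
                                        second=0, microsecond=0)
                return candidate if candidate > now else candidate + timedelta(days=1)
            if timer["type"] == "specific_date":
                return timer["when"] if timer["when"] > now else None   # does not recur
            raise ValueError("only daily and specific_date are modeled in this sketch")

        now = datetime(2024, 5, 1, 23, 30)
        print(next_fallback(now, {"type": "daily", "hour": 23, "minute": 0}))
        # -> 2024-05-02 23:00: the daily timer recurs the next night
        print(next_fallback(now, {"type": "specific_date", "when": datetime(2024, 4, 1)}))
        # -> None: a specific-date timer does not recur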

    Resource Group Dependencies

    HACMP offers a wide variety of configurations where you can specify the relationships between resource groups that you want to maintain at startup, fallover, and fallback. You can configure:

  • Parent and Child Dependent Resource Groups. Related applications and other resources in different resource groups are configured to be processed in the proper order.
  • Resource Group Location Dependencies. Certain applications in different resource groups stay online together on a node or on a site, or stay online on different nodes.
    Keep the following points in mind when planning how to configure these dependencies:

  • Although by default all resource groups are processed in parallel, HACMP processes dependent resource groups according to the order dictated by the dependency, and not necessarily in parallel. Resource group dependencies are honored cluster-wide and override any customization for serial order of processing of any resource groups included in the dependency. For more information, see Dependent Resource Groups and Parallel or Serial Order.
  • Dependencies between resource groups offer a predictable and reliable way of building clusters with multi-tiered applications.
  • The following limitations apply to configurations that combine dependencies. Verification will fail if you do not follow these guidelines:

  • Only one resource group can belong to an Online on Same Node dependency set and an Online On Different Nodes dependency set at the same time
  • Only resource groups with the same Priority within an Online on Different Nodes dependency set can participate in an Online on Same Site dependency set.
  • For information on configuring dependent resource groups, see the Administration Guide.

    Parent and Child Dependent Resource Groups

    Configuring a resource group dependency allows for better control for clusters with multi-tiered applications where one application depends on the successful startup of another application, and both applications are required to be kept highly available with HACMP. For more information, see Planning Considerations for Multi-Tiered Applications in Chapter 2: Initial Cluster Planning.

    The following example illustrates the parent/child dependency behavior:

  • If resource group A depends on resource group B, resource group B must be brought online before resource group A is acquired on any node in the cluster. Note that resource group A is defined as a child resource group, and resource group B as a parent resource group.
  • If child resource group A depends on parent resource group B, during a node startup or node reintegration, child resource group A cannot come online before parent resource group B gets online. If parent resource group B is taken offline, the child resource group A is taken offline first, since it depends on resource group B.
  • Business configurations that use multi-tiered applications can utilize parent/child dependent resource groups. For example, the database must be online before the application server. In this case, if the database is moved to a different node, the resource group containing the application server would have to be brought down and back up on any node in the cluster. For more information about examples of multi-tiered applications, see the Concepts and Facilities Guide.

    If a child resource group contains an application that depends on resources in the parent resource group and the parent resource group falls over to another node, the child resource group is temporarily stopped and automatically restarted. Similarly, if the child resource group is concurrent, HACMP takes it offline temporarily on all nodes, and brings it back online on all available nodes. If the fallover of the parent resource group is not successful, both the parent and the child resource groups go into an ERROR state.

    Consider the following when planning for parent/child dependent resource groups:

  • Plan applications you need to keep highly available and consider whether your business environment requires one application to be running before another application can be started.
  • Ensure that those applications that require sequencing are included in different resource groups. This way, you can establish dependencies between these resource groups.
  • Plan for application monitors for each application that you are planning to include in a child or parent resource group. For an application in a parent resource group, configure a monitor in the monitoring startup mode.
  • To minimize the chance of data loss during the application stop and restart process, customize your application server scripts to ensure that any uncommitted data is stored to a shared disk temporarily during the application stop process and read back to the application during the application restart process. It is important to use a shared disk as the application may be restarted on a node other than the one on which it was stopped.
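    The ordering constraint behind parent/child dependencies (parents are brought online before their children, and children are released before their parents) is a dependency ordering problem. The following is a minimal illustrative Python sketch with hypothetical group names, not HACMP event logic:

        # Illustrative sketch of parent/child acquisition and release order. Not HACMP code.
        from graphlib import TopologicalSorter   # Python 3.9+

        # child -> set of parents it depends on (hypothetical example: the application
        # server group depends on the database group).
        depends_on = {
            "rg_appserver": {"rg_database"},
            "rg_database": set(),
        }

        acquisition_order = list(TopologicalSorter(depends_on).static_order())
        release_order = list(reversed(acquisition_order))

        print("acquire:", acquisition_order)   # ['rg_database', 'rg_appserver']
        print("release:", release_order)       # ['rg_appserver', 'rg_database']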

    Resource Group Location Dependencies

    If failures do occur over the course of time, HACMP distributes resource groups so that they remain available, but not necessarily on the nodes you originally specified, unless they have the same home node and the same fallover and fallback policies.

    Resource group location dependency offers you an explicit way to specify that certain resource groups will always be online on the same node, or that certain resource groups will always be online on different nodes. You can combine these location policies with parent/child dependencies, to have all child resource groups online on the same node while the parent is online on a different node; or, to have all child resource groups be online on different nodes for better performance.

    If you have replicated resources, you can combine resource groups into a site dependency to keep them online at the same site. For more information, see the section Special Considerations for Using Sites in this chapter.

    For detailed examples of how resource group location dependencies affect the handling of resource groups, see Appendix B in the Administration Guide.

    HACMP supports three types of resource group location dependencies between resource groups:

  • Online on Same Node
  • The following rules and restrictions apply to the Online On Same Node Dependency set of resource groups. Verification will fail if you do not follow these guidelines:
  • All resource groups configured as part of a given Same Node dependency set must have the same nodelist (the same nodes in the same order).
  • All non-concurrent resource groups in the Same Node dependency set must have the same Startup/Fallover/Fallback policies.
  • Online Using Node Distribution Policy is not allowed for Startup.
  • If a Dynamic Node Priority Policy is configured as Fallover Policy, then all resource groups in the set must have the same policy.
  • If one resource group has a fallback timer configured, it applies to the set of resource groups. All resource groups in the set must have the same fallback time setting.
  • Both concurrent and non-concurrent resource groups can be included.
  • You can have more than one Same Node dependency set in the cluster.
  • HACMP enforces the condition that all resource groups in the Same Node dependency set that are active (ONLINE) are required to be ONLINE on the same node. Some resource groups in the set can be OFFLINE or in the ERROR state.
  • If one or more resource groups in the Same Node dependency set fail, HACMP tries to place all resource groups in the set on the node that can host all resource groups that are currently ONLINE (the ones that are still active) plus one or more failed resource groups.
  • Online on Same Site
  • The following rules and restrictions apply to the Online On Same Site dependency set of resource groups. Verification will fail if you do not follow these guidelines:
  • All resource groups in a Same Site dependency set must have the same Inter-Site Management Policy but may have different Startup/Fallover/Fallback Policies. If fallback timers are used, these must be identical for all resource groups in the set.
  • The fallback timer does not apply to moving a resource group across site boundaries.
  • All resource groups in the Same Site Dependency set must be configured so that the nodes that can own the resource groups are assigned to the same primary and secondary sites.
  • Online Using Node Distribution Policy Startup policy is supported.
  • Both concurrent and non-concurrent resource groups can be included.
  • You can have more than one Same Site dependency set in the cluster.
  • All resource groups in the Same Site dependency set that are active (ONLINE) are required to be ONLINE on the same site, even though some resource groups in the set may be OFFLINE or in the ERROR state.
  • If you add a resource group that is included in a Same Node dependency set to a Same Site Dependency set, then all the other resource groups in the Same Node Dependency set must be added to the Same Site dependency set.
  • Also see the section Planning Resource Groups in Clusters with Sites.
  • Online on Different Nodes
  • When you configure resource groups in the Online On Different Nodes dependency set you assign priorities to each resource group in case there is contention for a given node at any point in time. You can assign High, Intermediate, and Low priority. Higher priority resource groups take precedence over lower priority groups at startup, fallover, and fallback.
    The following rules and restrictions apply to the Online On Different Nodes dependency set of resource groups. Verification will fail if you do not follow these guidelines:
  • Only one Online On Different Nodes dependency set is allowed per cluster.
  • Plan startup policies so that each resource group in the set will start up on a different node.
  • If a parent/child dependency is specified, then the child resource group cannot have a higher priority than its parent resource group.
  • Once the cluster is running with these groups configured, be aware that:
  • If a resource group with High Priority is ONLINE on a node, then no other lower priority resource group in the Different Nodes dependency set can come ONLINE on that node.
  • If a higher priority resource group falls over or falls back to a given node, it comes ONLINE on that node, and the Cluster Manager takes the lower priority resource group OFFLINE and moves it to another node, if possible.
  • Resource groups with the same priority cannot come ONLINE (startup) on the same node. Priority of a resource group for a node within the same Priority Level is determined by the groups’ alphabetical order in the set.
  • Resource groups with the same priority do not cause one another to be moved from the node after a fallover or fallback.
  • Combination of Same Site with Node Location Dependencies.
  • You can have resource groups that belong to both an Online on Same Node and Online on Same Site policy. You can also have resource groups that belong to both Online on Same Site and Online on Different Nodes policy.
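    Several of the verification rules above (identical nodelists within a Same Node set, a single Different Nodes set per cluster, and a child never having a higher priority than its parent) can be expressed as simple checks during planning. The following Python sketch is illustrative only, with hypothetical configuration data; it is not the HACMP verification utility:

        # Illustrative checks for a few of the location dependency rules above.
        # Hypothetical configuration data; not HACMP verification code.
        same_node_sets = [
            {"rg_a": ["node1", "node2"], "rg_b": ["node1", "node2"]},   # group -> nodelist
        ]
        different_node_sets = [
            {"rg_a": "High", "rg_c": "Intermediate"},                   # group -> priority
        ]
        parent_child = [("rg_a", "rg_c")]                               # (parent, child)

        errors = []

        # Rule: all groups in a Same Node set must have the same nodelist (same nodes, same order).
        for s in same_node_sets:
            nodelists = list(s.values())
            if any(nl != nodelists[0] for nl in nodelists):
                errors.append("Same Node set members have different nodelists")

        # Rule: only one Online On Different Nodes dependency set is allowed per cluster.
        if len(different_node_sets) > 1:
            errors.append("more than one Different Nodes dependency set")

        # Rule: a child resource group cannot have a higher priority than its parent.
        rank = {"High": 0, "Intermediate": 1, "Low": 2}
        priorities = different_node_sets[0]
        for parent, child in parent_child:
            if rank[priorities[child]] < rank[priorities[parent]]:
                errors.append(f"child {child} has a higher priority than parent {parent}")

        print(errors or "no violations found")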
    Moving Resource Groups to Another Node

    In HACMP 5.4, when you tell HACMP to move the resource group to another node, you have the following options:

  • Resource groups remain on the nodes to which they are moved.
  • You can move resource groups that have the Never Fallback policy to another node. When you do so, you can tell HACMP to leave the resource group on the destination node until you decide to move the group again.

  • When you move a resource group with RG_move, it will remain on the node to which it was moved either indefinitely (until you tell HACMP to move it to another node) or until you reboot the cluster.
  • Note: If you do stop the cluster services, which rarely has to be done, and you wish to permanently change the resource group’s nodelist and highest priority node, change the resource group’s attributes and restart the cluster.
  • If you take the resource group online or offline on any of the node(s), it will remain online or offline either until the next cluster reboot, or until you manually bring the group online elsewhere in the cluster.
  • If your resource group has the Fallback to Highest Priority Node policy, the group falls back to its destination node after you move it.
  • For instance, if the group has node A configured as its highest priority node, and you move it to node B, then this group will remain on node B and will treat this node now as its highest priority node. You can always choose to move the group again to node A. When you use SMIT to do so, HACMP informs you if the “original” highest priority node (node A) is now available to host the group.
    You can keep track of all resource groups that were manually moved by using the clRGinfo -p command.

    Using clRGmove to Move Resource Groups

    You can use the command clRGmove to move a resource group to another node or to another site, or to take a resource group online or offline. The resource group remains on the node until the cluster reboot. You can run the clRGmove command via SMIT or from the command line.

    If you use clRGmove with resource groups that have the fallback policy Never Fallback, the resource group remains on that node until you move it elsewhere.

    Moving Parent/Child Dependent Resource Groups with clRGmove

    The following rules apply to resource groups with a parent/child dependency:

  • If the parent resource groups are offline due to your request made through clRGmove, HACMP rejects manual attempts to bring the child resource groups that depend on these resource groups online. The error message lists the parent resource groups that must be brought online first.
  • If you have a parent and a child resource group online, and would like to move the parent resource group to another node or take it offline, HACMP prevents you from doing so before a child resource group is taken offline.
  • For more information, see the sections on moving resource groups in the chapter on Managing Resource Groups in a Cluster in the Administration Guide.

    Moving Location Dependent Resource Groups with clRGmove

    The following rules apply to resource groups with a location dependency:

  • If you move a same-site dependent resource group to the other site, the entire set of resource groups in the dependency is moved to the other site.
  • If you move a same-node dependent resource group to another node, the entire set of resource groups in the dependency is moved.
  • You cannot move a resource group to any node that hosts a resource group that is online and part of a different-node dependency. You have to take the resource group that is included in the different node dependency offline on the selected node first.
    Planning Cluster Networks and Resource Groups

    You cannot mix IPAT via IP Aliases and IPAT via IP Replacement labels in the same resource group. This restriction is enforced during verification of cluster resources.

    IPAT of any type does not apply to concurrent resource groups.

    This section contains the following topics:

  • Aliased Networks and Resource Groups
  • IPAT via IP Replacement Networks and Resource Groups
  • Planning Service IP Labels in Resource Groups.
    Aliased Networks and Resource Groups

    A resource group may include multiple service IP labels. When a resource group configured with IPAT via IP Aliases is moved on an aliased network, all service labels in the resource group are moved as aliases to an available network interface, according to the resource group management policies in HACMP.

    For information on planning IPAT via IP Aliases, see IP Address Takeover via IP Aliases in Chapter 3: Planning Cluster Network Connectivity.

    If you configure aliased networks in your cluster, see Appendix B: Resource Group Behavior during Cluster Events in the Administration Guide for information on how the service IP label is moved at cluster startup and during fallover.

    IPAT via IP Replacement Networks and Resource Groups

    All non-concurrent resource groups can have their service IP labels configured on IPAT via IP Replacement networks.

    In addition, the following planning considerations apply:

  • For each IPAT via IP Replacement network, a node may be configured as the highest priority node for only one non-concurrent resource group in which a service IP label is also configured.
  • The resource groups cannot be configured to use the service labels from the same network.
  • The resource group cannot contain more than one service IP label from the same IPAT via Replacement network.
    Planning Service IP Labels in Resource Groups

    The subnet requirements for boot and service IP labels/addresses managed by HACMP depend on the following:

  • The method that you choose for IP label/address recovery: IPAT via IP Aliases, or IPAT via IP Replacement with or without Hardware Address Takeover (HWAT)
  • The type of resource group in which the IP label is included.
    Note: The boot IP label/address is the initial IP label/address that is used at boot time. The term “boot” is used in this section to explain the HACMP requirements for subnets on which different IP labels (boot and service) are configured.

    The following requirements are placed on the cluster configurations in order for HACMP to monitor the health of network interfaces and to properly maintain subnet routing during fallover and fallback:

    Configuration
    Subnet Requirements Placed on Service IP Labels
    Configuration includes:
    • Heartbeating over IP Aliases
    • IPAT
    • Any type of non-concurrent resource group
    In this case HACMP places no subnet requirements on any boot or service IP labels/addresses. Boot and service addresses can coexist on the same subnet or on different subnets. Because HACMP automatically generates the proper addresses required for heartbeating, all other addresses are free of constraints.
    Heartbeating over IP aliases works with all types of IPAT: IPAT via IP Aliases, IPAT via IP Replacement, and IPAT via IP Replacement with HWAT.
    The resource group management policy places no additional restrictions on IP labels. All service labels are handled according to the group behaviors and are not affected by subnetting.
    Selecting IP aliasing for heartbeating provides the greatest flexibility for configuring boot and service IP addresses at the cost of reserving a unique IP address and subnet range specifically for sending heartbeats.
    Configuration includes:
    • Heartbeating over IP interfaces
    • IPAT via IP Aliases
    • Any type of non-concurrent resource group
    With IPAT via IP Aliases, HACMP has the following requirements regardless of the resource group type:
    • All boot interfaces on any node must be configured on different subnets. This is required for correct operation of heartbeating. (Otherwise, having multiple interfaces on the same subnet would produce multiple subnet routes. This prevents reliable heartbeating and failure detection.)
    • All service IP addresses must be configured on different subnets than any of the boot addresses. This requirement prevents any possibility of having multiple routes leading to the same subnet.
    • Multiple service IP labels can be configured on the same subnet, because HACMP sends heartbeats only across the boot IP addresses.

    The following table continues the summary of subnet requirements for each resource group and network configuration:

    Configuration
    Subnet Requirements Placed on Service IP Labels
    Configuration includes:
    • Heartbeating over IP interfaces
    • IPAT via IP Replacement
    • Resource group with the startup policy Online on Home Node Only, the fallover policy Fallover to Next Priority Node in the List, and the fallback policy Fallback to Highest Priority Node.
    The highest priority node in the resource group must be the node that contains the boot IP address/label. Select the boot IP address so that it is on the same subnet as the service IP address to be used in the resource group.
    All other network interfaces on the highest priority node must be placed on subnet(s) that are different from that used by the boot/service IP addresses.
    HACMP verifies these requirements.
    Configuration includes:
    • Heartbeating over IP interfaces
    • IPAT via IP Replacement
    • Resource group with the startup policies Online on First Available Node or Online Using Node Distribution Policy, the fallover policy Fallover to Next Priority Node in the List, and the fallback policy Never Fallback.
    For all nodes, the network interfaces should be configured to use one subnet for boot/service IP addresses and another subnet for the standby.
    HACMP verifies these requirements.
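    For the configuration above that uses heartbeating over IP interfaces with IPAT via IP Aliases, the two subnet rules (all boot interfaces on a node on different subnets, and service addresses on different subnets than any boot address) are easy to sanity-check while planning. The following Python sketch is illustrative only; the addresses are hypothetical and this is not an HACMP verification tool:

        # Illustrative subnet check for IPAT via IP Aliases with heartbeating over
        # IP interfaces. Hypothetical addresses; not an HACMP tool.
        from ipaddress import ip_interface

        boot_interfaces = {                      # node -> boot IP addresses
            "node1": ["192.168.10.1/24", "192.168.11.1/24"],
            "node2": ["192.168.10.2/24", "192.168.11.2/24"],
        }
        service_addresses = ["192.168.20.10/24", "192.168.20.11/24"]

        problems = []

        # Rule: all boot interfaces on any one node must be on different subnets.
        for node, addrs in boot_interfaces.items():
            nets = [ip_interface(a).network for a in addrs]
            if len(nets) != len(set(nets)):
                problems.append(f"{node}: two boot interfaces share a subnet")

        # Rule: service addresses must be on different subnets than any boot address.
        # (Multiple service labels may share a subnet with each other.)
        boot_nets = {ip_interface(a).network for addrs in boot_interfaces.values() for a in addrs}
        for svc in service_addresses:
            if ip_interface(svc).network in boot_nets:
                problems.append(f"service address {svc} shares a subnet with a boot address")

        print(problems or "subnet layout satisfies both rules")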

    Planning Parallel or Serial Order for Processing Resource Groups

    Note: The option to configure a resource group for parallel or serial order of processing may be discontinued in the future. Configure resource group dependencies to ensure the proper order of processing instead of using this option.

    By default, HACMP acquires and releases all individual resources configured in your cluster in parallel. However, you can specify a specific serial order according to which some or all of the individual resource groups should be acquired or released. In this case, during acquisition:

      1. HACMP acquires the resource groups serially, in the order that you specified in the list.
      2. HACMP acquires the remaining resource groups in parallel.

    During release, the process is reversed:

      1. HACMP releases the resource groups for which you did not define a specific serial order in parallel.
      2. The remaining resource groups in the cluster are processed in the order that you specified for these resource groups in the list.
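    The acquisition and release sequencing described in the two lists above can be sketched as follows. This is an illustrative Python example with hypothetical group names; "parallel" here simply means "processed as one batch", and this is not HACMP event-processing code:

        # Illustrative model of a customized serial processing order. Not HACMP code.
        all_groups = ["rg1", "rg2", "rg3", "rg4"]
        serial_order = ["rg3", "rg1"]            # the groups you listed for serial processing

        def acquisition_phases(groups, serial):
            # Serially listed groups first, one at a time in the listed order,
            # then the remaining groups as a single parallel batch.
            rest = [g for g in groups if g not in serial]
            return [[g] for g in serial] + ([rest] if rest else [])

        def release_phases(groups, serial):
            # On release the process is reversed: unlisted groups go first in parallel,
            # then the listed groups in the order you specified.
            rest = [g for g in groups if g not in serial]
            return ([rest] if rest else []) + [[g] for g in serial]

        print(acquisition_phases(all_groups, serial_order))  # [['rg3'], ['rg1'], ['rg2', 'rg4']]
        print(release_phases(all_groups, serial_order))      # [['rg2', 'rg4'], ['rg3'], ['rg1']]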

    If you upgraded your cluster from a previous version of HACMP, see the chapter on Upgrading an HACMP Cluster for more information on which processing order is used in this case.

    Note: Even if you specify the order of resource group processing on a single node, the actual fallover of the resource groups may be triggered by different policies. Therefore, it is not guaranteed that resource groups are processed cluster-wide in the order specified because the serial customized processing order of resource groups applies to their processing on a particular node only.

    When resource groups are processed in parallel, fewer cluster events occur in the cluster. In particular, events such as node_up_local or get_disk_vg_fs do not occur if resource groups are processed in parallel.

    As a result, using parallel processing reduces the number of particular cluster events for which you can create customized pre- or post-event scripts. If you start using parallel processing for some of the resource groups in your configuration, be aware that your existing pre- or post-event scripts may not work for these resource groups. For more information on parallel processing of resource groups and using event scripts, see Chapter 7: Planning for Cluster Events.

    Parallel and serial processing of resource groups is reflected in the event summaries in the hacmp.out file. For more information, see the section Understanding the hacmp.out Log File in Chapter 2: Using Cluster Log Files in the Troubleshooting Guide.

    For more information about how to configure customized serial acquisition and release order of resource groups, see Configuring Processing Order for Resource Groups in Chapter 4: Configuring HACMP Resource Groups (Extended) in the Administration Guide.

    Dependent Resource Groups and Parallel or Serial Order

    Although by default HACMP processes resource groups in parallel, if you establish dependencies between some of the resource groups in the cluster, processing may take longer than it does for clusters without dependent resource groups, as there may be more processing to do to handle one or more rg_move events.

    Upon acquisition, first the parent or higher priority resource groups are acquired, then the child resource groups are acquired. Upon release, the order is reversed. The remaining resource groups in the cluster (those that do not have dependencies themselves) are processed in parallel.

    Also, if you specify a serial order of processing and have dependent resource groups configured, make sure that the serial order does not contradict the specified dependency. The resource group dependency overrides any serial order in the cluster.

    Planning Resource Groups in Clusters with Sites

    The combination of Inter-Site Management Policy and the node startup, fallover and fallback policies that you select determines the resource group startup, fallover, and fallback behavior.

    Site support in HACMP allows a variety of resource group configurations.

    Concurrent Resource Groups and Sites

    You can use the following policies for concurrent resource groups:

    Inter-Site Management Policy: Online on Both Sites, Online on Either Site, Prefer Primary Site, or Ignore
    Startup Policy: Online on All Available Nodes
    Fallover Policy: Bring Offline (on error node only)
    Fallback Policy: Never Fallback

    Non-Concurrent Resource Groups and Sites

    For non-concurrent resource groups, you can use the following policies:

    Inter-Site Management Policy: Online on Either Site, Prefer Primary Site, or Ignore
    Startup Policy: Online on Home Node, Online on First Available Node, or Online Using Node Distribution Policy
    Fallover Policy: Fallover to Next Priority Node (in the nodelist)
    Fallback Policy: Fallback to Higher Priority Node (in the nodelist), or Never Fallback

    General Resource Group Behavior in Clusters with Sites

    Non-concurrent resource groups defined with an Inter-Site Management Policy of Prefer Primary Site or Online on Either Site have two instances when the cluster is running:

  • A primary instance on a node at the primary site
  • A secondary instance on a node at the secondary site.
  • The clRGinfo command shows these instances as:

  • ONLINE
  • ONLINE SECONDARY.
    Concurrent resource groups (Online on All Nodes) with an Inter-Site Management Policy of Online on Both Sites have multiple ONLINE instances and no ONLINE SECONDARY instances when the cluster is running at both locations.

    Concurrent resource groups with an Inter-Site Management Policy of Prefer Primary Site or Online on Either Site have primary instances on each node at the primary site and secondary instances on nodes at the secondary site.

    Resource groups with replicated resources are processed in parallel on node_up, according to their site management and node startup policies, taking any dependencies into account. When resource groups fall over between sites, the secondary instances are acquired and brought ONLINE SECONDARY on the highest priority node available at the new backup site. (More than one secondary instance can be on the same node.) Then the primary instance of the resource group is acquired and brought ONLINE on the highest priority node available to host this resource group on the new “active” site. This order of processing ensures that the backup site is ready to receive backup data as soon as the primary instance starts.

    If the secondary instance cannot be brought to ONLINE SECONDARY state for some reason, the primary instance will still be brought ONLINE, if possible.
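    The acquisition order described above (the new backup site's ONLINE SECONDARY instance is placed before the new active site's ONLINE instance, and a failure of the secondary instance does not block the primary) can be sketched as a short sequence. This is an illustrative Python example with hypothetical site and group names, not HACMP event logic:

        # Illustrative model of the ordering used when a replicated resource group
        # falls over between sites. Not HACMP code.
        def site_fallover_steps(rg, new_active_site, new_backup_site, backup_node_available=True):
            steps = []
            if backup_node_available:
                steps.append(f"bring {rg} ONLINE SECONDARY on the highest priority node at {new_backup_site}")
            else:
                steps.append(f"leave the secondary instance of {rg} offline (no node available at {new_backup_site})")
            # The primary instance is still brought ONLINE even if the secondary could not be placed.
            steps.append(f"bring {rg} ONLINE on the highest priority node at {new_active_site}")
            return steps

        for step in site_fallover_steps("rg_db", new_active_site="siteB", new_backup_site="siteA"):
            print(step)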

    Special Considerations for Using Sites

    This section discusses considerations if your resource group has:

  • A Startup Policy of Online using Node Distribution Policy
  • Dependencies specified.
  • Dependent Resource Groups and Sites

    You can specify a dependency between two or more resource groups that reside on nodes on different sites. In this case, if either parent or child moves to the other site, the dependent group moves also. If for some reason the parent group cannot be activated on the fallover site, the child resource group will remain inactive also.

    Note that the dependency only applies to the state of the primary instance of the resource group. If the parent group’s primary instance is OFFLINE, and the secondary instance is ONLINE SECONDARY on a node, the child group’s primary instance will be OFFLINE.

    In cases of resource group recovery, resource groups can fall over to node(s) on either site. The sequence for acquiring dependent resource groups is the same as for clusters without sites, with the parent resource group acquired first and the child resource group acquired after it. The release logic is reversed: the child resource group is released before a parent resource group is released.

    Note that you cannot use a resource group node distribution policy for startup for resource groups with a same node dependency. You can use this policy for resource groups with a same site dependency.

    As in clusters without sites, if you have sites defined, configure application monitoring for applications included in resource groups with dependencies.

    Resource Group Behavior Examples in Clusters with Sites

    The Fallback Policy applies to the ONLINE and ONLINE SECONDARY instances of the resource group.

    The Inter-Site Management Policy for a resource group determines the fallback behavior of the ONLINE instances of the resource groups between sites, thus governing the location of the secondary instance.

    The ONLINE SECONDARY instance is located at the site that does not have the ONLINE instance. The following table shows the expected behavior of resource groups during site events according to the Startup and Inter-Site Management policies:

    Node Startup Policy (applies within site)
    Inter-Site Management Policy
    Startup/Fallover/Fallback Behavior
    Online on Home Node Only
    Prefer Primary Site
    Cluster Startup
    Primary site: The home node acquires the resource group in the ONLINE state. Non-home nodes leave the resource group OFFLINE.
    Secondary site: The first node that joins on this site acquires the resource group in ONLINE SECONDARY state.
    Inter-site Fallover
    The ONLINE instance falls over to the other site when no nodes at the local site can acquire the resource group. The secondary instance moves to the other site and is brought ONLINE SECONDARY on the highest priority node available, if possible.
    Inter-site Fallback
    The ONLINE instance falls back to the primary site when a node from the primary site joins. The secondary instance moves to the other site and is brought ONLINE SECONDARY on the highest priority node available, if possible.
    Online on First Available Node
    or
    Online Using Node Distribution Policy
    Prefer Primary Site
    Cluster Startup
    Primary site: The node that joins first from the primary site (and meets the criteria) acquires the resource group in the ONLINE state. The resource group is OFFLINE on all other nodes at the primary site.
    Note that the node distribution policy applies only to the primary instance of the resource group.
    Secondary site: The first node to join the cluster in this site acquires all secondary instances of resource groups with this startup policy in ONLINE_SECONDARY state (no distribution).
    Inter-site Fallover
    The ONLINE instance falls over to the other site when no nodes at the local site can acquire the resource group. The secondary instance moves to the other site and is brought ONLINE SECONDARY on the highest priority node available, if possible.
    Inter-site Fallback
    The ONLINE instance falls back to primary site when a node on the primary site joins. The secondary instance moves to the other site and is brought ONLINE SECONDARY on the highest priority node available, if possible.
    Online on all Available Nodes
    Prefer Primary Site
    Cluster Startup
    Primary site: All nodes acquire the resource group in the ONLINE state.
    Secondary site: All nodes acquire the resource group in ONLINE_SECONDARY state.
    Inter-site Fallover
    The ONLINE instances fall over to the other site when all nodes at the local site go OFFLINE or fail to start the resource group. The secondary instances move to the other site and are brought ONLINE SECONDARY where possible.
    Inter-site Fallback
    ONLINE instances fall back to primary site when a node on the primary site rejoins. Nodes at the secondary site acquire the resource group in the ONLINE_SECONDARY state.
    Online on Home Node Only
    Online on Either Site
    Cluster Startup
    The home node that joins the cluster (from either site) acquires the resource group in the ONLINE state. Non-home nodes leave the resource group OFFLINE.
    Secondary site: The first node to join from the other site acquires the resource group in ONLINE_SECONDARY state.
    Inter-site Fallover
    The ONLINE instance falls over to the other site when no nodes at the local site can acquire the resource group. The secondary instance moves to the other site and is brought ONLINE SECONDARY on the highest priority node available, if possible.
    Inter-site Fallback
    The ONLINE instance does not fall back to the primary site when a node on the primary site rejoins. The highest priority rejoining node acquires the resource group in the ONLINE_SECONDARY state.
    Online on First Available Node
    or
    Online Using Node Distribution Policy
    Online on Either Site
    Cluster Startup
    The node that joins first from either site (that meets the distribution criteria) acquires the resource group in the ONLINE state.
    Secondary site: Once the resource group is ONLINE, the first joining node from the other site acquires the resource group in ONLINE_SECONDARY state.
    Inter-site Fallover
    The ONLINE instance falls over to the other site when no nodes at the local site can acquire the resource group.
    Inter-site Fallback
    The ONLINE instance does not fall back to the primary site when the primary site joins. The rejoining node acquires the resource group in the ONLINE_SECONDARY state.
    Online on all Available Nodes
    Online on Either Site
    Cluster Startup
    The node that joins first from either site acquires the resource group in the ONLINE state. Once an instance of the group is active, the rest of the nodes in the same site also activate the group in the ONLINE state.
    Secondary site: All nodes acquire the resource group in ONLINE_SECONDARY state.
    Inter-site Fallover
    The ONLINE instance falls over to the other site when all nodes at the local site go OFFLINE or fail to start the resource group.
    Inter-site Fallback
    The ONLINE instance does not fall back to the primary site when the primary site joins. Rejoining nodes acquire the resource group in the ONLINE_SECONDARY state.
    Online on all Available Nodes
    Online on Both Sites
    Cluster Startup
    All nodes at both sites activate the resource group in the ONLINE state.
    Inter-site Fallover
    No fallover. Resource group is either OFFLINE or in ERROR state.
    Inter-site Fallback
    No fallback.

    See also Appendix B of the Administration Guide for several sample configurations and use cases with different configurations using resource group dependencies.

    Customizing Inter-Site Resource Group Recovery

    For a new installation of HACMP 5.4, inter-site resource group recovery is enabled by default.

    Fallover of resource groups between sites is disabled by default when you upgrade to HACMP 5.4 from a release prior to version 5.3. This is the pre-5.3 release behavior for non-Ignore site management policy. A particular instance of a resource group can fall over within one site, but cannot move between sites. If no nodes are available on the site where the affected instance resides, that instance goes into ERROR or ERROR_SECONDARY state. It does not stay on the node where it failed. This behavior applies to both primary and secondary instances.

    Note that in HACMP 5.3 and up, the Cluster Manager will move the resource group if a node_down or node_up event occurs, even if fallover between sites is disabled. You can also manually move a resource group between sites.

    Enabling or Disabling Fallover between Sites

    If you migrated from a previous release of HACMP, you can change the resource group recovery policy to allow the Cluster Manager to move a resource group to another site to avoid having the resource group go into ERROR state.

    Recovery of Primary Instances of Replicated Resource Groups across Sites

    When fallover across sites is enabled, HACMP tries to recover the primary instance of a resource group in situations where an interface connected to an inter-site network fails or becomes available.

    Recovery of Secondary Instances of Replicated Resource Groups across Sites

    When fallover across sites is enabled, HACMP tries to recover the secondary instance as well as the primary instance of a resource group in these situations:

  • If an acquisition failure occurs while the secondary instance of a resource group is being acquired, the Cluster Manager tries to recover the resource group's secondary instance, as it does for the primary instance. If no nodes are available for the acquisition, the resource group's secondary instance goes into global ERROR_SECONDARY state.
  • If quorum loss is triggered, and the resource group has its secondary instance online on the affected node, HACMP tries to recover the secondary instance on another available node.
  • If a local_network_down occurs on an XD_data network, HACMP moves resource groups with GLVM or HAGEO resources that are ONLINE on the particular node to another available node on that site. This behavior of the primary instance is mirrored to the secondary instance so that secondary instances may be recovered via selective fallover.
    Using SMIT to Enable or Disable Inter-Site Resource Group Recovery

    To enable or disable inter-site Resource Group Recovery, use the following path in HACMP SMIT: Extended Configuration > Extended Resource Configuration > HACMP Extended Resources Configuration > Customize Resource Group and Resource Recovery > Customize Inter-site Resource Group Recovery.

    Planning for Replicated Resources

    HACMP 5.4 offers much fuller support for replicated resources, including configuration and processing improvements. Many previous limitations have been eliminated for HACMP/XD replicated resources. In addition, HACMP 5.4:

  • Enables you to dynamically reconfigure resource groups that contain replicated resources with HACMP/XD and HACMP site configurations.
  • Consolidates HACMP/XD verification into standard cluster verification by automatically detecting and calling the installed XD product’s verification utilities.
    For more information, see Planning Resource Groups in Clusters with Sites.

    Configuration of Replicated Resources

    If you have installed an HACMP/XD product, the following configurations are supported:

  • Resource groups with concurrent node policy can have non-concurrent site management policy.
  • Inter-site recovery of resource groups containing HACMP/XD replicated resources is allowed by default for new installations of HACMP 5.3 and 5.4. Configurations updated and migrated from previous releases maintain the pre-existing behavior. You can configure this behavior to be fallover or notify on cluster-initiated resource group movement. If you select the notify option, you need to configure a pre- or post-event script or a remote notification method. See the section on Customizing Resource Group Recovery in the Administration Guide.
  • Parent/child and location dependency configurations for replicated resource groups.
  • Node-based resource group distribution startup policy for resource groups with HACMP sites. (Network-based resource group distribution is no longer an available option.)
    The following rules and restrictions apply to replicated resource groups:

  • You can have a service IP in a resource group with GeoMirror devices (GMDs) but the service IP cannot be placed on an XD_data network.
  • You cannot configure a resource group to use a non-concurrent node policy and a concurrent inter-site management policy.
    Note: See the HACMP/XD documentation for complete information on configuring HAGEO, Metro Mirror, or GLVM resources and resource groups.

    Processing of Replicated Resources

    HACMP provides the following functionality for processing of replicated resources:

  • Whenever possible, HACMP processes events in parallel by default. Events are dynamically phased so that HACMP processes the primary and secondary instances of a resource group in proper order (release_primary, release_secondary, acquire_secondary, acquire_primary).
  • HACMP now recovers the secondary instances (as well as the primary instances) of the replicated resource groups during volume group losses, acquisition failures, and local_network_down events if another node or network is available.
  • HACMP has a better chance of acquiring the secondary instance of a resource group upon site fallover. HACMP can now consider all nodes at the secondary site as targets, rather than only the node that previously hosted the primary instance (as in previous versions of HACMP).
    For complete information about HACMP/XD facilities and how they integrate with HACMP, see the HACMP/XD documentation at the following URL:

    http://www.ibm.com/servers/eserver/pseries/library/hacmp_docs.html

    Also see the section Planning Resource Groups in Clusters with Sites in this chapter.

    Moving Resource Groups with Replicated Resources

    You can move the primary instance of a resource group with replicated resources to another site. The Cluster Manager uses dynamic event phasing and first moves the secondary instance from that site to the other site, as long as a node is available there to host it. Every attempt is made to maintain the secondary instance in SECONDARY ONLINE state. Even if a node at a given site is configured so it cannot host more than one primary instance, it may host more than one secondary instance in order to keep them all SECONDARY ONLINE.

    Recovering Resource Groups on Node Startup

    Prior to HACMP 5.2, when a node joined the cluster, the node did not make an attempt to acquire any resource groups that had previously gone into an ERROR state on any other node. Such resource groups remained in the ERROR state and required use of the Resource Group Migration utility, clRGmove, to manually bring them back online.

    In HACMP 5.2 and up, resource group recovery is improved. An attempt is made to automatically bring online the resource groups that are currently in the ERROR state. This further increases the chances of bringing the applications back online. If a resource group is in the ERROR state on any node in the cluster, the node attempts to acquire it during a node_up event. The node must be included in the nodelist for the resource group.
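    If automatic recovery does not succeed (see the cases below), you can still bring the group online manually with clRGmove, as in the following illustrative sketch. The group name rg_app and node name nodeA are hypothetical, the path assumes the default /usr/es/sbin/cluster/utilities location, and the flags should be verified against your HACMP version.

        # Bring resource group rg_app online on node nodeA.
        /usr/es/sbin/cluster/utilities/clRGmove -g rg_app -n nodeA -u

        # Check the resulting resource group states.
        /usr/es/sbin/cluster/utilities/clRGinfo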

    The resource group recovery on node startup is different for non-concurrent and concurrent resource groups:

  • If the starting node fails to activate a non-concurrent resource group that is in the ERROR state, the resource group continues to fall over to another node in the nodelist, if a node is available. The fallover action continues until all available nodes in the nodelist have been tried. If no node succeeds in acquiring the resource group, it remains in the ERROR state.
  • If the starting node fails to activate a concurrent resource group that is in the ERROR state, the concurrent resource group is left in the ERROR state.
    Planning for Workload Manager

    IBM offers AIX 5L Workload Manager (WLM) as a system administration resource included with AIX 4.3.3 and above. WLM allows users to set targets for and limits on CPU, physical memory usage, and disk I/O bandwidth for different processes and applications. This provides better control over the use of critical system resources at peak loads. HACMP allows you to configure WLM classes into HACMP resource groups so that the starting and stopping of WLM and the active WLM configuration can be under cluster control.

    HACMP does not verify every aspect of your WLM configuration; therefore, it remains your responsibility to ensure the integrity of the WLM configuration files. After you add the WLM classes to an HACMP resource group, the verification utility checks only whether the required WLM classes exist. Therefore, you must fully understand how WLM works, and configure it carefully. Incorrect but legal configuration parameters can impede the productivity and availability of your system.

    For complete information on how to set up and use Workload Manager, see the IBM AIX 5L Workload Manager (WLM) Redbook at the following URL:

    http://www.redbooks.ibm.com

    Workload Manager distributes system resources among processes that request them according to the class they are in. Processes are assigned to specific classes according to class assignment rules. Planning for WLM integration with HACMP includes two basic steps:

      1. Using AIX 5L SMIT panels to define the WLM classes and class assignment rules related to highly available applications.
      2. Using HACMP SMIT panels to establish the association between the WLM configuration and the HACMP resource groups. See Chapter 4: Configuring HACMP Resource Groups (Extended) in the Administration Guide for more information.
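    The usual SMIT fastpaths for these two steps are shown below as a convenience sketch; the same menus can also be reached from the top-level SMIT tree.

        # Step 1: define WLM classes and class assignment rules in AIX 5L.
        smitty wlm

        # Step 2: associate the WLM configuration and classes with HACMP resource groups.
        smitty hacmp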

    About Workload Manager Classes

    Workload Manager distributes system resources among processes that request them according to the class to which the processes are assigned. The properties of a class include:

  • Name of the class. A unique alphanumeric string no more than 16 characters long.
  • Class tier. A number from 0 to 9. This number determines the relative importance of a class from most important (tier 0) to least important (tier 9).
  • Number of CPU and physical memory shares. The actual amount of resources allotted to each class depends on the total number of shares in all classes. For example, if two classes are defined on a system, one with two shares of target CPU usage and the other with three shares, the first class receives 2/5 and the second class receives 3/5 of the CPU time.
  • Configuration limits. Minimum and maximum percentage limits of CPU time, physical memory, and disk I/O bandwidth accessible to the process.
    You set up class assignment rules that tell WLM how to classify all new processes (as well as those already running at the time of WLM startup) according to their gid, uid, and full pathname.
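    As an illustration of how these properties map onto the WLM property files, the following excerpts sketch a hypothetical configuration named ha_config containing one class, dbclass, with two CPU shares and three memory shares, and a rule that classifies processes owned by a hypothetical user dbadmin. The layout follows the standard AIX 5L WLM flat-file format under /etc/wlm/<configuration>; adjust names and values for your environment.

        * /etc/wlm/ha_config/classes -- class definition: name, description, tier
        dbclass:
                description = "Database application workload"
                tier = 0

        * /etc/wlm/ha_config/shares -- CPU and physical memory shares
        dbclass:
                CPU = 2
                memory = 3

        * /etc/wlm/ha_config/rules -- class assignment rules
        * class     resvd  user     group  application
        dbclass     -      dbadmin  -      -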

    Workload Manager Reconfiguration, Startup, and Shutdown

    This section describes how WLM is reconfigured, started, and stopped once you have placed WLM under HACMP control.

    Workload Manager Reconfiguration

    After WLM classes are added to an HACMP resource group, HACMP reconfigures WLM at the time of cluster synchronization on the node so that WLM uses the rules required by the classes associated with that node. In the event of dynamic resource reconfiguration on the node, WLM is reconfigured in accordance with any changes made to WLM classes associated with a resource group.

    Workload Manager Startup

    WLM startup occurs either when the node joins the cluster or when a dynamic reconfiguration of the WLM configuration takes place.

    The configuration is node-specific and depends upon the resource groups in which the node participates. If the node cannot acquire any resource groups associated with WLM classes, WLM will not be started.

    For all non-concurrent resource groups that do not have the Online Using Node Distribution Policy startup policy, the startup script determines whether the resource group is running on a primary or on a secondary node and adds the corresponding WLM class assignment rules to the WLM configuration. For all other non-concurrent resource groups, and for concurrent access resource groups that the node can acquire, the primary WLM class associated with each resource group is placed in the WLM configuration and the corresponding rules are added to the rules table.

    Finally, if WLM is already running and was not started by HACMP, the startup script restarts WLM using the configuration specified for HACMP, saving the configuration that was previously in use. When HACMP is stopped, it returns WLM to that previous configuration.

    Failure to start up WLM generates an error message logged in the hacmp.out log file, but node startup and the resource reconfiguration will proceed.
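    A quick way to check for such messages is sketched below, assuming the default hacmp.out location of /tmp/hacmp.out; the log may be redirected to another directory on your cluster.

        # Search the HACMP event log for WLM-related startup messages.
        grep -i wlm /tmp/hacmp.out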

    Workload Manager Shutdown

    WLM shutdown occurs either when the node leaves the cluster or on dynamic cluster reconfiguration. The shutdown script checks whether WLM was already running before HACMP started it and which configuration it was using. It then does one of the following: nothing, if WLM is not currently running; stops WLM, if it was not running before HACMP started it; or stops WLM and restarts it with the previous configuration, if it was running before HACMP started it.
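    To see what state WLM is in and which configuration is active before and after these transitions, you can use the standard AIX commands sketched below. This assumes the active configuration is indicated by the /etc/wlm/current symbolic link; verify command options against your AIX level.

        # Query whether WLM is running and in which mode.
        wlmcntrl -q

        # Show which WLM configuration is currently active.
        ls -l /etc/wlm/current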

    Limitations and Considerations

    Keep in mind the following when planning your Workload Manager configuration:

  • Some WLM configurations may impede HACMP performance. Be careful when designing your classes and rules, and be alert to how they may affect HACMP.
  • You can have no more than 27 non-default WLM classes across the cluster, since one configuration is shared across the cluster nodes.
  • An HACMP Workload Manager configuration does not support sub-classes, even though WLM allows them in AIX 5L v.5.2. If you configure sub-classes for any WLM classes that are placed in a resource group, a warning will be issued upon cluster verification, and the sub-classes will not be propagated to other nodes during synchronization.
  • On any given node, only the rules for classes associated with resource groups that can be acquired by a node are active on that node.
    Assigning WLM Classes to HACMP Resource Groups

    As you plan how to assign the previously configured WLM classes to cluster resource groups, start by filling in the Primary Workload Manager Class and Secondary Workload Manager Class fields for each resource group in the Resource Groups Worksheets. For information about completing the Resource Groups Worksheet, see the section Completing the Resource Group Worksheet.

    The procedure for adding WLM classes as resources in resource groups is described in the section Configuring Workload Manager in Chapter 4: Configuring HACMP Resource Groups (Extended) in the Administration Guide.

    Completing the Resource Group Worksheet

    The Resource Groups Worksheet helps you plan the resource groups for the cluster. Complete one for each resource group.

    Note: For examples of location dependency and resource group behavior, see Appendix B: Resource Group Behavior During Cluster Events in the Administration Guide.

    Appendix A: Planning Worksheets has blank copies of the Resource Group Worksheet. Make copies of the worksheet to record configuration information.

    To complete a Resource Group Worksheet:

      1. Record the cluster name in the Cluster Name field. You first assigned this value in Chapter 2: Initial Cluster Planning.
      2. Assign a name to the resource group and record it in the Resource Group Name field. Use no more than 32 characters. You can use alphabetic or numeric characters and underscores, but do not use a leading numeric. Duplicate entries are not allowed.
      3. Record the names of the nodes you want to be members of the resource group nodelist for this resource group in the Participating Nodes/Default Node Priority field. List the node names in order from highest to lowest priority (this does not apply to concurrent resource groups).
      4. (Optional) Record the inter-site management policy. This choice is only available if you are using the Extended Configuration path. Cluster configurations are not typically set up with multiple sites unless you are installing HACMP/XD. Otherwise, unless appropriate methods or customization are provided to handle site operations, Ignore should be used for the Inter-Site Management Policy field.
    Ignore (default). Use this option if sites are not being used in the cluster. This option is also allowed for XD/Metro Mirror configurations.
    Prefer Primary Site. The primary instance of the resource group is brought online on a node at this site and will fallback to this site when it rejoins the cluster after a fallover. The secondary instance is maintained on the other site.
    Online on Either Site. The resource group node policy determines where the primary instance of the resource group will startup, fallover, and fallback. The secondary instance is maintained on the other site.
    Online on Both Sites. Select this option if you want replicated resources to be accessible from all sites. If the site relationship is Online on Both Sites, the node startup policy must be Online on All Available Nodes.
      5. Specify a Startup Policy, a Fallover Policy, and Fallback Policy.
    For information about these settings, see the section Resource Group Policies for Startup, Fallover and Fallback.
      6. (Optional) You can also specify a Delayed Fallback Timer and Settling Time.
    For information about these settings, see the section Resource Group Policies for Startup, Fallover and Fallback.
      7. (Optional) Record the resource group runtime policy. Runtime policies are only available on the Extended Configuration Path and include:
  • Dynamic node priority policy
  • Dependencies between resource groups
  • Workload Manager
  • Resource group processing order
      8. (Optional) Record the Dynamic Node Priority Policy you plan to use for this resource group. (This field appears on the Extended Configuration path in the Change/Show a Resource/Attribute panel.) The default is blank; the ordered nodelist is the default policy. For concurrent resource groups, this is the only choice. To use a dynamic node priority policy, select one of the predefined dynamic node priority policies.
      9. (Optional) Record the dependency (parent/child or location) you plan to use for this resource group. HACMP offers a wide variety of configurations in which you can specify the relationships between resource groups that you want to maintain at startup, fallover, and fallback; see the sections on resource group dependencies earlier in this chapter to review the guidelines for these configurations.
      10. (Optional) Record the resource group processing order. In the Processing Order: Parallel, Customized or Serial field, identify whether you would like HACMP to acquire and release this resource group in parallel (default) or serially.
    For more information, see Planning Parallel or Serial Order for Processing Resource Groups.
      11. List the resources to be included in the resource group. You have identified the resources in previous chapters. In this section of the Resource Group Worksheet, you record the following resources/attributes:
  • Enter the Service IP Label in this field if your cluster uses IP Address Takeover.
  • This field relates only to non-concurrent resource groups. Leave the Filesystems (default is All) field blank if you want all filesystems in the specified volume groups to be mounted by default when the resource group containing this volume group is brought online.
    Note: If you leave the Filesystems (default is All) field blank and specify the shared volume groups in the Volume Groups field, all filesystems in those volume groups will be mounted. If you leave the Filesystems field blank and do not specify the volume groups in the Volume Groups field, no filesystems will be mounted.

    You may list the individual filesystems to include in this resource group in the Filesystems (default is All) field. In this case, only the specified filesystems will be mounted when the resource group is brought online.

  • In the Filesystems Consistency Check field, specify fsck or logredo.
  • In the Filesystems Recovery Method field, specify parallel or sequential.
  • In the Filesystems to Export field, enter the mount points of the filesystems or directories (or both) to be exported to all nodes in the resource group nodelist when the resource is initially acquired.
  • List the filesystems that should be NFS-mounted by the nodes in the resource group nodelist not currently holding the resource in the Filesystems to NFS Mount field. All nodes in the resource group nodelist that do not currently hold the resource will attempt to NFS-mount these filesystems.
  • In the Network for NFS Mount field, enter the preferred network to NFS-mount the filesystems specified.
  • In the Volume Groups field, list the names of the volume groups containing raw logical volumes, or the raw volume groups, that are varied on when the resource is initially acquired.
  • Specify the shared volume groups in the Volume Groups field if you want to leave the field Filesystems (default is All) blank, and want to mount all filesystems in the volume group. If you specify more than one volume group in this field, then you cannot choose to mount all filesystems in one volume group and not in another; all filesystems in all specified volume groups will be mounted.

    For example, in a resource group with two volume groups, vg1 and vg2, if the Filesystems (default is All) field is left blank, all the filesystems in vg1 and vg2 will be mounted when the resource group is brought up. However, if the Filesystems (default is All) field lists only filesystems that are part of the vg1 volume group, none of the filesystems in vg2 will be mounted, because they were not entered in the Filesystems (default is All) field along with the filesystems from vg1.

    If you plan to use raw logical volumes, you only need to specify the volume group in which the raw logical volume resides in order to include the raw logical volumes in the resource group.
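    To see which filesystems a shared volume group actually contains before filling in these fields, you can list its logical volumes and mount points, as in this sketch (vg1 is a hypothetical volume group name):

        # List the logical volumes in vg1, their types, and their mount points.
        lsvg -l vg1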

  • In the Concurrent Volume Groups field, enter the names of the concurrent volume groups that are to be varied on by the owner node.
  • If you plan on using an application that directly accesses raw disks, list the physical volume IDs of the raw disks in the Raw Disk PVIDs field.
  • If you are using Fast Connect Services, list the Fast Connect resources in that field.
  • If you are using Tape Resources, enter the name of the tape resource in that field.
  • List the names of the application servers to include in the resource group in the Application Servers field.
  • List the Communications Links for SNA-over-LAN, SNA-over-X.25, or X.25.
  • In the Primary Workload Manager Class and Secondary Workload Manager Class fields, fill in the name of a class associated with the HACMP WLM configuration that you want to add to this resource group.
  • For non-concurrent resource groups that do not have the Online Using Node Distribution Policy startup, if no secondary WLM class is specified, all nodes use the primary WLM class. If a secondary class is also specified, only the primary node uses the primary WLM class. Secondary classes cannot be assigned to non-concurrent resource groups with the Online Using Node Distribution Policy startup or to concurrent resource groups; for these resource group types, all nodes in the resource group use the primary WLM class.

    Note: Before adding WLM classes to a resource group, specify a WLM configuration in the Change/Show HACMP WLM runtime Parameters SMIT panel. The picklists for the Primary/Secondary WLM Classes are populated with the classes defined for the specified WLM configuration. For more information, see Chapter 4: Configuring HACMP Resource Groups (Extended) in the Administration Guide.
  • Miscellaneous Data is a string placed into the environment along with the resource group information and is accessible by scripts.
  • The Automatically Import Volume Groups field is set to false by default. The definitions of volume groups available for import are presented from the file created the last time the information was collected through the Discover Current Volume Group menu; the volume group information is not updated automatically.
  • If this field is set to true, the definition of any volume group entered in the Volume Groups field or the Concurrent Volume Groups field is imported to any resource group nodes that do not already have it.

    When Automatically Import Volume Groups is set to true, the final state of the volume group will depend on the initial state of the volume group (varied on or varied off) and the state of the resource group to which the volume group is to be added (online or offline).

  • Set the Disk Fencing Activated field to true to activate SSA Disk Fencing for the disks in this resource group. Set to false to disable. SSA Disk Fencing helps prevent partitioned clusters from forcing inappropriate takeover of resources.
  • Filesystems Mounted before IP Configured. HACMP handles node failure by taking over the failed node's IP address(es) and then taking over its filesystems. This results in “Missing File or Filesystem” errors for NFS clients, because the clients can communicate with the backup server before the filesystems are available. Set this field to true to take over the filesystems before the IP address(es), which prevents these errors. Set it to false to keep the default order.
    Where You Go from Here

    You have now planned the resource groups for the cluster. You next customize the cluster event processing for your cluster. Chapter 7: Planning for Cluster Events discusses this step.

