
Chapter 4: Planning Shared Disk and Tape Devices


This chapter discusses information to consider before configuring shared external disks in an HACMP cluster and provides information about planning and configuring tape drives as cluster resources. This chapter contains the following sections:

  • Prerequisites
  • Overview
  • Choosing a Shared Disk Technology
  • Disk Power Supply Considerations
  • Planning for Non-Shared Disk Storage
  • Planning a Shared SCSI Disk Installation
  • Planning a Shared IBM SSA Disk Subsystem Installation
  • Completing the Disk Worksheets
  • Adding the Disk Configuration to the Cluster Diagram
  • Planning for Tape Drives as Cluster Resources
  • Where You Go from Here.
    Prerequisites

    By now, you should have completed the planning steps in the previous chapters:

  • Chapter 2: Initial Cluster Planning
  • Chapter 3: Planning Cluster Network Connectivity.
    Refer to AIX 5L documentation for the general hardware and software setup for your disk and tape devices.

    Overview

    In an HACMP cluster, shared disks are external disks connected to more than one cluster node. In a non-concurrent configuration, only one node at a time owns the disks. If the owner node fails, the cluster node with the next highest priority in the resource group nodelist acquires ownership of the shared disks and restarts applications to restore critical services to clients. This ensures that the data stored on the disks remains accessible to client applications.

    Typically, takeover occurs within 30 to 300 seconds. This range depends on the number and types of disks used, the number of volume groups, the filesystems (whether shared or NFS cross-mounted), and the number of critical applications in the cluster configuration.

    When planning the shared external disk for your cluster, the objective is to eliminate single points of failure in the disk storage subsystem. The following table lists the disk storage subsystem components, with recommended ways to eliminate them as single points of failure:

    Cluster Object
    Eliminated as Single Point of Failure by...
    Disk adapter
    Using redundant disk adapters
    Controller
    Using redundant disk controllers
    Disk
    Using redundant hardware and LVM disk mirroring or RAID mirroring

    In this chapter, you perform the following planning tasks:

  • Choosing a shared disk technology.
  • Planning the installation of the shared disk storage. This includes:
  • Determining the number of disks required to handle the projected storage capacity. You need multiple physical disks on which to put the mirrored logical volumes. Putting copies of a mirrored logical volume on the same physical device defeats the purpose of making copies. For more information about creating mirrored logical volumes, see Chapter 5: Planning Shared LVM Components.
  • Determining the number of disk adapters each node will contain to connect to the disks or disk subsystem.
  • Physical disks containing logical volume copies should be on separate adapters. If all logical volume copies are connected to a single adapter, the adapter is potentially a single point of failure. If the single adapter fails, HACMP moves the volume group to an alternate node. Separate adapters prevent the need for this move.

  • Understand the cabling requirements for each type of disk technology.
  • Completing planning worksheets for the disk storage.
  • Adding the selected disk configuration to the cluster diagram.
  • Planning for configuring a SCSI streaming tape drive or a direct Fibre Channel Tape unit attachment as a cluster resource.
    Choosing a Shared Disk Technology

    For a complete list of supported hardware, including disks and disk adapters, as of the date of publication of this guide, see the following URL:

    http://www.ibm.com/common/ssi

    After selecting your country and language, select HW and SW Desc (Sales Manual, RPQ) for a Specific Information Search.

    The HACMP software supports the following disk technologies as shared external disks in a highly available cluster:

  • SCSI drives, including RAID subsystems
  • IBM SSA adapters and SSA disk subsystems
  • Fibre Channel adapters and disk subsystems
  • Data path devices (VPATH)—SDD 1.3.1.3 or greater.
    You can combine these technologies within a cluster. Before choosing a disk technology, review the considerations for configuring each technology as described in this section.

    HACMP also supports dynamic tracking of Fibre Channel devices. For information about adapters and devices, see the Adapters, Devices, and Cable Information for Multiple Bus Systems guide at the following URL:

    http://publibfp.boulder.ibm.com/epubs/pdf/38051616.pdf

    For information about installing and configuring OEM disks, see the appendix on OEM Disk, Volume Group and Filesystems Accommodation in the Installation Guide.

    Obtaining HACMP APARS

    An authorized program analysis report (APAR) contains an account of a problem caused by a suspected defect in a current, unaltered release of a program. You can obtain a list of HACMP APARs and updates for hardware as follows:

      1. Go to the IBM support website: http://www.ibm.com/support/us/
      2. Search on “HACMP +APAR”.
      3. Sort the results by date, newest first.

    Disk Planning Considerations

    This chapter includes information about using the following disk devices, arrays, and systems as shared external disk storage in cluster configurations:

  • SCSI Disk Devices
  • IBM 2104 Expandable Storage Plus
  • IBM 2105 Enterprise Storage Server
  • IBM pSeries
  • IBM TotalStorage DS6000 Storage Devices
  • IBM TotalStorage DS8000 Storage Devices
    SCSI Disk Devices

    In an HACMP cluster, shared SCSI disks are connected to the same SCSI bus for the nodes that share the devices. They may be used in both concurrent and non-concurrent modes of access. In a non-concurrent access environment, the disks are owned by only one node at a time. If the owner node fails, the cluster node with the next highest priority in the resource group nodelist acquires ownership of the shared disks as part of fallover processing. This ensures that the data stored on the disks remains accessible to client applications.

    The following restrictions apply to using shared SCSI disks in a cluster configuration:

  • Different types of SCSI busses can be configured in an HACMP cluster. Specifically, SCSI devices can be configured in clusters of up to four nodes, where all nodes are connected to the same SCSI bus attaching the separate device types.
  • You can connect up to sixteen devices to a SCSI bus. Each SCSI adapter, and each disk, is considered a separate device with its own SCSI ID. The maximum bus length for most SCSI devices provides enough length for most cluster configurations to accommodate the full sixteen-device connections allowed by the SCSI standard.
  • Do not connect other SCSI devices, such as CD-ROMs, to a shared SCSI bus.
  • If you mirror your logical volumes across two or more physical disks, the disks should not be connected to the same power supply; otherwise, loss of a single power supply can prevent access to all copies. Plan on using multiple disk subsystem drawers or desk-side units to avoid dependence on a single power supply.
  • With quorum enabled, a two-disk volume group puts you at risk for losing quorum and data access. You can use a forced varyon to help ensure data availability.
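    As an illustration, here is a minimal sketch of the underlying AIX 5L commands, assuming a hypothetical shared volume group named sharedvg; in an HACMP cluster, forced varyon is normally configured as a resource group attribute rather than run by hand:

    lsvg sharedvg | grep -i quorum   # check whether quorum is enabled for the volume group
    chvg -Qn sharedvg                # disable quorum so the group can stay online with one mirror copy
    varyonvg -f sharedvg             # force activation when quorum would otherwise prevent it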
    IBM 2104 Expandable Storage Plus

    The IBM 2104 Expandable Storage Plus (EXP Plus) system provides flexible, scalable, and low-cost disk storage for RS/6000 and pSeries servers in a compact package. It is a good choice for small, two-node clusters.

    EXP Plus:

  • Scales up to 2055 GB of capacity per drawer or tower, and to more than 28 TB per rack
  • Supports single or split-bus configuration flexibility to one or two servers
  • Incorporates high-performance Ultra3 SCSI disk storage with 160 MB/sec. throughput
  • Features up to fourteen 10,000 RPM disk drives, with capacities of 9.1 GB, 18.2 GB, 36.4 GB, 73.4 GB, and 146.8 GB
  • Requires a low voltage differential SE (LVD/SE) SCSI connection (for example a 6205 Dual Channel Ultra2 SCSI adapter)
  • Can be connected to only two nodes
  • Has two separate SCSI buses
  • Each node has its own SCSI connection to EXP Plus. This eliminates the need to change any adapter IDs.
  • Is internally terminated.
    HACMP supports 2104 Expandable Storage Plus with the following adapters:

  • 2494
  • 6203
  • 6205
  • 6206
  • 6208.
    HACMP has not been tested with 2104 Expandable Storage Plus and the 2498 adapter.

    For full information, see the IBM website at the following URL:

    http://www.ibm.com/servers/storage/support/disk/2104/

    IBM 2105 Enterprise Storage Server

    The IBM 2105 Enterprise Storage Server (ESS) provides multiple concurrent attachment and sharing of disk storage for a variety of open systems servers. IBM eServer pSeries processors can be attached, as well as other UNIX and non-UNIX platforms.

    These systems use IBM SSA disk technology internally. A Fibre Channel or a SCSI connection can be used to access the ESS, depending on the setup of the specific ESS system. Nodes using either access mechanism can share volumes; that is, one node could use Fibre Channel while another node uses a SCSI connection.

    On the ESS, all storage is protected with RAID technology. RAID-5 techniques can be used to distribute parity across all disks in the array. Failover Protection enables one partition, or storage cluster, of the ESS to take over for the other so that data access can continue.

    HACMP and the IBM 2105 Enterprise Storage Server support the following adapters for Fibre Channel attachment:

  • Gigabit Fibre Channel Adapter for PCI bus: adapter 6227 with firmware 3.22 A1
  • 2 Gigabit Fibre Channel Adapter for 64-bit PCI bus: adapter 6228 with firmware 3.82 A1.
    The ESS includes a web-based management interface, dynamic storage allocation, and remote services support. For more information on planning, general reference material, and attachment diagrams, see the following URL:

    http://www.storage.ibm.com/disk/ess/index.html

    IBM TotalStorage DS4000 Storage Server

    The DS4000 series (formerly named the FAStT series) has been enhanced to complement the entry and enterprise disk system offerings with the following:

  • DS4000 Storage Manager V9.10, enhanced remote mirror option
  • DS4100 Midrange Disk System (formerly named TotalStorage FAStT100 Storage Server, model 1724-100) for larger capacity configurations
  • EXP100 serial ATA expansion units attached to DS4400s.
    DS4000 storage servers support 1-2 connections per node over an FC adapter. Use two adapters. DS4000 storage servers support the following adapters:

  • Gigabit Fibre Channel Adapter for PCI bus: adapter 6227 with firmware 3.22 A1
  • 2 Gigabit Fibre Channel Adapter for 64-bit PCI bus: adapter 6228 with firmware 3.82 A1.
    The IBM DS4400 (formerly FAStT700) delivers superior performance with 2 Gbps Fibre Channel technology. The DS4400 offers protection with advanced functions and flexible facilities. It scales from 36 GB to over 32 TB to support growing storage requirements and offers advanced replication services.

    HACMP supports BladeCenter JS20 with IBM Total Storage DS4000.

    The IBM TotalStorage DS4300 (formerly FAStT600) is a mid-level disk system that can scale to over eight terabytes of fibre channel disk using 3 EXP700s, or to over sixteen terabytes of fibre channel disk with the Turbo feature using 7 EXP700s. It uses the latest in storage networking technology to provide an end-to-end 2 Gbps Fibre Channel solution.

    IBM DS4500 (formerly FAStT900) offers up to 67.2 TB of fibre channel disk storage capacity with 16 EXP700s or 16 EXP710s. DS4500 offers advanced replication services to support business continuance and disaster recovery.

    The IBM System Storage DS4800 is designed with 4 gigabit per second Fibre Channel interface technology that can support up to 224 disk drives in IBM System Storage EXP810, EXP710, EXP700, or EXP100 disk units. Additionally, the DS4800 supports high-performance Fibre Channel and high-capacity serial ATA (SATA) disk drives.

    For complete information about IBM Storage Solutions, see the following URL:

    http://www.storage.ibm.com

    IBM TotalStorage DS6000 Storage Devices

    HACMP supports IBM TotalStorage DS6000 Series Disk Storage Devices with the applicable APARs installed.

    HACMP/XD supports IBM TotalStorage DSCLI Metro Mirror using DS6000, along with a combination of DS8000 and DS6000 Series Disk Storage Devices.

    The DS6000 series is a Fibre Channel based storage system that supports a wide range of IBM and non-IBM server platforms and operating environments. This includes open systems, zSeries, and iSeries servers.

    IBM TotalStorage DS8000 Storage Devices

    HACMP supports IBM TotalStorage DS8000 Series Disk Storage Devices with the applicable APARs installed.

    HACMP/XD v5 has extended its Metro Mirror support to IBM TotalStorage DS8000 Series Disk Storage Devices.

    IBM TotalStorage DS8000 is a high-performance, high-capacity series of disk storage that is designed to support continuous operations. DS8000 series models consist of a storage unit and one or two management consoles, two being the recommended configuration. For high-availability, hardware components are redundant.

    IBM pSeries

    The pSeries systems most often supported in cluster configurations are:

  • IBM eServer p5 models 510, 520, 550 and 570
  • IBM eServer p5 model 575
  • IBM eServer p5 models 590 and 595
  • IBM eServer i5 models 520, 550 and 570 iSeries and pSeries Convergence
  • RS/6000 SP System.
    HACMP supports the following on the IBM pSeries:

  • micro-partitioning under AIX 5.3 on POWER5 systems
  • IBM 2 Gigabit Fibre Channel PCI-X Storage Adapter.
    IBM eServer p5 models 510, 520, 550 and 570

    HACMP supports the eServer p5 models 510, 520, 550, 570, and 575 running AIX 5L v.5.2 and up with applicable APARs installed.

    The eServer p5 Express family uses the IBM POWER5™ microprocessor. The POWER5 processor can run both 32- and 64-bit applications simultaneously. Dynamic logical partitioning (LPAR) helps assign system resources (processors, memory and I/O) for faster, non-disruptive response to changing workload requirements. This allows automatic, non-disruptive balancing of processing power between partitions, resulting in increased throughput and consistent response times.

    Be aware that there are some limitations to the supported configuration.

  • Virtual SCSI (VSCSI) requires the use of Enhanced Concurrent mode, which involves some limitations, but may be an acceptable solution. For more information, see: http://www.ibm.com/support/techdocs/atsmastr.nsf/Web/TechDocs
  • The p5 520, p5 550, p5 570, and p5 575 integrated serial ports are not enabled when the HMC ports are connected to a Hardware Management Console. Either the HMC ports or the integrated serial ports can be used, but not both. Moreover, the integrated serial ports are supported only for modem and async terminal connections. Any other applications using serial ports, including HACMP, require a separate serial port adapter to be installed in a PCI slot.
  • Since there are no integrated serial ports on the p5-575, HACMP can use alternative methods for non-IP heartbeating through asynchronous I/O adapters and disk heartbeating. HACMP is able to use the four integrated ethernet ports designated for application use.
  • For information on HACMP support of virtualization (VLAN, VSCSI), see: http://www.ibm.com/support/techdocs/atsmastr.nsf/Web/TechDocs
  • HACMP support of DLPAR requires APAR IY69525.
    IBM eServer p5 model 575

    HACMP supports the IBM p5 575 (9118-575) high-bandwidth cluster node with applicable APARs installed.

    The p5 575 delivers an 8-way, 1.9 GHz POWER5 high-bandwidth cluster node, ideal for many high-performance computing applications. The p5 575 is packaged in a dense 2U form factor, with up to 12 nodes installed in a 42U-tall, 24-inch rack. Multiple racks of p5-575 nodes can be combined to provide a broad range of powerful cluster solutions. Up to 16 p5-575 nodes can be clustered together for a total of 128 processors.

    The following limitations apply to the supported eServer p5 model 575:

  • There are no integrated serial ports on the p5 575. HACMP can use alternative methods for non-IP heartbeating through asynchronous I/O adapters and disk heartbeating.
  • HACMP is able to use the four integrated ethernet ports designated for application use.
  • HACMP does not support DLPAR or Virtual LAN (VLAN) on p5 575 at this time.
  • HACMP support of Virtual SCSI (VSCSI) requires the use of Enhanced Concurrent mode, which involves some limitations but may be an acceptable solution for some customers.
    IBM eServer p5 models 590 and 595

    HACMP 5.2 and above support the IBM eServer p5-590 and IBM eServer p5-595. The p5-590 and p5-595 servers are powered by the IBM 64-bit Power Architecture™ microprocessor—the IBM POWER5™ microprocessor—with simultaneous multi-threading that makes each processor function as two to the operating system. The p5-595 features a choice of IBM's fastest POWER5 processors running at 1.90 GHz or 1.65 GHz, while the p5-590 offers 1.65 GHz processors.

    These servers come standard with mainframe-inspired reliability, availability, serviceability (RAS) capabilities and IBM Virtualization Engine™ systems technology with Micro-Partitioning™. Micro-Partitioning allows as many as ten logical partitions (LPARs) per processor to be defined. Both systems can be configured with up to 254 virtual servers with a choice of AIX 5L™, Linux, and i5/OS™ operating systems in a single server, opening the door to vast cost-saving consolidation opportunities.

    Beginning with HACMP 5.3, support of VSCSI requires the use of Enhanced Concurrent mode, which involves some limitations but may be an acceptable solution for some customers. A whitepaper describing the capabilities and use of Enhanced Concurrent mode in a VSCSI environment is available from the IBM website.

    HACMP supports the following adapters for Fibre Channel attachment:

  • FC 6239 or FC 5716 2 Gigabit Fibre Channel - PCI-X
    IBM eServer i5 models 520, 550 and 570 iSeries and pSeries Convergence

    HACMP 5.1 and up supports the IBM eServer i5, which is a hardware platform of iSeries and pSeries convergence, on which you can run native AIX 5L v.5.2 or v.5.3 with its own kernel (versus the current PASE SLIC kernel) in an LPAR partition. This provides an excellent alternative for consolidating AIX 5L applications and other UNIX-based applications, running in separate pSeries or other UNIX boxes, onto a single i5 platform.

    HACMP 5.1 and up supports the new POWER5-based IBM i5 520, 550, and 570 servers running either AIX 5L v.5.2 or v.5.3 with applicable APARs installed.

    HACMP supports the following adapters for Fibre Channel attachment:

  • FC 6239 or FC 5716 2 Gigabit Fibre Channel - PCI-X
    There are some limitations to the supported configurations of HACMP/AIX on iSeries i5 systems to consider:

  • On i5 hardware, HACMP will run only in LPARs that are running supported releases of AIX 5L. In addition, I/O (LAN and disk connections) must be directly attached to the LPAR(s) in which HACMP runs. I/O intended for use with HACMP is limited to that which is listed as supported in the HACMP Sales Manual 5765-F62.
  • HACMP also supports AIX 5L partitions on i5 servers provided the AIX 5L partitions running HACMP have dedicated I/O.
  • HACMP does not support micro-partitioning, Virtual SCSI (VSCSI) or Virtual LAN (VLAN) on i5 models at this time.
  • The i5 520, 550, and 570 integrated serial ports are not enabled when the HMC ports are connected to a Hardware Management Console. Either the HMC ports or the integrated serial ports can be used, but not both. Moreover, the integrated serial ports are supported only for modem and async terminal connections. Any other applications using serial ports, including HACMP, require a separate serial port adapter to be installed in a PCI slot.

    RS/6000 SP System

    The SP is a parallel processing machine that includes from two to 128 processors connected by a high-performance switch. The SP leverages the outstanding reliability provided by the RS/6000 series by including many standard RS/6000 hardware components in its design. The SP’s architecture then extends this reliability by enabling processing to continue following the failure of certain components. This architecture allows a failed node to be repaired while processing continues on the healthy nodes. You can even plan and make hardware and software changes to an individual node while other nodes continue processing.

    IBM Serial Storage Architecture Disk Subsystem

    Serial Storage Architecture (SSA) enables you to minimize single points of failure and achieve high availability in an HACMP environment.

    You can use IBM 7133 and 7131-405 SSA disk subsystems as shared external disk storage devices to provide concurrent access in an HACMP cluster configuration.

    The AIX V5.2 MPIO facility can be used to access disk subsystems through multiple paths. Multiple paths can provide both more throughput and higher availability than use of a single path. In particular, when multiple paths are used, failure of a single path due to an adapter, cable or switch failure will not cause applications to lose access to data. While HACMP will attempt to recover from complete loss of access to a volume group, that loss itself is going to be temporarily disruptive. The AIX V5.2 MPIO facility can prevent a single component failure from causing an application outage. When a shared volume group in an HACMP cluster is accessed through MPIO, it must be defined as an enhanced concurrent volume group.
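    For example, you can confirm that a shared disk actually has more than one path by running the AIX 5L lspath command on each node; hdisk2 is a hypothetical device name:

    lspath -l hdisk2    # one line per path; each path should be Enabled and use a different parent adapter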

    SSA is hot pluggable. Consequently, if you include SSA disks in a volume group using LVM mirroring, you can replace a failed disk drive without powering off the entire system.

    The following figure shows the basic SSA loop configuration:

    Basic SSA Loop Configuration 
    

    Disk Power Supply Considerations

    Reliable power sources are critical for a highly available cluster. Each mirrored disk chain in the cluster should have a separate power source. As you plan the cluster, make sure that the failure of any one power source (PDU, power supply, or building circuit) does not disable more than one node or mirrored chain.

    SCSI Device Power Considerations

    If the cluster site has a multiple phase power supply, ensure that the cluster nodes are attached to the same power phase. Otherwise, the ground will move between the systems across the SCSI bus and cause write errors.

    The bus and devices shared between nodes are subject to the same operational power surge restrictions as standard SCSI systems. Uninterruptible power supply (UPS) devices are necessary for preventing data loss. When power is first applied to a SCSI device, the attached bus, if actively passing data, may incur data corruption. You can avoid such errors by briefly halting data transfer operations on the bus while a device (disk or adapter) is turned on. For example, if cluster nodes are installed on two different power grids and one node has a power surge that causes it to reboot, the surviving node may lose data if a data transfer is active.

    The IBM DS4000 series (formerly named the FAStT series), the IBM 2104 Expandable Storage Plus, and the IBM 2105 Enterprise Storage Servers are less prone to power supply problems because they have redundant power supplies.

    IBM SSA Disk Subsystem Power Considerations

    Clusters with IBM SSA disk subsystems are less prone to power supply problems because they have redundant power supplies.

    Planning for Non-Shared Disk Storage

    Keep the following considerations in mind regarding non-shared disk storage:

  • Internal disks. The internal disks on each node in a cluster must provide sufficient space for:
  • AIX 5L software (approximately 500 MB)
  • HACMP software (approximately 50 MB for a server node)
  • Executable modules of highly available applications.
  • Root volume group. The root volume group for each node must not reside on the shared SCSI bus.
  • AIX 5L Error Notification Facility. Use the AIX 5L Error Notification Facility to monitor the rootvg on each node. Problems with the root volume group can be promoted to node failures.
  • For more information about using the Error Notification facility, see the chapter on Configuring AIX 5L for HACMP in the Installation Guide.
  • Disk adapter use. Because shared disks require their own adapters, you cannot use the same adapter for both a shared and a non-shared disk. The internal disks on each node require one SCSI adapter apart from any other adapters within the cluster.
  • Volume group use. Internal disks must be in a different volume group from the external shared disks.
  • The executable modules of the highly available applications should be on the internal disks and not on the shared external disks, for the following reasons:

  • Licensing
  • Application startup.
    Licensing

    Vendors may require that you purchase a separate copy of each application for each processor or multi-processor that may run it, and protect the application by incorporating processor-specific information into the application when it is installed.

    Thus, if you are running your application executable from a shared disk, it is possible that after a fallover, HACMP will be unable to restart the application on another node because, for example, the processor ID on the new node does not match the ID of the node on which the application was installed.

    The application may also require that you purchase what is called a node-bound license, that is, a license file on each node that contains information specific to the node.

    There may also be a restriction on the number of floating (available to any cluster node) licenses available within the cluster for that application. To avoid this problem, be sure that there are enough licenses for all processors in the cluster that may potentially run an application at the same time.

    Starting Applications

    Applications may contain configuration files that you can customize during installation and store with the application files. These configuration files usually store information, such as pathnames and log files, that are used when the application starts.

    You may need to customize your configuration files if your configuration requires both of the following:

  • You plan to store these configuration files on a shared filesystem
  • The application cannot use the same configuration on every fallover node.
    This is typically the case when the application runs on more than one node, with different configurations. For example, in a two-node mutual takeover configuration, both nodes may be running different instances of the same application, and standing by for one another. Each node must be aware of the location of configuration files for both instances of the application, and must be able to access them after a fallover. Otherwise, the fallover will fail, leaving critical applications unavailable to clients.

    To decrease how much you will need to customize your configuration files, place slightly different startup files for critical applications on local filesystems on either node. This allows the initial application parameters to remain static; the application will not need to recalculate the parameters each time it is called.

    Planning a Shared SCSI Disk Installation

    The following sections summarize the basic hardware components required to set up shared SCSI disk storage in an HACMP cluster. They cover the following topics:

  • HACMP and Virtual SCSI
  • Disk Adapters
  • Cables
  • Sample IBM 2104 Expandable Storage Plus Configuration
  • Sample IBM DS4000 Storage Server Configuration
  • Sample IBM 2105 Enterprise Storage Server Configuration
    Your cluster requirements depend on the configuration you specify. To ensure that you account for all required components, complete a diagram for your system. In addition, consult the hardware manuals for detailed information about cabling and attachment for the particular devices you are configuring.

    HACMP and Virtual SCSI

    HACMP supports VIO (virtual I/O) SCSI with the applicable APARs installed. The following restrictions apply to using Virtual SCSI (VSCSI) in a cluster configuration:

  • The volume group must be defined as “Enhanced Concurrent Mode.” In general, Enhanced Concurrent Mode is the recommended mode for sharing volume groups in HACMP clusters because volumes are accessible by multiple HACMP nodes, resulting in faster fallover in the event of a node failure. (See the sketch after this list.)
  • If file systems are used on the standby nodes, they are not mounted until the point of fallover so accidental use of data on standby nodes is impossible.
  • If shared volumes are accessed directly (without file systems) in Enhanced Concurrent Mode, these volumes are accessible from multiple nodes so access must be controlled at a higher layer such as databases.
  • If any cluster node accesses shared volumes through VSCSI, all nodes must do so. This means that disks cannot be shared between an LPAR using VSCSI and a node directly accessing those disks.
  • From the point of view of the VIO server, physical disks (hdisks) are shared, not logical volumes or volume groups.
  • All volume group construction and maintenance on these shared disks is done from the HACMP nodes, not from the VIO server.
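    As noted in the first restriction above, the shared volume group must be enhanced concurrent. The following is a rough sketch of creating one from an HACMP node; the volume group name, disks, and major number are hypothetical, and in practice C-SPOC is the usual way to create shared volume groups:

    mkvg -C -y app_vg -V 100 hdisk2 hdisk3   # -C creates an Enhanced Concurrent Capable volume group;
                                             # use the same major number (-V) on every cluster node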
    Disk Adapters

    Remove any SCSI terminators on the adapter card. Use external terminators in an HACMP cluster. If you terminate the shared SCSI bus on the adapter, you lose termination when the cluster node that contains the adapter fails.

    For a complete list of supported disk adapters, see the following URL:

    http://www.ibm.com/common/ssi

    After selecting your country and language, select HW and SW Desc (Sales Manual, RPQ) for a Specific Information Search.

    Cables

    The cables required to connect nodes in your cluster depend on the type of SCSI bus you are configuring. Select cables that are compatible with your disk adapters and controllers. For information on the type and length of SCSI cable required, see the hardware documentation that accompanies each device you want to include on the SCSI bus.

    Sample IBM 2104 Expandable Storage Plus Configuration

    The following figure shows a sample two-node configuration using the 2104 Expandable Storage Plus system.

    Note: Configuration for SCSI connections from other storage systems would resemble this one.

    Two-Node Expandable Storage Plus System 
    

    Sample IBM DS4000 Storage Server Configuration

    The following figure shows a recommended configuration for high availability when using a DS4000 Storage server (formerly FAStT) in an HACMP environment:

    DS4000 Storage Server Environment 
    

    Sample IBM 2105 Enterprise Storage Server Configuration

    Information on the IBM website includes several diagrams for the IBM 2105 Enterprise Storage Server. Search for the documentation for this model from the following URL:

    http://www.storage.ibm.com/solutions/index.html

    Using ESS Functions for High Availability

    When using the ESS in an HACMP environment, use the following:

  • Use the Sparing function to assign disks as spares and reduce the exposure to data loss. When the ESS detects that a disk is failing, it transfers the data from the failing disk to a spare device. You are required to specify at least one disk as a spare per drawer; however, you can specify two spares per drawer for increased availability.
  • Configure the two host interface cards in a bay to device interface cards in the same bay.
  • Configure the SCSI ports on the same interface card to the same partition of the ESS.
  • If you are using Switched Fabric:

  • The Fibre Channel switch must be configured with the host World Wide Node Name (WWN)
  • Zoning (similar to routing) must be configured in the switch.
    Planning a Shared IBM SSA Disk Subsystem Installation

    This section describes using SSA disks with HACMP. It supplements the IBM documentation that covers the specific SSA disk subsystem hardware you are using.

    AIX 5L and HACMP Levels

    On nodes running AIX 5L v.5.2, the C-SPOC utility does not allow new SSA concurrent mode volume groups to be created. (If you upgraded from previous versions of HACMP, you can use existing volume groups in SSA concurrent mode, but C-SPOC does not allow you to create new volume groups of this type.) You can convert these volume groups to enhanced concurrent mode.

    On nodes running AIX 5L v.5.3, convert all volume groups to enhanced concurrent mode.
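    The following is a rough sketch of converting an existing volume group (app_vg is a hypothetical name) to enhanced concurrent mode from the command line; in practice, the C-SPOC SMIT panels are the usual way to do this in a cluster:

    varyonvg app_vg     # activate the volume group on one node
    chvg -C app_vg      # mark the volume group Enhanced Concurrent Capable
    varyoffvg app_vg    # deactivate it so that HACMP can manage its activation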

    Disk Adapters

    See the IBM manual for your hardware to see how to connect SSA disk subsystems to nodes.

    Note: The SSA 4-port adapter (feature code 6218, type 4-J) is not suitable for use with HACMP because it supports only one adapter per loop.

    Advanced SerialRAID Adapter (Feature Code 6225, Type 4-P)

    The 6225 SSA adapter (also called an eight-way adapter) can support SSA loops containing up to 8 eight-way adapters per loop. Most multi-node configurations set up with a minimal number of single points of failure require eight-way adapters. If any drives in the loop are configured for RAID 5, only two adapters can be used in the loop.

    These adapters must be at microcode level 1801 or later.

    SSA Multi-Initiator RAID/EL Adapters (Feature Code 6215, Type 6-N)

    If the fast write cache or RAID functions of the adapter are used, no other adapter can be connected in an SSA loop with this adapter. If those functions are not used, a second SSA Multi-Initiator RAID/EL adapter can be connected in the loop.

    Identifying Disk Adapters

    The two-way and eight-way disk adapters look the same, but their microcode is different. The easiest way to distinguish between these adapters is to install the adapter in a machine and run either of the following commands:

    lsdev -Cc adapter

    or

    lscfg -vl ssaX

    where X is the adapter number.

    These commands provide identifying information about the microcode.
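    For example, assuming a hypothetical adapter named ssa0:

    lsdev -Cc adapter | grep ssa   # list the SSA adapters and their descriptions
    lscfg -vl ssa0                 # the vital product data includes the ROS Level and ID field,
                                   # which reports the adapter microcode level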

    Bypass Cards

    The 7133 Models T40 and D40 disk subsystems contain four bypass cards. Each bypass card has two external SSA connectors. Through these, you connect the bypass cards and, therefore, the disk drive module strings to each other or to a node.

    The bypass cards can operate in either bypass or forced inline mode.

    Bypass Mode

    When you set its jumpers so a bypass card operates in bypass mode, it monitors both of its external connections. If it detects that one of its connectors is connected to a powered-on SSA adapter or device, it switches to inline mode; that is, it connects the internal SSA links to the external connector. This effectively heals the break in the SSA loop.

    If the bypass card detects that neither of its connectors is connected to a powered-on SSA adapter or device, it switches to bypass state; that is, it connects the internal disk strings and disconnects them from the external connector.

    Forced Inline Mode

    When you set its jumpers so a bypass card operates in forced inline mode, it behaves permanently like a signal card of Models 010 and 500; that is, none of its electronic switching circuits are in use. Its internal SSA links connect to the external connector and can never make an internal bypass connection.

    Using SSA Facilities for High Availability

    This section describes how you can use SSA facilities to make your system highly available.

    SSA Loops

    Configure all SSA devices in a loop, not just connected in a string. Although SSA devices function when connected in a string, a loop provides two paths of communications to each device for redundancy. The adapter chooses the shortest path to a disk.

    SSA Fiber Optic Extenders

    The SSA Fiber Optic Extenders use cables up to 2.4 km to replace a single SSA cable. The SSA Fiber Optic Extender (Feature code 5500) is supported on all Model 7133 disk subsystems.

    Using the Fiber Optic Extender, you can make the distance between disks greater than the LAN allows. If you do so, you cannot use routers and gateways. Consequently, under these circumstances, you cannot form an HACMP cluster between two LANs.

    Daisy-chaining the Adapters

    In each node, for each loop including that node, daisy-chain all its adapters. The SSAR router device uses another adapter when it detects that one adapter has failed. You need only one bypass switch for the whole daisy chain of adapters in the node rather than a bypass switch for each individual adapter.

    Bypass Cards in the 7133, Models D40 and T40 Disk Subsystems

    Bypass cards maintain high availability when a node fails, when a node is powered off, or when the adapter(s) of a node fail. Connect the pair of ports of one bypass card into the loop that goes to and from one node. That is, connect the bypass card to only one node. If you are using more than one adapter in a node, remember to daisy-chain the adapters.

    Avoid two possible conditions when a bypass card switches to bypass mode:

  • Do not connect two independent loops through a bypass card. When the bypass card switches to bypass mode, you want it to reconnect the loop inside the 7133 disk subsystem, rather than connecting two independent loops. So both ports of the bypass card must be in the same loop.
  • Dummy disks are connectors used to fill out the disk drive slots in a 7133 disk subsystem so the SSA loop can continue unbroken. Make sure that when a bypass card switches to bypass mode, it connects no more than three dummy disks consecutively in the same loop. Put the disks next to the bypass cards and dummy disks between real disks.
    Configuring to Minimize Single Points of Failure

    To minimize single points of failure, consider the following points:

  • Use logical volume mirroring and place logical volume mirror copies on separate disks and in separate loops using separate adapters. In addition, it is a good idea to mirror between the front row and the back row of disks or between disk subsystems.
  • Avoid having the bypass card itself be a single point of failure by using one of the following mechanisms:
  • With one loop. Put two bypass cards into a loop connecting to each node.
  • With two loops. Set up logical volume mirroring to disks in a second loop. Set each loop to go through a separate bypass card to each node.
  • Set the bypass cards to forced inline mode for the following configurations:

  • When connecting multiple 7133 disk subsystems.
  • When the disk drives in one 7133 Model D40 or Model T40 are not all connected to the same SSA loop. In this type of configuration, forced inline mode removes the risk of a fault condition, namely, that a shift to bypass mode may cause the disk drives of different loops to be connected.
    Configuring for Optimal Performance

    The following guidelines can help you configure your system for optimal performance:

  • Review multiple nodes and SSA domains:
  • A node and the disks it accesses make up an SSA domain. For configurations containing shared disk drives and multiple nodes, minimize the path length from each node to the disk drives it accesses. Measure the path length by the number of disk drives and adapters in the path. Each device has to receive and forward the packet of data.
  • With multiple adapters in a loop, put the disks near the closest adapter and make that adapter the one that accesses the disks. In effect, try to keep I/O traffic within the SSA domain. Although any host can access any disk, it is best to minimize I/O traffic crossing over to other domains.
  • When multiple hosts are in a loop, set up the volume groups so that a node uses the closest disks. This prevents one node’s I/O from interfering with another’s.
  • Distribute read and write operations evenly throughout the loops.
  • Distribute disks evenly among the loops.
  • Download microcode when you replace hardware.
  • To ensure that everything works correctly, install the latest filesets, fixes, and microcode for your disk subsystem.

    Testing Loops

    Test loops in the following way:

  • Test all loop scenarios thoroughly, especially in multiple-node loops. Test for loop breakage (failure of one or more adapters).
  • Test bypass cards for power loss in adapters and nodes to ensure that they follow configuration guidelines.
    Planning for RAID and SSA Concurrent Volume Groups

    When concurrent volume groups are created on AIX 5L v.5.2 and up, they are created as enhanced concurrent mode volume groups by default. You should convert SSA concurrent volume groups to enhanced concurrent mode whenever possible to make use of its flexibility.

    RAID concurrent mode volume groups are now functionally obsolete, since enhanced concurrent mode provides extra capabilities, but RAID will continue to be supported for some time. HACMP supports both RAID and SSA concurrent mode volume groups with some important limitations:

  • A concurrent resource group that includes a node running a 64-bit kernel requires enhanced concurrent mode for any volume groups.
  • SSA concurrent mode is not supported on SSA disks with a 64-bit kernel.
  • SSA concurrent mode is supported on SSA disks with a 32-bit kernel.
  • The C-SPOC utility does not work with RAID concurrent volume groups. Convert these volume groups to enhanced concurrent mode. (Otherwise, AIX 5L sees them as non-concurrent.)
  • If you specify the use of SSA disk fencing in your concurrent resource group, HACMP assigns the node numbers when you synchronize the resources.

    Note: The node number on a given node should match the HACMP node_id. The following command retrieves the HACMP node_id, where node_name is the name of the node as defined to HACMP:
    odmget -q "name = node_name" HACMPnode

    However, assign unique non-zero node numbers on the ssar router device on each cluster node in the following cases:

  • If you plan to create concurrent volume groups on an SSA disk subsystem
  • If you do not specify the use of SSA disk fencing in your concurrent resource group.
    For the steps to assign a node number, see Enabling SSA Disk Fencing.

    SSA Disk Fencing in Concurrent Access Clusters

    Preventing data integrity problems that can result from the loss of TCP/IP network communication is especially important in concurrent access configurations where multiple nodes have simultaneous access to a shared disk. Chapter 3: Planning Cluster Network Connectivity describes using HACMP-specific point-to-point networks to prevent partitioned clusters.

    Concurrent access configurations using SSA disk subsystems can also use disk fencing to prevent data integrity problems that can occur in partitioned clusters. Disk fencing can be used with enhanced concurrent mode.

    The SSA disk subsystem includes fence registers, one per disk, capable of permitting or disabling access by each of the 32 possible connections. Fencing provides a means of preventing uncoordinated disk access by one or more nodes.

    The SSA hardware has a fencing command for automatically updating the fence registers. This command provides a tie-breaking function within the controller for nodes independently attempting to update the same fence register. A compare-and-swap protocol of the fence command requires that each node provide both the current and desired contents of the fence register. If competing nodes attempt to update a register at about the same time, the first succeeds, but the second fails because it does not know the revised contents.

    Benefits of Disk Fencing

    Disk fencing provides the following benefits to concurrent access clusters:

  • It enhances data security by preventing nodes that are not active members of a cluster from modifying data on a shared disk. By managing the fence registers, the HACMP software can ensure that only the designated nodes within a cluster have access to shared SSA disks.
  • It enhances data reliability by assuring that competing nodes do not compromise the integrity of shared data. By managing the fence registers HACMP can prevent uncoordinated disk management by partitioned clusters. In a partitioned cluster, communication failures lead separate sets of cluster nodes to believe they are the only active nodes in the cluster. Each set of nodes attempts to take over the shared disk, leading to race conditions. The disk fencing tie-breaking mechanism arbitrates race conditions, ensuring that only one set of nodes gains access to the disk.
    SSA Disk Fencing Implementation

    The HACMP software manages the content of the fence registers. At cluster configuration, the fence registers for each shared disk are set to allow access for the designated nodes. As cluster membership changes as nodes enter and leave the cluster, the event scripts call the cl_ssa_fence utility to update the contents of the fence register. If the fencing command succeeds, the script continues processing. If the operation fails, the script exits with failure, causing the cluster to go into reconfiguration.

    For information on how HACMP processes SSA disk fencing and how this is reflected in the hacmp.out file, see the section JOB_TYPE= SSA_FENCE in Chapter 2: Using Cluster Log Files in the Troubleshooting Guide.

    Disk Fencing with SSA Disks in Concurrent Mode

    You can only use SSA disk fencing under these conditions:

  • Only disks contained in concurrent mode volume groups will be fenced.
  • All nodes of the cluster must be configured to have access to these disks and to use disk fencing.
  • All resource groups with the disk fencing attribute enabled must be concurrent access resource groups.
  • Concurrent access resource groups must contain all nodes in the cluster. The verification utility issues an error if disk fencing is activated and the system finds nodes that are not included in the concurrent resource group.
    The purpose of SSA disk fencing is to provide a safety lockout mechanism for protecting shared SSA disk resources in the event that one or more cluster nodes become isolated from the rest of the cluster.

    Concurrent mode disk fencing works as follows:

  • The first node up in the cluster fences out all other nodes of the cluster from access to the disks of the concurrent access volume group(s) for which fencing is enabled, by changing the fence registers of these disks.
  • When a node joins a cluster, the active nodes in the cluster allow the joining node access by changing the fence registers of all disks participating in fencing with the joining node.
  • When a node leaves the cluster, regardless of how it leaves, the remaining nodes that share access to a disk with the departed node should fence out the departed node as soon as possible.
  • If a node is the last to leave a cluster, whether the cluster services are stopped with resource groups brought offline or placed in an UNMANAGED state, it clears the fence registers to allow access by all nodes. Of course, if the last node stops unexpectedly (is powered off or crashes, for example), it does not clear the fence registers. In this case, manually clear the fence registers using the appropriate SMIT options. For more information, see Chapter 1: Troubleshooting HACMP Clusters in the Troubleshooting Guide.
    Enabling SSA Disk Fencing

    Enabling SSA disk fencing for a concurrent resource group requires that all volume groups containing SSA disks on cluster nodes be varied off, and that the cluster be down, when the cluster resources are synchronized. Note that this means all volume groups containing any of the SSA disks (whether concurrent or non-concurrent, and whether configured as part of the cluster or not) must be varied off for the disk fencing enabling process to succeed during the synchronization of cluster resources. If these conditions are not met, you have to reboot the nodes to enable fencing.

    Note: If disk fencing is enabled and not all nodes are included in the concurrent access resource group, you receive an error upon verification of cluster resources.

    The process of enabling disk fencing takes place on each cluster node as follows:

    Assign a node_number to the ssar that matches the node_id of the node in the HACMP configuration. Do the following to assign node numbers:

      1. Issue the command:
    chdev -l ssar -a node_number=x

    where x is the number to assign to that node.

    Any node_numbers set before enabling disk fencing (for purposes of replacing a drive or for C-SPOC concurrent LVM functions) will be changed for disk fencing operations. The other operations will not be affected by this node_number change.
      2. First remove, then remake all hdisks, pdisks, SSA adapter, and tmssa devices of the SSA disk subsystem seen by the node, thus picking up the node_number for use in the fence register of each disk (see the sketch after these steps).
      3. Reboot the system.
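    A hypothetical sketch of step 2 follows; the device names are examples only, and every SSA-related device seen by the node must be removed and remade:

    rmdev -dl hdisk2    # remove the definition of an SSA logical disk
    rmdev -dl pdisk0    # remove the corresponding SSA physical disk
    rmdev -dl tmssa1    # remove the tmssa target-mode device, if configured
    rmdev -dl ssa0      # remove the SSA adapter definition
    cfgmgr              # rediscover the devices so they pick up the new node_number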

    Disk Fencing and Dynamic Reconfiguration

    When a node is added to the cluster through dynamic reconfiguration while cluster nodes are up, the disk fencing enabling process is performed on the added node only, during the synchronizing of topology.

    Any node_numbers that were set before enabling disk fencing (for purposes of replacing a drive or C-SPOC concurrent LVM functions) will be changed for disk fencing operations. Therefore, when initially setting SSA disk fencing in a resource group, the resources must be synchronized while the cluster is down. The other operations will not be affected by this node_number change.

    Completing the Disk Worksheets

    After determining the disk storage technology you will include in your cluster, complete all of the appropriate worksheets as follows:

  • Completing the Shared SCSI Disk Worksheet
  • Completing the Shared SCSI Disk Array Worksheet
  • Completing the IBM SSA Disk Subsystems Worksheet.
    Completing the Shared SCSI Disk Worksheet

    Complete a Shared SCSI Disk Worksheet for each shared SCSI bus.

    To complete a Shared SCSI Disk Worksheet:

      1. Enter the Cluster name in the appropriate field. This information was determined in Chapter 2: Initial Cluster Planning.
      2. Check the appropriate field for the type of SCSI bus.
      3. Fill in the host and adapter information including the node name, the number of the slot in which the disk adapter is installed and the logical name of the adapter, such as scsi0. AIX 5L assigns the logical name when the adapter is configured.
      4. Determine the SCSI IDs for all the devices connected to the SCSI bus.
      5. Record information about the disk drives available over the bus, including the logical device name of the disk on every node. (This hdisk name is assigned by AIX 5L when the device is configured and may vary on each node.)
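    A few AIX 5L commands can help you gather this information; the device names shown (scsi0, hdisk2) are hypothetical:

    lsdev -Cc disk      # logical device names (hdisks) configured on this node
    lsattr -El scsi0    # the id attribute shows this adapter's SCSI ID on the bus
    lscfg -vl hdisk2    # location code and vital product data for a specific disk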

    Completing the Shared SCSI Disk Array Worksheet

    Complete a Shared SCSI Disk Array Worksheet for each shared SCSI disk array.

    To complete the Shared IBM SCSI Disk Arrays Worksheet:

      1. Enter the Cluster name in the appropriate field. This information was determined in Chapter 2: Initial Cluster Planning.
      2. Fill in the host and adapter information including the node name, the number of the slot in which the disk adapter is installed and the logical name of the adapter, such as scsi0. AIX 5L assigns the logical name when the adapter is configured.
      3. Assign SCSI IDs for all the devices connected to the SCSI bus. For disk arrays, the controllers on the disk array are assigned the SCSI IDs.
      4. Record information about the LUNs configured on the disk array.
      5. Record the logical device name AIX 5L assigned to the array controllers.

    Completing the IBM SSA Disk Subsystems Worksheet

    Complete an IBM SSA Disk Subsystems Worksheet for each shared SSA configuration.

    To complete the Shared IBM Serial Storage Architecture Disk Subsystems Worksheet:

      1. Enter the Cluster name in the appropriate field. This information was determined in Chapter 2: Initial Cluster Planning.
      2. Fill in the host and adapter information, including the node name, the SSA adapter label, and the number of the slot in which the disk adapter is installed. Include the dual-port number of the connection; this is needed to make the loop connection clear.
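    The following AIX 5L commands can help you collect the SSA information; ssa0 and hdisk2 are hypothetical names, and the ssaxlate command is available when the SSA software is installed:

    lscfg -l ssa0        # location (slot) information for an SSA adapter
    lsdev -Cc pdisk      # physical SSA disks seen by this node
    ssaxlate -l hdisk2   # map a logical hdisk to its underlying pdisk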

    Adding the Disk Configuration to the Cluster Diagram

    Once you have chosen a disk technology, add your disk configuration to the cluster diagram you started in Chapter 2: Initial Cluster Planning.

    For the cluster diagram, draw a box representing each shared disk; then label each box with a shared disk name.

    Planning for Tape Drives as Cluster Resources

    You can configure a tape drive as a cluster resource, making it highly available to multiple nodes in a cluster. SCSI streaming tape drives and Direct Fibre Channel Tape unit attachments are supported. Management of shared tape drives is simplified by the following HACMP functionality:

  • Configuration of tape drives using SMIT
  • Verification of proper configuration of tape drives
  • Automatic management of tape drives during resource group start and stop operations
  • Reallocation of tape drives on node failure and node recovery
  • Controlled reallocation of tape drives on cluster shutdown
  • Controlled reallocation of tape drives during dynamic reconfiguration.
    For information about completing the Shared Tape Drive Worksheet, see the section on Installing and Configuring Shared Tape Drives in the chapter on Configuring Installed Hardware in the Installation Guide.

    Limitations

    Note the following as you plan to include tape drives as cluster resources:

  • Support is limited to SCSI or Direct Fibre Channel tape drives that have hardware reserve and hardware reset/release functions.
  • A tape loader/stacker is treated like a simple tape drive by HACMP.
  • No more than two cluster nodes can share the tape resource.
  • Tape resources may not be part of concurrent resource groups.
  • The tape drive must have the same name (for example, /dev/rmt0) on both nodes sharing the tape device.
  • When a tape special file is closed, the default action is to release the tape drive. HACMP is not responsible for the state of the tape drive once an application has opened the tape.
  • No means of synchronizing tape operations and application servers is provided. If you decide that a tape reserve/release should be done asynchronously, provide a way to notify the application server to wait until the reserve/release is complete.
  • Tape drives with more than one SCSI interface are not supported. Therefore, only one connection exists between a node and a tape drive. The usual functionality of adapter fallover does not apply.
    Reserving and Releasing Shared Tape Drives

    When a resource group with tape resources is activated, the tape drive is reserved to allow its exclusive use. This reservation is held until an application releases it, or the node is removed from the cluster:

  • When the special file for the tape is closed, the default action is to release the tape drive. An application can open a tape drive with a “do not release on close” flag. HACMP will not be responsible for maintaining the reservation after an application is started.
  • Upon stopping cluster services on a node and bringing resource groups offline, the tape drive is released, allowing access from other nodes.
  • Upon unexpected node failure, a forced release is done on the takeover node. The tape drive is then reserved as part of resource group activation.
    Setting Tape Drives to Operate Synchronously or Asynchronously

    If a tape operation is in progress when a tape reserve or release is initiated, it may take many minutes before the reserve or release operation completes. HACMP allows synchronous or asynchronous reserve and release operations. Synchronous and asynchronous operation is specified separately for reserve and release.

    Synchronous Operation

    With synchronous operation (the default), HACMP waits for the reserve or release operation, including the execution of a user-defined recovery procedure, to complete before continuing.

    Asynchronous Operation

    With asynchronous operation, HACMP creates a child process to perform the reserve or release operation, including the execution of a user-defined recovery procedure, and immediately continues.

    Recovery Procedures

    Recovery procedures are highly dependent on the application accessing the tape drive. Rather than trying to predict likely scenarios and develop recovery procedures, HACMP provides for the execution of user-defined recovery scripts for the following:

  • Tape start
  • Tape stop.
    Tape Start Scripts and Stop Scripts

    Tape start and stop occur during node start and stop, node fallover and reintegration, and dynamic reconfiguration. These scripts are called when a resource group is activated (tape start) or when a resource group is deactivated (tape stop). Sample start and stop scripts can be found in the /usr/es/sbin/cluster/samples/tape directory:

    tape_resource_start_example  
    tape_resource_stop_example  
    
  • During tape start, HACMP reserves the tape drive, forcing a release if necessary, and then calls the user-provided tape start script.
  • During tape stop, HACMP calls the user-provided tape stop script, and then releases the tape drive.
    Note: You are responsible for correctly positioning the tape, terminating processes or applications writing to the tape drive, writing end of tape marks, etc., within these scripts.

    Other application-specific procedures should be included as part of the start server and stop server scripts.
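    The following is a minimal sketch of what a tape start script might contain, modeled loosely on the samples above; the device name and positioning step are assumptions and must match your configuration:

    #!/bin/ksh
    # Hypothetical tape start script: HACMP has already reserved the drive,
    # so simply position the tape before the application starts.
    DEVICE=/dev/rmt0          # must be the same device name on both nodes
    tctl -f $DEVICE rewind    # rewind to a known starting position
    exit $?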

    Adapter Fallover and Recovery

    Tape drives with more than one SCSI interface are not supported. Therefore, only one connection exists between a node and a tape drive. The usual notion of adapter fallover does not apply.

    Node Fallover and Recovery

    If a node that has tape resources that are part of an HACMP resource group fails, the takeover node reserves the tape drive, forcing a release if necessary, and then calls the user-provided tape start script.

    On reintegration of a node, the takeover node runs the tape stop script and then releases the tape drive. The node being reintegrated reserves the tape drive and calls the user-provided tape start script.

    Network Fallover and Recovery

    HACMP does not provide tape fallover and recovery procedures for network failure.

    Where You Go from Here

    You have now planned your shared disk configuration. The next step is to plan the shared volume groups for your cluster. This step is described in Chapter 5: Planning Shared LVM Components.

