One blueprint[11] for recovery planning describes a scheme of six tiers of off-site recoverability (tiers 1 to 6), plus a seventh tier (tier 0) that relies on local recovery only, with no off-site backup. The tiers cover the full range of recovery options, from no data moved off-site to full off-site copies with no loss of data. The following figures and text describe them from a CICS® perspective.
Figure 40 summarizes the tier 0 solution.
Tier 0 is defined as having no requirements to save information off-site, establish a backup hardware platform, or develop a disaster recovery plan. Tier 0 is the no-cost disaster recovery solution.
Any disaster recovery capability would depend on recovering on-site local records. For most true disasters, such as fire or earthquake, you would not be able to recover your data or systems if you implemented a tier 0 solution.
Figure 41 summarizes the tier 1 solution.
Tier 1 is defined as having:
Your disaster recovery plan has to include information to guide the staff responsible for recovering your system, from hardware requirements to day-to-day operations.
The backups required for off-site storage must be created periodically. After a disaster, your data can only be as up-to-date as the last backup (daily, weekly, monthly, or whatever period you chose), because your recovery action is to restore the backups at the recovery site (when you have one).
This method may not meet your requirements if you need your online systems to be continuously available.
The major benefit of tier 1 is the low cost. The major costs are the storage site and transportation.
The drawbacks are:
Tier 1 provides a very basic level of disaster recovery. You will lose data in the disaster, perhaps a considerable amount. However, tier 1 allows you to recover and provide some form of service at low cost. You must assess whether the loss of data and the time taken to restore a service will prevent your company from continuing in business.
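The data loss described above follows directly from the backup schedule: everything updated after the last backup that reached off-site storage is lost. The following is a minimal Python sketch of that arithmetic, not part of CICS. The function name `data_loss_window` and the dates are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, disaster: datetime) -> timedelta:
    """Return how much update activity is lost when recovery restores the last backup.

    Assumes (for illustration) that the backup was already safely off-site
    when the disaster occurred.
    """
    if disaster < last_backup:
        raise ValueError("the disaster cannot precede the last backup")
    return disaster - last_backup


# Hypothetical weekly schedule: backup taken Sunday 02:00, disaster on Tuesday 14:30.
last_backup = datetime(2024, 1, 7, 2, 0)
disaster = datetime(2024, 1, 9, 14, 30)

print(data_loss_window(last_backup, disaster))  # 2 days, 12:30:00
```

In the worst case the disaster strikes just before the next backup is taken, so the loss approaches one full backup interval. The time needed to obtain hardware and restore the backups at the recovery site is additional to this.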
Figure 42 summarizes the tier 2 solution.
Tier 2 is similar to tier 1. The difference in tier 2 is that a secondary site already has the necessary hardware installed, which can be made available to support the vital applications of the primary site. The same process is used to back up and store the vital data; therefore the same availability issues exist at the primary site as for tier 1.
The benefits of tier 2 are the elimination of the time it takes to obtain and set up the hardware at the secondary site, and the ability to test your disaster recovery plan.
The drawback is the expense of providing, or contracting for, a ‘hot’ standby site.
Tier 2, like tier 1, provides a very basic level of disaster recovery. You will lose data in the disaster, perhaps a considerable amount. However, tier 2 allows you to recover and provide some form of service at low cost and more rapidly than tier 1. You must assess whether the loss of data and the time taken to restore a service will prevent your company from continuing in business.
Figure 43 summarizes the tier 3 solution.
Tier 3 is similar to tier 2. The difference is that data is electronically transmitted to the hot site. This eliminates physical transportation of data and the off-site storage warehouse. The same process is used to back up the data, so the same primary site availability issues exist in tier 3 as in tiers 1 and 2.
The benefits of tier 3 are:
The drawbacks are the cost of reserving the DASD at the hot standby site, and the need for a link to the hot site, together with the required software, to transfer the data.
Procedures and documentation still have to be available at the hot site, but this can be achieved electronically.
Tier 3, like tiers 1 and 2, provides a basic level of disaster recovery. You will lose data in the disaster, perhaps a considerable amount of data. The advantage of tier 3 is that you should be able to provide a service to your users quite rapidly. You must assess whether the loss of data will prevent your company from continuing in business.
Figure 44 summarizes the solutions for tiers 0 through 3, and shows the approximate time required for a recovery with each tier of solution.
Tiers 0 to 3 cover the disaster recovery plans of many CICS users. With the exception of tier 0, they employ the same basic design using a point-in-time copy of the necessary data. That data is then moved off-site to be used when required after a disaster.
The advantage of these methods is their low cost.
The disadvantages of these methods are:
Figure 45 summarizes the tier 4 solution.
Tier 4 closes the gap between the point-in-time backups and current online processing recovery. Under a tier 4 recovery plan, site one acts as a backup to site two, and site two acts as a backup to site one.
Tier 4 duplicates the vital data of each system at the other's site. You must transmit image copies of data to the alternate site on a regular basis. You must also transmit CICS system logs and forward recovery logs, after they have been archived. Similarly, you must transmit logs for IMS™ and DB2 subsystems. Your recovery action is to perform a forward recovery of the data at the alternate site. This allows recovery up to the point of the latest closed log for each subsystem.
You must also copy to the alternate site other vital data that is necessary to run your system. For example, you must copy your load libraries and JCL. You can do this on a regular basis, or when the libraries and JCL change.
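The tier 4 recovery action described above (restore the latest image copy, then apply the archived logs in order) can be sketched as follows. This is only an illustrative Python model, not CICS, IMS, or DB2 utility code: the `LogRecord` type, its `sequence` field, and the `forward_recover` function are hypothetical names introduced for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    sequence: int  # hypothetical ordering key for updates
    key: str
    value: str


def forward_recover(image_copy, copy_sequence, closed_logs):
    """Rebuild data from an image copy plus the closed (archived) logs.

    Only records written after the image copy was taken are applied.
    Logs, and the records inside them, are assumed to be in sequence order.
    """
    recovered = dict(image_copy)
    for log in closed_logs:
        for record in log:
            if record.sequence > copy_sequence:
                recovered[record.key] = record.value
    return recovered


# Image copy taken at sequence 10 and transmitted to the alternate site.
image_copy = {"ACCT1": "100", "ACCT2": "50"}
# Logs that were closed, archived, and transmitted before the disaster.
closed_logs = [
    [LogRecord(9, "ACCT1", "90"), LogRecord(11, "ACCT1", "120")],
    [LogRecord(12, "ACCT2", "75")],
]
# The log still open at the primary site never reached the alternate site.
open_log = [LogRecord(13, "ACCT1", "130")]

recovered = forward_recover(image_copy, 10, closed_logs)
print(recovered)  # {'ACCT1': '120', 'ACCT2': '75'}
```

The update in `open_log` is not recovered, which models the point made in the text: recovery reaches only the latest closed log for each subsystem.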
The benefits of tier 4 are:
The drawbacks are:
Tier 4 provides a more advanced level of disaster recovery. You will lose data in the disaster, but only a few minutes' or hours' worth. You must assess whether the loss of data will prevent your company from continuing in business, and what the cost of lost data will be.
Figure 46 summarizes the tier 5 solution.
Tier 5, remote two-phase commit, is an application-based solution to provide high currency of data at a remote site. This requires partially or fully dedicated hardware at the remote site to keep the vital data in image format and to perform the two-phase commit. The vital data at the remote site and the primary site is updated or backed out as a single unit of work (UOW). This ensures that the only vital data lost would be from transactions that are in process when the disaster occurs.
Other data required to run your vital application has to be sent to the secondary site as well. For example, current load libraries and documentation have to be kept up-to-date on the secondary site.
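The idea of updating or backing out the primary and remote copies as a single unit of work can be illustrated with a much simplified two-phase commit. This is a sketch under stated assumptions, not the protocol implementation of CICS or of any database product: the `Site` class, its `prepare`, `commit`, and `backout` methods, and the `two_phase_commit` function are hypothetical, and real implementations also log their decisions so that they can resolve in-doubt units of work after a failure.

```python
class Site:
    """A hypothetical participant holding a copy of the vital data."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}
        self.pending = None

    def prepare(self, updates):
        # Phase 1: vote yes only if this site can guarantee the update.
        if not self.available:
            return False
        self.pending = dict(updates)
        return True

    def commit(self):
        # Phase 2 (all voted yes): make the pending update permanent.
        self.data.update(self.pending)
        self.pending = None

    def backout(self):
        # Phase 2 (any vote was no): discard the pending update.
        self.pending = None


def two_phase_commit(sites, updates):
    """Apply updates at every site, or at none of them."""
    votes = [site.prepare(updates) for site in sites]
    if all(votes):
        for site in sites:
            site.commit()
        return True
    for site in sites:
        site.backout()
    return False


primary, remote = Site("primary"), Site("remote")
print(two_phase_commit([primary, remote], {"ACCT1": "100"}))  # True

remote.available = False  # for example, the link to the remote site is down
print(two_phase_commit([primary, remote], {"ACCT1": "200"}))  # False
print(primary.data)  # {'ACCT1': '100'}
```

Because the second unit of work is backed out at both sites, the two copies never diverge. The same property is why the primary application depends on the remote site being reachable for every update.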
The benefit of tier 5 is fast recovery using vital data that is current. The drawbacks are:
A tier 5 solution is appropriate for a custom-designed recovery plan with special applications. Because these applications must be designed to use this solution, it cannot be implemented at most CICS sites.
Figure 47 summarizes the tier 6 solution.
Tier 6, minimal to zero data loss, is the ultimate level of disaster recovery.
There are two tier 6 solutions, one hardware-based and the other software-based. For details of the hardware and software available for these solutions, see Peer-to-peer remote copy (PPRC) and extended remote copy (XRC) (hardware) and Remote Recovery Data Facility (software).
The hardware solution involves the use of IBM 3990-6 DASD controllers with remote and local copies of vital data. There are two flavors of the hardware solution: (1) peer-to-peer remote copy (PPRC), and (2) extended remote copy (XRC).
The software solution involves the use of Remote Recovery Data Facility (RRDF). RRDF applies to data sets managed by CICS file control and to the DB2, IMS, IDMS, CPCS, ADABAS, and SuperMICR database management systems, collecting real-time log and journal data from them. RRDF is supplied by E-Net Corporation and is available from IBM as part of the IBM Cooperative Software Program.
The benefits of tier 6 are:
The drawbacks are the cost of running two sites and the communication overhead. If you are using the hardware solution based on 3990-6 controllers, you are limited in how far away your recovery site can be. If you use PPRC, updates are sent from the primary 3990-6 directly to the 3990-6 at your recovery site using enterprise systems connection (ESCON®) links between the two 3990-6 devices. The 3990-6 devices can be up to 43 km (26.7 miles) apart subject to quotation.
If you use XRC, the 3990-6 devices at the primary and recovery sites can each be attached to the XRC DFSMS/MVS host at a distance of up to 43 km (26.7 miles) using ESCON links (subject to quotation). If you use three sites, one for the primary 3990, one to support the XRC DFSMS/MVS host, and one for the recovery 3990, this allows a total of 86 km (53.4 miles) between the 3990s. If you use channel extenders with XRC, there is no limit on the distance between your primary and remote site.
For RRDF there is no limit to the distance between the primary and secondary sites.
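The distance limits quoted above can be collected into a small lookup, which also checks the kilometer-to-mile conversions. This Python sketch is an illustration only. The function names are hypothetical, and the 43 km figure for two-site XRC assumes that the XRC host is located at one of the two sites, which is one reading of the text rather than a stated fact.

```python
import math

ESCON_LIMIT_KM = 43.0   # per-link limit quoted in the text (subject to quotation)
KM_PER_MILE = 1.609344


def km_to_miles(km):
    return km / KM_PER_MILE


def max_separation_km(method, three_sites=False, channel_extenders=False):
    """Maximum distance between primary and recovery sites, per the text.

    math.inf means that the text states no distance limit.
    """
    method = method.upper()
    if method == "PPRC":
        return ESCON_LIMIT_KM
    if method == "XRC":
        if channel_extenders:
            return math.inf
        # With a separate host site, each 3990 can be 43 km from the host.
        return 2 * ESCON_LIMIT_KM if three_sites else ESCON_LIMIT_KM
    if method == "RRDF":
        return math.inf
    raise ValueError(f"unknown method: {method}")


print(round(km_to_miles(max_separation_km("PPRC")), 1))                   # 26.7
print(round(km_to_miles(max_separation_km("XRC", three_sites=True)), 1))  # 53.4
print(max_separation_km("XRC", channel_extenders=True))                   # inf
```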
Tier 6 provides a very complete level of disaster recovery. You must assess whether the cost of achieving this level of disaster recovery is justified for your company.
Figure 48 summarizes the solutions for tiers 4 through 6, and shows the approximate time required for a recovery with each tier of solution.
This summary shows the three tiers and the various tools for each that can help you reach your required level of disaster recovery.
Tier 4 relies on automation to send backups to the remote site. NetView® provides the ability to schedule work in order to maintain recoverability at the remote site.
Tier 5 relies on the two-phase commit processing supported by various database products and your application program’s use of these features. Tier 5 requires additional backup processing to ensure that vital data, other than databases, is copied to the remote system.
Tier 6 is divided into two sections: software solutions for specific access methods and database management systems, and hardware solutions for any data.
RRDF can provide very high currency and recoverability for a wide range of data. However, it does not cover all the data in which you may be interested. For example, RRDF does not support load module libraries.
The 3990-6 hardware solution is independent of the data being stored on the DASD. PPRC and XRC can be used for databases, CICS files, logs, and any other data sets that you need to ensure complete recovery on the remote system.