Forward recovery of data sets accessed in RLS mode

A recoverable data set that is updated in RLS mode can have retained locks held for individual records.

In the event of a data set failure, it is important to ensure that you preserve any retained locks as part of the data set recovery process. This is to enable the locks associated with the original data set to be attached to the new data set. If the data set failure is caused by anything other than a volume failure, retained locks can be "unbound" using the SHCDS FRUNBIND subcommand. The data set can then be recovered and the locks rebound to the recovered data using the SHCDS FRBIND subcommand.

Note:
If you use CICSVR to recover a data set, the unbinding and subsequent binding of locks is handled by CICSVR. CICSVR also uses the SHCDS FRSETRR and FRRESETRR commands to prevent general access to the data set during the recovery process.

If a data set failure is caused by the loss of a volume, it is not possible to preserve retained locks using FRUNBIND and FRBIND because SMSVSAM no longer has access to the failed volume. When recovering from the loss of a volume, you can ensure data integrity only by deleting the entire IGWLOCK00 lock structure, which forces CICS® to perform lost locks recovery. CICS uses information from its system log to perform lost locks recovery. (For more information about lost locks processing, see Lost locks recovery.) Recovering data after the loss of a volume requires a different procedure from recovering after the simple loss of a data set.

The procedures to recover data sets that could have retained locks are described in the following topics:

Recovery of data set with volume still available

The procedure described here is necessary to preserve any retained locks that are held by SMSVSAM against the data in the old data set. Unless you follow all the steps of this procedure, the locks will not be valid for the new data set, with potential loss of data integrity.

The following steps outline the procedure to forward recover a data set accessed in RLS mode. Note that the procedure described here refers to two data sets: the failed data set, and the new one into which the backup is restored. When building your JCL to implement this process, be sure that you reference the correct data set at each step.

1. Quiesce the data set
To prevent further accesses to the failed data set, quiesce it using the CEMT, or EXEC CICS, SET DSNAME QUIESCED command.
2. Create a new data set
Create a new data set into which the backup is to be restored. At this stage, it cannot have the same name as the failed production data set.
3. Issue FRSETRR
Use this access method services SHCDS subcommand to mark the failed data set as being subject to a forward recovery operation. This makes the data set unavailable to tasks other than those performing recovery functions, and also allows the following unbind operation to succeed.
4. Issue FRUNBIND
Use this access method services SHCDS subcommand to unbind any retained locks against the failed data set. This enables SMSVSAM to preserve the locks ready for re-binding later to the new data set used for the restore. This is necessary because there is information in the locks that relates to the old data set, and it must be updated to refer to the new data set. Unbinding and re-binding the locks takes care of this.
Note:
You can include the access method services SHCDS FRSETRR and FRUNBIND subcommands of steps 3 and 4 in the same IDCAMS execution, but they must be in the correct sequence. For example, the SYSIN input to IDCAMS would look like this:
//SYSIN   DD *
 SHCDS FRSETRR(old_dsname)
 SHCDS FRUNBIND(old_dsname)
/*
5. Restore the backup
After the unbind, restore a full backup of the data set to the new data set created in step 2. You can use the recovery function (HRECOVER) of DFSMShsm™ to do this.
6. Issue the FRSETRR subcommand
Use this access method services SHCDS subcommand to mark the new data set as being subject to a forward recovery operation. This is necessary to allow the later bind operation to succeed.
7. Run the forward recovery utility
Run your forward recovery utility to apply the forward recovery log to the restored data set, to redo all the completed updates.
8. Delete the old data set
Delete the old data set to enable you to rename the new data set to the name of the failed data set.
9. Alter the new data set name
Use access method services to rename the new data set to the name of the old data set.
ALTER CICS.DATASETB NEWNAME(CICS.DATASETA)

You must give the restored data set the name of the old data set to enable the following bind operation to succeed.

10. Issue the FRBIND subcommand
Use this access method services SHCDS subcommand to re-bind to the recovered data set all the retained locks that were unbound from the old data set.
11. Issue the FRRESETRR subcommand
Use this access method services SHCDS subcommand after the re-bind to re-enable access to the data set by applications other than the forward recovery utility.
Note:
You can include the SHCDS FRBIND and FRRESETRR subcommands of steps 10 and 11 in one IDCAMS execution, but they must be in the correct sequence. For example, the SYSIN input to IDCAMS would look like this:
//SYSIN   DD *
 SHCDS FRBIND(dataset_name)
 SHCDS FRRESETRR(dataset_name)
/*

These steps are summarized in the following commands, where the data set names are labeled with A and B suffixes:

 CEMT SET DSNAME(CICS.DATASETA) QUIESCED
 DEFINE CLUSTER(NAME(CICS.DATASETB) ...
 SHCDS FRSETRR(CICS.DATASETA)
 SHCDS FRUNBIND(CICS.DATASETA)
 HRECOVER (CICS.DATASETA.BACKUP) ... NEWNAME(CICS.DATASETB)
 SHCDS FRSETRR(CICS.DATASETB)
 EXEC PGM=fwdrecov_utility
 DELETE CICS.DATASETA
 ALTER CICS.DATASETB NEWNAME(CICS.DATASETA)
 SHCDS FRBIND(CICS.DATASETA)
 SHCDS FRRESETRR(CICS.DATASETA)
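
The ordering constraints in this summary can be sketched in code. The following Python fragment is an illustration only and is not part of any CICS, IDCAMS, or DFSMShsm tooling: the function name forward_recovery_commands and its parameters are hypothetical, and the function does nothing more than format the command text shown above in the required order, leaving elided operands (...) exactly as they appear in the summary.

```python
def forward_recovery_commands(old_dsname, new_dsname, backup_dsname,
                              utility="fwdrecov_utility"):
    """Return the summary commands, in order, for recovering old_dsname.

    Hypothetical helper: it only formats command text and does not
    submit anything to CICS, IDCAMS, or DFSMShsm.
    """
    if old_dsname == new_dsname:
        # Step 2: the new data set cannot yet have the failed data set's name.
        raise ValueError("new data set must have a different name")
    return [
        f"CEMT SET DSNAME({old_dsname}) QUIESCED",                # step 1
        f"DEFINE CLUSTER(NAME({new_dsname}) ...",                 # step 2
        f"SHCDS FRSETRR({old_dsname})",                           # step 3
        f"SHCDS FRUNBIND({old_dsname})",                          # step 4
        f"HRECOVER ({backup_dsname}) ... NEWNAME({new_dsname})",  # step 5
        f"SHCDS FRSETRR({new_dsname})",                           # step 6
        f"EXEC PGM={utility}",                                    # step 7
        f"DELETE {old_dsname}",                                   # step 8
        f"ALTER {new_dsname} NEWNAME({old_dsname})",              # step 9
        f"SHCDS FRBIND({old_dsname})",                            # step 10
        f"SHCDS FRRESETRR({old_dsname})",                         # step 11
    ]


if __name__ == "__main__":
    for command in forward_recovery_commands(
            "CICS.DATASETA", "CICS.DATASETB", "CICS.DATASETA.BACKUP"):
        print(command)
```

Note that the FRBIND and FRRESETRR commands name the old data set: by step 10 the restored data set has already been renamed, which is what allows the bind to succeed.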

If you use CICSVR, the SHCDS functions are performed for you (see Forward recovery with CICSVR).

After successful forward recovery, CICS can carry out any pending backout processing against the restored data set. Backout processing is necessary because the forward recovery log contains after images of all changes to the data set, including those that are uncommitted, and were in the process of being backed out when the data set failed.

Recovery of data set with loss of volume

Moving a data set that has retained locks means that the locks associated with the original data set must somehow be attached to the new data set. In the event of a lost volume, a volume restore implicitly moves data sets. Even if you are using CICSVR, which normally takes care of re-attaching locks to a recovered data set, the movement of data sets caused by loss of a volume cannot be managed entirely automatically.

There are several methods you can use to recover data sets after the loss of a volume. Whichever method you use (whether a volume restore, a logical data set recovery, or a combination of both), you need to ensure SMSVSAM puts data sets into a lost locks state to protect data integrity. This means that, after you have carried out the initial step of recovering the volume, your data recovery process must include the following command sequence:

  1. ROUTE *ALL,VARY SMS,SMSVSAM,TERMINATESERVER
  2. VARY SMS,SMSVSAM,FORCEDELETELOCKSTRUCTURE
  3. ROUTE *ALL,VARY SMS,SMSVSAM,ACTIVE

The first command terminates all SMSVSAM servers in the sysplex and temporarily disables the SMSVSAM automatic restart facility. The second command (issued from any MVS™) deletes the lock structure. The third command restarts all SMSVSAM servers, as a result of which SMSVSAM records, in the sharing control data set, that data sets are in lost locks state. The automatic restart facility is also reenabled.

Each CICS region detects that its SMSVSAM server is down as a result of the TERMINATESERVER command, and waits for the server event indicating the server has restarted before it can resume RLS-mode processing. This occurs at step 3 in the above procedure.

It is important to realize the potential impact of these commands. Deleting the lock structure puts all RLS-mode data sets that have retained locks, or are open at the time the servers are terminated, into the lost locks condition. A data set which is in lost locks condition is not available for general access until all outstanding recovery on the data set is complete. This is because records are no longer protected by the lost locks, and new updates can only be permitted when all shunted UOWs with outstanding recovery work for the data set have completed.

When CICS detects that its server has restarted, it performs dynamic RLS restart, during which it is notified that it must perform lost locks recovery. During this recovery process, CICS does not allow new RLS-mode work to start for a given data set until all backouts for that data set are complete. Error responses are returned on open requests issued by any CICS region that was not sharing the data set at the time SMSVSAM servers were terminated, and on RLS access requests issued by any new UOWs in CICS regions that were sharing the data set. Also, in-doubt UOWs must be resolved before the data set can be taken out of lost locks state.

For RLS-mode data sets that are not on the lost volume, the CICS regions can begin lost locks recovery processing as soon as they receive notification from their SMSVSAM servers. For the data sets on these other volumes, recovery processing completes quickly and the data sets are removed from lost locks state.

For those data sets that are unavailable (for example, they are awaiting forward recovery because they are on the lost volume), CICS runs the backouts only when forward recovery is completed. In the case of CICSVR-managed forward recovery, completion is signalled automatically, and recovered data sets are removed from lost locks state when the associated backouts are run.
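
The rule that a data set leaves lost locks state only when every CICS region with outstanding recovery work has finished can be modelled in a few lines. The following Python fragment is a toy model under stated assumptions, not a description of how SMSVSAM records the state in the sharing control data set: the class LostLocksState and its method names are invented for illustration, and the region names in the usage example are taken from the examples later in this topic.

```python
class LostLocksState:
    """Toy model of one data set in lost locks state (illustrative only)."""

    def __init__(self, dsname, regions_with_recovery):
        self.dsname = dsname
        # CICS regions that still have lost locks recovery outstanding
        # (backouts to run or in-doubt UOWs to resolve) for this data set.
        self.pending = set(regions_with_recovery)

    def recovery_complete(self, region):
        """Record that a region has completed its lost locks recovery."""
        self.pending.discard(region)

    def allows_new_work(self):
        """New RLS-mode work is allowed only when no recovery is pending."""
        return not self.pending


if __name__ == "__main__":
    state = LostLocksState("RLSADSW.VF04D.DATAENDB", ["ADSWA01D", "ADSWA03C"])
    state.recovery_complete("ADSWA01D")
    print(state.allows_new_work())   # one region is still pending
    state.recovery_complete("ADSWA03C")
    print(state.allows_new_work())   # all regions have completed
```

The model makes the dependency explicit: a single region with an unresolved in-doubt UOW is enough to keep the data set unavailable for new work.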

Volume recovery procedure using CFVOL QUIESCE

If a volume is lost, and you logically recover the data sets using CICSVR, you do not need to use the CFVOL QUIESCE command (step 1 in the procedure described below). This is because CICS cannot run the lost locks recovery process until the data sets are available, and the data sets are made available only after the CICSVR recovery jobs are finished.

If you physically restore the volume, however, the data sets that need to be forward recovered are immediately available for backout. In this case you need to use CFVOL QUIESCE before the volume restore to prevent access to the restored volume until that protection can be transferred to CICS (by using the CICS SET DSNAME(...) QUIESCED command). When all the data sets that need to be forward recovered have been successfully quiesced, you can enable the volume again (CFVOL ENABLE). The volume is then useable for other SMSVSAM data sets.

The command D SMS,CFVOL(volser) can be used to display the CFVOL state of the indicated volume.

CICS must not perform backouts until forward recovery is completed. The following outline procedure, which includes the three VARY SMS commands described above, prevents CICS opening for backout a data set on a restored volume until it is safe to do so. In this procedure volser is the volume serial of the lost volume:

  1. VARY SMS,CFVOL(volser),QUIESCE

    Perform this step before volume restore. Quiescing the volume ensures that the volume remains unavailable, even after the restore, so that attempts to open data sets on the volume in RLS mode will fail with RC=8, ACBERFLG=198(X'C6'). Quiescing the volume also ensures CICS can't perform backouts for data sets after the volume is restored until it is re-enabled.

  2. ROUTE *ALL,VARY SMS,SMSVSAM,TERMINATESERVER
  3. VARY SMS,SMSVSAM,FORCEDELETELOCKSTRUCTURE
  4. ROUTE *ALL,VARY SMS,SMSVSAM,ACTIVE

    Note that at this point, as soon as they receive the "SMSVSAM available" event notification (ENF), CICS regions are able to run backouts for the data sets that are available. RLS-mode data sets on the lost volume, however, remain unavailable until a later ENABLE command.

  5. At this point the procedure assumes the volume has been restored. This step transfers the responsibility of inhibiting backouts for those data sets to be forward recovered from SMSVSAM to CICS. Quiescing the data sets that need to be forward recovered is a first step to allowing the restored volume to be used for recovery work for other data sets:
    1. SET DSNAME(...) QUIESCED

      Use this command for all of the data sets on the lost volume that are to be eventually forward recovered. Issue the command before performing any of the forward recoveries.

      Note:
      A later SET DSNAME(...) UNQUIESCED command is not needed if you are using CICSVR.
    2. VARY SMS,CFVOL(volser),ENABLE

      Issue this command when CICS regions have successfully completed the data set QUIESCE function. You can verify that data sets are successfully quiesced by inquiring on the quiesced state of each data set using the CEMT INQUIRE DSNAME(...) command. If a data set is still quiescing, CICS displays the words "BEING QUIESCED".

      This clears the SMSVSAM CFVOL-QUIESCED state and allows SMSVSAM RLS access to the volume. CICS ensures that access is not allowed to the data sets that will eventually be forward recovered, but the volume is available for other data sets.

  6. Run data set forward recovery jobs.
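
The outline above can be sketched as a function that lists the operator commands in order. This Python fragment is a sketch only: the function name volume_recovery_commands and the physical_restore flag are hypothetical, and the function only formats command text. The volume restore itself and the forward recovery jobs are not commands and so appear only as comments. Omitting the data set quiesce and CFVOL ENABLE commands when the volume was never quiesced is an assumption of this sketch, based on the statement that CFVOL QUIESCE is not needed for logical recovery with CICSVR.

```python
def volume_recovery_commands(volser, dsnames, physical_restore=True):
    """Return the ordered commands for recovery after the loss of a volume.

    Hypothetical helper. dsnames are the data sets on the lost volume
    that are to be forward recovered.
    """
    commands = []
    if physical_restore:
        # Step 1: quiesce the volume before it is restored.
        commands.append(f"VARY SMS,CFVOL({volser}),QUIESCE")
    # Steps 2 to 4: force SMSVSAM to put data sets into lost locks state.
    commands += [
        "ROUTE *ALL,VARY SMS,SMSVSAM,TERMINATESERVER",
        "VARY SMS,SMSVSAM,FORCEDELETELOCKSTRUCTURE",
        "ROUTE *ALL,VARY SMS,SMSVSAM,ACTIVE",
    ]
    if physical_restore:
        # Step 5 (volume now restored): move protection from SMSVSAM to
        # CICS by quiescing every data set awaiting forward recovery...
        commands += [f"SET DSNAME({dsname}) QUIESCED" for dsname in dsnames]
        # ...and only then re-enable the volume (assumption: skipped when
        # the volume was never quiesced).
        commands.append(f"VARY SMS,CFVOL({volser}),ENABLE")
    # Step 6: run the data set forward recovery jobs (not a command).
    return commands


if __name__ == "__main__":
    for command in volume_recovery_commands(
            "9S4186", ["RLSADSW.VF04D.DATAENDB", "RLSADSW.VF04D.TELLCTRL"]):
        print(command)
```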

The following are two examples of forward recovery after the loss of a volume, based on the procedure outline above:

Example of recovery using data set backup

For this illustration, involving two data sets, we simulated the loss of a volume by varying the volume offline. The two data sets (RLSADSW.VF04D.DATAENDB and RLSADSW.VF04D.TELLCTRL) were being updated in RLS mode by many CICS AORs at the time the volume was taken offline. The CICS file names used for these data sets were F04DENDB and F04DCTRL.

The failed data sets were recovered onto another volume without first recovering the failed volume. For this purpose, you have to know what data sets are on the volume at the time of the failure. In Example of recovery using volume backup, we describe the recovery process by performing a volume restore before the forward recovery of data sets. Here are the steps followed in this example:

  1. We simulated the volume failure using the MVS command:
    ROUTE *ALL,VARY 4186,OFFLINE,FORCE

    The loss of the volume caused I/O errors and transaction abends, producing messages on the MVS system log such as these:

         DFHFC0157 ADSWA04B 030
         TT1P 3326 CICSUSER An I/O error has occurred on base data set
         RLSADSW.VF04D.TELLCTRL accessed via file F04DCTRL component code
         X'00'.
         DFHFC0158 ADSWA04B 031
         96329,13154096,0005EDC00000,D,9S4186,A04B    ,CICS
         ,4186,DA,F04DCTRL,86- OP,UNKNOWN COND.  ,000000A5000403,VSAM
     
         DFHFC0157 ADSWA03C 301
         DE1M 0584 CICSUSER An I/O error has occurred on base data set
         RLSADSW.VF04D.DATAENDB accessed via file F04DENDB component code
         X'00'.
         DFHFC0158 ADSWA03C 031
         ...

    As a result of the transaction abends, CICS attempted to back out in-flight UOWs. The backouts failed because CICS couldn't access the data sets on the lost volume. The associated backout failures were reported by CICS, as follows:

      +DFHFC4701 ADSWA03A 336
       11/24/96 13:15:48 ADSWA03A Backout failed for transaction DE1H, VSAM
       file F04DENDB, unit of work X'ADD18C07DCB70A05', task 46752, base
       RLSADSW.VF04D.DATAENDB, path RLSADSW.VF04D.DATAENDB, failure code
       X'24'.
     
      +DFHFC0152 ADSWA03A 339
       11/24/96 13:15:49 ADSWA03A ???? DE1H An attempt to retain locks for
       data set within unit of work X'ADD18C07DCB70A05' failed.  VSAM return
       code X'00000008' reason code X'000000A9'.
      +DFHME0116 ADSWA03A 340
       (Module: DFHMEME) CICS symptom string for message DFHFC0152 is
       PIDS/565501800 LVLS/510 MS/DFHFC0152 RIDS/DFHFCCA PTFS/UN92873
       REGS/GR15 VALU/00000008 PCSS/IDARETLK PRCS/000000A9
      +DFHFC0312 ADSWA03A Message DFHFC0152 data set RLSADSW.VF04D.DATAENDB

    We used the CEMT command INQUIRE UOWDSNFAIL IOERROR to display the UOWs that were shunted as a result of the I/O errors. For example, on the CICS region ADSWA01D the command showed the following shunted UOWs:

         INQUIRE UOWDSNFAIL IOERROR
         STATUS:  RESULTS
          Dsn(RLSADSW.VF04D.TELLCTRL                      ) Dat Ioe
             Uow(ADD18C2DA4D5FC03)                         Rls
          Dsn(RLSADSW.VF04D.DATAENDB                      ) Dat Ioe
             Uow(ADD18C2E693C7401)                         Rls
  2. The next step was to stop the I/O errors by closing the RLS-mode files that were open against failed data sets. In our example, file F04DENDB was open against data set RLSADSW.VF04D.DATAENDB, and file F04DCTRL was open against data set RLSADSW.VF04D.TELLCTRL.

    The normal way of closing RLS-mode files across a sysplex is to quiesce the data set using the CEMT command SET DSNAME QUIESCED in one CICS region. However, the quiesce operation requires access to the data set, and fails if the data set cannot be accessed. The alternative is to issue the SET FILE(F04DENDB) CLOSED and SET FILE(F04DCTRL) CLOSED commands, which we did using CICSPlex® SM to send the command to all the relevant regions. (Without CICSPlex SM, issue the CEMT SET FILE CLOSED command to each CICS region individually, either from the MVS console or from a CICS terminal).

  3. To enable CICSVR to recover the failed data sets, we first deleted the catalog entries for the two affected data sets using the IDCAMS DELETE command:
    DELETE RLSADSW.VF04D.TELLCTRL NOSCRATCH
    DELETE RLSADSW.VF04D.DATAENDB NOSCRATCH
  4. The impact of the recovery process is greater if there are in-flight tasks updating RLS-mode files. For this reason, we recommend that at this point you quiesce the data sets that are being accessed in RLS mode on other volumes before terminating the SMSVSAM servers. To determine which data sets are being accessed in RLS mode by a CICS region, use the SHCDS LISTSUBSYSDS subcommand. For example, the following command lists those data sets that are being accessed in RLS mode by CICS region ADSWA01D:

    SHCDS LISTSUBSYSDS('ADSWA01D')

    For the purpose of this example, we did not quiesce data sets; hence there is no sample output to show.

    Note:
    You can issue SHCDS subcommands as a TSO command or from a batch job.
  5. We terminated the SMSVSAM servers using the MVS command:
    ROUTE *ALL,VARY SMS,SMSVSAM,TERMINATESERVER

    We received message IGW572I on each MVS image confirming that the servers were terminating:

    IGW572I REQUEST TO TERMINATE SMSVSAM
            ADDRESS SPACE IS ACCEPTED:
            SMSVSAM SERVER TERMINATION SCHEDULED.

    In our example, terminating the servers caused abends of all in-flight tasks that were updating RLS-mode data sets. This, in turn, caused backout failures and shunted UOWs, which were reported by CICS messages. For example, the effect in CICS region ADSWA03C was shown by the following response to an INQUIRE UOWDSNFAIL command for data set RLSADSW.VF01D.BANKACCT:

    INQUIRE UOWDSNFAIL DSN(RLSADSW.VF01D.BANKACCT)
    STATUS:  RESULTS
     Dsn(RLSADSW.VF01D.BANKACCT                      ) Dat Ope
        Uow(ADD19B8166268E02)                         Rls
     Dsn(RLSADSW.VF01D.BANKACCT                      ) Rls Com
        Uow(ADD19B9D93DE1200)                         Rls

    After the SMSVSAM servers terminated, all RLS-mode files were automatically closed by CICS and further RLS access prevented.

  6. When we were sure that all servers were down, we deleted the IGWLOCK00 lock structure with the MVS command:
    VARY SMS,SMSVSAM,FORCEDELETELOCKSTRUCTURE

    followed by the response "FORCEDELETELOCKSTRUCTURESMSVSAMYES" to allow the lock structure deletion to continue.

    Successful deletion of the lock structure was indicated by the following message:

    IGW527I SMSVSAM FORCE DELETE LOCK STRUCTURE PROCESSING IS NOW COMPLETE
  7. It was safe at this point to restart the SMSVSAM servers with the MVS command:
    ROUTE *ALL,VARY SMS,SMSVSAM,ACTIVE

    Initialization of the SMSVSAM servers resulted in the creation of a new lock structure, shown by the following message:

    IGW453I SMSVSAM ADDRESS SPACE HAS SUCCESSFULLY
            CONNECTED TO DFSMS LOCK STRUCTURE IGWLOCK00
            STRUCTURE VERSION: ADD1A77F0420E001 SIZE: 35072K bytes
            MAXIMUM USERS: 32 REQUESTED:32
            LOCK TABLE ENTRIES: 2097152 REQUESTED: 2097152
            RECORD TABLE ENTRIES: 129892 USED: 0

    The SMSVSAM server reported that there were no longer any retained locks but that instead there were data sets in the "lost locks" condition:

    IGW414I SMSVSAM SERVER ADDRESS SPACE IS NOW ACTIVE.
    IGW321I No retained locks
    IGW321I 45 spheres in Lost Locks

    CICS was informed during dynamic RLS restart about the data sets for which it must perform lost locks recovery. In our example, CICS issued messages such as the following to tell us that lost locks recovery was needed on one or more data sets:

    DFHFC0555 ADSWA04A One or more data sets are in lost locks status.
              CICS will perform lost locks recovery.
  8. If we had quiesced data sets before terminating the servers (see step 4), this is the point at which we would unquiesce those data sets before continuing with the recovery.

    If there were many data sets in lost locks it would take some time for lost locks recovery to complete. Error responses are returned on open requests issued by any CICS region that was not sharing the data set at the time SMSVSAM servers were terminated, and on RLS access requests issued by any new UOWs in CICS regions that were sharing the data set. Also, it may be necessary to open explicitly files that suffer open failures during lost locks recovery.

    Each data set in a lost locks state is protected from new updates until all CICS regions have completed lost locks recovery for the data set. This means that all shunted UOWs must be resolved before the data set is available for new work. Assuming that all CICS regions are active, and there are no in-doubt UOWs, lost locks processing, for all data sets except the ones on the failed volume, should complete quickly.

  9. In this example, CEMT INQUIRE UOWDSNFAIL on CICS region ADSWA01D showed UOW failures only for the RLSADSW.VF04D.TELLCTRL and RLSADSW.VF04D.DATAENDB data sets:
    INQUIRE UOWDSNFAIL
    STATUS:  RESULTS
     Dsn(RLSADSW.VF04D.TELLCTRL                      ) Dat Ope
        Uow(ADD18C2DA4D5FC03)                         Rls
     Dsn(RLSADSW.VF04D.DATAENDB                      ) Dat Ope
        Uow(ADD18C2E693C7401)                         Rls

    The command INQUIRE DSN(RLSADSW.VF04D.DATAENDB) on the same region showed that the lost locks status for the data set was Recoverlocks. This meant that the data set had suffered lost locks and that CICS region ADSWA01D had recovery work to complete:

    INQUIRE DSN(RLSADSW.VF04D.DATAENDB)
    RESULT - OVERTYPE TO MODIFY
      Dsname(RLSADSW.VF04D.DATAENDB)
      Accessmethod(Vsam)
      Action(              )
      Filecount(0001)
      Validity(Valid)
      Object(Base)
      Recovstatus(Fwdrecovable)
      Backuptype()
      Frlog(00)
      Availability( Available )
      Lostlocks(Recoverlocks)
      Retlocks(Retained)
      Quiescestate()
      Uowaction(              )
      Basedsname(RLSADSW.VF04D.DATAENDB)
      Fwdrecovlsn(ADSW.CICSVR.F04DENDB)
  10. At this point, all data sets were available for new work except the two data sets on the failed volume. It was now possible to recover these using CICSVR. For information on using CICSVR see Forward recovery with CICSVR and the CICS VSAM Recovery User's Guide and Reference.
  11. All CICS regions are automatically notified when CICSVR processing for a data set is complete. CICSVR preserves the lost locks state for the recovered data set and CICS disallows all new update requests until all CICS regions have completed lost locks recovery. When all CICS regions have informed SMSVSAM that they have completed their lost locks recovery, the data set lost locks state changes to Nolostlocks.

  12. At this point recovery was complete and the recovered data sets were re-enabled for general access by issuing (through the CICSPlex SM "LOCFILES" view) the CEMT commands:
    SET FILE(F04DENDB) ENABLED
    SET FILE(F04DCTRL) ENABLED

    These commands are issued to each CICS AOR that requires access.

  13. All data sets were now available for general access. We confirmed this using the SHCDS subcommand LISTSUBSYS(ALL), which showed that no CICS region had lost locks recovery outstanding.

If you follow the above example, but find that a CICS region still has a data set in lost locks, you can investigate the UOW failures on that particular CICS region using the CEMT commands INQUIRE UOWDSNFAIL and INQUIRE UOW. For in-doubt UOWs that have updated a data set that is in a lost locks condition, CICS waits for in-doubt resolution before allowing general access to the data set. In such a situation you can still release the locks immediately, using the SET DSNAME command, although in most cases you will lose data integrity. See Lost locks recovery for more information about resolving in-doubt UOWs following lost locks processing.
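
When you investigate UOW failures across many regions, it can help to collect the INQUIRE UOWDSNFAIL results in one place. The following Python fragment is a minimal sketch, assuming that the display text has already been captured as a string (how you capture it is outside the scope of this sketch) and that it has the layout shown in the displays in this example. The function name parse_uowdsnfail is hypothetical, and labelling the two abbreviated fields after the data set name (for example, Dat Ope) as cause and reason is this sketch's reading of the display.

```python
import re

# Matches lines such as:  Dsn(RLSADSW.VF04D.TELLCTRL      ) Dat Ope
DSN_RE = re.compile(r"Dsn\(\s*(\S+?)\s*\)\s+(\w+)\s+(\w+)")
# Matches lines such as:     Uow(ADD18C2DA4D5FC03)          Rls
UOW_RE = re.compile(r"Uow\(\s*([0-9A-Fa-f]+)\s*\)")


def parse_uowdsnfail(display_text):
    """Return (dsname, cause, reason, uow) tuples from captured display text."""
    results = []
    current = None
    for line in display_text.splitlines():
        dsn = DSN_RE.search(line)
        if dsn:
            current = dsn.groups()      # remember the data set line
            continue
        uow = UOW_RE.search(line)
        if uow and current:
            results.append((*current, uow.group(1)))
            current = None
    return results


if __name__ == "__main__":
    sample = """\
INQUIRE UOWDSNFAIL
STATUS:  RESULTS
 Dsn(RLSADSW.VF04D.TELLCTRL                      ) Dat Ope
    Uow(ADD18C2DA4D5FC03)                         Rls
 Dsn(RLSADSW.VF04D.DATAENDB                      ) Dat Ope
    Uow(ADD18C2E693C7401)                         Rls
"""
    for entry in parse_uowdsnfail(sample):
        print(entry)
```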

Example of recovery using volume backup

In this example, we simulated the recovery from the loss of a volume by performing a volume restore before the forward recovery process. Backout-failed UOWs were the result of the I/O errors that occurred when the volume failed.

Note:
It is important to ensure that CICS cannot retry the shunted UOWs when the volume is restored, until after the forward recovery work is complete. This is done by quiescing the volume before it is restored, as described under Volume recovery procedure using CFVOL QUIESCE (step 1).

Many of the steps in this second example are the same as those described under the Example of recovery using data set backup, and are listed here in summary form only.

  1. We simulated the volume failure using the MVS command:
    ROUTE *ALL,VARY 4186,OFFLINE,FORCE
  2. We stopped the I/O errors by closing the files that were open against failed data sets. In our example, file F04DENDB was open against data set RLSADSW.VF04D.DATAENDB and file F04DCTRL was open against data set RLSADSW.VF04D.TELLCTRL.
  3. Because the failed data sets were restored from the same volume, there was no need to delete the catalog entries for these data sets.
  4. Before restoring the failed volume, we quiesced the volume to ensure that CICS could not access the restored data sets, by issuing the command:
    VARY SMS,CFVOL(9S4186),QUIESCE

    In this example, for volume serial 9S4186, the command produced the message:

    IGW462I DFSMS CF CACHE REQUEST TO QUIESCE VOLUME 9S4186 IS ACCEPTED

    We confirmed that the volume was quiesced by issuing the MVS command:

    DISPLAY SMS,CFVOL(9S4186)

    which confirmed that the volume was quiesced with the message:

    IGW531I DFSMS CF VOLUME STATUS
    VOLUME = 9S4186
    DFSMS VOLUME CF STATUS = CF_QUIESCED
    VOLUME 9S4186 IS NOT BOUND TO ANY DFSMS CF CACHE STRUCTURE
  5. We simulated the volume restore for this example by using the MVS VARY command to bring the volume back online:
    ROUTE *ALL,VARY 4186,ONLINE

    Because the volume was quiesced, attempts to open files on this volume failed, with messages such as the following:

    DFHFC0500 ADSWA02A RLS OPEN of file F04DENDB failed. VSAM has
              returned code X'0008' in R15 and reason X'00C6'.
  6. The impact of the recovery process is greater if there are in-flight tasks updating RLS-mode files. To minimize the impact, we recommend that at this point you quiesce all data sets that are being accessed in RLS mode.

  7. We terminated the SMSVSAM servers with the MVS command:
    ROUTE *ALL,VARY SMS,SMSVSAM,TERMINATESERVER
  8. When all SMSVSAM servers were down, we deleted the IGWLOCK00 lock structure with the MVS command:
    VARY SMS,SMSVSAM,FORCEDELETELOCKSTRUCTURE
  9. We restarted the SMSVSAM servers with the MVS command:
    ROUTE *ALL,VARY SMS,SMSVSAM,ACTIVE

    CICS was informed during dynamic RLS restart about the data sets for which it must perform lost locks recovery. CICS issued messages such as the following to inform us that lost locks recovery was being performed on one or more data sets:

    +DFHFC0555 ADSWA04A One or more data sets are in lost locks status.
               CICS will perform lost locks recovery.
  10. If we had quiesced data sets prior to terminating the servers, this is the point at which we would unquiesce those data sets before proceeding.

    If there were many data sets in lost locks it would take some time for lost locks recovery to complete. It may be necessary to explicitly open files which suffer open failures during lost locks recovery.

  11. At this point it was possible that there were data sets on the restored volume which did not require forward recovery. In order to make these data sets available, we needed to re-allow access to the volume. Before doing this, however, we first had to quiesce the data sets that still required forward recovery, thus transferring the responsibility of preventing backouts from SMSVSAM to CICS. In our example, we quiesced our two data sets using the CEMT commands:
    SET DSN(RLSADSW.VF04D.DATAENDB) QUIESCED
    SET DSN(RLSADSW.VF04D.TELLCTRL) QUIESCED
  12. When we were sure that all data sets requiring forward recovery were quiesced, we used the following MVS command to allow access to the restored volume:
    VARY SMS,CFVOL(9S4186),ENABLE

    The above command produced the following message:

    IGW463I DFSMS CF CACHE REQUEST TO ENABLE
                  VOLUME 9S4186 IS COMPLETED.
                  DFSMS CF VOLUME STATUS = "CF_ENABLED"
  13. At this point, all data sets were available for new work except the two quiesced data sets on the restored volume. We recovered these using CICSVR.

    All CICS regions were automatically notified when CICSVR processing for each data set was complete, and each data set was automatically unquiesced by CICSVR to allow the backout shunted UOWs to be retried.

    After all backout shunted UOWs were successfully retried, the recovery was complete and we re-enabled the recovered data sets for general access on each CICS region using the CEMT commands:

    SET FILE(F04DENDB) ENABLED
    SET FILE(F04DCTRL) ENABLED
  14. Finally, we used the SHCDS command LISTSUBSYS(ALL) to confirm that no CICS region had lost locks recovery outstanding, indicating that recovery was complete.

Catalog recovery

If a user catalog is lost, follow the procedures documented in DFSMS/MVS Managing Catalogs. Before making the user catalog available, run the SHCDS CFREPAIR command to reconstruct critical RLS information in the catalog. Note that before running SHCDS CFREPAIR, the restored user catalog must be import connected to the master catalog on all systems (see the "Recovering Shared Catalogs" topic in DFSMS/MVS Managing Catalogs).
