A recoverable data set that is updated in RLS mode can have retained locks held for individual records.
If a data set fails, you must preserve any retained locks as part of the data set recovery process, so that the locks associated with the original data set can be attached to the new data set. If the data set failure is caused by anything other than a volume failure, you can "unbind" the retained locks using the SHCDS FRUNBIND subcommand. You can then recover the data set and rebind the locks to the recovered data set using the SHCDS FRBIND subcommand.
If a data set failure is caused by the loss of a volume, it is not possible to preserve retained locks using FRUNBIND and FRBIND because SMSVSAM no longer has access to the failed volume. When recovering from the loss of a volume, you can ensure data integrity only by deleting the entire IGWLOCK00 lock structure, which forces CICS® to perform lost locks recovery. CICS uses information from its system log to perform lost locks recovery. (For more information about lost locks processing, see Lost locks recovery.) Recovering data from the loss of a volume requires a different procedure from the simple loss of a data set.
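The procedures to recover data sets that could have retained locks are described in the following topics: forward recovery of a data set accessed in RLS mode, and recovery of data sets after the loss of a volume.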
The procedure described here is necessary to preserve any retained locks that are held by SMSVSAM against the data in the old data set. Unless you follow all the steps of this procedure, the locks will not be valid for the new data set, with potential loss of data integrity.
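The following steps outline the procedure to forward recover a data set accessed in RLS mode. The procedure refers to two data sets: the failed data set, and the new one into which the backup is restored. When building your JCL to implement this process, be sure you reference the correct data set at each step.
1. Quiesce the failed data set using the CEMT SET DSNAME(...) QUIESCED command. This closes the data set in all the CICS regions that have it open in RLS mode.
2. Define the new data set (IDCAMS DEFINE CLUSTER) into which the backup is to be restored.
3. Issue the SHCDS FRSETRR and FRUNBIND subcommands against the failed data set. FRUNBIND unbinds the retained locks from the failed data set. For example:
//UNBIND   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  SHCDS FRSETRR(old_dsname)
  SHCDS FRUNBIND(old_dsname)
/*
4. Restore the backup of the failed data set into the new data set (for example, using the DFSMShsm HRECOVER command with the NEWNAME parameter).
5. Issue the SHCDS FRSETRR subcommand against the new data set.
6. Run your forward recovery utility against the new data set.
7. Delete the failed data set.
8. Rename the new data set to the name of the failed data set, for example:
ALTER CICS.DATASETB NEWNAME(CICS.DATASETA)
You must give the restored data set the name of the old data set to enable the following bind operation to succeed.
9. Issue the SHCDS FRBIND and FRRESETRR subcommands against the renamed data set. FRBIND binds the retained locks to the restored data set. For example:
//BIND     EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  SHCDS FRBIND(dataset_name)
  SHCDS FRRESETRR(dataset_name)
/*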
These steps are summarized in the following commands, where CICS.DATASETA is the failed data set and CICS.DATASETB is the new data set into which the backup is restored:
CEMT SET DSNAME(CICS.DATASETA) QUIESCED
DEFINE CLUSTER(NAME(CICS.DATASETB) ...
SHCDS FRSETRR(CICS.DATASETA)
SHCDS FRUNBIND(CICS.DATASETA)
HRECOVER (CICS.DATASETA.BACKUP) ... NEWNAME(CICS.DATASETB)
SHCDS FRSETRR(CICS.DATASETB)
EXEC PGM=fwdrecov_utility
DELETE CICS.DATASETA
ALTER CICS.DATASETB NEWNAME(CICS.DATASETA)
SHCDS FRBIND(CICS.DATASETA)
SHCDS FRRESETRR(CICS.DATASETA)
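If you drive this procedure from a script, you can generate every command from the two data set names. This helps ensure that each step references the correct data set. The following Python sketch is illustrative only. The function name `forward_recovery_commands` and its `backup_dsn` parameter are hypothetical. The `...` operands stay as placeholders, exactly as in the summary, because they depend on your installation.

```python
from typing import List


def forward_recovery_commands(old_dsn: str, new_dsn: str, backup_dsn: str) -> List[str]:
    """Return the forward recovery commands in order (hypothetical helper).

    old_dsn    -- the failed data set (for example, CICS.DATASETA)
    new_dsn    -- the data set the backup is restored into (CICS.DATASETB)
    backup_dsn -- the backup copy of the failed data set
    The '...' operands are placeholders that depend on your installation.
    """
    return [
        f"CEMT SET DSNAME({old_dsn}) QUIESCED",
        f"DEFINE CLUSTER(NAME({new_dsn}) ...",
        f"SHCDS FRSETRR({old_dsn})",
        # Unbind the retained locks from the failed data set.
        f"SHCDS FRUNBIND({old_dsn})",
        f"HRECOVER ({backup_dsn}) ... NEWNAME({new_dsn})",
        f"SHCDS FRSETRR({new_dsn})",
        "EXEC PGM=fwdrecov_utility",
        f"DELETE {old_dsn}",
        f"ALTER {new_dsn} NEWNAME({old_dsn})",
        # After the rename, the restored data set carries the old name, so the
        # locks are rebound (and the state reset) against old_dsn, not new_dsn.
        f"SHCDS FRBIND({old_dsn})",
        f"SHCDS FRRESETRR({old_dsn})",
    ]


if __name__ == "__main__":
    for command in forward_recovery_commands(
        "CICS.DATASETA", "CICS.DATASETB", "CICS.DATASETA.BACKUP"
    ):
        print(command)
```

Called with CICS.DATASETA, CICS.DATASETB and CICS.DATASETA.BACKUP, the function returns the eleven commands of the summary, in the same order. Generating the final two commands from `old_dsn` reflects the rename in the preceding step.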
If you use CICSVR, the SHCDS functions are performed for you (see Forward recovery with CICSVR).
After successful forward recovery, CICS can carry out any pending backout processing against the restored data set. Backout processing is necessary because the forward recovery log contains after images of all changes to the data set. These include uncommitted changes that were in the process of being backed out when the data set failed.
Moving a data set that has retained locks means that the locks associated with the original data set must somehow be attached to the new data set. When a volume is lost, a volume restore implicitly moves the data sets on it. CICSVR normally takes care of reattaching locks to a recovered data set. Even so, the movement of data sets caused by the loss of a volume cannot be managed entirely automatically.
There are several methods you can use to recover data sets after the loss of a volume: a volume restore, a logical data set recovery, or a combination of both. Whichever method you use, you must ensure that SMSVSAM puts data sets into a lost locks state to protect data integrity. This means that, after you have carried out the initial step of recovering the volume, your data recovery process must include the following command sequence:
1. ROUTE *ALL,VARY SMS,SMSVSAM,TERMINATESERVER
2. VARY SMS,SMSVSAM,FORCEDELETELOCKSTRUCTURE
3. ROUTE *ALL,VARY SMS,SMSVSAM,ACTIVE
The first command terminates all SMSVSAM servers in the sysplex and temporarily disables the SMSVSAM automatic restart facility. The second command, issued from any one MVS™ image, deletes the lock structure. The third command restarts all SMSVSAM servers. As a result, SMSVSAM records in the sharing control data set that the affected data sets are in lost locks state. The third command also reenables the automatic restart facility.
Each CICS region detects that its SMSVSAM server is down as a result of the TERMINATESERVER command, and waits for the server event indicating the server has restarted before it can resume RLS-mode processing. This occurs at step 3 in the above procedure.
It is important to realize the potential impact of these commands. Deleting the lock structure puts all RLS-mode data sets that have retained locks, or are open at the time the servers are terminated, into the lost locks condition. A data set in the lost locks condition is not available for general access until all outstanding recovery on the data set is complete. This is because its records are no longer protected by locks. New updates can therefore be permitted only when all shunted UOWs with outstanding recovery work for the data set have completed.
When CICS detects that its server has restarted, it performs dynamic RLS restart, during which it is notified that it must perform lost locks recovery. During this recovery process, CICS does not allow new RLS-mode work to start for a given data set until all backouts for that data set are complete. Error responses are returned on open requests issued by any CICS region that was not sharing the data set at the time SMSVSAM servers were terminated, and on RLS access requests issued by any new UOWs in CICS regions that were sharing the data set. Also, in-doubt UOWs must be resolved before the data set can be taken out of lost locks state.
For RLS-mode data sets that are not on the lost volume, the CICS regions can begin lost locks recovery processing as soon as they receive notification from their SMSVSAM servers. For the data sets on these other volumes, recovery processing completes quickly and the data sets are removed from lost locks state.
For those data sets that are unavailable (for example, they are awaiting forward recovery because they are on the lost volume), CICS runs the backouts only when forward recovery is completed. In the case of CICSVR-managed forward recovery, completion is signalled automatically, and recovered data sets are removed from lost locks state when the associated backouts are run.
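The rule that governs when a data set leaves the lost locks condition can be modeled in a few lines. The following Python sketch is a simplification for illustration, not how SMSVSAM or CICS implement it. The class names are hypothetical. Each region's outstanding work for the data set is reduced to a count of shunted UOWs and a count of in-doubt UOWs.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RegionWork:
    """Outstanding lost locks recovery for one data set in one CICS region."""
    shunted_uows: int = 0  # UOWs whose backouts are still to be completed
    indoubt_uows: int = 0  # in-doubt UOWs still to be resolved

    def recovery_complete(self) -> bool:
        return self.shunted_uows == 0 and self.indoubt_uows == 0


@dataclass
class LostLocksDataSet:
    """A data set in lost locks, tracked across the regions that were sharing it."""
    dsname: str
    regions: Dict[str, RegionWork] = field(default_factory=dict)

    def accepts_new_work(self) -> bool:
        # Open requests from non-sharing regions and RLS requests from new UOWs
        # get error responses until every sharing region has completed recovery.
        return all(work.recovery_complete() for work in self.regions.values())


if __name__ == "__main__":
    ds = LostLocksDataSet(
        "RLSADSW.VF04D.DATAENDB",
        {"ADSWA01D": RegionWork(shunted_uows=1), "ADSWA03C": RegionWork()},
    )
    print(ds.accepts_new_work())  # one region still has a shunted UOW
```

In this model, resolving the shunted UOW in ADSWA01D makes the data set available again. A single unresolved in-doubt UOW in any region keeps it unavailable.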
If a volume is lost, and you logically recover the data sets using CICSVR, you do not need to use the CFVOL QUIESCE command (step 1 in the procedure described below). This is because CICS cannot run the lost locks recovery process until the data sets are available, and the data sets are made available only after the CICSVR recovery jobs are finished.
If you physically restore the volume, however, the data sets that need to be forward recovered are immediately available for backout. In this case, you need to use CFVOL QUIESCE before the volume restore. This prevents access to the restored volume until that protection can be transferred to CICS (by using the CICS SET DSNAME(...) QUIESCED command). When all the data sets that need to be forward recovered have been successfully quiesced, you can enable the volume again (CFVOL ENABLE). The volume is then usable for other SMSVSAM data sets.
The command D SMS,CFVOL(volser) can be used to display the CFVOL state of the indicated volume.
CICS must not perform backouts until forward recovery is completed. The following outline procedure includes the three VARY SMS commands described above. It prevents CICS from opening a data set on a restored volume for backout until it is safe to do so. In this procedure, volser is the volume serial of the lost volume:
1. VARY SMS,CFVOL(volser),QUIESCE
Perform this step before the volume restore. Quiescing the volume ensures that the volume remains unavailable, even after the restore, so that attempts to open data sets on the volume in RLS mode fail with RC=8, ACBERFLG=198 (X'C6'). Quiescing the volume also ensures that, after the volume is restored, CICS cannot perform backouts for data sets on it until the volume is re-enabled.
2. ROUTE *ALL,VARY SMS,SMSVSAM,TERMINATESERVER
3. VARY SMS,SMSVSAM,FORCEDELETELOCKSTRUCTURE
4. ROUTE *ALL,VARY SMS,SMSVSAM,ACTIVE
Note that at this point, CICS regions can run backouts for the data sets that are available as soon as they receive the "SMSVSAM available" event notification (ENF). RLS-mode data sets on the lost volume, however, remain unavailable until a later ENABLE command.
5. CEMT SET DSNAME(...) QUIESCED
Use this command for all of the data sets on the lost volume that are to be forward recovered. Issue the command before performing any of the forward recoveries.
6. VARY SMS,CFVOL(volser),ENABLE
Issue this command when the CICS regions have successfully completed the data set QUIESCE function. You can verify that data sets are successfully quiesced by inquiring on the quiesced state of each data set using the CEMT INQUIRE DSNAME(...) command. If a data set is still quiescing, CICS displays the words "BEING QUIESCED".
This command clears the SMSVSAM CFVOL-QUIESCED state and allows SMSVSAM RLS access to the volume. CICS ensures that access is not allowed to the data sets that will eventually be forward recovered, but the volume is available for other data sets.
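If you automate recovery after a physical volume restore, you might check a planned sequence of actions against the ordering constraints described above before running it. These constraints are:

- quiesce the volume before the restore;
- terminate the servers, delete the lock structure, then restart;
- quiesce every data set to be forward recovered before re-enabling the volume and before any forward recovery.

The following Python sketch makes several assumptions. The action labels (such as RESTORE_VOLUME and QUIESCE_DSN:<dsname>) are hypothetical names for the steps, not real commands. Each action is assumed to appear once. Only the constraints stated in this section are checked. A logical recovery with CICSVR, which does not need CFVOL QUIESCE, would need a different rule set.

```python
from typing import Dict, List, Set


def check_volume_recovery_plan(plan: List[str], fr_dsnames: Set[str]) -> List[str]:
    """Return the ordering rules that a planned volume recovery breaks.

    Each plan entry is a hypothetical action label, used once:
    CFVOL_QUIESCE, RESTORE_VOLUME, TERMINATESERVER, FORCEDELETELOCKSTRUCTURE,
    ACTIVE, CFVOL_ENABLE, QUIESCE_DSN:<dsname>, FORWARD_RECOVER:<dsname>.
    fr_dsnames are the data sets on the lost volume to be forward recovered.
    """
    position: Dict[str, int] = {action: i for i, action in enumerate(plan)}
    problems: List[str] = []

    def must_precede(first: str, second: str) -> None:
        for action in (first, second):
            if action not in position:
                problems.append(f"missing {action}")
                return
        if position[first] > position[second]:
            problems.append(f"{first} must come before {second}")

    # Quiesce the volume before restoring it, so that it stays unavailable.
    must_precede("CFVOL_QUIESCE", "RESTORE_VOLUME")
    # Terminate the servers, delete the lock structure, then restart the servers.
    must_precede("TERMINATESERVER", "FORCEDELETELOCKSTRUCTURE")
    must_precede("FORCEDELETELOCKSTRUCTURE", "ACTIVE")
    for dsname in sorted(fr_dsnames):
        # Each data set to be forward recovered is quiesced in CICS before the
        # volume is re-enabled and before any forward recovery runs.
        must_precede(f"QUIESCE_DSN:{dsname}", "CFVOL_ENABLE")
        for target in sorted(fr_dsnames):
            must_precede(f"QUIESCE_DSN:{dsname}", f"FORWARD_RECOVER:{target}")
    return problems
```

An empty result means that the plan respects these constraints. It does not mean that the plan is complete or correct in every other respect.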
The following are two examples of forward recovery after the loss of a volume, based on the procedure outlined above: one using a data set backup, and one using a volume backup.
For this illustration, involving two data sets, we simulated the loss of a volume by varying the volume offline. The two data sets (RLSADSW.VF04D.DATAENDB and RLSADSW.VF04D.TELLCTRL) were being updated in RLS mode by many CICS AORs at the time the volume was taken offline. The CICS file names used for these data sets were F04DENDB and F04DCTRL.
The failed data sets were recovered onto another volume without first recovering the failed volume. For this purpose, you must know which data sets were on the volume at the time of the failure. In Example of recovery using volume backup, we describe the recovery process with a volume restore performed before the forward recovery of data sets. We followed these steps in this example:
We simulated the loss of the volume by varying device 4186 offline on all MVS images:
ROUTE *ALL,VARY 4186,OFFLINE,FORCE
The loss of the volume caused I/O errors and transaction abends, producing messages on the MVS system log such as these:
DFHFC0157 ADSWA04B 030
TT1P 3326 CICSUSER An I/O error has occurred on base data set
RLSADSW.VF04D.TELLCTRL accessed via file F04DCTRL component code
X'00'.
DFHFC0158 ADSWA04B 031
96329,13154096,0005EDC00000,D,9S4186,A04B ,CICS
,4186,DA,F04DCTRL,86- OP,UNKNOWN COND. ,000000A5000403,VSAM
DFHFC0157 ADSWA03C 301
DE1M 0584 CICSUSER An I/O error has occurred on base data set
RLSADSW.VF04D.DATAENDB accessed via file F04DENDB component code
X'00'.
DFHFC0158 ADSWA03C 031
...
As a result of the transaction abends, CICS attempted to back out in-flight UOWs. The backouts failed because CICS could not access the data sets on the lost volume. CICS reported the associated backout failures as follows:
+DFHFC4701 ADSWA03A 336
11/24/96 13:15:48 ADSWA03A Backout failed for transaction DE1H, VSAM
file F04DENDB, unit of work X'ADD18C07DCB70A05', task 46752, base
RLSADSW.VF04D.DATAENDB, path RLSADSW.VF04D.DATAENDB, failure code
X'24'.
+DFHFC0152 ADSWA03A 339
11/24/96 13:15:49 ADSWA03A ???? DE1H An attempt to retain locks for
data set within unit of work X'ADD18C07DCB70A05' failed. VSAM return
code X'00000008' reason code X'000000A9'.
+DFHME0116 ADSWA03A 340
(Module: DFHMEME) CICS symptom string for message DFHFC0152 is
PIDS/565501800 LVLS/510 MS/DFHFC0152 RIDS/DFHFCCA PTFS/UN92873
REGS/GR15 VALU/00000008 PCSS/IDARETLK PRCS/000000A9
+DFHFC0312 ADSWA03A Message DFHFC0152 data set RLSADSW.VF04D.DATAENDB
We used the CEMT command INQUIRE UOWDSNFAIL IOERROR to display the UOWs that were shunted as a result of the I/O errors. For example, on CICS region ADSWA01D the command showed the following shunted UOWs:
INQUIRE UOWDSNFAIL IOERROR
STATUS: RESULTS
Dsn(RLSADSW.VF04D.TELLCTRL ) Dat Ioe
Uow(ADD18C2DA4D5FC03) Rls
Dsn(RLSADSW.VF04D.DATAENDB ) Dat Ioe
Uow(ADD18C2E693C7401) Rls
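When many regions are involved, you might collect this display from each region and parse it to see which UOWs are shunted against which data sets. The following Python sketch assumes the two-line layout shown above: a Dsn(...) line followed by its Uow(...) line. The function and field names are hypothetical. The two abbreviations after the data set name are kept verbatim and are assumed to be the failure cause and reason.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class ShuntedUow(NamedTuple):
    dsn: str
    cause: str   # first abbreviation after Dsn(...), for example "Dat"
    reason: str  # second abbreviation, for example "Ioe"
    uow: str


DSN_LINE = re.compile(r"Dsn\(\s*(\S+)\s*\)\s+(\w+)\s+(\w+)")
UOW_LINE = re.compile(r"Uow\((\w+)\)")


def parse_uowdsnfail(text: str) -> List[ShuntedUow]:
    """Pair each Dsn(...) line of the CEMT display with the Uow(...) line after it."""
    results: List[ShuntedUow] = []
    pending: Optional[Tuple[str, str, str]] = None
    for line in text.splitlines():
        dsn_match = DSN_LINE.search(line)
        if dsn_match:
            pending = dsn_match.groups()
            continue
        uow_match = UOW_LINE.search(line)
        if uow_match and pending is not None:
            results.append(ShuntedUow(*pending, uow_match.group(1)))
            pending = None
    return results
```

Applied to the display above, the function returns two entries. The first pairs RLSADSW.VF04D.TELLCTRL with UOW ADD18C2DA4D5FC03. The second pairs RLSADSW.VF04D.DATAENDB with UOW ADD18C2E693C7401.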
The normal way of closing RLS-mode files across a sysplex is to quiesce the data set using the CEMT command SET DSNAME QUIESCED in one CICS region. However, the quiesce operation requires access to the data set, and fails if the data set cannot be accessed. The alternative is to issue the SET FILE(F04DENDB) CLOSED and SET FILE(F04DCTRL) CLOSED commands. We did this using CICSPlex® SM to send the commands to all the relevant regions. (Without CICSPlex SM, issue the CEMT SET FILE CLOSED command to each CICS region individually, either from the MVS console or from a CICS terminal.)
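We then deleted the catalog entries for the two failed data sets. The NOSCRATCH parameter of the IDCAMS DELETE command removes the catalog entries without scratching the data sets' VTOC entries on the lost volume:
DELETE RLSADSW.VF04D.TELLCTRL NOSCRATCH
DELETE RLSADSW.VF04D.DATAENDB NOSCRATCH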
The impact of the recovery process is greater if there are in-flight tasks updating RLS-mode files. For this reason, it is recommended that at this point you quiesce the data sets being accessed in RLS mode on other volumes before terminating the SMSVSAM servers. To determine which data sets a CICS region is accessing in RLS mode, use the SHCDS LISTSUBSYSDS subcommand. For example, the following command lists the data sets that CICS region ADSWA01D is accessing in RLS mode:
SHCDS LISTSUBSYSDS('ADSWA01D')
For the purpose of this example, we did not quiesce data sets; hence there is no sample output to show.
We then terminated the SMSVSAM servers on all MVS images:
ROUTE *ALL,VARY SMS,SMSVSAM,TERMINATESERVER
We received message IGW572 on each MVS image confirming that the servers are terminating:
IGW572I REQUEST TO TERMINATE SMSVSAM
ADDRESS SPACE IS ACCEPTED:
SMSVSAM SERVER TERMINATION SCHEDULED.
In our example, terminating the servers caused abends of all in-flight tasks that were updating RLS-mode data sets. This, in turn, caused backout failures and shunted UOWs, which were reported by CICS messages. For example, the effect in CICS region ADSWA03C was shown by the following response to an INQUIRE UOWDSNFAIL command for data set RLSADSW.VF01D.BANKACCT:
INQUIRE UOWDSNFAIL DSN(RLSADSW.VF01D.BANKACCT)
STATUS: RESULTS
Dsn(RLSADSW.VF01D.BANKACCT ) Dat Ope
Uow(ADD19B8166268E02) Rls
Dsn(RLSADSW.VF01D.BANKACCT ) Rls Com
Uow(ADD19B9D93DE1200) Rls
After the SMSVSAM servers terminated, all RLS-mode files were automatically closed by CICS and further RLS access prevented.
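We then deleted the lock structure by issuing the following command from one MVS image:
VARY SMS,SMSVSAM,FORCEDELETELOCKSTRUCTURE
We followed the command with the response "FORCEDELETELOCKSTRUCTURESMSVSAMYES" to allow the lock structure deletion to continue.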
Successful deletion of the lock structure was indicated by the following message:
IGW527I SMSVSAM FORCE DELETE LOCK STRUCTURE PROCESSING IS NOW COMPLETE
We then restarted the SMSVSAM servers on all MVS images:
ROUTE *ALL,VARY SMS,SMSVSAM,ACTIVE
Initialization of the SMSVSAM servers resulted in the creation of a new lock structure, shown by the following message:
IGW453I SMSVSAM ADDRESS SPACE HAS SUCCESSFULLY
CONNECTED TO DFSMS LOCK STRUCTURE IGWLOCK00
STRUCTURE VERSION: ADD1A77F0420E001 SIZE: 35072K bytes
MAXIMUM USERS: 32 REQUESTED:32
LOCK TABLE ENTRIES: 2097152 REQUESTED: 2097152
RECORD TABLE ENTRIES: 129892 USED: 0
The SMSVSAM server reported that there were no longer any retained locks but that instead there were data sets in the "lost locks" condition:
IGW414I SMSVSAM SERVER ADDRESS SPACE IS NOW ACTIVE.
IGW321I No retained locks
IGW321I 45 spheres in Lost Locks
CICS was informed during dynamic RLS restart about the data sets for which it must perform lost locks recovery. In our example, CICS issued messages such as the following to tell us that lost locks recovery was needed on one or more data sets:
DFHFC0555 ADSWA04A One or more data sets are in lost locks status.
CICS will perform lost locks recovery.
If we had quiesced data sets before terminating the servers (see the recommendation before the TERMINATESERVER command), this is the point at which we would unquiesce those data sets before continuing with the recovery.
If there were many data sets in lost locks, lost locks recovery would take some time to complete. Error responses are returned on two kinds of request. The first is open requests issued by any CICS region that was not sharing the data set at the time the SMSVSAM servers were terminated. The second is RLS access requests issued by any new UOWs in CICS regions that were sharing the data set. Also, it may be necessary to explicitly open files that fail to open during lost locks recovery.
Each data set in a lost locks state is protected from new updates until all CICS regions have completed lost locks recovery for the data set. This means that all shunted UOWs must be resolved before the data set is available for new work. Assuming that all CICS regions are active, and there are no in-doubt UOWs, lost locks processing, for all data sets except the ones on the failed volume, should complete quickly.
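On CICS region ADSWA01D, INQUIRE UOWDSNFAIL then showed that the UOWs shunted for the two data sets on the lost volume were still outstanding, now shown as Ope rather than Ioe:
INQUIRE UOWDSNFAIL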
STATUS: RESULTS
Dsn(RLSADSW.VF04D.TELLCTRL ) Dat Ope
Uow(ADD18C2DA4D5FC03) Rls
Dsn(RLSADSW.VF04D.DATAENDB ) Dat Ope
Uow(ADD18C2E693C7401) Rls
The command INQUIRE DSN(RLSADSW.VF04D.DATAENDB) on the same region showed that the lost locks status for the data set was Recoverlocks. This meant that the data set had suffered lost locks and that CICS region ADSWA01D had recovery work to complete:
INQUIRE DSN(RLSADSW.VF04D.DATAENDB)
RESULT - OVERTYPE TO MODIFY
Dsname(RLSADSW.VF04D.DATAENDB)
Accessmethod(Vsam)
Action( )
Filecount(0001)
Validity(Valid)
Object(Base)
Recovstatus(Fwdrecovable)
Backuptype()
Frlog(00)
Availability( Available )
Lostlocks(Recoverlocks)
Retlocks(Retained)
Quiescestate()
Uowaction( )
Basedsname(RLSADSW.VF04D.DATAENDB)
Fwdrecovlsn(ADSW.CICSVR.F04DENDB)
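To track lost locks recovery across several regions, you might parse this display from each region and check the Lostlocks value. The following minimal Python sketch assumes one Keyword(value) pair per line, as above, and the function names are hypothetical. It treats Recoverlocks as meaning that the region still has recovery work to complete for the data set, as in this example. Any other value is treated as no outstanding work for that region.

```python
import re
from typing import Dict

ATTRIBUTE = re.compile(r"^\s*(\w+)\((.*)\)\s*$")


def parse_inquire_dsname(text: str) -> Dict[str, str]:
    """Map each Keyword(value) line of a CEMT INQUIRE DSNAME display to its value."""
    attributes: Dict[str, str] = {}
    for line in text.splitlines():
        match = ATTRIBUTE.match(line)
        if match:
            # Values such as "( Available )" are padded; empty ones become "".
            attributes[match.group(1)] = match.group(2).strip()
    return attributes


def region_has_lost_locks_work(attributes: Dict[str, str]) -> bool:
    # Recoverlocks: this region still has lost locks recovery to do for the
    # data set. Nolostlocks means the data set is no longer in lost locks.
    return attributes.get("Lostlocks") == "Recoverlocks"
```

Header lines such as INQUIRE DSN(...) and RESULT - OVERTYPE TO MODIFY do not match the pattern, so only the attribute lines are returned.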
All CICS regions are automatically notified when CICSVR processing for a data set is complete. CICSVR preserves the lost locks state for the recovered data set and CICS disallows all new update requests until all CICS regions have completed lost locks recovery. When all CICS regions have informed SMSVSAM that they have completed their lost locks recovery, the data set lost locks state changes to Nolostlocks.
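When recovery was complete, we re-enabled the recovered data sets for general access using the CEMT commands:
SET FILE(F04DENDB) ENABLED
SET FILE(F04DCTRL) ENABLED
We issued these commands to each CICS AOR that required access.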
If you follow the above example, but find that a CICS region still has a data set in lost locks, you can investigate the UOW failures on that particular CICS region using the CEMT commands INQUIRE UOWDSNFAIL and INQUIRE UOW. For in-doubt UOWs that have updated a data set that is in a lost locks condition, CICS waits for in-doubt resolution before allowing general access to the data set. In such a situation you can still release the locks immediately, using the SET DSNAME command, although in most cases you will lose data integrity. See Lost locks recovery for more information about resolving in-doubt UOWs following lost locks processing.
In this example, we simulated the recovery from the loss of a volume by performing a volume restore before the forward recovery process. Backout-failed UOWs were the result of the I/O errors that occurred when the volume failed.
Many of the steps in this second example are the same as those described under the Example of recovery using data set backup, and are listed here in summary form only.
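As in the first example, we simulated the loss of the volume by varying its device offline:
ROUTE *ALL,VARY 4186,OFFLINE,FORCE
Before restoring the volume, we quiesced it with the command:
VARY SMS,CFVOL(9S4186),QUIESCE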
In this example, for volume serial 9S4186, the command produced the message:
IGW462I DFSMS CF CACHE REQUEST TO QUIESCE VOLUME 9S4186 IS ACCEPTED
We confirmed that the volume was quiesced by issuing the MVS command:
DISPLAY SMS,CFVOL(9S4186)
which confirmed that the volume was quiesced with the message:
IGW531I DFSMS CF VOLUME STATUS
VOLUME = 9S4186
DFSMS VOLUME CF STATUS = CF_QUIESCED
VOLUME 9S4186 IS NOT BOUND TO ANY DFSMS CF CACHE STRUCTURE
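If you script this procedure, you might confirm that the volume is CF_QUIESCED before restoring it, as we did here by inspection. The following minimal Python sketch extracts the status value from display text laid out as above, and the function name is hypothetical. The second key it accepts covers the quoted "CF_ENABLED" form that appears in the IGW463I response later in this example.

```python
from typing import Optional

STATUS_KEYS = ("DFSMS VOLUME CF STATUS", "DFSMS CF VOLUME STATUS")


def cf_volume_status(display: str) -> Optional[str]:
    """Return the CF status value (for example CF_QUIESCED), or None if absent."""
    for line in display.splitlines():
        key, sep, value = line.partition("=")
        # Skip lines such as "VOLUME = 9S4186" whose key is not a status key.
        if sep and key.strip() in STATUS_KEYS:
            return value.strip().strip('"')
    return None
```

A script would restore the volume only when this returns CF_QUIESCED.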
We then simulated the volume restore by varying the device back online:
ROUTE *ALL,VARY 4186,ONLINE
Because the volume was quiesced, attempts to open files on this volume failed, with messages such as the following:
DFHFC0500 ADSWA02A RLS OPEN of file F04DENDB failed. VSAM has
returned code X'0008' in R15 and reason X'00C6'.
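An automation script could recognize this particular open failure. It is the expected result while the volume is CF-quiesced: RC=8, reason X'C6', which is 198 decimal. The following Python sketch assumes the DFHFC0500 message text shown above, possibly wrapped across lines as it is here. The names are hypothetical.

```python
import re
from typing import Optional

# An RLS open of a data set on a CFVOL-quiesced volume fails with
# RC=8 and ACBERFLG=198 (X'C6').
CFVOL_QUIESCED_RC = 8
CFVOL_QUIESCED_REASON = 0xC6

DFHFC0500 = re.compile(
    r"RLS OPEN of file (\S+) failed\.\s+VSAM has\s+returned code X'([0-9A-F]+)'"
    r"\s+in R15\s+and reason X'([0-9A-F]+)'"
)


def open_failed_on_quiesced_volume(message: str) -> Optional[str]:
    """Return the file name if the message reports the CFVOL-quiesced open failure."""
    match = DFHFC0500.search(message)
    if not match:
        return None
    rc, reason = int(match.group(2), 16), int(match.group(3), 16)
    if rc == CFVOL_QUIESCED_RC and reason == CFVOL_QUIESCED_REASON:
        return match.group(1)
    return None
```

For the message above, the function returns F04DENDB. Any other return code or reason returns None, so such failures are not mistaken for this expected one.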
The impact of the recovery process is greater if there are in-flight tasks updating RLS-mode files. To minimize the impact, it is recommended that at this point you quiesce all data sets that are being accessed in RLS mode.
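As in the first example, we then terminated the SMSVSAM servers, deleted the lock structure, and restarted the servers:
ROUTE *ALL,VARY SMS,SMSVSAM,TERMINATESERVER
VARY SMS,SMSVSAM,FORCEDELETELOCKSTRUCTURE
ROUTE *ALL,VARY SMS,SMSVSAM,ACTIVE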
CICS was informed during dynamic RLS restart about the data sets for which it must perform lost locks recovery. CICS issued messages such as the following to inform you that lost locks recovery was being performed on one or more data sets:
+DFHFC0555 ADSWA04A One or more data sets are in lost locks status.
CICS will perform lost locks recovery.
If we had quiesced data sets prior to terminating the servers, this is the point at which we would unquiesce those data sets before proceeding.
If there were many data sets in lost locks, lost locks recovery would take some time to complete. It may be necessary to explicitly open files that fail to open during lost locks recovery.
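We then quiesced the two data sets on the lost volume that were to be forward recovered, using the CEMT commands:
SET DSN(RLSADSW.VF04D.DATAENDB) QUIESCED
SET DSN(RLSADSW.VF04D.TELLCTRL) QUIESCED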
When the CICS regions had successfully quiesced the data sets, we re-enabled the volume:
VARY SMS,CFVOL(9S4186),ENABLE
The above command produced the following message:
IGW463I DFSMS CF CACHE REQUEST TO ENABLE
VOLUME 9S4186 IS COMPLETED.
DFSMS CF VOLUME STATUS = "CF_ENABLED"
All CICS regions were automatically notified when CICSVR processing for each data set was complete, and each data set was automatically unquiesced by CICSVR to allow the backout shunted UOWs to be retried.
After all backout shunted UOWs were successfully retried, the recovery was complete and we re-enabled the recovered data sets for general access on each CICS region using the CEMT commands:
SET FILE(F04DENDB) ENABLED
SET FILE(F04DCTRL) ENABLED
If a user catalog is lost, follow the procedures documented in DFSMS/MVS Managing Catalogs. Before making the user catalog available, run the SHCDS CFREPAIR command to reconstruct critical RLS information in the catalog. Note that before running SHCDS CFREPAIR, the restored user catalog must be import connected to the master catalog on all systems (see the "Recovering Shared Catalogs" topic in DFSMS/MVS Managing Catalogs).