Recovering after a CICS failure

CICS® initialization for an emergency restart after a CICS failure is the same as initialization for a warm restart, plus some additional processing. (See CICS warm restart for the warm restart details.)

Overview

The additional processing performed for an emergency restart is mainly related to the recovery of in-flight transactions. There are two aspects to the recovery operation:

  1. Recovering information from the system log
  2. Driving backout processing for in-flight units of work

Recovering information from the system log

During initialization, before CICS performs program list table post-initialization (PLTPI) processing, the recovery manager scans the system log backwards. CICS uses the information retrieved to restore the region to its state at the time of the abnormal termination.
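The idea behind a backward log scan can be illustrated with a small toy model. This is only a sketch and does not reflect the actual system log format or the recovery manager's implementation: the LogRecord type, the "update" and "commit" record kinds, and the find_in_flight function are all hypothetical names chosen for illustration. Reading from the newest record to the oldest, any UOW whose records are met before a completion record for it has been seen is treated as in-flight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical, simplified log record (not the real system log layout)."""
    uow_id: str
    kind: str  # assumed kinds: "update" or "commit"


def find_in_flight(log):
    """Scan the log from newest to oldest and return the in-flight UOW ids."""
    completed = set()
    in_flight = []
    for record in reversed(log):
        if record.kind == "commit":
            # A commit record means this UOW finished before the failure.
            completed.add(record.uow_id)
        elif record.uow_id not in completed and record.uow_id not in in_flight:
            # An update with no later commit: the UOW was still in flight.
            in_flight.append(record.uow_id)
    return in_flight


# Oldest record first, newest record last.
log = [
    LogRecord("UOW1", "update"),
    LogRecord("UOW2", "update"),
    LogRecord("UOW1", "commit"),
    LogRecord("UOW3", "update"),
    LogRecord("UOW2", "update"),
]
print(find_in_flight(log))  # ['UOW2', 'UOW3']
```

In this model UOW1 is excluded because its commit record is encountered before its update record when reading backwards, whereas UOW2 and UOW3 have no commit record and so are candidates for backout.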

For non-RLS data sets and other recoverable resources, any locks (ENQUEUES) that were held before the CICS failure are re-acquired during this initial phase.

For data sets accessed in RLS mode, the locks that were held by SMSVSAM for in-flight tasks are converted into retained locks at the point of abnormal termination.

Driving backout processing for in-flight units of work

When initialization is almost complete, and after the completion of PLTPI processing, the recovery manager starts backout processing for any UOWs that were in-flight at the time of the failure of the previous run. Starting recovery processing at the end of initialization means that it occurs concurrently with new work.

Concurrent processing of new work and backout

The backout of UOWs that occurs after an emergency restart is the same process as dynamic backout of a failed transaction. Backing out in-flight transactions continues after "control is given to CICS", which means that the process takes place concurrently with new work arriving in the region.

Any non-RLS locks associated with in-flight (and other failed) transactions are acquired as active locks for the tasks attached to perform the backouts. This means that, if any new transaction attempts to access non-RLS data that is locked by a backout task, it waits normally rather than receiving the LOCKED condition.

Retained RLS locks are held by SMSVSAM, and these do not change while backout is being performed. Any new transactions that attempt to access RLS resources locked by a backout task receive a LOCKED condition.
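The difference between the two cases above can be summarized as a small decision function. This is a toy model under stated assumptions, not a CICS or SMSVSAM interface: LockState, Outcome, and access_outcome are hypothetical names, and the model considers only the lock held on the resource that a new transaction tries to access.

```python
from enum import Enum


class LockState(Enum):
    """Hypothetical lock states for the resource being accessed."""
    ACTIVE = "active"      # non-RLS lock re-acquired by a backout task
    RETAINED = "retained"  # RLS lock held by SMSVSAM as a retained lock


class Outcome(Enum):
    PROCEED = "proceed"
    WAIT = "wait"
    LOCKED = "LOCKED condition"


def access_outcome(lock_state):
    """Return what a new transaction experiences for a given lock state."""
    if lock_state is None:
        return Outcome.PROCEED  # nothing holds the resource
    if lock_state is LockState.ACTIVE:
        return Outcome.WAIT     # waits normally until backout completes
    return Outcome.LOCKED       # retained lock: LOCKED condition is raised


print(access_outcome(LockState.ACTIVE).value)
print(access_outcome(LockState.RETAINED).value)
```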

For both RLS and non-RLS resources, the backout of in-flight transactions after an emergency restart is indistinguishable from dynamic transaction backout.

Effect of delayed recovery on PLTPI processing

Because recovery processing does not take place until PLTPI processing is complete, PLT programs may fail during an emergency restart if they attempt to access resources protected by retained locks. A PLT program that is not written to handle the LOCKED exception condition abends with abend code AEX8.

If successful completion of PLTPI processing is essential before your CICS applications are allowed to start, consider alternative methods of completing necessary PLT processing. You may have to allow emergency restart recovery processing to finish, and then complete the failed PLTPI processing when the locks have been released.
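One possible shape for such an alternative is to catch the LOCKED condition in each PLT step and record the step for later completion instead of abending. The following is a minimal sketch of that pattern only. It is not CICS code: LockedCondition, run_plt_steps, and the step names are hypothetical, and in a real PLT program the equivalent check would be made on the response from the resource access command rather than by catching an exception.

```python
class LockedCondition(Exception):
    """Stand-in for the LOCKED exception condition (hypothetical)."""


def run_plt_steps(steps):
    """Run each (name, callable) step and return the names deferred by LOCKED."""
    deferred = []
    for name, step in steps:
        try:
            step()
        except LockedCondition:
            # Do not fail the whole sequence; remember the step so that it
            # can be rerun after the retained locks have been released.
            deferred.append(name)
    return deferred


def load_rates():
    # Assumed to touch a record still protected by a retained lock.
    raise LockedCondition("record held by a retained lock")


def init_counters():
    # Assumed to touch only unlocked resources.
    pass


deferred = run_plt_steps([("load_rates", load_rates),
                          ("init_counters", init_counters)])
print(deferred)  # ['load_rates']
```

How the deferred steps are rerun once backout has finished is left open here, because it depends on how the installation chooses to schedule work after emergency restart recovery.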

Other backout processing

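The recovery manager also drives backout processing for any UOWs that were in a backout-failed state, and commit processing for any UOWs that were in a commit-failed state, at the time of the CICS failure.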

The recovery manager drives these backout and commit processes because the condition that caused them to fail may be resolved by the time CICS restarts. If the condition that caused a failure has not been resolved, the UOW remains in backout- or commit-failed state. See Backout-failed recovery and Commit-failed recovery for more information.
