The sample XRF overseer program

The CICS-supplied sample overseer is an assembler-language batch program that runs in its own address space. The source of the sample program is in four members of CICSTS31.CICS.SDFHSAMP:

The associated DSECTs are supplied in DFH$XRDS in the same library. An assembled version of the sample program is supplied in CICSTS31.CICS.SDFHAUTH.

The functions of the sample program

The program acts on five commands entered by the console operator. (Minimum abbreviations are shown like this: D.) The commands are as follows:

Display
to display the current status of all active-alternate pairs being monitored by the overseer program
Restart
to enable or disable the restart-in-place function of the overseer program
Snap
to take a snap dump of the sample program
End
to terminate the sample program
Open
to ask the overseer to try to open CICS® availability manager (CAVM) data sets that it has previously failed to open.

The full format of the operator command entered at the MVS™ console is:

MODIFY overseer-jobname,command-identifier

where "command-identifier" is Display, Restart, Snap, End, or Open, or an abbreviation of any of these. The Display and Restart commands control the two major functions of the sample overseer program, which are described below. The Open command is described under Opening CAVM data sets dynamically.
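The command handling described above can be sketched as follows. This is an illustrative Python model, not the assembler logic of the sample program. It assumes that any leading prefix of a command name is an acceptable abbreviation and that matching is case-insensitive; neither point is stated here. The function names and the job name OVERSEER are hypothetical.

```python
# Commands accepted by the sample overseer, as listed in the text.
COMMANDS = ("Display", "Restart", "Snap", "End", "Open")


def resolve_command(identifier):
    """Return the full command name for an identifier, or None if unknown.

    Assumption: any non-empty leading prefix of a command name is accepted,
    compared case-insensitively.
    """
    token = identifier.strip().upper()
    if not token:
        return None
    matches = [c for c in COMMANDS if c.upper().startswith(token)]
    return matches[0] if len(matches) == 1 else None


def parse_modify(text):
    """Split 'MODIFY overseer-jobname,command-identifier' into its parts."""
    verb, _, rest = text.strip().partition(" ")
    if verb.upper() != "MODIFY":
        raise ValueError("not a MODIFY command")
    jobname, sep, identifier = rest.strip().partition(",")
    if not sep or not jobname.strip():
        raise ValueError("expected MODIFY overseer-jobname,command-identifier")
    command = resolve_command(identifier)
    if command is None:
        raise ValueError("unknown command-identifier: " + identifier)
    return jobname.strip(), command


if __name__ == "__main__":
    # OVERSEER is a hypothetical job name used only for illustration.
    print(parse_modify("MODIFY OVERSEER,D"))
```

Because the five command names begin with different letters, a single-letter identifier is enough to select a command under the prefix assumption.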

The display function

When the operator enters the Display command at the MVS console, the sample overseer program issues a multiline write-to-operator (MLWTO) command showing the last known state of each of the active-alternate pairs that it is monitoring. The overseer retrieves this information from the control and message data sets, in which the CICS availability manager (CAVM) has been recording state and surveillance information. The display includes a title line and one line of status information for each active-alternate pair. The title line is as follows:

GEN-APP  ACT-JOB  ACT-APP  ACPU A-ST  BKP-JOB  BKP-APP  BCPU B-ST

Each line of status information provides the following:

An example of the status display is shown, for guidance purposes, in the CICS Operations and Utilities Guide.

Note:
An ‘X’ following any of these status values indicates that the associated job is currently executing. However, because JES services are used to discover the execution state of a job, only those jobs that are running on the same JES as the overseer program (or on the same JES shared spool) show the correct execution state. Any job that is not on the same JES shared spool appears not to be executing.

There are two additional items that may appear on the status display. These are:

These are displayed instead of status data when no data was extracted from the CAVM data sets. This happens when newly created data sets are used (CICS has not yet written any data to them) or when the overseer fails to open the data sets.

The restart-in-place function

The overseer program can restart failed CICS regions in place automatically, if they are in the same MVS image as the overseer. The alternatives to automatic restart are operator-initiated restart, automatic takeover to the alternate, and operator-initiated takeover.

Automatic restart in place of failed regions is most useful in the multi-MVS image MRO environment. Because related regions must operate in the same MVS image, a takeover of one region means that all related regions must also be taken over by their alternates. A region may not be important enough for you to want every failure to cause a takeover to the alternate MVS image. This could disrupt users who would not otherwise have been affected by the failure. Automatic restart in place of the failed region is therefore likely to be preferred to takeover in these circumstances.

If your system consists of one or more independent regions, with actives and alternates located in separate MVS images, you can:

If you are operating an MRO system in a single MVS image, the failure of an active region can be handled by a takeover by the alternate, without causing all the related regions to be taken over, because the new active region can continue communication with the other active regions. Takeover is therefore likely to be your preferred course of action.

Enabling and disabling restart in place

The restart-in-place function of the overseer program can be enabled and disabled using the Restart command. When you enter this command, restart processing is enabled or disabled for all generic applids that the overseer is monitoring. You can also specify that particular active-alternate pairs are not to be automatically restarted in place, regardless of whether restart processing is enabled or disabled. This is described in How to tell the overseer which actives and alternates to monitor.

The Restart command works like an ON/OFF switch. Restart in place is enabled when the sample program is initialized. The first time the Restart command is entered, restart in place is disabled. If you issue the command again, restart in place is enabled again, and so on. If a region fails while restart in place is disabled, no attempt is made to restart it, even if restart in place is subsequently enabled again.
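The switch behavior can be modeled with a small sketch. This is a hedged Python illustration of the rules in the preceding paragraph only; the class and method names are hypothetical, and the other conditions that the overseer checks before restarting a region are not modeled.

```python
class RestartSwitch:
    """Toy model of the Restart command's ON/OFF behavior."""

    def __init__(self):
        # Restart in place is enabled when the sample program is initialized.
        self.enabled = True
        # Regions that failed while the switch was off (hypothetical bookkeeping).
        self.skipped = set()

    def restart_command(self):
        """Each Restart command flips the switch; returns the new state."""
        self.enabled = not self.enabled
        return self.enabled

    def on_failure(self, region):
        """Decide, at the moment of failure, whether restart may be attempted."""
        if self.enabled:
            return True
        # The decision is final: enabling the switch later does not
        # cause this region to be restarted.
        self.skipped.add(region)
        return False


if __name__ == "__main__":
    switch = RestartSwitch()
    switch.restart_command()            # first command disables restart
    print(switch.on_failure("REGION1"))  # region name is illustrative
```

The decision is taken when the failure is detected, which is why re-enabling the switch has no effect on regions that failed while it was off.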

Rules that control restart in place

The sample overseer program concludes that a region has failed if both:

The overseer program can restart a failed active region in place, if all the following conditions are met:

When a failed active region is restarted in place, whether by the operator or by the overseer, the corresponding alternate region cannot continue to support the new active region, and must be restarted. The overseer program restarts the alternate region automatically in these circumstances, if restart processing is enabled for both the failing region and the overseer.

If you want to be able to restart regions in place in both MVS images in a two-image environment, an overseer program must execute in each image.

If the failed region was started originally as a started task, the overseer program restarts it as a started task, and if the failed region was started as a job, the overseer restarts it as a job. For more guidance information about how the sample overseer program restarts failed regions in place, refer to the CICS Operations and Utilities Guide.

Opening CAVM data sets dynamically

When the overseer program is initialized, it is possible that some CAVM data sets have not yet been formatted by a CICS system. The overseer program obtains an ‘open error’ return code on these data sets, and subsequent attempts to display details about the associated CICS systems receive the response ‘NO ACTIVE DATA AVAILABLE’.

This problem arises only if the overseer is initialized before all the CAVM data sets have been formatted. If it occurs, the operator can use the Open command (see Open) to retry the opening of those CAVM data sets on which the Open previously failed. The overseer retries an Open only if the previous attempt failed with the return code X'C'. (See DFHWOSM FUNC=OPEN macro.)
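The retry rule can be sketched as a simple filter. This Python fragment is an illustration only: the data set names are hypothetical, and the return codes other than X'C' are placeholders whose meanings are not taken from the DFHWOSM documentation.

```python
# The only return code from a previous Open that qualifies for a retry.
OPEN_RETRYABLE_RC = 0x0C


def data_sets_to_retry(last_open_rc):
    """Return the names of data sets whose last Open ended with X'C'.

    last_open_rc maps a data set name to the return code of its most
    recent Open attempt.
    """
    return [name for name, rc in last_open_rc.items()
            if rc == OPEN_RETRYABLE_RC]


if __name__ == "__main__":
    # Hypothetical data set names and placeholder return codes.
    history = {"PAIR1.CONTROL": 0x0C, "PAIR2.CONTROL": 0x00, "PAIR3.CONTROL": 0x10}
    print(data_sets_to_retry(history))
```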

The use of the Open command is indicated when:

How the sample overseer program interfaces with CICS

The overseer service is made up of a CICS overseer module (module name DFHWOS), which you cannot customize, and a CICS-supplied sample overseer program (module name DFH$AXRO), which you can customize or replace with your own overseer program. DFHWOS loads the overseer program. DFHWOS and the assembled version of DFH$AXRO are supplied in CICSTS31.CICS.SDFHAUTH.

The CICS overseer module DFHWOS provides a stable interface to the CAVM data sets and to those MVS-authorized services that the overseer program requires. The overseer program invokes those services by means of a CICS-supplied group of macros called the DFHWOSM macros, which are described in The DFHWOSM macros.

DFHWOS therefore invokes the sample program, and is subsequently invoked by the sample program whenever the sample issues a DFHWOSM macro. The DFHWOSM macros do not interact directly with either the active or the alternate CICS address spaces.

How to tell the overseer which actives and alternates to monitor

The sample overseer program is written to handle active-alternate pairs and "related system names". A related system name identifies those regions or systems that cannot be considered in isolation by the overseer. The most common example of this is an MRO environment, where the overseer needs to be able to identify related regions when deciding whether to restart a failed region in place. Those regions or systems that are identified with a common related system name must be executed in the same MVS image.

The maximum number of active-alternate pairs that the overseer can monitor is 50.

The sample program discovers which active-alternate pairs it is monitoring from a VSAM key-sequenced data set called DFHOSD, which contains a single entry for each active-alternate pair. You create this data set and initialize it with information about active-alternate pairs before you use the overseer for the first time. You also have to redefine the DFHOSD data set whenever you want to change the information that it holds. CICS provides a sample job stream that you can use to:

The sample overseer program reads the DFHOSD records in key sequence and builds a table of entries. Each active-alternate pair is known by its generic applid on this data set. Every entry on the data set contains the following information:

The data structure of the DFHOSD data set entries is provided in member DFH$XRDS of CICSTS31.CICS.SDFHSAMP.
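As a rough illustration of how a table might be built from DFHOSD records, the following Python sketch orders entries by generic applid, enforces the limit of 50 pairs, and groups entries by related system name. It is not the layout defined in DFH$XRDS: the two-field record shape, the assumption that the generic applid is the key, the choice to raise an error when the limit is exceeded, and all names and sample values are hypothetical.

```python
from collections import defaultdict

MAX_PAIRS = 50  # maximum number of active-alternate pairs, as stated in the text


def build_table(records):
    """Build the monitoring table from (generic_applid, related_name) tuples.

    Sorting by generic applid stands in for reading the data set in key
    sequence (assumption: the generic applid is the record key).
    """
    table = sorted(records, key=lambda record: record[0])
    if len(table) > MAX_PAIRS:
        # Assumption: how the sample program handles an excess is not
        # described here, so this sketch simply rejects the input.
        raise ValueError("more than %d active-alternate pairs" % MAX_PAIRS)
    return table


def group_by_related_name(table):
    """Map each related system name to the generic applids that share it."""
    groups = defaultdict(list)
    for generic_applid, related_name in table:
        groups[related_name].append(generic_applid)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical applids and related system names.
    table = build_table([("CICSB", "MRO1"), ("CICSA", "MRO1"), ("CICSC", "SOLO")])
    print(group_by_related_name(table))
```

Grouping by related system name reflects the point made earlier in this section: regions that share a related system name cannot be considered in isolation when deciding whether to restart in place.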

Related tasks
Customizing the sample XRF overseer program
Related reference
The DFHWOSM macros