Looking at the kernel domain storage areas

The type of information that you can gather from the kernel domain storage areas is as follows:

The first thing you need to do is to find out which tasks are associated with the error.

Finding which tasks are associated with the error

You can find out which tasks are associated with the error from the kernel task summary. This tells you which tasks were in the system when the dump was taken, whether or not they were running, and whether they were in error.

The task summary is in the form of a table, each line in the table representing a different task.

The left-hand column of the task summary shows the kernel task number, which is the number used by the kernel domain to identify the task. This is not the same as the normal CICS task number taken from field TCAKCTTA of the TCA.

Figure 1 shows an example of a kernel task summary with a task in error.

Figure 1. Kernel task summary showing a task in error
===KE: Kernel Domain KE_TASK Summary
KE_NUM KE_TASK  STATUS       TCA_ADDR TRAN_# TRANSID DS_TASK  KE_KTCB  ERROR
0001   06A14928 KTCB Step    00000000                00000000 06A37240
0002   06A145A8 KTCB QR      00000000                06B63000 06A3A090
0003   06A14228 KTCB RO      00000000                06B65000 06A39120
0004   06A23C80 KTCB FO      00000000                06B67000 06A381B0
0005   06A23900 Not Running  00000000                06B69080 06A39120
0006   06A23580 Not Running  06C76080 00022  CSNE    06B69180 06A3A090
0007   06A23200 Not Running  00000000                06B69280 06A3A090
0008   06A30C80 ***RUNNING***00051080 00005  DBRW    06B89880 06A3A090 *YES*
0009   06A30900 Not Running  00051680 00006  CSSY    06B89180 06A3A090
000A   06C3D400 Unused
000D   06C3D780 Unused
000E   06C3DB00 Unused
0011   06CDA080 Unused
0012   06CDA400 Unused
0013   06CDA780 Unused
0014   06CDAB00 Unused
0017   06B5E580 Unused
0018   06B5FC80 Not Running  06C75680 TCP    CSTP    06B89A80 06A3A090
001A   06CE7080 Unused

When you have located the task summary table in the formatted dump, look in the ERROR column. If you find a value of *YES* for a particular task, that task was in error at the time the dump was taken.

Note:
If the recovery routine that is invoked when the error occurs does not request a system dump, you will not see any tasks flagged in error. In such a case, the system dump is likely to have been requested by a program that is being executed lower down the linkage stack and that received an abnormal response following recovery. The program that received the error has gone from the stack, and so cannot be flagged. However, error data for the failing task was captured in the kernel domain error table (see Finding more information about the error). Error data is also captured in the error table even when no system dump is taken at all.

In Figure 1, you can see that kernel task number 0008 is shown to be in error.
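When a formatted dump contains many tasks, the ERROR column can be scanned mechanically. The following is a minimal sketch, not part of any CICS tooling. It assumes that each task row begins with a four-digit hexadecimal KE_NUM followed by the eight-digit KE_TASK address, as in Figure 1, and that a task in error has *YES* as the last item on its row. The function name tasks_in_error is hypothetical.

```python
import re

# Assumed row layout (taken from Figure 1): KE_NUM, KE_TASK, then the rest.
# The header row does not match because "KE_N" is not hexadecimal.
ROW = re.compile(r"^\s*([0-9A-F]{4})\s+([0-9A-F]{8})\s+(.*)$")


def tasks_in_error(summary_lines):
    """Return (KE_NUM, KE_TASK) pairs for rows flagged *YES* in the ERROR column."""
    flagged = []
    for line in summary_lines:
        match = ROW.match(line)
        if match and match.group(3).rstrip().endswith("*YES*"):
            flagged.append((match.group(1), match.group(2)))
    return flagged


# A few rows copied from Figure 1.
sample = [
    "KE_NUM KE_TASK  STATUS       TCA_ADDR TRAN_# TRANSID DS_TASK  KE_KTCB  ERROR",
    "0007   06A23200 Not Running  00000000                06B69280 06A3A090",
    "0008   06A30C80 ***RUNNING***00051080 00005  DBRW    06B89880 06A3A090 *YES*",
    "000A   06C3D400 Unused",
]

print(tasks_in_error(sample))  # [('0008', '06A30C80')]
```

The sketch keys on the end of the row rather than on column positions, because the STATUS and TCA_ADDR values can run together, as they do for task 0008 in Figure 1.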

Look next at the STATUS column. For each task you can see one of the following values:

You are almost certain to find that the task shown to be in error has a status of "***RUNNING***", as in the example in Figure 1. Such a task was running at the time the error was detected.

Tasks shown to be "Not Running" are less likely to be associated with the error, but it is possible that one of these could have been flagged with an error. If you find this to be so, the most likely explanation is that the task in error was attempting recovery when, for some reason, it was suspended.

Two of the columns in the kernel task summary are particularly important in solving problems that require the use of traces. They are the TRAN_# and KE_NUM columns. The TRAN_# column for a task can contain:

When you are working with trace output, you can use the number from the TRAN_# column to identify entries associated with a user task up to the point at which that task passes control to CICS. To identify the CICS processing associated with the user task, you need to use the entry in the KE_NUM column of the kernel task summary. This matches the KE_NUM shown in the full trace entries for the task, and enables you to distinguish the CICS processing associated with the task you are interested in from other CICS processing.

Finding more information about the error

More information about the failure is given in the summary information for the task in error, which follows the kernel task summary. It gives you a storage report for the task, including registers and PSWs, and any data addressed by the registers. The PSW is the program status word that the machine hardware uses to record the address of the current instruction being executed, the addressing mode, and other control information. An example of such a storage report is shown in Figure 2, in this case for a program check.

Look first in the dump for this header, which introduces the error report for the task:

==KE: KE DOMAIN ERROR TABLE

Next, you will see the kernel error number for the task. Error numbers are assigned consecutively by the kernel, starting from 00000001. You might, for example, see this:

=KE: ERROR NUMBER:  00000001

The error number tells you the number of program checks and system abends that have occurred for this run of CICS. Not all of them have necessarily resulted in a system dump.

Some kernel error data follows. If you want to find the format of this data (and, in most cases, you will not need to), see the DFHKERRD section of the CICS Data Areas. The next thing of interest is the kernel’s interpretation of what went wrong. This includes the error code, the error type, the name of the program that was running, and the offset within the program.

The error code gives you the system and user completion codes issued when the abend occurred.

The error type tells you whether the error was associated with, for example, a program check, a system abend, or an internal request for system recovery.
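The kernel error data is shown in hexadecimal with an EBCDIC interpretation on the right. If you need to check that interpretation by hand, the bytes can be decoded as follows. This is a minimal sketch: it assumes that code page 037 is a suitable EBCDIC code page for the dump, and it reads the byte positions from the first line of kernel error data in Figure 2, not from the DFHKERRD layout. The function name decode_ebcdic is hypothetical.

```python
def decode_ebcdic(hex_string):
    """Decode a run of hexadecimal digits as EBCDIC text (code page 037 assumed)."""
    return bytes.fromhex(hex_string).decode("cp037")


# Words 1 and 2 of the kernel error data in Figure 2.
print(decode_ebcdic("F0C3F461C1D2C5C1"))           # 0C4/AKEA

# Words 5 and 6 of the same line, padded with EBCDIC blanks (X'40').
print(decode_ebcdic("C4C6C8E3E2D74040").rstrip())  # DFHTSP
```

The decoded values agree with the ERROR CODE and program name that the kernel reports in Figure 2.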

Figure 2. Storage report for a task that has experienced a program check
==KE: KE DOMAIN ERROR TABLE
=KE: ERROR NUMBER:  00000001
 KERRD 0397B950 KERNEL ERROR DATA
    0000  F0C3F461 C1D2C5C1 018400C4 000022EE  C4C6C8E3 E2D74040 04D3FC70 054D0B78  *0C4/AKEA.D.D....DFHTSP  .L...(..*    0397B950
(Data for offset 0020 to 0100 follows)                                           ..........E..........DM.;*    0397B970
   ERROR CODE:  0C4/AKEA    ERROR TYPE:  PROGRAM_CHECK    TIMESTAMP:  A4D9433F7D330600
   DATE (GMT)    :  25/08/93        TIME (GMT)   :  09:04:49.484592
   DATE (LOCAL)  :  25/08/93        TIME (LOCAL) :  09:04:49.484592
   KE_NUM:  0007    KE_TASK:  03980AD0    TCA_ADDR:  0006A000    DS_TASK:  054D0B78
 ERROR HAPPENED IN PROGRAM DFHTSP   AT OFFSET 22EE
 ERROR HAPPENED UNDER THE CICS RB.
 CICS REGISTERS AND PSW FOLLOW.
   PSW:  078D1000 84D41F5E   INSTRUCTION LENGTH:  4  INTERRUPT CODE:  04  EXCEPTION ADDRESS:  00000000
   EXECUTION KEY AT PROGRAM CHECK/ABEND: 8
   SPACE AT PROGRAM CHECK/ABEND: BASESPACE
   REGISTERS 0-15
    0000  F2000518 03AE25CC 00011280 00000002  FFFFFFFF 04D44DB3 84D41DB6 04D42DB5  *2....................M(.DM...M..*    0397B9A0
    0020  04D43DB4 03B2F140 039936A0 00000001  0006A000 8003CBB0 84D43A54 04D43A3E  *.M....1 .R..............DM...M..*    0397B9C0
   DATA AT PSW: 84D41F5E    MODULE: DFHTSP     OFFSET: 000022EE
    0000  5CC4C6C8 E3E2D740 4084D3FC E4F0F3F3  F0C91707 1122C7C2 F6D44040 40401400  **DFHTSP  DL.U0330I....GB6M    ..*    04D3FC70
(Data for offset 0020 to 2300 follows)                                           .........................*    04D3FC90
   DATA AT REGISTERS
   REG 0   F2000518
 31-BIT DATA CANNOT BE ACCESSED **
 24-BIT DATA FOLLOWS:
   -0080  00000000 00000000 00000007 00000000  00000000 00000000 00000000 00FFCF20  *................................*    00000498
(Data for offset -0060 to 0100 follows)                                          .........................*    000004B8
(Similar data for CICS registers 1 to 15 follows)

Next, there is a report of where the system has recorded that the error occurred, and the circumstances of the failure. This is the general format of the information:

Error happened in program pppppppp at offset xxxxxxxx
Error happened ...

The program name (pppppppp) and offset (xxxxxxxx) are determined by searching through the CICS loader’s control blocks for a program that owned the abending instruction at the time of the abend. If this search does not find such a program, the following text appears in the report:

      PROGRAM QQQQQQQQ WAS IN CONTROL, BUT THE PSW WAS ELSEWHERE.

The program name reported (qqqqqqqq) is the program that owns the current kernel stack entry for the abending task. If this text appears, it might be possible to locate the failing program using the method described in Using the linkage stack to identify the failing module. The failing program name and offset are also displayed in the section of the report that immediately follows the register contents. The format of this information is:

        DATA AT PSW: AAAAAAAA   MODULE: PPPPPPPP   OFFSET: XXXXXXXX

If the failing program could not be located, the module name and offset are reported as unknown. The possible reasons for the program not being located are:

Note that the accuracy of the program name and offset reported in a formatted dump that was produced as the result of a program executing a wild branch cannot be guaranteed.
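The reported offset can be cross-checked against the PSW. The following is a minimal sketch using the values in Figure 2. It assumes that the address printed beside the first line of the DATA AT PSW block (04D3FC70) is the start of module DFHTSP, which is an inference from the figure. The function name psw_offset is hypothetical.

```python
def psw_offset(psw_address_word, module_start):
    """Return the offset of the PSW instruction address within a module.

    The high-order bit of the second PSW word is the addressing-mode bit,
    so it is masked off before the subtraction.
    """
    instruction_address = psw_address_word & 0x7FFFFFFF
    return instruction_address - module_start


# Values from Figure 2: PSW second word 84D41F5E, module data starts at 04D3FC70.
print(f"{psw_offset(0x84D41F5E, 0x04D3FC70):08X}")  # 000022EE
```

The result matches the OFFSET value shown on the DATA AT PSW line in Figure 2.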

After the kernel’s interpretation of the error, you will see one of these diagnostic messages:

Error happened under the CICS RB

This means that the error was detected either when CICS code was executing, or when an access method called by CICS was running (for example, VSAM or QSAM). The CICS RB is the CICS request block, an MVS control block that records the state of the CICS program.

Error did not happen under the CICS RB

This message can be issued in any of these circumstances:

After either of these messages, you next get some data that is likely to be related to the problem. The data you get depends on whether or not the error happened under the CICS RB.

The error data for the failing task

If the error happened under the CICS RB, the error data you get in the task storage report is based on values in the PSW and the CICS registers at the time the error was detected. Figure 2 shows the storage report for a task that failed when a program check was detected. It illustrates the error data supplied when an error happens under the CICS RB.

If the error did not happen under the CICS RB, for example when CICS was calling an MVS service, you get data based on two sets of registers and PSWs. The registers and PSW of the CICS RB at the time of the error constitute one set. The registers and PSW of the RB in which the error occurred constitute the other set. This data most probably relates to the execution of an SVC routine called by CICS. The error might, however, have occurred during an IRB interrupt or in an SRB. You can confirm whether this has happened by checking flags KERNEL_ERROR_IRB and KERNEL_ERROR_SRB_MODE.

The storage addressed by the registers and PSW

Any storage addressed by the registers and PSW is included in the error data for the failing task.

Note that only the values of the registers and PSW, not the storage they address, are guaranteed to be as they were at the time of the error. The storage that is shown is a snapshot taken at the time the internal system dump request was issued. Data might have changed because, for example, a program check has been caused by an incorrect address in a register, or short lifetime storage is addressed by a register.

Also, in general, where error data is given for a series of errors, the older the error, the less likely it is that the storage is as it was at the time of the failure. The most recent error has the highest error number; it might not be the first error shown in the output.

The registers might point to data in the CICS region. If the values they hold can represent 24-bit addresses, you see the data around those addresses. Similarly, if their values can represent 31-bit addresses, you get the data around those addresses.

The contents of a register might represent both a 24-bit address and a 31-bit address. In that case, you get both sets of addressed data. (Note that a register might contain a 24-bit address with a high-order bit set, making it look like a 31-bit address; or it might contain a genuine 31-bit address.)
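The two interpretations of a register value can be worked out by masking. The following is a minimal sketch using register 0 from Figure 2. The function name candidate_addresses is hypothetical, and the sketch only computes the addresses; it does not determine whether either one is present in the dump.

```python
def candidate_addresses(register_value):
    """Return the 24-bit and 31-bit addresses that a register value could represent."""
    return {
        "24-bit": register_value & 0x00FFFFFF,  # low-order 3 bytes
        "31-bit": register_value & 0x7FFFFFFF,  # everything except the high-order bit
    }


# REG 0 in Figure 2 contains F2000518.
for mode, address in candidate_addresses(0xF2000518).items():
    print(f"{mode}: {address:08X}")
# 24-bit: 00000518
# 31-bit: 72000518
```

In Figure 2 the 24-bit data for register 0 starts at offset -0080 with address 00000498, which is 00000518 minus X'80', consistent with the 24-bit interpretation above.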

If, for any reason, the register does not address any data, you see one of these messages:

24-bit data cannot be accessed
31-bit data cannot be accessed

This means that the addresses cannot be found in the system dump of the CICS region. Note that MVS keeps a record of how CICS uses storage, and any areas not used by CICS are considered to lie outside the CICS address space. Such areas are not dumped in an MVS SDUMP of the region.

It is also possible that the addresses were within the CICS region, but they were not included in the SDUMP. This is because MVS enables you to take SDUMPs selectively, for example "without LPA". If this were to happen without your knowledge, you might think you had an addressing error when, in fact, the address was a valid one.

The format of the PSW is described in the IBM Enterprise Systems Architecture/370 Principles of Operation. The information in the PSW can help you to find the details needed by the IBM® Support Center. You can find the address of the failing instruction, and hence its offset within the module, and also the abend type. You find the identity of the failing module itself by examining the kernel linkage stack, as described in Using the linkage stack to identify the failing module.
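As an illustration of reading the PSW, the following minimal sketch extracts a few fields from the PSW in Figure 2. The bit positions used (storage key in bits 8 to 11, problem-state bit 15, addressing-mode bit 32, instruction address in bits 33 to 63) are assumptions based on the ESA/370 PSW layout and should be checked against the Principles of Operation. The function name decode_psw is hypothetical.

```python
def decode_psw(word0, word1):
    """Extract selected fields from a 64-bit PSW given as two 32-bit words.

    Bit positions are assumed from the ESA/370 PSW layout, with bit 0 as
    the leftmost bit of word0.
    """
    return {
        "key": (word0 >> 20) & 0xF,                  # bits 8-11
        "problem_state": bool((word0 >> 16) & 0x1),  # bit 15
        "amode": 31 if (word1 >> 31) & 0x1 else 24,  # bit 32
        "instruction_address": word1 & 0x7FFFFFFF,   # bits 33-63
    }


# PSW from Figure 2: 078D1000 84D41F5E
fields = decode_psw(0x078D1000, 0x84D41F5E)
print(fields["key"])                           # 8
print(fields["amode"])                         # 31
print(f"{fields['instruction_address']:08X}")  # 04D41F5E
```

The key value agrees with the EXECUTION KEY AT PROGRAM CHECK/ABEND line in Figure 2.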

Related concepts
The dump code options you can specify
Related tasks
Using the linkage stack to identify the failing module
Setting up the dumping environment
Related references
The system dump table