The system-level predefined situations are described below in alphabetical order. You can access the description of a specific situation by selecting its name in the Contents tab under Situations in the OMEGAMON XE on z/OS section of the help.
Note: In OMEGAMON XE for OS/390 V120, the predefined situations were renamed to make the intent of each situation clear. Additional predefined situations were added to the product. The list below reflects those names. A cross-reference listing of the situations showing their old and new names can be found in Using OMEGAMON XE on z/OS, Appendix B.
Crypto_CKDS_80PCT_Full monitors the Cryptographic Key Dataset (CKDS) and issues a Critical alert when it reaches 80% or more of its maximum capacity.
The CKDS is a VSAM dataset used to store encryption and authorization keys. If the dataset is at 80% or more of its maximum capacity, a new dataset should be created using a new master key, and all keys contained in the dataset should be re-enciphered into the new dataset. The name of the current CKDS is shown in the CKDSname attribute. Refer to the ICSF Administration Guide for further details.
The formula is:
VALUE ICSF.Status EQ Active AND VALUE ICSF.CKDS_80Full EQ Yes
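The following Python sketch mirrors this formula to show the effect of the AND. The alert fires only when ICSF is active and the CKDS is flagged as 80% or more full. The dictionary keys and the sample Status value "Inactive" are illustrative assumptions, not documented OMEGAMON attribute values or APIs.

```python
from typing import Mapping


def ckds_80pct_full(icsf: Mapping[str, str]) -> bool:
    """Sketch of the Crypto_CKDS_80PCT_Full formula.

    `icsf` is a hypothetical mapping of ICSF attribute names to their
    current values; it is not an OMEGAMON API.
    """
    # Both predicates must hold: ICSF is active AND the CKDS is
    # flagged as 80% or more full. A missing attribute compares unequal.
    return icsf.get("Status") == "Active" and icsf.get("CKDS_80Full") == "Yes"


# The Status guard keeps the situation false while ICSF is not active.
print(ckds_80pct_full({"Status": "Active", "CKDS_80Full": "Yes"}))    # True
print(ckds_80pct_full({"Status": "Inactive", "CKDS_80Full": "Yes"}))  # False
```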
Crypto_CKDS_Access_Disabled monitors the Cryptographic Key Dataset (CKDS) and raises a Warning alert if access has been disabled.
The CKDS is a VSAM dataset used to store encryption and authorization keys. Access is normally disabled while a new master key or CKDS is being initialized. This interruption is temporary, and access is re-enabled after key management operations are complete.
The formula is:
VALUE ICSF.CKDSAccess EQ Disabled
Crypto_Internal_Error monitors for internal errors and issues a Critical alert if one is detected.
Contact IBM Support with the event attributes to report the error and to obtain assistance in correcting the problem. MonStatus = Overrun indicates that an internal queue overflow has been detected. SCEDisabled > 0 indicates that one or more service call exits have ABENDed and are no longer collecting performance data.
The formula is:
VALUE ICSF.MonStatus NE Enabled
Crypto_Invalid_Master_Key monitors for the existence of a valid master key and raises a Critical alert if none is detected.
A valid master key must be loaded into at least one of the cryptographic coprocessors. Use the ICSF ISPF dialog, TKE, or the Support Element to load the master key into each cryptographic coprocessor. Coprocessors shared by PR/SM logical partitions can hold a different master key for each LPAR, because each LPAR is associated with a separate domain index that isolates its cryptographic keys. For PCI coprocessors, the master key must be the same value as the symmetric-keys master key (SYM-MK).
The formula is:
VALUE ICSF.Status EQ Active AND VALUE ICSF.CCMKeyOK EQ No
Crypto_Invalid_PKA_Master_Keys
Crypto_Invalid_PKA_Master_Keys monitors for the existence of a valid Key Management Master Key (KMMK) and a valid Signature Master Key (SMK) and raises a Critical alert if either is invalid or missing.
If this situation is raised, ensure that the KMMK and SMK are loaded into each coprocessor. For PCI coprocessors, the SMK must be the same value as the asymmetric-keys master key (ASYM-MK). Use the KMMK and SMK attributes to validate the values of the verification hash patterns for these keys.
The formula is:
VALUE ICSF.PKAMKeys EQ Invalid
Crypto_No_Coprocessors monitors for cryptographic coprocessors and raises a Critical alert if none is online.
At least one cryptographic coprocessor must be online for cryptographic services to become available. Verify that at least one coprocessor has been configured for the z/OS system. Use the Support Element console to configure the coprocessors for use by systems.
The formula is:
VALUE ICSF.Status EQ Active AND VALUE ICSF.1_CC EQ No
Crypto_No_PCI_Coprocessors monitors for PCI cryptographic coprocessors and raises a Warning alert if none is detected.
Several Public Key Algorithm (PKA) service calls do not function without a PCI coprocessor. Because PCI coprocessors are optimized for PKA operations, PKA services run more slowly on CMOS coprocessors.
The formula is:
VALUE ICSF.1_PCI EQ No
Crypto_PCI_Unavailable monitors for PCI coprocessors and raises a Critical alert if one is detected but is not online or active.
PCI coprocessors are optimized for Public Key Algorithm (PKA) operations, so PKA services run more slowly on CMOS coprocessors. Also, several PKA services do not run without a PCI coprocessor.
The formula is:
VALUE ICSF.1_PCI EQ Yes AND VALUE ICSF.PCIStatus NE Active
Crypto_PKA_Services_Disabled monitors PKA service calls and raises a Warning alert if the service calls are disabled.
The services should be disabled only to update the PKA Key Management Master Key (KMMK) or Signature Master Key (SMK), or to manage the Public Key Dataset (PKDS). PKA service calls should be re-enabled after PKA management operations are complete.
The formula is:
VALUE ICSF.Status EQ Active AND VALUE ICSF.PKACall EQ Disabled
Crypto_PKDS_Read_Disabled monitors the status of the Public Key Dataset (PKDS) and issues a Warning alert if read operations have been disabled.
The PKDS is a VSAM dataset used to store Public Key Algorithm (PKA) keys used for encryption and authentication. Read operations may be temporarily disabled for management operations on the PKDS. Read access to the PKDS is restored following completion of management operations. The PKDSname attribute displays the name of the current PKDS.
The formula is:
VALUE ICSF.PKDSRead EQ Disabled
Crypto_PKDS_Write_Disabled monitors the status of the Public Key Dataset (PKDS) and issues a Warning alert if write operations have been disabled.
The PKDS is a VSAM dataset used to store Public Key Algorithm keys used for encryption and authentication. Write access to the dataset may be temporarily disabled to allow key management operations to occur. Write access should be enabled following completion of PKDS key management operations. The PKDSname attribute displays the name of the current PKDS.
The formula is:
VALUE ICSF.PKDSWrite EQ Disabled
Crypto_Service_Unavailable monitors the status of cryptographic services and raises a Critical alert if they are unavailable.
If this situation is raised, verify that the ICSF subsystem is running on this system. If the ICSF subsystem is active, ensure that cryptographic coprocessors are online and available to this system. Also verify that a valid master key has been loaded in each coprocessor configured for this system.
The formula is:
VALUE ICSF.CryptoSvcs EQ Inactive
OS390_Allocated_CSA_Crit monitors to determine whether the percentage of the Common Storage Area allocated is equal to or greater than 95% and issues a Critical alert if the condition is true.
A system crash can occur due to exhausted CSA. Use this situation to identify the address spaces using high amounts of CSA and stop or cancel nonessential address spaces with high usage.
The formula is:
IF VALUE Common_Storage.Area EQ CSA AND
VALUE Common_Storage.Allocation_Percent GE 95
OS390_Allocated_CSA_Warn monitors to determine whether the percentage of the Common Storage Area allocated is between 90% and 94.9% inclusive and issues a Warning alert if the condition is true. A system crash can occur due to exhausted CSA. Identify the address spaces using high amounts of CSA and stop or cancel nonessential address spaces with high usage.
The formula is:
IF VALUE Common_Storage.Area EQ CSA AND
VALUE Common_Storage.Allocation_Percent GE 90 AND
VALUE Common_Storage.Allocation_Percent LT 95
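The Crit and Warn formulas are written so that their ranges do not overlap: GE 95 for Critical, GE 90 AND LT 95 for Warning. The Python sketch below models both situations as one function to show that a given allocation percentage maps to at most one alert. The function name and its parameters are hypothetical stand-ins for Common_Storage.Area and Common_Storage.Allocation_Percent, not an OMEGAMON API.

```python
from typing import Optional

# Shipped thresholds from the two formulas (percent of CSA allocated).
CSA_CRIT = 95.0
CSA_WARN = 90.0


def csa_alert(area: str, allocation_percent: float) -> Optional[str]:
    """Return 'Critical', 'Warning', or None for one Common_Storage row.

    Sketch only: combines OS390_Allocated_CSA_Crit and
    OS390_Allocated_CSA_Warn.
    """
    if area != "CSA":
        return None  # both formulas require Common_Storage.Area EQ CSA
    if allocation_percent >= CSA_CRIT:
        return "Critical"  # GE 95
    if allocation_percent >= CSA_WARN:
        return "Warning"   # GE 90 AND LT 95
    return None


for pct in (89.9, 90.0, 94.9, 95.0):
    print(pct, csa_alert("CSA", pct))
```

Rows for ECSA return None here because they are covered by the separate OS390_ECSA_Allocation_Pct situations.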
OS390_AvgCPU_Pct_Crit monitors to determine the average percentage of time that all processors available in this z/OS system were busy dispatching work and issues a Critical alert if the average percent value is equal to or greater than 100. This condition may or may not be a matter of immediate concern. If it arises suddenly on a uniprocessor, it may indicate that a unit of work is in a loop. If it is a chronic condition, it may be that the system is kept busy with low priority work. However, if service classes are missing their goals, a capacity increase may be needed.
The formula is:
IF VALUE System_CPU_Utilization.Average_CPU_Percent GE 100
OS390_AvgCPU_Pct_Warn monitors to determine the average percent of time that all processors available in this system were busy dispatching work, and issues a Warning alert if the average percent value is at least 95% but less than 100%. This condition may or may not be a matter of immediate concern. If it arises suddenly on a uniprocessor, it may indicate that a unit of work is in a loop. If it is a chronic condition, it may be that the system is kept busy with low priority work. However, if service classes are missing their goals, a capacity increase may be needed.
The formula is:
IF VALUE System_CPU_Utilization.Average_CPU_Percent GE 95 AND
VALUE System_CPU_Utilization.Average_CPU_Percent LT 100
OS390_Cache_FastWrite_HitPt_Crit
OS390_Cache_FastWrite_HitPt_Crit monitors the percentage of successful I/O requests to write data to the cache and issues a Critical alert if the percentage is greater than 0% and less than or equal to 50%. If no service class is missing its goal, this situation's thresholds may need to be adjusted. If service class periods are missing goals due to delay from the indicated device, datasets may need to be moved so that the Fast Write Cache capacity is better matched to the workload.
The formula is:
IF VALUE DASD_MVS_Devices.Fast_Write_Hit_Percent GT 0
AND
VALUE DASD_MVS_Devices.Fast_Write_Hit_Percent LE 50
OS390_Cache_FastWrite_HitPt_Warn
OS390_Cache_FastWrite_HitPt_Warn monitors the percentage of successful I/O requests to write data to the cache and issues a Warning alert if the percentage is greater than 50% and less than or equal to 70%. If no service class is missing its goal, this situation's thresholds may need to be adjusted. If service class periods are missing goals due to delay from the indicated device, datasets may need to be moved so that the Fast Write Cache capacity is better matched to the workload.
The formula is:
IF VALUE DASD_MVS_Devices.Fast_Write_Hit_Percent LE 70
AND
VALUE DASD_MVS_Devices.Fast_Write_Hit_Percent GT 50
OS390_Cache_Read_HitPct_Crit monitors the percent of successful I/O requests to read data from the cache and issues a Critical alert if the percentage is greater than 0% and less than or equal to 50%. Determine whether data set placement should be adjusted (I/O tuning). If goals are being missed, tuning may be required. If no goals are being missed, the threshold may need to be adjusted.
The formula is:
IF VALUE DASD_MVS_Devices.Cache_Read_Hit_Percent GT 0
AND
VALUE DASD_MVS_Devices.Cache_Read_Hit_Percent LE 50
OS390_Cache_Read_HitPct_Warn monitors the percent of successful I/O requests to read data from the cache and issues a Warning alert if the percentage is greater than 50% and less than or equal to 70%. Determine whether data set placement should be adjusted (I/O tuning). If goals are being missed, tuning may be required. If no goals are being missed, the threshold may need to be adjusted.
The formula is:
IF VALUE DASD_MVS_Devices.Cache_Read_Hit_Percent LE 70
AND
VALUE DASD_MVS_Devices.Cache_Read_Hit_Percent GT 50
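Unlike the CSA situations, the cache hit situations treat a lower value as worse, and both formulas exclude a value of exactly 0 (GT 0); the formulas themselves do not state why. The Python sketch below, assuming a single hit percentage as input (the function name is hypothetical), shows how the Crit and Warn formulas for Cache_Read_Hit_Percent partition the range.

```python
from typing import Optional


def cache_read_alert(hit_percent: float) -> Optional[str]:
    """Classify DASD_MVS_Devices.Cache_Read_Hit_Percent (sketch only)."""
    if hit_percent <= 0:
        return None        # both formulas require GT 0, so 0 never alerts
    if hit_percent <= 50:
        return "Critical"  # GT 0 AND LE 50
    if hit_percent <= 70:
        return "Warning"   # GT 50 AND LE 70
    return None            # above 70%: no alert


for pct in (0, 25, 50, 50.5, 70, 71):
    print(pct, cache_read_alert(pct))
```

The same banding applies to the Fast Write and Cache Write hit percentage situations, which use the same thresholds.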
OS390_Cache_Write_HitPct_Crit monitors the percent of successful I/O requests to write temporary data to the cache and issues a Critical alert if the percentage is greater than 0 and less than or equal to 50%. Determine whether any address spaces that have this device number allocated are in service classes that are missing their goals. If goals are being missed, data set placement (I/O tuning) may be required.
The formula is:
IF VALUE DASD_MVS_Devices.Cache_Write_Hit_Percent GT 0
AND
VALUE DASD_MVS_Devices.Cache_Write_Hit_Percent LE 50
OS390_Cache_Write_HitPct_Warn monitors the percent of successful I/O requests to write temporary data to the cache and issues a Warning alert if the percentage is greater than 50% and less than or equal to 70%. Determine whether any address spaces that have this device number allocated are in service classes that are missing their goals. If goals are being missed, data set placement (I/O tuning) may be required.
The formula is:
IF VALUE DASD_MVS_Devices.Cache_Write_Hit_Percent LE 70
AND
VALUE DASD_MVS_Devices.Cache_Write_Hit_Percent GT 50
OS390_CentralAvailFrames_Crit monitors to determine when available frames of real storage are at or below the specified threshold and issues a Critical alert when the condition is true. This problem should correct itself in a short time by means of page stealing. However, if the problem occurs more often than once a day, there may be a performance problem in the paging subsystem, or an address space may be using an excessive number of pages.
The formula is:
IF VALUE Real_Storage.Storage_Type EQ CentralStorage AND
VALUE Real_Storage.Available_Frames LE 0
OS390_CentralAvailFrames_Warn monitors to determine when available frames of real storage are at or below the specified threshold and issues a Warning alert when the condition is true. This problem should correct itself in a short time by means of page stealing. However, if the problem occurs more often than once a day, there may be a performance problem in the paging subsystem, or an address space may be using an excessive number of pages.
The formula is:
IF VALUE Real_Storage.Storage_Type EQ CentralStorage AND
VALUE Real_Storage.Available_Frames LE 0
OS390_CentralOnlineFrames_Crit
OS390_CentralOnlineFrames_Crit monitors the central storage online frame count and issues a Critical alert when the count falls below the Critical threshold. This situation indicates that central (real) storage available to this system is less than the threshold. If this alert results from a deliberate reconfiguration action, reset this situation's threshold using the Situation editor. Otherwise, check for a possible hardware problem.
The formula is:
IF VALUE Real_Storage.Storage_Type EQ CentralStorage AND
VALUE Real_Storage.Online_Frames LT 0
OS390_CentralOnlineFrames_Warn
OS390_CentralOnlineFrames_Warn monitors the central storage online frame count and issues a Warning alert when the count falls below the Warning threshold. This situation indicates that central (real) storage available to this system is less than the threshold. If this alert results from a deliberate reconfiguration action, reset this situation's threshold using the Situation editor. Otherwise, check for a possible hardware problem.
The formula is:
IF VALUE Real_Storage.Storage_Type EQ CentralStorage AND
VALUE Real_Storage.Online_Frames LT 0
OS390_CentraltoExpandedStor_Crit
OS390_CentraltoExpandedStor_Crit monitors the page movement rate from Central Storage to Expanded Storage and issues a Critical alert when the threshold value is exceeded. This situation is shipped disabled by default and should be activated only when attempting to solve a problem where excessive page movement is likely to be the cause.
The formula is:
IF VALUE Real_Storage.Storage_Type EQ CentralStorage AND
VALUE Real_Storage.Pages_Written_to_Expanded GE 1000000000
OS390_CentraltoExpandedStor_Warn
OS390_CentraltoExpandedStor_Warn monitors the page movement rate from Central Storage to Expanded Storage and issues a Warning when the threshold value is exceeded. This situation is shipped disabled by default and should be activated only when attempting to solve a problem where excessive page movement is likely to be the cause.
The formula is:
IF VALUE Real_Storage.Storage_Type EQ CentralStorage AND
VALUE Real_Storage.Pages_Written_to_Expanded GE 1000000000
OS390_ChannelComplexBusy_Crit monitors channel path activity and issues a Critical alert when one or more channel paths, measured across all systems, are busier than the current Critical threshold. Identify the channels that are unusually active; if the threshold provided in this situation is too low, adjust it using the Situation editor. Note that acceptable busy levels for tape channels, ESCON channels, and FICON channels are typically much higher than for parallel DASD channels.
The formula is:
IF VALUE Channel_Paths.Complex_Percent GE 100
OS390_ChannelComplexBusy_Warn monitors channel path activity and issues a Warning alert when one or more channel paths, measured across all systems, are busier than the current Warning threshold. Identify the channels that are unusually active; if the threshold provided in this situation is too low, adjust it using the Situation editor. Note that acceptable busy levels for tape channels, ESCON channels, and FICON channels are typically much higher than for parallel DASD channels.
The formula is:
IF VALUE Channel_Paths.Complex_Percent GE 100
OS390_Channel_LPAR_Busy_Pct_Crit
OS390_Channel_LPAR_Busy_Pct_Crit monitors the activity of the channel paths and issues a Critical alert if one or more channel paths is busier than the threshold. Identify the particular channel or channels that are unusually active. Note that the acceptable busy levels for tape channels, ESCON channels, and FICON channels are much higher than typical levels for parallel DASD channels. If the threshold for this situation is too low, adjust it using the Situation editor.
The formula is:
IF VALUE Channel_Paths.LPAR_Percent GE 100
OS390_Channel_LPAR_Busy_Pct_Warn
OS390_Channel_LPAR_Busy_Pct_Warn monitors the activity of the channel paths and issues a Warning alert if one or more channel paths is busier than the threshold. Identify the particular channel or channels that are unusually active. Note that the acceptable busy levels for tape channels, ESCON channels, and FICON channels are much higher than typical levels for parallel DASD channels. If the threshold for this situation is too low, adjust it using the Situation editor.
The formula is:
IF VALUE Channel_Paths.LPAR_Percent GE 100
OS390_Channel_Path_Offline_Crit
OS390_Channel_Path_Offline_Crit monitors to determine whether a channel path is offline and issues a Critical alert when this condition is true. This may be a normal condition if the indicated channel path is dynamically managed. Check the configuration matrix for the current image and determine whether this channel path should be online. If so, attempt to VARY it online.
The formula is:
IF VALUE Channel_Paths.Online EQ N
OS390_Channel_Path_Offline_Warn
OS390_Channel_Path_Offline_Warn monitors to determine whether a channel path is offline and issues a Warning alert when this condition is true. This may be a normal condition if the indicated channel path is dynamically managed. Check the configuration matrix for the current image and determine whether this channel path should be online. If so, attempt to VARY it online.
The formula is:
IF VALUE Channel_Paths.Online EQ N
OS390_Common_PageDS_PctFull_Crit
OS390_Common_PageDS_PctFull_Crit monitors to determine whether the percentage of slots in use on the common page dataset is greater than or equal to 80% and issues a Critical alert if the condition is true. If the common page data set becomes full, a system crash is imminent. Determine which address spaces are using the largest number of common slots and terminate those that can be shut down at this time. If this situation occurs more frequently than once a month, a larger common page data set should be created and activated at the next IPL.
The formula is:
IF VALUE Page_Dataset_Activity.Dataset_Type EQ Common
AND
VALUE Page_Dataset_Activity.Percent_Full GE 80
OS390_Common_PageDS_PctFull_Warn
OS390_Common_PageDS_PctFull_Warn monitors to determine whether the percentage of slots in use on the common page dataset is greater than or equal to 60% and less than 80% and issues a Warning if the condition is true. If the common page data set becomes full, a system crash is imminent. Determine which address spaces are using the largest number of common slots and terminate those that can be shut down at this time. If this situation occurs more frequently than once a month, a larger common page data set should be created and activated at the next IPL.
The formula is:
IF VALUE Page_Dataset_Activity.Dataset_Type EQ Common
AND
VALUE Page_Dataset_Activity.Percent_Full GE 60 AND
VALUE Page_Dataset_Activity.Percent_Full LT 80
OS390_CSA_Growth_Crit monitors to determine whether the growth in use of the Common Storage Area is greater than or equal to 50 and issues a Critical alert if the condition is true. Identify the address spaces using high amounts of CSA and showing rapid growth in its use. Stop or cancel nonessential address spaces to avert a crash.
The formula is:
IF VALUE Common_Storage.Area EQ CSA AND
VALUE Common_Storage.Growth GE 50
OS390_CSA_Growth_Warn monitors to determine whether the growth in use of the Common Storage Area is between 35 and 49 inclusive and issues a Warning if the condition is true. Identify the address spaces using high amounts of CSA and showing rapid growth in its use. Stop or cancel nonessential address spaces to avert a crash.
The formula is:
IF VALUE Common_Storage.Area EQ CSA AND
VALUE Common_Storage.Growth GE 35 AND
VALUE Common_Storage.Growth LT 50
OS390_DASD_Busy_Percent_Crit monitors DASD device utilization and issues a Critical alert when the percentage of time a device is busy is greater than or equal to 100. This condition may represent a current or pending performance problem if any service class period is missing its goal because of I/O delay for this device. This threshold is set to 100% by default and should be set to a lower value only when pursuing a chronic DASD performance problem.
The formula is:
IF VALUE DASD_MVS_Devices.Percent_Busy GE 100
OS390_DASD_Busy_Percent_Warn monitors DASD device utilization and issues a Warning alert when the percentage of time a device is busy is greater than or equal to 100. This condition may represent a current or pending performance problem if any service class period is missing its goal because of I/O delay for this device. This threshold is set to 100% by default and should be set to a lower value only when pursuing a chronic DASD performance problem.
The formula is:
IF VALUE DASD_MVS_Devices.Percent_Busy GE 100
OS390_DASD_Dropped_Ready_Crit monitors the count of DASD devices that have dropped ready and issues a Critical alert if the count is greater than or equal to 5. This situation is rare; if it occurs, notify hardware service personnel.
The formula is:
IF VALUE DASD_MVS.Dropped_Ready GE 5
OS390_DASD_Dropped_Ready_Warn monitors the count of DASD devices that have dropped ready and issues a Warning alert if the count is greater than 0 but less than 5. This situation is rare; if it occurs, notify hardware service personnel.
The formula is:
IF VALUE DASD_MVS.Dropped_Ready GT 0 AND
VALUE DASD_MVS.Dropped_Ready LT 5
OS390_DASD_NoDynamicReconn_Critical
OS390_DASD_NoDynamicReconn_Critical monitors the count of DASD devices without dynamic path reconnect and issues a Critical alert if the count is greater than or equal to 5. Refer this problem to the appropriate personnel to determine whether the devices should be offloaded.
The formula is:
IF VALUE DASD_MVS.No_Dynamic_Path_Reconnect GE 5
OS390_DASD_NoDynamicReconn_Warn
OS390_DASD_NoDynamicReconn_Warn monitors the count of DASD devices without dynamic path reconnect and issues a Warning alert if the count is greater than 0 and less than 5. Refer this problem to the appropriate personnel to determine whether the devices should be offloaded.
The formula is:
IF VALUE DASD_MVS.No_Dynamic_Path_Reconnect GT 0 AND
VALUE DASD_MVS.No_Dynamic_Path_Reconnect LT 5
OS390_DASD_Not_Responding_Crit
OS390_DASD_Not_Responding_Crit monitors the count of DASD devices that are not responding and issues a Critical alert if the count is greater than or equal to 5. This situation is rare; if it occurs, notify hardware service personnel.
The formula is:
IF VALUE DASD_MVS.Not_Responding GE 5
OS390_DASD_Not_Responding_Warn
OS390_DASD_Not_Responding_Warn monitors the count of DASD devices that are not responding and issues a Warning alert if the count is greater than 0 but less than 5. This situation is rare; if it occurs, notify hardware service personnel.
The formula is:
IF VALUE DASD_MVS.Not_Responding GT 0 AND
VALUE DASD_MVS.Not_Responding LT 5
OS390_DASD_Response_Time_Crit monitors the response time for a DASD device and issues a Critical alert when the threshold value is exceeded. This situation is distributed as disabled by default and should be activated only when attempting to solve a problem where excessive DASD response time is likely to be the cause.
The formula is:
IF VALUE DASD_MVS_Devices.Response GE 1000000000
OS390_DASD_Response_Time_Warn monitors the response time for a DASD device and issues a Warning when the threshold value is exceeded. This situation is distributed as disabled by default and should be activated only when attempting to solve a problem where excessive DASD response time is likely to be the cause.
The formula is:
IF VALUE DASD_MVS_Devices.Response GE 1000000000
OS390_ECSA_Allocation_Pct_Crit
OS390_ECSA_Allocation_Pct_Crit monitors to determine whether the percentage of the Extended Common Storage Area allocated is greater than or equal to 95% and issues a Critical alert if the condition is true. Check the current size of ECSA (the second CSA subparameter in IEASYSxx). The value may need to be adjusted before the next IPL. Determine which address spaces are using excessive ECSA or causing it to grow rapidly.
The formula is:
IF VALUE Common_Storage.Area EQ ECSA AND
VALUE Common_Storage.Allocation_Percent GE 95
OS390_ECSA_Allocation_Pct_Warn
OS390_ECSA_Allocation_Pct_Warn monitors to determine whether the percentage of the Extended Common Storage Area allocated is between 90% and 94.9% inclusive and issues a Warning alert if the condition is true. Check the current size of ECSA (the second CSA subparameter in IEASYSxx). The value may need to be adjusted before the next IPL. Determine which address spaces are using excessive ECSA or causing it to grow rapidly.
The formula is:
IF VALUE Common_Storage.Area EQ ECSA AND
VALUE Common_Storage.Allocation_Percent GE 90 AND
VALUE Common_Storage.Allocation_Percent LT 95
OS390_ExpandedOnlineFrames_Crit
OS390_ExpandedOnlineFrames_Crit monitors to determine whether the number of expanded storage frames online is less than the current Critical threshold and, if so, issues a Critical alert. If there is no indication of a hardware problem, the threshold may need to be adjusted. Also determine whether storage was reconfigured before the current IPL; if so, the threshold may need to be adjusted as well.
The formula is:
IF VALUE Real_Storage.Storage_Type EQ ExpandedStorage
AND
VALUE Real_Storage.Online_Frames LT 0
OS390_ExpandedOnlineFrames_Warn
OS390_ExpandedOnlineFrames_Warn monitors to determine whether the number of expanded storage frames online is less than the current Warning threshold and, if so, issues a Warning alert. If there is no indication of a hardware problem, the threshold may need to be adjusted. Also determine whether storage was reconfigured before the current IPL; if so, the threshold may need to be adjusted as well.
The formula is:
IF VALUE Real_Storage.Storage_Type EQ ExpandedStorage
AND
VALUE Real_Storage.Online_Frames LT 0
OS390_ExpandedToCentralStor_Crit
OS390_ExpandedToCentralStor_Crit monitors the page movement rate from expanded storage to central storage and issues a Critical alert when the threshold value is exceeded. This situation is disabled by default and should be activated only when attempting to solve a problem where excessive page movement is likely to be a cause.
The formula is:
IF VALUE Real_Storage.Storage_Type EQ CentralStorage AND
VALUE Real_Storage.Pages_Read_From_Expanded GE 1000000000
OS390_ExpandedToCentralStor_Warn
OS390_ExpandedToCentralStor_Warn monitors the page movement rate from expanded storage to central storage and issues a Warning alert when the threshold value is exceeded. This situation is disabled by default and should be activated only when attempting to solve a problem where excessive page movement is likely to be a cause.
The formula is:
IF VALUE Real_Storage.Storage_Type EQ CentralStorage AND
VALUE Real_Storage.Pages_Read_From_Expanded GE 1000000000
OS390_GlobalEnqueueReserve_Crit
OS390_GlobalEnqueueReserve_Crit monitors to determine whether the maximum wait time or the current wait time of any enqueue is greater than 60 seconds and issues a Critical alert if the condition is true. Determine which job or user is holding the ENQ. If it is a batch job that can be cancelled and requeued, cancelling it breaks the deadlock. If it is a started task or online user, report the problem to the appropriate personnel.
The formula is:
IF VALUE Enqueues.Maximum_Wait_Time GT 60 OR
VALUE Enqueues.Wait_Time GT 60
OS390_GlobalEnqueueReserve_Warn
OS390_GlobalEnqueueReserve_Warn monitors to determine whether the maximum wait time or the current wait time of any enqueue is between 31 and 60 seconds inclusive and issues a Warning alert if the condition is true. Determine which job or user is holding the ENQ. If it is a batch job that can be cancelled and requeued, cancelling it breaks the deadlock. If it is a started task or online user, report the problem to the appropriate personnel.
The formula is:
IF (VALUE Enqueues.Maximum_Wait_Time GT 30 AND
VALUE Enqueues.Maximum_Wait_Time LE 60) OR
(VALUE Enqueues.Wait_Time GT 30 AND VALUE Enqueues.Wait_Time LE 60)
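As written, the Critical and Warning formulas test the two wait-time attributes independently, so both situations can be true for the same enqueue, for example when the maximum wait is 90 seconds and the current wait is 45 seconds. The Python sketch below makes this visible. The function and parameter names are hypothetical stand-ins for Enqueues.Maximum_Wait_Time and Enqueues.Wait_Time (in seconds), and the rule that Critical takes precedence in enqueue_alert is an assumption of the sketch, not documented product behavior.

```python
from typing import Optional


def enq_crit(max_wait: float, wait: float) -> bool:
    # OS390_GlobalEnqueueReserve_Crit: either wait time exceeds 60 seconds.
    return max_wait > 60 or wait > 60


def enq_warn(max_wait: float, wait: float) -> bool:
    # OS390_GlobalEnqueueReserve_Warn: either wait time is in (30, 60].
    return 30 < max_wait <= 60 or 30 < wait <= 60


def enqueue_alert(max_wait: float, wait: float) -> Optional[str]:
    # Assumption of this sketch: Critical wins when both formulas are true.
    if enq_crit(max_wait, wait):
        return "Critical"
    if enq_warn(max_wait, wait):
        return "Warning"
    return None


# Maximum wait 90 s, current wait 45 s: both formulas are true.
print(enq_crit(90, 45), enq_warn(90, 45))  # True True
print(enqueue_alert(90, 45))               # Critical
print(enqueue_alert(10, 45))               # Warning
print(enqueue_alert(10, 20))               # None
```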
OS390_GRS_Broken_Crit monitors to determine whether the Global Resource Serialization (GRS) complex is broken and issues a Critical alert if it is. If the GRS complex is broken, it may be necessary to attempt to restart GRS from the console. You can display the status of the channel-to-channel adaptors on each system by entering the command D GRS.
The formula is:
IF VALUE Operator_Alerts.GRS_Status EQ Broken
OS390_GRS_Broken_Warn monitors to determine whether the Global Resource Serialization (GRS) complex is broken and issues a Warning if it is. If the GRS complex is broken, it may be necessary to attempt to restart GRS from the console. You can display the status of the channel-to-channel adaptors on each system by entering the command D GRS.
The formula is:
IF VALUE Operator_Alerts.GRS_Status EQ Broken
OS390_GTF_Active_Crit monitors to determine whether the Generalized Trace Facility is active and issues a Critical alert if the condition is true. While the Generalized Trace Facility is a useful diagnostic tool, it can cause performance degradation. Ensure that GTF is active for the minimum time required to obtain the needed data.
The formula is:
IF VALUE Operator_Alerts.GTF_Active EQ True
OS390_GTF_Active_Warn monitors to determine whether the Generalized Trace Facility is active and issues a Warning if the condition is true. While the Generalized Trace Facility is a useful diagnostic tool, it can cause performance degradation. Ensure that GTF is active for the minimum time required to obtain the needed data.
The formula is:
IF VALUE Operator_Alerts.GTF_Active EQ True
OS390_HSM_RecallWait_Crit monitors to determine whether the wait time in seconds of the longest single HSM recall that is waiting is greater than or equal to 1200 seconds and issues a Critical alert if the condition is true. Make sure that there is no outstanding tape mount for an HSM tape. In some cases, a wait can occur when a Migration Level 1 volume is tied up by a RESERVE or other conflicting activity such as a volume backup.
The formula is:
IF VALUE Operator_Alerts.HSM_Recall_Wait_Time GE 1200
OS390_HSM_RecallWait_Warn monitors to determine whether the wait time in seconds of the longest single HSM recall that is waiting is between 600 and 1199 seconds inclusive and issues a Warning if the condition is true. Make sure that there is no outstanding tape mount for an HSM tape. In some cases, a wait can occur when a Migration Level 1 volume is tied up by a RESERVE or other conflicting activity such as a volume backup.
The formula is:
IF VALUE Operator_Alerts.HSM_Recall_Wait_Time GE 600 AND
VALUE Operator_Alerts.HSM_Recall_Wait_Time LT 1200
OS390_Indexed_VTOC_Lost_Crit monitors the count of DASD devices whose indexed VTOC has been lost and issues a Critical alert if the count is greater than or equal to 5. Refer this problem to an appropriate storage management specialist.
The formula is:
IF VALUE DASD_MVS.Indexed_VTOC_Lost GE 5
OS390_Indexed_VTOC_Lost_Warn monitors the count of devices that have lost their indexed VTOC and issues a Warning if the count is greater than 0 but less than 5. Refer this problem to an appropriate storage management specialist.
The formula is:
IF VALUE DASD_MVS.Indexed_VTOC_Lost GT 0 AND
VALUE DASD_MVS.Indexed_VTOC_Lost LT 5
OS390_Local_PageDS_Errors_Crit
OS390_Local_PageDS_Errors_Crit monitors to determine whether the number of errors in a local page dataset is greater than or equal to 5 and issues a Critical alert if the condition is true. Identify the failing dataset or datasets. Check for a spare page dataset slot, and if there is no spare, increase the PAGTOTL parameter in IEASYSxx. There should be at least one spare slot per two page datasets. Remove the failing dataset from the PAGE parameter in IEASYSxx. If there is a spare slot, PAGEADD a dataset and use PAGEDEL REPLACE to move the pages to a good dataset.
The formula is:
IF VALUE Page_Dataset_Activity.Errors GE 5
OS390_Local_PageDS_Errors_Warn
OS390_Local_PageDS_Errors_Warn monitors to determine whether the number of errors in a local page dataset is greater than or equal to 1 and less than 5 and issues a Warning if the condition is true. Identify the failing dataset or datasets. Check for a spare page dataset slot, and if there is no spare, increase the PAGTOTL parameter in IEASYSxx. There should be at least one spare slot per two page datasets. Remove the failing dataset from the PAGE parameter in IEASYSxx. If there is a spare slot, PAGEADD a dataset and use PAGEDEL REPLACE to move the pages to a good dataset.
The formula is:
IF VALUE Page_Dataset_Activity.Errors GE 1 and
VALUE Page_Dataset_Activity.Errors LT 5
OS390_Local_PageDS_PctFull_Crit
OS390_Local_PageDS_PctFull_Crit monitors to determine whether the percentage of slots in use on a local page dataset is greater than or equal to 35% and issues a Critical alert if the condition is true. When usage approaches 30%, paging efficiency begins to decline, and blocked paging disappears at about 35% occupancy. When this situation occurs, use PAGEADD to add another local page dataset. If the current PAGTOTL setting in IEASYSxx does not allow another dataset to be added, increase it before the next IPL.
The formula is:
IF VALUE Page_Dataset_Activity.Dataset_Type EQ Local and
VALUE Page_Dataset_Activity.Percent_Full GE 35
OS390_Local_PageDS_PctFull_Warn
OS390_Local_PageDS_PctFull_Warn monitors to determine whether a local page dataset is greater than or equal to 25% full and less than 35% full and issues a Warning if the condition is true. When usage approaches 30%, paging efficiency begins to decline, and blocked paging disappears at about 35% occupancy. If this situation occurs, prepare to PAGEADD another dataset if the critical threshold is passed. If the current PAGTOTL setting in IEASYSxx does not allow another dataset to be added, it should be increased before the next IPL.
The formula is:
IF VALUE Page_Dataset_Activity.Dataset_Type EQ Local AND
VALUE Page_Dataset_Activity.Percent_Full GE 25 AND
VALUE Page_Dataset_Activity.Percent_Full LT 35
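As a rough illustration, the Python sketch below applies both local page dataset formulas to a few rows. It is not OMEGAMON code: the class, function, dataset names, and the "Common" dataset type value are hypothetical, chosen only to show that rows whose Dataset_Type is not Local never trigger either situation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageDataset:
    name: str            # hypothetical dataset name, for illustration only
    dataset_type: str    # mirrors Page_Dataset_Activity.Dataset_Type
    percent_full: float  # mirrors Page_Dataset_Activity.Percent_Full


def local_page_ds_severity(ds: PageDataset) -> Optional[str]:
    """Apply the PctFull_Crit and PctFull_Warn formulas to one dataset row."""
    if ds.dataset_type != "Local":
        return None  # both formulas require Dataset_Type EQ Local
    if ds.percent_full >= 35:
        return "Critical"
    if 25 <= ds.percent_full < 35:
        return "Warning"
    return None


if __name__ == "__main__":
    rows = [
        PageDataset("SYS1.LOCAL.PAGE1", "Local", 18.0),
        PageDataset("SYS1.LOCAL.PAGE2", "Local", 31.5),
        PageDataset("SYS1.LOCAL.PAGE3", "Local", 36.0),
        PageDataset("SYS1.COMMON.PAGE", "Common", 60.0),
    ]
    for row in rows:
        print(row.name, local_page_ds_severity(row))
```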
OS390_LPAR_OverheadPercent_Crit
OS390_LPAR_OverheadPercent_Crit monitors to determine whether the percentage of time the system spends managing a logical partition is greater than or equal to 20% and issues a Critical alert if the condition is true. Possible causes include saturation of the CPU capacity leading to excessive overhead switching CPUs between LPARs. This can be compounded when there are too many logical processors assigned to an LPAR.
The formula is:
IF VALUE System_CPU_Utilization.Partition_Overhead% GE 20
OS390_LPAR_OverheadPercent_Warn
OS390_LPAR_OverheadPercent_Warn monitors to determine whether the percentage of time the system spends managing a logical partition is greater than or equal to 10% and less than 20% and issues a Warning if the condition is true. Possible causes include saturation of the CPU capacity leading to excessive overhead switching CPUs between LPARs. This can be compounded when there are too many logical processors assigned to an LPAR.
The formula is:
IF VALUE System_CPU_Utilization.Partition_Overhead% GE 10 AND
VALUE System_CPU_Utilization.Partition_Overhead% LT 20
OS390_LPAR_STATUS_Crit monitors to determine whether LPAR CPU Management Overhead has risen above its threshold or the Velocity Index has fallen below its threshold and, if so, issues a Critical alert. These conditions may or may not be of immediate concern. In the case of LPAR CPU Management Overhead, a large number of configured LPARs may trigger this situation; if the condition persists, consider reducing the number of configured LPARs. In the case of the Velocity Index, you may want to adjust LPAR weights if the LPARs' workloads are not meeting expected service levels.
The formula is:
IF (VALUE LPAR_Clusters.LPAR_Name NE _CLTotal AND
VALUE LPAR_Clusters.LPAR_Name NE _CPTotal AND
VALUE LPAR_Clusters.LPAR_Name NE PHYSICAL AND
VALUE LPAR_Clusters.LPAR_Effective_Weight_Index LT 0.9) OR
(VALUE LPAR_Clusters.LPAR_Name NE _CLTotal AND
VALUE LPAR_Clusters.LPAR_Name NE _CPTotal AND
VALUE LPAR_Clusters.LPAR_Name NE PHYSICAL AND
VALUE LPAR_Clusters.Host_LPAR_Flag EQ Y AND
VALUE LPAR_Clusters.CPC_CPU_Overhead GT 15.0)
OS390_LPAR_STATUS_Warn monitors to determine whether LPAR CPU Management Overhead has risen above its threshold or the Velocity Index has fallen below its threshold and, if so, issues a Warning. These conditions may or may not be of immediate concern. In the case of LPAR CPU Management Overhead, a large number of configured LPARs may trigger this situation; if the condition persists, consider reducing the number of configured LPARs. In the case of the Velocity Index, you may want to adjust LPAR weights if the LPARs' workloads are not meeting expected service levels.
The formula is:
IF (VALUE LPAR_Clusters.LPAR_Name NE _CLTotal AND
VALUE LPAR_Clusters.LPAR_Name NE _CPTotal AND
VALUE LPAR_Clusters.LPAR_Name NE PHYSICAL AND
VALUE LPAR_Clusters.LPAR_Effective_Weight_Index LT 1.0) OR
(VALUE LPAR_Clusters.LPAR_Name NE _CLTotal AND
VALUE LPAR_Clusters.LPAR_Name NE _CPTotal AND
VALUE LPAR_Clusters.LPAR_Name NE PHYSICAL AND
VALUE LPAR_Clusters.Host_LPAR_Flag EQ Y AND
VALUE LPAR_Clusters.CPC_CPU_Overhead GT 10.0)
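The two OS390_LPAR_STATUS formulas share one shape and differ only in their limits. The following Python sketch, which is not OMEGAMON code, evaluates that shape for one LPAR_Clusters row: summary rows (_CLTotal, _CPTotal, PHYSICAL) are skipped, and the row matches if its effective weight index is below the limit, or if Host_LPAR_Flag is Y and CPC CPU overhead is above the limit. The class, function, and LPAR names are hypothetical.

```python
from dataclasses import dataclass

# Row names that both formulas exclude explicitly.
EXCLUDED_NAMES = {"_CLTotal", "_CPTotal", "PHYSICAL"}


@dataclass
class LparRow:
    lpar_name: str                 # LPAR_Clusters.LPAR_Name
    effective_weight_index: float  # LPAR_Clusters.LPAR_Effective_Weight_Index
    host_lpar_flag: str            # LPAR_Clusters.Host_LPAR_Flag, compared with "Y"
    cpc_cpu_overhead: float        # LPAR_Clusters.CPC_CPU_Overhead


def lpar_status_fires(row: LparRow, weight_index_limit: float, overhead_limit: float) -> bool:
    """Evaluate one row against an OS390_LPAR_STATUS-style formula."""
    if row.lpar_name in EXCLUDED_NAMES:
        return False
    low_weight_index = row.effective_weight_index < weight_index_limit
    high_overhead = row.host_lpar_flag == "Y" and row.cpc_cpu_overhead > overhead_limit
    return low_weight_index or high_overhead


# Shipped limits: (weight index limit, overhead limit)
CRIT = (0.9, 15.0)
WARN = (1.0, 10.0)

if __name__ == "__main__":
    rows = [
        LparRow("PROD1", 0.85, "N", 0.0),
        LparRow("TEST1", 0.95, "N", 0.0),
        LparRow("HOST1", 1.20, "Y", 12.0),
        LparRow("PHYSICAL", 0.10, "Y", 50.0),
    ]
    for row in rows:
        print(row.lpar_name, lpar_status_fires(row, *CRIT), lpar_status_fires(row, *WARN))
```

Because LT 0.9 implies LT 1.0 and GT 15.0 implies GT 10.0, any row that satisfies the Critical formula as shipped also satisfies the Warning formula.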
OS390_MAX_ASIDs_in_Use_Crit monitors to determine whether the percentage of address space vector table (ASVT) slots that are in use or unavailable is greater than or equal to 90% and issues a Critical alert if the condition is true. Check the values of the MAXUSER, RSVNONR, and RSVSTART parameters, and look for any problems that could cause address space IDs to become unusable.
The formula is:
IF VALUE Operator_Alerts.ASVT_Slot_Utilization GE 90
OS390_MAX_ASIDs_in_Use_Warn monitors to determine whether the percentage of address space vector table (ASVT) slots that are in use or unavailable is greater than or equal to 80% and less than 90%, and issues a Warning if the condition is true. Check the values of the MAXUSER, RSVNONR, and RSVSTART parameters, and look for any problems that could cause address space IDs to become unusable.
The formula is:
IF VALUE Operator_Alerts.ASVT_Slot_Utilization GE 80 and
VALUE Operator_Alerts.ASVT_Slot_Utilization LT 90
OS390_Migration_Rate_Crit monitors to determine whether the storage type is Expanded Storage and the number of pages per second that are being moved from expanded to auxiliary storage is greater than or equal to 100, and issues a Critical alert if the condition is true. This alert may signify a problem if one or more service class periods are missing their goals because of excessive page-in waits. If there is no service problem visible, refer this condition to capacity planning to assess the need for an upgrade or repartitioning of storage, or adjust the threshold.
The formula is:
IF VALUE Real_Storage.Storage_Type EQ ExpandedStorage AND
VALUE Real_Storage.Migration_Rate GE 100
OS390_Migration_Rate_Warn monitors to determine whether the storage type is Expanded Storage and the number of pages per second that are being moved from expanded to auxiliary storage is greater than or equal to 50 and less than 100, and issues a Warning if the condition is true. This alert may signify a problem if one or more service class periods are missing their goals because of excessive page-in waits. If there is no service problem visible, refer this condition to capacity planning to assess the need for an upgrade or repartitioning of storage, or adjust the threshold.
The formula is:
IF VALUE Real_Storage.Storage_Type EQ ExpandedStorage AND
VALUE Real_Storage.Migration_Rate GE 50 AND
VALUE Real_Storage.Migration_Rate LT 100
OS390_Network_ResponseTime_Crit
OS390_Network_ResponseTime_Crit monitors the Network Response Time and when it equals or exceeds 10, issues a Critical alert. Appropriate personnel should be notified if the condition persists.
The formula is:
IF VALUE User_Response_Time.Network_Response GE 10
OS390_Network_ResponseTime_Warn
OS390_Network_ResponseTime_Warn monitors the Network Response Time and when it equals or exceeds 5 but is less than 10, issues a Warning. Appropriate personnel should be notified if the condition persists.
The formula is:
IF VALUE User_Response_Time.Network_Response GE 5 AND
VALUE User_Response_Time.Network_Response LT 10
OS390_OLTEP_Active_Crit monitors to determine whether OLTEP is active and issues a Critical alert if it is. Determine who is using OLTEP and minimize the time of its use.
The formula is:
IF VALUE Operator_Alerts.OLTEP_Active EQ True
OS390_OLTEP_Active_Warn monitors to determine whether OLTEP is active and issues a Warning alert if it is. Determine who is using OLTEP and minimize the time of its use.
The formula is:
IF VALUE Operator_Alerts.OLTEP_Active EQ True
OS390_Outstanding_WTORs_Crit monitors to determine whether the number of outstanding Write to Operator with Reply requests is greater than or equal to 12 and issues a Critical alert if the condition is true. Check the operator console for outstanding replies and address these. If all of the outstanding replies are correct and routine, you may want to adjust this situation's threshold.
The formula is:
IF VALUE Operator_Alerts.Outstanding_Operator_Replies GE 12
OS390_Outstanding_WTORs_Warn monitors to determine whether the number of outstanding Write to Operator with Reply requests is 10 or 11 and issues a Warning if the condition is true. Check the operator console for outstanding replies and address these. If all of the outstanding replies are correct and routine, you may want to adjust this situation's threshold.
The formula is:
IF VALUE Operator_Alerts.Outstanding_Operator_Replies GE 10 AND
VALUE Operator_Alerts.Outstanding_Operator_Replies LT 12
OS390_PageDSNotOperational_Crit
OS390_PageDSNotOperational_Crit monitors the number of page datasets that are not operational and issues a Critical alert if the number is greater than or equal to 5. Verify that paging devices are operational. If a device is not operational, attempt to VARY it online. If a page dataset was drained by a prior PAGEDEL DRAIN command, it may now be removed by a PAGEDEL DELETE command. If this alert occurs without warning, an IPL may be imminent. Prepare to shut down and request appropriate assistance.
The formula is:
IF VALUE System_Paging_Activity.Datasets_Not_Operational GE 5
OS390_PageDSNotOperational_Warn
OS390_PageDSNotOperational_Warn monitors the number of page datasets that are not operational and issues a Warning if the number is from 1 to 4 inclusive. Verify that paging devices are operational. If a device is not operational, attempt to VARY it online. If a page dataset was drained by a prior PAGEDEL DRAIN command, it may now be removed by a PAGEDEL DELETE command. If this alert occurs without warning, an IPL may be imminent. Prepare to shut down and request appropriate assistance.
The formula is:
IF VALUE System_Paging_Activity.Datasets_Not_Operational GT 0 AND
VALUE System_Paging_Activity.Datasets_Not_Operational LT 5
OS390_Physical_CPUs_Online_Crit
OS390_Physical_CPUs_Online_Crit monitors the number of online CPUs and issues a Critical alert when the number is less than the current threshold. The threshold is set to 0 by default, which effectively disables the situation because the CPU count can never be less than 0. Set a nonzero threshold only to diagnose chronic configuration problems.
The formula is:
IF VALUE System_CPU_Utilization.Physical_CPU_Count LT 0
OS390_Physical_CPUs_Online_Warn
OS390_Physical_CPUs_Online_Warn monitors the number of online CPUs and issues a Warning when the number is less than the current threshold. The threshold is set to 0 by default, which effectively disables the situation because the CPU count can never be less than 0. Set a nonzero threshold only to diagnose chronic configuration problems.
The formula is:
IF VALUE System_CPU_Utilization.Physical_CPU_Count LT 0
OS390_Real_Stor_Migrate_Age_Crit
OS390_Real_Stor_Migrate_Age_Crit monitors to determine whether the storage type is Expanded Storage and the time in seconds that has passed since the oldest frame of expanded storage was last referenced is less than or equal to 50 seconds, and issues a Critical alert if the condition is true. This alert may signify a problem if one or more service class periods are missing their goals because of excessive page-in waits. If there is no service problem visible, this condition should be referred to capacity planning to assess the need for an upgrade or repartitioning of storage, or the threshold should be adjusted.
The formula is:
IF VALUE Real_Storage.Storage_Type EQ ExpandedStorage AND
VALUE Real_Storage.Migration_Age LE 50
OS390_Real_Stor_Migrate_Age_Warn
OS390_Real_Stor_Migrate_Age_Warn monitors to determine whether the storage type is Expanded Storage and the time in seconds that has passed since the oldest frame of expanded storage was last referenced is greater than 50 seconds and less than or equal to 200 seconds, and issues a Warning if the condition is true. This alert may signify a problem if one or more service class periods are missing their goals because of excessive page-in waits. If there is no service problem visible, this condition should be referred to capacity planning to assess the need for an upgrade or repartitioning of storage, or the threshold should be adjusted.
The formula is:
IF VALUE Real_Storage.Storage_Type EQ ExpandedStorage AND
VALUE Real_Storage.Migration_Age LE 200 AND
VALUE Real_Storage.Migration_Age GT 50
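Unlike most situations in this list, the migration age situations alert on low values: a short migration age means expanded storage frames are being reused quickly. The Python sketch below, which is not OMEGAMON code and uses a hypothetical function name, restates both formulas to show the downward-running bands.

```python
from typing import Optional


def migrate_age_severity(storage_type: str, migration_age_seconds: float) -> Optional[str]:
    """Lower migration age means more storage pressure, so the bands run downward."""
    if storage_type != "ExpandedStorage":
        return None  # both formulas require Storage_Type EQ ExpandedStorage
    if migration_age_seconds <= 50:
        return "Critical"  # OS390_Real_Stor_Migrate_Age_Crit: LE 50
    if 50 < migration_age_seconds <= 200:
        return "Warning"   # OS390_Real_Stor_Migrate_Age_Warn: GT 50 AND LE 200
    return None


if __name__ == "__main__":
    for age in (30, 50, 51, 200, 201):
        print(age, migrate_age_severity("ExpandedStorage", age))
```

The same downward pattern applies to the OS390_Unref_Interval_Cnt and OS390_WTO_Buffers_Left situations, with their own thresholds.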
OS390_RMF_Not_Active_Crit monitors to determine whether the RMF monitor is inactive and issues a Critical alert if the condition is true. RMF data is essential to performance management and problem analysis. If you cannot restart RMF, notify appropriate personnel.
The formula is:
IF VALUE Operator_Alerts.RMF_Not_Active EQ True
OS390_RMF_Not_Active_Warn monitors to determine whether the RMF monitor is inactive and issues a Warning if the condition is true. RMF data is essential to performance management and problem analysis. If you cannot restart RMF, notify appropriate personnel.
The formula is:
IF VALUE Operator_Alerts.RMF_Not_Active EQ True
OS390_SMF_Not_Recording_Crit monitors to determine whether SMF has stopped recording information and issues a Critical alert if it has. SMF data has numerous uses, including resource accounting and capacity management. Check the SMF datasets and restart recording as soon as possible. If you cannot restart SMF recording, notify appropriate personnel.
The formula is:
IF VALUE Operator_Alerts.SMF_Not_Recording EQ True
OS390_SMF_Not_Recording_Warn monitors to determine whether SMF has stopped recording information and issues a Warning if it has. SMF data has numerous uses, including resource accounting and capacity management. Check the SMF datasets and restart recording as soon as possible. If you cannot restart SMF recording, notify appropriate personnel.
The formula is:
IF VALUE Operator_Alerts.SMF_Not_Recording EQ True
OS390_SYSLOG_Not_Recording_Crit
OS390_SYSLOG_Not_Recording_Crit monitors to determine whether the System Log has stopped recording information and issues a Critical alert if it has. Determine why logging has stopped. One possible cause is that JES spool space is exhausted.
The formula is:
IF VALUE Operator_Alerts.SYSLOG_Not_Recording EQ True
OS390_SYSLOG_Not_Recording_Warn
OS390_SYSLOG_Not_Recording_Warn monitors to determine whether the System Log has stopped recording information and issues a Warning if it has. Determine why logging has stopped. One possible cause is that JES spool space is exhausted.
The formula is:
IF VALUE Operator_Alerts.SYSLOG_Not_Recording EQ True
OS390_System_PageFault_Rate_Crit
OS390_System_PageFault_Rate_Crit monitors the system page fault rate and issues a Critical alert when the threshold is exceeded. This situation is shipped disabled by default.
The formula is:
IF VALUE System_Paging_Activity.Page_Fault_Rate GE 1000000000
OS390_System_PageFault_Rate_Warn
OS390_System_PageFault_Rate_Warn monitors the system page fault rate and issues a Warning when the threshold is exceeded. This situation is shipped disabled by default.
The formula is:
IF VALUE System_Paging_Activity.Page_Fault_Rate GE 1000000000
OS390_Tape_Dropped_Ready_Crit monitors the number of tape drives that have dropped out of ready state and issues a Critical alert if the number is greater than or equal to 5. Check the devices and attempt to make them ready. If this is not possible, report the condition to the appropriate personnel.
The formula is:
IF VALUE Tape_Drives.Dropped_Ready GE 5
OS390_Tape_Dropped_Ready_Warn monitors the number of tape drives that have dropped out of ready state and issues a Warning if the number is from 1 to 4 inclusive. Check the devices and attempt to make them ready. If this is not possible, report the condition to the appropriate personnel.
The formula is:
IF VALUE Tape_Drives.Dropped_Ready GT 0 AND
VALUE Tape_Drives.Dropped_Ready LT 5
OS390_Tape_Not_Responding_Crit
OS390_Tape_Not_Responding_Crit monitors the number of tape drives that are not responding and issues a Critical alert if the number is greater than or equal to 5. If the condition persists and the devices cannot be activated by VARYing them online, report the problem to the appropriate personnel.
The formula is:
IF VALUE Tape_Drives.Not_Responding GE 5
OS390_Tape_Not_Responding_Warn
OS390_Tape_Not_Responding_Warn monitors the number of tape drives that are not responding and issues a Warning if the number is from 1 to 4 inclusive. If the condition persists and the devices cannot be activated by VARYing them online, report the problem to the appropriate personnel.
The formula is:
IF VALUE Tape_Drives.Not_Responding GT 0 AND
VALUE Tape_Drives.Not_Responding LT 5
OS390_Tape_Permanent_Errors_Crit
OS390_Tape_Permanent_Errors_Crit monitors the count of permanent errors on a tape drive and issues a Critical alert if the number is greater than or equal to 30.
The formula is:
IF VALUE Tape_Drives.Permanent_Errors GE 30
OS390_Tape_Permanent_Errors_Warn
OS390_Tape_Permanent_Errors_Warn monitors the count of permanent errors on a tape drive and issues a Warning if the number is between 5 and 29 inclusive.
The formula is:
IF VALUE Tape_Drives.Permanent_Errors GE 5 AND
VALUE Tape_Drives.Permanent_Errors LT 30
OS390_Tape_Temp_Errors_Crit monitors the count of temporary errors on a tape drive and issues a Critical alert if the number is greater than or equal to 30. The problem could be caused either by the media or by the device. Monitor to determine whether there is additional degradation and if so, report the problem to appropriate personnel.
The formula is:
IF VALUE Tape_Drives.Temporary_Errors GE 30
OS390_Tape_Temp_Errors_Warn monitors the count of temporary errors on a tape drive and issues a Warning if the number is between 5 and 29 inclusive. The problem could be caused either by the media or by the device. Monitor to determine whether there is additional degradation and if so, report the problem to appropriate personnel.
The formula is:
IF VALUE Tape_Drives.Temporary_Errors GE 5 and
VALUE Tape_Drives.Temporary_Errors LT 30
OS390_Undispatched_Tasks_Crit monitors to determine whether the number of tasks or address spaces that have not been dispatched by the SRM due to constraints is greater than or equal to 20 and issues a Critical alert if the condition is true. If the condition persists for more than an hour, a capacity upgrade may be required. Determine whether any important service classes are missing their goals.
The formula is:
IF VALUE System_CPU_Utilization.Undispatched_Tasks GE 20
OS390_Undispatched_Tasks_Warn monitors to determine whether the number of tasks or address spaces that have not been dispatched by the SRM due to constraints is greater than or equal to 5 and less than 20 and issues a Warning if the condition is true. If the condition persists for more than an hour, a capacity upgrade may be required. Determine whether any important service classes are missing their goals.
The formula is:
IF VALUE System_CPU_Utilization.Undispatched_Tasks GE 5 AND
VALUE System_CPU_Utilization.Undispatched_Tasks LT 20
OS390_Unowned_Common_Stor_Crit
OS390_Unowned_Common_Stor_Crit monitors the amount of unowned storage in the Common Service Area (CSA) and issues a Critical alert if the threshold is exceeded. Ensure that the CSA Analyzer collector is running.
The formula is:
IF VALUE Common_Storage.Area EQ CSA AND
VALUE Common_Storage.Unowned GE 1000000000
OS390_Unowned_Common_Stor_Warn
OS390_Unowned_Common_Stor_Warn monitors the amount of unowned storage in the Common Service Area (CSA) and issues a Warning alert if the threshold is exceeded. Ensure that the CSA Analyzer collector is running.
The formula is:
IF VALUE Common_Storage.Area EQ CSA AND
VALUE Common_Storage.Unowned GE 1000000000
OS390_Unref_Interval_Cnt_Crit monitors to determine whether the storage type is Central Storage and the amount of time, in seconds, that the oldest frame of pageable storage has gone without being referenced is less than or equal to 2 seconds, and issues a Critical alert if the condition is true. Determine whether any important service classes are failing to meet their goals and, if so, whether Private Page-in Wait is a significant reason. If it is, Central Storage may be over-committed, possibly as the result of a capacity problem.
The formula is:
IF VALUE Real_Storage.Storage_Type EQ CentralStorage AND
VALUE Real_Storage.Unreferenced_Interval_Count LE 2
OS390_Unref_Interval_Cnt_Warn monitors to determine whether the storage type is Central Storage and the amount of time, in seconds, that the oldest frame of pageable storage has gone without being referenced is greater than 2 and less than or equal to 4 seconds, and issues a Warning if the condition is true. Determine whether any important service classes are failing to meet their goals and, if so, whether Private Page-in Wait is a significant reason. If it is, Central Storage may be over-committed, possibly as the result of a capacity problem.
The formula is:
IF VALUE Real_Storage.Storage_Type EQ CentralStorage AND
VALUE Real_Storage.Unreferenced_Interval_Count LE 4 AND
VALUE Real_Storage.Unreferenced_Interval_Count GT 2
OS390_User_Host_Resp_Time_Crit
OS390_User_Host_Resp_Time_Crit monitors to determine whether the host (internal) response time for the indicated TSO user is exceeding the Critical threshold and if so, issues a Critical alert. If the user's service class is meeting its goal, there may be a specific problem in this user's address space. If the service class is missing its goal, the goal may be too demanding and may need to be adjusted.
The formula is:
IF VALUE User_Response_Time.Host_Response GE 100000000
OS390_User_Host_Resp_Time_Warn
OS390_User_Host_Resp_Time_Warn monitors to determine whether the host (internal) response time for the indicated TSO user is exceeding the Warning threshold and if so, issues a Warning alert. If the user's service class is meeting its goal, there may be a specific problem in this user's address space. If the service class is missing its goal, the goal may be too demanding and may need to be adjusted.
The formula is:
IF VALUE User_Response_Time.Host_Response GE 100000000
OS390_User_Total_Resp_Time_Crit
OS390_User_Total_Resp_Time_Crit monitors the total response time (host plus network) for a TSO user and issues a Critical alert if the threshold is exceeded. If the service class for this user is meeting its goal, the problem may be a network response time problem.
The formula is:
IF VALUE User_Response_Time.Total_Response GE 100000000.0
OS390_User_Total_Resp_Time_Warn
OS390_User_Total_Resp_Time_Warn monitors the total response time (host plus network) for a TSO user and issues a Warning alert if the threshold is exceeded. If the service class for this user is meeting its goal, the problem may be a network response time problem.
The formula is:
IF VALUE User_Response_Time.Total_Response GE 100000000.0
OS390_WTO_Buffers_Left_Crit monitors to determine whether the remaining WTO buffer pool is becoming dangerously small and issues a Critical alert if the condition is true. Determine whether a console device is down and if so, switch the message stream to another device.
The formula is:
IF VALUE Operator_Alerts.WTO_Buffers_Remaining LE 20
OS390_WTO_Buffers_Left_Warn monitors to determine whether the remaining WTO buffer pool is becoming short of resources and issues a Warning if the condition is true. Determine whether a console device is down and if so, switch the message stream to another device.
The formula is:
IF VALUE Operator_Alerts.WTO_Buffers_Remaining GT 20 AND
VALUE Operator_Alerts.WTO_Buffers_Remaining LE 100
Processes > 80% and < 90% of system limit monitors for a shortage of processes by checking the ratio of the current number of processes to the maximum number of processes that are allowed in the system and issues a Warning if the value is between 80% and 90% of the limit.
Processes > 90% of system limit monitors for a shortage of processes by checking the ratio of the current number of processes to the maximum number of processes that are allowed in the system and issues a Critical alert if the value is greater than 90% of the limit.
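The formulas for the two process-limit situations are not listed here, so the following Python sketch is based only on the comparisons in the situation names; the shipped formulas may treat the boundaries differently. It computes the current number of processes as a percentage of the system maximum and classifies the result. The function name and the sample values are hypothetical. Read literally, the names leave a ratio of exactly 90% matching neither situation.

```python
from typing import Optional


def process_limit_severity(current: int, maximum: int) -> Optional[str]:
    """Classify current/maximum processes using the comparisons in the situation names."""
    if maximum <= 0:
        raise ValueError("maximum must be a positive process limit")
    percent = 100.0 * current / maximum
    if percent > 90:
        return "Critical"  # Processes > 90% of system limit
    if 80 < percent < 90:
        return "Warning"   # Processes > 80% and < 90% of system limit
    return None


if __name__ == "__main__":
    for current in (500, 850, 900, 950):
        print(current, process_limit_severity(current, 1000))
```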
Kernel CPU > 50% monitors the percentage of CPU time used by the Kernel and issues a Warning if the value exceeds 50%.
Process UNIX run time > 50% monitors the percentage of UNIX run time used by a given process and issues a Warning if the value exceeds 50%.
A/S UNIX system time > 50% monitors address spaces and issues a Warning when the percent of system CPU time for UNIX work exceeds 50%.
A/S UNIX user time > 50% monitors address spaces and issues a Warning when the percent of user CPU time for UNIX work exceeds 50%.
HFS ENQ contention > 30 seconds monitors the number of seconds the issuer of the enqueue waits for or owns the resource, and issues a Critical alert when the time exceeds 30 seconds.
HFS ENQ contention > 10 seconds monitors the number of seconds the issuer of the enqueue waits for or owns the resource, and issues a Warning when the time exceeds 10 seconds.
Mounted file system > 90% used monitors the percent of used space in mounted file systems and issues a Critical alert when the percentage for any file system exceeds 90%.
Mounted file system > 80% and < 90% used monitors the percent of used space in mounted file systems and issues a Warning when the percentage falls between 80% and 90% used.
Logged on user idle time > 8 hours issues a Warning when a logged-on user has had no activity for more than 8 hours.
Quiesced file system is not active by default. You can enable it to issue an alert when it determines that the status of a mounted file system is QUIESCED.
Missing inetd process or Unwanted inetd process running monitor inetd, a special system daemon that is used to start and stop other daemons. You can enable one of these situations to issue a Critical alert if inetd is not running when it should be, or is running when it should not be.
Two predefined situations shipped with OMEGAMON XE for OS/390 UNIX System Services are used to monitor your WebSphere File Systems.
Low Disk Space for WebSphere File Systems issues an alert when the percentage of space used by WebSphere file systems exceeds 50% of the total space available. To use this situation, modify it for your enterprise by replacing the dataset name OMVS.WAS350.SEJSHFS2 with the name of the WebSphere file system specific to your installation.
Missing WebSphere Daemon Process issues an alert if the WebSphere Daemon process is not running.
Missing HTTP Server Daemon Process issues an alert if the HTTP Server Daemon process is not running. WebSphere Application Server V3.5 runs within the HTTP server. Thus, if the HTTP Server is not up, WebSphere Application Server will be unavailable.