Monitoring HOWTO


Using the Monitoring Application

This chapter describes planning for monitoring your system, tracking system events, and using and modifying the predefined scripts, expressions, commands, and responses packaged with this application. These predefined elements and how to use them are described in detail in Components Provided for Monitoring.


Planning What to Monitor in Your System

First, select conditions to monitor that would have a severe impact on your system. These conditions might include:

When you have determined the resource problems you want to monitor, review the predefined conditions and identify the conditions you want to use. Use the lscondition command to view all conditions. If a predefined condition deviates from your requirements in some way, you can edit it, use it as a template to create your own customized condition, or create your own condition using the mkcondition command. After you have selected conditions for monitoring, you need to plan one or more responses to be taken for the event and the optional rearm event.


Planning How to Respond to Detected Conditions

A set of predefined responses comes installed with your system (see Predefined Responses). Each response has one or more actions associated with it. Each action can be activated or deactivated to fit your particular work environment and schedule.

The predefined actions are:

You can also write your own commands that correct or mitigate conditions, and run them through the Run program option.

You might specify different actions based on when the monitored condition occurs. For example, you could have one set of actions to respond to a condition during working hours and another set to respond to a condition on nights and weekends. To be notified of events when you are away from your terminal, your actions must include email, broadcasting, or logging.


Getting Started with the Monitoring Application

This section describes how to start using the Monitoring application. You can use the command line to do the following:

How to Associate a Response with a Condition

To associate a response with a condition, use the following commands (see the command man pages or Cluster Systems Management for Linux Technical Reference for detailed usage information):

  1. Use the lsresponse command to list all responses.
  2. Use the mkcondresp command to associate a condition with a response without starting monitoring.
  3. Use the startcondresp command to associate a condition with a response and begin monitoring immediately.

How to View Events

To view events, use the lsaudrec command to display the audit log. You can use the logevent predefined script to log events to a file.

How to Stop Monitoring

To stop monitoring, use the stopcondresp command.


How to Monitor Your System Using the Command Line Interface

The following scenarios demonstrate the most frequently performed monitoring tasks. See the Cluster Systems Management for Linux Technical Reference or the command man pages for detailed usage information.

  1. To list the conditions in your system, type: lscondition. Output is similar to:
    Name                           Monitoring Status
    "/tmp space used"              "Not monitored"
    "/var space used"              "Monitored"
    (more conditions listed...)
     
     
    
  2. To list the responses available in the system, type: lsresponse. Output is similar to:
    Name
    "Critical notification"
    "Warning notification"
    "Informational notification"
    "Remove unwanted files"
    (more responses listed...) 
     
     
    
  3. To list responses associated with a condition, use the lscondresp command. For example, to list the responses associated with the condition "/tmp space used", type: lscondresp "/tmp space used". Output is similar to:
    Condition            Response                     State
    "/tmp space used"    "Broadcast event on-shift"   Active
    "/tmp space used"    "E-mail root anytime"        Not Active
     
     
    
  4. To start monitoring a condition, one or more responses need to be specified for the condition. For example, to start monitoring the condition "/tmp space used" using the response "critical notification" and "remove unwanted files," type:
    startcondresp "/tmp space used" "critical notification" "remove unwanted files"
     
     
    
  5. You can either stop monitoring a condition completely, or stop monitoring a condition with specific responses. To stop monitoring the condition "/tmp space used" completely, type:
    stopcondresp "/tmp space used"
     
     
    

    To stop monitoring the condition "/tmp space used" with a specific response, "critical notification," type:

    stopcondresp "/tmp space used" "critical notification"
     
     
    
  6. You can copy a condition to use as a template for a new condition. For example, to create a new condition "my test condition" from an existing condition "/tmp space used," type:
    mkcondition -c "/tmp space used" "my test condition"
     
     
    
  7. To view events or the actions taken in response to the events, type:

    lsaudrec -l
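
Because condition and response names contain blanks, every name must reach the command as a single quoted argument. The following Python sketch only builds the command lines shown in the scenarios above and does not run them; the helper name monitoring_command is hypothetical and is not part of the product.

```python
import shlex


def monitoring_command(command, condition=None, *responses):
    """Build a shell-safe command line for one of the monitoring commands.

    Hypothetical helper for illustration: it only assembles the text of the
    command line and never executes anything.
    """
    argv = [command]
    if condition is not None:
        argv.append(condition)
    argv.extend(responses)
    # shlex.join quotes each argument that contains blanks.
    return shlex.join(argv)


start_cmd = monitoring_command(
    "startcondresp",
    "/tmp space used",
    "critical notification",
    "remove unwanted files",
)
stop_cmd = monitoring_command("stopcondresp", "/tmp space used")
list_cmd = monitoring_command("lscondition")

print(start_cmd)
print(stop_cmd)
print(list_cmd)
```

The generated text can be pasted into a shell or passed to a job scheduler; the quoting keeps "/tmp space used" from being split into three arguments.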
    

For a complete list of predefined commands, scripts, and utilities, see Predefined Commands, Scripts, Utilities, and Files.


Tracking Monitoring Activity

For information about monitoring events, rearm events, actions, and errors that have occurred, view the audit log by using the lsaudrec command. See the lsaudrec man page and Using the Audit Log to Track Monitoring Activity for details.


Using the Audit Log to Track Monitoring Activity

Audit log records include the following:

The administrator can use the audit log to track activity that may not be visible otherwise because the activity is related to subsystems running in the background. To list audit log records, use the lsaudrec command. To remove audit log records, use the rmaudrec command. For details see the command man pages, or Cluster Systems Management for Linux Technical Reference.


Using Scripts

The Cluster Systems Management for Linux Technical Reference contains information about predefined scripts that are provided with the Event Response resource manager (ERRM). The following scripts are provided:

You can also use existing operating system commands and user-written scripts in the definition of an action.

Using Predefined Response Scripts

The displayevent, logevent, msgevent, notifyevent, and wallevent scripts are examples of the types of actions that system administrators can use to respond to events. The displayevent script displays an event or a rearm event on a specified X Window display. The logevent script appends a formatted string containing the specifics of an event to a user-specified text file. The msgevent script sends an event or a rearm event to a specified user's console. The notifyevent script captures the event information and sends it by UNIX mail to a specified user ID. The wallevent script broadcasts a message to all users who are logged in. For a full description of these scripts, see Cluster Systems Management for Linux Technical Reference or the command man pages.

You can use these scripts as-is or treat them as templates by copying and modifying them to create new scripts that suit your needs. For example, to use the wallevent script as a template for a page event command, do the following:

  1. Copy the wallevent script at /usr/sbin/rsct/bin/wallevent to a new script file and rename it, for example, to pageevent.
  2. Replace the wall command with the program for your pager.

For a command to run in response to an event or a rearm event defined by a condition, the command must be included as an action in an Event Response resource. When an Event Response resource is defined, specify the entire path name for a script that is used within an action. Use the Event Response resource manager commands to set up responses.

Test any scripts or commands that you have created or modified before you use them as actions in a production environment.

Using Event Response Environment Variables

After ERRM has subscribed to RMC to monitor a condition and that condition occurs, the ERRM runs commands in the user's operating system environment. The Event Response resource contains a list of commands to be run. Before each command is run, the following environment variables are established for the command to use (see Event Response Resource Manager for a detailed description of the ERRM):

(See Resource Handle for a definition and an example of a resource handle.)


Using Expressions

The information in this section is for advanced users who want to:

Permissible data types and operators are described, and the order of precedence for the operators is included. RMC uses these functions to match a selection string against the persistent attributes of a resource and to implement the evaluation of an event expression or a rearm expression.

An expression is similar to a C language statement or the WHERE clause of an SQL query. It is composed of variables, operators, and constants. The C and SQL syntax styles may be intermixed within a single expression. The following table relates the RMC terminology to SQL terminology:

RMC               SQL
attribute name    column name
select string     WHERE clause
operators         predicates, logical connectives
resource class    table

SQL Restrictions

SQL syntax is supported for selection strings, with the following restrictions:

Supported Base Data Types

The term variable is used in this context to mean the column name or attribute name in an expression. Variables and constants in an expression may be one of the following data types that are supported by the RMC subsystem:

Symbolic Name         Description
CT_INT32              Signed 32-bit integer
CT_UINT32             Unsigned 32-bit integer
CT_INT64              Signed 64-bit integer
CT_UINT64             Unsigned 64-bit integer
CT_FLOAT32            32-bit floating point
CT_FLOAT64            64-bit floating point
CT_CHAR_PTR           Null-terminated string
CT_BINARY_PTR         Binary data - arbitrary-length block of data
CT_RSRC_HANDLE_PTR    Resource handle - an identifier for a resource that is unique over space and time (20 bytes)

Structured Data Types

In addition to the base data types, aggregates of the base data types may be used as well. The first aggregate data type is similar to a structure in C in that it can contain multiple fields of different data types. This aggregate data type is referred to as structured data (SD). The individual fields in the structured data are referred to as structured data elements or simply elements. Each element of a structured data type may have a different data type which can be one of the base types in the preceding table or any of the array types discussed in the next section, except for the structured data array.

The second aggregate data type is an array. An array contains zero or more values of the same data type, such as an array of CT_INT32 values. Each of the array types has an associated enumeration value (CT_INT32_ARRAY, CT_UINT32_ARRAY). Structured data may also be defined as an array but is restricted to have the same elements in every entry of the array.

Data Types That Can Be Used for Literal Values

Literal values can be specified for each of the base data types as follows:

Array
An array or list of values may be specified by enclosing variables or literal values, or both, within braces {} or parentheses () and separating each element of the list with a comma. For example: { 1, 2, 3, 4, 5 } or ( "abc", "def", "ghi" )

Entries of an array can be accessed by specifying a subscript as in the C programming language. The index corresponding to the first element of the array is always zero; for example, List[2] references the third element of the array named List. Only one subscript is allowed. It may be a variable, a constant, or an expression that produces an integer result. For example, if List is an integer array, then List[2]+4 produces the sum of 4 and the current value of the third entry of the array.

Binary Data
A binary constant is defined by a sequence of hexadecimal values, separated by white space. All hexadecimal values comprising the binary data constant are enclosed in double quotation marks. Each hexadecimal value includes an even number of hexadecimal digits, and each pair of hexadecimal digits represents a byte within the binary value. For example:
"0xabcd 0x01020304050607090a0b0c0d0e0f1011121314"
 

Character Strings
A string is specified by a sequence of characters surrounded by single or double quotation marks (you can have any number of characters, including none). Any character may be used within the string except the null '\0' character. Double quotation marks and backslashes may be included in strings by preceding them with the backslash character.

Floating Types
These types can be specified by the following syntax:

Integer Types
These types can be specified in decimal, octal, or hexadecimal format. Any value that begins with the digits 1-9 and is followed by zero or more decimal digits (0-9) is interpreted as a decimal value. A decimal value is negated by preceding it with the character '-'. Octal constants are specified by the digit 0 followed by 1 or more digits in the range 0-7. Hexadecimal constants are specified by a leading 0 followed by the letter x (uppercase or lowercase) and then followed by a sequence of one or more digits in the range 0-9 or characters in the range a-f (uppercase or lowercase).
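
The three formats can be told apart by their leading characters. The following Python sketch classifies a literal according to the rules above; it is an illustration, not the RMC parser. The rules as worded do not mention a bare 0, so accepting it as zero is an assumption of this sketch.

```python
import re

_DECIMAL = re.compile(r"-?[1-9][0-9]*")   # starts with 1-9, optional leading '-'
_OCTAL = re.compile(r"0[0-7]+")           # digit 0 followed by octal digits
_HEX = re.compile(r"0[xX][0-9a-fA-F]+")   # 0x or 0X followed by hex digits


def parse_integer_literal(text):
    """Return the integer value of a decimal, octal, or hexadecimal literal."""
    if _HEX.fullmatch(text):
        return int(text[2:], 16)
    if _OCTAL.fullmatch(text):
        return int(text, 8)
    if _DECIMAL.fullmatch(text):
        return int(text, 10)
    if text == "0":
        # Assumption: a bare 0 is not covered by the rules as worded.
        return 0
    raise ValueError("not an integer literal: " + text)


values = [parse_integer_literal(t) for t in ("42", "-42", "017", "0x1F")]
print(values)
```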

Resource Handle
A fixed-size entity that consists of two 16-bit and four 32-bit words of data. A literal resource handle is specified by a group of six hexadecimal integers. The first two values represent 16-bit integers and the remaining four each represent a 32-bit word. Each of the six integers is separated by white space. The group is surrounded by double quotation marks. The following is an example of a resource handle:
"0x4018 0x0001 0x00000000 0x0069684c 0x00519686 0xaf7060fc"
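
Two 16-bit values plus four 32-bit words add up to the 20 bytes listed for CT_RSRC_HANDLE_PTR. The following Python sketch packs a literal resource handle into that fixed-size form. The big-endian byte order is an assumption made only for illustration, and the function name is hypothetical.

```python
import struct


def parse_resource_handle(literal):
    """Pack the text of a resource handle (without its quotes) into 20 bytes."""
    fields = [int(token, 16) for token in literal.split()]
    if len(fields) != 6:
        raise ValueError("a resource handle has exactly six integers")
    # ">" selects big-endian with no padding (an assumption of this sketch);
    # "2H" is two unsigned 16-bit values, "4I" is four unsigned 32-bit words.
    return struct.pack(">2H4I", *fields)


handle = parse_resource_handle(
    "0x4018 0x0001 0x00000000 0x0069684c 0x00519686 0xaf7060fc"
)
print(len(handle))
```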
 

Structured Data
Structured data values can be referenced only through variables. Nevertheless, the RMC command-line interface displays structured data (SD) values and accepts them as input when a resource is defined or changed. A literal SD is a sequence of literal values, as defined in Data Types That Can Be Used for Literal Values, that are separated by commas and enclosed in square brackets. For example, ['abc',1,{3,4,5}] specifies an SD that consists of three elements: (a) the string 'abc', (b) the integer value 1, and (c) the three-element array {3,4,5}.

Variable names refer to values that are not part of the expression but are accessed while running the expression. For example, when RMC processes an expression, the variable names are replaced by the corresponding persistent or dynamic attributes of each resource.

Entries of an array may be accessed by specifying a subscript as in 'C'. The index corresponding to the first element of the array is always 0 (for example, List[2] refers to the third element of the array named List). Only one subscript is allowed. It may be a variable, a constant, or an expression that produces an integer result. A subscripted value may be used wherever the base data type of the array is used. For example, if List is an integer array, then "List[2]+4" produces the sum of 4 and the current value of the third entry of the array.

The elements of a structured data value can be accessed by using the following syntax:

<variable name>.<element name>

For example, a.b

The variable name is the name of the table column or resource attribute, and the element name is the name of the element within the structured data value. Either or both names may be followed by a subscript if the name is an array. For example, a[10].b refers to the element named b of the 11th entry of the structured data array called a. Similarly, a[10].b[3] refers to the fourth element of the array that is an element called b within the same structured data array entry a[10].
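
The subscript and element rules can be pictured with ordinary Python lists and dictionaries. In the following sketch, the structured data array a and its values are made up for illustration; only the access pattern corresponds to the syntax described above.

```python
# Hypothetical structured data array "a" with 11 entries. Each entry has an
# element named "b" that is itself an array of five integers.
a = [{"b": [entry * 10 + item for item in range(5)]} for entry in range(11)]


def element(sd_array, index, element_name, sub_index=None):
    """Mimic <variable>[index].<element> with an optional second subscript."""
    value = sd_array[index][element_name]
    return value if sub_index is None else value[sub_index]


eleventh_b = element(a, 10, "b")        # corresponds to a[10].b
fourth_of_b = element(a, 10, "b", 3)    # corresponds to a[10].b[3]
print(eleventh_b, fourth_of_b)
```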

How Variable Names Are Handled

Variable names refer to values that are not part of an expression but are accessed while running the expression. When used to select a resource, the variable name is a persistent attribute. When used to generate an event, the variable name is a dynamic attribute. When used to select audit records, the variable name is the name of a field within the audit record.

A variable name is restricted to include only 7-bit ASCII characters that are alphanumeric (a-z, A-Z, 0-9) or the underscore character (_). The name must begin with an alphabetic character. When the expression is used by the RMC subsystem for an event or a rearm event, the name can have a suffix that is the '@' character followed by 'P', which refers to the previous observation.
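
The naming rule can be expressed as one regular expression. The following Python sketch checks a candidate name against it; the function name and the sample attribute name PercentTotUsed are chosen only for illustration, and the @P suffix is accepted only when the caller states that the expression is an event or rearm expression.

```python
import re

# One alphabetic character, then alphanumerics or underscores, then an
# optional "@P" suffix that refers to the previous observation.
_VARIABLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*(@P)?")


def is_valid_variable_name(name, event_expression=True):
    """Return True if name follows the variable naming rule."""
    match = _VARIABLE_NAME.fullmatch(name)
    if match is None:
        return False
    if match.group(1) and not event_expression:
        return False  # the suffix is allowed only in event or rearm expressions
    return True


checks = {
    name: is_valid_variable_name(name)
    for name in ("PercentTotUsed", "PercentTotUsed@P", "9lives", "a-b", "_x")
}
print(checks)
```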

Operators That Can Be Used in Expressions

Constants and variables may be combined by an operator to produce a result that in turn may be used with another operator. The resulting data type of the expression must be a scalar integer or floating-point value. If the result is zero, the expression is considered to be FALSE; otherwise, it is TRUE.

Note:
Blanks are optional around operators and operands unless their omission causes an ambiguity. An ambiguity typically occurs only with the word form of operator (that is, AND, OR, IN, LIKE, etc.). With these operators, a blank or separator, such as a parenthesis or bracket, is required to distinguish the word operator from an operand. For example, aANDb is ambiguous. It is unclear if this is intended to be the variable name aANDb or the variable names a, b combined with the operator AND. It is actually interpreted by the application as a single variable name aANDb. With non-word operators (for example, +, -, =, &&, etc.) this ambiguity does not exist, and therefore blanks are optional.

The set of operators that can be used in strings is summarized in the following table:

Operator                Description            Left Data Types  Right Data Types  Example                                        Notes
+                       Addition               Integer, float   Integer, float    "1+2" results in 3                             None
-                       Subtraction            Integer, float   Integer, float    "1.0-2.0" results in -1.0                      None
*                       Multiplication         Integer, float   Integer, float    "2*3" results in 6                             None
/                       Division               Integer, float   Integer, float    "2/3" results in 0                             None
-                       Unary minus            None             Integer, float    "-abc"                                         None
+                       Unary plus             None             Integer, float    "+abc"                                         None
..                      Range                  Integers         Integers          "1..3" results in 1,2,3                        Shorthand for all integers between and including the two values
%                       Modulo                 Integers         Integers          "10%2" results in 0                            None
|                       Bitwise OR             Integers         Integers          "2|4" results in 6                             None
&                       Bitwise AND            Integers         Integers          "3&2" results in 2                             None
~                       Bitwise complement     None             Integers          "~0x0000ffff" results in 0xffff0000            None
^                       Exclusive OR           Integers         Integers          "0x0000aaaa^0x0000ffff" results in 0x00005555  None
>>                      Right shift            Integers         Integers          "0x0fff>>4" results in 0x00ff                  None
<<                      Left shift             Integers         Integers          "0x0ffff<<4" results in 0xffff0                None
==  =                   Equality               All but SDs      All but SDs       "2==2" results in 1; "2=2" results in 1        Result is true (1) or false (0)
!=  <>                  Inequality             All but SDs      All but SDs       "2!=2" results in 0; "2<>2" results in 0       Result is true (1) or false (0)
>                       Greater than           Integer, float   Integer, float    "2>3" results in 0                             Result is true (1) or false (0)
>=                      Greater than or equal  Integer, float   Integer, float    "4>=3" results in 1                            Result is true (1) or false (0)
<                       Less than              Integer, float   Integer, float    "4<3" results in 0                             Result is true (1) or false (0)
<=                      Less than or equal     Integer, float   Integer, float    "2<=3" results in 1                            Result is true (1) or false (0)
=~                      Pattern match          Strings          Strings           "abc"=~"a.*" results in 1                      Right operand is interpreted as an extended regular expression
!~                      Not pattern match      Strings          Strings           "abc"!~"a.*" results in 0                      Right operand is interpreted as an extended regular expression
=?  LIKE  like          SQL pattern match      Strings          Strings           "abc"=? "a%" results in 1                      Right operand is interpreted as a SQL pattern
!?  NOT LIKE  not like  Not SQL pattern match  Strings          Strings           "abc"!? "a%" results in 0                      Right operand is interpreted as a SQL pattern
|<  IN  in              Contains any           All but SDs      All but SDs       "{1..5}|<{2,10}" results in 1                  Result is true (1) if left operand contains any value from right operand
><  NOT IN  not in      Contains none          All but SDs      All but SDs       "{1..5}><{2,10}" results in 0                  Result is true (1) if left operand contains no value from right operand
&<                      Contains all           All but SDs      All but SDs       "{1..5}&<{2,10}" results in 0                  Result is true (1) if left operand contains all values from right operand
||  OR  or              Logical OR             Integers         Integers          "(1<2)||(2>4)" results in 1                    Result is true (1) or false (0)
&&  AND  and            Logical AND            Integers         Integers          "(1<2)&&(2>4)" results in 0                    Result is true (1) or false (0)
!  NOT  not             Logical NOT            None             Integers          "!(2==4)" results in 1                         Result is true (1) or false (0)
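
The range operator and the three containment operators are the least C-like entries in the table. The following Python sketch models them with plain lists, following the wording of the Notes column; the function names are hypothetical and the sketch ignores data types other than integers.

```python
def expand_range(low, high):
    """Model "low..high": all integers between and including the two values."""
    return list(range(low, high + 1))


def contains_any(left, right):
    """Model |< (IN): 1 if left contains any value from right."""
    return int(any(value in left for value in right))


def contains_none(left, right):
    """Model >< (NOT IN): 1 if left contains no value from right."""
    return int(not any(value in left for value in right))


def contains_all(left, right):
    """Model &<: 1 if left contains all values from right."""
    return int(all(value in left for value in right))


left = expand_range(1, 5)   # {1..5}
right = [2, 10]             # {2,10}

# The left operand contains 2 but not 10.
print(contains_any(left, right))    # 1
print(contains_none(left, right))   # 0
print(contains_all(left, right))    # 0
```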

When integers of different signs or sizes are operands of an operator, standard C style casting is implicitly performed. When an expression with multiple operators is evaluated, the operations are performed in the order defined by the precedence of the operator. The default precedence can be overridden by enclosing the portion or portions of the expression to be evaluated first in parentheses (). For example, in the expression "1+2*3", multiplication is normally performed before addition to produce a result of 7. To evaluate the addition operator first, use parentheses as follows: "(1+2)*3". This produces a result of 9. The default precedence rules are shown in the following table. All operators in the same table cell have equal precedence.

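Operators             Description
--------------------  ---------------------------------
.                     Structured data element separator
--------------------  ---------------------------------
~                     Bitwise complement
!  NOT  not           Logical not
-                     Unary minus
+                     Unary plus
--------------------  ---------------------------------
*                     Multiplication
/                     Division
%                     Modulo
--------------------  ---------------------------------
+                     Addition
-                     Subtraction
--------------------  ---------------------------------
<<                    Left shift
>>                    Right shift
--------------------  ---------------------------------
<                     Less than
<=                    Less than or equal
>                     Greater than
>=                    Greater than or equal
--------------------  ---------------------------------
==                    Equality
!=                    Inequality
=?  LIKE  like        SQL match
!?                    SQL not match
=~                    Reg expr match
!~                    Reg expr not match
?=                    Reg expr match (compat)
|<  IN  in            Contains any
><  NOT IN  not in    Contains none
&<                    Contains all
--------------------  ---------------------------------
&                     Bitwise AND
--------------------  ---------------------------------
^                     Bitwise exclusive OR
--------------------  ---------------------------------
|                     Bitwise inclusive OR
--------------------  ---------------------------------
&&                    Logical AND
--------------------  ---------------------------------
||                    Logical OR
--------------------  ---------------------------------
,                     List separator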
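
To show how a precedence table of this kind drives evaluation, the following Python sketch evaluates a small subset of the binary operators on decimal integer constants. It assumes that the table lists the cells from highest to lowest precedence, as in C, and it uses C-style truncating division; it is not the RMC evaluator, and all names in it are hypothetical.

```python
import re

# Subset of the binary operators, one inner list per table cell.
# Assumption: cells are ordered from highest to lowest precedence.
PRECEDENCE_LEVELS = [
    ["*", "/", "%"], ["+", "-"], ["<<", ">>"], ["<", "<=", ">", ">="],
    ["==", "!="], ["&"], ["^"], ["|"], ["&&"], ["||"],
]


def _truncating_divide(a, b):
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


OPERATIONS = {
    "*": lambda a, b: a * b,
    "/": _truncating_divide,
    "%": lambda a, b: a - b * _truncating_divide(a, b),
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "<<": lambda a, b: a << b,
    ">>": lambda a, b: a >> b,
    "<": lambda a, b: int(a < b),
    "<=": lambda a, b: int(a <= b),
    ">": lambda a, b: int(a > b),
    ">=": lambda a, b: int(a >= b),
    "==": lambda a, b: int(a == b),
    "!=": lambda a, b: int(a != b),
    "&": lambda a, b: a & b,
    "^": lambda a, b: a ^ b,
    "|": lambda a, b: a | b,
    "&&": lambda a, b: int(bool(a) and bool(b)),
    "||": lambda a, b: int(bool(a) or bool(b)),
}

_TOKEN = re.compile(r"\d+|<<|>>|<=|>=|==|!=|&&|\|\||[-+*/%<>&^|()]")


def evaluate(expression):
    """Evaluate an expression of decimal integers, operators, and parentheses."""
    tokens = _TOKEN.findall(expression)
    position = 0

    def parse_primary():
        nonlocal position
        token = tokens[position]
        position += 1
        if token == "(":
            value = parse_level(len(PRECEDENCE_LEVELS) - 1)
            position += 1  # skip the closing parenthesis
            return value
        if token == "-":
            return -parse_primary()  # unary minus binds tighter than any binary operator
        return int(token)

    def parse_level(level):
        nonlocal position
        if level < 0:
            return parse_primary()
        value = parse_level(level - 1)  # higher-precedence operands first
        while position < len(tokens) and tokens[position] in PRECEDENCE_LEVELS[level]:
            operator = tokens[position]
            position += 1
            value = OPERATIONS[operator](value, parse_level(level - 1))
        return value

    return parse_level(len(PRECEDENCE_LEVELS) - 1)


print(evaluate("1+2*3"))     # 7
print(evaluate("(1+2)*3"))   # 9
```

Each precedence level parses its operands at the next higher level before applying its own operators, which is why the multiplication in "1+2*3" is completed before the addition.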

Pattern Matching

Two types of pattern matching are supported: extended regular expressions, and patterns that are compatible with the standard SQL LIKE predicate. The SQL LIKE type of pattern may include the following special characters:

Examples of Expressions

Some examples of the types of expressions that can be constructed follow:

  1. The following expressions match all rows or resources that have a name which begins with 'tr' and ends with '0', where 'Name' indicates the column or attribute that is to be used in the evaluation:
    Name =~'tr.*0'
    Name LIKE 'tr%0'
     
     
    
  2. The following expressions evaluate to TRUE for all rows or resources that contain 1, 3, 5, 6, or 7 in the column or attribute that is called IntList, which is an array:
    IntList|<{1,3,5..7}
    IntList in (1,3,5..7)
     
     
    
  3. The following expression combines the previous two so that all rows and resources that have a name beginning with 'tr' and ending with '0' and have 1, 3, 5, 6, or 7 in the IntList column or attribute will match:
    (Name LIKE "tr%0")&&(IntList|<(1,3,5..7))
    (Name=~'tr.*0') AND (IntList IN {1,3,5..7})
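
The equivalence between the two forms in these examples can be checked outside RMC. The following Python sketch translates a SQL LIKE pattern into a regular expression and then evaluates the third example against sample data. It assumes the standard SQL wildcards (% for any run of characters, _ for a single character) and whole-string matching; the resource names and list values are made up for illustration.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern to a regular expression string.

    Assumption: standard SQL wildcards, with no escape character support.
    """
    parts = []
    for character in pattern:
        if character == "%":
            parts.append(".*")   # any run of characters, including none
        elif character == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(character))
    return "".join(parts)


def matches_example_3(name, int_list):
    """Model (Name LIKE "tr%0") && (IntList |< {1,3,5..7})."""
    wanted = {1, 3, *range(5, 8)}   # 5..7 expands to 5, 6, 7
    name_matches = re.fullmatch(like_to_regex("tr%0"), name) is not None
    return name_matches and any(value in wanted for value in int_list)


print(like_to_regex("tr%0"))
print(matches_example_3("trunk0", [2, 6]))
print(matches_example_3("trunk1", [6]))
```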
    

