This chapter describes planning for monitoring your system, tracking system events, and using and modifying the predefined scripts, expressions, commands, and responses packaged with this application. These predefined elements and how to use them are described in detail in Components Provided for Monitoring.
First, select conditions to monitor that would have a severe impact on your system. These conditions might include:
When you have determined the resource problems you want to monitor, review the predefined conditions and identify the conditions you want to use. Use the lscondition command to view all conditions. If a predefined condition deviates from your requirements in some way, you can edit it, use it as a template to create your own customized condition, or create your own condition using the mkcondition command. After you have selected conditions for monitoring, you need to plan one or more responses to be taken for the event and the optional rearm event.
A set of predefined responses comes installed with your system (see Predefined Responses). Each response has one or more actions associated with it. Each action can be activated or deactivated to fit your particular work environment and schedule.
The predefined actions are:
You can also write your own commands that correct or mitigate conditions, and run them through the Run program option.
You might specify different actions based on when the monitored condition occurs. For example, you could have one set of actions to respond to a condition during working hours and another set to respond to a condition on nights and weekends. To be notified of events when you are away from your terminal, your actions must include email, broadcasting, or logging.
This section describes how to start using the Monitoring application. You can use the command line to do the following:
To associate a response with a condition, use the following commands (see the command man pages or Cluster Systems Management for Linux Technical Reference for detailed usage information):
To view events, use the lsaudrec command to display the audit log. You can use the logevent predefined script to log events to a file.
To stop monitoring, use the stopcondresp command.
The following scenarios demonstrate the most frequently performed monitoring tasks. See the Cluster Systems Management for Linux Technical Reference or the command man pages for detailed usage information.
Name Monitoring Status
"/tmp space used" "Not monitored"
"var space used" "Monitored"
(more conditions listed...)

Name
"Critical notification"
"Warning notification"
"Informational notification"
"Remove unwanted files"
(more responses listed...)

Condition Response State
"/tmp space used" "Broadcast event on-shift" Active
"/tmp space used" "E-mail root anytime" Not Active
startcondresp "/tmp space used" "critical notification" "remove unwanted files"
stopcondresp "/tmp space used"
To stop monitoring the condition "/tmp space used" with a specific response, "critical notification," type:
stopcondresp "/tmp space used" "critical notification"
mkcondition -c "/tmp space used" "my test condition"
lsaudrec -l
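Because condition and response names contain spaces, quoting matters when these commands are issued from a script. The following is a minimal Python sketch, assuming only the argument order shown in the scenarios above (the condition name followed by zero or more response names). The helper function names are hypothetical and are not part of the product.

```python
import shlex

def startcondresp(condition, *responses):
    """Build the argument list for startcondresp (helper name is hypothetical)."""
    return ["startcondresp", condition, *responses]

def stopcondresp(condition, *responses):
    """Build the argument list for stopcondresp (helper name is hypothetical)."""
    return ["stopcondresp", condition, *responses]

# Each name stays a single argument, so embedded spaces need no manual quoting.
start_argv = startcondresp("/tmp space used", "critical notification",
                           "remove unwanted files")
stop_argv = stopcondresp("/tmp space used", "critical notification")

# shlex.join renders a shell-safe command line for display or logging.
print(shlex.join(start_argv))
print(shlex.join(stop_argv))
```

On a node where the commands are installed, an argument list built this way could be passed to subprocess.run(argv, check=True) without going through a shell.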
For a complete list of predefined commands, scripts, and utilities, see Predefined Commands, Scripts, Utilities, and Files.
For information about monitoring events, rearm events, actions, and errors that have occurred, view the audit log by using the lsaudrec command. See the lsaudrec man page and Using the Audit Log to Track Monitoring Activity for details.
Audit log records include the following:
The administrator can use the audit log to track activity that may not be visible otherwise because the activity is related to subsystems running in the background. To list audit log records, use the lsaudrec command. To remove audit log records, use the rmaudrec command. For details see the command man pages, or Cluster Systems Management for Linux Technical Reference.
The Cluster Systems Management for Linux Technical Reference contains information about predefined scripts that are provided with the Event Response resource manager (ERRM). The following scripts are provided:
You can also use existing operating system commands and user-written scripts in the definition of an action.
The displayevent, logevent, msgevent, notifyevent, and wallevent scripts are examples of the types of actions that system administrators can use to respond to events. The displayevent script displays an event or a rearm event to a specified X-window display. The logevent script appends a formatted string containing the specifics of an event to a user-specified text file. The msgevent script sends an event or a rearm event to a specified user's console. The notifyevent script captures the event information and sends the event information via UNIX mail to a specified userid. The wallevent script broadcasts a message to all users who are logged in. For a full description of these scripts, see Cluster Systems Management for Linux Technical Reference or the command man pages.
You can use these scripts as-is or treat them as templates by copying and modifying them to create new scripts that suit your needs. For example, to use the wallevent script as a template for a page event command, do the following:
For a command to run in response to an event or a rearm event defined by a condition, the command must be included as an action in an Event Response resource. When an Event Response resource is defined, specify the entire path name for a script that is used within an action. Use the Event Response resource manager commands to set up responses.
Test any scripts or commands that you have created or modified before you use them as actions in a production environment.
After ERRM has subscribed to RMC to monitor a condition and that condition occurs, the ERRM runs commands in the user's operating system environment. The Event Response resource contains a list of commands to be run. Before each command is run, the following environment variables are established for the command to use (see Event Response Resource Manager for a detailed description of the ERRM):
The following data types are represented with this environment variable as a decimal string: CT_INT32, CT_UINT32, CT_INT64, CT_UINT64, CT_FLOAT32, and CT_FLOAT64.
CT_CHAR_PTR is represented as a string for this environment variable.
CT_BINARY_PTR is represented as a hexadecimal string separated by spaces.
CT_SD_PTR is enclosed in square brackets, and the individual entries within the SD are separated by commas. Arrays within an SD are enclosed within braces {}. For example: ["My Resource Name",{1,5,7},{0,9000,20000},{7000,11000,25000}]. See the definition of ERRM_SD_DATA_TYPES for an explanation of the data types that these values represent.
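A response script that receives a structured data value in this form may need to split it into its entries. The following is a minimal Python sketch, assuming that every entry is either a double-quoted string, a number, or a brace-enclosed array of those. Binary and resource handle entries are not handled, and the function name parse_sd is hypothetical.

```python
import json

def parse_sd(text):
    """Parse an SD string such as ["name",{1,5,7}] into nested Python lists."""
    out = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            # Copy string contents unchanged, tracking backslash escapes.
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "{":
            out.append("[")   # array braces become JSON list brackets
        elif ch == "}":
            out.append("]")
        else:
            out.append(ch)
    return json.loads("".join(out))

sd = parse_sd('["My Resource Name",{1,5,7},{0,9000,20000},{7000,11000,25000}]')
print(sd[0])     # the string entry
print(sd[1][2])  # third value of the first array entry
```

Rewriting only the braces that fall outside quoted strings lets the standard JSON parser do the rest of the work.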
(See "Resource Handle" for a definition and an example of a resource handle.)
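A user-written response command can read the environment variables that ERRM establishes before it runs the command. The following is a minimal Python sketch of such a command. The variable names ERRM_TYPE, ERRM_COND_NAME, ERRM_RSRC_NAME, and ERRM_VALUE are assumptions recalled from ERRM documentation and are not confirmed by the text of this section, so verify them against the command man pages before relying on them. The function name format_event is hypothetical.

```python
import os

def format_event(env):
    """Build a one-line summary from ERRM-style environment variables.

    The variable names below are assumptions; missing ones fall back
    to 'unknown' so the script never fails on an absent variable.
    """
    fields = (
        ("type", env.get("ERRM_TYPE", "unknown")),
        ("condition", env.get("ERRM_COND_NAME", "unknown")),
        ("resource", env.get("ERRM_RSRC_NAME", "unknown")),
        ("value", env.get("ERRM_VALUE", "unknown")),
    )
    return " ".join(f"{key}={value!r}" for key, value in fields)

if __name__ == "__main__":
    # When run as an action, the variables come from the process environment.
    print(format_event(os.environ))
```

Taking the environment as a parameter makes it possible to test the script with a plain dictionary before it is used as an action.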
The information in this section is for advanced users who want to:
Permissible data types and operators are described, and the order of precedence for the operators is included. RMC uses these functions to match a selection string against the persistent attributes of a resource and to implement the evaluation of an event expression or a rearm expression.
An expression is similar to a C language statement or the WHERE
clause of an SQL query. It is composed of variables, operators, and
constants. The C and SQL syntax styles may be intermixed within a
single expression. The following table relates the RMC terminology to
SQL terminology:
RMC | SQL |
---|---|
attribute name | column name |
select string | WHERE clause |
operators | predicates, logical connectives |
resource class | table |
SQL syntax is supported for selection strings, with the following restrictions:
The term variable is used in this context to mean the column name or
attribute name in an expression. Variables and constants in an
expression may be one of the following data types that are supported by the
RMC subsystem:
Symbolic Name | Description |
---|---|
CT_INT32 | Signed 32-bit integer |
CT_UINT32 | Unsigned 32-bit integer |
CT_INT64 | Signed 64-bit integer |
CT_UINT64 | Unsigned 64-bit integer |
CT_FLOAT32 | 32-bit floating point |
CT_FLOAT64 | 64-bit floating point |
CT_CHAR_PTR | Null-terminated string |
CT_BINARY_PTR | Binary data - arbitrary-length block of data |
CT_RSRC_HANDLE_PTR | Resource handle - an identifier for a resource that is unique over space and time (20 bytes) |
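As a check on the 20-byte size listed for a resource handle, a value written as space-separated hexadecimal groups can be decoded and measured. The following is a minimal Python sketch. It assumes that the six-group literal shown later in this section is a resource handle and that each 0x-prefixed group contributes its bytes in the order written. The function name parse_hex_literal is hypothetical.

```python
def parse_hex_literal(literal):
    """Decode space-separated 0x-prefixed hex groups into one bytes value."""
    chunks = []
    for token in literal.split():
        if not token.lower().startswith("0x"):
            raise ValueError(f"expected a 0x-prefixed group, got {token!r}")
        # bytes.fromhex requires an even number of hex digits per group.
        chunks.append(bytes.fromhex(token[2:]))
    return b"".join(chunks)

handle = parse_hex_literal(
    "0x4018 0x0001 0x00000000 0x0069684c 0x00519686 0xaf7060fc"
)
print(len(handle))  # total number of bytes in the decoded value
```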
In addition to the base data types, aggregates of the base data types may be used as well. The first aggregate data type is similar to a structure in C in that it can contain multiple fields of different data types. This aggregate data type is referred to as structured data (SD). The individual fields in the structured data are referred to as structured data elements or simply elements. Each element of a structured data type may have a different data type which can be one of the base types in the preceding table or any of the array types discussed in the next section, except for the structured data array.
The second aggregate data type is an array. An array contains zero or more values of the same data type, such as an array of CT_INT32 values. Each of the array types has an associated enumeration value (CT_INT32_ARRAY, CT_UINT32_ARRAY). Structured data may also be defined as an array but is restricted to have the same elements in every entry of the array.
Literal values can be specified for each of the base data types as follows:
Entries of an array can be accessed by specifying a subscript as in the C programming language. The index corresponding to the first element of the array is always zero; for example, List[2] references the third element of the array named List. Only one subscript is allowed. It may be a variable, a constant, or an expression that produces an integer result. For example, if List is an integer array, then List[2]+4 produces the sum of 4 and the current value of the third entry of the array.
"0xabcd 0x01020304050607090a0b0c0d0e0f1011121314"
"0x4018 0x0001 0x00000000 0x0069684c 0x00519686 0xaf7060fc"
Variable names refer to values that are not part of the expression but are accessed while running the expression. For example, when RMC processes an expression, the variable names are replaced by the corresponding persistent or dynamic attributes of each resource.
Entries of an array may be accessed by specifying a subscript as in 'C'. The index corresponding to the first element of the array is always 0 (for example, List[2] refers to the third element of the array named List). Only one subscript is allowed. It may be a variable, a constant, or an expression that produces an integer result. A subscripted value may be used wherever the base data type of the array is used. For example, if List is an integer array, then "List[2]+4" produces the sum of 4 and the current value of the third entry of the array.
The elements of a structured data value can be accessed by using the following syntax:
<variable name>.<element name>
For example, a.b
The variable name is the name of the table column or resource attribute, and the element name is the name of the element within the structured data value. Either or both names may be followed by a subscript if the name is an array. For example, a[10].b refers to the element named b of the 11th entry of the structured data array called a. Similarly, a[10].b[3] refers to the fourth element of the array that is an element called b within the same structured data array entry a[10].
Variable names refer to values that are not part of an expression but are accessed while running the expression. When used to select a resource, the variable name is a persistent attribute. When used to generate an event, the variable name is a dynamic attribute. When used to select audit records, the variable name is the name of a field within the audit record.
A variable name is restricted to include only 7-bit ASCII characters that are alphanumeric (a-z, A-Z, 0-9) or the underscore character (_). The name must begin with an alphabetic character. When the expression is used by the RMC subsystem for an event or a rearm event, the name can have a suffix that is the '@' character followed by 'P', which refers to the previous observation.
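The naming rule above can be expressed as a single regular expression. The following is a minimal Python sketch of that rule as stated in this section. The function name and the sample attribute name PercentTotUsed are illustrative assumptions, and the check always accepts the optional @P suffix, whereas the text allows it only in event and rearm expressions.

```python
import re

# One ASCII letter, then ASCII letters, digits, or underscores,
# optionally followed by the @P suffix (previous observation).
VARIABLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*(@P)?")

def is_valid_variable_name(name):
    """Return True if name satisfies the rule described above."""
    return VARIABLE_NAME.fullmatch(name) is not None

for candidate in ("PercentTotUsed", "PercentTotUsed@P", "_hidden", "9lives"):
    print(candidate, is_valid_variable_name(candidate))
```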
Constants and variables may be combined by an operator to produce a result that in turn may be used with another operator. The resulting data type of the expression must be a scalar integer or floating-point value. If the result is zero, the expression is considered to be FALSE; otherwise, it is TRUE.
The set of operators that can be used in strings is summarized in the
following table:
Operator | Description | Left Data Types | Right Data Types | Example | Notes |
---|---|---|---|---|---|
+ | Addition | Integer,float | Integer,float | "1+2" results in 3 | None |
- | Subtraction | Integer,float | Integer,float | "1.0-2.0" results in -1.0 | None |
* | Multiplication | Integer,float | Integer,float | "2*3" results in 6 | None |
/ | Division | Integer,float | Integer,float | "2/3" results in 0 | None |
- | Unary minus | None | Integer,float | "-abc" | None |
+ | Unary plus | None | Integer,float | "+abc" | None |
.. | Range | Integers | Integers | "1..3" results in 1,2,3 | Shorthand for all integers between and including the two values |
% | Modulo | Integers | Integers | "10%2" results in 0 | None |
\| | Bitwise OR | Integers | Integers | "2\|4" results in 6 | None |
& | Bitwise AND | Integers | Integers | "3&2" results in 2 | None |
~ | Bitwise complement | None | Integers | ~0x0000ffff results in 0xffff0000 | None |
^ | Exclusive OR | Integers | Integers | 0x0000aaaa^0x0000ffff results in 0x00005555 | None |
>> | Right shift | Integers | Integers | 0x0fff>>4 results in 0x00ff | None |
<< | Left shift | Integers | Integers | "0x0ffff<<4" results in 0xffff0 | None |
== | Equality | All but SDs | All but SDs | "2==2" results in 1 | Result is true (1) or false (0) |
!= | Inequality | All but SDs | All but SDs | "2!=2" results in 0 | Result is true (1) or false (0) |
> | Greater than | Integer,float | Integer,float | "2>3" results in 0 | Result is true (1) or false (0) |
>= | Greater than or equal | Integer,float | Integer,float | "4>=3" results in 1 | Result is true (1) or false (0) |
< | Less than | Integer,float | Integer,float | "4<3" results in 0 | Result is true (1) or false (0) |
<= | Less than or equal | Integer,float | Integer,float | "2<=3" results in 1 | Result is true (1) or false (0) |
=~ | Pattern match | Strings | Strings | "abc"=~"a.*" results in 1 | Right operand is interpreted as an extended regular expression |
!~ | Not pattern match | Strings | Strings | "abc"!~"a.*" results in 0 | Right operand is interpreted as an extended regular expression |
=? | SQL pattern match | Strings | Strings | "abc"=? "a%" results in 1 | Right operand is interpreted as a SQL pattern |
!? | Not SQL pattern match | Strings | Strings | "abc"!? "a%" results in 0 | Right operand is interpreted as a SQL pattern |
\|< | Contains any | All but SDs | All but SDs | "{1..5}\|<{2,10}" results in 1 | Result is true (1) if left operand contains any value from right operand |
>< | Contains none | All but SDs | All but SDs | "{1..5}><{2,10}" results in 0 | Result is true (1) if left operand contains no value from right operand |
&< | Contains all | All but SDs | All but SDs | "{1..5}&<{2,10}" results in 0 | Result is true (1) if left operand contains all values from right operand |
\|\| | Logical OR | Integers | Integers | "(1<2)\|\|(2>4)" results in 1 | Result is true (1) or false (0) |
&& | Logical AND | Integers | Integers | "(1<2)&&(2>4)" results in 0 | Result is true (1) or false (0) |
! | Logical NOT | None | Integers | "!(2==4)" results in 1 | Result is true (1) or false (0) |
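The three containment operators in the table can be modeled with ordinary set operations. The following is a minimal Python sketch that follows the definitions given in the Notes column. It is not the RMC implementation, and the helper names are hypothetical. A range such as 1..5 is written here as the tuple (1, 5).

```python
def expand(*items):
    """Expand integers and inclusive (low, high) ranges into a set."""
    values = set()
    for item in items:
        if isinstance(item, tuple):
            low, high = item
            values.update(range(low, high + 1))  # the range includes both ends
        else:
            values.add(item)
    return values

def contains_any(left, right):
    """1 if left contains at least one value from right, else 0."""
    return int(bool(left & right))

def contains_none(left, right):
    """1 if left contains no value from right, else 0."""
    return int(not (left & right))

def contains_all(left, right):
    """1 if left contains every value from right, else 0."""
    return int(right <= left)

left = expand((1, 5))   # models {1..5}
right = expand(2, 10)   # models {2,10}
print(contains_any(left, right), contains_none(left, right),
      contains_all(left, right))
```

Because the left operand contains 2 but not 10, only the contains-any test succeeds for these operands.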
When integers of different signs or size are operands of an operator, standard
C style casting is implicitly performed. When an expression with
multiple operators is evaluated, the operations are performed in the order
defined by the precedence of the operator. The default precedence can
be overridden by enclosing the portion or portions of the expression to be
evaluated first in parentheses (). For example, in the expression
"1+2*3", multiplication is normally performed before addition to produce a
result of 7. To evaluate the addition operator first, use parentheses
as follows: "(1+2)*3". This produces a result of 9. The
default precedence rules are shown in the following table. All
operators in the same table cell have equal precedence.
Operators | Description |
---|---|
. | Structured data element separator |
~ | Bitwise complement |
* | Multiplication |
+ | Addition |
<< | Left shift |
< | Less than |
== | Equality |
& | Bitwise AND |
^ | Bitwise exclusive OR |
\| | Bitwise inclusive OR |
&& | Logical AND |
\|\| | Logical OR |
, | List separator |
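The effect of precedence and parentheses on the "1+2*3" example can be reproduced with a small precedence-climbing evaluator. The following is a minimal Python sketch that handles only non-negative integer literals, the + and * operators, and parentheses. It is an illustration of the general technique, not the RMC parser, and the numeric precedence levels are arbitrary values chosen for this sketch.

```python
import re

# Higher numbers bind more tightly. Only two operators are modeled here.
PRECEDENCE = {"*": 2, "+": 1}

def tokenize(text):
    """Split the input into integer literals, operators, and parentheses."""
    return re.findall(r"\d+|[+*()]", text)

def evaluate(text):
    """Evaluate an expression of integers, +, *, and parentheses."""
    tokens = tokenize(text)
    pos = 0

    def parse_primary():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expr(1)  # parentheses restart at the lowest level
            pos += 1               # skip the closing parenthesis
            return value
        return int(token)

    def parse_expr(min_prec):
        nonlocal pos
        left = parse_primary()
        while (pos < len(tokens) and tokens[pos] in PRECEDENCE
               and PRECEDENCE[tokens[pos]] >= min_prec):
            op = tokens[pos]
            pos += 1
            # Operators that bind more tightly are consumed by the recursion.
            right = parse_expr(PRECEDENCE[op] + 1)
            left = left * right if op == "*" else left + right
        return left

    return parse_expr(1)

print(evaluate("1+2*3"))    # multiplication is applied first
print(evaluate("(1+2)*3"))  # parentheses force the addition first
```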
Two types of pattern matching are supported: extended regular expressions, and patterns that are compatible with the standard SQL LIKE predicate. The SQL LIKE type of pattern may include the following special characters:
Some examples of the types of expressions that can be constructed follow:
Name =~'tr.*0' Name LIKE 'tr%0'
IntList|<{1,3,5..7} IntList in (1,3,5..7)
(Name LIKE "tr%0")&&(IntList|<(1,3,5..7)) (Name=~'tr.*0') AND (IntList IN {1,3,5..7})
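The correspondence between the two pattern styles in the examples above can be sketched by translating a LIKE pattern into a regular expression. The following is a minimal Python sketch, assuming the standard SQL wildcards (% for zero or more characters, _ for exactly one character) and a match against the whole string. It does not handle an escape character, and the function names are hypothetical.

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into an equivalent regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero or more characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return "".join(parts)

def sql_like(value, pattern):
    """Return True if the whole value matches the LIKE pattern."""
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None

print(like_to_regex("tr%0"))
print(sql_like("tr100", "tr%0"))
```

Under these assumptions, the LIKE pattern 'tr%0' translates to the regular expression 'tr.*0', which is the pairing shown in the first example line.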