Regular expression grammar

The Hyades Adapter Configuration Editor allows you to use regular expressions to describe how log files should be transformed into Common Base Event records. The following tables are a guideline to regular expression usage.

General rules

Regular expression matching

Expression Matches
{n,m} at least n but not more than m times
{n,} at least n times
{n} exactly n times
* 0 or more times
+ 1 or more times
? 0 or 1 times
. any character except \n
^ a null token matching the beginning of a string or line (that is, the position right after a newline or right before the beginning of a string)
$ a null token matching the end of a string or line (that is, the position right before a newline or right after the end of a string)
\b backspace inside a character class ([abcd])
\b null token matching a word boundary (\w on one side and \W on the other)
\B null token matching a boundary that isn't a word boundary
\A only at beginning of string
\Z only at end of string (or before newline at the end)
\n newline
\r carriage return
\t tab
\f formfeed
\d digit [0-9]
\D non-digit [^0-9]
\w word character [0-9a-z_A-Z]
\W non-word character [^0-9a-z_A-Z]
\s a whitespace character [ \t\n\r\f]
\S a non-whitespace character [^ \t\n\r\f]
\xnn the hexadecimal representation of character nn
\cD the corresponding control character
\nn or \nnn the octal representation of character nn unless a backreference.
\1, \2, \3 ... whatever the first, second, third, and so on, parenthesized group matched. This is called a backreference. If there is no corresponding group, the number is interpreted as an octal representation of a character.
\0 the null character. Any other backslashed character matches itself.
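*? 0 or more times, matching as few times as possible (minimal rather than greedy)
+? 1 or more times, matching as few times as possible
?? 0 or 1 times, preferring 0
{n}? exactly n times (equivalent to {n})
{n,}? at least n times, matching as few times as possible
{n,m}? at least n but not more than m times, matching as few times as possible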
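
As a rough illustration of the table above, the following Perl sketch contrasts greedy and minimal quantifiers and shows a backreference and a bounded quantifier. The sample log line and the variable names are invented for this example, and the adapter may apply expressions through a different API.

```perl
use strict;
use warnings;

# Hypothetical log line, invented for this illustration.
my $line = '[ERROR] [disk] disk full';

# Greedy: .* takes as much as possible, so the match runs to the last ']'.
my ($greedy) = $line =~ /\[(.*)\]/;

# Minimal: .*? takes as little as possible, so the match stops at the first ']'.
my ($minimal) = $line =~ /\[(.*?)\]/;

# Backreference: \1 must repeat whatever group 1 matched (here "disk").
my ($repeated) = $line =~ /\[(\w+)\] \1\b/;

# {n,m} with anchors: the whole string must be 2 to 4 digits, so 5 digits fail.
my $bounded = '12345' =~ /^\d{2,4}$/ ? 'match' : 'no match';

print "greedy:   $greedy\n";    # greedy:   ERROR] [disk
print "minimal:  $minimal\n";   # minimal:  ERROR
print "repeated: $repeated\n";  # repeated: disk
print "bounded:  $bounded\n";   # bounded:  no match
```

The greedy and minimal forms accept the same set of strings; they differ only in how much text a successful match consumes.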

Grouping and extracting matches

To group parts of an expression, use the metacharacters ( ). This allows the regular expression in the parentheses to be treated as a single unit. For example, the regular expression

severity:(1|2)
matches the pattern severity:1 or severity:2.
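
The effect of the parentheses can be sketched in Perl as follows. The log lines are invented for this illustration. Without the grouping, the alternation would split the whole expression into severity:1 on one side and a bare 2 on the other.

```perl
use strict;
use warnings;

# Hypothetical log lines, invented for this illustration.
my @lines = ('severity:1 disk full', 'severity:3 heartbeat', 'severity:2 retrying');

# Keep only the lines whose severity is 1 or 2.
my @matched = grep { /severity:(1|2)/ } @lines;

# Without parentheses the alternatives are "severity:1" and "2",
# so any text containing a 2 matches.
my $ungrouped_hit = '2 retries left' =~ /severity:1|2/   ? 'match' : 'no match';
my $grouped_hit   = '2 retries left' =~ /severity:(1|2)/ ? 'match' : 'no match';

print "$_\n" for @matched;            # severity:1 disk full, severity:2 retrying
print "ungrouped: $ungrouped_hit\n";  # ungrouped: match
print "grouped:   $grouped_hit\n";    # grouped:   no match
```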

To extract parts of a string that have been matched using the grouping metacharacters, use the special variables $1, $2, etc.

# Extract the name and URL from an anchor tag
$pattern = '<a href="secure_logon.html">Logon form</a>';
$pattern =~ m{<a href="(.*)">(.*)</a>};  # match using grouping
$url = $1;                # $1 equals secure_logon.html
$pagename = $2;           # $2 equals Logon form
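
One caveat with an extraction like this: (.*) is greedy, so it behaves as expected only when the input holds a single link. The Perl sketch below uses an invented two-link input to compare the greedy groups with their minimal counterparts (.*?). It illustrates the quantifier behavior and is not a recommendation for parsing HTML in general.

```perl
use strict;
use warnings;

# Hypothetical input with two links on one line, invented for this illustration.
my $html = '<a href="a.html">First</a> <a href="b.html">Second</a>';

# Greedy groups: the first (.*) runs on to the last '">' in the line.
my ($greedy_url, $greedy_name) = $html =~ m{<a href="(.*)">(.*)</a>};

# Minimal groups: each (.*?) stops at the earliest point that lets the match succeed.
my ($url, $name) = $html =~ m{<a href="(.*?)">(.*?)</a>};

print "greedy url:  $greedy_url\n";   # a.html">First</a> <a href="b.html
print "greedy name: $greedy_name\n";  # Second
print "url:         $url\n";          # a.html
print "name:        $name\n";         # First
```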

Perl 5 extended regular expressions

Expression Matches
(?#text) An embedded comment causing text to be ignored.
(?:regexp) Groups things like "()" but doesn't cause the group match to be saved.
(?=regexp) A zero-width positive lookahead assertion. For example, \w+(?=\s) matches a word followed by whitespace, without including the whitespace in the MatchResult.
(?!regexp) A zero-width negative lookahead assertion. For example, foo(?!bar) matches any occurrence of foo that isn't followed by bar. Because the assertion is zero-width, a(?!b)d matches ad: a is followed by a character that is not b (the d), and d follows the zero-width assertion.
(?imsx) One or more embedded pattern-match modifiers:
i enables case insensitivity
m enables multiline treatment of the input
s enables single-line treatment of the input
x enables extended whitespace comments
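
The extended constructs in this table can be tried out with a short Perl sketch such as the one below. The sample strings and variable names are invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical log line, invented for this illustration.
my $line = 'Warning: foobar failed, foobaz retried';

# (?i) makes the match case-insensitive; (?:...) groups without saving,
# so the only saved group is the text after the colon.
my ($message) = $line =~ /(?i)^(?:warning|error):\s*(.*)$/;

# (?=...) positive lookahead: the word before the comma, without the comma.
my ($before_comma) = $line =~ /(\w+)(?=,)/;

# (?!...) negative lookahead: "foo" only where "bar" does not follow.
my ($not_bar) = $line =~ /(foo(?!bar)\w+)/;

# (?x) ignores whitespace in the pattern; (?#...) embeds a comment.
my ($hours, $minutes) = '12:34' =~ /(?x) (\d+) (?#hours) : (\d+) (?#minutes) /;

print "message:      $message\n";       # foobar failed, foobaz retried
print "before comma: $before_comma\n";  # failed
print "not bar:      $not_bar\n";       # foobaz
print "time:         $hours h $minutes min\n";  # 12 h 34 min
```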
