Overview
A regular expression is a string of characters that tells the
searcher which string (or strings) you are looking for. The following
explains the format of regular expressions in detail. If you are
familiar with Perl, you already know the syntax. If you are familiar
with Unix, you should know that there are subtle differences between
Perl's regular expressions and Unix regular expressions.
Simple Regular Expressions
In its simplest form, a regular expression is just a word or phrase to search
for. For example,
gauss
would match any subject containing the string "gauss". Thus, subjects with
"gauss", "gaussian" or "degauss" would all be matched, as would a subject
containing the phrases "de-gauss the monitor" or "gaussian elimination." Here
are some more examples:
carbon
hydro
oxy
top ten
The word boundary metacharacter matches at the boundaries of words; that
is, at any point between a word character and whitespace, punctuation, or the
very beginning or end of the text. It looks like "\b". Its opposite, "\B",
matches only where there is no word boundary. Thus:
\bcomput
will match "computer" or "computing", but not "supercomputer", since there is
no space or punctuation between "super" and "computer". Similarly,
\Bcomput
will not match "computer" or "computing" unless it is part of a
bigger word such as "supercomputer" or "recomputing".
Note that the underscore (_) is considered a "word" character.
Thus,
super\b
will match "super computer" or "super-computer", but not "super_computer".
There is one other metacharacter starting with a backslash, the octal
metacharacter. The octal metacharacter looks like "\nnn",
where each "n" is a digit from zero to seven. It is used to specify control
characters that have no typed equivalent. For example,
\007
would find all subjects with an embedded ASCII "bell" character. (The bell
character has the ASCII value 7.) You will rarely need to use the octal
metacharacter.
There are three other metacharacters that may be of use. The first is the
braces metacharacter. This metacharacter follows a character
and contains two numbers separated by a comma (,) and surrounded by
braces ({}). It is like the star metacharacter, except that the number of
repetitions it matches must be within the minimum and maximum specified by the
two numbers in braces. Thus,
ab{3,5}c
will match "abbbc", "abbbbc" or "abbbbbc"; fewer than three or more than five
b's will not match. Likewise,
.{3,5}pentane
will match "cyclopentane", "isopentane" or "neopentane", but not "n-pentane"
on its own, since "n-" is only two characters long.
The second is the alternative metacharacter, represented by a vertical bar
(|). It indicates either/or behavior by separating two
or more possible choices. For example:
isopentane|cyclopentane
will match any subject containing the string "isopentane" or "cyclopentane" or
both. However, it will not match
"pentane", "n-pentane" or "neopentane".
The last is the brackets metacharacter. It matches one occurrence of any
character inside the brackets ([]). For example,
\s[cmt]an\s
will match "can", "man" and "tan" as separate words, but not "ban", "fan" or
"pan". Similarly,
2,[23]-dimethylbutane
will match "2,2-dimethylbutane" or "2,3-dimethylbutane", but not
"2,4-dimethylbutane", "2,23-dimethylbutane" or "2,-dimethylbutane".
Ranges of characters can be specified with a dash (-) within the
brackets. For example,
a[a-d]z
will match "aaz", "abz", "acz" or "adz", and nothing else. Likewise,
textfile0[3-5]
will match "textfile03", "textfile04" or "textfile05", and nothing else.
If you wish to include a dash within brackets as one of the characters to
match, instead of to denote a range, put the dash immediately before the
right bracket. Thus,
a[1234-]z
and
a[1-4-]z
both do the same thing. They both match "a1z", "a2z", "a3z", "a4z" or "a-z",
and nothing else.
The bracket metacharacter can also be inverted by placing a caret
(^) immediately after the left bracket. Thus,
textfile0[^02468]
matches any ten-character string starting with "textfile0" and ending with
any character except an even digit. Inversion and ranges can be combined, so
that
\W[^f-h]ood\W
matches any four-letter word ending in "ood" except for "food",
"good" or "hood". (Thus "mood" and "wood" would both be matched.)
Note that within brackets, ordinary quoting rules do not apply and other
metacharacters are not available. The only characters that can be quoted
in brackets are "[", "]" and "\". Thus,
[\[\\\]]abc
matches any four-character string ending with "abc" and starting with
"[", "]" or "\".
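The examples in this section can be checked mechanically. The sketch below uses Python's re module as a stand-in for the searcher. This is an assumption: Python's syntax is Perl-like but not identical, and it is assumed here that the searcher finds a pattern anywhere in a subject and ignores case, as described under Things To Remember. The helper `matches` is hypothetical. It only reports whether a pattern occurs anywhere in a subject string.

```python
import re

def matches(pattern: str, subject: str) -> bool:
    """Hypothetical stand-in for the searcher: case-insensitive search
    for `pattern` anywhere in `subject` (assumes Perl-like semantics)."""
    return re.search(pattern, subject, re.IGNORECASE) is not None

# Simple search: the word may appear anywhere in the subject.
for s in ["gauss", "gaussian", "degauss", "de-gauss the monitor", "gaussian elimination"]:
    assert matches("gauss", s)

# Word boundaries: \b needs a non-word character (or start/end) beside it.
assert matches(r"\bcomput", "computer")
assert not matches(r"\bcomput", "supercomputer")
assert matches(r"\Bcomput", "supercomputer")
assert not matches(r"\Bcomput", "computing")

# The underscore is a word character, so "super_" has no boundary after "r".
assert matches(r"super\b", "super computer")
assert not matches(r"super\b", "super_computer")

# Octal escape: \007 is the ASCII bell character.
assert matches(r"\007", "ring\x07ring")

# Braces: between three and five b's.
for s in ["abbbc", "abbbbc", "abbbbbc"]:
    assert matches(r"ab{3,5}c", s)
for s in ["abbc", "abbbbbbc"]:
    assert not matches(r"ab{3,5}c", s)
for s in ["cyclopentane", "isopentane", "neopentane"]:
    assert matches(r".{3,5}pentane", s)
assert not matches(r".{3,5}pentane", "n-pentane")

# Alternation: either choice (or both) may appear.
for s in ["isopentane", "cyclopentane"]:
    assert matches(r"isopentane|cyclopentane", s)
for s in ["pentane", "n-pentane", "neopentane"]:
    assert not matches(r"isopentane|cyclopentane", s)

# Brackets: one character from the set (subjects padded so \s can match).
for w in ["can", "man", "tan"]:
    assert matches(r"\s[cmt]an\s", f" {w} ")
for w in ["ban", "fan", "pan"]:
    assert not matches(r"\s[cmt]an\s", f" {w} ")
for s in ["2,2-dimethylbutane", "2,3-dimethylbutane"]:
    assert matches(r"2,[23]-dimethylbutane", s)
for s in ["2,4-dimethylbutane", "2,23-dimethylbutane", "2,-dimethylbutane"]:
    assert not matches(r"2,[23]-dimethylbutane", s)

# Ranges, and a literal dash placed just before the right bracket.
for s in ["aaz", "abz", "acz", "adz"]:
    assert matches(r"a[a-d]z", s)
for s in ["a1z", "a2z", "a3z", "a4z", "a-z"]:
    assert matches(r"a[1234-]z", s) and matches(r"a[1-4-]z", s)
assert not matches(r"a[1-4-]z", "a5z")

# Inversion, alone and combined with a range.
assert matches(r"textfile0[^02468]", "textfile07")
assert not matches(r"textfile0[^02468]", "textfile04")
for w in ["mood", "wood"]:
    assert matches(r"\W[^f-h]ood\W", f" {w} ")
for w in ["food", "good", "hood"]:
    assert not matches(r"\W[^f-h]ood\W", f" {w} ")

# Quoting inside brackets: only [, ] and \ are escaped.
for c in ["[", "]", "\\"]:
    assert matches(r"[\[\\\]]abc", c + "abc")
assert not matches(r"[\[\\\]]abc", "xabc")
```

Each assertion restates one claim from the text. Where a pattern uses `\s` or `\W`, the test subject is padded with spaces because those metacharacters must consume a real non-word character.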
Forbidden Characters
Because of the way the searcher works, the following metacharacters should
not be used, even though they are valid Perl metacharacters. They
are:
Things To Remember
Here are some other things you should know about regular expressions.
mopac
and
Mopac
and
MOPAC
all search for the same set of strings. Each will match "mopac", "MOPAC",
"Mopac", "mopaC", "MoPaC", "mOpAc" and so forth. Thus you need not worry
about capitalization. (Note, however, that metacharacters must still have
the proper case. This is especially important for metacharacters whose
case determines whether their meaning is reversed, such as "\b" and "\B".)
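The sketch below shows case-insensitive matching with case-sensitive metacharacters in Python. It assumes the searcher behaves like `re.search` with the `re.IGNORECASE` flag. That is an illustration, not a documented detail of the searcher. The function `found` is a hypothetical helper. The ordinary letters in a pattern ignore case, but "\b" and "\B" still mean opposite things.

```python
import re

SUBJECTS = ["mopac", "MOPAC", "Mopac", "mopaC", "MoPaC", "mOpAc"]

def found(pattern: str, subject: str) -> bool:
    # Hypothetical stand-in for the searcher: case-insensitive search.
    return re.search(pattern, subject, re.IGNORECASE) is not None

# Every capitalization of the pattern finds every capitalization of the subject.
for pattern in ["mopac", "Mopac", "MOPAC"]:
    assert all(found(pattern, s) for s in SUBJECTS)

# The flag does not touch metacharacters: \b (boundary) and \B (non-boundary)
# remain opposites, so their case still matters.
assert found(r"\bcomput", "computer")
assert not found(r"\Bcomput", "computer")
```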
Design and production courtesy
GUI Online Productions