Regular expression syntax

A regular expression is a coded string that defines the set of strings that match it. A regular expression is made up of one or more branches (alternatives). Each branch is a sequence of characters, character classes, or parenthesized expressions, any of which can be followed by a modifier that specifies a repetition rule.
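As a rough illustration of that structure, the following sketch uses Python's standard re module as a stand-in for the product's own regular expression engine. That substitution is an assumption, and the two dialects differ in places. The pattern and the helper name matches are invented for the example. re.fullmatch is used because the expression is meant to describe the whole string, not a fragment of it.

```python
import re

# Hypothetical pattern with two branches separated by "|":
#   branch 1: (ab)+      a parenthesized expression repeated one or more times
#   branch 2: c[0-9]{2}  a character, then a character class repeated exactly twice
PATTERN = r"(ab)+|c[0-9]{2}"


def matches(text: str) -> bool:
    """Return True if the whole of text belongs to the set defined by PATTERN."""
    return re.fullmatch(PATTERN, text) is not None


for candidate in ["ab", "abab", "c42", "abc", "c4"]:
    print(candidate, matches(candidate))
```

Here "ab", "abab", and "c42" are members of the set, while "abc" and "c4" are not, because neither branch accounts for the entire string.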

The supported regular expression syntax is a subset of XML Schema regular expressions. For the full syntax, see Appendix F of XML Schema Part 2: Datatypes, which is available on the World Wide Web Consortium (W3C) Web site.

The following table lists the supported regular expression syntax elements:

Metacharacter    Meaning
\                escape
.                any single character
*                preceding character 0 or more times
+                preceding character 1 or more times
?                preceding character 0 or 1 time
{...}            occurrences of the preceding character (note 1)
[...]            match one of the characters in the class (note 1)
[^...]           match one character that is not in the class (note 1)
(...)            group the expressions (note 1)
|                match either the preceding or the following expression

Escape sequence  Meaning
\n               new line
\r               carriage return
\t               tab
\e               escape

Class code       Meaning
\d               digit [0-9]
\D               non-digit [^0-9] (note 2)
\s               whitespace [ \t\n\r]
\S               non-whitespace [^ \t\n\r] (note 2)
\p{L}            all letters (note 3)
\p{N}            all numbers, similar to \d (note 4)
[\p{N}\p{L}]     all numbers and all letters, similar to \w (note 4)
\P{L}            not letters, equivalent to [^\p{L}]
\P{N}            not numbers, equivalent to [^\p{N}]

Range            Meaning
{n}              exactly n times
{n,}             at least n times
{n,m}            at least n but no more than m times
{0,m}            zero to m times
Notes:
  1. The ellipsis (...) is used to indicate anything inside the { }, or [ ], or ( ) characters.
  2. The caret (^) means "not" when inside the [ ] characters.
  3. Consult Appendix F of the document XML Schema Part 2: Datatypes for other characters that can be used in place of L and N.
  4. Consult Appendix F of the document XML Schema Part 2: Datatypes for the precise differences.
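
The class codes, escapes, and ranges in the table can be tried out with a small sketch like the one below. It assumes Python's standard re module as an approximation of the supported syntax. The re.ASCII flag restricts \d and \s to ASCII characters, which is close to, but not identical to, the definitions given in the table. The standard re module does not accept \p{L} or \p{N}, so those class codes are left out. The helper full_match and the sample data are invented for the illustration.

```python
import re


def full_match(pattern: str, text: str) -> bool:
    """Return True if pattern matches all of text, with class codes limited to ASCII."""
    return re.fullmatch(pattern, text, flags=re.ASCII) is not None


# (pattern, candidate text, expected result) for hypothetical field formats
SAMPLES = [
    (r"\d{4}", "2024", True),          # {n}: exactly four digits
    (r"\d{4}", "202", False),          # only three digits
    (r"[A-Z]{2,}", "ABC", True),       # {n,}: at least two uppercase letters
    (r"[A-Z]{2,}", "A", False),        # only one
    (r"\d{1,3}", "1234", False),       # {n,m}: more than three digits
    (r"[^,]{0,5}", "", True),          # {0,m}: zero occurrences are allowed
    (r"[^,]{0,5}", "a,b", False),      # the negated class excludes the comma
    (r"\S+\s\S+", "key value", True),  # non-whitespace, one whitespace, non-whitespace
    (r"a\.b", "a.b", True),            # backslash escapes the "." metacharacter
    (r"a\.b", "axb", False),           # an escaped "." no longer matches any character
]

for pattern, text, expected in SAMPLES:
    result = full_match(pattern, text)
    assert result == expected
    print(f"{pattern!r} against {text!r}: {result}")
```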

The following table gives some examples of regular expression syntax. See Using regular expressions to parse data elements for some examples of their use.

Regular expression data pattern  Meaning
a                                Match the character "a"
.                                Match any one character
a+                               Match a string of one or more "a"
a*                               Match a string of zero or more "a"
a?                               Match zero or one "a"
a{3}                             Match a string of exactly three "a", that is, "aaa"
a{3,}                            Match a string of three or more "a"
a{2,4}                           Match a string with a minimum of two and a maximum of four occurrences of "a"
[abc]                            Match any one of the characters "a", "b", or "c"
[a-zA-Z]                         Match any one character in the range "a" to "z", or in the range "A" to "Z". The range of characters matched is based on the Unicode code points of the characters specified.
[^abc]                           Match any one character except "a", "b", or "c"
(ab)+                            Match one or more repetitions of the string "ab"
(ab)|(cd)                        Match either of the strings "ab" or "cd"
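
The rows of this table can be checked mechanically. The sketch below again assumes Python's standard re module as an approximation of the supported syntax. The accepted and rejected strings are chosen for the illustration and are not taken from the product documentation.

```python
import re

# (pattern, strings the pattern should accept, strings it should reject)
EXAMPLES = [
    ("a", ["a"], ["b", "aa"]),
    (".", ["x"], ["", "xy"]),
    ("a+", ["a", "aaa"], [""]),
    ("a*", ["", "aa"], ["b"]),
    ("a?", ["", "a"], ["aa"]),
    ("a{3}", ["aaa"], ["aa", "aaaa"]),
    ("a{3,}", ["aaa", "aaaaa"], ["aa"]),
    ("a{2,4}", ["aa", "aaaa"], ["a", "aaaaa"]),
    ("[abc]", ["b"], ["d", "ab"]),
    ("[a-zA-Z]", ["q", "Q"], ["5"]),
    ("[^abc]", ["d"], ["a"]),
    ("(ab)+", ["ab", "abab"], ["aba"]),
    ("(ab)|(cd)", ["ab", "cd"], ["abcd"]),
]


def check(pattern, accepted, rejected):
    """Return True if pattern fully matches every accepted string and no rejected one."""
    accepts_all = all(re.fullmatch(pattern, s) for s in accepted)
    rejects_all = not any(re.fullmatch(pattern, s) for s in rejected)
    return accepts_all and rejects_all


for pattern, accepted, rejected in EXAMPLES:
    print(pattern, check(pattern, accepted, rejected))
```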