Regular Expression Parser
Overview
The Regular Expression Parser validates and parses a Connector's
input and output against a regular expression. It uses the free
Regular Expressions for Java library "gnu.regexp", available at
http://www.cacas.org/java/gnu/regexp/. Please consult the
"gnu.regexp" documentation for the supported regular expression
notation and for the library's specification.
The Regular Expression Parser is designed as a useful
example that shows how to implement your own Parser in Java and
integrate it in the Metamerge
Integrator.
Functional Specification
Configuration
The Parser provides the following parameters:
Parameter | Description
class | com.architech.parser.rspRegExpParser
regularExpression | Specifies the regular expression the Parser will use. Subexpressions are enclosed in parentheses (for example: "ab(c*)d(e*)f"). When the Parser is used in read mode, the subexpressions correspond to the Entry's Attributes (in the example above, "c*" corresponds to the Entry's first Attribute and "e*" to its second).
attributeNames | Specifies the names of the Attributes, delimited with semicolons (for example: "Name;Value"). The interpretation of this parameter depends on the Parser mode:
o read mode: The names are used for the Attributes corresponding to the subexpressions of the regular expression. Mapping is done in order of appearance, i.e. the first subexpression corresponds to an Attribute named with the first name from the "attributeNames" parameter, and so on.
o write mode: The names are used to define the output text, which is formed by concatenating the values of the Attributes enumerated in the "attributeNames" parameter.
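To make the subexpression notation concrete, the following is a minimal sketch of how the example expression "ab(c*)d(e*)f" captures two values from a line. It is an illustration only: it uses the standard java.util.regex package as a stand-in for "gnu.regexp", the class and method names are hypothetical, and it assumes the whole line must match the expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for gnu.regexp here,
// and SubexpressionDemo is a hypothetical name, not part of the Parser.
class SubexpressionDemo {

    // Returns the text captured by each parenthesised subexpression,
    // or an empty list when the whole line does not match (assumption).
    static List<String> captures(String regularExpression, String line) {
        Matcher m = Pattern.compile(regularExpression).matcher(line);
        List<String> values = new ArrayList<>();
        if (m.matches()) {
            // Group 0 is the whole match; subexpressions start at 1.
            for (int i = 1; i <= m.groupCount(); i++) {
                values.add(m.group(i));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // "ab(c*)d(e*)f" has two subexpressions: (c*) and (e*).
        System.out.println(captures("ab(c*)d(e*)f", "abcccdeef")); // [ccc, ee]
    }
}
```

With attributeNames set to "Name;Value", the first captured value ("ccc") would become the Attribute "Name" and the second ("ee") the Attribute "Value".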
Input
A single line from the input will correspond to a single Entry.
o If the line doesn't match the regularExpression, an Entry with no Attributes is returned.
o If the line matches the regularExpression, an Entry is populated with Attributes and returned. The number of Attributes assigned equals the number of subexpressions in the regularExpression, and each Attribute's value is the substring of the input line that matches the corresponding subexpression.
If the attributeNames parameter contains fewer names than the regularExpression parameter has subexpressions, Attribute names are added, as many as needed to make the numbers equal. Each added name consists of the prefix "ATTR_NAME_" followed by the number of the Attribute name added (starting from 0), i.e. ATTR_NAME_0, ATTR_NAME_1, ATTR_NAME_2, etc.
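The read-mode rules above can be sketched roughly as follows. This is not the Parser's actual source: java.util.regex replaces "gnu.regexp", a plain Map stands in for an Entry, and the names ReadModeSketch and parseLine are hypothetical. Two points are assumptions, since the text does not settle them: the whole line must match the expression, and the generated-name counter counts only the names that are added (so the first added name is ATTR_NAME_0 regardless of its position).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of read mode; a Map stands in for an Entry.
class ReadModeSketch {

    static Map<String, String> parseLine(String regularExpression,
                                         String attributeNames,
                                         String line) {
        Map<String, String> entry = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regularExpression).matcher(line);
        if (!m.matches()) {
            return entry; // no match: an Entry with no Attributes
        }
        // "".split(";") would yield one empty name, so treat it specially.
        String[] names = attributeNames.isEmpty()
                ? new String[0]
                : attributeNames.split(";");
        int generated = 0; // assumption: counts only the added names
        for (int i = 0; i < m.groupCount(); i++) {
            String name = i < names.length
                    ? names[i]
                    : "ATTR_NAME_" + generated++;
            entry.put(name, m.group(i + 1)); // subexpressions start at 1
        }
        return entry;
    }

    public static void main(String[] args) {
        // Three subexpressions but only two names: one name is generated.
        System.out.println(
            parseLine("(\\w+)=(\\w+) (\\w+)", "Name;Value", "color=red high"));
        // {Name=color, Value=red, ATTR_NAME_0=high}
    }
}
```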
Output
All Attributes enumerated in the
attributeNames parameter that exist in the Entry
are concatenated to form a single string (in the order they
appear in the attributeNames parameter).
o If this string matches the regularExpression, it
is printed on a single line in the output.
o If this string does not match the
regularExpression, nothing is printed in the output and the "no-match
event" is logged.
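The write-mode behaviour described above might look like the following sketch. As before, java.util.regex is used in place of "gnu.regexp", a Map stands in for an Entry, and WriteModeSketch and formatEntry are hypothetical names. Writing to standard error is only a placeholder for however the Parser actually logs the "no-match event", and the whole-string match is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical sketch of write mode; a Map stands in for an Entry.
class WriteModeSketch {

    // Returns the line to print, or empty when nothing should be printed.
    static Optional<String> formatEntry(String regularExpression,
                                        String attributeNames,
                                        Map<String, String> entry) {
        StringBuilder line = new StringBuilder();
        // Concatenate values in the order given by attributeNames.
        for (String name : attributeNames.split(";")) {
            String value = entry.get(name);
            if (value != null) { // names absent from the Entry are skipped
                line.append(value);
            }
        }
        String candidate = line.toString();
        if (Pattern.matches(regularExpression, candidate)) {
            return Optional.of(candidate);
        }
        // Placeholder for the Parser's real logging of the no-match event.
        System.err.println("no-match event: " + candidate);
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> entry = new LinkedHashMap<>();
        entry.put("Name", "abc");
        entry.put("Value", "123");
        System.out.println(formatEntry("[a-z]+\\d+", "Name;Value", entry)); // Optional[abc123]
        System.out.println(formatEntry("\\d+", "Name;Value", entry));       // Optional.empty
    }
}
```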
Source Code
You can view the source code of the Regular Expression Parser here.
The Regular Expression Parser source file (with JavaDocs) is included here.
Installation
1. Create a new folder, named "RegExpParser", in the "jars" subfolder of the
MI root directory.
2. From the Regular Expressions for Java website, download the package
gnu.regexp-1.1.3a.tar.gz. (If this link has changed, please go to the
library's page, http://www.cacas.org/java/gnu/regexp/, and download the
library's latest version.)
3. Extract the archive's contents keeping path information. Copy the file
"gnu-regexp-1.1.3.jar" (placed in the "lib" folder) to the newly created
"RegExpParser" folder.
4. Download the Regular Expression
Parser jar archive regExpParser.jar.
5. Copy the file "regExpParser.jar" to the
"RegExpParser" folder.
Downloads
An example configuration that demonstrates
the Regular Expression Parser is included here.