
Regular Expression Parser

Overview

The Regular Expression Parser validates and parses a Connector's input or output against a regular expression. It uses the free Regular Expressions for Java library "gnu.regexp", available at http://www.cacas.org/java/gnu/regexp/. Please consult the "gnu.regexp" documentation for the supported regular expression notation and for the library's specification.

The Regular Expression Parser is designed as a useful example that shows how to implement your own Parser in Java and integrate it in the Metamerge Integrator.

Functional Specification

Configuration

The Parser provides the following parameters:

Parameter
    Description

class
    com.architech.parser.rspRegExpParser

regularExpression
    Specifies the regular expression the Parser will use.
    Subexpressions are enclosed in parentheses (for example: "ab(c*)d(e*)f"). When the Parser is used in read mode, the subexpressions correspond to the Entry's Attributes (in the example above, "c*" corresponds to the Entry's first Attribute and "e*" to its second Attribute).

attributeNames
    Specifies the names of the Attributes, delimited with semicolons (for example: "Name;Value").
    The interpretation of this parameter depends on the Parser mode:
    o read mode: The names are used for the Attributes corresponding to the subexpressions of the regular expression. Mapping is done in order of appearance, i.e. the first subexpression corresponds to an Attribute named with the first name from the "attributeNames" parameter, and so on.
    o write mode: The names are used to define the output text, which is formed by concatenating the values of the Attributes enumerated in the "attributeNames" parameter.
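The mapping between subexpressions and Attribute names can be illustrated with a minimal sketch. It is not the Parser's actual source: it uses the standard java.util.regex package as a stand-in for gnu.regexp, it assumes the whole line has to match the expression, and the class and method names (SubexpressionSketch, capture) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for gnu.regexp here.
class SubexpressionSketch {

    // Returns the text captured by each parenthesized subexpression, in order
    // of appearance, or an empty list when the line does not match.
    static List<String> capture(String regularExpression, String line) {
        Matcher m = Pattern.compile(regularExpression).matcher(line);
        List<String> values = new ArrayList<>();
        if (m.matches()) { // assumption: the whole line must match
            for (int i = 1; i <= m.groupCount(); i++) {
                values.add(m.group(i));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Values taken from the examples in the parameter descriptions.
        String[] names = "Name;Value".split(";");
        List<String> values = capture("ab(c*)d(e*)f", "abcccdeef");
        for (int i = 0; i < values.size(); i++) {
            System.out.println(names[i] + " = " + values.get(i));
        }
        // prints:
        // Name = ccc
        // Value = ee
    }
}
```

With the example expression, the first subexpression "c*" supplies the value paired with "Name" and the second subexpression "e*" supplies the value paired with "Value".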

Input

A single line from the input will correspond to a single Entry.
o If the line doesn't match the regularExpression then an Entry with no Attributes is returned.
o If the line matches the regularExpression then an Entry is populated with Attributes and returned. The number of Attributes assigned is equal to the number of subexpressions in the regularExpression and each Attribute's value is the substring of the input line that matches the corresponding subexpression.
If the attributeNames parameter contains fewer names than the regularExpression parameter has subexpressions, Attribute names are added until the two numbers are equal. Each added name consists of the prefix "ATTR_NAME_" followed by the number of the added name (starting from 0), i.e. ATTR_NAME_0, ATTR_NAME_1, ATTR_NAME_2, etc.
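The read-mode behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Parser's real implementation: java.util.regex replaces gnu.regexp, a plain Map stands in for a Metamerge Entry, the whole line is assumed to have to match, and the generated names are numbered by how many names have been added so far (one reading of "starting from 0"). ReadModeSketch, effectiveNames and readLine are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a Map stands in for an Entry, java.util.regex for gnu.regexp.
class ReadModeSketch {

    // Pads the configured names with ATTR_NAME_0, ATTR_NAME_1, ... until there
    // is one name per subexpression.
    static List<String> effectiveNames(String attributeNames, int groupCount) {
        List<String> names = new ArrayList<>();
        if (!attributeNames.isEmpty()) {
            names.addAll(Arrays.asList(attributeNames.split(";")));
        }
        int added = 0; // assumption: numbering counts the names added, from 0
        while (names.size() < groupCount) {
            names.add("ATTR_NAME_" + added++);
        }
        return names;
    }

    // Turns one input line into an "Entry": empty when the line does not
    // match, otherwise one Attribute per subexpression.
    static Map<String, String> readLine(String regularExpression,
                                        String attributeNames,
                                        String line) {
        Map<String, String> entry = new LinkedHashMap<>();
        Matcher m = Pattern.compile(regularExpression).matcher(line);
        if (!m.matches()) {
            return entry; // no match: Entry with no Attributes
        }
        List<String> names = effectiveNames(attributeNames, m.groupCount());
        for (int i = 1; i <= m.groupCount(); i++) {
            entry.put(names.get(i - 1), m.group(i));
        }
        return entry;
    }

    public static void main(String[] args) {
        // Only one name is configured for two subexpressions.
        System.out.println(readLine("ab(c*)d(e*)f", "Name", "abccdef"));
        // prints: {Name=cc, ATTR_NAME_0=e}
    }
}
```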

Output

All Attributes enumerated in the attributeNames parameter that exist in the Entry are concatenated to form a single string (in the order they appear in the attributeNames parameter).
o If this string matches the regularExpression, it is printed on a single line in the output.
o If this string does not match the regularExpression, nothing is printed in the output and the "no-match event" is logged.
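The write-mode rules above can be sketched in the same spirit. Again this is only an approximation under assumptions: java.util.regex is used in place of gnu.regexp, a Map stands in for the Entry, the concatenated string is assumed to have to match the expression as a whole, and writing to System.err stands in for whatever logging the Parser really performs. WriteModeSketch and formatEntry are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Illustration only: a Map stands in for an Entry, java.util.regex for gnu.regexp.
class WriteModeSketch {

    // Builds the output line for one "Entry", or returns an empty Optional
    // (after reporting a no-match event) when the line fails validation.
    static Optional<String> formatEntry(String regularExpression,
                                        String attributeNames,
                                        Map<String, String> entry) {
        StringBuilder text = new StringBuilder();
        for (String name : attributeNames.split(";")) {
            String value = entry.get(name);
            if (value != null) { // only Attributes that exist in the Entry
                text.append(value);
            }
        }
        String line = text.toString();
        if (Pattern.matches(regularExpression, line)) {
            return Optional.of(line);
        }
        // Stand-in for the Parser's logging of the no-match event.
        System.err.println("no-match event: \"" + line + "\"");
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> entry = new LinkedHashMap<>();
        entry.put("Name", "abccd");
        entry.put("Value", "eef");
        formatEntry("ab(c*)d(e*)f", "Name;Value", entry)
                .ifPresent(System.out::println);
        // prints: abccdeef
    }
}
```

Because the values are concatenated in the order given by attributeNames, reordering that parameter changes the string that is validated, and can turn a match into a no-match.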

 

Source Code

You can view the source code of the Regular Expression Parser here.

The Regular Expression Parser source file (with JavaDocs) is included here.

 

Installation

1. Create a new folder named "RegExpParser" in the "jars" subfolder of the Metamerge Integrator (MI) root directory.
2. From the Regular Expressions for Java website, download the package gnu.regexp-1.1.3a.tar.gz. (If this link has changed, please go to the library's page at http://www.cacas.org/java/gnu/regexp/ and download the library's latest version.)
3. Extract the archive's contents keeping path information. Copy the file "gnu-regexp-1.1.3.jar" (placed in the "lib" folder) to the newly created "RegExpParser" folder.
4. Download the Regular Expression Parser jar archive regExpParser.jar.
5. Copy the file "regExpParser.jar" to the "RegExpParser" folder.

 

Downloads

An example configuration that demonstrates the Regular Expression Parser is included here.

 

Metamerge Integrator version 4.6 ©Copyright Metamerge AS 2000-2002 Last edited 2002-06-10