
XML SAX Parser

Overview

The XML SAX parser reads XML documents using a SAX2 parser. The main advantage of SAX2 is that you can start iterating over XML data without first reading in the whole file. The main disadvantage is that the SAX parser is read-only: you cannot use it to write XML files.
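The sketch below illustrates the event-driven style. It uses the standard JAXP SAX API that ships with the JDK, not the Metamerge parser itself, and the class name SaxStreamingDemo is hypothetical. The handler's startElement callback fires as each start-tag is parsed, so the application sees data while the document is still being read.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class using the JDK's JAXP SAX API (not the Metamerge parser).
// The handler is called back while the input is being parsed; nothing here
// requires the whole document to be loaded before processing starts.
final class SaxStreamingDemo {

    /** Returns element names in document order, as reported by startElement events. */
    static List<String> elementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                names.add(qName); // invoked once per start-tag, as soon as it is read
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }
}
```

For example, calling SaxStreamingDemo.elementNames("<Root><Entry/></Root>") collects Root and Entry in document order.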

The Metamerge parser is based on the Apache Xerces library.

For every start-tag read from the input document, the parser appends the tag name to the names of the enclosing start-tags to form a currentTag value. The currentTag value holds the path to the current position in the XML document. When character data is read, the parser either creates an attribute named after the currentTag with the character data as its value or, if that attribute already exists, appends the character data to it as an additional value.
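A minimal sketch of this currentTag bookkeeping, assuming the JDK's JAXP SAX API; TagPathCollector is a hypothetical class, not the Metamerge source. It keeps a stack of open tag names, joins them with "." to form the currentTag, and buffers character data (SAX may deliver it in several chunks) before storing it. Trimming and skipping whitespace-only text are assumptions chosen so the output matches the example on this page.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of the currentTag bookkeeping (not the Metamerge source).
final class TagPathCollector extends DefaultHandler {
    private final Deque<String> path = new ArrayDeque<>();  // open tags, root first
    private final StringBuilder text = new StringBuilder(); // pending character data
    private final Map<String, List<String>> attributes = new LinkedHashMap<>();

    static Map<String, List<String>> collect(String xml) throws Exception {
        TagPathCollector handler = new TagPathCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.attributes;
    }

    // currentTag: the open tag names joined with "."
    private String currentTag() {
        return String.join(".", path);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        flushText();          // text seen so far belongs to the parent tag
        path.addLast(qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // SAX may split text into several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        path.removeLast();
    }

    // Creates the attribute on first use, otherwise appends another value.
    // Assumption: surrounding whitespace is trimmed and blank text is ignored.
    private void flushText() {
        String value = text.toString().trim();
        text.setLength(0);
        if (!value.isEmpty()) {
            attributes.computeIfAbsent(currentTag(), k -> new ArrayList<>()).add(value);
        }
    }
}
```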

When an end-tag is encountered, the parser first checks whether the currentTag matches the GroupTag configured for the parser; if it does, the current entry is added to a queue that is read by the readEntry() method. Because SAX2 is event driven, the parser starts a separate thread that performs the XML parsing and notifies the parser when an entry is ready for consumption.
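The thread-and-queue design can be sketched as a producer/consumer pair. In this hypothetical ThreadedEntryReader, a background thread runs the SAX parse and puts each completed entry on a java.util.concurrent.BlockingQueue, while readEntry() blocks until an entry arrives. The end-of-input marker, the null return value at end of input and the plain equality test against the GroupTag are assumptions of this sketch, not documented Metamerge behavior.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical producer/consumer sketch: the parser thread queues finished entries,
// readEntry() takes them. GroupTag matching is plain equality here (assumption).
final class ThreadedEntryReader {
    private static final Map<String, List<String>> END = new LinkedHashMap<>(); // end-of-input marker
    private final BlockingQueue<Map<String, List<String>>> queue = new LinkedBlockingQueue<>();
    private volatile Exception failure;

    ThreadedEntryReader(String xml, String groupTag) {
        Thread worker = new Thread(() -> {
            try {
                SAXParserFactory.newInstance().newSAXParser()
                        .parse(new InputSource(new StringReader(xml)), new Handler(groupTag));
            } catch (Exception e) {
                failure = e;
            } finally {
                queue.add(END); // always wake up the consumer
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    /** Blocks until the next entry is ready; returns null when the document is exhausted. */
    Map<String, List<String>> readEntry() throws Exception {
        Map<String, List<String>> entry = queue.take();
        if (entry == END) {
            queue.add(END); // later calls also see end of input
            Exception f = failure;
            if (f != null) throw f; // surface a parse error to the consumer
            return null;
        }
        return entry;
    }

    private final class Handler extends DefaultHandler {
        private final String groupTag;
        private final Deque<String> path = new ArrayDeque<>();
        private final StringBuilder text = new StringBuilder();
        private Map<String, List<String>> current = new LinkedHashMap<>();

        Handler(String groupTag) { this.groupTag = groupTag; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            flushText();
            path.addLast(qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flushText();
            if (String.join(".", path).equals(groupTag)) {
                queue.add(current);              // hand the finished entry to readEntry()
                current = new LinkedHashMap<>(); // start collecting the next one
            }
            path.removeLast();
        }

        private void flushText() {
            String value = text.toString().trim();
            text.setLength(0);
            if (!value.isEmpty()) {
                current.computeIfAbsent(String.join(".", path), k -> new ArrayList<>()).add(value);
            }
        }
    }
}
```

The queue decouples the two sides: the parse can run ahead of readEntry(), and a parse exception is re-thrown to the consumer once the already queued entries have been read.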

The currentTag is composed of tag names separated by ".". The GroupTag specifies which tag path marks the boundary of an entry. If the GroupTag does not start with an asterisk (*), it is compared with the currentTag for equality. If the GroupTag starts with "*", the parser checks whether the currentTag contains the rest of the GroupTag (e.g. "Root.Entry.X" matches "*Root.Entry" but not "Root.Entry").
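The two matching rules can be written as a small predicate. GroupTagMatcher is a hypothetical helper; it assumes the leading "*" is stripped before the containment check, and it treats a blank GroupTag as matching every tag, following the saxGroupTag description in the configuration table.

```java
// Hypothetical helper mirroring the GroupTag rules (not the Metamerge source).
final class GroupTagMatcher {

    static boolean matches(String groupTag, String currentTag) {
        if (groupTag.isEmpty()) {
            return true; // blank GroupTag: every tag is returned as an entry
        }
        if (groupTag.startsWith("*")) {
            // "*" form: containment check against the GroupTag without the asterisk
            return currentTag.contains(groupTag.substring(1));
        }
        return currentTag.equals(groupTag); // plain form: exact equality
    }
}
```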

<Root>
  <Entry>
    <attribute>Big Data</attribute>
    <attribute>Blue Data</attribute>
  </Entry>
  <Entry>
    <attribute name="big">
      <size>12</size>
      <age>88</age>
    </attribute>
    <attribute name="blue">Blue Data</attribute>
  </Entry>
</Root>

Using "Root.Entry" as the GroupTag, the above XML document would yield two entries with the following attributes:

ENTRY [
    "Root.Entry.attribute": [ "Big Data", "Blue Data" ]
]

ENTRY [
    "Root.Entry.attribute#name": [ "big", "blue" ]
    "Root.Entry.attribute.size": [ "12" ]
    "Root.Entry.attribute.age": [ "88" ]
    "Root.Entry.attribute": [ "Blue Data" ]
]
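The following single-threaded sketch, again using the JDK's JAXP SAX API and a hypothetical EntryGrouper class, combines path building, GroupTag matching and entry boundaries to reproduce the two entries above. As in the example output, XML attributes are stored under names of the form currentTag#attributeName. saxIgnoreAttribute, saxRemovePrefix and characterSet are not modelled, and skipping whitespace-only text is an assumption.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical single-threaded sketch (not the Metamerge source). XML attributes
// become "<currentTag>#<name>"; whitespace-only text is skipped (assumption).
final class EntryGrouper extends DefaultHandler {
    private final String groupTag;
    private final Deque<String> path = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    private final List<Map<String, List<String>>> entries = new ArrayList<>();
    private Map<String, List<String>> current = new LinkedHashMap<>();

    private EntryGrouper(String groupTag) { this.groupTag = groupTag; }

    static List<Map<String, List<String>>> parse(String xml, String groupTag) throws Exception {
        EntryGrouper handler = new EntryGrouper(groupTag);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.entries;
    }

    private String currentTag() { return String.join(".", path); }

    // Creates the attribute on first use, otherwise appends another value.
    private void add(String name, String value) {
        current.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    private void flushText() {
        String value = text.toString().trim();
        text.setLength(0);
        if (!value.isEmpty()) add(currentTag(), value);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        flushText();
        path.addLast(qName);
        for (int i = 0; i < attrs.getLength(); i++) {
            add(currentTag() + "#" + attrs.getQName(i), attrs.getValue(i));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        String tag = currentTag();
        boolean boundary = groupTag.isEmpty()
                || (groupTag.startsWith("*") ? tag.contains(groupTag.substring(1)) : tag.equals(groupTag));
        if (boundary) {
            entries.add(current);            // entry is complete at the GroupTag end-tag
            current = new LinkedHashMap<>();
        }
        path.removeLast();
    }
}
```

With the example document, EntryGrouper.parse(xml, "Root.Entry") returns two maps whose keys appear in the same order as in the listing above.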

The saxRemovePrefix parameter is a convenience parameter that causes the parser to remove a specific prefix from attribute names. If the parser were configured with "Root.Entry." as the saxRemovePrefix, the attribute names would be simpler, such as "attribute.size" rather than "Root.Entry.attribute.size".
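A sketch of the prefix removal, applied to a finished entry. PrefixRemover is a hypothetical helper, and leaving names that do not start with the prefix unchanged is an assumption; the page only describes the matching case.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating saxRemovePrefix (not the Metamerge source).
final class PrefixRemover {

    // Strips the prefix from one name; other names are kept as-is (assumption).
    static String strip(String name, String prefix) {
        return !prefix.isEmpty() && name.startsWith(prefix) ? name.substring(prefix.length()) : name;
    }

    // Returns a copy of the entry with every attribute name stripped, preserving order.
    static Map<String, List<String>> apply(Map<String, List<String>> entry, String prefix) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        entry.forEach((name, values) -> out.put(strip(name, prefix), values));
        return out;
    }
}
```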

Configuration

Parameter           Description
class               com.architech.parser.rspXmlSax
saxGroupTag         The tag path that delimits one entry from another. If blank, every tag is returned as an entry.
saxRemovePrefix     Specifies a prefix to be removed from returned tag names (e.g. DocRoot.Entry.attr --> attr).
saxIgnoreAttribute  Ignores attributes of tags. For example, <person name="first"> maps to .name or .name#first, depending on whether saxIgnoreAttribute is set.
characterSet        Optional character set conversion.

References

Apache Xerces library

See Also

SOAP Parser, DSML Parser, XML Parser

Metamerge Integrator version 4.6 © Copyright Metamerge AS 2000-2002. Last edited 2002-06-10.