XML parsers and domains

The XML message domain includes all messages that conform to the W3C XML standard.

Messages in this domain are processed by one of the XML, XMLNS, or XMLNSC parsers. The XMLNS domain is an extension of the XML domain and contains messages that conform to the same standard and that exploit the namespaces feature of the XML specification. Messages in this domain are processed by the XML parser.

The XML parser is a program that interprets a bit stream or a tree that represents a message that belongs to the XML domain. On input, it generates the corresponding tree from the bit stream; on output, it generates the bit stream from the tree. The bit stream is a representation of an XML file. (The XML parser also interprets a bit stream or tree that represents a message that belongs to the JMS domains; there is no JMS parser.)
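
As a rough analogy only, the two directions (bit stream to tree on input, tree to bit stream on output) can be sketched with Python's standard xml.etree.ElementTree module. This is not the broker's XML parser; the sample message and the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample input: a bit stream (bytes) that represents an XML document.
bit_stream = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<Envelope version="1.0"><Body>'
    b'<Element01>Value01</Element01>'
    b'</Body></Envelope>'
)

# On input: the bit stream is interpreted and a tree is built.
root = ET.fromstring(bit_stream)
original_value = root.find("Body/Element01").text

# The tree can be changed while it is held in memory.
root.find("Body/Element01").text = "Value02"

# On output: the tree is written back out as a bit stream.
output_stream = ET.tostring(root, encoding="utf-8")
```

In this sketch the intermediate tree plays the role of the message tree: processing happens on the tree, and the bit stream is regenerated only on output.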

Your applications can exchange XML messages (with or without namespace support) with the WebSphere Message Broker brokers in two ways:

  1. You can predefine (model) the message template to create a message dictionary. If you do so, your XML messages are parsed by the MRM parser and processed in the same way as all messages that you model.
  2. You can use self-defining messages that you do not specify in any way before sending.

    A self-defining message can be handled by every built-in node. The whole message can be stored in a database, and headers can be added to or removed from the message as it passes through the message flow.

A self-defining message is also known as a generic XML message. It does not have a recorded format, but carries the information about its content and structure within the message in the form of a document that adheres to the XML specification. Its definition is not held anywhere else. When an XML message is received by the broker, it is interpreted by the XML parser, and an internal message tree structure is created according to the XML definitions contained within that message.
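
The idea that a self-defining message carries its own structure can be illustrated with a short Python sketch. It assumes nothing about the message beyond well-formed XML and discovers the element hierarchy while walking the parsed document. The function name describe and the sample message are hypothetical, and the standard-library parser is only a stand-in for the broker's XML parser.

```python
import xml.etree.ElementTree as ET

def describe(element, depth=0):
    """Return indented lines that name each element found in the message."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines

# Assumed sample message; no schema or dictionary is supplied for it.
message = (
    "<Envelope><Header><Example>x</Example></Header>"
    "<Body><Element01>Value01</Element01><Element02/></Body></Envelope>"
)

# The structure is recovered from the message content alone.
structure = describe(ET.fromstring(message))
```

Printing "\n".join(structure) lists Envelope, then Header and Body indented beneath it, with their own children indented one level further.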

Details of how the XML parser handles null elements and values are described in The XML parser and null values.

The information provided with WebSphere Message Broker does not provide a full definition or description of XML terminology, concepts, and message constructs: it is a summary that highlights aspects that are important when you use XML messages with brokers and message flows.

For further information about XML, see the developerWorks Web site.

Example XML message parsing

The name elements used in this description (for example, XmlDecl) are provided by WebSphere Message Broker and are referred to as correlation names. They are available for symbolic use within the ESQL that defines the processing of message content performed by the nodes, such as a Compute or Filter node, within a message flow. They are not part of the XML specification. Each XML parser defines its own set of correlation names because the handling of XML content varies.

Each correlation name for an XML name element (for example, Element and XmlDecl) equates to a constant value, such as 0x01000000. You can see these constants in the output created by the Trace node when a message, or a portion of the message, is traced.

A simple XML message might take the form:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE Envelope
PUBLIC "http://www.ibm.com/dtds" "example.dtd"
[<!ENTITY Example_ID "ST_TimeoutNodes Timeout Request Input Test Message">]
>
<Envelope version="1.0">
	<Header>
		<Example>&Example_ID;</Example>
		<!-- This is a comment  -->
	</Header>
	<Body  version="1.0">
		<Element01>Value01</Element01>
		<Element02/>
		<Element03>
			<Repeated>ValueA</Repeated>
			<Repeated>ValueB</Repeated>
		</Element03>
		<Element04><P>This is <B>bold</B> text</P></Element04>
	</Body>
</Envelope>

The following sections show the output created by the Trace node when the above message has been parsed by the XML and XMLNSC parsers, to demonstrate the differences in the internal structures used to represent the data as it is processed by the broker.

Example XML Message parsed in the XML domain

In the following trace, the WhiteSpace elements are in the tree because of the spaces, tabs, and line breaks that format the original XML document. For clarity of presentation, the actual characters in the trace have been replaced with 'WhiteSpace'. White space within an XML element does have business meaning and is represented using the Content syntax element. The XmlDecl, DTD, and comments are represented in the XML domain using explicit, correlation-named syntax elements.

(0x01000010):XML        = (
    (0x05000018):XML      = (
      (0x06000011): = '1.0'
      (0x06000012): = 'UTF-8'
      (0x06000014): = 'no'
    )
    (0x06000002):         = 'WhiteSpace'
    (0x05000020):Envelope = (
      (0x06000004): = 'http://www.ibm.com/dtds'
      (0x06000008): = 'example.dtd'
      (0x05000021): = (
        (0x05000011):Example_ID = (
          (0x06000041): = 'ST_TimeoutNodes Timeout Request Input Test Message'
        )
      )
    )
    (0x06000002):         = 'WhiteSpace'
    (0x01000000):Envelope = (
      (0x03000000):version = '1.0'
      (0x02000000):        = 'WhiteSpace'
      (0x01000000):Header  = (
        (0x02000000):        = 'WhiteSpace'
        (0x01000000):Example = (
          (0x06000020): = 'Example_ID'
          (0x02000000): = 'ST_TimeoutNodes Timeout Request Input Test Message'
          (0x06000021): = 'Example_ID'
        )
        (0x02000000):        = 'WhiteSpace'
        (0x06000018):        = ' This is a comment  '
        (0x02000000):        = 'WhiteSpace'
      )
      (0x02000000):        = 'WhiteSpace'
      (0x01000000):Body    = (
        (0x03000000):version   = '1.0'
        (0x02000000):          = 'WhiteSpace'
        (0x01000000):Element01 = (
          (0x02000000): = 'Value01'
        )
        (0x02000000):          = 'WhiteSpace'
        (0x01000000):Element02 = 
        (0x02000000):          = 'WhiteSpace'
        (0x01000000):Element03 = (
          (0x02000000):         = 'WhiteSpace'
          (0x01000000):Repeated = (
            (0x02000000): = 'ValueA'
          )
          (0x02000000):         = 'WhiteSpace'
          (0x01000000):Repeated = (
            (0x02000000): = 'ValueB'
          )
          (0x02000000):         = 'WhiteSpace'
        )
        (0x02000000):          = 'WhiteSpace'
        (0x01000000):Element04 = (
          (0x01000000):P = (
            (0x02000000):  = 'This is '
            (0x01000000):B = (
              (0x02000000): = 'bold'
            )
            (0x02000000):  = ' text'
          )
        )
        (0x02000000):          = 'WhiteSpace'
      )
      (0x02000000):        = 'WhiteSpace'
    )
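
The trace lines above follow a regular pattern, (type constant):name = value, so they can be tallied with a small script. The following Python sketch is an illustration only: the regular expression, the function name parse_trace, and the labels in XML_DOMAIN_LABELS are assumptions inferred from the sample trace, not an official list of correlation names, and the labels apply to the XML domain only.

```python
import re
from collections import Counter

# One trace line: "(0xTYPE):Name = 'value'"; the name and value may be empty.
TRACE_LINE = re.compile(r"\((0x[0-9A-Fa-f]{8})\):(\S*)\s*=\s*(.*)")

# Descriptive labels inferred from the sample trace (assumption, XML domain only).
XML_DOMAIN_LABELS = {
    "0x01000000": "element",
    "0x02000000": "content",
    "0x03000000": "attribute",
}

def parse_trace(text):
    """Return (type constant, name, value) for each syntax element line."""
    entries = []
    for line in text.splitlines():
        match = TRACE_LINE.search(line)
        if match:
            type_code, name, value = match.groups()
            entries.append((type_code, name, value.strip().strip("'")))
    return entries

# Excerpt of the Body folder from the trace above.
sample = """
(0x01000000):Body    = (
  (0x03000000):version   = '1.0'
  (0x02000000):          = 'WhiteSpace'
  (0x01000000):Element01 = (
    (0x02000000): = 'Value01'
  )
  (0x02000000):          = 'WhiteSpace'
)
"""

entries = parse_trace(sample)
counts = Counter(type_code for type_code, _, _ in entries)
labelled = {XML_DOMAIN_LABELS[code]: n for code, n in counts.items()}
```

Running the same function over a complete trace gives a quick count of how many syntax elements of each type a parser created for a message.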

Example XML Message parsed in the XMLNSC domain

The following trace shows the elements created to represent the same XML structure by the compact XMLNSC parser in its default mode. In this mode, the compact parser does not retain comments, processing instructions, or mixed text.

A comparison of the two traces shows that the compact parser uses far fewer syntax elements to represent the same business content of the example XML message.

Because mixed text is not retained, the WhiteSpace elements that have no business data content no longer take up any runtime footprint in the broker message tree. However, the mixed text in "Element04.P" is also discarded: only the value of the child folder "Element04.P.B" is held in the tree, and the text 'This is ' and ' text' in "P" is lost. This type of XML structure is not normally associated with business data formats, so use of the compact XMLNSC parser is generally desirable. If you need this type of processing, either do not use the XMLNSC parser, or use it with the "retain mixed text" mode enabled.
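
To make the term concrete, the following Python sketch identifies the mixed text in the "Element04.P" fragment, that is, the text that sits alongside child elements. It uses the standard xml.etree.ElementTree module as a stand-in; the function name mixed_text is hypothetical, and the sketch does not reproduce the XMLNSC parser's own logic.

```python
import xml.etree.ElementTree as ET

def mixed_text(element):
    """Collect text that appears alongside child elements (mixed content)."""
    if len(element) == 0:
        return []  # a leaf element: its text is ordinary content, not mixed text
    pieces = []
    if element.text:
        pieces.append(element.text)      # text before the first child
    for child in element:
        if child.tail:
            pieces.append(child.tail)    # text that follows each child
    return pieces

p = ET.fromstring("<P>This is <B>bold</B> text</P>")

discarded = mixed_text(p)   # text a compact, non-retaining mode would drop
kept = p.find("B").text     # value of the child folder that remains
```

Here discarded holds 'This is ' and ' text', and kept holds 'bold', which matches the content of the XMLNSC trace for Element04. The white space between tags in a formatted document is collected by the same function, which is why the WhiteSpace elements also disappear in the compact tree.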

The compact parser also handles the XML declaration differently: the version, encoding, and standalone attributes are held as children of XmlDeclaration rather than as special correlation-named elements.

(0x01000000):XMLNSC     = (
    (0x01000400):XmlDeclaration = (
      (0x03000100):Version    = '1.0'
      (0x03000100):Encoding   = 'UTF-8'
      (0x03000100):StandAlone = 'no'
    )
    (0x01000000):Envelope       = (
      (0x03000100):version = '1.0'
      (0x01000000):Header  = (
        (0x03000000):Example = 'ST_TimeoutNodes Timeout Request Input Test Message'
      )
      (0x01000000):Body    = (
        (0x03000100):version   = '1.0'
        (0x03000000):Element01 = 'Value01'
        (0x01000000):Element02 = 
        (0x01000000):Element03 = (
          (0x03000000):Repeated = 'ValueA'
          (0x03000000):Repeated = 'ValueB'
        )
        (0x01000000):Element04 = (
          (0x01000000):P = (
            (0x03000000):B = 'bold'
          )
        )
   )

Most of the samples in the Samples Gallery use the XML parser to process messages. For example, see the Coordinated Request Reply sample, the Large Messaging sample, and the Message Routing sample.

Related reference
Built-in nodes
XML constructs