You must define one document model for each document format that you intend to index. Here is a simple document model for plain-text structured documents. Note that GPP in the example stands for General Purpose Parser.
<?xml version="1.0"?>
<GPPModel>                   - here begins the GPP document model
  <GPPFieldDefinition        - here begins a field definition
    name="Head"              - the name you assign to this field
    start="[head]"           - the boundary string at the beginning of the field
    end="[/head]"            - the boundary string at the end of the field
    exclude="YES" />
  <GPPFieldDefinition        - here begins the next field definition
    name="Abstract"
    start="[abstract]"
    end="[/abstract]"
    exclude="NO" />
  :
  :
</GPPModel>
Document models are written in XML, using the tags defined in Appendix G, "Document model reference". A document model consists of text field definitions and attribute definitions. This example shows only text field definitions, each given in a GPPFieldDefinition element. In a similar way, you can use GPPAttributeDefinition elements to define document attributes.
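To make the structure concrete, the following Python sketch reads a model like the one above with the standard library's xml.etree.ElementTree and collects the attributes of each GPPFieldDefinition element. The elision lines (:) are left out so that the XML is well-formed, and the function name load_field_definitions is hypothetical; it is not part of any product API. Python 3.9 or later is assumed for the type hints.

```python
import xml.etree.ElementTree as ET

# A well-formed version of the two field definitions from the example model.
MODEL_XML = """<?xml version="1.0"?>
<GPPModel>
  <GPPFieldDefinition name="Head" start="[head]" end="[/head]" exclude="YES" />
  <GPPFieldDefinition name="Abstract" start="[abstract]" end="[/abstract]" exclude="NO" />
</GPPModel>"""


def load_field_definitions(model_xml: str) -> list[dict[str, str]]:
    """Return the attributes (name, start, end, exclude) of every GPPFieldDefinition."""
    root = ET.fromstring(model_xml)
    # findall only looks at direct children of <GPPModel>, which is where
    # the field definitions sit in the example.
    return [dict(elem.attrib) for elem in root.findall("GPPFieldDefinition")]


for field in load_field_definitions(MODEL_XML):
    print(field["name"], field["start"], field["end"])
```

This prints one line per field definition, for example `Head [head] [/head]`. An attribute definition could be read the same way by searching for GPPAttributeDefinition instead.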
The first line of the example, <?xml version="1.0"?>, is the XML declaration, which identifies the document model as an XML document. Each text field definition specifies the boundary strings that mark the start and the end of the field in the source document. So, whenever a document contains the character sequence [head], followed by some text and then the character sequence [/head], the text between those boundary strings is taken as the content of the text field named Head.
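The boundary-string rule can be sketched in Python as follows. This illustrates only the matching idea, assuming each field is delimited by literal start and end strings and that fields do not nest; it is not the General Purpose Parser's actual implementation, and the names FIELDS and extract_fields are hypothetical.

```python
import re

# Hypothetical (name, start, end) triples mirroring the example model.
FIELDS = [("Head", "[head]", "[/head]"), ("Abstract", "[abstract]", "[/abstract]")]


def extract_fields(text: str, fields: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Map each field name to every piece of text found between its boundary strings."""
    result = {}
    for name, start, end in fields:
        # re.escape makes the brackets literal; the non-greedy (.*?) stops at the
        # first end string, and DOTALL lets field content span several lines.
        pattern = re.escape(start) + r"(.*?)" + re.escape(end)
        result[name] = re.findall(pattern, text, flags=re.DOTALL)
    return result


doc = "[head]A sample title[/head]\n[abstract]Some summary text.[/abstract]"
print(extract_fields(doc, FIELDS))
```

For this made-up input the call returns `{'Head': ['A sample title'], 'Abstract': ['Some summary text.']}`. Returning a list per field allows for a document in which the same boundary strings occur more than once.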
You assign a field name to each field definition. A query uses this name to restrict the search to the content of that text field. The name can either be fixed or be derived by a rule from the content of the structural unit; for example, a derived name could be the tag name of an XML element or the name of an XML attribute.
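As a rough illustration of derived field names, the sketch below names each field after an XML element's tag or an attribute's name, then restricts a case-insensitive substring match to a single field. The naming rule, the functions derive_fields and search_in_field, and the sample document are all hypothetical; real queries use the product's own search syntax, not this code.

```python
import xml.etree.ElementTree as ET


def derive_fields(xml_text: str) -> list[tuple[str, str]]:
    """Collect (field name, content) pairs, naming fields after tags and attribute names."""
    pairs = []
    for elem in ET.fromstring(xml_text).iter():
        # elem.text holds only the text before the first child element,
        # which is sufficient for leaf elements such as <title> and <body>.
        if elem.text and elem.text.strip():
            pairs.append((elem.tag, elem.text.strip()))
        for attr_name, value in elem.attrib.items():
            pairs.append((attr_name, value))
    return pairs


def search_in_field(pairs: list[tuple[str, str]], field_name: str, term: str) -> list[str]:
    """Return the contents of the named field that contain the term."""
    return [content for name, content in pairs
            if name == field_name and term.lower() in content.lower()]


SAMPLE = '<doc lang="en"><title>Text search basics</title><body>How indexing works.</body></doc>'
pairs = derive_fields(SAMPLE)
print(search_in_field(pairs, "title", "search"))
```

Here `search_in_field(pairs, "title", "search")` returns `['Text search basics']`, while the same term restricted to the body field returns an empty list, because the word occurs only in the title.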