IBM OmniFind Analytics Edition Operation Guide
Edition Notice
First Edition (February 2007)

This edition applies to version 8, release 4 of IBM® OmniFind™ Analytics Edition and to all subsequent releases and modifications until otherwise indicated in new editions.

This document contains proprietary information of IBM. This proprietary information is provided in accordance with the license conditions and is protected by copyright. Information contained in this document provides no warranties whatsoever for any products. Also, no descriptions provided in this document should be interpreted as product warranties. Depending on the system environment, the yen symbol may be displayed as the backslash symbol, or the backslash symbol may be displayed as the yen symbol.

© Copyright International Business Machines Corporation 2007. All rights reserved.

US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

1 Introduction

This document describes the operational flow of the IBM OmniFind Analytics Edition text mining system and the tools used in its various operational phases. In particular, it describes the series of procedures (preprocessing) that runs from language processing of the data to be analyzed through the creation of the index structure used for analysis.

Note that this document assumes that OmniFind Analytics Edition is already installed. See the Installation Guide for details on installation.

1.1 Target Audience

This document is written for system administrators and operational designers, who are assumed to fully understand the content of the introductory topics.

1.2 Terminology
The following terminology is used in this document.

Term Meaning
%TAKMI_HOME% This is an operating system environment variable for an installation directory. The directory name is decided at the time of installation.
* The % symbol at both ends means that this is an environment variable.

For example, when %TAKMI_HOME% is set to the value C:/Program Files/takmi,
%TAKMI_HOME%/conf/global_config.xml becomes

C:/Program Files/takmi/conf/global_config.xml.

Database This is a unit that manages language processing results for each set of target data to be analyzed.
For example, when two types of data, "customer inquiry" and "internal document," are individually analyzed, two databases are created, one for each data type.
Database name
(DATABASE_NAME)
You may create a database with an arbitrary name consisting of single-byte numbers or single-byte alphabetical letters.
In this document, the database you create is written as DATABASE_NAME. As you read, replace it with the actual name of the created database.
Database directory
(DATABASE_DIRECTORY)
This is a directory where databases are physically arranged. By default, assuming that the database name is DATABASE_NAME,

%TAKMI_HOME%/databases/DATABASE_NAME

becomes the database directory. This may be referred to as DATABASE_DIRECTORY. For example,

DATABASE_DIRECTORY/conf/database_config.xml

is a database_config.xml file in the conf directory, which is in the database directory.
Database settings
(database_config.xml)
This shows the settings created in the database configuration file:
DATABASE_DIRECTORY/conf/database_config.xml
In this document, the file name "database_config.xml" refers to these settings.

There is one configuration file for each database.
Global settings
(global_config.xml)
This shows the settings created in the global configuration file:
%TAKMI_HOME%/conf/global_config.xml
In this document, the file name "global_config.xml" refers to these settings.

There is only one global configuration file for each OmniFind Analytics Edition system.
If you wish to enable Web applications to use the newly created database, it is important to register that database in the global settings.
ATML
(ATML format)
This is an input format used in language processing.
The target data is always converted into this format before language processing.
MIML
(MIML format)
This is an output format used in language processing.
The result of language processing is returned in a file in this format, and then indexed.

1.3 Relevant Documents
Refer to the List of Instruction Manuals for other documentation for IBM OmniFind Analytics Edition.
1.4 Configuration File Editing Instructions

This topic provides notes on editing configuration files such as global_config.xml and database_config.xml by using the text editor.

These configuration files must be saved in UTF-8 format. When editing these files by using Microsoft® Windows® Notepad, be sure to select "UTF-8" as the encoding method.
More specifically, select "File" and then "Save As" from the Notepad menu, and save the files in UTF-8 format. Do not select "Unicode."
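The UTF-8 requirement can also be verified after saving; a minimal Python sketch (not part of the product) that reports whether a file decodes cleanly as UTF-8:

```python
def is_valid_utf8(path):
    """Return True if the file at 'path' decodes cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Running this against global_config.xml or database_config.xml after editing is a quick way to catch a file accidentally saved in another encoding.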

2 Operational Procedures and Design

This section describes the series of operations before analyzing target data with OmniFind Analytics Edition. Details on tools to be used are provided in section 3 and later sections. See these other sections if necessary.

First, an overview of the operation flow is provided. The figure below shows the flow of batch processing before the application is ready to analyze data. These processes as a whole are called preprocessing.

Preprocessing is roughly divided into two parts: the processing in which raw data to be analyzed is converted into ATML, which is the standard input data format for OmniFind Analytics Edition; and the processing in which indices are generated by performing the common processing to the ATML.
OmniFind Analytics Edition provides the following tools for performing each of the processing steps. When operating the system, establish a workflow for launching these tools in accordance with the requirements. Later sections describe the standard flow shown above.

Name Description
Dictionary Editor Supports category and dictionary edit operations.
Data Ingester Converts a target comma separated values (CSV) file into an ATML file.
NLP (language processing) Processes an ATML file and returns the result of language processing in an MIML file.
Indexer Generates indices for applications based on the result of language processing.

In actual system operations, there is another process in which processed data is deleted and replaced by new data. See 6.1 Deleting Data for details.

2.1 Naming and Creating Databases

Resources that are used in data processing are managed in units of databases. These resources are physically stored in a database directory.

In operational designing, first create a database directory. Follow the procedures below:

  • Copy the directory %TAKMI_HOME%/resource/database.template/ja entirely to %TAKMI_HOME%/databases/. Make sure that the copied directory is %TAKMI_HOME%/databases/ja. This is the new database directory.

    When creating a database for English data, copy the directory %TAKMI_HOME%/resource/database.template/en instead.
    * Do not delete the database.template directory or anything under it, as these are the templates.

  • Rename the directory %TAKMI_HOME%/databases/ja to an arbitrary name. Use single-byte alphabetical letters, numbers, and the underscore character. For example, if you name it as "SAMPLE_DB," then the database directory becomes %TAKMI_HOME%/databases/SAMPLE_DB.

  • Register the created database in the global settings. Open the global configuration file %TAKMI_HOME%/conf/global_config.xml with the text editor, and set the database as shown below.

    Always store the settings in UTF-8 format.

    A database is usually created in the %TAKMI_HOME%/databases directory, but it can be created in a different location depending on disk capacity. In that case, specify "absolute" as the path_type attribute and specify the full path of the database directory.

    <?xml version="1.0" encoding="UTF-8"?>
    <global_config>
    <params>
    <param name="language" value="ja"/>
    </params>
    <database_entries>
    <!--  Specify the relative path to the database directory as follows.   -->
    <database_entry name="SAMPLE_DB" path_type="relative" path="databases/SAMPLE_DB"/>

    <!--  Specify the absolute path to the database directory as follows.   -->
    <database_entry name="SAMPLE_DB2" path_type="absolute" path="C:/data/SAMPLE_DB2"/>
    </database_entries>
    </global_config>
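The registration above can be sanity-checked by parsing the file and resolving each database_entry path; a sketch using Python's standard XML parser (the function name and the takmi_home parameter are ours, not part of the product):

```python
import os
import xml.etree.ElementTree as ET

def resolve_database_paths(config_xml, takmi_home):
    """Parse global_config.xml content and return {database name: directory}.

    Relative paths are resolved against %TAKMI_HOME%; absolute paths
    are returned as-is, mirroring the path_type attribute semantics."""
    root = ET.fromstring(config_xml)
    result = {}
    for entry in root.find("database_entries"):
        name = entry.get("name")
        path = entry.get("path")
        if entry.get("path_type") == "relative":
            path = os.path.join(takmi_home, path)
        result[name] = path
    return result
```

A check like this catches a database that was created on disk but never registered in the global settings, which would otherwise be silently invisible to the Web applications.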

2.2 Converting Target Data into a CSV File

The target data must be prepared in the following CSV (Comma Separated Values) format. If data is not in this format, follow the rules below and convert it into the CSV format beforehand.

The basic CSV format supports the format output by Microsoft Excel. Data saved as Microsoft Excel files can be easily converted into the CSV format by using Excel.

If the target data is in the CSV format from the beginning, check that it conforms to the following rules.

Required rules:

  • The separator for comma separated values is a single-byte comma (,).
  • All lines in the CSV file must have the same number of columns.
  • If line breaks or single-byte commas are used as part of data, these values must be enclosed in single-byte double quotation marks (").
    Special attention is required for non-text items (dates, for example) as these values are likely to be overlooked.
  • When the values enclosed in double quotation marks contain double quotation marks as part of data, place two sets of double quotation marks.
    Example: The value "For example, "A" is an upper-case character." as the column value in the CSV file must be stored as follows:
    "For example, ""A"" is an upper-case character."
  • The CSV file must be a well-formed text file.
    If the file contains byte sequences that are not valid characters, successful completion of the subsequent processing is not guaranteed. In particular, caution is necessary in handling binary data and character code conversion when converting data from other formats into CSV.
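The quoting and doubling rules above are exactly what standard CSV writers produce; a sketch using Python's csv module (for illustration only; the column values are examples):

```python
import csv
import io

def to_csv_line(row):
    """Serialize one row, quoting and doubling per the rules above."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes only fields containing commas, quotes, or
    # line breaks, and doubles embedded double quotation marks.
    csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n").writerow(row)
    return buf.getvalue()
```

Generating CSV with a library like this, rather than by string concatenation, avoids the overlooked-quoting problems the rules warn about.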

Strongly recommended rules:

  • The number of lines in a CSV file should be 50,000 or less.
    To handle data that contains more than 50,000 lines, divide the data into multiple CSV files. The maximum number of lines depends on the size of data per line or on hardware specifications, but it should be noted that the indexing function might not work properly if there are too many lines.
  • Among CSV columns, reserve one for date data to be used in time-series analysis.
    If none of the columns is specified, the date of data conversion into the ATML format will be automatically attached to each line. Because there is no correlation between the attached date and data content, the result of time series analysis will be meaningless.
  • Use the first line of the CSV file to list column names, and start describing the data on the second line.
    Although this is not required, it will make later descriptions of correspondence between CSV and ATML easier.
  • One CSV line must correspond to one document to be analyzed.
    Example: When a customer inquiry is recorded in multiple CSV lines, put these lines together into one line for analysis in a more natural context.
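The 50,000-line guideline above can be enforced with a small splitter; a sketch (not a product tool) that assumes the first line is a header and repeats it at the top of every chunk:

```python
def split_csv(rows, max_rows=50000):
    """Split (header + data rows) into chunks of at most max_rows data
    lines each, repeating the header at the top of each chunk."""
    header, data = rows[0], rows[1:]
    return [[header] + data[i:i + max_rows]
            for i in range(0, len(data), max_rows)]
```

Each returned chunk would then be written out as a separate CSV file before conversion to ATML.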

2.3 Designing a Category Tree

Edit a category tree to relate each column in a CSV file to a particular category. Also, edit database settings in the database_config.xml file in order to define the category edited here as a standard item.
Follow the procedures below. This must be done only once when designing a database unless the structure of the original CSV data changes.

Editing the category tree (category_tree.xml)

This section describes how to relate CSV columns to categories and how to edit the category tree by using an example. The established relations are used when the Data Ingester converts the CSV file into an ATML file.

Assume that a CSV file containing the following columns will be processed:

Column 1: Date of inquiry
Column 2: Customer ID
Column 3: Name
Column 4: Inquiry

This CSV file has four columns. The fourth column contains text to be analyzed by language processing. The other three columns contain standard items that are attached to the target data from the beginning. Relate these three items to categories. A category consists of a category name and a category path. The following example shows how to create categories for some of the column names.

Column name Category path Category name
Customer ID .customer_id Customer ID
Name .fullname Name

Start a category path with the period character, and from the second character and on, use character strings consisting of single-byte alphanumeric characters with no period. A category path is case-sensitive.
Characters that can be used are 0 to 9, a to z, A to Z, and either the hyphen or the underscore character. It is useful to give a name with a meaning to a category path.
Column names can be used as category names as they are, but you can change them if necessary.
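The naming rule above can be expressed as a regular expression check; a sketch (ours, not part of the product) that also allows "/" between levels, as in the .date/dd entries of the category tree template:

```python
import re

# Leading period, then segments of single-byte alphanumerics, hyphens,
# and underscores, separated by "/"; no further periods; case-sensitive.
_CATEGORY_PATH = re.compile(r"^\.[0-9A-Za-z_-]+(?:/[0-9A-Za-z_-]+)*$")

def is_valid_category_path(path):
    """Return True if 'path' follows the category path naming rule."""
    return bool(_CATEGORY_PATH.match(path))
```

Validating paths before editing category_tree.xml avoids entries that the tools cannot resolve later.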

For date data ("date of inquiry" in this example) to be used in time series analysis, use the special category path ".date". This path does not need to be newly created because it is already provided in the category tree template. Rules for relating the date column to the category are described in the Data Ingester settings.

Next, register these categories in the category tree. The category tree is created in the file DATABASE_DIRECTORY/category/category_tree.xml.
Use the text editor to open the category_tree.xml file, and add the created categories as follows.

<?xml version="1.0" encoding="UTF-8"?>
<category_tree>
<node id="1" path="date" name="Standard date" features=""/>
<node id="2" path="date/dd" name="Date (day)" features="integer"/>
<node id="3" path="date/dow" name="Date (day of the week)" features="integer"/>
<node id="4" path="date/yyyy" name="Date (year)" features="integer"/>
...
<node id="200" path="customer_id" name="Customer ID" features=""/>
<node id="201" path="fullname" name="Name" features=""/>
</category_tree>

Add a node element for each category as a subelement directly below the category_tree element. Follow the instructions below when editing the category tree:

  • For the id attribute, use a positive integer that does not duplicate the id of any other category.
  • For the path attribute, use the category path character string with the leading period removed. Note that this value is case-sensitive.
  • The features attribute must always be written as features="".
  • After editing the category tree, save the changes in UTF-8 format.
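These editing rules can be checked mechanically before saving; a sketch (ours, not a product tool) using Python's standard XML parser:

```python
import xml.etree.ElementTree as ET

def check_category_tree(xml_text):
    """Return a list of problems found in a category_tree.xml document:
    non-positive or duplicate ids, and missing features attributes."""
    problems = []
    seen_ids = set()
    for node in ET.fromstring(xml_text).findall("node"):
        node_id = node.get("id")
        if not node_id.isdigit() or int(node_id) <= 0:
            problems.append("id %r is not a positive integer" % node_id)
        elif node_id in seen_ids:
            problems.append("duplicate id %s" % node_id)
        seen_ids.add(node_id)
        if node.get("features") is None:
            problems.append("node %s is missing the features attribute" % node_id)
    return problems
```

An empty result means the tree satisfies the id and features rules listed above.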

Editing database settings (database_config.xml)

Edit the database settings after editing the category tree in order to prevent the newly registered categories in the category tree from being edited by the Dictionary Editor tool.
Use the text editor to open DATABASE_DIRECTORY/conf/database_config.xml, and add a category_entry element as a subelement directly below the category_entries element, as follows.

<category_entries>
<!-- Specifies subroot categories for system-reserved categories. -->
<category_entry name="reserved_by_system" value=".date"/>
<category_entry name="reserved_by_system" value=".date/dd"/>
<category_entry name="reserved_by_system" value=".date/dow"/>
<category_entry name="reserved_by_system" value=".date/yyyy"/>
<category_entry name="reserved_by_system" value=".date/yyyymm"/>
<category_entry name="reserved_by_system" value=".date/yyyyww"/>
<category_entry name="reserved_by_system" value=".date/yyyymmdd"/>
<category_entry name="reserved_by_system" value=".tkm_ja_base_word"/>
<category_entry name="reserved_by_system" value=".tkm_ja_base_phrase"/>
<category_entry name="reserved_by_system" value=".customer_id"/>
<category_entry name="reserved_by_system" value=".fullname"/>
...

Add a category_entry element for each category. Follow the instructions below when editing the database settings:

  • The attribute name value must always be "reserved_by_system."
  • For the attribute value, describe the category path that was added when the category tree was edited. Include the period character at the beginning. Note that this value is case-sensitive.
  • After editing the file, save the changes in UTF-8 format.

2.4 Creating Language Processing Resources

If necessary, create language processing resources such as dictionaries, categories, and patterns. See the Dictionary Editor Guide for details on editing dictionaries and categories.

2.5 Converting Data into the ATML Format

To convert CSV data into ATML, which is the input data format for OmniFind Analytics Edition, use the Data Ingester. See Data Ingester for details.

2.6 Language Processing (NLP)

Language processing (NLP) refers to the processing of ATML files and the creation of MIML files that contain the results of language processing. See Section 4.1 for details.

2.7 Indexing

Indexing refers to the processing of MIML files to create indices for high-speed mining. See Section 5.1 for details.

2.8 Checking Operations with Text Miner

Once indexing is complete, applications that use indices such as Text Miner can be used. This section describes how to check operations with Text Miner.

  • Ensure that the indexing process has been completed. Check also that the target database is registered in the global settings.
  • Restart WebSphere Application Server. See Stopping and Launching the Server for instructions.
  • Launch the Web browser, type the following address, and access Text Miner.

    Protocol://host name:port number/TAKMI_MINER/

    Protocol, host name, and port number depend on the environment where OmniFind Analytics Edition is installed. The following is an example:

    http://localhost:7080/TAKMI_MINER/

  • Ensure that the screen as shown below is displayed. Click the database name registered in the global settings to make sure that Text Miner starts.
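The address above can be assembled from its parts; a trivial helper (ours, for illustration) that mirrors the Protocol://host name:port number/TAKMI_MINER/ pattern:

```python
def miner_url(protocol, host, port):
    """Build the Text Miner address from protocol, host name, and port."""
    return "%s://%s:%d/TAKMI_MINER/" % (protocol, host, port)
```

The actual protocol, host name, and port number depend on how WebSphere Application Server was configured at installation time.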

3 Data Conversion

This section describes how to convert CSV data into the ATML format by using the Data Ingester.

3.1 Using the Data Ingester Commands

The Data Ingester is a tool that converts a CSV file into an ATML file. This section describes the procedures for running takmi_data_ingester, which is a command to launch the tool, and how to edit the configuration file.

Running the commands

  • Ensure that the category tree design for the CSV file has been completed.
  • Edit the takmi_data_ingester configuration file. See below for information on how to edit the settings and for the list of areas to be edited.
    See Advanced Tool Settings for further detailed settings. The configuration file is DATABASE_DIRECTORY/conf/data_ingester_config_csv2atml.xml.
  • Open a command window (on Windows) or shell (on AIX®) to run the following commands:
    Windows:
    >  takmi_data_ingester.bat  CONFIG_FILE  CSV_FILE  ATML_FILE
    AIX:
    >  takmi_data_ingester.sh  CONFIG_FILE  CSV_FILE  ATML_FILE
    
    The meaning of the arguments is as follows:

  • CONFIG_FILE: Configuration file
  • CSV_FILE: CSV file to be converted
  • ATML_FILE: Path of the ATML file to be generated. The output destination is arbitrary; however, note that the ATML file will be the input file for language processing.
    Unless there is a reason to do otherwise, create the output file in the directory DATABASE_DIRECTORY/db/atml.

How to edit the settings

This section describes how to edit the settings of the data_ingester_config_csv2atml.xml file. See Advanced Tool Settings for further detailed settings. When editing and saving this file, be sure to set the character code to UTF-8 format.

The following is an example CSV file with four columns. The first line shows column names, and the actual data is in the second line and on. See Converting Target Data into a CSV File for details on the CSV file format.

Date of inquiry  Customer ID  Name           Inquiry
2007/04/01       XX001122     Taro Sato      The PC does not start.
2007/04/05       XX00334455   Ichiro Suzuki  Prices of new products
:                :            :              :

Assume that the following settings are made in order to convert this CSV file into an ATML file.

  • Handle the values in the first column as the standard dates.
  • Relate the second column to the category with the category path .customer_id.
  • Relate the third column to the category with the category path .fullname.
  • Handle the fourth column as text to be analyzed by language processing.
  • Treat the second line of the CSV file as the first data line.

How to make these settings is shown below. First, to define the values in the first column as dates, edit the configuration file as shown below. The csv.date.format.list parameter specifies how to interpret dates in the "2007/04/01" format.

<param name="csv.column.index.date" multivalued="no">
  <value>1</value>
</param>
    :
    :
<param name="csv.date.format.list" multivalued="yes">
  <value>yyyy/MM/dd</value>
</param>

Next, relate each column in the CSV file to a category or text as shown below.

<param name="csv.column.text.indexes" multivalued="yes">
  <value>4</value>
</param>
    :
    :
<param name="csv.column.names" multivalued="yes">
  <value></value>
  <value>.customer_id</value>
  <value>.fullname</value>
  <value>Inquiry</value>
</param>

For csv.column.text.indexes, specify the numbers of the columns that contain the text you wish to analyze by language processing. You may specify multiple columns.

The number of values set for csv.column.names must always be the same as the number of columns in the CSV files. The number is four in this case.
For each value, specify one of the following: a category path, a label for text that is subject to language processing, or a blank character string. Specify a blank character string for a column that you wish to use as the document ID, the document title, or the date.
In this example, the value 1 is set for csv.column.index.date to relate the first column to the date, so a blank character string is set as the corresponding first value.
The value 4 is set for csv.column.text.indexes to use the fourth column as the text subject for language processing, so the text label "Inquiry" is set as the corresponding fourth value.
For the second and third columns, the values relate them to the categories .customer_id and .fullname, respectively.

See Advanced Tool Settings for details.

Finally, specify the first data line (the second line in this example) with the following settings, and the configuration is complete.
Be sure to set the character code to UTF-8 and save the changes.

<param name="csv.row.firstindex" multivalued="no">
  <value>2</value>
</param>
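To picture what these settings do together, here is a Python sketch (not the actual Data Ingester) that applies the same column assignments to one CSV data line, using strptime("%Y/%m/%d") as the Python counterpart of the yyyy/MM/dd format:

```python
import csv
import io
from datetime import datetime

def ingest_row(line, date_col=1, text_cols=(4,),
               names=("", ".customer_id", ".fullname", "Inquiry")):
    """Convert one CSV data line into a dict of date, categories, and
    text, mirroring the csv.column.* settings above (1-based columns)."""
    row = next(csv.reader(io.StringIO(line)))
    doc = {"date": datetime.strptime(row[date_col - 1], "%Y/%m/%d").date(),
           "categories": {}, "text": {}}
    for i, value in enumerate(row, start=1):
        label = names[i - 1]
        if i in text_cols:
            doc["text"][label] = value        # text for language processing
        elif label.startswith("."):
            doc["categories"][label] = value  # standard-item category
    return doc
```

The blank first entry of names corresponds to the date column, exactly as in the csv.column.names example above.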

4 Language Processing

This section describes language processing.

4.1 Using the Language Processing Commands

Language processing is performed for each input data file (ATML file). Before running language processing, you must deploy the language processing resources that are updated by the dictionary tools and DOCAT.

Deploying language processing resources

  • Open a command window (on Windows) or shell (on AIX) to run the following commands:
    Windows:
    >  takmi_nlp_resource_deploy.bat  DATABASE_DIRECTORY
    AIX:
    >  takmi_nlp_resource_deploy.sh  DATABASE_DIRECTORY
    
    The meaning of the arguments is as follows:

  • DATABASE_DIRECTORY: Path of the database directory.
Do the above processing only once before starting language processing. After it has been run, changes made to the language resources by the dictionary tools or DOCAT are applied to language processing.

Language processing
  • First make sure that ATML files for language processing have already been created, and that these files are located in the DATABASE_DIRECTORY/db/atml directory.
  • Open a command window (on Windows) or shell (on AIX) to run the following commands:
    Windows:
    >  takmi_nlp.bat  DATABASE_DIRECTORY  DATABASE_DIRECTORY/db/atml/input.atml  DATABASE_DIRECTORY/db/miml/output.miml
    AIX:
    >  takmi_nlp.sh  DATABASE_DIRECTORY  DATABASE_DIRECTORY/db/atml/input.atml  DATABASE_DIRECTORY/db/miml/output.miml
    The meaning of the arguments is as follows:

  • DATABASE_DIRECTORY: Path of the database directory.
  • DATABASE_DIRECTORY/db/atml/input.atml: Path of the input data file.
  • DATABASE_DIRECTORY/db/miml/output.miml: Path of the output data file.
Do not deploy language processing resources again until all the input files have been processed. If output data files already exist, they will be overwritten. If necessary, delete old data first.
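When many ATML files must be processed, the command line for each file can be generated in a loop; a sketch (ours) that builds one command per ATML file, deriving the MIML name from the ATML name (that naming convention is our assumption, not a product rule):

```python
import os

def nlp_commands(database_directory, atml_files, script="takmi_nlp.bat"):
    """Build one language-processing command line (argument list) per
    ATML file, writing each result under db/miml as described above."""
    commands = []
    for atml in atml_files:
        base = os.path.splitext(os.path.basename(atml))[0]
        miml = "%s/db/miml/%s.miml" % (database_directory, base)
        commands.append([script, database_directory, atml, miml])
    return commands
```

Each argument list could then be passed to the operating system's process launcher, or joined into a batch script, depending on the operational workflow.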

5 Indexing

This section describes indexing.

5.1 Using the Indexing Commands

There are two types of processing in indexing: creation of new indices and update of indices by file addition.

A new index is created for all data processed by language processing (MIML files). When adding files to update an index, new MIML files and existing MIML files are combined to create an index.
See Advanced Tool Settings for commands to customize the indexing function.

Creating a new index

  • First make sure that MIML files have been created through language processing, and that all the MIML files to be used in indexing are located in the DATABASE_DIRECTORY/db/miml directory.
  • If necessary, specify the indexing settings in the DATABASE_DIRECTORY/conf/database_config.xml file. In most situations, however, indexing works properly with the default settings.
  • Delete the old index.
  • Open a command window (on Windows) or shell (on AIX) to run the following commands:
    Windows:
    >  takmi_index.bat  DATABASE_DIRECTORY  [HEAP_SIZE_MB]
    AIX:
    >  takmi_index.sh  DATABASE_DIRECTORY  [HEAP_SIZE_MB]
    
    The meaning of the arguments is as follows:

  • DATABASE_DIRECTORY: Path of the database directory.
  • HEAP_SIZE_MB: (optional) Java™ heap size, in MB, to use when running the commands. If omitted, 1000 MB is used.
Updating an index by adding files
  • First make sure that MIML files to be added have been created through language processing, and that these MIML files are located in the DATABASE_DIRECTORY/db/miml directory.
  • If necessary, specify the indexing settings in the DATABASE_DIRECTORY/conf/database_config.xml file. In most situations, however, indexing works properly with the default settings.
  • Open a command window (on Windows) or shell (on AIX) to run the following commands:
    Windows:
    >  takmi_index_diff.bat  DATABASE_DIRECTORY  [HEAP_SIZE_MB]
    AIX:
    >  takmi_index_diff.sh  DATABASE_DIRECTORY  [HEAP_SIZE_MB]
    
    The meaning of the arguments is as follows:

  • DATABASE_DIRECTORY: Path of the database directory.
  • HEAP_SIZE_MB: (optional) Java heap size, in MB, to use when running the commands. If omitted, 1000 MB is used.
Unlike the creation of a new index, when takmi_index_diff.bat is run, it checks DATABASE_DIRECTORY/db/miml for data that was newly processed by language processing and creates index data for it; the existing index is then updated with that data.

6 Deleting Data

This section describes the deletion of data.

6.1 Deleting Data

If necessary, delete previously created files when running language processing or indexing. To delete the files, be sure to first stop WebSphere Application Server. (The files might not be successfully deleted if the server is active.)
The behavior of any application is not guaranteed if the files have been deleted without stopping WebSphere Application Server.

When you want to add MIML files that have already been processed by language processing to the index, only the index must be re-created. Run the following commands in %TAKMI_HOME%/bin to delete the index.
Windows:
> takmi_clear_index.bat  DATABASE_DIRECTORY
AIX:
> takmi_clear_index.sh  DATABASE_DIRECTORY
To reprocess all the data (to update the dictionary, for example), delete both the MIML files and the index by running the following commands in %TAKMI_HOME%/bin.
Windows:
> takmi_clear_nlp_index.bat  DATABASE_DIRECTORY
AIX:
> takmi_clear_nlp_index.sh  DATABASE_DIRECTORY
When you run this command, a message appears to confirm that you want to run it. Type "y" to confirm.
Typical examples of data deletion are shown below.

Example of index deletion
Screenshot of the index deletion command execution screen

Example of deletion of the language processing result and index
Screenshot of the language processing result and index deletion command execution screen

As for pre-processed ATML files, establish operational rules so that they are deleted if necessary.

7 Stopping and Launching the Server

This section describes how to stop and launch the server. The Web applications of OmniFind Analytics Edition run on WebSphere Application Server, so all the applications are stopped or started simply by stopping or starting WebSphere Application Server.

7.1 Stopping the Server

Stop the operation of WebSphere Application Server by a commonly used method. It can be stopped by using command lines, selecting an appropriate icon from the Windows menu, or by stopping the WebSphere Application Server service registered in Windows services. See the WebSphere Application Server documentation for details.

7.2 Launching the Server

Launch WebSphere Application Server by a commonly used method. It can be activated by using command lines, selecting an appropriate icon from the Windows menu, or by activating the WebSphere Application Server service registered in Windows services. See the WebSphere Application Server documentation for details.

After launching WebSphere Application Server, ensure that the OmniFind Analytics Edition applications are operating properly. Text Miner is properly operating if the database list is displayed when Text Miner is accessed from the Web browser. Note, however, that database names will not be displayed if they are not registered in the global settings.

8 Log Files

This section describes log files that are created by the system.

8.1 Output Destinations of Log Files

Logs are written when the preprocessing or server process is run or is in operation. They are written by default to the following locations:

Type Output destination
Preprocessing %TAKMI_HOME%/logs
Server (including Web applications) Log directories of WebSphere Application Server

The log directory of WebSphere Application Server varies depending on the operating environment. See the WebSphere Application Server documentation for details.

The name of a log file varies depending on the application that creates the log. Its name is "application name.log" by default, and you may use it for problem determination. The WebSphere Application Server log files such as SystemOut.log and SystemErr.log are also often useful in problem determination.

8.2 Log File Settings

Log file settings are made in the configuration file DATABASE_DIRECTORY/conf/application name_logging.properties. The configuration file conforms to the specifications of the Java Logging API. See the Java Logging API documentation for details.

9 Advanced Tool Settings

This section describes advanced settings of the tools used in the preprocessing phase.

9.1 Data Ingester

By editing the Data Ingester configuration file data_ingester_config_csv2atml.xml, it is possible to customize the method of conversion of CSV files into ATML files.

The format of the data_ingester_config_csv2atml.xml file is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ingester_config SYSTEM "data_ingester_config.dtd">
<ingester_config>
  <data_source impl="com.ibm.research.trl.milne.application.impl.common.prenlp.PreNLPDataSourceCSV">
  </data_source>
  <doc_converter_list>
    <doc_converter impl="com.ibm.research.trl.milne.application.impl.common.prenlp.PreNLPDocumentConverterATML">
    </doc_converter>
  </doc_converter_list>
  <doc_serializer impl="com.ibm.research.trl.milne.application.impl.common.prenlp.PreNLPDocumentSerializerATML">
  </doc_serializer>
</ingester_config>

To each of data_source, doc_converter, and doc_serializer, parameters are specified as key-value pairs.
The parameter format, which is common to all three, is as follows:

<param name="key of this parameter" multivalued="yes">
  <value>AAA</value>
  <value>BBB</value>
</param>

Be sure to specify either "yes" or "no" for the multivalued attribute.
When "no" is specified, only one value can be specified for the parameter.

List of data_source parameters

csv.character.encoding (required; multivalued: no)
  The character encoding of the input CSV file, for example UTF-8 or MS932. Specify a character set that Java can interpret.

csv.row.firstindex (required; multivalued: no)
  The line in the CSV file from which text starts to be incorporated, for example 1, 2, 3, ...

csv.column.index.id (optional; multivalued: no)
  The column number that corresponds to the document ID character string, with the value 1 for the leftmost column. If omitted, sequential numbers "1", "2", and so on are assigned automatically.

csv.column.index.title (optional; multivalued: no)
  The column number that corresponds to the document title, with the value 1 for the leftmost column. If omitted, the title of each document is "".

csv.date.format.list (optional; multivalued: yes)
  A list of date formats for interpreting the data in the column specified by csv.column.index.date as dates, for example yyyyMMdd, MM-dd-yyyy, or yyyy/MM/dd. Each format must conform to the patterns supported by the Java class java.text.SimpleDateFormat. If omitted, only "yyyyMMdd" is used. Specify the values of csv.date.format.list in accordance with the character string format used for date data in the CSV file.

csv.column.index.date (optional; multivalued: no)
  The column number that corresponds to the document date, with the value 1 for the leftmost column. If omitted, the date of processing is incorporated in each document as the .date standard item. If the input data contains character strings that do not match any of the date formats, the date of processing is incorporated in that document (the same behavior as omission), and the logs show the date character strings that could not be processed.

csv.column.text.indexes (optional; multivalued: yes)
  The column numbers that contain the text to be analyzed by language processing, with the value 1 for the leftmost column.

csv.column.names (required; multivalued: yes)
  Attaches a label (character string) to each column in the input CSV file. Always set the same number of values as the number of columns in the CSV file; the values correspond to the CSV columns in order. Each label functions as a category path for a standard item, with the following exceptions:
  • Date setting: For the column specified by the parameter csv.column.index.date, the path set by this parameter is not incorporated.
  • Text label: Names the columns specified by the parameter csv.column.text.indexes.
  • Space setting: When the value is set to space characters, the data in the corresponding column is not incorporated.

A specific example is shown below. Assume that there is an input CSV file with four columns and that csv.column.names is set as follows:

<param name="csv.column.names" multivalued="yes">
  <value>.aaa</value>
  <value></value>
  <value>.customer_id</value>
  <value>TEXT</value>
</param>

In this case, the first column corresponds to the category ".aaa", and the third column corresponds to the category ".customer_id". Because the second column is set to space characters, the data in the second column is not incorporated into the ATML file. Note that this space setting cannot be omitted; a value must be supplied for every column.
Because the value "TEXT" set for the fourth column does not start with the period character, it is not a category path; an error occurs if the value 4 is not included in the csv.column.text.indexes setting. If it is included, "TEXT" is incorporated into the ATML file as a text label.

List of doc_converter parameters

string.converter.locale (optional; multivalued: no)
  Determines whether language-dependent conversion is required for the text subject to language processing, for example ja. If omitted, language-dependent conversion is not run. If it is set to ja, all single-byte characters are converted to double-byte characters.

List of doc_serializer parameters

atml.indent.size (optional; multivalued: no)
  The number of space characters to be used as tag indentation in the output ATML file, for example 0, 1, 2, ... Specify 0 for no indentation. 0 when omitted.
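
Putting the parameters above together, a complete data_ingester_config_csv2atml.xml for the four-column CSV example might look as follows. This is a sketch: the parameter values (encoding, starting row, column layout) are illustrative assumptions, not defaults.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ingester_config SYSTEM "data_ingester_config.dtd">
<ingester_config>
  <data_source impl="com.ibm.research.trl.milne.application.impl.common.prenlp.PreNLPDataSourceCSV">
    <!-- Character encoding of the input CSV file -->
    <param name="csv.character.encoding" multivalued="no">
      <value>UTF-8</value>
    </param>
    <!-- Start incorporating text from the second line (skip a header line) -->
    <param name="csv.row.firstindex" multivalued="no">
      <value>2</value>
    </param>
    <!-- The fourth column contains the text to be analyzed -->
    <param name="csv.column.text.indexes" multivalued="yes">
      <value>4</value>
    </param>
    <!-- Labels for the four columns; the second column is not incorporated -->
    <param name="csv.column.names" multivalued="yes">
      <value>.aaa</value>
      <value></value>
      <value>.customer_id</value>
      <value>TEXT</value>
    </param>
  </data_source>
  <doc_converter_list>
    <doc_converter impl="com.ibm.research.trl.milne.application.impl.common.prenlp.PreNLPDocumentConverterATML">
    </doc_converter>
  </doc_converter_list>
  <doc_serializer impl="com.ibm.research.trl.milne.application.impl.common.prenlp.PreNLPDocumentSerializerATML">
    <!-- No indentation in the output ATML file -->
    <param name="atml.indent.size" multivalued="no">
      <value>0</value>
    </param>
  </doc_serializer>
</ingester_config>
```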

9.2 Language Processing

List of arguments for the language resource deployment processing (takmi_nlp_resource_deploy.bat(.sh))

DATABASE_DIRECTORY
  Updates the resource files that language processing reads, based on the user dictionary files (.adic.xml) in dic/category/ of the specified database directory and on category/category_tree.xml. Files in DATABASE_DIRECTORY/category, DATABASE_DIRECTORY/dic, and DATABASE_DIRECTORY/ie will be updated.

List of arguments for language processing (takmi_nlp.bat(.sh))

DATABASE_DIRECTORY
  Reads the files in the category, dic, and ie directories of the specified database directory and runs language processing with them.
DATABASE_DIRECTORY/db/atml/input.atml
  The input file for language processing. It is used only by language processing and is not viewed by applications such as Text Miner.
DATABASE_DIRECTORY/db/miml/output.miml
  The output file for language processing. If a file of the same name already exists, it is overwritten. This file is referred to by the original document display function of Text Miner; therefore, if the file is to be overwritten by language processing, you must first stop Text Miner (and applications accessed from Text Miner).

The format of a user dictionary (.adic.xml) file used in language processing is as follows.

<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
  <entry id="0" lex="Personal computer" pos="noun" cat=".category1" />
  <entry id="1" lex="Software system" pos="noun" cat=".category2" />
  <entryC id="2" str="PC" pos="noun" eid="0" />
  <entryC id="3" str="Software" pos="noun" eid="1" />
</dictionary>

An entry element defines a keyword, and an entryC element defines a synonym. An entryC element must be defined after the entry element it refers to. The relevant attributes are as follows.

id
  The dictionary entry ID. Specify an integer of 0 or greater. IDs must not overlap with one another.
lex
  The keyword. Do not define keywords that have identical part-of-speech (pos) or category (cat) information.
pos
  Part-of-speech information. Only noun is supported.
cat
  Category information. Specify a category defined in category_tree.xml. Proper operation is not guaranteed if an undefined category is specified.
str
  Synonym information. Do not define synonyms that have identical pos or entry ID (eid) information.
eid
  The ID of the corresponding keyword. Proper operation is not guaranteed if a nonexistent ID is specified. To relate the same synonym to multiple keywords, create a separate entryC entry for each keyword.
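
For example, to relate one synonym to two keywords, define a separate entryC element for each keyword. The words, categories, and IDs in this sketch are illustrative, not from the product:

```xml
<entry id="10" lex="Desktop computer" pos="noun" cat=".category1" />
<entry id="11" lex="Server machine" pos="noun" cat=".category1" />
<!-- One entryC per keyword: the same synonym string, different eid values -->
<entryC id="12" str="Box" pos="noun" eid="10" />
<entryC id="13" str="Box" pos="noun" eid="11" />
```
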
9.3 Indexing

takmi_index, which is run at the time of indexing, consists of several types of sub-processes. Each of them is described below in the order in which takmi_index runs them.

takmi_generate_config
  Updates the MIML file list in database_config.xml for the specified database.
takmi_index_singlebuild
  Processes the MIML files individually and creates an intermediate file index for each MIML file.
takmi_index_filemerge
  Merges the individual file indexes created by takmi_index_singlebuild into an intermediate file index that contains information for all the MIML files. At this point, the index cannot be used yet; processing by takmi_index_groupmerge is necessary.
takmi_index_groupmerge
  Processes the file index merged by takmi_index_filemerge to create the final index.

In regular indexing, running takmi_index carries out the processing listed above in sequence and creates an index.
If there are many MIML files, you might choose to run "intermediate merging," which carries out the merge processing in several batches, to limit memory use. The number of MIML files above which intermediate merging is required depends on the average document size and the volume of language processing resources, but it is roughly 200 files. Consider using intermediate merging if indexing fails with a smaller number of files.

The tool for intermediate merge processing is takmi_index_filemidmerge. It is run instead of takmi_index_filemerge in the file merge phase.

takmi_index_filemidmerge has the following two functions:

  • It merges intermediate file indexes for the specified MIML files.
  • It further merges already merged intermediate indexes and creates a new intermediate index that contains information about all the MIML files.

How to create an index by using the intermediate merge processing is shown below.

First, ensure that the old index has been deleted, and run takmi_generate_config and takmi_index_singlebuild by following the procedures below.

>  takmi_generate_config  -dbdir DATABASE_DIRECTORY  -template DATABASE_CONFIG_FILE
>  takmi_index_singlebuild  DATABASE_DIRECTORY  [HEAP_SIZE_MB]

Then, select the MIML file groups to be merged by the intermediate merge processing. Use a text editor to open DATABASE_DIRECTORY/conf/database_config.xml, search for the data_entries element, and find a list like the one shown below.

<data_entries min_doc_id="0" max_doc_id="599999">
<data_entry path_type="relative" path="db\miml\sample.1.miml" type="miml" min_doc_id="0" max_doc_id="99999"/>
<data_entry path_type="relative" path="db\miml\sample.2.miml" type="miml" min_doc_id="100000" max_doc_id="199999"/>
<data_entry path_type="relative" path="db\miml\sample.3.miml" type="miml" min_doc_id="200000" max_doc_id="299999"/>
<data_entry path_type="relative" path="db\miml\sample.4.miml" type="miml" min_doc_id="300000" max_doc_id="399999"/>
<data_entry path_type="relative" path="db\miml\sample.5.miml" type="miml" min_doc_id="400000" max_doc_id="499999"/>
<data_entry path_type="relative" path="db\miml\sample.6.miml" type="miml" min_doc_id="500000" max_doc_id="599999"/>
</data_entries>

By running takmi_index_filemidmerge, the intermediate merge processing can be partially done for MIML files.
Run the following command to merge sample.1.miml and sample.2.miml into an intermediate index.

>  takmi_index_filemidmerge  DATABASE_DIRECTORY -from 0 -to 199999

In the same manner, run the following commands in sequence to merge sample.3.miml with sample.4.miml, and then sample.5.miml with sample.6.miml.

>  takmi_index_filemidmerge  DATABASE_DIRECTORY -from 200000 -to 399999
>  takmi_index_filemidmerge  DATABASE_DIRECTORY -from 400000 -to 599999

Specify the -from and -to values so that each section defined by -from and -to exactly matches a range covered by the min_doc_id and max_doc_id values of the data_entry elements in data_entries.

When the processing is completed for all the sections, run the following command to merge the indexes of individual sections.

>  takmi_index_filemidmerge  DATABASE_DIRECTORY 
	-intervals  0-199999  200000-399999  400000-599999

When using the -intervals option, specify all the sections that were merged in the previous steps, in the "from-to" format.

When all the processing above has completed successfully, run takmi_index_groupmerge. This completes the indexing.

>  takmi_index_groupmerge  DATABASE_DIRECTORY  [HEAP_SIZE_MB]

9.4 DOCAT (in Japanese only)

Before deploying the language processing resources, it is necessary to set categories for categorizing documents in database_config.xml.
See the DOCAT Instruction Manual (in Japanese only) for the DOCAT settings.

10 List of Tools

The tools used in OmniFind Analytics Edition are located in the directory. These tools are listed below. For Windows, the file names have the extension ".bat", and for AIX the file names have the extension ".sh".

Tool takmi_alert_correlation
Function Does the correlation detection batch processing in the Alerting System.
How to use takmi_alert_correlation DATABASE_NAME MAXIMUM_ANALYSIS_TIME_BY_MINUTE JAVA_HEAP_SIZE_BY_MEGA_BYTES
Arguments
  • DATABASE_NAME
    The database name defined in global_config/database_entries/database_entry@name of global_config.xml.

  • MAXIMUM_ANALYSIS_TIME_BY_MINUTE
    The maximum analysis time in units of minutes.

  • JAVA_HEAP_SIZE_BY_MEGA_BYTES
    The Java heap size for analysis in units of megabytes.
Tool takmi_alert_increase
Function Does the increase detection batch processing in the Alerting System.
How to use takmi_alert_increase DATABASE_NAME MAXIMUM_ANALYSIS_TIME_BY_MINUTE JAVA_HEAP_SIZE_BY_MEGA_BYTES
Arguments
  • DATABASE_NAME
    The database name defined in global_config/database_entries/database_entry@name of global_config.xml.

  • MAXIMUM_ANALYSIS_TIME_BY_MINUTE
    The maximum analysis time in units of minutes.

  • JAVA_HEAP_SIZE_BY_MEGA_BYTES
    The Java heap size for analysis in units of megabytes.
Tool takmi_clear_index
Function For the specified database, deletes the index generated by indexing.
How to use takmi_clear_index DATABASE_DIRECTORY
Arguments
  • DATABASE_DIRECTORY
    The full path of the database directory.
Tool takmi_clear_nlp_index
Function For the specified database, deletes the MIML files generated by language processing and the index generated by indexing.
How to use takmi_clear_nlp_index DATABASE_DIRECTORY
Arguments
  • DATABASE_DIRECTORY
    The full path of the database directory.
Tool takmi_data_ingester
Function Converts the specified CSV file into an ATML file.
How to use takmi_data_ingester CONFIG_FILE CSV_FILE ATML_FILE
Arguments
  • CONFIG_FILE
The configuration file. See section 9.1 for details.
  • CSV_FILE
    A CSV file to be converted.
  • ATML_FILE
    The path of the ATML file to be generated. The output destination is arbitrary; however, note that the ATML file will be used as an input file for language processing.
    Unless there is a particular reason not to, create the file in the directory DATABASE_DIR/db/atml.
Tool takmi_generate_config
Function For the specified database, updates the MIML file list in database_config.xml.
How to use takmi_generate_config -dbdir DATABASE_DIRECTORY -template DATABASE_CONFIG_FILE [-diff]
Arguments
  • -dbdir DATABASE_DIRECTORY
    The path of the database directory.
  • -template DATABASE_CONFIG_FILE
    The database_config.xml file to be updated.
  • -diff
    (optional) Adds only the MIML files that are not yet registered in database_config.xml.
Tool takmi_index
Function For the specified database, does new indexing processing.
How to use takmi_index DATABASE_DIRECTORY [HEAP_SIZE_MB]
Arguments
  • DATABASE_DIRECTORY
    The path of the database directory.
  • HEAP_SIZE_MB
    (optional) The Java heap size, in megabytes, used when running the command. 1000 MB when omitted.
Tool takmi_index_diff
Function Does differential indexing for the specified database.
How to use takmi_index_diff DATABASE_DIRECTORY [HEAP_SIZE_MB]
Arguments
  • DATABASE_DIRECTORY
    The path of the database directory.
  • HEAP_SIZE_MB
    (optional) The Java heap size, in megabytes, used when running the command. 1000 MB when omitted.
Tool takmi_index_filemerge
Function In indexing, it does the file merge processing. It is called from takmi_index.
How to use takmi_index_filemerge DATABASE_DIRECTORY [HEAP_SIZE_MB]
Arguments
  • DATABASE_DIRECTORY
    The path of the database directory.
  • HEAP_SIZE_MB
    (optional) The Java heap size, in megabytes, used when running the command. 1000 MB when omitted.
Tool takmi_index_filemidmerge
Function In indexing, it does the file merge processing in batches.
How to use takmi_index_filemidmerge DATABASE_DIRECTORY -from xxx -to yyy

or

takmi_index_filemidmerge DATABASE_DIRECTORY -intervals from1-to1 from2-to2 ...
Arguments See 9.3 Indexing for details.
Note that the Java heap size for intermediate merging is set to 1000 MB by default.
To change it, externally set the environment variable JAVA_HEAP_SIZE_BY_MEGA_BYTES_FILEMIDMERGE, in megabytes, and then run the tool.
For example, run the following command in Windows:

> set JAVA_HEAP_SIZE_BY_MEGA_BYTES_FILEMIDMERGE=1500
> takmi_index_filemidmerge ...
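
On AIX, where the tools have the ".sh" extension, the equivalent setting can be made with the shell's export builtin. This is a sketch; the value 1500 and the command line in the comment are illustrative:

```shell
# Set the Java heap size (in megabytes) for intermediate merging.
export JAVA_HEAP_SIZE_BY_MEGA_BYTES_FILEMIDMERGE=1500
# Then run the tool as usual, for example:
# ./takmi_index_filemidmerge.sh DATABASE_DIRECTORY -from 0 -to 199999
echo "filemidmerge heap size: ${JAVA_HEAP_SIZE_BY_MEGA_BYTES_FILEMIDMERGE} MB"
```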
Tool takmi_index_groupmerge
Function In indexing, it does the group merge processing. It is called from takmi_index.
How to use takmi_index_groupmerge DATABASE_DIRECTORY [HEAP_SIZE_MB]
Arguments
  • DATABASE_DIRECTORY
    The path of the database directory.
  • HEAP_SIZE_MB
    (optional) The Java heap size, in megabytes, used when running the command. 1000 MB when omitted.
Tool takmi_index_singlebuild
Function In indexing, it processes individual MIML files separately. It is called from takmi_index.
How to use takmi_index_singlebuild DATABASE_DIRECTORY [HEAP_SIZE_MB]
Arguments
  • DATABASE_DIRECTORY
    The path of the database directory.
  • HEAP_SIZE_MB
    (optional) The Java heap size, in megabytes, used when running the command. 1000 MB when omitted.
Tool takmi_index_singlebuild_diff
Function In indexing, it processes individual MIML files separately. It is called from takmi_index_diff.
How to use takmi_index_singlebuild_diff DATABASE_DIRECTORY [HEAP_SIZE_MB]
Arguments
  • DATABASE_DIRECTORY
    The path of the database directory.
  • HEAP_SIZE_MB
    (optional) The Java heap size, in megabytes, used when running the command. 1000 MB when omitted.
Tool takmi_nlp
Function By using the language processing resources of the specified database, it does language processing on ATML files to create MIML files.
How to use takmi_nlp DATABASE_DIRECTORY DATABASE_DIRECTORY/db/atml/input.atml DATABASE_DIRECTORY/db/miml/output.miml
Arguments
  • DATABASE_DIRECTORY
    The path of the database directory.
  • DATABASE_DIRECTORY/db/atml/input.atml
    The path of the input data file.
  • DATABASE_DIRECTORY/db/miml/output.miml
    The path of the output data file.
Tool takmi_nlp_resource_deploy
Function For the specified database, it deploys the language processing resources.
How to use takmi_nlp_resource_deploy DATABASE_DIRECTORY
Arguments
  • DATABASE_DIRECTORY
    The path of the database directory.
Tool takmi_remove_inactive_index
Function For the specified database, it deletes intermediate indexes that are not in use.
How to use takmi_remove_inactive_index DATABASE_DIRECTORY
Arguments
  • DATABASE_DIRECTORY
    The path of the database directory.
Tool takmi_set_cp
Function It sets environment variables that are necessary for language processing.
How to use takmi_set_cp
Arguments None

11 List of Files

This chapter describes directories and files used by OmniFind Analytics Edition.

The following directories and files are created when databases are created.
Directory and file name | Required | Changes during operation | Description
category/category_tree.xml | Yes | Yes | It stores category information. The information is changed or categories are added when the Dictionary Editor is used to save the category tree.
conf/database_config.xml | Yes | Yes | It has information such as modules to be used or MIML files to be indexed. It is changed as data is added or changed.
conf/database_config_dictionary.xml | Yes | Yes | It has information on already defined categories to be used by the Dictionary Editor. It is changed when standard item information is changed.
conf/database_config_miner.xml | Yes | No | It has view information and displayed category information of Text Miner.
conf/database_config_alerting_system.xml | Yes | No | It has Alerting System setting information.
conf/database_config_docat.xml | Yes | No | It has DOCAT setting information.
conf/data_ingester_config_csv2atml.xml | Yes | No | It has Data Ingester setting information.
conf/default,anonymous | Yes | No | It stores the Dictionary Editor configuration file (it is overwritten each time the tool is used).
db/atml/*.atml | Yes | Yes | The input files for language processing.
db/miml/*.miml | Yes | Yes | The output files for language processing.
db/index | Yes | Yes | It stores the index files that Text Miner uses.
dic/candidate | No | Yes | It stores the dictionary candidate word list.
dic/category/*.adic.xml | No | Yes | It stores a dictionary file group for attaching categories (no files by default). Files are created as Dictionary Editor is used.
dic/jsa/*.jma | No | Yes | (In Japanese only) A user dictionary file group (no files by default). Files are created as Dictionary Editor is used.
dic/jsa/*.ddf | No | Yes | (In Japanese only) It has information on the user dictionary file group. It is updated from the *.jma files when language processing is activated.
dic/jsa/takmi.dso | No | Yes | (In Japanese only) It has the user dictionary information to be used in language processing. It is updated when language processing is activated.
pattern/dictionary.pat | No | Yes | An information extraction pattern. It is updated when the Dictionary Editor is used to update the category tree.
alerting | Yes | Yes | A directory for the Alerting System. Directories and files are created when Alerting System tools are used.
ie | Yes | Yes | (In Japanese only) A directory for DOCAT. Directories and files are created when DOCAT tools are used.

The following directories and files are referred to and changed in the language resource deployment processing and in language processing. Do not deploy the language resources during language processing.
Directory and file name | Read/Write during resource deployment | Read/Write during language processing | Description
category/category_tree.xml | Yes/Yes | Yes/No | When a category is updated in Dictionary Editor, a dependency category is created in the added category. Therefore, stop operating Dictionary Editor during this process.
conf/database_config.xml | Yes/No | Yes/No | It acquires language information.
conf/database_config_docat.xml | Yes/No | No/No | It reads DOCAT parameter information (working_directory).
dic/category/*.adic.xml | Yes/No | Yes/No | This file is created when Dictionary Editor is used. Although this file will not be changed, it is still necessary to stop operating Dictionary Editor during this process. This is read during the synonym processing or category attachment processing.
dic/jsa/*.jma | No/Yes | Yes/No | (In Japanese only) This is created from the *.adic.xml file created by Dictionary Editor. This will not be created if the *.adic.xml file does not exist. This is read by the parser.
dic/jsa/*.ddf | No/Yes | Yes/No | (In Japanese only) This is created from the *.adic.xml file created by Dictionary Editor. This will not be created if the *.adic.xml file does not exist. This is read by the parser.
dic/jsa/takmi.dso | No/Yes | Yes/No | (In Japanese only) This is created from the *.adic.xml file created by Dictionary Editor. This will not be created if the *.adic.xml file does not exist. This is read by the parser.
dic/LangWare50/*.* | No/Yes | Yes/No | (In English only) This is created from the *.adic.xml file created by Dictionary Editor. This will not be created if the *.adic.xml file does not exist. This is read by the parser.
pattern/auto_generated.pat | No/Yes | Yes/No | This is created from the *.adic.xml file and category_tree.xml created by Dictionary Editor. Therefore, stop operating Dictionary Editor during this process. This is read during the expression extraction processing.
ie/categorization/*.feature.xml | Yes/Yes | Yes/No | (In Japanese only) This is a categorization trigger information file created by DOCAT. A file is created for each category and is updated as DOCAT is used. Therefore, stop operating Dictionary Editor during this process. This is read during the categorization processing by DOCAT.
ie/categorization/*.model | Yes/Yes | Yes/No | (In Japanese only) This is a categorization model file created by DOCAT. A file is created for each category and is updated as DOCAT is used. Therefore, stop operating Dictionary Editor during this process. This is read during the categorization processing by DOCAT.
ie/categorization/*.scfeature.xml | Yes/No (Yes) | Yes/No | (In Japanese only) This is a categorization trigger search condition file created by DOCAT. A file is created for each category and is updated as DOCAT is used. Therefore, stop operating Dictionary Editor during this process. Files are overwritten if working_directory is set. This is read during the categorization processing by DOCAT.
ie/categorization/*.annotation.xml | Yes/No | No/No | (In Japanese only) This is a document selection part record file created by DOCAT. A file is created for each category and is updated as DOCAT is used. Therefore, stop operating Dictionary Editor during this process. This is not referred to during language processing.

The following directories and files are referred to and changed when the Alerting System is used.
Directory and file name | Read during system operation | Write during system operation | Description
alerting/setting | - | - | This is created when the Alerting System is launched. Files created in this directory are used only by the Alerting System; therefore, it has no effect on tasks such as addition of data.
alerting/setting/increase_detection_setting | Yes | No | This is a parameter setting file for increase detection. It is updated as the parameter settings are changed.
alerting/setting/correlation_detection_setting | Yes | No | This is a parameter setting file for correlation detection. It is updated as the parameter settings are changed.
alerting/batch | - | - | This is created when the Alerting System is launched. Files created in this directory are used only by the Alerting System; therefore, it has no effect on tasks such as addition of data.
alerting/batch/increase_detection_report.xml | No | Yes | This is the result of the regular batch processing of increase detection. It is updated as the regular batch processing is carried out.
alerting/batch/correlation_detection_report.xml | No | Yes | This is the result of the regular batch processing of correlation detection. It is updated as the regular batch processing is carried out.

The following directories and files are referred to and changed when DOCAT is used.
Directory and file name | Read during system operation | Write during system operation | Description
ie/categorization/*.feature.xml | Yes | Yes | (In Japanese only) This is a categorization trigger information file created by DOCAT. A file is created for each category and is updated as DOCAT is used.
ie/categorization/*.model | Yes | Yes | (In Japanese only) This is a categorization model file created by DOCAT. A file is created for each category and is updated as DOCAT is used.
ie/categorization/*.scfeature.xml | Yes | No | (In Japanese only) This is a categorization trigger search condition file created by DOCAT. A file is created for each category and is updated as DOCAT is used.
ie/categorization/*.annotation.xml | Yes | No | (In Japanese only) This is a document selection part record file created by DOCAT. A file is created for each category and is updated as DOCAT is used.

Terms of Use

Notices
This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A. 
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:

IBM World Trade Asia Corporation
Licensing
2-31 Roppongi 3-chome, Minato-ku
Tokyo 106-0032, Japan 
The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact:

IBM Corporation
Silicon Valley Lab
Building 090/H-410
555 Bailey Avenue
San Jose, CA 95141-1003
U.S.A.
Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

Copyright License
This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

Trademarks
This topic lists IBM trademarks and certain non-IBM trademarks.

See http://www.ibm.com/legal/copytrade.shtml for information about IBM trademarks.

The following terms are trademarks or registered trademarks of other companies:

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, Intel Inside (logos), MMX and Pentium are trademarks of Intel Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product or service names might be trademarks or service marks of others.