
DKDatastoreTS

Purpose:

The DKDatastoreTS class is a specific version of dkDatastore that implements the Text Search (TS) datastore. TS provides text indexing and search mechanisms; it does not store documents or folders. Instead, TS indexes the text parts of documents and processes search requests using this index. The results of a text query submitted to TS are item IDs, which are the keys used to retrieve the actual documents from the Content Manager datastore.

The execute() and evaluate() functions of DKDatastoreTS take text query strings expressed in the text query language (DK_CM_TEXT_QL_TYPE). The syntax of this query string is described below. The DKTextQuery object accepts queries in the same syntax; in fact, DKTextQuery delegates the low-level query processing tasks to DKDatastoreTS.

Class summary:

class DKDatastoreTS: public dkDatastore
{
    ...
   DKDatastoreTS ();
   void connect (const char* datastore_name,
                 const char* user_name = "",
                 const char* authentication = "",
                 const char* connect_string = "");
   void connect (const char* server_name,
                 const char* port, char communication_type);
 
   virtual void disconnect();
   virtual void getOption(long option, DKAny& value);
   virtual void setOption(long option, DKAny& value);
   virtual DKAny evaluate(const char* command,
                          const short commandLangType =
                             DK_CM_TEXT_QL_TYPE,
                          const DKNVPair* parms = 0);
   virtual DKAny evaluate( dkQuery* query);
   virtual DKAny evaluate(DKCQExpr* qe);
   virtual dkResultSetCursor* execute( const char* command,
                                       const short commandLangType =
                                         DK_CM_TEXT_QL_TYPE,
                                       const DKNVPair* parms = 0);
   virtual dkResultSetCursor* execute( dkQuery* query);
   virtual dkResultSetCursor* execute( DKCQExpr* qe);
   virtual void executeWithCallback( const char* command,
                                     const short commandLangType,
                                     const DKNVPair* parms,
                                     dkCallback* callbackObj);
   virtual void executeWithCallback( dkQuery* query,
                                     dkCallback* callbackObj);
   virtual void executeWithCallback( DKCQExpr* qe,
                                     dkCallback* callbackObj);
   virtual dkQuery*  createQuery(const char* command,
                                 const short commandLangType =
                                    DK_CM_TEXT_QL_TYPE,
                                 const DKNVPair* parms=0);
   virtual dkQuery*  createQuery(DKCQExpr* qe);
   virtual DKBoolean isConnected();
   virtual DKString datastoreName() const;
   virtual DKString datastoreType() const;
   virtual DKHandle* connection();
   virtual DKHandle* handle(const char* type);
   virtual DKString userName() const;
   virtual dkCollection* listDataSources();
   virtual DKString* listDataSourceNames(long& arraySize);
   virtual DKAny listServers();
   virtual DKAny listSchema();
   virtual dkCollection* listEntities();
   virtual DKString* listEntityNames(long& arraySize);
   virtual dkDatastoreDef* datastoreDef();    
   virtual void startUpdateIndex(const char* indexName);
   virtual void clearIndex(const char* indexName);
   virtual void createIndex(DKIndexInfoTS* newIndex);
   virtual void deleteIndex(const char* indexName);
   virtual DKIndexInfoTS* getIndexInformation(const char* indexName);
   DKIndexFuncStatusTS* getIndexFunctionStatus(const char* indexName);
   void setIndexFunctionStatus(const char* indexName, long actionId);
   virtual DKMatchesInfoTS* getMatches(dkResultSetCursor* cursor,   
                                       const char* documentId,
                                       const char* textIndexName,
                                       dkBoolean useDictionary);
};

Members:

Member functions

connect
Connects to the datastore: the userName and password are for the Content Manager server and the datastore_name is the name of the search service.

The connect_string is optional; it is used to provide the communication type and port number, as well as a listing of library server, user ID and authentication groupings.

Below is a sample of a connect string an end user may supply:

    [COMMTYPE={T | P}; PORT=portnumber;
    LIBACCESS=(libraryserver, userid, password;...)]

Additional connect string parameters:

COMMTYPE:
communication type. This can be set to T (for TCPIP) or P (for PIPES).

PORT:
port number. This parameter must be included if the COMMTYPE is specified.

LIBACCESS:
library access group. If this parameter is passed, do not specify the userName and password parameters in the connect function. Each library access group corresponds to one Content Manager server and consists of the library server name (for example, LIBSRVR2) and the user ID and password for that server, which is where the text parts are stored. You can specify one or more library access groups; if only one is specified, the parentheses are not needed.
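
The connect string layout described above can be assembled programmatically. The following is a minimal sketch, not part of the DKDatastoreTS API: the LibAccess structure and the buildConnectString helper are hypothetical names, and the sketch assumes that no whitespace is required between parameters. It always writes the parentheses around the LIBACCESS groups, which the syntax permits for one or more groups.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical holder for one LIBACCESS group:
// library server name, user ID, and password.
struct LibAccess {
    std::string libraryServer;
    std::string userId;
    std::string password;
};

// Hypothetical helper that builds a connect string such as
//   COMMTYPE=T;PORT=7502;LIBACCESS=(LIBSRVR2,user,pw)
// commType is 'T' (TCP/IP) or 'P' (pipes); any other value omits
// both COMMTYPE and PORT, because PORT must accompany COMMTYPE.
std::string buildConnectString(char commType,
                               const std::string& port,
                               const std::vector<LibAccess>& groups)
{
    std::string s;
    if (commType == 'T' || commType == 'P') {
        s += "COMMTYPE=";
        s += commType;
        s += ";PORT=" + port;
    }
    if (!groups.empty()) {
        if (!s.empty()) s += ";";
        // Groups are separated by semicolons inside the parentheses.
        s += "LIBACCESS=(";
        for (std::size_t i = 0; i < groups.size(); ++i) {
            if (i > 0) s += ";";
            s += groups[i].libraryServer + "," +
                 groups[i].userId + "," +
                 groups[i].password;
        }
        s += ")";
    }
    return s;
}
```

The resulting string would be passed as the connect_string argument of connect, with userName and password left empty when LIBACCESS is present.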

The connect function can be called in several ways. The following combinations connect to Text Search:

  • connect with datastore_name (search service)
  • connect with datastore_name (text search server) and specify connect_string with COMMTYPE and PORT
  • connect with datastore_name (search service), and specify the connect_string with LIBACCESS
  • connect with datastore_name (text search service), and specify the connect_string with COMMTYPE, PORT and LIBACCESS

Exceptions

  • DKDatastoreAccessError

void connect (const char* datastore_name,
              const char* userName = "",
              const char* password = "",
              const char* connect_string = "");

connect (with 3 parameters)
Connects to the datastore. The server_name is the host name of the machine where the text search server is located. You need to specify the communication type (DK_CTYP_TCPIP (for TCPIP) or DK_CTYP_PIPES (for PIPES)), and port number.

You can also connect to the datastore if you supply the search service name for server_name, an empty string for port, and a blank character (' ') for communication_type.

Exceptions

  • DKDatastoreAccessError

void connect (const char* server_name,
              const char* port, char communication_type);

disconnect
Disconnects from a datastore.
virtual void disconnect();
 

getOption
Gets a datastore option.
virtual void getOption(long option, DKAny& value);

setOption
Sets a datastore option.
virtual void setOption(long option, DKAny& value);

evaluate
Evaluates the query.

Parameters

command
A query string.

commandLangType
A query type.

parms
An additional query option in a name/value pair.

query
A query object.

qe
A common query expression object.
virtual DKAny evaluate(const char* command,
                       const short commandLangType =
                          DK_CM_TEXT_QL_TYPE,
                       const DKNVPair* parms = 0);
virtual DKAny evaluate(dkQuery* query);
virtual DKAny evaluate(DKCQExpr* qe);

execute
Executes the query.

Parameters

command
A query string.

commandLangType
A query type.

parms
An additional query option in a name/value pair.

query
A query object.

qe
A common query expression object.
virtual dkResultSetCursor* execute(const char* command,
                                   const short commandLangType =
                                      DK_CM_TEXT_QL_TYPE,
                                   const DKNVPair* parms = 0);
virtual dkResultSetCursor* execute(dkQuery* query);
virtual dkResultSetCursor* execute(DKCQExpr* qe);

executeWithCallback
Executes the query with a callback function.

Parameters

command
A query string.

commandLangType
A query type.

parms
An additional query option in a name/value pair.

callbackObj
A dkCallback object.

query
A query object.

qe
A common query expression object.
virtual void executeWithCallback(const char* command,
                                 const short commandLangType,
                                 const DKNVPair* parms,
                                 dkCallback* callbackObj);
virtual void executeWithCallback(dkQuery* query,
                                 dkCallback* callbackObj);
virtual void executeWithCallback(DKCQExpr* qe,
                                 dkCallback* callbackObj);

createQuery
Creates a query object.

Parameters

command
A query string.

commandLangType
A query type.

parms
An additional query option in a name/value pair.

qe
A common query expression object.
virtual dkQuery* createQuery(const char* command,
                             const short commandLangType =
                                DK_CM_TEXT_QL_TYPE,
                             const DKNVPair* parms = 0);
virtual dkQuery* createQuery(DKCQExpr* qe);

isConnected
Checks to see if the datastore is connected.
virtual DKBoolean isConnected();
 

datastoreName
Gets the name of this datastore object. Usually it represents a datastore source's server name.
virtual DKString datastoreName() const;
 

datastoreType
Gets the datastore type for this datastore object.
virtual DKString datastoreType() const;
 

connection
Gets the connection handle for a datastore.
virtual DKHandle* connection();
 

handle
Gets a datastore handle.

Parameters

type
The type of datastore handle wanted.
virtual DKHandle* handle(const char* type);

userName
Gets the user name for this datastore object.
virtual DKString userName() const;
 

listDataSources
Lists the available datastore sources that you can connect to.
virtual dkCollection* listDataSources();
 

listDataSourceNames
Lists the names of the available datastore sources that you can connect to.
virtual DKString* listDataSourceNames(long& arraySize);
 

listServers
listServers has been deprecated and replaced by listDataSources.
virtual DKAny listServers();
 

listSchema
listSchema has been deprecated and replaced by listEntities.
virtual DKAny listSchema();
 

listEntities
Gets a list of entities from the persistent datastore.
virtual dkCollection* listEntities();
 

listEntityNames
Gets a list of entity names from the persistent datastore.
virtual DKString* listEntityNames(long& arraySize);

datastoreDef
Gets the datastore definition.
virtual dkDatastoreDef* datastoreDef();    
 

startUpdateIndex
startUpdateIndex has been deprecated and replaced by startUpdateIndex in the DKDatastoreAdminTS class.
virtual void startUpdateIndex(const char* indexName);

clearIndex
clearIndex has been deprecated and replaced by clearIndex in the DKDatastoreAdminTS class.
virtual void clearIndex(const char* indexName);

createIndex
createIndex has been deprecated and replaced by createIndex in the DKDatastoreDefTS class.
virtual void createIndex(DKIndexInfoTS* newIndex);

deleteIndex
deleteIndex has been deprecated and replaced by deleteIndex in the DKDatastoreDefTS class.
virtual void deleteIndex(const char* indexName);

getIndexInformation
getIndexInformation has been deprecated and replaced by getIndexInformation in the DKDatastoreAdminTS class.
virtual DKIndexInfoTS* getIndexInformation(const char* indexName);
 

getIndexFunctionStatus
getIndexFunctionStatus has been deprecated and replaced with getIndexFunctionStatus in the DKDatastoreAdminTS class.
 DKIndexFuncStatusTS* getIndexFunctionStatus(const char* indexName); 

setIndexFunctionStatus
setIndexFunctionStatus has been deprecated and replaced with setIndexFunctionStatus in the DKDatastoreAdminTS class.

Sets indexing function status for a text search index. You need to establish a connection to the server before calling this function. The indexName is the text search index used to set indexing status information.

The actionId is DK_TSINDEXACTID_ENABLE, DK_TSINDEXACTID_DISABLE, or DK_TSINDEXACTID_RESET. Use DK_TSINDEXACTID_RESET if a reason code was set and the function was stopped; after the Text Search Engine error has been corrected, the reset allows another startUpdateIndex to be run. All constants are defined in the DKConstant2.h file.

Exceptions

  • DKDatastoreAccessError
  • DKDatastoreError

void setIndexFunctionStatus(const char* indexName, long actionId); 

getMatches
virtual DKMatchesInfoTS* getMatches(dkResultSetCursor* cursor,
                                    const char* documentId,
                                    const char* textIndexName,
                                    dkBoolean useDictionary);
 

Text Search text query string

The syntax of the text query string is as follows:

 
      SEARCH=(COND=(text_search_expression)
             );
     [OPTION=([SEARCH_INDEX={search_index_name | (index_list) };]
              [MAX_RESULTS=maximum_results;]
              [THES_NAME=thesaurus_index_name;]
              [THES_DEPTH=depth_for_query_expansion;]
              [TIME_LIMIT=time_limit;]
              [MATCH_INFO=yes_no;]
              [RANKING=yes_no;]
              [SORT=yes_no;]
              [MATCH_DICT=yes_no]
             )]
 

Words in uppercase are keywords. Lowercase words are parameters supplied by users; they are described below. Note that DBCS (double-byte character set) characters must be enclosed in SBCS single quotes, like a phrase. For more information about options, refer to the EhwSearch chapter of the Text Search Engine Application Programming Reference.
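
As an illustration of the syntax above, the following minimal sketch assembles a text query string from a search expression and a list of options. The buildTextQuery helper is a hypothetical name and is not part of the DKDatastoreTS API; the index name TMINDEX in the usage note is only a placeholder. The sketch assumes that options are separated by semicolons, with none after the last option, as the syntax diagram suggests.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: wraps a text search expression in
// SEARCH=(COND=(...)); and appends an OPTION=(...) clause when
// at least one "KEYWORD=value" option is supplied.
std::string buildTextQuery(const std::string& expression,
                           const std::vector<std::string>& options)
{
    std::string query = "SEARCH=(COND=(" + expression + "));";
    if (!options.empty()) {
        query += "OPTION=(";
        for (std::size_t i = 0; i < options.size(); ++i) {
            if (i > 0) query += ";";   // separator between options
            query += options[i];
        }
        query += ")";
    }
    return query;
}
```

For example, calling buildTextQuery with the expression UNIX AND member and the options SEARCH_INDEX=TMINDEX and MAX_RESULTS=5 yields SEARCH=(COND=(UNIX AND member));OPTION=(SEARCH_INDEX=TMINDEX;MAX_RESULTS=5). A string of this form is what the command parameter of execute() or evaluate() expects when commandLangType is DK_CM_TEXT_QL_TYPE.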

text_search_expression

This is an expression composed of either a free_text_expression alone or a boolean_query followed by an optional free_text_expression. A boolean_query followed by a free_text_expression is known as a hybrid query.

{boolean_query  [free_text_expression] | free_text_expression}

Note that at most one boolean query and one free_text_expression are allowed. If a boolean query is included, it must be specified first. For more information about options, refer to the EhwSearch chapter of the Text Search Engine Application Programming Reference.
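
The ordering rule for a text_search_expression can be sketched as a small helper. This is an illustrative sketch only: textSearchExpression is a hypothetical name, and the sketch assumes a single space is an acceptable separator between the boolean part and the free-text part.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: combines an optional boolean query with an
// optional free-text part. The boolean query, when present, comes
// first; the free text is wrapped in braces. At least one of the
// two parts must be non-empty.
std::string textSearchExpression(const std::string& booleanQuery,
                                 const std::string& freeText)
{
    if (booleanQuery.empty() && freeText.empty()) {
        throw std::invalid_argument(
            "a boolean query or free text is required");
    }
    std::string expression = booleanQuery;
    if (!freeText.empty()) {
        if (!expression.empty()) expression += " ";
        expression += "{" + freeText + "}";
    }
    return expression;
}
```

Passing WWW AND internet as the boolean query and web site as the free text produces the hybrid query WWW AND internet {web site}.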

boolean_query:

[unary_operator] text_search_criteria
[[binary_operator [unary_operator] text_search_criteria] ... ] 

The binary operators are AND (or &) and OR (or |). NOT is the only unary operator. An expression enclosed in parentheses is treated as a subquery, which changes the default order of processing for the binary operators. For example, a query that includes parentheses would have the following syntax: UNIX AND (ibm OR system). The expression inside the parentheses, "(ibm OR system)", is a subquery contained inside the query.
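
A boolean query with a subquery can be composed from small string helpers. The following sketch is illustrative only; the function names are hypothetical and the helpers do nothing more than concatenate operands with the operator keywords described above.

```cpp
#include <string>

// Hypothetical helpers for composing boolean_query strings.

// Joins two operands with the AND binary operator.
std::string andOf(const std::string& left, const std::string& right)
{
    return left + " AND " + right;
}

// Joins two operands with the OR binary operator.
std::string orOf(const std::string& left, const std::string& right)
{
    return left + " OR " + right;
}

// Applies the NOT unary operator to an operand.
std::string notOf(const std::string& operand)
{
    return "NOT " + operand;
}

// Wraps an expression in parentheses so it is processed as a subquery.
std::string subquery(const std::string& expression)
{
    return "(" + expression + ")";
}
```

With these helpers, andOf("UNIX", subquery(orOf("ibm", "system"))) produces UNIX AND (ibm OR system).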

Search argument:

text_search_criteria is one of the following keywords/options, where the dollar sign delimits the keyword/option:

      { search_argument                           |
        $DOC$  '{' proximity_search_argument '}'  |
        $PARA$ '{' proximity_search_argument '}'  |
        $SENT$ '{' proximity_search_argument '}'
      }

The following options specify proximity search conditions, which require search arguments. These consist of at least a pair of words or phrases:

$DOC$
reserved word indicating that the proximity search expression in the search argument has a scope of the whole document

$PARA$
reserved word indicating that the proximity search expression in the search argument has a scope of a paragraph

$SENT$
reserved word indicating that the proximity search expression in the search argument has a scope of a sentence
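
A proximity search criterion can be sketched as follows. The ProximityScope enumeration and the proximityCriteria helper are hypothetical names introduced for illustration; the sketch only enforces the rule that a proximity search needs at least a pair of words or phrases and writes them inside braces after the scope keyword.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical enumeration of the three proximity scopes.
enum ProximityScope { SCOPE_DOC, SCOPE_PARA, SCOPE_SENT };

// Hypothetical helper: builds "$PARA$ {term1 term2 ...}" and the
// equivalent forms for $DOC$ and $SENT$. Throws if fewer than two
// words or phrases are supplied.
std::string proximityCriteria(ProximityScope scope,
                              const std::vector<std::string>& terms)
{
    if (terms.size() < 2) {
        throw std::invalid_argument(
            "a proximity search needs at least two words or phrases");
    }
    std::string out;
    switch (scope) {
        case SCOPE_DOC:  out = "$DOC$";  break;
        case SCOPE_PARA: out = "$PARA$"; break;
        case SCOPE_SENT: out = "$SENT$"; break;
    }
    out += " {";
    for (std::size_t i = 0; i < terms.size(); ++i) {
        if (i > 0) out += " ";   // terms are separated by spaces
        out += terms[i];
    }
    out += "}";
    return out;
}
```

Calling proximityCriteria with SCOPE_PARA and the terms internet and DB2 gives $PARA$ {internet DB2}.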

search_argument can be more than one word or phrase:

      [$search_option$] {word | phrase} [[$search_option$] {word | phrase}...]

proximity_search_argument:

      [$search_option$] {word | phrase} [$search_option$] {word | phrase}
      [$search_option$] [{word | phrase}...]

Each word or phrase can be preceded by a $search_option$ tag.

The dollar sign delimits search_option. Options inside a pair of dollar signs are separated by commas and can have the values listed below.

The NOT operator is not allowed with the keywords $DOC$, $PARA$, or $SENT$.

The valid codes and ids can be found in the DKConstant2.h file, in the users' include directory. They need to be converted from number values to string values for ccode and langid.

DOCMOD
one or more document model elements separated by semicolons.

DOCMODNAME
the name of the document model that a section list is defined for.

SECLIST
the list of sections that are defined in the model definitions file. For GTR-type indexes, only one entry is allowed in this list. If there is more than one item in the list, the items are separated by commas.

CCSID=ccode:
an option that specifies the character code for a country.

LANG=langid:
an option that specifies the language ID for a country.

SC=symbol:
symbol to indicate a single required character, usually a question mark (?). This must come before the MC=symbol if both SC and MC are specified.

MC=symbol:
symbol to indicate a sequence of optional characters or a single optional word; that is, the wildcard character, usually an asterisk (*).

SYN
The text search includes synonyms of the current search term.

THES

THES or THES=relation_name

The text search includes a request to also search for thesaurus expansions of the current search term. Text Search looks for thesaurus terms either in the file defined by the THES_NAME option or the default file. The default file is "imlthes" for Linguistic and Precise searches; the default file is "imlnthes" for GTR searches. If relation_name is specified, query expansion by thesaurus is done along branches of the named relation. If no value is specified, all branches are taken into account for query expansion.

If you have multiple terms in your search (words separated with spaces), you can use either single or double quotes to enclose the string. For example, if you want to search for the words "digital" and "database" using a single query, your query would look like this: 'digital database'. Spaces between words are only recognized when contained within single or double quotes.

NOSEQ
The words in the current search term can occur in any sequence; if this option is not specified, the words must occur in exactly the same sequence within a single sentence.

SOUND
The words in the current search term "sound like" words targeted in the search.

MATCH=n
An option that specifies the degree of similarity (GTR). "n" is a number between one and five, inclusive.

BOUND
An option that requests the search to respect word phrase boundaries (GTR).

CSENS
The search is case-sensitive. This is only valid for a GTR-type index with case enabled.

ESTEM
An option that requests tokens with a stem that matches the search term (GTR). With this option, Text Search Engine will also search on "computer" and "computing" from the search term "compute".
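
The SC and MC masking options described above can be attached to a search term with a small helper. This is a minimal sketch: maskedTerm is a hypothetical name, and the sketch covers only the SC and MC options. It writes SC before MC, as required when both are specified, and separates them with a comma inside the dollar signs.

```cpp
#include <string>

// Hypothetical helper: prefixes a word with a $...$ option tag that
// declares the single-character (SC) and multi-character (MC) mask
// symbols. Pass '\0' to leave an option out. SC is always written
// before MC.
std::string maskedTerm(const std::string& term, char sc, char mc)
{
    std::string options;
    if (sc != '\0') {
        options += "SC=";
        options += sc;
    }
    if (mc != '\0') {
        if (!options.empty()) options += ",";
        options += "MC=";
        options += mc;
    }
    if (options.empty()) {
        return term;   // no mask symbols declared, no tag needed
    }
    return "$" + options + "$ " + term;
}
```

For instance, maskedTerm("Net*", '\0', '*') gives $MC=*$ Net*, which matches words that start with Net.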

word is a word in the specified search language; phrase is one or more words enclosed in single or double quotes (these can be DBCS, double-byte character set, characters); and free_text is words inside a pair of braces ({}).

free_text_expression:

free_text_expression is composed of a free_text_search_criteria string, where free_text_search_criteria is:

    [$free_text_search_option$] '{' free_text '}'
 

The dollar sign delimits free_text_search_option. Options inside a pair of dollar signs are separated by commas and can currently have the following values:

DOCMOD
one or more document model elements separated by semicolons.

DOCMODNAME
the name of the document model that a section list is defined for.

SECLIST
the list of sections that are defined in the model definitions file. For GTR-type indexes, only one entry is allowed in this list. If there is more than one item in the list, the items are separated by commas.

SYN
the text search includes synonyms of the current search term.

THES

THES or THES=relation_name

The text search includes a request to also search for thesaurus expansions of the current search term. Text Search looks for thesaurus terms either in the file defined by the THES_NAME option or the default file. The default file is "imlthes" for Linguistic and Precise searches; the default file is "imlnthes" for GTR searches. If relation_name is specified, query expansion by thesaurus is done along branches of the named relation. If no value is specified, all branches are taken into account for query expansion.

If you have multiple terms in your search (words separated with spaces), you can use either single or double quotes to enclose the string. For example, if you want to search for the words "digital" and "database" using a single query, your query would look like this: 'digital database'. Spaces between words are only recognized when contained within single or double quotes.

search_index_name
the name of one search index to be searched.

index_list
the list of search index names to be searched, separated by commas.

maximum_results
the desired maximum number of results to be returned.

time_limit
specifies the maximum processing time of the text search server for a Boolean query or the Boolean part of a hybrid query.

thesaurus_index_name
specifies the name of a thesaurus index to be used to expand query terms. The default name is imlthes for Linguistic and Precise searches; the default name is imlnthes for GTR searches.

depth_for_query_expansion
specifies the depth to be used in query expansion by looking for matches in the thesaurus. Actual expansion of the query is requested by using the THES search_option. The default depth setting is 1.

An example of a boolean search expression that searches for documents containing the phrase UNIX Operating and the word member is as follows:

            'UNIX Operating'   AND  member                               

An example of a boolean and free-text (hybrid) search expression that searches for documents containing the words WWW and internet, and the free text web site, is as follows:

            WWW AND internet  {web site} 
           

Another example of an expression to search for documents containing the words internet and DB2 in the same paragraph, a word that starts with Net, and the free_text internet commerce is booming is as follows:

    $PARA$ {internet DB2} AND $MC=*$ Net* 
    {internet commerce is booming}

yes_no for MATCH_INFO
The MATCH_INFO indicator. The valid values are:

yes_no for MATCH_DICT
The MATCH_DICT indicator. The valid values are:

(c) Copyright International Business Machines Corporation 1996, 2003. IBM Corp. All rights reserved.