Purpose:
The DKDatastoreTS class is a specific version of dkDatastore that implements the Text Search (TS) datastore. TS provides text indexing and search mechanisms; it does not store documents or folders. TS indexes the text parts of documents and processes search requests using this index. The results of a text query submitted to TS are item IDs, which are the keys used to retrieve the actual documents from the Content Manager datastore.
The execute() and evaluate() functions of DKDatastoreTS take text query strings expressed in the text query language type. The syntax of this query string is described below. The DKTextQuery object accepts queries in this syntax; in fact, the DKTextQuery object delegates the low-level query processing tasks to DKDatastoreTS.
Class summary:
class DKDatastoreTS: public dkDatastore
{
  ... ...
  DKDatastoreTS();
  void connect(const char* datastore_name, const char* user_name = "", const char* authentication = "", const char* connect_string = "");
  void connect(const char* server_name, const char* port, char communication_type);
  virtual void disconnect();
  virtual void getOption(long option, DKAny& value);
  virtual void setOption(long option, DKAny& value);
  virtual DKAny evaluate(const char* command, const short commandLangType = DK_CM_TEXT_QL_TYPE, const DKNVPair* parms = 0);
  virtual DKAny evaluate(dkQuery* query);
  virtual DKAny evaluate(DKCQExpr* qe);
  virtual dkResultSetCursor* execute(const char* command, const short commandLangType = DK_CM_TEXT_QL_TYPE, const DKNVPair* parms = 0);
  virtual dkResultSetCursor* execute(dkQuery* query);
  virtual dkResultSetCursor* execute(DKCQExpr* qe);
  virtual void executeWithCallback(const char* command, const short commandLangType, const DKNVPair* parms, dkCallback* callbackObj);
  virtual void executeWithCallback(dkQuery* query, dkCallback* callbackObj);
  virtual void executeWithCallback(DKCQExpr* qe, dkCallback* callbackObj);
  virtual dkQuery* createQuery(const char* command, const short commandLangType = DK_CM_TEXT_QL_TYPE, const DKNVPair* parms = 0);
  virtual dkQuery* createQuery(DKCQExpr* qe);
  virtual DKBoolean isConnected();
  virtual DKString datastoreName() const;
  virtual DKString datastoreType() const;
  virtual DKHandle* connection();
  virtual DKHandle* handle(const char* type);
  virtual DKString userName() const;
  virtual dkCollection* listDataSources();
  virtual DKString* listDataSourceNames(long& arraySize);
  virtual DKAny listServers();
  virtual DKAny listSchema();
  virtual dkCollection* listEntities();
  virtual DKString* listEntityNames(long& arraySize);
  virtual dkDatastoreDef* datastoreDef();
  virtual void startUpdateIndex(const char* indexName);
  virtual void clearIndex(const char* indexName);
  virtual void createIndex(DKIndexInfoTS* newIndex);
  virtual void deleteIndex(const char* indexName);
  virtual DKIndexInfoTS* getIndexInformation(const char* indexName);
  DKIndexFuncStatusTS* getIndexFunctionStatus(const char* indexName);
  void setIndexFunctionStatus(const char* indexName, long actionId);
  virtual DKMatchesInfoTS* getMatches(dkResultSetCursor* cursor, const char* documentId, const char* textIndexName, dkBoolean useDictionary);
};
Members:
The connect_string is optional; it is used to provide the communication type and port number, as well as a listing of library server, user ID and authentication groupings.
Below is a sample of a connect string an end user may supply:
[COMMTYPE={T | P}; PORT=portnumber; LIBACCESS=(libraryserver, userid, password;...)]
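As a rough illustration of the format above, the following sketch assembles a connect string from its parts. The helper name buildConnectString, the LibAccess struct, and the sample values used with it are hypothetical and are not part of the class library; the exact spacing the server accepts between parameters is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical grouping of one library server with its credentials.
struct LibAccess {
    std::string libraryServer;
    std::string userId;
    std::string password;
};

// Builds "COMMTYPE=c; PORT=n; LIBACCESS=(server,user,password;...)".
// commType must be 'T' or 'P', the two values shown in the sample above.
std::string buildConnectString(char commType, const std::string& port,
                               const std::vector<LibAccess>& access)
{
    assert(commType == 'T' || commType == 'P');
    std::string s = "COMMTYPE=";
    s += commType;
    s += "; PORT=" + port;
    if (!access.empty()) {
        s += "; LIBACCESS=(";
        for (std::size_t i = 0; i < access.size(); ++i) {
            if (i > 0) s += ";";  // groupings are separated by semicolons
            s += access[i].libraryServer + "," + access[i].userId + "," +
                 access[i].password;
        }
        s += ")";
    }
    return s;
}
```

The resulting string would be passed as the connect_string argument of the first connect overload, for example through its c_str() member.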
Additional connect string parameters:
The connect function can be called in several ways. The different ways to connect to Text Search are listed below:
Exceptions
void connect ( const char* datastore_name, const char* userName = "", const char* password = "", const char* connect_string = "");
You can also connect to the datastore if you supply the search service name for server_name, an empty string for port, and a blank character (' ') for communication_type.
Exceptions
void connect (const char* server_name, const char* port, char communication_type);
virtual void disconnect();
virtual void getOption(long option, DKAny& value);
virtual void setOption(long option, DKAny& value);
Parameters
virtual DKAny evaluate(const char* command, const short commandLangType = DK_CM_TEXT_QL_TYPE, const DKNVPair* parms = 0);
virtual DKAny evaluate(dkQuery* query);
virtual DKAny evaluate(DKCQExpr* qe);
Parameters
virtual dkResultSetCursor* execute(const char* command, const short commandLangType = DK_CM_TEXT_QL_TYPE, const DKNVPair* parms = 0);
virtual dkResultSetCursor* execute(dkQuery* query);
virtual dkResultSetCursor* execute(DKCQExpr* qe);
Parameters
virtual void executeWithCallback(const char* command, const short commandLangType, const DKNVPair* parms, dkCallback* callbackObj);
virtual void executeWithCallback(dkQuery* query, dkCallback* callbackObj);
virtual void executeWithCallback(DKCQExpr* qe, dkCallback* callbackObj);
Parameters
virtual dkQuery* createQuery(const char* command, const short commandLangType = DK_CM_TEXT_QL_TYPE, const DKNVPair* parms = 0);
virtual dkQuery* createQuery(DKCQExpr* qe);
virtual DKBoolean isConnected();
virtual DKString datastoreName() const;
virtual DKString datastoreType() const;
virtual DKHandle* connection();
Parameters
type -- The type of datastore handle wanted.
virtual DKHandle* handle(const char* type);
virtual DKString userName() const;
virtual dkCollection* listDataSources();
virtual DKString* listDataSourceNames(long& arraySize);
virtual DKAny listServers();
virtual DKAny listSchema();
virtual dkCollection* listEntities();
virtual DKString* listEntityNames(long& arraySize);
virtual dkDatastoreDef* datastoreDef();
virtual void startUpdateIndex(const char* indexName);
virtual void clearIndex(const char* indexName);
virtual void createIndex(DKIndexInfoTS* newIndex);
virtual void deleteIndex(const char* indexName);
virtual DKIndexInfoTS* getIndexInformation(const char* indexName);
DKIndexFuncStatusTS* getIndexFunctionStatus(const char* indexName);
Sets the indexing function status for a text search index. You need to establish a connection to the server before calling this function. The indexName is the name of the text search index whose indexing status is to be set.
The actionId is DK_TSINDEXACTID_ENABLE, DK_TSINDEXACTID_DISABLE, or DK_TSINDEXACTID_RESET. Use DK_TSINDEXACTID_RESET if a reason code was set and the function was stopped; the reset allows another startUpdateIndex to be issued after the Text Search Engine error has been corrected. All constants are defined in the DKConstant2.h file.
Exceptions
void setIndexFunctionStatus(const char* indexName, long actionId);
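The recovery sequence described above (reset the stopped function, then start another index update) might be sketched as follows. This is only an illustration: resetAndRestartIndex and RecordingDatastore are hypothetical names, the template is written against the two member signatures from the class summary so that it compiles without the product headers, and the caller is expected to pass DK_TSINDEXACTID_RESET from DKConstant2.h as resetActionId. Exception handling is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Resets a stopped indexing function and then restarts index updating.
// Datastore is any type that offers setIndexFunctionStatus(const char*, long)
// and startUpdateIndex(const char*), as DKDatastoreTS does in the class summary.
template <class Datastore>
void resetAndRestartIndex(Datastore& ds, const char* indexName, long resetActionId)
{
    assert(indexName != 0);
    ds.setIndexFunctionStatus(indexName, resetActionId);  // clear the stopped state
    ds.startUpdateIndex(indexName);                       // issue a new update request
}

// Stand-in used only to exercise the sketch; it records the calls it receives.
struct RecordingDatastore {
    std::vector<std::string> calls;
    long lastActionId = 0;

    void setIndexFunctionStatus(const char* indexName, long actionId) {
        calls.push_back("setIndexFunctionStatus:" + std::string(indexName));
        lastActionId = actionId;
    }
    void startUpdateIndex(const char* indexName) {
        calls.push_back("startUpdateIndex:" + std::string(indexName));
    }
};
```

With the real class, the same helper would be called on a connected DKDatastoreTS object, and only after the underlying Text Search Engine error has been corrected.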
virtual DKMatchesInfoTS* getMatches(dkResultSetCursor* cursor, const char* documentId, const char* textIndexName, dkBoolean useDictionary);
Text Search text query string
The syntax of the text query string is as follows:
SEARCH=(COND=(text_search_expression) );
[OPTION=([SEARCH_INDEX={search_index_name | (index_list) };]
         [MAX_RESULTS=maximum_results;]
         [THES_NAME=thesaurus_index_name;]
         [THES_DEPTH=depth_for_query_expansion;]
         [TIME_LIMIT=time_limit]
         [MATCH_INFO=yes_no;]
         [RANKING=yes_no;]
         [SORT=yes_no;]
         [MATCH_DICT=yes_no]
)]
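A minimal sketch of how such a query string could be assembled is shown below. The helper name buildTextQuery and the sample index name are hypothetical. Only the SEARCH_INDEX and MAX_RESULTS options are covered, and the sketch assumes that whitespace between the SEARCH and OPTION clauses is not significant.

```cpp
#include <cassert>
#include <string>

// Assembles "SEARCH=(COND=(expr));" and, if at least one option is supplied,
// appends an "OPTION=(...)" clause. An empty searchIndex or a non-positive
// maxResults means the corresponding option is omitted.
std::string buildTextQuery(const std::string& expr,
                           const std::string& searchIndex,
                           int maxResults)
{
    assert(!expr.empty());
    std::string query = "SEARCH=(COND=(" + expr + "));";

    std::string options;
    if (!searchIndex.empty())
        options += "SEARCH_INDEX=" + searchIndex + ";";
    if (maxResults > 0)
        options += "MAX_RESULTS=" + std::to_string(maxResults) + ";";

    if (!options.empty())
        query += " OPTION=(" + options + ")";
    return query;
}
```

The returned string would be passed as the command argument of execute() or evaluate(), with the command language type left at its DK_CM_TEXT_QL_TYPE default.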
Words in uppercase are keywords. Lowercase words are parameters supplied by users; they are described below. Note that DBCS (double-byte character set) characters must be enclosed in SBCS single quotes, like a phrase. For more information about options, refer to the EhwSearch chapter of the Text Search Engine Application Programming Reference.
text_search_expression is an expression composed of either a free_text_expression alone or a boolean_query followed by an optional free_text_expression. A boolean_query followed by a free_text_expression is known as a hybrid query.
{boolean_query [free_text_expression] | free_text_expression}
Note that at most one boolean query and one free_text_expression are allowed. If a boolean query is specified, it must come first. For more information about options, refer to the EhwSearch chapter of the Text Search Engine Application Programming Reference.
boolean_query:
[unary_operator] text_search_criteria [[binary_operator [unary_operator] text_search_criteria] ... ]
The binary operators are AND (or &) and OR (or |). NOT is the only unary operator. An expression enclosed in parentheses is treated as a subquery. A subquery changes the default order of processing for the binary operators. For example, a query that includes parentheses would have the following syntax: UNIX AND (ibm OR system). The part inside the parentheses, "(ibm OR system)", is a subquery contained inside the query.
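The operators described above can be composed mechanically. The sketch below uses hypothetical helper names (joinCriteria, subquery, negate) that are not part of the class library; it only builds the text of a boolean_query and performs no validation of the criteria themselves.

```cpp
#include <cassert>
#include <string>

// Joins two criteria with one of the binary operators, "AND" or "OR".
std::string joinCriteria(const std::string& left, const std::string& op,
                         const std::string& right)
{
    assert(op == "AND" || op == "OR");
    return left + " " + op + " " + right;
}

// Wraps an expression in parentheses so that it is processed as a subquery.
std::string subquery(const std::string& expr)
{
    return "(" + expr + ")";
}

// Applies NOT, the only unary operator, to a single criterion.
std::string negate(const std::string& criterion)
{
    return "NOT " + criterion;
}
```

For example, joinCriteria("UNIX", "AND", subquery(joinCriteria("ibm", "OR", "system"))) produces the query text UNIX AND (ibm OR system).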
Search argument:
text_search_criteria is one of the following keywords/options, where the dollar sign delimits the keyword/option:
{ search_argument | $DOC$ '{' proximity_search_argument '}' | $PARA$ '{' proximity_search_argument '}' | $SENT$ '{' proximity_search_argument '}' }
The following options specify proximity search conditions, which require search arguments. These consist of at least a pair of words or phrases:
search_argument can be more than one word or phrase:
[$search_option$] {word | phrase} [[$search_option$] {word | phrase}...]
proximity_search_argument:
[$search_option$] {word | phrase} [$search_option$] {word | phrase} [$search_option$] [{word | phrase}...]
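As an illustration of the proximity form, the following sketch builds a $DOC$, $PARA$, or $SENT$ criterion from a list of words or phrases. The helper name proximity is hypothetical, search options on individual terms are not handled, and the single space between the keyword and the opening brace follows the examples later in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds "$SCOPE$ {term term ...}".
// scope is "DOC", "PARA" or "SENT"; a proximity search needs at least
// a pair of words or phrases, so fewer than two terms is rejected.
std::string proximity(const std::string& scope,
                      const std::vector<std::string>& terms)
{
    assert(scope == "DOC" || scope == "PARA" || scope == "SENT");
    assert(terms.size() >= 2);

    std::string s = "$" + scope + "$ {";
    for (std::size_t i = 0; i < terms.size(); ++i) {
        if (i > 0) s += " ";  // terms are separated by spaces
        s += terms[i];
    }
    s += "}";
    return s;
}
```

Because the NOT operator is not allowed with these keywords, the result should not be combined with a negation helper.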
Each word or phrase can be preceded by the "-$search_options$-" tag.
The dollar sign delimits search_option. Options inside a pair of dollar signs are separated by commas, and can have the following values.
The NOT operator is not allowed with the keywords $DOC$, $PARA$, or $SENT$.
The valid codes and ids can be found in the DKConstant2.h file, in the users' include directory. They need to be converted from number values to string values for ccode and langid.
THES or THES=relation_name
The text search includes a request to also search for thesaurus expansions of the current search term. Text Search looks for thesaurus terms either in the file defined by the THES_NAME option or the default file. The default file is "imlthes" for Linguistic and Precise searches; the default file is "imlnthes" for GTR searches. If relation_name is specified, query expansion by thesaurus is done along branches of the named relation. If no value is specified, all branches are taken into account for query expansion.
If you have multiple terms in your search (words separated with spaces), you can use either single or double quotes to enclose the string. For example, if you want to search for the words "digital" and "database" using a single query, your query would look like this: 'digital database'. Spaces between words are only recognized when contained within single or double quotes.
word is a word in the specified search language; phrase is one or more words enclosed in single or double quotes (the words can be DBCS, double-byte character set, characters); and free_text is words inside a pair of braces {}.
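The quoting and brace rules above can be sketched with two small helpers. The names quotePhrase and freeText are hypothetical. Switching to double quotes when the phrase itself contains a single quote is an assumption made for this sketch; a phrase that contains both kinds of quote is not handled.

```cpp
#include <cassert>
#include <string>

// Encloses a multi-word phrase in single quotes, or in double quotes
// when the phrase itself contains a single quote.
std::string quotePhrase(const std::string& words)
{
    assert(!words.empty());
    const char q = (words.find('\'') == std::string::npos) ? '\'' : '"';
    return q + words + q;
}

// Encloses free text in a pair of braces.
std::string freeText(const std::string& text)
{
    return "{" + text + "}";
}
```

For instance, quotePhrase("digital database") yields 'digital database', and freeText("web site") yields {web site}.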
free_text_expression:
free_text_expression is composed of the following string free_text_search_criteria, where free_text_search_criteria is:
[$free_text_search_option$] '{' free_text '}'
The dollar sign delimits free_text_search_option. Options inside a pair of dollar signs are separated by a comma, and can currently have the following value:
THES or THES=relation_name
The text search includes a request to also search for thesaurus expansions of the current search term. Text Search looks for thesaurus terms either in the file defined by the THES_NAME option or the default file. The default file is "imlthes" for Linguistic and Precise searches; the default file is "imlnthes" for GTR searches. If relation_name is specified, query expansion by thesaurus is done along branches of the named relation. If no value is specified, all branches are taken into account for query expansion.
If you have multiple terms in your search (words separated with spaces), you can use either single or double quotes to enclose the string. For example, if you want to search for the words "digital" and "database" using a single query, your query would look like this: 'digital database'. Spaces between words are only recognized when contained within single or double quotes.
An example of a boolean search expression that searches for documents containing the phrase UNIX Operating and the word member in the same paragraph is as follows:
'UNIX Operating' AND member
An example of a boolean and free-text search expression that searches for documents containing the words WWW and internet, and the free text web site, is as follows:
WWW AND internet {web site}
Another example of an expression to search for documents containing the words internet and DB2 in the same paragraph, a word that starts with Net, and the free_text internet commerce is booming is as follows:
$PARA$ {internet DB2} AND $MC=*$ Net* {internet commerce is booming}
Returns match information for each item returned from the text query. The match information contains the text of the document and the highlighting information for all matches of the corresponding query.
Important: This process is time-consuming because the document is retrieved from the Content Manager datastore and analyzed linguistically, and potential matches are determined. These steps affect the performance of the text query.
Do not return match information for each item returned from the text query. The match information is returned in a new attribute, DKMATCHESINFO, in the dkDDO returned from a text query. The value of the attribute DKMATCHESINFO will be a DKMatchesInfoTS object.
Highlighting information will be obtained using a dictionary.
Highlighting information will not be obtained using a dictionary.
(c) Copyright International Business Machines Corporation 1996, 2003. IBM Corp. All rights reserved.