Package translate :: Package search :: Package indexing :: Module CommonIndexer :: Class CommonDatabase

Class CommonDatabase

base class for indexing support

any real implementation must override most methods of this class

Instance Methods

__init__(self, basedir, analyzer=None, create_allowed=True)
initialize or open an indexing database

flush(self, optimize=False)
flush the content of the database - to force changes to be written to disk

make_query(self, args, require_all=True, analyzer=None)
create simple queries (strings or field searches) or combine multiple queries (AND/OR)
Returns: query type of the specific implementation

_create_query_for_query(self, query)
generate a query based on an existing query object
Returns: xapian.Query | PyLucene.Query

_create_query_for_string(self, text, require_all=True, analyzer=None)
generate a query for a plain term of a string query
Returns: xapian.Query | PyLucene.Query

_create_query_for_field(self, field, value, analyzer=None)
generate a field query
Returns: xapian.Query | PyLucene.Query

_create_query_combined(self, queries, require_all=True)
generate a combined query
Returns: xapian.Query | PyLucene.Query

index_document(self, data)
add the given data to the database

_create_empty_document(self)
create an empty document to be filled and added to the index later
Returns: xapian.Document | PyLucene.Document

_add_plain_term(self, document, term, tokenize=True)
add a term to a document

_add_field_term(self, document, field, term, tokenize=True)
add a field term to a document

_add_document_to_index(self, document)
add a prepared document to the index database

begin_transaction(self)
begin a transaction

cancel_transaction(self)
cancel an ongoing transaction

commit_transaction(self)
submit the currently ongoing transaction and write changes to disk

get_query_result(self, query)
return an object containing the results of a query
Returns: subclass of CommonEnquire

delete_document_by_id(self, docid)
delete a specified document

search(self, query, fieldnames)
return a list of the contents of specified fields for all matches of a query
Returns: list of dicts

delete_doc(self, ident)
delete the documents returned by a query

_walk_matches(self, query, function, arg_for_function=None)
use this function if you want to do something with every single match of a query

set_field_analyzers(self, field_analyzers)
set the analyzers for different fields of the database documents

get_field_analyzers(self, fieldnames=None)
return the analyzer that was mapped to a specific field
Returns: int | dict

_decode(self, text)
decode the string from utf-8 or charmap and perform unicode normalization

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Class Variables
  field_analyzers = {}
mapping of field names and analyzers - see 'set_field_analyzers'
  ANALYZER_EXACT = 0
exact matching: the query string must equal the whole term string
  ANALYZER_PARTIAL = 2
partial matching: a document matches, even if the query string only matches the beginning of the term value.
  ANALYZER_TOKENIZE = 4
tokenize terms and queries automatically
  ANALYZER_DEFAULT = 6
the default analyzer to be used if nothing is configured
  QUERY_TYPE = None
override this with the query class of the implementation
  INDEX_DIRECTORY_NAME = None
override this with a string to be used as the name of the indexing directory/file in the filesystem
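
The analyzer constants are bit flags, so they can be combined with "|". The following is a minimal sketch using the values listed above; the helper describe_analyzer is hypothetical and not part of CommonDatabase. Since ANALYZER_EXACT is 0, it cannot be detected with a bitwise AND, so the sketch treats it as "no other flag set", which is an assumption.

```python
# Values copied from the class variables listed above.
ANALYZER_EXACT = 0
ANALYZER_PARTIAL = 2
ANALYZER_TOKENIZE = 4
ANALYZER_DEFAULT = 6


def describe_analyzer(analyzer):
    """Hypothetical helper: list the flag names contained in an analyzer value."""
    names = []
    if analyzer & ANALYZER_PARTIAL:
        names.append("PARTIAL")
    if analyzer & ANALYZER_TOKENIZE:
        names.append("TOKENIZE")
    # EXACT is 0, so "analyzer & ANALYZER_EXACT" is always 0; treat it as
    # "no other flag set" (an assumption of this sketch).
    return names or ["EXACT"]


combined = ANALYZER_PARTIAL | ANALYZER_TOKENIZE
print(combined == ANALYZER_DEFAULT)
print(describe_analyzer(ANALYZER_TOKENIZE))
```

With these values, ANALYZER_DEFAULT is the combination of partial matching and tokenizing.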

Properties

Inherited from object: __class__

Method Details

__init__(self, basedir, analyzer=None, create_allowed=True)
(Constructor)

initialize or open an indexing database

Any derived class must override __init__.

Any implementation can rely on the "self.location" attribute to be set by the __init__ function of the super class.

Parameters:
  • basedir (str) - the parent directory of the database
  • analyzer (int) - bitwise combination of analyzer flags to be used as the default analyzer for this database. Leave it empty to use the system default analyzer (self.ANALYZER_DEFAULT). See self.ANALYZER_TOKENIZE, self.ANALYZER_PARTIAL, ...
  • create_allowed (bool) - create the database, if necessary; default: True
Raises:
  • ValueError - the given location exists, but the database type is incompatible (e.g. created by a different indexing engine)
  • OSError - the database failed to initialize
Overrides: object.__init__
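
A derived class could look roughly like the sketch below. CommonDatabaseStub stands in for CommonDatabase so that the example runs without the toolkit; ToyDatabase, DictQuery and the directory name are hypothetical. How "self.location" is computed is an assumption here (basedir joined with INDEX_DIRECTORY_NAME).

```python
import os


class CommonDatabaseStub(object):
    """Stand-in for CommonDatabase so the sketch runs without the toolkit."""

    ANALYZER_DEFAULT = 6
    QUERY_TYPE = None
    INDEX_DIRECTORY_NAME = None

    def __init__(self, basedir, analyzer=None, create_allowed=True):
        # Assumption: the location is basedir joined with INDEX_DIRECTORY_NAME.
        self.location = os.path.join(basedir, self.INDEX_DIRECTORY_NAME)
        self.analyzer = self.ANALYZER_DEFAULT if analyzer is None else analyzer


class DictQuery(dict):
    """Hypothetical query class of the toy backend."""


class ToyDatabase(CommonDatabaseStub):
    # Both class variables are overridden, as the class variable list asks.
    QUERY_TYPE = DictQuery
    INDEX_DIRECTORY_NAME = "toy-index"

    def __init__(self, basedir, analyzer=None, create_allowed=True):
        super(ToyDatabase, self).__init__(basedir, analyzer, create_allowed)
        # self.location has been set by the super class at this point.
        if not os.path.exists(self.location) and not create_allowed:
            raise OSError("database does not exist: %s" % self.location)


db = ToyDatabase("/tmp/example-project")
print(db.location)
```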

flush(self, optimize=False)

flush the content of the database - to force changes to be written to disk

some databases also support index optimization

Parameters:
  • optimize (bool) - should the index be optimized if possible?

make_query(self, args, require_all=True, analyzer=None)

create simple queries (strings or field searches) or combine multiple queries (AND/OR)

To specify rules for field searches, you may want to take a look at 'set_field_analyzers'. The parameter 'analyzer' can override the previously defined default setting.

Parameters:
  • args (list of queries | single query | str | dict) - queries, a search string, or a description of a field query. Examples:
       [xapian.Query("foo"), xapian.Query("bar")]
       xapian.Query("foo")
       "bar"
       {"foo": "bar", "foobar": "foo"}
    
  • require_all (bool) - boolean operator (True -> AND (default) / False -> OR)
  • analyzer (int) - (only applicable for 'dict' or 'str') Define query options (partial matching, exact matching, tokenizing, ...) as bitwise combinations of CommonIndexer.ANALYZER_???. This can override previously defined field analyzer settings. If analyzer is None (default), then the configured analyzer for the field is used.
Returns: query type of the specific implementation
the combined query
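
The accepted forms of 'args' can be illustrated with a small sketch. describe_query is a hypothetical helper that only mirrors the type dispatch described above; it returns nested tuples instead of real query objects and ignores analyzers, so it is not how make_query is implemented.

```python
def describe_query(args, require_all=True):
    """Hypothetical helper that mirrors the documented forms of 'args'."""
    operator = "AND" if require_all else "OR"
    if isinstance(args, str):
        # a plain search string
        return ("string", args)
    if isinstance(args, dict):
        # a description of a field query: one part per fieldname/value pair
        parts = [("field", name, value) for name, value in sorted(args.items())]
    elif isinstance(args, (list, tuple)):
        # a list of queries: handle every item on its own, then combine
        parts = [describe_query(item, require_all) for item in args]
    else:
        # anything else is treated as an existing query object
        return ("query", args)
    if len(parts) == 1:
        return parts[0]
    return (operator, parts)


print(describe_query("bar"))
print(describe_query({"foo": "bar", "foobar": "foo"}))
print(describe_query(["foo", {"foo": "bar"}], require_all=False))
```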

_create_query_for_query(self, query)

generate a query based on an existing query object

basically this function should just create a copy of the original

Parameters:
  • query (xapian.Query) - the original query object
Returns: xapian.Query | PyLucene.Query
the resulting query object

_create_query_for_string(self, text, require_all=True, analyzer=None)

generate a query for a plain term of a string query

basically this function parses the string and returns the resulting query

Parameters:
  • text (str) - the query string
  • require_all (bool) - boolean operator (True -> AND (default) / False -> OR)
  • analyzer (int) - Define query options (partial matching, exact matching, tokenizing, ...) as bitwise combinations of CommonIndexer.ANALYZER_???. This can override previously defined field analyzer settings. If analyzer is None (default), then the configured analyzer for the field is used.
Returns: xapian.Query | PyLucene.Query
resulting query object

_create_query_for_field(self, field, value, analyzer=None)

generate a field query

this function creates a field->value query

Parameters:
  • field (str) - the fieldname to be used
  • value (str) - the wanted value of the field
  • analyzer (int) - Define query options (partial matching, exact matching, tokenizing, ...) as bitwise combinations of CommonIndexer.ANALYZER_???. This can override previously defined field analyzer settings. If analyzer is None (default), then the configured analyzer for the field is used.
Returns: xapian.Query | PyLucene.Query
resulting query object

_create_query_combined(self, queries, require_all=True)

generate a combined query

Parameters:
  • queries (list of xapian.Query) - list of the original queries
  • require_all (bool) - boolean operator (True -> AND (default) / False -> OR)
Returns: xapian.Query | PyLucene.Query
the resulting combined query object

index_document(self, data)

add the given data to the database

Parameters:
  • data (dict | list of str) - the data to be indexed. A dictionary will be treated as fieldname:value combinations. If the fieldname is None then the value will be interpreted as a plain term or as a list of plain terms. Lists of terms are indexed separately. Lists of strings are treated as plain terms.
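
The shapes of 'data' described above can be sketched as follows. iter_terms is a hypothetical helper that flattens the input into (fieldname, term) pairs, where a fieldname of None marks a plain term; expanding list values into one pair per term is this sketch's reading of "indexed separately".

```python
def iter_terms(data):
    """Hypothetical helper: yield (fieldname, term) pairs for 'data'."""
    if isinstance(data, dict):
        for field, value in data.items():
            # a single value and a list of values are handled alike
            values = value if isinstance(value, (list, tuple)) else [value]
            for term in values:
                yield (field, term)
    else:
        # a list of strings is treated as plain terms
        for term in data:
            yield (None, term)


print(list(iter_terms({"msgid": "hello", None: ["foo", "bar"]})))
print(list(iter_terms(["foo", "bar"])))
```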

_create_empty_document(self)

create an empty document to be filled and added to the index later

Returns: xapian.Document | PyLucene.Document
the new document object

_add_plain_term(self, document, term, tokenize=True)

add a term to a document

Parameters:
  • document (xapian.Document | PyLucene.Document) - the document to be changed
  • term (str) - a single term to be added
  • tokenize (bool) - should the term be tokenized automatically

_add_field_term(self, document, field, term, tokenize=True)

add a field term to a document

Parameters:
  • document (xapian.Document | PyLucene.Document) - the document to be changed
  • field (str) - name of the field
  • term (str) - term to be associated to the field
  • tokenize (bool) - should the term be tokenized automatically

_add_document_to_index(self, document)

add a prepared document to the index database

Parameters:
  • document (xapian.Document | PyLucene.Document) - the document to be added

begin_transaction(self)

begin a transaction

You can group multiple modifications of a database into a transaction. This avoids time-consuming database flushes and helps when a changeset must be committed either completely or not at all. No changes are written to disk until 'commit_transaction' is called. 'cancel_transaction' can be used to revert an ongoing transaction.

Database types that do not support transactions may silently ignore it.
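
A typical usage pattern might look like the sketch below. ToyTransactionalIndex is a hypothetical in-memory stand-in that only provides the three transaction methods and a simplified index_document; index_all is likewise a hypothetical helper, not part of CommonDatabase.

```python
class ToyTransactionalIndex(object):
    """In-memory stand-in with the three transaction methods named above."""

    def __init__(self):
        self.committed = []   # documents "written to disk"
        self._pending = None  # documents of the ongoing transaction

    def begin_transaction(self):
        self._pending = []

    def index_document(self, data):
        if self._pending is None:
            self.committed.append(data)
        else:
            self._pending.append(data)

    def commit_transaction(self):
        self.committed.extend(self._pending)
        self._pending = None

    def cancel_transaction(self):
        self._pending = None


def index_all(db, documents):
    """Index every document, or none of them if one fails."""
    db.begin_transaction()
    try:
        for data in documents:
            if not isinstance(data, (dict, list)):
                raise TypeError("unsupported document: %r" % (data,))
            db.index_document(data)
    except Exception:
        # revert everything that was added since begin_transaction
        db.cancel_transaction()
        raise
    db.commit_transaction()


db = ToyTransactionalIndex()
index_all(db, [{"msgid": "a"}, ["plain", "terms"]])
try:
    index_all(db, [{"msgid": "b"}, 42])  # the second item is invalid
except TypeError:
    pass
print(len(db.committed))
```

In this sketch the second batch leaves no trace, because it is cancelled before anything is committed.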

cancel_transaction(self)

cancel an ongoing transaction

See 'begin_transaction' for details.

commit_transaction(self)

submit the currently ongoing transaction and write changes to disk

See 'begin_transaction' for details.

get_query_result(self, query)

return an object containing the results of a query

Parameters:
  • query (a query object of the real implementation) - a pre-compiled query
Returns: subclass of CommonEnquire
an object that allows access to the results

delete_document_by_id(self, docid)

delete a specified document

Parameters:
  • docid (int) - the document ID to be deleted

search(self, query, fieldnames)

return a list of the contents of specified fields for all matches of a query

Parameters:
  • query (a query object of the real implementation) - the query to be issued
  • fieldnames (string | list of strings) - the name(s) of a field of the document content
Returns: list of dicts
a list of dicts containing the specified field(s)

delete_doc(self, ident)

delete the documents returned by a query

Parameters:
  • ident (int | list of tuples | dict | list of dicts | query (e.g. xapian.Query) | list of queries) - [list of] document IDs | dict describing a query | query

_walk_matches(self, query, function, arg_for_function=None)

use this function if you want to do something with every single match of a query

example:

    self._walk_matches(query, function_for_match, arg_for_func)

'function_for_match' expects only one argument: the matched object

Parameters:
  • query (xapian.Query | PyLucene.Query) - a query object of the real implementation
  • function (function) - the function to execute with every match
  • arg_for_function (anything) - an optional argument for the function
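
The callback pattern can be sketched without a real index. walk_matches below is a hypothetical stand-in that iterates over a plain list instead of query results. Passing 'arg_for_function' as a second parameter to the callback is an assumption of this sketch; the text above only states that it is an optional argument for the function.

```python
def walk_matches(matches, function, arg_for_function=None):
    """Hypothetical stand-in for _walk_matches over a plain list of matches."""
    for match in matches:
        if arg_for_function is None:
            function(match)
        else:
            # Assumption: the optional argument is passed as a second parameter.
            function(match, arg_for_function)


def collect_ids(match, ids):
    """Callback that stores the (hypothetical) 'docid' key of every match."""
    ids.append(match["docid"])


found = []
walk_matches([{"docid": 3}, {"docid": 7}], collect_ids, found)
print(found)
```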

set_field_analyzers(self, field_analyzers)

set the analyzers for different fields of the database documents

All bitwise combinations of CommonIndexer.ANALYZER_??? are possible.

Parameters:
  • field_analyzers (dict containing field names and analyzers) - mapping of field names and analyzers
Raises:
  • TypeError - invalid values in 'field_analyzers'

get_field_analyzers(self, fieldnames=None)

return the analyzer that was mapped to a specific field

see 'set_field_analyzers' for details

Parameters:
  • fieldnames (str | list of str | None) - the analyzer of this field (or all/multiple fields) is requested; leave empty (or "None") to request all fields
Returns: int | dict
the analyzer setting of the field (see CommonDatabase.ANALYZER_???), or a dict of field names and analyzers
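
The documented argument and return shapes of 'set_field_analyzers' and 'get_field_analyzers' can be sketched with a stand-in. FieldAnalyzerRegistry is hypothetical. Two details are assumptions of this sketch rather than documented behaviour: unknown fields fall back to ANALYZER_DEFAULT, and new settings are merged into existing ones.

```python
class FieldAnalyzerRegistry(object):
    """Hypothetical stand-in that mimics the documented get/set behaviour."""

    ANALYZER_DEFAULT = 6

    def __init__(self):
        self.field_analyzers = {}

    def set_field_analyzers(self, field_analyzers):
        # validate everything first, so a bad entry changes nothing
        for field, analyzer in field_analyzers.items():
            if not isinstance(field, str) or not isinstance(analyzer, int):
                raise TypeError("invalid entry: %r -> %r" % (field, analyzer))
        # Assumption: new settings are merged into the existing mapping.
        self.field_analyzers.update(field_analyzers)

    def get_field_analyzers(self, fieldnames=None):
        if fieldnames is None:
            # all fields requested: return a copy of the whole mapping
            return dict(self.field_analyzers)
        if isinstance(fieldnames, str):
            # Assumption: unknown fields fall back to the default analyzer.
            return self.field_analyzers.get(fieldnames, self.ANALYZER_DEFAULT)
        return {name: self.get_field_analyzers(name) for name in fieldnames}


registry = FieldAnalyzerRegistry()
registry.set_field_analyzers({"msgid": 0, "comment": 2 | 4})
print(registry.get_field_analyzers("msgid"))
print(registry.get_field_analyzers(["msgid", "location"]))
print(registry.get_field_analyzers())
```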