#include <tblcoll.h>
Public Members
RuleBasedCollator (const UnicodeString& rules, UErrorCode& status)
    RuleBasedCollator constructor.
RuleBasedCollator (const UnicodeString& rules, ECollationStrength collationStrength, UErrorCode& status)
RuleBasedCollator (const UnicodeString& rules, Normalizer::EMode decompositionMode, UErrorCode& status)
RuleBasedCollator (const UnicodeString& rules, ECollationStrength collationStrength, Normalizer::EMode decompositionMode, UErrorCode& status)
virtual ~RuleBasedCollator ()
    Destructor.
RuleBasedCollator (const RuleBasedCollator& other)
    Copy constructor.
RuleBasedCollator& operator= (const RuleBasedCollator& other)
    Assignment operator.
virtual UBool operator== (const Collator& other) const
    Returns true if "other" is the same as "this".
virtual UBool operator!= (const Collator& other) const
    Returns true if "other" is not the same as "this".
virtual Collator* clone (void) const
    Makes a deep copy of the object.
virtual CollationElementIterator* createCollationElementIterator (const UnicodeString& source) const
    Creates a collation element iterator for the source string.
virtual CollationElementIterator* createCollationElementIterator (const CharacterIterator& source) const
    Creates a collation element iterator for the source.
virtual EComparisonResult compare (const UnicodeString& source, const UnicodeString& target) const
    Compares a range of character data stored in two different strings based on the collation rules.
virtual EComparisonResult compare (const UnicodeString& source, const UnicodeString& target, int32_t length) const
    Compares a range of character data stored in two different strings based on the collation rules up to the specified length.
virtual EComparisonResult compare (const UChar* source, int32_t sourceLength, const UChar* target, int32_t targetLength) const
    The comparison function compares the character data stored in two different string arrays.
virtual CollationKey& getCollationKey (const UnicodeString& source, CollationKey& key, UErrorCode& status) const
    Transforms a specified region of the string into a series of characters that can be compared with CollationKey.compare.
virtual CollationKey& getCollationKey (const UChar *source, int32_t sourceLength, CollationKey& key, UErrorCode& status) const
    Transforms a specified region of the string into a series of characters that can be compared with CollationKey.compare.
virtual int32_t hashCode (void) const
    Generates the hash code for the rule-based collation object.
const UnicodeString& getRules (void) const
    Gets the table-based rules for the collation object.
int32_t getMaxExpansion (int32_t order) const
    Return the maximum length of any expansion sequences that end with the specified comparison order.
virtual UClassID getDynamicClassID (void) const
    Returns a unique class ID POLYMORPHICALLY.
uint8_t* cloneRuleData (int32_t &length, UErrorCode &status)
    Returns the binary format of the class's rules.
Static Public Members
UClassID getStaticClassID (void)
    Returns the class ID for this class.
Friends
class RuleBasedCollatorStreamer
class CollationElementIterator
class Collator
class TableCollationData
The user can create a customized table-based collation.
RuleBasedCollator maps characters to collation keys.
Table Collation has the following restrictions for efficiency (other subclasses may be used for more complex languages) :
1. If the French secondary ordering is specified in a collation object, it is applied to the whole object.
2. All non-mentioned Unicode characters are at the end of the collation order.
3. Private use characters are treated as identical. The private use area in Unicode is 0xE800-0xF8FF.
The collation table is composed of a list of collation rules, where each rule takes one of three forms:
    < modifier >
    < relation > < text-argument >
    < reset > < text-argument >
'@' : Indicates that secondary differences, such as accents, are sorted backwards, as in French.
'&' : Indicates that the next rule follows the position to where the reset text-argument would be sorted.
This sounds more complicated than it is in practice. For example, the following are equivalent ways of expressing the same thing:
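    a < b < c
    a < b & b < c
    a < c & a < b
Order matters, because each new item goes immediately after the reset text-argument. The following are not equivalent:
    a < b & a < c
    a < c & a < b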
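The effect of a reset can be sketched with a small stand-alone model. The function `toyOrder` below is hypothetical and is not the ICU rule parser: it assumes single-character text-arguments and understands only the relation `<` and the reset `&`. It returns the characters in the order the rules imply.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of the rule syntax (NOT the ICU parser): single-character
// text-arguments, the relation '<', and the reset '&'.
// Returns the mentioned characters, lowest first.
std::string toyOrder(const std::string& rules) {
    std::vector<char> order;   // characters in sort order
    std::size_t insertAt = 0;  // position where the next '<' item goes
    char pending = '<';        // operator waiting for its text-argument
    for (char c : rules) {
        if (c == ' ') continue;
        if (c == '<' || c == '&') { pending = c; continue; }
        auto it = std::find(order.begin(), order.end(), c);
        if (pending == '&') {
            // Reset: the next item follows the position of c.
            // The toy falls back to the end if c was never mentioned.
            insertAt = (it == order.end())
                ? order.size()
                : static_cast<std::size_t>(it - order.begin()) + 1;
        } else if (it == order.end()) {
            // Relation: place c immediately after the previous item.
            // The toy silently ignores characters mentioned twice.
            order.insert(order.begin() + static_cast<std::ptrdiff_t>(insertAt), c);
            ++insertAt;
        }
    }
    return std::string(order.begin(), order.end());
}
```

Under this model, `toyOrder("a < b & b < c")` and `toyOrder("a < c & a < b")` both give `"abc"`, while `toyOrder("a < b & a < c")` gives `"acb"`, because `c` is inserted directly after `a` and pushes `b` back.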
Ignorable Characters
For ignorable characters, the first rule must start with a relation (the examples we have used above are really fragments; "a < b" really should be "< a < b"). If, however, the first relation is not "<", then all the text-arguments up to the first "<" are ignorable. For example, ", - < a < b" makes "-" an ignorable character, as we saw earlier in the word "black-birds". In the samples for different languages, you see that most accents are ignorable.
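As a rough illustration of what "ignorable" means for a comparison, the sketch below drops the ignorable characters before comparing. The function `compareIgnoring` is a hypothetical helper for plain ASCII strings, not part of the collation API. A real collator may still use such characters to break ties at a weaker strength, which this sketch does not model.

```cpp
#include <string>

// Toy comparison (NOT the ICU algorithm): characters listed in `ignorable`
// take no part in the comparison. Returns <0, 0 or >0.
int compareIgnoring(const std::string& a, const std::string& b,
                    const std::string& ignorable) {
    // Copy a string without its ignorable characters.
    auto strip = [&ignorable](const std::string& s) {
        std::string out;
        for (char c : s) {
            if (ignorable.find(c) == std::string::npos) out += c;
        }
        return out;
    };
    return strip(a).compare(strip(b));
}
```

With `"-"` as the ignorable set, `compareIgnoring("black-birds", "blackbirds", "-")` returns 0. With an empty set, the two strings compare as different.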
Normalization and Accents
The Collator object automatically normalizes text internally to separate accents from base characters where possible. This is done both when processing the rules, and when comparing two strings. Collator also uses the Unicode canonical mapping to ensure that combining sequences are sorted properly (for more information, see The Unicode Standard, Version 2.0.)
Errors
The following are errors:
Examples:
    Simple:    "< a < b < c < d"
    Norwegian: "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I< j,J
                < k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R< s,S< t,T
                < u,U< v,V< w,W< x,X< y,Y< z,Z
                < å=a°,Å=A°
                ;aa,AA< æ,Æ< ø,Ø"
To create a table-based collation object, simply supply the collation rules to the RuleBasedCollator constructor. For example:
    UErrorCode status = U_ZERO_ERROR;
    RuleBasedCollator *mySimple = new RuleBasedCollator(Simple, status);
Another example:
    UErrorCode status = U_ZERO_ERROR;
    RuleBasedCollator *myNorwegian = new RuleBasedCollator(Norwegian, status);
    Traditional Spanish (fragment): ... & C < ch , cH , Ch , CH ...
    German (fragment):  ...< y , Y < z , Z
                        & AE, Ä & AE, ä
                        & OE , Ö & OE, ö
                        & UE , Ü & UE, ü
    Symbols (fragment): ...< y, Y < z , Z
                        & Question-mark ; '?'
                        & Ampersand ; '&'
                        & Dollar-sign ; '$'

To create a collation object for traditional Spanish, the user can take the English collation rules and add the additional rules to the table. For example:
    UErrorCode status = U_ZERO_ERROR;
    UnicodeString rules(DEFAULTRULES);
    rules += "& C < ch, cH, Ch, CH";
    RuleBasedCollator *mySpanish = new RuleBasedCollator(rules, status);
To sort symbols in an order similar to that of their alphabetic equivalents, you can do the following:
    UErrorCode status = U_ZERO_ERROR;
    UnicodeString rules(DEFAULTRULES);
    rules += "& Question-mark ; '?' & Ampersand ; '&' & Dollar-sign ; '$' ";
    RuleBasedCollator *myTable = new RuleBasedCollator(rules, status);
Another way of creating the table-based collation object, mySimple, is:
    UErrorCode status = U_ZERO_ERROR;
    RuleBasedCollator *mySimple = new
        RuleBasedCollator(" < a < b & b < c & c < d", status);
Or, equivalently:
    UErrorCode status = U_ZERO_ERROR;
    RuleBasedCollator *mySimple = new
        RuleBasedCollator(" < a < b < d & b < c", status);
To combine collations from two locales (error handling omitted for clarity):
    UErrorCode success = U_ZERO_ERROR;
    // Create an en_US Collator object
    Locale locale_en_US("en", "US", "");
    RuleBasedCollator* en_USCollator = (RuleBasedCollator*)
        Collator::createInstance( locale_en_US, success );

    // Create a da_DK Collator object
    Locale locale_da_DK("da", "DK", "");
    RuleBasedCollator* da_DKCollator = (RuleBasedCollator*)
        Collator::createInstance( locale_da_DK, success );

    // Combine the two
    // First, get the collation rules from en_USCollator
    UnicodeString rules = en_USCollator->getRules();
    // Second, get the collation rules from da_DKCollator
    rules += da_DKCollator->getRules();
    RuleBasedCollator* newCollator = new RuleBasedCollator( rules, success );
    // newCollator has the combined rules
A more interesting example is to modify an existing table to create a new collation object. For example, add "& C < ch, cH, Ch, CH" to the rules of the en_USCollator object to create your own English collation object:
    // Create a new Collator object with additional rules
    rules = en_USCollator->getRules();
    rules += "& C < ch, cH, Ch, CH";
    RuleBasedCollator* myCollator = new RuleBasedCollator( rules, success );
    // myCollator contains the new rules
The following example demonstrates how to change the order of non-spacing accents,
    UChar contents[] = {
        '=', 0x0301, ';', 0x0300, ';', 0x0302,
        ';', 0x0308, ';', 0x0327, ',', 0x0303,    // main accents
        ';', 0x0304, ';', 0x0305, ';', 0x0306,    // main accents
        ';', 0x0307, ';', 0x0309, ';', 0x030A,    // main accents
        ';', 0x030B, ';', 0x030C, ';', 0x030D,    // main accents
        ';', 0x030E, ';', 0x030F, ';', 0x0310,    // main accents
        ';', 0x0311, ';', 0x0312,                 // main accents
        '<', 'a', ',', 'A', ';', 'a', 'e', ',', 'A', 'E',
        ';', 0x00e6, ',', 0x00c6, '<', 'b', ',', 'B',
        '<', 'c', ',', 'C', '<', 'e', ',', 'E', '&',
        'C', '<', 'd', ',', 'D', 0 };
    UnicodeString oldRules(contents);
    UErrorCode status = U_ZERO_ERROR;
    // change the order of accent characters
    UChar addOn[] = { '&', ',', 0x0300, ';', 0x0308, ';', 0x0302, 0 };
    oldRules += addOn;
    RuleBasedCollator *myCollation = new RuleBasedCollator(oldRules, status);
The last example shows how to insert a new primary ordering before the default setting. For example, in Japanese collation, you can sort English characters either before or after Japanese characters:
    UErrorCode status = U_ZERO_ERROR;
    // get en_US collation rules
    RuleBasedCollator* en_USCollation =
        (RuleBasedCollator*) Collator::createInstance(Locale::US, status);
    // Always check the error code after each call.
    if (U_FAILURE(status)) return;
    // add a few Japanese characters to sort before English characters
    // suppose the last character before the first base letter 'a' in
    // the English collation rule is 0x2212
    UChar jaString[] = { '&', 0x2212, '<', 0x3041, ',', 0x3042, '<', 0x3043, ',', 0x3044, 0 };
    UnicodeString rules( en_USCollation->getRules() );
    rules += jaString;
    RuleBasedCollator *myJapaneseCollation = new RuleBasedCollator(rules, status);
NOTE: Typically, a collation object is created with Collator::createInstance().
Note: RuleBasedCollators with different Locale, CollationStrength and Decomposition mode settings will return different sort orders for the same set of strings. Locales have specific collation rules, and the way in which secondary and tertiary differences are taken into account, for example, will result in a different sorting order for the same strings.
Definition at line 319 of file tblcoll.h.
RuleBasedCollator::RuleBasedCollator (const UnicodeString & rules, UErrorCode & status) |
RuleBasedCollator constructor.
This takes the table rules and builds a collation table out of them. Please see RuleBasedCollator class description for more details on the collation rule syntax.
rules | the collation rules to build the collation table from. |
RuleBasedCollator::RuleBasedCollator (const UnicodeString & rules, ECollationStrength collationStrength, UErrorCode & status) |
RuleBasedCollator::RuleBasedCollator (const UnicodeString & rules, Normalizer::EMode decompositionMode, UErrorCode & status) |
RuleBasedCollator::RuleBasedCollator (const UnicodeString & rules, ECollationStrength collationStrength, Normalizer::EMode decompositionMode, UErrorCode & status) |
virtual RuleBasedCollator::~RuleBasedCollator () [virtual]
Destructor.
RuleBasedCollator::RuleBasedCollator (const RuleBasedCollator & other) |
Copy constructor.
RuleBasedCollator & RuleBasedCollator::operator= (const RuleBasedCollator & other) |
Assignment operator.
virtual UBool RuleBasedCollator::operator== (const Collator & other) const [virtual]
UBool RuleBasedCollator::operator!= (const Collator & other) const [inline, virtual]
virtual Collator * RuleBasedCollator::clone (void) const [virtual]
Makes a deep copy of the object.
The caller owns the returned object.
Reimplemented from Collator.
virtual CollationElementIterator * RuleBasedCollator::createCollationElementIterator (const UnicodeString & source) const [virtual]
Creates a collation element iterator for the source string.
The caller of this method is responsible for the memory management of the return pointer.
source | the string over which the CollationElementIterator will iterate. |
virtual CollationElementIterator * RuleBasedCollator::createCollationElementIterator (const CharacterIterator & source) const [virtual]
Creates a collation element iterator for the source.
The caller of this method is responsible for the memory management of the returned pointer.
source | the CharacterIterator which produces the characters over which the CollationElementIterator will iterate. |
virtual EComparisonResult RuleBasedCollator::compare (const UnicodeString & source, const UnicodeString & target) const [virtual]
Compares a range of character data stored in two different strings based on the collation rules.
Returns information about whether a string is less than, greater than or equal to another string in a language. This can be overridden in a subclass.
source | the source string. |
target | the target string to be compared with the source string. |
Reimplemented from Collator.
virtual EComparisonResult RuleBasedCollator::compare (const UnicodeString & source, const UnicodeString & target, int32_t length) const [virtual]
Compares a range of character data stored in two different strings based on the collation rules up to the specified length.
Returns information about whether a string is less than, greater than or equal to another string in a language. This can be overridden in a subclass.
source | the source string. |
target | the target string to be compared with the source string. |
length | compares up to the specified length |
Reimplemented from Collator.
virtual EComparisonResult RuleBasedCollator::compare (const UChar * source, int32_t sourceLength, const UChar * target, int32_t targetLength) const [virtual]
The comparison function compares the character data stored in two different string arrays.
Returns information about whether a string array is less than, greater than or equal to another string array.
Example of use:
    UErrorCode status = U_ZERO_ERROR;
    Collator *myCollation = Collator::createInstance(Locale::US, status);
    if (U_FAILURE(status)) return;
    // UChar arrays are used here because a wide string literal is not
    // guaranteed to have the same element type as UChar.
    const UChar abc[] = { 'a', 'b', 'c', 0 };
    const UChar ABC[] = { 'A', 'B', 'C', 0 };
    myCollation->setStrength(Collator::PRIMARY);
    // result would be Collator::EQUAL ("abc" == "ABC")
    // (no primary difference between "abc" and "ABC")
    Collator::EComparisonResult result = myCollation->compare(abc, 3, ABC, 3);
    myCollation->setStrength(Collator::TERTIARY);
    // result would be Collator::LESS ("abc" <<< "ABC")
    // (with tertiary difference between "abc" and "ABC")
    result = myCollation->compare(abc, 3, ABC, 3);
source | the source string array to be compared with. |
sourceLength | the length of the source string array. If this value is equal to -1, the string array is null-terminated. |
target | the string that is to be compared with the source string. |
targetLength | the length of the target string array. If this value is equal to -1, the string array is null-terminated. |
Reimplemented from Collator.
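How the strength setting changes the outcome of the example above can be sketched with a stand-alone model. `ToyStrength` and `toyCompare` are hypothetical names, and the sketch covers ASCII letters only: the primary level looks at letter identity regardless of case, and the tertiary level looks at case, with lowercase first. It assumes, as in the example, that a primary difference anywhere in the string outranks any tertiary difference.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

enum class ToyStrength { Primary, Tertiary };

// Toy two-level comparison for ASCII letters (NOT the ICU algorithm).
// Returns -1, 0 or 1.
int toyCompare(const std::string& source, const std::string& target,
               ToyStrength strength) {
    const std::size_t n = std::min(source.size(), target.size());
    // Pass 1: primary differences (letter identity, case ignored).
    for (std::size_t i = 0; i < n; ++i) {
        int s = std::tolower(static_cast<unsigned char>(source[i]));
        int t = std::tolower(static_cast<unsigned char>(target[i]));
        if (s != t) return s < t ? -1 : 1;
    }
    if (source.size() != target.size()) {
        return source.size() < target.size() ? -1 : 1;
    }
    if (strength == ToyStrength::Primary) return 0;
    // Pass 2: tertiary differences (case), consulted only on a primary tie.
    for (std::size_t i = 0; i < n; ++i) {
        bool sUpper = std::isupper(static_cast<unsigned char>(source[i])) != 0;
        bool tUpper = std::isupper(static_cast<unsigned char>(target[i])) != 0;
        if (sUpper != tUpper) return sUpper ? 1 : -1;
    }
    return 0;
}
```

In this model `toyCompare("abc", "ABC", ToyStrength::Primary)` is 0 and `toyCompare("abc", "ABC", ToyStrength::Tertiary)` is -1, mirroring the EQUAL and LESS results described in the example.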
virtual CollationKey & RuleBasedCollator::getCollationKey (const UnicodeString & source, CollationKey & key, UErrorCode & status) const [virtual]
Transforms a specified region of the string into a series of characters that can be compared with CollationKey.compare.
Use a CollationKey when you need to do repeated comparisons on the same string. For a single comparison the compare method will be faster.
source | the source string. |
key | the transformed key of the source string. |
status | the error code status. |
Reimplemented from Collator.
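The reason keys pay off for repeated comparisons can be sketched without the library. Sorting compares each string many times, so the sketch builds one key per string up front and then compares keys byte by byte. `makeToyKey` and `sortByKey` are hypothetical helpers for ASCII letters, and the key layout (primary weights, a zero separator, then case weights) is an assumption made for illustration, not the CollationKey format.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Toy sort key (NOT the CollationKey format): lowercased letters as primary
// weights, a 0 separator, then one case weight per character.
std::string makeToyKey(const std::string& s) {
    std::string primary, tertiary;
    for (unsigned char c : s) {
        primary += static_cast<char>(std::tolower(c));
        tertiary += std::isupper(c) ? '\x02' : '\x01';
    }
    return primary + '\0' + tertiary;
}

// Builds each key once, then sorts by plain bytewise key comparison.
std::vector<std::string> sortByKey(const std::vector<std::string>& words) {
    std::vector<std::pair<std::string, std::string>> keyed;  // (key, word)
    keyed.reserve(words.size());
    for (const auto& w : words) keyed.emplace_back(makeToyKey(w), w);
    std::sort(keyed.begin(), keyed.end());
    std::vector<std::string> out;
    out.reserve(keyed.size());
    for (const auto& kw : keyed) out.push_back(kw.second);
    return out;
}
```

For the input `{"b", "ABC", "abc", "ab"}` this yields `ab`, `abc`, `ABC`, `b`: the case weights only matter once the primary parts of two keys are identical.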
virtual CollationKey & RuleBasedCollator::getCollationKey (const UChar * source, int32_t sourceLength, CollationKey & key, UErrorCode & status) const [virtual]
Transforms a specified region of the string into a series of characters that can be compared with CollationKey.compare.
Use a CollationKey when you need to do repeated comparisons on the same string. For a single comparison the compare method will be faster.
source | the source string. |
key | the transformed key of the source string. |
status | the error code status. |
Reimplemented from Collator.
virtual int32_t RuleBasedCollator::hashCode (void) const [virtual]
Generates the hash code for the rule-based collation object.
Reimplemented from Collator.
const UnicodeString & RuleBasedCollator::getRules (void) const |
Gets the table-based rules for the collation object.
int32_t RuleBasedCollator::getMaxExpansion (int32_t order) const |
Return the maximum length of any expansion sequences that end with the specified comparison order.
order | a collation order returned by previous or next. |
virtual UClassID RuleBasedCollator::getDynamicClassID (void) const [inline, virtual]
Returns a unique class ID POLYMORPHICALLY.
Pure virtual override. This method is to implement a simple version of RTTI, since not all C++ compilers support genuine RTTI. Polymorphic operator==() and clone() methods call this method.
Reimplemented from Collator.
uint8_t * RuleBasedCollator::cloneRuleData (int32_t & length, UErrorCode & status) |
Returns the binary format of the class's rules.
The format is that of .col files.
length | Returns the length of the data, in bytes |
status | the error code status. |
UClassID RuleBasedCollator::getStaticClassID (void) [inline, static]
Returns the class ID for this class.
This is useful only for comparing to a return value from getDynamicClassID(). For example:
    Base* polymorphic_pointer = createPolymorphicObject();
    if (polymorphic_pointer->getDynamicClassID() ==
        Derived::getStaticClassID()) ...
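The class ID idiom can be reproduced in a few lines without compiler RTTI. The sketch below is one possible way to implement it, not the library's actual code: the `ClassID` typedef, the `Other` class, the `fgClassID` members and the `isDerived` helper are all hypothetical. It relies on the address of a per-class static variable being unique.

```cpp
// Minimal model of the static/dynamic class ID idiom (hypothetical code).
typedef const void* ClassID;

class Base {
public:
    virtual ~Base() {}
    // Each concrete subclass reports its own ID through this virtual call.
    virtual ClassID getDynamicClassID() const = 0;
};

class Derived : public Base {
public:
    // The address of a per-class static variable serves as a unique ID.
    static ClassID getStaticClassID() { return &fgClassID; }
    ClassID getDynamicClassID() const override { return getStaticClassID(); }
private:
    static const char fgClassID;
};
const char Derived::fgClassID = 0;

class Other : public Base {
public:
    static ClassID getStaticClassID() { return &fgClassID; }
    ClassID getDynamicClassID() const override { return getStaticClassID(); }
private:
    static const char fgClassID;
};
const char Other::fgClassID = 0;

// True only if the object's most-derived class is exactly Derived.
bool isDerived(const Base& object) {
    return object.getDynamicClassID() == Derived::getStaticClassID();
}
```

Given a `Derived d` and an `Other o`, `isDerived(d)` is true and `isDerived(o)` is false. Note that the check is an exact-class test: a subclass of `Derived` that overrides `getDynamicClassID()` would not match.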
friend class RuleBasedCollatorStreamer [friend]
friend class CollationElementIterator [friend]
friend class Collator [friend]
friend class TableCollationData [friend]