#include <convert.h>
Public Members
UnicodeConverterCPP () | |
Creates a Unicode conversion object that defaults to LATIN1 <-> Unicode conversion. | |
UnicodeConverterCPP (const char* name, UErrorCode& err) | |
Creates a Unicode conversion object by specifying the codepage name. | |
UnicodeConverterCPP (const UnicodeString& name, UErrorCode& err) | |
Creates a UnicodeConverter object with the name specified as a Unicode string. | |
UnicodeConverterCPP (int32_t codepageNumber, UConverterPlatform platform, UErrorCode& err) | |
Creates a Unicode conversion object using the codepage ID number. | |
~UnicodeConverterCPP () | |
void | fromUnicodeString (char* target, int32_t& targetSize, const UnicodeString& source, UErrorCode& err) const |
Transcodes the source UnicodeString to the target string in a codepage encoding with the specified Unicode converter. | |
void | toUnicodeString (UnicodeString& target, const char* source, int32_t sourceSize, UErrorCode& err) const |
Transcodes the source string in codepage encoding to the target string in Unicode encoding. | |
void | fromUnicode (char*& target, const char* targetLimit, const UChar*& source, const UChar* sourceLimit, int32_t * offsets, bool_t flush, UErrorCode& err) |
Transcodes an array of Unicode characters to an array of codepage characters. | |
void | toUnicode (UChar*& target, const UChar* targetLimit, const char*& source, const char* sourceLimit, int32_t * offsets, bool_t flush, UErrorCode& err) |
Converts an array of codepage characters into an array of Unicode characters. | |
int8_t | getMaxBytesPerChar (void) const |
Returns the maximum length of bytes used by a character. | |
int8_t | getMinBytesPerChar (void) const |
Returns the minimum byte length for characters in this codepage. | |
UConverterType | getType (void) const |
Gets the type of conversion associated with the converter. | |
void | getStarters (bool_t starters[256], UErrorCode& err) const |
Gets the "starter" bytes for converters of type MBCS; sets U_ILLEGAL_ARGUMENT_ERROR if the converter is not MBCS. | |
void | getSubstitutionChars (char* subChars, int8_t& len, UErrorCode& err) const |
Fills in the output parameter, subChars, with the substitution characters as multiple bytes. | |
void | setSubstitutionChars (const char* subChars, int8_t len, UErrorCode& err) |
Sets the substitution chars when converting from Unicode to a codepage. | |
void | resetState (void) |
Resets the state of stateful conversion to the default state. | |
const char* | getName ( UErrorCode& err) const |
Gets the name of the converter (zero-terminated). | |
int32_t | getCodepage (UErrorCode& err) const |
Gets a codepage number associated with the converter. | |
UConverterToUCallback | getMissingCharAction (void) const |
Returns the action currently taken when a character from a codepage is missing. | |
UConverterFromUCallback | getMissingUnicodeAction (void) const |
Returns the action currently taken when a Unicode character is missing. | |
void | setMissingCharAction (UConverterToUCallback action, UErrorCode& err) |
Sets the action taken when a character from a codepage is missing. | |
void | setMissingUnicodeAction (UConverterFromUCallback action, UErrorCode& err) |
Sets the action taken when a Unicode character is missing. | |
void | getDisplayName (const Locale& displayLocale, UnicodeString& displayName) const |
Returns the localized name of the UnicodeConverter; if for any reason it is unavailable, the internal name is returned instead. | |
UConverterPlatform | getCodepagePlatform (UErrorCode& err) const |
Returns the platform (the ICU-defined enum UConverterPlatform) of the UnicodeConverter. | |
UnicodeConverterCPP& | operator= (const UnicodeConverterCPP& that) |
bool_t | operator== (const UnicodeConverterCPP& that) const |
bool_t | operator!= (const UnicodeConverterCPP& that) const |
UnicodeConverterCPP (const UnicodeConverterCPP& that) | |
void | fixFileSeparator (UnicodeString& source) const |
Fixes the backslash character mismapping. | |
bool_t | isAmbiguous (void) const |
Determines if the converter contains ambiguous mappings of the same character or not. | |
Static Public Members
const char* const* | getAvailableNames (int32_t& num, UErrorCode& err) |
Returns the available names. | |
int32_t | flushCache (void) |
Iterates through every cached converter and frees all the unused ones. | |
UnicodeConverterCPP::UnicodeConverterCPP () |
Creates a Unicode conversion object that defaults to LATIN1 <-> Unicode conversion.
UnicodeConverterCPP::UnicodeConverterCPP (const char * name, UErrorCode & err) |
Creates a Unicode conversion object by specifying the codepage name.
The name string is in ASCII format.
name | pointer to a char[] containing the codepage name (I) |
err | error status (I/O). U_ILLEGAL_ARGUMENT_ERROR is returned if the string is empty. If the internal program does not work correctly, for example if there is no such codepage, U_INTERNAL_PROGRAM_ERROR is returned. |
UnicodeConverterCPP::UnicodeConverterCPP (const UnicodeString & name, UErrorCode & err) |
Creates a UnicodeConverter object with the name specified as a Unicode string.
The name should be limited to the ASCII-7 alphanumerics. Dash and underscore characters are allowed for readability, but are ignored in the search.
name | name of the uconv table as a Unicode string (I) |
err | error status (I/O). U_ILLEGAL_ARGUMENT_ERROR is returned if the string is empty. If the internal program does not work correctly, for example if there is no such codepage, U_INTERNAL_PROGRAM_ERROR is returned. |
UnicodeConverterCPP::UnicodeConverterCPP (int32_t codepageNumber, UConverterPlatform platform, UErrorCode & err) |
Creates a Unicode conversion object using the codepage ID number.
codepageNumber | a codepage number (I) |
err | error status (I/O). U_ILLEGAL_ARGUMENT_ERROR is returned if the string is empty. If the internal program does not work correctly, for example if there is no such codepage, U_INTERNAL_PROGRAM_ERROR is returned. |
UnicodeConverterCPP::~UnicodeConverterCPP () |
void UnicodeConverterCPP::fromUnicodeString (char * target, int32_t & targetSize, const UnicodeString & source, UErrorCode & err) const |
Transcodes the source UnicodeString to the target string in a codepage encoding with the specified Unicode converter.
For example, if a Unicode to/from JIS converter is specified, the source string in Unicode will be transcoded to JIS encoding. The result will be stored in JIS encoding.
source | the source Unicode string |
target | the target string in codepage encoding |
targetSize | on input, the number of bytes available in the "target" buffer; on output, the number of bytes copied to it |
err | the error status code. U_MEMORY_ALLOCATION_ERROR is returned if the internal process buffer cannot be allocated for transcoding. U_ILLEGAL_ARGUMENT_ERROR is returned if the converter is null or the source or target string is empty. |
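The in/out role of targetSize can be illustrated without ICU. The sketch below is a hypothetical stand-in, not the library's code: `Status`, `toLatin1`, and `encode` are invented names, the target encoding is Latin-1, unmappable characters become `?`, and stopping with an overflow status when the buffer fills is an assumption rather than documented behavior.

```cpp
#include <cstdint>
#include <string>

// Hypothetical status codes; these are not ICU's UErrorCode values.
enum class Status { Ok, BufferOverflow };

// Stand-in with the same parameter shape as fromUnicodeString():
// targetSize carries the capacity in and the byte count out.
void toLatin1(char* target, int32_t& targetSize,
              const std::u16string& source, Status& err) {
    const int32_t capacity = targetSize;
    int32_t written = 0;
    err = Status::Ok;
    for (char16_t unit : source) {
        if (written == capacity) {          // assumed overflow behavior
            err = Status::BufferOverflow;
            break;
        }
        // Code units above 0xFF have no Latin-1 mapping; '?' is an
        // assumed substitution character.
        target[written++] = (unit <= 0xFF)
            ? static_cast<char>(static_cast<unsigned char>(unit))
            : '?';
    }
    targetSize = written;                   // report bytes actually copied
}

// Convenience wrapper showing how a caller reads targetSize back.
std::string encode(const std::u16string& source, int32_t capacity) {
    std::string buffer(static_cast<std::size_t>(capacity), '\0');
    int32_t size = capacity;
    Status err = Status::Ok;
    toLatin1(&buffer[0], size, source, err);
    buffer.resize(static_cast<std::size_t>(size));
    return buffer;
}
```

The caller sets the size to the buffer capacity before the call and trusts only the value written back afterwards.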
void UnicodeConverterCPP::toUnicodeString (UnicodeString & target, const char * source, int32_t sourceSize, UErrorCode & err) const |
Transcodes the source string in codepage encoding to the target string in Unicode encoding.
For example, if a Unicode to/from JIS converter is specified, the source string in JIS encoding will be transcoded to Unicode encoding. The result will be stored in Unicode encoding.
source | the source string in codepage encoding |
target | the target string in Unicode encoding |
sourceSize | the size of the source buffer, in bytes |
err | the error status code. U_MEMORY_ALLOCATION_ERROR is returned if the internal process buffer cannot be allocated for transcoding. U_ILLEGAL_ARGUMENT_ERROR is returned if the converter is null or the source or target string is empty. |
void UnicodeConverterCPP::fromUnicode (char *& target, const char * targetLimit, const UChar *& source, const UChar * sourceLimit, int32_t * offsets, bool_t flush, UErrorCode & err) |
Transcodes an array of Unicode characters to an array of codepage characters.
The source pointer is an I/O parameter: it starts out pointing at the place to begin translating, and ends up pointing after the first sequence of bytes it encounters that is semantically invalid. If T_UnicodeConverter_setMissingCharAction is called with an action other than STOP before a call is made to this API, consumed and source should point to the same place (unless target ends with an incomplete sequence of bytes and flush is FALSE).
target | : I/O parameter. Input : Points to the beginning of the buffer to copy codepage characters to. Output : points to after the last codepage character copied to target. |
targetLimit | the pointer to the end of the target array |
source | the source Unicode character array |
sourceLimit | the pointer to the end of the source array |
flush | TRUE if the buffer is the last buffer and the conversion will finish in this call, FALSE otherwise. (future feature pending) |
err | the error status. U_ILLEGAL_ARGUMENT_ERROR is returned if the converter is null. |
void UnicodeConverterCPP::toUnicode (UChar *& target, const UChar * targetLimit, const char *& source, const char * sourceLimit, int32_t * offsets, bool_t flush, UErrorCode & err) |
Converts an array of codepage characters into an array of Unicode characters.
The source pointer is an I/O parameter: it starts out pointing at the place to begin translating, and ends up pointing after the first sequence of bytes it encounters that is semantically invalid. If T_UnicodeConverter_setMissingUnicodeAction is called with an action other than STOP before a call is made to this API, consumed and source should point to the same place (unless target ends with an incomplete sequence of bytes and flush is FALSE).
target | : I/O parameter. Input : Points to the beginning of the buffer to copy Unicode characters to. Output : points to after the last UChar copied to target. |
targetLimit | the pointer to the end of the target array |
source | the source codepage character array |
sourceLimit | the pointer to the end of the source array |
flush | TRUE if the buffer is the last buffer and the conversion will finish in this call, FALSE otherwise. (future feature pending) |
err | the error status code. U_ILLEGAL_ARGUMENT_ERROR is returned if the converter is null, targetLimit < target, or sourceLimit < source. |
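The pointer-advancing convention shared by fromUnicode and toUnicode can be sketched without ICU. The following is a hypothetical stand-in, not the library's implementation: `Status`, `latin1ToUnicode`, and `decodeInChunks` are invented names, the offsets and flush parameters are omitted, and Latin-1 is used only because each byte maps directly to one code unit.

```cpp
#include <cassert>
#include <string>

// Hypothetical status codes standing in for UErrorCode; not ICU's values.
enum class Status { Ok, BufferOverflow, IllegalArgument };

// Same pointer shape as toUnicode(): both `target` and `source` are
// advanced past whatever was written and consumed.
void latin1ToUnicode(char16_t*& target, const char16_t* targetLimit,
                     const char*& source, const char* sourceLimit,
                     Status& err) {
    if (targetLimit < target || sourceLimit < source) {
        err = Status::IllegalArgument;
        return;
    }
    err = Status::Ok;
    while (source < sourceLimit) {
        if (target == targetLimit) {        // caller's buffer is full
            err = Status::BufferOverflow;
            return;
        }
        // A Latin-1 byte is the code point of the same numeric value.
        *target++ = static_cast<char16_t>(
            static_cast<unsigned char>(*source++));
    }
}

// Caller loop: drain a small fixed buffer until all input is consumed.
std::u16string decodeInChunks(const std::string& bytes) {
    std::u16string out;
    const char* src = bytes.data();
    const char* srcLimit = src + bytes.size();
    char16_t buffer[4];
    Status err = Status::Ok;
    do {
        char16_t* tgt = buffer;
        latin1ToUnicode(tgt, buffer + 4, src, srcLimit, err);
        assert(err != Status::IllegalArgument);
        out.append(buffer, tgt);            // tgt points past the last unit written
    } while (err == Status::BufferOverflow);
    return out;
}
```

Because source is left where conversion stopped, the caller can resume with a fresh target buffer without tracking its own position.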
int8_t UnicodeConverterCPP::getMaxBytesPerChar (void) const |
Returns the maximum length of bytes used by a character.
This varies between 1 and 4.
int8_t UnicodeConverterCPP::getMinBytesPerChar (void) const |
Returns the minimum byte length for characters in this codepage.
This is either 1 or 2 for all supported codepages.
UConverterType UnicodeConverterCPP::getType (void) const |
Gets the type of conversion associated with the converter.
For example: SBCS, MBCS, DBCS, UTF8, UTF16_BE, UTF16_LE, ISO_2022, EBCDIC_STATEFUL, LATIN_1.
void UnicodeConverterCPP::getStarters (bool_t starters[256], UErrorCode & err) const |
Gets the "starter" bytes for converters of type MBCS.
Sets U_ILLEGAL_ARGUMENT_ERROR in err if the converter is not MBCS. Fills in an array of booleans, with the value of the byte as the offset into the array. On return, if TRUE is found at offset 0x20, the byte 0x20 is a starter byte in this converter.
starters | an array of size 256 to be filled in |
err | the error status code; set to U_ILLEGAL_ARGUMENT_ERROR if the converter is not MBCS |
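How a caller might use such a table is sketched below without ICU. This is an illustration under stated assumptions: `makeStarters` and `countChars` are hypothetical helpers, a starter is taken to mean the lead byte of a two-byte character, and the lead-byte ranges are the commonly cited Shift-JIS ones rather than values read from any converter.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Builds a 256-entry table indexed by byte value. The ranges 0x81-0x9F
// and 0xE0-0xFC are assumed Shift-JIS lead bytes, used for illustration.
std::array<bool, 256> makeStarters() {
    std::array<bool, 256> starters{};       // all false initially
    for (int b = 0x81; b <= 0x9F; ++b) starters[static_cast<std::size_t>(b)] = true;
    for (int b = 0xE0; b <= 0xFC; ++b) starters[static_cast<std::size_t>(b)] = true;
    return starters;
}

// Counts characters, treating each starter byte as the first byte of a
// two-byte character and every other byte as a single-byte character.
std::size_t countChars(const std::string& bytes,
                       const std::array<bool, 256>& starters) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytes.size(); ++count) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        i += (starters[b] && i + 1 < bytes.size()) ? 2 : 1;
    }
    return count;
}
```

Indexing by the unsigned byte value mirrors the description above: the entry at offset 0x20 answers whether byte 0x20 is a starter.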
void UnicodeConverterCPP::getSubstitutionChars (char * subChars, int8_t & len, UErrorCode & err) const |
Fills in the output parameter, subChars, with the substitution characters as multiple bytes.
subChars | the substitution characters |
len | the number of bytes of the substitution character array |
err | the error status code. U_ILLEGAL_ARGUMENT_ERROR will be returned if the converter is null. If the substitution character array is too small, an U_INDEX_OUTOFBOUNDS_ERROR will be returned. |
void UnicodeConverterCPP::setSubstitutionChars (const char * subChars, int8_t len, UErrorCode & err) |
Sets the substitution chars when converting from Unicode to a codepage.
The substitution is specified as a string of 1-4 bytes, and may contain a null byte. The fill-in parameter err receives the error status on return.
subChars | the substitution character array to set |
len | the number of bytes in the substitution character array |
err | the error status code. U_ILLEGAL_ARGUMENT_ERROR is returned if the converter is null or if the number of bytes provided is not in the codepage's range (e.g. length 1 for UCS-2). |
void UnicodeConverterCPP::resetState (void) |
Resets the state of stateful conversion to the default state.
This is used in the case of error to restart a conversion from a known default state.
const char * UnicodeConverterCPP::getName (UErrorCode & err) const |
Gets the name of the converter (zero-terminated).
The name is the internal name of the converter.
converter | the Unicode converter |
err | the error status code. U_INDEX_OUTOFBOUNDS_ERROR if the converterNameLen is too small to contain the name. |
int32_t UnicodeConverterCPP::getCodepage (UErrorCode & err) const |
Gets a codepage number associated with the converter.
This is not guaranteed to be the one used to create the converter. Some converters do not represent IBM registered codepages and return zero for the codepage number. The error code fill-in parameter indicates if the codepage number is available.
err | the error status code. U_ILLEGAL_ARGUMENT_ERROR will be returned if the converter is null or if the converter's data table is null. |
UConverterToUCallback UnicodeConverterCPP::getMissingCharAction (void) const |
Returns the action currently taken when a character from a codepage is missing.
(Currently STOP or SUBSTITUTE).
UConverterFromUCallback UnicodeConverterCPP::getMissingUnicodeAction (void) const |
Returns the action currently taken when a Unicode character is missing.
(Currently STOP or SUBSTITUTE).
void UnicodeConverterCPP::setMissingCharAction (UConverterToUCallback action, UErrorCode & err) |
Sets the action taken when a character from a codepage is missing.
(Currently STOP or SUBSTITUTE).
action | the action constant if an equivalent codepage character is missing |
void UnicodeConverterCPP::setMissingUnicodeAction (UConverterFromUCallback action, UErrorCode & err) |
Sets the action taken when a Unicode character is missing.
(T_UnicodeConverter_MissingUnicodeAction is currently either STOP or SUBSTITUTE; SKIP, CLOSEST_MATCH, and ESCAPE_SEQ may be added in the future.)
action | the action constant if an equivalent Unicode character is missing |
err | the error status code |
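The difference between the STOP and SUBSTITUTE actions, together with the role of the substitution characters, can be sketched without ICU. Everything named here is hypothetical: `MissingAction`, `EncodeResult`, and `encodeAscii` are invented for illustration, the enum replaces what the type names suggest are callback types, and 7-bit ASCII is used as the target so that any code unit above 0x7F counts as missing.

```cpp
#include <string>

// Hypothetical stand-ins for the STOP and SUBSTITUTE actions.
enum class MissingAction { Stop, Substitute };

struct EncodeResult {
    std::string bytes;  // bytes produced (partial if conversion stopped)
    bool ok;            // false if conversion stopped on a missing character
};

// Encodes UTF-16 code units as 7-bit ASCII. A unit above 0x7F has no
// mapping and triggers the configured action.
EncodeResult encodeAscii(const std::u16string& source, MissingAction action,
                         const std::string& subChars = "?") {
    EncodeResult result{std::string(), true};
    for (char16_t unit : source) {
        if (unit <= 0x7F) {
            result.bytes.push_back(static_cast<char>(unit));
        } else if (action == MissingAction::Substitute) {
            result.bytes += subChars;       // emit the substitution bytes
        } else {
            result.ok = false;              // STOP: abandon at the first gap
            break;
        }
    }
    return result;
}
```

With SUBSTITUTE the output stays the same length in characters but loses information; with STOP the caller gets a partial result and a failure flag to act on.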
void UnicodeConverterCPP::getDisplayName (const Locale & displayLocale, UnicodeString & displayName) const |
Returns the localized name of the UnicodeConverter; if for any reason it is unavailable, the internal name is returned instead.
displayLocale | the valid Locale from which we want to localize |
displayName | a UnicodeString that is going to be filled in |
UConverterPlatform UnicodeConverterCPP::getCodepagePlatform (UErrorCode & err) const |
Returns the platform (the ICU-defined enum UConverterPlatform) of the UnicodeConverter.
err | the error code status |
UnicodeConverterCPP& UnicodeConverterCPP::operator= (const UnicodeConverterCPP & that) |
bool_t UnicodeConverterCPP::operator== (const UnicodeConverterCPP & that) const |
bool_t UnicodeConverterCPP::operator!= (const UnicodeConverterCPP & that) const |
UnicodeConverterCPP::UnicodeConverterCPP (const UnicodeConverterCPP & that) |
void UnicodeConverterCPP::fixFileSeparator (UnicodeString & source) const |
Fixes the backslash character mismapping.
For example, in SJIS, the backslash character in the ASCII portion is also used to represent the yen currency sign. When mapping from Unicode character 0x005C, it's unclear whether to map the character back to yen or backslash in SJIS. This function will take the input buffer and replace all the yen sign characters with backslash. This is necessary when the user tries to open a file with the input buffer on Windows.
source | the input buffer to be fixed |
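The replacement step described above amounts to a single pass over the buffer. The sketch below is a hypothetical illustration using `std::u16string` in place of UnicodeString; `fixSeparators` is an invented name, and treating U+00A5 as the yen sign to be replaced is an assumption about which code point the converted text contains.

```cpp
#include <algorithm>
#include <string>

// Replaces every yen sign (assumed to be U+00A5) with a backslash
// (U+005C), in place.
void fixSeparators(std::u16string& source) {
    std::replace(source.begin(), source.end(),
                 char16_t(0x00A5), char16_t(0x005C));
}
```

Applied to a path that was decoded with yen signs as separators, this yields a string that can be passed to file-opening calls expecting backslashes.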
bool_t UnicodeConverterCPP::isAmbiguous (void) const |
Determines if the converter contains ambiguous mappings of the same character or not.
const char *const * UnicodeConverterCPP::getAvailableNames (int32_t & num, UErrorCode & err) [static] |
Returns the available names.
Lazily evaluated; the library owns the storage.
num | the number of available converters |
err | the error code status |
int32_t UnicodeConverterCPP::flushCache (void) [static] |
Iterates through every cached converter and frees all the unused ones.
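One way to picture a cache that frees only unused entries is with reference counting. The sketch below is not ICU's implementation: `ConverterData` and `ConverterCache` are hypothetical, "unused" is modeled as "held only by the cache itself", and returning the number of entries freed is an assumption about what the int32_t result means.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical shared converter data, keyed by name in the cache.
struct ConverterData {
    std::string name;
};

class ConverterCache {
public:
    // Returns the cached entry for `name`, creating it on first use.
    std::shared_ptr<ConverterData> open(const std::string& name) {
        std::shared_ptr<ConverterData>& slot = cache_[name];
        if (!slot) {
            slot = std::make_shared<ConverterData>(ConverterData{name});
        }
        return slot;
    }

    // Frees every entry that only the cache still references and
    // returns how many were freed (assumed meaning of the result).
    int32_t flush() {
        int32_t freed = 0;
        for (auto it = cache_.begin(); it != cache_.end();) {
            if (it->second.use_count() == 1) {
                it = cache_.erase(it);
                ++freed;
            } else {
                ++it;
            }
        }
        return freed;
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::map<std::string, std::shared_ptr<ConverterData>> cache_;
};
```

Entries still held by a live converter survive the flush, so calling it is safe at any point and only trades memory for a later reload.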