
UnicodeConverter Class Reference

This class is deprecated and will be removed.

#include <convert.h>


Public Methods

 UnicodeConverter ()
 Creates a Unicode conversion object that defaults to LATIN1 <-> Unicode conversion.

 UnicodeConverter (const char *name, UErrorCode &err)
 Creates a Unicode conversion object by specifying the codepage name.

 UnicodeConverter (const UnicodeString &name, UErrorCode &err)
 Creates a UnicodeConverter object with the name specified as a Unicode string.

 UnicodeConverter (int32_t codepageNumber, UConverterPlatform platform, UErrorCode &err)
 Creates a Unicode conversion object using the codepage ID number.

 ~UnicodeConverter ()
void fromUnicodeString (char *target, int32_t &targetSize, const UnicodeString &source, UErrorCode &err) const
 Transcodes the source UnicodeString to the target string in a codepage encoding with the specified Unicode converter.

void toUnicodeString (UnicodeString &target, const char *source, int32_t sourceSize, UErrorCode &err) const
 Transcodes the source string in codepage encoding to the target string in Unicode encoding.

void fromUnicode (char *&target, const char *targetLimit, const UChar *&source, const UChar *sourceLimit, int32_t *offsets, UBool flush, UErrorCode &err)
 Transcodes an array of Unicode characters to an array of codepage characters.

void toUnicode (UChar *&target, const UChar *targetLimit, const char *&source, const char *sourceLimit, int32_t *offsets, UBool flush, UErrorCode &err)
 Converts an array of codepage characters into an array of Unicode characters.

int8_t getMaxBytesPerChar (void) const
 Returns the maximum number of bytes used by a character.

int8_t getMinBytesPerChar (void) const
 Returns the minimum byte length for characters in this codepage.

UConverterType getType (void) const
 Gets the type of conversion associated with the converter.

void getStarters (UBool starters[256], UErrorCode &err) const
 Gets the "starter" bytes for converters of type MBCS. Sets err to U_ILLEGAL_ARGUMENT_ERROR if the converter is not MBCS.

void getSubstitutionChars (char *subChars, int8_t &len, UErrorCode &err) const
 Fills in the output parameter, subChars, with the substitution characters as multiple bytes.

void setSubstitutionChars (const char *subChars, int8_t len, UErrorCode &err)
 Sets the substitution chars when converting from Unicode to a codepage.

void resetState (void)
 Resets the state of stateful conversion to the default state.

const char * getName (UErrorCode &err) const
 Gets the name of the converter (zero-terminated).

int32_t getCodepage (UErrorCode &err) const
 Gets a codepage number associated with the converter.

void getMissingCharAction (UConverterToUCallback *action, const void **context) const
 Returns the action currently taken when a character from a codepage is missing or a byte sequence is illegal, etc.

void getMissingUnicodeAction (UConverterFromUCallback *action, const void **context) const
 Returns the action currently taken when a Unicode character is missing or there is an unpaired surrogate, etc.

void setMissingCharAction (UConverterToUCallback newAction, const void *newContext, UConverterToUCallback *oldAction, const void **oldContext, UErrorCode &err)
 Sets the action taken when a character from a codepage is missing.

void setMissingUnicodeAction (UConverterFromUCallback newAction, const void *newContext, UConverterFromUCallback *oldAction, const void **oldContext, UErrorCode &err)
 Sets the action taken when a Unicode character is missing.

void getDisplayName (const Locale &displayLocale, UnicodeString &displayName) const
 Returns the localized name of the UnicodeConverter; if for any reason it is unavailable, the internal name will be returned instead.

UConverterPlatform getCodepagePlatform (UErrorCode &err) const
 Returns the codepage platform (an ICU-defined enum, UConverterPlatform) of a UnicodeConverter.

UnicodeConverter & operator= (const UnicodeConverter &that)
UBool operator== (const UnicodeConverter &that) const
UBool operator!= (const UnicodeConverter &that) const
 UnicodeConverter (const UnicodeConverter &that)
void fixFileSeparator (UnicodeString &source) const
 Fixes the backslash character mismapping.

UBool isAmbiguous (void) const
 Determines if the converter contains ambiguous mappings of the same character or not.


Static Public Methods

const char *const * getAvailableNames (int32_t &num, UErrorCode &err)
 Returns the available names.

int32_t flushCache (void)
 Iterates through every cached converter and frees all the unused ones.


Detailed Description

This class is deprecated and will be removed.

Use the more powerful C conversion API with the UConverter type and ucnv_... functions.

There are also two new functions in ICU 2.0 that convert a UnicodeString and extract a UnicodeString using a UConverter (search unistr.h for UConverter). They replace the fromUnicodeString() and toUnicodeString() functions here. All other UnicodeConverter functions are basically aliases of C API functions.

Old documentation:

UnicodeConverter is a C++ wrapper class for UConverter. You need one UnicodeConverter object in place of one UConverter object. For details on the API and implementation of the codepage converter interface see ucnv.h.

See also:
UConverter
Deprecated:
To be removed after 2002-sep-30; use the C API with UConverter and ucnv_... functions.


Constructor & Destructor Documentation

UnicodeConverter::UnicodeConverter ()

Creates a Unicode conversion object that defaults to LATIN1 <-> Unicode conversion.

Returns:
the created Unicode converter object
Deprecated:

UnicodeConverter::UnicodeConverter (const char *name, UErrorCode &err)

Creates a Unicode conversion object by specifying the codepage name.

The name string is in ASCII format.

Parameters:
name  the pointer to a char[] object containing a codepage name. (I)
err  error status (I/O). U_ILLEGAL_ARGUMENT_ERROR will be returned if the string is empty. If the internal program does not work correctly, for example, if there's no such codepage, U_INTERNAL_PROGRAM_ERROR will be returned.
Returns:
the created Unicode converter object
Deprecated:

UnicodeConverter::UnicodeConverter (const UnicodeString &name, UErrorCode &err)

Creates a UnicodeConverter object with the name specified as a Unicode string.

The name should be limited to the ASCII-7 alphanumerics. Dash and underscore characters are allowed for readability, but are ignored in the search.

Parameters:
name  name of the uconv table as a Unicode string (I)
err  error status (I/O). U_ILLEGAL_ARGUMENT_ERROR will be returned if the string is empty. If the internal program does not work correctly, for example, if there's no such codepage, U_INTERNAL_PROGRAM_ERROR will be returned.
Returns:
the created Unicode converter object
Deprecated:

UnicodeConverter::UnicodeConverter (int32_t codepageNumber, UConverterPlatform platform, UErrorCode &err)

Creates a Unicode conversion object using the codepage ID number.

Parameters:
codepageNumber  a codepage number (I)
err  error status (I/O). U_ILLEGAL_ARGUMENT_ERROR will be returned if the string is empty. If the internal program does not work correctly, for example, if there's no such codepage, U_INTERNAL_PROGRAM_ERROR will be returned.
Returns:
the Unicode converter object
Deprecated:


Member Function Documentation

void UnicodeConverter::fixFileSeparator (UnicodeString &source) const

Fixes the backslash character mismapping.

For example, in SJIS, the backslash character in the ASCII portion is also used to represent the yen currency sign. When mapping from Unicode character 0x005C, it's unclear whether to map the character back to yen or backslash in SJIS. This function will take the input buffer and replace all the yen sign characters with backslash. This is necessary when the user tries to open a file with the input buffer on Windows.

Parameters:
source  the input buffer to be fixed
Deprecated:
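
The replacement described above can be pictured with a small standalone sketch. This is not ICU code: the helper name replaceYenWithBackslash is hypothetical, it works on a std::u16string rather than a UnicodeString, and it replaces every yen sign (U+00A5) unconditionally, whereas the real function belongs to a specific converter.

```cpp
#include <string>

// Hypothetical helper, not part of ICU: replaces every yen sign (U+00A5)
// in a UTF-16 buffer with a backslash (U+005C), in place.
void replaceYenWithBackslash(std::u16string &buffer) {
    for (char16_t &unit : buffer) {
        if (unit == u'\u00A5') {
            unit = u'\\';
        }
    }
}
```

A caller would run this on a path decoded from SJIS before handing it to a file-opening routine on Windows.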

int32_t UnicodeConverter::flushCache (void) [static]

Iterates through every cached converter and frees all the unused ones.

Returns:
the number of cached converters successfully deleted
Deprecated:

void UnicodeConverter::fromUnicode (char *&target, const char *targetLimit, const UChar *&source, const UChar *sourceLimit, int32_t *offsets, UBool flush, UErrorCode &err)

Transcodes an array of Unicode characters to an array of codepage characters.

The source pointer is an I/O parameter: it starts out pointing at the place to begin translating, and ends up pointing after the first sequence that it encounters that is semantically invalid. If setMissingUnicodeAction() is called with an action other than STOP before a call is made to this API, consumed and source should point to the same place (unless target ends with an incomplete sequence of bytes and flush is FALSE).

Parameters:
target  I/O parameter. Input: points to the beginning of the buffer to copy codepage characters to. Output: points to after the last codepage character copied to target.
targetLimit  the pointer to the end of the target array
source  the source Unicode character array
sourceLimit  the pointer to the end of the source array
offsets  if NULL is passed, nothing will happen to it; otherwise it needs to have the same number of allocated cells as target. It is filled in with offsets from target to source, e.g. if offsets[3] is equal to 6, it means that target[3] was a result of transcoding source[6]. For output data carried across calls, and other data without a specific source character (such as from escape sequences or callbacks), -1 will be placed for offsets.
flush  set to TRUE if the current source buffer is the last available chunk of the source and the conversion will finish in this call, FALSE otherwise. Note that if a failing status is returned, this function may have to be called multiple times with flush set to TRUE until the source buffer is consumed.
err  the error status. U_ILLEGAL_ARGUMENT_ERROR will be returned if the converter is null.
Deprecated:
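
The pointer and offsets conventions in the parameter list above can be illustrated with a toy encoder. The sketch below is not ICU code: toyFromUnicode and ToyStatus are hypothetical names, char16_t stands in for UChar, and the "codepage" is simply Latin-1 (code units up to 0xFF map to one byte). It only mimics how target and source advance and how offsets records which source unit produced each target byte.

```cpp
#include <cstdint>

// Hypothetical status values for this sketch (ICU reports errors via UErrorCode).
enum class ToyStatus { Ok, BufferOverflow, InvalidChar };

// Toy stand-in for a converter: encodes UTF-16 code units <= 0xFF as single
// Latin-1 bytes. On return, `target` points after the last byte written and
// `source` points after the last unit examined (including an invalid one).
ToyStatus toyFromUnicode(char *&target, const char *targetLimit,
                         const char16_t *&source, const char16_t *sourceLimit,
                         std::int32_t *offsets) {
    const char16_t *sourceStart = source;
    const char *targetStart = target;
    while (source < sourceLimit) {
        if (target >= targetLimit) {
            return ToyStatus::BufferOverflow;   // caller must supply more room
        }
        if (*source > 0xFF) {
            ++source;                           // stop just past the bad unit
            return ToyStatus::InvalidChar;
        }
        if (offsets != nullptr) {
            // offsets[i] is the index of the source unit that produced target[i]
            offsets[target - targetStart] =
                static_cast<std::int32_t>(source - sourceStart);
        }
        *target++ = static_cast<char>(*source++);
    }
    return ToyStatus::Ok;
}
```

Because both pointers are passed by reference, a caller can resume after a BufferOverflow by resetting target to a fresh buffer and calling again with the same source pointer.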

void UnicodeConverter::fromUnicodeString (char *target, int32_t &targetSize, const UnicodeString &source, UErrorCode &err) const

Transcodes the source UnicodeString to the target string in a codepage encoding with the specified Unicode converter.

For example, if a Unicode to/from JIS converter is specified, the source string in Unicode will be transcoded to JIS encoding. The result will be stored in JIS encoding.

Parameters:
source  the source Unicode string
target  the target string in codepage encoding
targetSize  on input, the number of bytes available in the "target" buffer; on output, the number of bytes copied to it
err  the error status code. U_MEMORY_ALLOCATION_ERROR will be returned if the internal process buffer cannot be allocated for transcoding. U_ILLEGAL_ARGUMENT_ERROR is returned if the converter is null or the source or target string is empty.
Deprecated:

const char* const* UnicodeConverter::getAvailableNames (int32_t &num, UErrorCode &err) [static]

Returns the available names.

Lazily evaluated; the library owns the storage.

Parameters:
num  the number of available converters
err  the error code status
Returns:
the name array
Deprecated:

int32_t UnicodeConverter::getCodepage (UErrorCode &err) const

Gets a codepage number associated with the converter.

This is not guaranteed to be the one used to create the converter. Some converters do not represent IBM registered codepages and return zero for the codepage number. The error code fill-in parameter indicates if the codepage number is available.

Parameters:
err  the error status code. U_ILLEGAL_ARGUMENT_ERROR will be returned if the converter is null or if the converter's data table is null.
Returns:
If any error occurs, 0 will be returned.
Deprecated:

UConverterPlatform UnicodeConverter::getCodepagePlatform (UErrorCode &err) const

Returns the codepage platform (an ICU-defined enum, UConverterPlatform) of a UnicodeConverter.

Parameters:
err  the error code status
Returns:
the codepage's platform
Deprecated:

void UnicodeConverter::getDisplayName (const Locale &displayLocale, UnicodeString &displayName) const

Returns the localized name of the UnicodeConverter; if for any reason it is unavailable, the internal name will be returned instead.

Parameters:
displayLocale  the valid Locale, from which we want to localize
displayName  a UnicodeString that is going to be filled in.
Deprecated:

int8_t UnicodeConverter::getMaxBytesPerChar (void) const

Returns the maximum number of bytes used by a character.

This varies between 1 and 4.

Returns:
the maximum number of bytes per codepage character
Deprecated:

int8_t UnicodeConverter::getMinBytesPerChar (void) const

Returns the minimum byte length for characters in this codepage.

This is either 1 or 2 for all supported codepages.

Returns:
the minimum number of bytes per codepage character
Deprecated:

void UnicodeConverter::getMissingCharAction (UConverterToUCallback *action, const void **context) const

Returns the action currently taken when a character from a codepage is missing or a byte sequence is illegal, etc.

Parameters:
action  the callback function pointer
context  the callback function state
Deprecated:

void UnicodeConverter::getMissingUnicodeAction (UConverterFromUCallback *action, const void **context) const

Returns the action currently taken when a Unicode character is missing or there is an unpaired surrogate, etc.

Parameters:
action  the callback function pointer
context  the callback function state
Deprecated:

const char* UnicodeConverter::getName (UErrorCode &err) const

Gets the name of the converter (zero-terminated).

The name will be the internal name of the converter.

Parameters:
err  the error status code. U_INDEX_OUTOFBOUNDS_ERROR if the converterNameLen is too small to contain the name.
Deprecated:

void UnicodeConverter::getStarters (UBool starters[256], UErrorCode &err) const

Gets the "starter" bytes for converters of type MBCS. Sets err to U_ILLEGAL_ARGUMENT_ERROR if the converter is not MBCS.

Fills in an array of booleans, with the value of the byte as the offset into the array. On return, if TRUE is found at offset 0x20, it means that the byte 0x20 is a starter byte in this converter.

Parameters:
starters  an array of size 256 to be filled in
err  the error status code; U_ILLEGAL_ARGUMENT_ERROR if the converter is not MBCS
See also:
ucnv_getType
Deprecated:
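
To make the shape of the starters array concrete, here is a standalone sketch that builds and uses such a table. It is not ICU code: makeToyStarters and countChars are hypothetical helpers, and the lead-byte ranges (0x81-0x9F and 0xE0-0xFC, as in common Shift-JIS variants) are an assumption of the example rather than something a converter reported.

```cpp
#include <array>
#include <cstddef>

// Illustrative only: a 256-entry table indexed by byte value, in the same
// shape as the `starters` array. The ranges below are assumed, not queried.
std::array<bool, 256> makeToyStarters() {
    std::array<bool, 256> starters{};              // all entries start false
    for (int b = 0x81; b <= 0x9F; ++b) starters[b] = true;
    for (int b = 0xE0; b <= 0xFC; ++b) starters[b] = true;
    return starters;
}

// Counts characters in a double-byte string: a starter byte consumes itself
// plus one trail byte, any other byte is a complete character on its own.
std::size_t countChars(const unsigned char *bytes, std::size_t length,
                       const std::array<bool, 256> &starters) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < length) {
        i += starters[bytes[i]] ? 2 : 1;
        ++count;
    }
    return count;
}
```

With a real converter the table would come from getStarters() (or the C function ucnv_getStarters) instead of hard-coded ranges.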

void UnicodeConverter::getSubstitutionChars (char *subChars, int8_t &len, UErrorCode &err) const

Fills in the output parameter, subChars, with the substitution characters as multiple bytes.

Parameters:
subChars  the substitution characters
len  the number of bytes of the substitution character array
err  the error status code. U_ILLEGAL_ARGUMENT_ERROR will be returned if the converter is null. If the substitution character array is too small, a U_INDEX_OUTOFBOUNDS_ERROR will be returned.
Deprecated:

UConverterType UnicodeConverter::getType (void) const

Gets the type of conversion associated with the converter, e.g. SBCS, MBCS, DBCS, UTF8, UTF16_BE, UTF16_LE, ISO_2022, EBCDIC_STATEFUL, LATIN_1.

Returns:
the type of the converter
Deprecated:

UBool UnicodeConverter::isAmbiguous (void) const

Determines if the converter contains ambiguous mappings of the same character or not.

Returns:
TRUE if the converter contains ambiguous mapping of the same character, FALSE otherwise.
Deprecated:

void UnicodeConverter::resetState (void)

Resets the state of stateful conversion to the default state.

This is used in the case of error to restart a conversion from a known default state.

Deprecated:

void UnicodeConverter::setMissingCharAction (UConverterToUCallback newAction, const void *newContext, UConverterToUCallback *oldAction, const void **oldContext, UErrorCode &err)

Sets the action taken when a character from a codepage is missing.

(Currently STOP or SUBSTITUTE.)

Parameters:
newAction  the action constant if an equivalent codepage character is missing
newContext  the new toUnicode callback function state
oldAction  the original action constant, saved for later restoration.
oldContext  the old toUnicode callback function state
err  the error status code
Deprecated:

void UnicodeConverter::setMissingUnicodeAction (UConverterFromUCallback newAction, const void *newContext, UConverterFromUCallback *oldAction, const void **oldContext, UErrorCode &err)

Sets the action taken when a Unicode character is missing.

(Currently the action is either STOP or SUBSTITUTE; SKIP, CLOSEST_MATCH and ESCAPE_SEQ may be added in the future.)

Parameters:
newAction  the action constant if an equivalent Unicode character is missing
newContext  the new fromUnicode callback function state
oldAction  the original action constant, saved for later restoration.
oldContext  the old fromUnicode callback function state
err  the error status code
Deprecated:

void UnicodeConverter::setSubstitutionChars (const char *subChars, int8_t len, UErrorCode &err)

Sets the substitution chars when converting from Unicode to a codepage.

The substitution is specified as a string of 1-4 bytes, and may contain null bytes. The fill-in parameter err will get the error status on return.

Parameters:
subChars  the substitution character array to be set
len  the number of bytes of the substitution character array
err  the error status code. U_ILLEGAL_ARGUMENT_ERROR if the converter is null, or if the number of bytes provided is not in the codepage's range (e.g. length 1 for UCS-2)
Deprecated:
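
The length constraint on the substitution sequence can be sketched as a simple range check. This is an assumed reading of "the codepage's range", namely the interval between the values reported by getMinBytesPerChar() and getMaxBytesPerChar(); the helper name isPlausibleSubstitutionLength is hypothetical and the code is not taken from ICU.

```cpp
#include <cstdint>

// Hypothetical check, not ICU code: accepts a substitution sequence length
// only if it is 1-4 bytes and lies between the codepage's minimum and maximum
// bytes per character (an assumed interpretation of "the codepage's range").
bool isPlausibleSubstitutionLength(std::int8_t len,
                                   std::int8_t minBytesPerChar,
                                   std::int8_t maxBytesPerChar) {
    if (len < 1 || len > 4) {
        return false;
    }
    return len >= minBytesPerChar && len <= maxBytesPerChar;
}
```

Under this reading, a one-byte substitution is rejected for a codepage whose characters are always two bytes, which matches the "length 1 for UCS-2" example in the parameter description.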

void UnicodeConverter::toUnicode (UChar *&target, const UChar *targetLimit, const char *&source, const char *sourceLimit, int32_t *offsets, UBool flush, UErrorCode &err)

Converts an array of codepage characters into an array of Unicode characters.

The source pointer is an I/O parameter: it starts out pointing at the place to begin translating, and ends up pointing after the first sequence of bytes that it encounters that is semantically invalid. If setMissingCharAction() is called with an action other than STOP before a call is made to this API, consumed and source should point to the same place (unless target ends with an incomplete sequence of bytes and flush is FALSE).

Parameters:
target  I/O parameter. Input: points to the beginning of the buffer to copy Unicode characters to. Output: points to after the last UChar copied to target.
targetLimit  the pointer to the end of the target array
source  the source codepage character array
sourceLimit  the pointer to the end of the source array
offsets  if NULL is passed, nothing will happen to it; otherwise it needs to have the same number of allocated cells as target. It is filled in with offsets from target to source, e.g. if offsets[3] is equal to 6, it means that target[3] was a result of transcoding source[6]. For output data carried across calls, and other data without a specific source character (such as from escape sequences or callbacks), -1 will be placed for offsets.
flush  set to TRUE if the current source buffer is the last available chunk of the source and the conversion will finish in this call, FALSE otherwise. Note that if a failing status is returned, this function may have to be called multiple times with flush set to TRUE until the source buffer is consumed.
err  the error status code. U_ILLEGAL_ARGUMENT_ERROR will be returned if the converter is null, targetLimit < target, or sourceLimit < source.
Deprecated:
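
The role of flush when input arrives in chunks can be shown with a toy stateful decoder. This sketch is not ICU code: ToyDecoder is a hypothetical class, the two-byte encoding it decodes is made up for the example (bytes below 0x80 stand alone, any other byte is a lead byte followed by one trail byte), and it reports failure with a bool rather than a UErrorCode.

```cpp
#include <cstddef>
#include <string>

// Toy stateful decoder for a made-up encoding. A lead byte at the end of a
// chunk is remembered and completed by the first byte of the next chunk.
class ToyDecoder {
public:
    // Decodes one chunk and appends the result to `out`. Returns false only
    // when `flush` is true and the input ends in the middle of a sequence.
    bool decode(const unsigned char *source, std::size_t length, bool flush,
                std::u16string &out) {
        for (std::size_t i = 0; i < length; ++i) {
            const unsigned char b = source[i];
            if (pendingLead_ != 0) {
                // Complete the sequence started by the stored lead byte.
                out.push_back(static_cast<char16_t>((pendingLead_ << 8) | b));
                pendingLead_ = 0;
            } else if (b >= 0x80) {
                pendingLead_ = b;        // may be carried across calls
            } else {
                out.push_back(static_cast<char16_t>(b));
            }
        }
        if (flush && pendingLead_ != 0) {
            pendingLead_ = 0;            // truncated input: drop the state
            return false;
        }
        return true;
    }

    // Returns to the initial state, discarding any stored lead byte.
    void reset() { pendingLead_ = 0; }

private:
    unsigned pendingLead_ = 0;           // 0 means no lead byte is pending
};
```

Passing flush as false for every chunk except the last lets a sequence split across two buffers decode correctly, while passing true on the final chunk turns a dangling lead byte into an error instead of silently keeping it.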

void UnicodeConverter::toUnicodeString (UnicodeString &target, const char *source, int32_t sourceSize, UErrorCode &err) const

Transcodes the source string in codepage encoding to the target string in Unicode encoding.

For example, if a Unicode to/from JIS converter is specified, the source string in JIS encoding will be transcoded to Unicode encoding. The result will be stored in Unicode encoding.

Parameters:
source  the source string in codepage encoding
target  the target string in Unicode encoding
sourceSize  the number of bytes in the source string
err  the error status code. U_MEMORY_ALLOCATION_ERROR will be returned if the internal process buffer cannot be allocated for transcoding. U_ILLEGAL_ARGUMENT_ERROR is returned if the converter is null or the source or target string is empty.
Deprecated:


The documentation for this class was generated from the following file:
Generated on Mon Mar 4 20:10:53 2002 for ICU 2.0 by doxygen1.2.14 written by Dimitri van Heesch, © 1997-2002