Posted to c-dev@xerces.apache.org by "David E. Cleary" <da...@progress.com> on 2000/07/25 20:15:57 UTC

Bug in ICUTransService.cpp

When using the ICU Transcoder and creating a DOMString like this:

	DOMString str("A String");

you'll end up with a DOMString missing its last character. This is due
to a bug in ICUTransService.cpp in the transcode method with this signature:

bool ICULCPTranscoder::transcode(const  char* const     toTranscode
                                ,       XMLCh* const    toFill
                                , const unsigned int    maxChars)

When the call is made to ucnv_toUChars, maxChars is passed as the
targetCapacity parameter. But maxChars is the string length, not the target
capacity; the capacity is maxChars + 1, to account for the null terminator.
Below is the fix.

    //
    //  Use a faux block to enforce a lock on the converter, which will
    //  unlock immediately after it's completed.
    //
    UErrorCode err = U_ZERO_ERROR;
    {
        XMLMutexLock lockConverter(&fMutex);
        // (PSC) Bug fix. ucnv_toUChars expects the target capacity, not the
        // number of characters, so add 1.
        ucnv_toUChars
        (
            fConverter
            , targetBuf
            , maxChars + 1
            , toTranscode
            , srcLen
            , &err
        );
    }
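
To illustrate the off-by-one without needing ICU, here is a minimal sketch.
copyWithCapacity and transcodeSketch are hypothetical stand-ins I made up for
this example; they are not ICU or Xerces functions. They only mimic the
"capacity includes the terminator" semantics described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a converter with "target capacity" semantics:
// capacity counts every slot in target, including the one reserved for the
// terminator. This is NOT the ICU implementation.
// Returns false if the whole source did not fit (the output is truncated).
bool copyWithCapacity(char16_t* target, std::size_t capacity, const char* source)
{
    assert(target != nullptr && source != nullptr);
    if (capacity == 0)
        return false;                       // no room even for the terminator

    const std::size_t srcLen = std::strlen(source);
    const std::size_t room   = capacity - 1; // one slot goes to the terminator
    const std::size_t n      = srcLen < room ? srcLen : room;

    for (std::size_t i = 0; i < n; ++i)
        target[i] = static_cast<char16_t>(static_cast<unsigned char>(source[i]));
    target[n] = 0;
    return n == srcLen;
}

// The buffer always holds maxChars characters plus a terminator. Only the
// capacity value handed to the converter differs: maxChars (the bug) or
// maxChars + 1 (the fix).
std::u16string transcodeSketch(const char* source, bool applyFix, bool* ok = nullptr)
{
    const std::size_t maxChars = std::strlen(source);
    std::u16string buf(maxChars + 1, u'\0');

    const std::size_t capacity = applyFix ? maxChars + 1 : maxChars;
    const bool fit = copyWithCapacity(&buf[0], capacity, source);
    if (ok != nullptr)
        *ok = fit;
    return std::u16string(buf.c_str());     // cut at the first terminator
}
```

With "A String" as input, transcodeSketch(..., false) yields "A Strin" and
reports failure, while transcodeSketch(..., true) yields the full string.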

Another problem is that DOMString currently doesn't inform you when this
fails. In this case, ucnv_toUChars reports a U_BUFFER_OVERFLOW_ERROR, so
transcode returns false, but DOMString doesn't throw an exception; it only
has a TBD comment saying something should be done.
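
One possible way to surface the failure is sketched below. This is only an
assumption about what the TBD could become, not the actual DOMString code:
transcodeStub and makeStringOrThrow are hypothetical names, and
std::runtime_error stands in for whatever exception type Xerces would use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a transcode call that signals failure by
// returning false. toFill must have room for maxChars + 1 elements.
bool transcodeStub(const char* src, char16_t* toFill, std::size_t maxChars)
{
    assert(src != nullptr && toFill != nullptr);
    const std::size_t srcLen = std::strlen(src);
    const std::size_t n = srcLen < maxChars ? srcLen : maxChars;

    for (std::size_t i = 0; i < n; ++i)
        toFill[i] = static_cast<char16_t>(static_cast<unsigned char>(src[i]));
    toFill[n] = 0;
    return n == srcLen;                     // false: the source was truncated
}

// Checks the result instead of silently keeping a truncated string.
std::u16string makeStringOrThrow(const char* src, std::size_t maxChars)
{
    std::u16string buf(maxChars + 1, u'\0');
    if (!transcodeStub(src, &buf[0], maxChars))
        throw std::runtime_error("transcode failed: target buffer too small");
    return std::u16string(buf.c_str());
}
```

The point is just that the caller checks the bool and reports the error
rather than carrying on with a short string.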

David Cleary
Progress Software Corp.


RE: Bug in ICUTransService.cpp

Posted by "David E. Cleary" <da...@progress.com>.
> -----Original Message-----
> From: Dean Roddey [mailto:droddey@charmedquark.com]
> Sent: Tuesday, July 25, 2000 11:17 PM
> To: xerces-c-dev@xml.apache.org
> Subject: Re: Bug in ICUTransService.cpp
>
>
> Yeah, there have historically been problems in that area. Somewhere along
> the line, ICU changed the semantics of that field, and I was always
> confused about it to begin with, so I'm not surprised to find this kind of
> bug. If you confirm that ICU really, really does intend to get a max that
> includes the space for the null, and that it's not a one-off problem from
> the caller, then it should definitely go in.

Here is the documentation of that function. It clearly states that the
parameter is target capacity, which includes space for the null terminator.

======================================

U_CAPI int32_t U_EXPORT2 ucnv_toUChars (const UConverter * converter,
UChar * target, int32_t targetCapacity, const char * source,
int32_t sourceSize, UErrorCode * err)


Transcode the source string in codepage encoding to the target string in
Unicode encoding.

For example, if a Unicode to/from JIS converter is specified, the source
string in JIS encoding will be transcoded to Unicode and placed into a
provided target buffer. If any problems are encountered during conversion, it
will SUBSTITUTE with the Unicode REPLACEMENT char. We recommend that the size
of the target buffer be at least as long as the maximum # of bytes per char
in this character set. A zero-terminator will be placed at the end of the
target buffer. This function is a more convenient but less efficient version
of ucnv_toUnicode.

Parameters:
converter        the Unicode converter
source           the source string in codepage encoding
target           the target string in Unicode encoding
targetCapacity   capacity of the target buffer
sourceSize       number of bytes in source to be transcoded
err              the error status code

U_MEMORY_ALLOCATION_ERROR will be returned if the internal process buffer
cannot be allocated for transcoding. U_ILLEGAL_ARGUMENT_ERROR is returned if
the converter is NULL or if the source or target string is empty.
U_BUFFER_OVERFLOW_ERROR when the input buffer is prematurely exhausted and
targetSize non-NULL.

Returns:
the number of UChar needed in target (including the zero terminator)
See also:
ucnv_getNextUChar() , ucnv_toUnicode() , ucnv_convert()
Stable:


Re: Bug in ICUTransService.cpp

Posted by Dean Roddey <dr...@charmedquark.com>.
Yeah, there have historically been problems in that area. Somewhere along the
line, ICU changed the semantics of that field, and I was always confused
about it to begin with, so I'm not surprised to find this kind of bug. If you
confirm that ICU really, really does intend to get a max that includes the
space for the null, and that it's not a one-off problem from the caller, then
it should definitely go in.

--------------------------
Dean Roddey
The CIDLib C++ Frameworks
Charmed Quark Software
droddey@charmedquark.com
http://www.charmedquark.com

"You young, and you gotcha health. Whatchoo wanna job fer?"

