Posted to c-dev@xerces.apache.org by Simon Elbaz <el...@gmail.com> on 2012/09/28 00:18:10 UTC

xerces trunk on openbsd 5.1

Hi,

I wanted to try Xerces on OpenBSD 5.1.

After compilation, DOMCount was always returning:
unknow reason.

Reading the code, I found that Xerces detects the end of a conversion by
wcsrtombs and mbsrtowcs by testing the source pointer (it expects the
pointer to be set to null once the terminating null character is reached).

The problem is that OpenBSD 5.1 does not implement this behaviour. The
source pointer is left pointing at the character following the last
converted character, which leads the Xerces binary into a risky memory
access.

Below is a patch that fixes the problem by relying instead on the values
returned by the functions (-1 on error, >= 0 for a complete or incomplete
conversion).

Regards,
Simon Elbaz

$ svn diff xercesc/util/Transcoders/Iconv/IconvTransService.cpp
Index: xercesc/util/Transcoders/Iconv/IconvTransService.cpp
===================================================================
--- xercesc/util/Transcoders/Iconv/IconvTransService.cpp        (revision 1387785)
+++ xercesc/util/Transcoders/Iconv/IconvTransService.cpp        (working copy)
@@ -429,7 +429,7 @@
     srcBuffer[gTempBuffArraySize - 1] = 0;
     const wchar_t *src = 0;

-    while (toTranscode[srcCursor] || src)
+    while (toTranscode[srcCursor])
     {
         if (src == 0) // copy a piece of the source string into a local
                       // buffer, converted to wchar_t and NULL-terminated.
@@ -454,7 +454,7 @@
             break;
         }
         dstCursor += len;
-        if (src != 0) // conversion not finished. This *always* means there
+        if (len == (resultSize - dstCursor)) // conversion not finished. This *always* means there
                       // was not enough room in the destination buffer.
         {
             reallocString<char>(resultString, resultSize, manager, resultString != localBuffer);
@@ -512,9 +512,9 @@
             break;
         }
         dstCursor += len;
-        if (src == 0) // conversion finished
+        if ((len >= 0) && (len < (resultSize - dstCursor))) // conversion finished
             break;
-        if (dstCursor >= resultSize - 1)
+        if (len == (resultSize - dstCursor))
             reallocString<wchar_t>(tmpString, resultSize, manager, tmpString != localBuffer);
     }
     // make a final copy, converting from wchar_t to XMLCh:
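
For readers following the idea of relying on return values rather than on the source pointer, here is a minimal sketch of one standard way to do that. It is not the patch above and not the Xerces code: the helper name narrowByReturnValue is hypothetical, and it assumes the platform honours the documented behaviour of wcsrtombs with a null destination (count the bytes, leave the source pointer alone), which this thread does not confirm for OpenBSD 5.1.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of Xerces): converts a wide string to the
// locale's multibyte encoding without ever inspecting the source pointer.
std::string narrowByReturnValue(const wchar_t* text)
{
    std::mbstate_t state;
    std::memset(&state, 0, sizeof state);

    // First pass: with a null destination, wcsrtombs() only counts the
    // bytes needed (excluding the terminator).
    const wchar_t* src = text;
    const std::size_t needed = std::wcsrtombs(nullptr, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in this locale");

    // Second pass: the buffer is known to be large enough, so the return
    // value alone tells us how many bytes were written.
    std::vector<char> buffer(needed + 1);
    std::memset(&state, 0, sizeof state);
    src = text;
    const std::size_t written =
        std::wcsrtombs(buffer.data(), &src, buffer.size(), &state);
    if (written == static_cast<std::size_t>(-1))
        throw std::runtime_error("conversion failed");

    return std::string(buffer.data(), written);
}
```

The trade-off is a second pass over the input in exchange for never needing to guess whether a full buffer means "finished" or "out of room".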

Re: xerces trunk on openbsd 5.1

Posted by sh...@e-z.net.
I am still fairly new to the current Xerces.  I use it through the Xalan
project.

If it is UCS-2, then that explains the apparent ambiguity when 2 XMLCh are
required to render some large Unicode codepoints.

- Steve

> I thought the internal format was UCS-2; is it actually UTF-16?
>
>  -b.
>
>> The type XMLCh is a 16-bit type.  The internal data storage
>> is UTF-16.
>>
>> Sincerely,
>> Steven J. Hathaway
>



---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org


Re: xerces trunk on openbsd 5.1

Posted by Ben/RS <be...@redsnapper.net>.
I thought the internal format was UCS-2; is it actually UTF-16?

 -b.

> The type XMLCh is a 16-bit type.  The internal data storage
> is UTF-16.
> 
> Sincerely,
> Steven J. Hathaway




Re: xerces trunk on openbsd 5.1

Posted by sh...@e-z.net.
FYI

Be careful with the type wchar_t when validating code.

GNU implements wchar_t as 32-bit.
Windows implements wchar_t as 16-bit.
Other platforms may use 16 or 32 bits, a mix of both, or leave the
width undefined.

The type XMLCh is a 16-bit type.  The internal data storage
is UTF-16.

Sincerely,
Steven J. Hathaway
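
To illustrate why a 16-bit XMLCh holding UTF-16 sometimes needs two units for one character, here is a small sketch of the standard surrogate-pair arithmetic. The helper name encodeUtf16 is hypothetical and is not a Xerces API; it uses plain std::uint16_t as a stand-in for XMLCh.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a Xerces API): writes the UTF-16 form of
// codePoint into out and returns the number of 16-bit units used.
// Returns 0 for values that are not valid Unicode scalar values.
std::size_t encodeUtf16(std::uint32_t codePoint, std::uint16_t out[2])
{
    // Surrogate code points and values above U+10FFFF cannot be encoded.
    if (codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF))
        return 0;

    // Everything in the Basic Multilingual Plane fits in one unit,
    // which is the only case UCS-2 can represent.
    if (codePoint < 0x10000)
    {
        out[0] = static_cast<std::uint16_t>(codePoint);
        return 1;
    }

    // Supplementary code points are split into a high and a low surrogate.
    const std::uint32_t offset = codePoint - 0x10000;
    out[0] = static_cast<std::uint16_t>(0xD800 + (offset >> 10));
    out[1] = static_cast<std::uint16_t>(0xDC00 + (offset & 0x3FF));
    return 2;
}
```

With this sketch, U+00E9 occupies one unit, while U+1F600 becomes the pair 0xD83D 0xDE00. A 32-bit wchar_t would hold either value in a single element, which is one reason wchar_t and XMLCh buffers cannot be treated as interchangeable.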



Re: xerces trunk on openbsd 5.1

Posted by Simon Elbaz <el...@gmail.com>.
Hi Alberto,

Your modification of the configure script solves my problem.
Thanks.

On Fri, Sep 28, 2012 at 4:27 PM, Alberto Massari <
Alberto.Massari@progress.com> wrote:

>  Hi Simon,
> it looks like libc in OpenBSD 5.1 does not obey the documentation for
> wcsrtombs/mbsrtowcs:
>
> If dst is not a null pointer, the pointer object pointed to
> by src is assigned either a null pointer (if conversion
> stopped due to reaching a terminating null wide-character)
> or the address just past the last wide-character converted
> (if any).
>
> Instead of hacking the code to try to detect whether the conversion
> actually wrote a NULL character in the converted string, I chose to modify
> the 'configure' script to detect this behaviour and disable the usage of
> the re-entrant functions if it doesn't match how the Xerces code uses them.
>
> Thank you for reporting this issue,
> Alberto

Re: xerces trunk on openbsd 5.1

Posted by Alberto Massari <Al...@progress.com>.
Hi Simon,
it looks like libc in OpenBSD 5.1 does not obey the documentation
for wcsrtombs/mbsrtowcs:

If dst is not a null pointer, the pointer object pointed to
by src is assigned either a null pointer (if conversion
stopped due to reaching a terminating null wide-character)
or the address just past the last wide-character converted
(if any).

Instead of hacking the code to try to detect whether the conversion 
actually wrote a NULL character in the converted string, I chose to 
modify the 'configure' script to detect this behaviour and disable the 
usage of the re-entrant functions if it doesn't match how the Xerces 
code uses them.

Thank you for reporting this issue,
Alberto
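
As an illustration of the kind of behaviour such a check looks for, here is a minimal sketch of a probe. It is not the test that was added to the Xerces 'configure' script; the function names are hypothetical, and the probe assumes plain ASCII input so that it works in the default "C" locale.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical probe: true when wcsrtombs() resets the source pointer to
// null after it has converted the terminating null wide character.
bool wcsrtombsNullsSourceAtEnd()
{
    const wchar_t text[] = L"abc";
    const wchar_t* src = text;
    char dst[16]; // large enough for the whole string plus terminator
    std::mbstate_t state;
    std::memset(&state, 0, sizeof state);

    const std::size_t len = std::wcsrtombs(dst, &src, sizeof dst, &state);
    return len == 3 && src == nullptr;
}

// Hypothetical companion probe: true when a destination that fills up
// first leaves the source pointer just past the last converted character.
bool wcsrtombsAdvancesSourceWhenFull()
{
    const wchar_t text[] = L"abc";
    const wchar_t* src = text;
    char dst[2]; // deliberately too small for "abc"
    std::mbstate_t state;
    std::memset(&state, 0, sizeof state);

    const std::size_t len = std::wcsrtombs(dst, &src, sizeof dst, &state);
    return len == 2 && src == text + 2;
}
```

On a libc that follows the documentation quoted above, both probes should return true. A build system could compile and run something along these lines and fall back to the non-re-entrant functions when the first probe fails.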
