Posted to c-dev@xerces.apache.org by bu...@apache.org on 2003/12/13 13:13:46 UTC

DO NOT REPLY [Bug 25498] New: - Win32Transcoder does not properly transcode ISO-8859-2 and other encodings

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25498

           Summary: Win32Transcoder does not properly transcode ISO-8859-2
                    and other encodings
           Product: Xerces-C++
           Version: 2.4.0
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: Major
          Priority: Other
         Component: Utilities
        AssignedTo: xerces-c-dev@xml.apache.org
        ReportedBy: jdrozd@software602.cz
                CC: jdrozd@software602.cz


Win32TransService scans the Windows registry for supported charsets and reads 
the "Codepage" and "InternetEncoding" values of each one. For many charsets 
these two values are equal, but not for all.

When a Win32Transcoder object is created for a given charset, the "Codepage" 
value is stored in the fWinCP member and the "InternetEncoding" value in the 
fIECP member. The Win32Transcoder methods pass fWinCP to Windows API 
functions such as ::MultiByteToWideChar. This is wrong: the fIECP value 
should be passed instead.

For example, when transcoding from ISO-8859-2, fWinCP is 1250 and fIECP is 
28592. Win32Transcoder::transcodeFrom(...) calls 
::MultiByteToWideChar(1250, ...), which transcodes from the Windows-1250 
code page, not from ISO-8859-2, so the result is wrong.
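
To make the difference concrete, here is a small portable sketch (not Xerces 
code and not a Windows API call). It hard-codes the mappings of bytes 0xA1 
through 0xA5 for both code pages. The helper name decodeByte and the table 
names are made up for this illustration.

```cpp
#include <cstdint>

// Excerpts of the two single-byte mappings, bytes 0xA1..0xA5 only.
// Each entry is the Unicode code point that the byte decodes to.
static const std::uint16_t kIso88592[] = { 0x0104, 0x02D8, 0x0141, 0x00A4, 0x013D };
static const std::uint16_t kWin1250[]  = { 0x02C7, 0x02D8, 0x0141, 0x00A4, 0x0104 };

// Hypothetical helper: decode one byte under the given code page number.
// Returns 0 for bytes or code pages outside this small excerpt.
std::uint16_t decodeByte(unsigned int codePage, unsigned char b)
{
    if (b < 0xA1 || b > 0xA5)
        return 0;
    const unsigned int idx = b - 0xA1;
    if (codePage == 28592)   // ISO-8859-2 (the fIECP value in the example)
        return kIso88592[idx];
    if (codePage == 1250)    // Windows-1250 (the fWinCP value in the example)
        return kWin1250[idx];
    return 0;
}
```

Byte 0xA1 is U+0104 (A with ogonek) in ISO-8859-2 but U+02C7 (caron) in 
Windows-1250, so decoding ISO-8859-2 input with code page 1250 silently 
yields a different character. Some bytes, such as 0xA3, map identically in 
both, which is why the problem can go unnoticed with some input.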

The proposed patch:
Replace fWinCP with fIECP in all calls of Windows API functions in all 
Win32Transcoder methods.

In Win32Transcoder::transcodeFrom:
...............
  const unsigned int toEat = ::IsDBCSLeadByteEx(fIECP, *inPtr) ? 2 : 1;
  // Make sure a whole char is in the source
  if (inPtr + toEat > inEnd)
      break;
  // Try to translate this next char and check for an error
  const unsigned int converted = ::MultiByteToWideChar
  ( fIECP, MB_PRECOMPOSED | MB_ERR_INVALID_CHARS,
    (const char*)inPtr, toEat, outPtr, 1);
...............

In Win32Transcoder::transcodeTo:
...............
  const unsigned int bytesStored = ::WideCharToMultiByte
  ( fIECP, WC_COMPOSITECHECK | WC_SEPCHARS, srcPtr, 1,
    (char*)outPtr, outEnd - outPtr, 0, &usedDef);
...............

In Win32Transcoder::canTranscodeTo:
...............
  const unsigned int bytesStored = ::WideCharToMultiByte
  ( fIECP, WC_COMPOSITECHECK | WC_SEPCHARS, srcBuf, srcCount,
    tmpBuf, 64, 0, &usedDef);
...............
