Posted to general@xerces.apache.org by Igor Tandetnik <it...@syncro-tech.com> on 2000/03/15 18:47:34 UTC

Problem in XMLUTF8Transcoder

Hello.

I am using Xerces-C version 1.1.0.

There is the following table in util/XMLUTF8Transcoder.cpp:
static const XMLUInt32 gUTFOffsets[6] =
{
    0, 0x3080, 0xE2080, 0x3C82080, 0xFA082080, 0x82022080
};
The numbers in this table should be equal to the following:

0
(0xC0 << 6) + 0x80
(((0xE0 << 6) + 0x80) << 6) + 0x80
(((((0xF0 << 6) + 0x80) << 6) + 0x80) << 6) + 0x80
(((((((0xF8 << 6) + 0x80) << 6) + 0x80) << 6) + 0x80) << 6) + 0x80
(((((((((0xFC << 6) + 0x80) << 6) + 0x80) << 6) + 0x80) << 6) + 0x80) << 6) + 0x80

to correctly account for UTF-8 byte masks.
All the numbers match except the last: it should be 0x82082080 (the last expression above, truncated to 32 bits), not 0x82022080. I guess it is just a typo. It does not affect processing anyway, because the large UCS-4 codes that require 6-byte sequences will cause an error in the conversion to high and low surrogates (UTF-16). I'm just being pedantic here.
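
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the offsets from the lead-byte masks and compares them with a table. The names utf8Offset and firstMismatch are made up for this illustration and are not part of Xerces-C. I am also assuming XMLUInt32 behaves like an unsigned 32-bit integer, so the 6-byte case wraps around instead of overflowing a signed type.

```cpp
#include <cstdint>

// Hypothetical helper (not a Xerces-C function): combine the lead-byte mask
// with (seqLen - 1) continuation-byte masks of 0x80, shifting by 6 bits per
// continuation byte. Unsigned 32-bit arithmetic wraps on overflow, which is
// what produces the expected 6-byte entry.
static std::uint32_t utf8Offset(std::uint32_t leadMask, int seqLen)
{
    std::uint32_t value = leadMask;
    for (int i = 1; i < seqLen; ++i)
        value = (value << 6) + 0x80u;
    return value;
}

// Hypothetical helper: return the 0-based index of the first table entry
// that differs from the computed offset, or -1 if all six entries agree.
static int firstMismatch(const std::uint32_t (&table)[6])
{
    // Lead-byte masks for sequence lengths 1 through 6.
    const std::uint32_t leadMasks[6] = {0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC};
    for (int n = 0; n < 6; ++n) {
        if (table[n] != utf8Offset(leadMasks[n], n + 1))
            return n;
    }
    return -1;
}

// The table as quoted from version 1.1.0, and the same table with the
// proposed correction to the last entry.
static const std::uint32_t quotedTable[6] =
    {0, 0x3080, 0xE2080, 0x3C82080, 0xFA082080, 0x82022080};
static const std::uint32_t correctedTable[6] =
    {0, 0x3080, 0xE2080, 0x3C82080, 0xFA082080, 0x82082080};
```

Calling firstMismatch(quotedTable) should point at the last entry (index 5), while firstMismatch(correctedTable) should report no mismatch.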

Igor Tandetnik