Posted to general@xerces.apache.org by Igor Tandetnik <it...@syncro-tech.com> on 2000/03/15 18:47:34 UTC
Problem in XMLUTF8Transcoder
Hello.
I am using Xerces-C version 1.1.0.
util/XMLUTF8Transcoder.cpp contains the following table:
static const XMLUInt32 gUTFOffsets[6] =
{
    0, 0x3080, 0xE2080, 0x3C82080, 0xFA082080, 0x82022080
};
The entries in this table should be equal to the following:
0
(0xC0 << 6) + 0x80
(((0xE0 << 6) + 0x80) << 6) + 0x80
(((((0xF0 << 6) + 0x80) << 6) + 0x80) << 6) + 0x80
(((((((0xF8 << 6) + 0x80) << 6) + 0x80) << 6) + 0x80) << 6) + 0x80
(((((((((0xFC << 6) + 0x80) << 6) + 0x80) << 6) + 0x80) << 6) + 0x80) << 6) + 0x80
These values account for the UTF-8 lead-byte and continuation-byte markers.
All the entries match except the last, which should be 0x82082080 (the last expression truncated to 32 bits). I guess it is just a typo. It does not affect processing in practice, because UCS-4 codes large enough to require 6-byte sequences will already cause an error in the conversion to UTF-16 high and low surrogates. I'm just being pedantic here.
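For anyone who wants to double-check the arithmetic, here is a minimal sketch. The names utf8Offset, firstMismatch and quotedOffsets are made up for this illustration and are not part of Xerces-C, and the sketch assumes the offsets are accumulated in a 32-bit unsigned value, so the 6-byte expression wraps modulo 2^32:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Xerces-C): builds the offset for a UTF-8
// sequence with `trailing` continuation bytes from its lead-byte marker.
inline std::uint32_t utf8Offset(std::uint32_t leadMarker, int trailing)
{
    assert(trailing >= 0 && trailing <= 5);
    std::uint32_t value = leadMarker;
    for (int i = 0; i < trailing; ++i)
        value = (value << 6) + 0x80;  // unsigned arithmetic wraps modulo 2^32
    return value;
}

// Returns the index of the first entry that differs from the computed offset,
// or -1 if every entry matches.
inline int firstMismatch(const std::uint32_t (&table)[6])
{
    // Lead-byte markers for 1- to 6-byte sequences.
    static const std::uint32_t leadMarkers[6] = {0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC};
    for (int i = 0; i < 6; ++i)
        if (table[i] != utf8Offset(leadMarkers[i], i))
            return i;
    return -1;
}

// The table as quoted in this message (Xerces-C 1.1.0).
static const std::uint32_t quotedOffsets[6] = {
    0, 0x3080, 0xE2080, 0x3C82080, 0xFA082080, 0x82022080
};
```

With these definitions, firstMismatch(quotedOffsets) reports index 5, and utf8Offset(0xFC, 5) gives 0x82082080.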
Igor Tandetnik