Posted to general@xerces.apache.org by ro...@us.ibm.com on 2000/02/18 21:01:32 UTC

Xerces-C: New intrinsic encoding



Quite a bit of generated HTML (which is hopefully well formed and can
therefore be parsed by our parser), and eventually XML, comes from
Windows-based tools, and those tools tend to generate output in the
'Windows-1252' encoding. So I've added it as another intrinsic encoding in
the Xerces-C parser. The non-Windows builds will be broken for a few
minutes until the build files on the other platforms are updated.

BTW, be sure to differentiate between ISO-8859-1 (Latin1) and 1252. They
are not the same, but many Windows tools have traditionally treated them as
if they were. In many cases it won't make a difference, but some of the
code points in the upper half (0x80 through 0x9F) have different meanings:
Latin1 assigns them to control characters, while 1252 uses most of them for
printable characters such as curly quotes. You might not round trip
correctly if you use those code points but say the encoding is Latin1.
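
To make the difference concrete, here is a small sketch of the same bytes
decoded both ways. It uses Python's standard codecs purely for illustration
and is not Xerces-C code; the sample bytes and the describe() helper are
made up for this example.

```python
# Bytes as a Windows tool might write them: curly quotes around "hi",
# a space, then a euro sign. All three special bytes fall in 0x80-0x9F.
raw = b"\x93hi\x94 \x80"

# Decode the same bytes under each label.
as_cp1252 = raw.decode("cp1252")      # Windows-1252
as_latin1 = raw.decode("iso-8859-1")  # ISO-8859-1 (Latin1)


def describe(text):
    """Hypothetical helper: list the code points of a string."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print("1252:  ", describe(as_cp1252))
print("Latin1:", describe(as_latin1))

# Each decoding round trips only when written back under the same label.
assert as_cp1252.encode("cp1252") == raw
assert as_latin1.encode("iso-8859-1") == raw

# Text read as 1252 cannot be written out as Latin1 at all, because the
# curly quotes and the euro sign have no Latin1 code points.
try:
    as_cp1252.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
```

Under 1252 the byte 0x93 becomes a left curly quote (U+201C), while under
Latin1 it becomes the control character U+0093, so a document labeled
Latin1 that actually holds 1252 bytes silently changes meaning.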

----------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
roddey@us.ibm.com