Posted to j-users@xerces.apache.org by Joseph Kesselman <ke...@us.ibm.com> on 2002/12/03 14:59:33 UTC
Re: Encoding issues
UTF-8 can represent any Unicode character... but it does so by turning
everything beyond ASCII into multiple-byte sequences, and in order to do so
it has to reserve all the byte values above 0x7F for that purpose. If a file
uses those bytes as single-byte characters in their own right (as the
ISO-8859 family and similar encodings do), it is not valid UTF-8 and the
conversion will fail. See the UTF-8 RFC for more detail; it's not hard to
find with a web search.
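To make the multi-byte point concrete, here is a minimal sketch that prints
the UTF-8 bytes for a few characters. The class name Utf8Bytes and the
helper hex() are made up for illustration; only the standard JDK calls are
real.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Bytes {
    // Return the UTF-8 encoding of s as space-separated hex bytes.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so bytes above 0x7F are not printed as negatives.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("A"));       // 41        (ASCII: one byte, below 0x80)
        System.out.println(hex("\u00E9"));  // C3 A9     (e-acute: two bytes, both above 0x7F)
        System.out.println(hex("\u20AC"));  // E2 82 AC  (euro sign: three bytes)
    }
}
```

Every byte of a multi-byte sequence is above 0x7F, which is why a lone 0xE9
(e-acute in ISO-8859-1) cannot be read as a UTF-8 character by itself.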
There is probably an encoding that would work for your files -- but you'll
have to determine what it is and explicitly specify it.
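One rough way to narrow down the candidates is to decode a sample of the
file strictly under each suspected encoding and see which ones are rejected.
The sketch below assumes a four-byte sample ("caf" followed by 0xE9) and
uses ISO-8859-1 purely as an example candidate; EncodingCheck and
decodeStrict() are hypothetical names, not part of Xerces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper class, for illustration only.
class EncodingCheck {
    // Decode strictly: throw on malformed input instead of silently
    // substituting a replacement character.
    static String decodeStrict(byte[] bytes, String charsetName)
            throws CharacterCodingException {
        CharsetDecoder decoder = Charset.forName(charsetName)
                .newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // Assumed sample data: "caf" plus a bare 0xE9 byte.
        byte[] sample = { 0x63, 0x61, 0x66, (byte) 0xE9 };
        for (String name : new String[] { "UTF-8", "ISO-8859-1" }) {
            try {
                System.out.println(name + ": decoded as " + decodeStrict(sample, name));
            } catch (CharacterCodingException e) {
                System.out.println(name + ": rejected (" + e + ")");
            }
        }
    }
}
```

A clean decode only shows that the bytes are legal in that encoding, not
that it is the right one; ISO-8859-1 in particular accepts every byte
value, so you still have to look at the result. Once you know the answer,
the usual place to state it for an XML parser is the encoding declaration,
e.g. <?xml version="1.0" encoding="ISO-8859-1"?> if the files did turn out
to be ISO-8859-1.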
______________________________________
Joe Kesselman / IBM Research