
Posted to j-users@xerces.apache.org by Joseph Kesselman <ke...@us.ibm.com> on 2002/12/03 14:59:33 UTC

Re: Encoding issues

UTF-8 can represent any Unicode character, but it does so by turning 
everything outside the ASCII range into a multiple-byte sequence, and in 
order to do so it has to reserve the bytes above 0x7F for that purpose. If 
your files use those bytes as single-byte characters in their own right (as 
ISO-8859-1 and similar encodings do), UTF-8 decoding will fail. See the 
UTF-8 RFC for more detail; it's not hard to find with a web search.
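
To make the failure concrete, here is a minimal sketch using only the 
standard JDK charset API. The class name Utf8Check, the helper isValidUtf8 
and the sample word are made up for illustration; they are not from the 
original question. It assumes the problem files contain single bytes above 
0x7F, such as 0xE9 for e-acute in ISO-8859-1.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Hypothetical helper: true if the bytes are a well-formed UTF-8 sequence.
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "caf" + e-acute as ISO-8859-1: the last character is the single byte 0xE9.
        byte[] latin1 = { 'c', 'a', 'f', (byte) 0xE9 };
        // The same text as UTF-8: e-acute becomes the two-byte sequence 0xC3 0xA9.
        byte[] utf8 = { 'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9 };

        System.out.println(isValidUtf8(latin1)); // false
        System.out.println(isValidUtf8(utf8));   // true

        // Decoding the first array with the encoding it was written in works fine.
        String text = new String(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(text.equals("caf\u00E9")); // true
    }
}
```

The lone 0xE9 is rejected because in UTF-8 it announces a three-byte 
sequence that never arrives. The bytes are not wrong, they are just being 
read with the wrong encoding.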

There is probably an encoding that would work for your files -- but you'll 
have to determine what it is and explicitly specify it.
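
As a sketch of what "explicitly specify it" can look like for an XML file, 
the example below names the encoding in the XML declaration and parses the 
bytes through the standard JAXP API. ISO-8859-1 is only an assumed 
candidate here; the right value depends on how your files were actually 
written. The class name DeclaredEncoding, the helper parseText and the 
sample document are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclaredEncoding {
    // Hypothetical helper: parse raw bytes and return the root element's text.
    public static String parseText(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The declaration tells the parser how to turn bytes into characters.
        String declared = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><word>caf\u00E9</word>";
        byte[] declaredBytes = declared.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseText(declaredBytes).equals("caf\u00E9")); // true

        // Same content without a declaration: the parser assumes UTF-8 and
        // is expected to reject the lone 0xE9 byte.
        byte[] undeclaredBytes = "<word>caf\u00E9</word>".getBytes(StandardCharsets.ISO_8859_1);
        try {
            parseText(undeclaredBytes);
            System.out.println("parsed without a declaration");
        } catch (Exception e) {
            System.out.println("parse failed: " + e.getMessage());
        }
    }
}
```

If the files themselves cannot be edited, another option is to hand the 
parser an org.xml.sax.InputSource and call setEncoding on it before 
parsing a byte stream.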

______________________________________
Joe Kesselman  / IBM Research
