Posted to j-users@xerces.apache.org by Frank Zhou <fc...@yahoo.com> on 2004/04/02 02:21:58 UTC

exception: invalid byte 1 of 1-byte UTF-8 sequence (0xb3)

Hi All,

I was trying to parse an XML file with the latest version
of Xerces-J and got an invalidByte exception. The problem
seems to be the character "superscript 3" (Unicode U+00B3).
I thought this was a legal XML character according to the
XML specification, and the document loads fine in Windows
Explorer. Any clue why I get this exception?

Thanks much in advance.

Frank
===========================
Here is the exception trace.


java.io.UTFDataFormatException: invalid byte 1 of 1-byte UTF-8 sequence (0xb3)
    at org.apache.xerces.impl.io.UTF8Reader.invalidByte(Unknown Source)
    at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
    at org.apache.xerces.impl.XMLEntityManager$EntityScanner.load(Unknown Source)
    at org.apache.xerces.impl.XMLEntityManager$EntityScanner.scanContent(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
    at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)




---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: exception: invalid byte 1 of 1-byte UTF-8 sequence (0xb3)

Posted by Andy Clark <an...@apache.org>.
Frank Zhou wrote:
> I was trying to parse an XML file with the latest version
> of Xerces-J and got an invalidByte exception. The problem
> seems to be the character "superscript 3" (Unicode U+00B3).
> I thought this was a legal XML character according to the
> XML specification, and the document loads fine in Windows
> Explorer. Any clue why I get this exception?

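You are probably seeing this exception because your
document does not declare the encoding the file is actually
saved in. Without an encoding declaration, Xerces assumes
UTF-8. In a single-byte encoding such as ISO-8859-1,
"superscript 3" is the lone byte 0xB3, but in UTF-8 that
byte is only valid as a continuation byte and can never
start a character, so the UTF-8 reader reports an error.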
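
To make the byte-level picture concrete, here is a minimal
sketch using only the JDK's java.nio.charset classes (not
Xerces itself). The class and method names are made up for
this illustration. It decodes strictly, so malformed input
raises an exception instead of being silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Xerces.
class SuperscriptThreeBytes {

    // Strict decode: report malformed or unmappable input as an exception.
    static String decodeStrict(byte[] bytes, Charset charset)
            throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeStrict(bytes, StandardCharsets.UTF_8);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // U+00B3 saved as ISO-8859-1 is the single byte 0xB3.
        byte[] latin1 = "\u00B3".getBytes(StandardCharsets.ISO_8859_1);
        // The same character saved as UTF-8 is the two bytes 0xC2 0xB3.
        byte[] utf8 = "\u00B3".getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 length: " + latin1.length);
        System.out.println("UTF-8 length:      " + utf8.length);
        System.out.println("0xB3 alone valid as UTF-8? " + isValidUtf8(latin1));
        System.out.println("0xC2 0xB3 valid as UTF-8?  " + isValidUtf8(utf8));
    }
}
```

So the character itself is legal XML; it is the byte
sequence that is not legal UTF-8.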

Add an XML declaration at the top of the file that
specifies the real encoding. For example:

   <?xml version='1.0' encoding='ISO-8859-1'?>
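
A rough sketch of the effect, assuming a JAXP parser on the
classpath (the one bundled with the JDK is derived from
Xerces). The class name, the helper method, and the <r>
element are invented for this example. The last case uses
the standard SAX call InputSource.setEncoding, which may
help when the file cannot be edited; the encoding you pass
must still be the one the file was really saved in.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of Xerces.
class DeclaredEncodingDemo {

    // Parses the bytes and returns the root element's text,
    // or null if the parser rejects the document.
    // externalEncoding may be null (let the parser decide).
    static String rootText(byte[] xml, String externalEncoding) {
        try {
            InputSource source = new InputSource(new ByteArrayInputStream(xml));
            if (externalEncoding != null) {
                source.setEncoding(externalEncoding);
            }
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Both documents are saved as ISO-8859-1, so U+00B3 is the byte 0xB3.
        byte[] undeclared = "<r>\u00B3</r>"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] declared = "<?xml version='1.0' encoding='ISO-8859-1'?><r>\u00B3</r>"
                .getBytes(StandardCharsets.ISO_8859_1);

        // No declaration: the parser assumes UTF-8 and fails on 0xB3.
        System.out.println("undeclared parsed: " + (rootText(undeclared, null) != null));
        // With the declaration, the same byte is read as U+00B3.
        System.out.println("declared parsed:   " + (rootText(declared, null) != null));
        // Encoding supplied by the application instead of the file.
        System.out.println("external parsed:   " + (rootText(undeclared, "ISO-8859-1") != null));
    }
}
```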

-- 
Andy Clark * andyc@apache.org

