Posted to j-dev@xerces.apache.org by Dmitry Mordovin <dm...@dwide.com> on 2015/03/27 07:52:11 UTC
UTF8Reader: Invalid byte sequence
Hi!
I am trying to parse an HTML string that contains English, Russian and Vietnamese characters.
Sample:
Document doc = builder.parse(new StringBufferInputStream("<html><body>Eng Рус Việt Nam</body></html>"));
The Java source file is stored as UTF-8.
I even checked the string "Eng Рус Việt Nam" with an online conversion service.
Result: the input string encoding is the same as the output, UTF-8.
The Java application throws this exception at the parse call:
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:691)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:372)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1743)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1413)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2823)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:348)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
at cc.jmitty.PowerWorker.doProceedPDFRequest(PowerWorker.java:268)
at cc.jmitty.PowerWorker.doSendPDF(PowerWorker.java:187)
at cc.jmitty.PowerWorker.run(PowerWorker.java:93)
Do you have any idea how to check my string, or is there another solution?
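
One direction that may be worth checking, offered as a sketch and not as a confirmed diagnosis: java.io.StringBufferInputStream is deprecated and uses only the low eight bits of each character, so the bytes the parser receives are not the UTF-8 encoding of the string, regardless of how the source file is saved. The sketch below shows two alternatives, assuming a default JAXP DocumentBuilder: encoding the string to UTF-8 bytes explicitly, or giving the parser a character stream. The class and method names (Utf8ParseSketch, parseFromBytes, parseFromReader) are made up for illustration, and the sample string is written with Unicode escapes so the result does not depend on the compiler's source encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Xerces or of the original program.
class Utf8ParseSketch {

    // Option 1: encode the string as real UTF-8 bytes before parsing.
    // Without an XML declaration the parser assumes UTF-8 for a byte stream.
    static Document parseFromBytes(DocumentBuilder builder, String xml) throws Exception {
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        return builder.parse(in);
    }

    // Option 2: hand the parser characters, so no byte decoding happens at all.
    static Document parseFromReader(DocumentBuilder builder, String xml) throws Exception {
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Same text as the sample in this post, written with escapes.
        String xml = "<html><body>Eng \u0420\u0443\u0441 Vi\u1EC7t Nam</body></html>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        Document fromBytes = parseFromBytes(builder, xml);
        Document fromReader = parseFromReader(builder, xml);

        // Both documents should carry the same body text.
        System.out.println(fromBytes.getDocumentElement().getTextContent()
                .equals(fromReader.getDocumentElement().getTextContent()));
    }
}
```

If the low-eight-bit truncation is indeed the cause, it would be consistent with the message above: U+1EC7 would be cut down to the byte 0xC7, which starts a 2-byte UTF-8 sequence, and the following "t" is not a valid second byte.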
Dmitry