Posted to j-users@xerces.apache.org by Suresh Babu Koya <sk...@quark.co.in> on 2002/08/14 15:28:37 UTC
Inconsistent behavior with Xerces
>>-----Original Message-----
>>From: Suresh Babu Koya [mailto:skoya@quark.co.in]
>>Sent: Wednesday, August 14, 2002 1:33 PM
>>To: xerces-j-user@xml.apache.org
>>Subject: Inconsistent behavior of XML
>>
>>
>>I came across an XML file that uses UTF-8 encoding and contains a special
>>character. The file is well-formed according to IE and XMLSpy.
>>When I parse and serialize it with Xerces using the program below,
>>it produces an output file.
>>
>>I am attaching the files with this mail.
>>
>>
>>When I reopen the serialized file with Xerces and try to serialize it
>>again with the same program, Xerces reports errors.
>>
>>Exception List:
>>
>>java.io.UTFDataFormatException: invalid byte 2 of 2-byte UTF-8 sequence (0x3c)
>> at org.apache.xerces.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:678)
>> at org.apache.xerces.impl.io.UTF8Reader.read(UTF8Reader.java:355)
>> at org.apache.xerces.impl.XMLEntityManager$EntityScanner.load(XMLEntityManager.java:3257)
>> at org.apache.xerces.impl.XMLEntityManager$EntityScanner.scanContent(XMLEntityManager.java:2371)
>> at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(XMLDocumentFragmentScannerImpl.java:829)
>> at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1387)
>> at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:333)
>> at org.apache.xerces.parsers.DTDConfiguration.parse(DTDConfiguration.java:524)
>> at org.apache.xerces.parsers.DTDConfiguration.parse(DTDConfiguration.java:580)
>> at org.apache.xerces.parsers.XMLParser.parse(XMLParser.java:152)
>> at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1108)
>> at sax2saxtest.SAXWriter.main(SAXWriter.java:34)
>>Exception in thread "main"
>>
>>
>>Here is the code that I have written to parse and serialize the file
>>using the XMLSerializer class in Xerces.
>>
>>import org.apache.xml.serialize.XMLSerializer;
>>import org.apache.xml.serialize.OutputFormat;
>>import java.io.*;
>>import org.xml.sax.*;
>>import javax.xml.parsers.*;
>>
>>public class SAXWriter {
>>
>>    public SAXWriter() {
>>    }
>>
>>    public static void main(String[] args) throws Exception {
>>        OutputFormat of = new OutputFormat("XML", "UTF-8", true);
>>        of.setIndent(2);
>>        FileWriter fout = new FileWriter("d:/Out1.xml");
>>        XMLSerializer s = new XMLSerializer(new PrintWriter(fout), of);
>>        SAXParserFactory spf = SAXParserFactory.newInstance();
>>        SAXParser sp = spf.newSAXParser();
>>        XMLReader rdr = sp.getXMLReader();
>>        rdr.setContentHandler(s.asContentHandler());
>>        String uri = "file:///d:/Temp/myfile.xml";
>>        rdr.parse(uri);
>>    }
>>}
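
A likely explanation, offered as a sketch and not a confirmed diagnosis: FileWriter encodes characters with the platform default charset, while the OutputFormat above declares UTF-8 in the XML header. If the default charset on the poster's machine is a single-byte one, the special character is written as one byte, and a UTF-8 reader then treats that byte as the start of a multi-byte sequence and rejects the following '<' (0x3c). The standalone example below reproduces that mismatch with the JDK alone. The character U+00DC and the choice of ISO-8859-1 as the stand-in default charset are assumptions, since the attached file and the platform settings are not shown in the message.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Write text through a Writer bound to an explicit charset. FileWriter does
    // the same thing implicitly, using the platform default charset.
    public static byte[] write(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, charset)) {
            w.write(text);
        }
        return bytes.toByteArray();
    }

    // Strict UTF-8 check: malformed input is reported instead of being replaced.
    public static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical document: the header claims UTF-8, the content holds U+00DC.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>\u00DC</a>";

        // Assumed single-byte default charset: U+00DC becomes the lone byte 0xDC,
        // which UTF-8 reads as the lead byte of a 2-byte sequence.
        byte[] latin1 = write(xml, StandardCharsets.ISO_8859_1);

        // Writer charset matches the declared encoding: U+00DC becomes 0xC3 0x9C.
        byte[] utf8 = write(xml, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes valid as UTF-8: " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes valid as UTF-8: " + isValidUtf8(utf8));
    }
}
```

If this is indeed the cause, one possible fix in SAXWriter is to make the writer's charset match the declared encoding, for example by wrapping a FileOutputStream in new OutputStreamWriter(..., "UTF-8"), or by passing the FileOutputStream to the XMLSerializer(OutputStream, OutputFormat) constructor so that the serializer applies the encoding from the OutputFormat itself. Closing the stream after rdr.parse(uri) returns would also ensure that all buffered output reaches the file.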