Posted to j-users@xerces.apache.org by Suresh Babu Koya <sk...@quark.co.in> on 2002/08/14 15:28:37 UTC

Inconsistent behavior with Xerces

>>-----Original Message-----
>>From: Suresh Babu Koya [mailto:skoya@quark.co.in]
>>Sent: Wednesday, August 14, 2002 1:33 PM
>>To: xerces-j-user@xml.apache.org
>>Subject: Inconsistent behavior of XML
>>
>>
>>I came across an XML file that uses UTF-8 encoding and contains a special
>>character. The file is well-formed according to IE and XMLSpy.
>>When I parse it and serialize it with Xerces using the program below,
>>I do get an output file.
>>
>>I am attaching the files with this mail.
>>
>>
>>However, when I reopen the serialized file with Xerces and try to
>>serialize it again with my program, Xerces reports errors.
>>
>>Exception List:
>>
>>java.io.UTFDataFormatException: invalid byte 2 of 2-byte UTF-8 sequence (0x3c)
>>	at org.apache.xerces.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:678)
>>	at org.apache.xerces.impl.io.UTF8Reader.read(UTF8Reader.java:355)
>>	at org.apache.xerces.impl.XMLEntityManager$EntityScanner.load(XMLEntityManager.java:3257)
>>	at org.apache.xerces.impl.XMLEntityManager$EntityScanner.scanContent(XMLEntityManager.java:2371)
>>	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(XMLDocumentFragmentScannerImpl.java:829)
>>	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1387)
>>	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:333)
>>	at org.apache.xerces.parsers.DTDConfiguration.parse(DTDConfiguration.java:524)
>>	at org.apache.xerces.parsers.DTDConfiguration.parse(DTDConfiguration.java:580)
>>	at org.apache.xerces.parsers.XMLParser.parse(XMLParser.java:152)
>>	at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1108)
>>	at sax2saxtest.SAXWriter.main(SAXWriter.java:34)
>>Exception in thread "main" 
>>  
>>
>>Here is the code that I have written to parse and serialize the file
>>using the XMLSerializer class in Xerces.
>>
>>import org.apache.xml.serialize.XMLSerializer;
>>import org.apache.xml.serialize.OutputFormat;
>>import java.io.*;
>>import org.xml.sax.*;
>>import javax.xml.parsers.*;
>>public class SAXWriter {
>>
>>  public SAXWriter() {
>>  }
>>  public static void main(String[] args) throws Exception {
>>    OutputFormat of = new OutputFormat("XML","UTF-8",true);
>>    of.setIndent(2);
>>    FileWriter fout = new FileWriter("d:/Out1.xml");
>>    XMLSerializer s= new XMLSerializer(new PrintWriter(fout), of);
>>    SAXParserFactory spf = SAXParserFactory.newInstance();
>>    SAXParser sp = spf.newSAXParser();
>>    XMLReader rdr =  sp.getXMLReader();
>>    rdr.setContentHandler(s.asContentHandler());
>>    String uri = "file:///d:/Temp/myfile.xml";
>>    rdr.parse(uri);
>>  }
>>}
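
One possible explanation, which the message itself does not confirm: the program declares "UTF-8" in the OutputFormat but writes through a FileWriter, which on older JVMs encodes characters with the platform default charset rather than UTF-8. If that default is a single-byte charset such as ISO-8859-1, a character like U+00D6 is written as the single byte 0xD6, which a UTF-8 reader treats as the start of a 2-byte sequence. The following '<' (0x3c) is then an invalid second byte, which matches the reported exception. The sketch below reproduces only that byte-level mismatch with the standard library. The class name EncodingMismatchDemo, the helper isValidUtf8, the sample character, and the choice of ISO-8859-1 are all assumptions made for illustration, and no Xerces classes are involved.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original program.
public class EncodingMismatchDemo {

    // Returns true if the given bytes decode cleanly as UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed sample content: one non-ASCII character followed by '<'.
        String xml = "<a>\u00D6</a>";

        // What a writer using a single-byte default charset would emit.
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        // What a writer that really uses UTF-8 would emit.
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes valid as UTF-8: " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes valid as UTF-8: " + isValidUtf8(utf8));
    }
}
```

If this is indeed the cause, one way to keep the declared and actual encodings in agreement would be to write through an OutputStreamWriter constructed over a FileOutputStream with "UTF-8" as its charset name, instead of using FileWriter.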
