Posted to j-users@xerces.apache.org by "Jones, David [deljones]" <D....@liverpool.ac.uk> on 2006/10/30 18:50:39 UTC
Reading UTF-16: Content is not allowed in prolog
Hi
I've been trying to write a UTF-16 encoded XML document and read it back,
without success. The document seems to be written correctly (see
WriteUTF16.java), and Firefox opens it without complaint (apart from
noting that it has no style information). When I try to read
back the document (using ReadUTF16.java) I get the following exception:
[Fatal Error] output.xml:1:40: Content is not allowed in prolog.
Exception in thread "main" org.xml.sax.SAXParseException: Content is not allowed in prolog.
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    at ReadUTF16.main(ReadUTF16.java:47)
I'm not sure whether I'm missing something simple, doing something
really stupid, or whether it is a platform-dependent thing (I'm using
Windows XP and Windows 2000).
many thanks
david
------- ReadUTF16.java -------
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.*;
import java.util.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
public class ReadUTF16
{
    public static void main ( String[] args ) throws Exception
    {
        System.out.println("read: main()");
        File f = new File ( "output.xml");
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse( f.toURI().toString() );
        // serialise to console using jaxp
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        t.setOutputProperty ( OutputKeys.INDENT , "yes" ) ;
        DOMSource ds = new DOMSource ( doc );
        StreamResult res = new StreamResult ( System.out ) ;
        t.transform ( ds , res ) ;
    }// end method
}// end class
------- WriteUTF16.java -------
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.*;
import java.util.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
public class WriteUTF16
{
    public static void main ( String[] args ) throws Exception
    {
        System.out.println("main()");
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.newDocument();
        Element element = doc.createElement( "tagname") ;
        element.setAttribute("test","test-value");
        doc.appendChild ( element ) ;
        // serialise using jaxp
        File f = new File ( "output.xml");
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = tf.newTransformer();
        Properties p = t.getOutputProperties();
        // print properties to console
        p.list ( System.out ) ;
        // set up the transformer
        // there are properties to set and error listeners which can be set
        // does something when output to the console anyway...
        t.setOutputProperty ( "encoding" , "UTF-16" ) ;
        // set indentation
        t.setOutputProperty ( OutputKeys.INDENT , "yes" ) ;
        // new DOMSource instance
        DOMSource ds = new DOMSource ( doc );
        // Print Writer
        PrintWriter writer = new PrintWriter( new BufferedWriter ( new FileWriter ( f) ) );
        StreamResult res = new StreamResult ( writer ) ;
        // transform
        t.transform ( ds , res ) ;
    }// end method
}// end class
Re: Reading UTF-16: Content is not allowed in prolog
Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi David,
The encoding declared in your document (in the XML declaration) and the
actual encoding of your document probably don't match after you've
serialized it to a file. You should pass a FileOutputStream to the
transformer and let it handle the character encoding, instead of using a
FileWriter, which assumes the platform default encoding (which could be
anything) is acceptable. Something similar was discussed on the j-dev list
many months ago. The thread starts here [1] in the archives if you're
interested.
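The suggestion above can be sketched roughly as follows. This is a minimal illustration rather than a definitive fix: the class name WriteUTF16Stream and the helper write(File) are hypothetical, and the document built here simply mirrors the one in WriteUTF16.java. The only substantive change is that the StreamResult wraps a FileOutputStream, so the transformer itself converts characters to bytes using the encoding it declares.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical variant of WriteUTF16 that hands the transformer a byte stream.
public class WriteUTF16Stream
{
    // Builds the same one-element document as WriteUTF16 and serialises it to f.
    static void write ( File f ) throws Exception
    {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.newDocument();
        Element element = doc.createElement( "tagname" );
        element.setAttribute( "test", "test-value" );
        doc.appendChild( element );

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty( OutputKeys.ENCODING, "UTF-16" );
        t.setOutputProperty( OutputKeys.INDENT, "yes" );

        // A byte stream, not a Writer: the transformer performs the
        // character-to-byte conversion in the encoding it declares.
        try ( OutputStream out = new FileOutputStream( f ) )
        {
            t.transform( new DOMSource( doc ), new StreamResult( out ) );
        }
    }

    public static void main ( String[] args ) throws Exception
    {
        File f = new File( "output.xml" );
        write( f );

        // Read the file back to check that the parser accepts it.
        Document parsed = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse( f );
        System.out.println( parsed.getDocumentElement().getTagName() );
    }
}
```

If a Writer really is needed, an OutputStreamWriter constructed with an explicit "UTF-16" charset name would be the analogous choice, but then keeping the declared and actual encodings in step is the caller's job.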
Thanks.
[1] http://mail-archives.apache.org/mod_mbox/xerces-j-dev/200504.mbox/%3c426D751E.9090808@sharp.fm%3e
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
Re: Reading UTF-16: Content is not allowed in prolog
Posted by ke...@us.ibm.com.
> [Fatal Error] output.xml:1:40: Content is not allowed in prolog.
You have something other than the Byte Order Mark, the XML Declaration,
a DOCTYPE declaration, comments, Processing Instructions, or whitespace
before the document's root element.
Fix the file so it's well-formed XML.
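One way to see what the parser sees is to look at the first few bytes of the file. The sketch below is a rough diagnostic, not part of Xerces: the class name SniffEncoding and the method sniff are hypothetical, and the byte patterns are a simplified subset of the usual XML encoding auto-detection (it ignores UTF-32 and EBCDIC, for instance). If output.xml starts with the single-byte sequence for "<?xm" while its declaration says encoding="UTF-16", the declared and actual encodings disagree. Assuming the declaration is `<?xml version="1.0" encoding="UTF-16"?>`, which is 39 characters long, the reported position 1:40 would fall immediately after it, which is consistent with the parser switching to UTF-16 at that point and then misreading the rest of the file.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

// Hypothetical diagnostic: guesses the encoding family from the first bytes of a file.
public class SniffEncoding
{
    // b holds the leading bytes of the file, n is how many of them are valid.
    static String sniff ( byte[] b, int n )
    {
        if ( n >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF )
            return "UTF-8 with BOM";
        if ( n >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF )
            return "UTF-16 big-endian with BOM";
        if ( n >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE )
            return "UTF-16 little-endian with BOM";
        if ( n >= 4 && b[0] == 0x00 && b[1] == 0x3C && b[2] == 0x00 && b[3] == 0x3F )
            return "UTF-16 big-endian without BOM";
        if ( n >= 4 && b[0] == 0x3C && b[1] == 0x00 && b[2] == 0x3F && b[3] == 0x00 )
            return "UTF-16 little-endian without BOM";
        if ( n >= 4 && b[0] == 0x3C && b[1] == 0x3F && b[2] == 0x78 && b[3] == 0x6D )
            return "single-byte ASCII-compatible (for example UTF-8 or windows-1252)";
        return "unknown";
    }

    public static void main ( String[] args ) throws Exception
    {
        File f = new File( args.length > 0 ? args[0] : "output.xml" );
        byte[] head = new byte[4];
        int n;
        try ( InputStream in = new FileInputStream( f ) )
        {
            // read() returns -1 for an empty file; treat that as zero bytes.
            n = Math.max( in.read( head ), 0 );
        }

        // Print the leading bytes in hex next to the guess.
        StringBuilder hex = new StringBuilder();
        for ( int i = 0; i < n; i++ )
            hex.append( String.format( "%02X ", head[i] & 0xFF ) );
        System.out.println( hex.toString().trim() + " -> " + sniff( head, n ) );
    }
}
```

A file written through a FileWriter on a Western-locale Windows machine would be expected to fall into the single-byte case, whatever its XML declaration claims.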
______________________________________
"... Three things see no end: A loop with exit code done wrong,
A semaphore untested, And the change that comes along. ..."
-- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
(http://www.ovff.org/pegasus/songs/threes-rev-11.html)