Posted to j-users@xerces.apache.org by "Jones, David [deljones]" <D....@liverpool.ac.uk> on 2006/10/30 18:50:39 UTC

Reading UTF-16: Content is not allowed in prolog

Hi,
I've been trying to write a UTF-16 encoded XML document and read it back,
without success. I can seemingly write the document OK (see
WriteUTF16.java), and Firefox opens it without complaint (apart from
noting that it has no style information). When I try to read the
document back (using ReadUTF16.java), I get the following exception:

[Fatal Error] output.xml:1:40: Content is not allowed in prolog.
Exception in thread "main" org.xml.sax.SAXParseException: Content is not allowed in prolog.
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
        at ReadUTF16.main(ReadUTF16.java:47)

I'm not sure whether I'm missing something simple, doing something
really stupid, or whether it is a platform-dependent thing (I'm using
Windows XP and Windows 2000).
 
many thanks
 
david 
 
 
 
------- ReadUTF16.java -------
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.*;
import java.util.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
 

public class ReadUTF16
{
 public static void main ( String args[] ) throws Exception 
 {
  System.out.println("read: main()");
  File f = new File ( "output.xml");
  
  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
  DocumentBuilder db = dbf.newDocumentBuilder();
  Document doc = db.parse( f.toURI().toString() );
 
  // serialise to console using jaxp 
  TransformerFactory tf = TransformerFactory.newInstance();
  Transformer t = tf.newTransformer();
  t.setOutputProperty ( OutputKeys.INDENT , "yes" ) ;  
  DOMSource ds = new DOMSource ( doc  );  
  StreamResult res = new StreamResult  ( System.out ) ;
  t.transform ( ds , res ) ;
    
 }// end method
}// end class
 
 
------- WriteUTF16.java ------- 

import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.*;
import java.util.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
 

public class WriteUTF16
{

 public static void main ( String args[] ) throws Exception 
 {
  System.out.println("main()");
  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
  DocumentBuilder db = dbf.newDocumentBuilder();
  Document doc = db.newDocument();
  Element element = doc.createElement( "tagname") ;
  element.setAttribute("test","test-value");
  doc.appendChild ( element ) ; 
  // serialise using jaxp 
  File f = new File ( "output.xml");
  TransformerFactory tf = TransformerFactory.newInstance();
  
  Transformer t = tf.newTransformer();
  
  Properties p = t.getOutputProperties();
  // print properties to console 
  p.list ( System.out ) ;
  // set up the transformer 
  // there are properties to set and error listeners which can be set 
  
  // does something when output to the console anyway... 
  t.setOutputProperty ( "encoding" , "UTF-16" ) ;
  // set indentation 
  t.setOutputProperty ( OutputKeys.INDENT , "yes" ) ;
  // new DOMSource instance 
  DOMSource ds = new DOMSource ( doc  );
  // Print Writer 
  PrintWriter writer = new PrintWriter( new BufferedWriter ( new FileWriter ( f) ) );
  StreamResult res = new StreamResult  ( writer ) ;
  // transform 
  t.transform ( ds , res ) ;    
 }// end method
}// end class
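
The symptom can be reproduced in memory, without touching the file system and without depending on the machine's default charset. The sketch below (the class and method names are hypothetical) serializes the same one-element document twice with encoding="UTF-16" in the declaration: once through a Writer that encodes as ISO-8859-1, standing in for an ASCII-compatible platform default such as Windows' Cp1252, and once straight to a byte stream so the Transformer does the encoding itself. It then tries to parse each byte array back; only the second is expected to parse.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EncodingMismatchDemo {

    // Builds the same document as WriteUTF16: <tagname test="test-value"/>
    static Document buildDoc() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element element = doc.createElement("tagname");
        element.setAttribute("test", "test-value");
        doc.appendChild(element);
        return doc;
    }

    // A Transformer whose XML declaration will say encoding="UTF-16".
    static Transformer utf16Transformer() throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
        return t;
    }

    // Mimics the original: the Transformer emits characters to a Writer,
    // and the Writer (ISO-8859-1 here, as a stand-in for a platform default
    // such as Cp1252) decides which bytes actually get written.
    static byte[] viaWriter(Document doc) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer w = new OutputStreamWriter(bytes, "ISO-8859-1");
        utf16Transformer().transform(new DOMSource(doc), new StreamResult(w));
        w.close();
        return bytes.toByteArray();
    }

    // Gives the Transformer a byte stream, so the encoding it declares
    // is also the encoding it writes.
    static byte[] viaStream(Document doc) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        utf16Transformer().transform(new DOMSource(doc), new StreamResult(bytes));
        return bytes.toByteArray();
    }

    // Returns true if the bytes parse as well-formed XML.
    static boolean parses(byte[] data) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(data));
            return true;
        } catch (Exception e) {
            System.out.println("  parse failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Platform default charset: " + Charset.defaultCharset());
        Document doc = buildDoc();
        System.out.println("Writer (ISO-8859-1) output parses: " + parses(viaWriter(doc)));
        System.out.println("OutputStream output parses:        " + parses(viaStream(doc)));
    }
}
```

Because FileWriter uses Charset.defaultCharset(), the original program's behavior does depend on the platform, but any ASCII-compatible default (Cp1252, ISO-8859-1, or UTF-8, the default since Java 18) produces bytes that contradict an encoding="UTF-16" declaration.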




Re: Reading UTF-16: Content is not allowed in prolog

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi David,

The encoding declared in your document (in the XML declaration) and the
actual encoding of your document probably don't match after you've
serialized it to a file. You should pass a FileOutputStream to the
transformer and let it handle the character encoding, instead of using a
FileWriter, which encodes with the platform default encoding (which could
be anything) regardless of what the XML declaration says. Something
similar was discussed on the j-dev list many months ago; the thread starts
here [1] in the archives if you're interested.

Thanks.

[1] http://mail-archives.apache.org/mod_mbox/xerces-j-dev/200504.mbox/%3c426D751E.9090808@sharp.fm%3e

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
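
Following the suggestion above, a corrected WriteUTF16 might look like the sketch below. WriteUTF16Fixed is a hypothetical name; the document content, file name, and output properties are copied from the original post. The result is a FileOutputStream, so the Transformer both declares and performs the UTF-16 encoding, and the stream is closed when done, which the original program never did.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class WriteUTF16Fixed {
    public static void main(String[] args) throws Exception {
        // Same document as in WriteUTF16.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element element = doc.createElement("tagname");
        element.setAttribute("test", "test-value");
        doc.appendChild(element);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        // Written into the XML declaration and, because the result below is a
        // byte stream, also used by the Transformer to encode the output.
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
        t.setOutputProperty(OutputKeys.INDENT, "yes");

        // A FileOutputStream instead of PrintWriter/BufferedWriter/FileWriter;
        // try-with-resources closes (and therefore flushes) it afterwards.
        try (OutputStream out = new FileOutputStream(new File("output.xml"))) {
            t.transform(new DOMSource(doc), new StreamResult(out));
        }
    }
}
```

ReadUTF16 from the original post should then be able to parse output.xml unchanged. If a Writer is genuinely needed, wrapping the stream as new OutputStreamWriter(out, "UTF-16") keeps the bytes consistent with the declaration; a plain FileWriter cannot, since before Java 11 it had no constructor that accepts a charset.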



Re: Reading UTF-16: Content is not allowed in prolog

Posted by ke...@us.ibm.com.
> [Fatal Error] output.xml:1:40: Content is not allowed in prolog.

You have something other than a byte order mark, the XML declaration,
comments, processing instructions, a DOCTYPE declaration, or whitespace
before the document's root element. Fix the file so it's well-formed XML.


______________________________________
"... Three things see no end: A loop with exit code done wrong,
A semaphore untested, And the change that comes along. ..."
  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
(http://www.ovff.org/pegasus/songs/threes-rev-11.html)
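
To see what actually precedes the root element, it helps to look at the raw bytes rather than at the file in a viewer that may guess the encoding. The hypothetical PrologBytes utility below prints the first four bytes of a file and maps them to an encoding family using a subset of the autodetection table in the (non-normative) Appendix F of the XML 1.0 specification. A file written by the original WriteUTF16 on a JVM with an ASCII-compatible default charset would be expected to start 3C 3F 78 6D ("<?xm" as single bytes) even though its declaration says UTF-16, while one written through a FileOutputStream would typically start with the FE FF byte order mark.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class PrologBytes {

    // Maps the first bytes of an XML entity to the encoding family they
    // suggest, following part of the XML 1.0 Appendix F table.
    // Only the cases relevant to this thread are covered.
    static String guessEncodingFamily(byte[] b, int n) {
        int b0 = n > 0 ? b[0] & 0xFF : -1;
        int b1 = n > 1 ? b[1] & 0xFF : -1;
        int b2 = n > 2 ? b[2] & 0xFF : -1;
        int b3 = n > 3 ? b[3] & 0xFF : -1;
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE with BOM";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE with BOM";
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8 with BOM";
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE without BOM";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE without BOM";
        if (b0 == 0x3C && b1 == 0x3F && b2 == 0x78 && b3 == 0x6D) {
            return "ASCII-compatible (UTF-8, ISO-8859-1, Cp1252, ...)";
        }
        return "unrecognized";
    }

    // Reads up to buf.length bytes from the start of the file.
    static int readHead(String path, byte[] buf) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            int n = 0;
            while (n < buf.length) {
                int r = in.read(buf, n, buf.length - n);
                if (r < 0) break;
                n += r;
            }
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "output.xml";
        byte[] head = new byte[4];
        int n = readHead(path, head);
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < n; i++) {
            hex.append(String.format("%02X ", head[i] & 0xFF));
        }
        System.out.println(path + ": " + hex.toString().trim()
                + " -> " + guessEncodingFamily(head, n));
    }
}
```

If the reported family disagrees with the encoding named in the file's XML declaration, the parser will decode everything after the declaration incorrectly, which is one way to end up with "Content is not allowed in prolog" at the first column past the declaration.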