Posted to j-users@xerces.apache.org by "McEvoy, Peter" <pe...@iona.com> on 2002/11/06 18:55:40 UTC

Stoopid question about XMLSerializer and OutputFormat

Folks,
I've noticed something when using XMLSerializer to write out DOMs.  Something seems to be wrong with the way entities are being written out.  To demonstrate, the following is a slight modification of the DOMGenerate example in the samples directory of the Xerces 2.2 distribution.

The expected output would be:

STRXML = <?xml version="1.0" encoding="UTF-8"?>
<root>
    <characters>&apos; &quot; &lt; &gt; &amp;</characters>
    <entities>&amp;apos; &amp;quot; &amp;lt; &amp;gt; &amp;amp;</entities>
</root>

But instead the output is:

STRXML = <?xml version="1.0" encoding="UTF-8"?>
<root>
    <characters>' &quot; &lt; > &amp;</characters>
    <entities>&amp;apos; &amp;quot; &amp;lt; &amp;gt; &amp;amp;</entities>
</root>

Why are the ' and > characters not getting converted to entities?  How can the serialized XML be valid XML if they are not output as entities?  Perhaps I have done something wrong, but this is only a slight modification of the DOMGenerate example (and I really want to move to convenience methods for outputting XML instead of needing to write the likes of the write method in the DOMWriter example).

Can someone help me out....?

Here is the code for the above:

import java.io.StringWriter;

import org.apache.xerces.dom.DocumentImpl;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/**
 * Simple Sample that:
 * - Generate a DOM from Scratch.
 * - Output DOM to a String using Serializer
 * @author Jeffrey Rodriguez
 * @version $Id: DOMGenerate.java,v 1.5 2002/01/29 01:15:05 lehors Exp $
 */
public class DOMGenerate {
    public static void main( String[] argv ) {
        try {
            Document doc= new DocumentImpl();
            Element root = doc.createElement("root");     // Create root element

            Element characters = doc.createElement("characters");     // Element holding the raw special characters
            characters.appendChild( doc.createTextNode("' \" < > &") );
            root.appendChild( characters );                        // Add <characters> to root

            Element entities = doc.createElement("entities");     // Element holding literal entity-reference text
            entities.appendChild( doc.createTextNode("&apos; &quot; &lt; &gt; &amp;") );
            root.appendChild( entities );                        // Add <entities> to root
            doc.appendChild( root );                        // Add root to document


            OutputFormat    format  = new OutputFormat( doc );   //Serialize DOM
            format.setIndenting(true);
            StringWriter  stringOut = new StringWriter();        //Writer will be a String
            XMLSerializer    serial = new XMLSerializer( stringOut, format );
            serial.asDOMSerializer();                            // As a DOM Serializer

            serial.serialize( doc.getDocumentElement() );

            System.out.println( "STRXML = " + stringOut.toString() ); //Spit out DOM as a String
        } catch ( Exception ex ) {
            ex.printStackTrace();
        }
    }
}

Re: Stoopid question about XMLSerializer and OutputFormat

Posted by Joseph Kesselman <ke...@us.ibm.com>.
See the XML spec. The > character does not have to be escaped in most 
circumstances, though it can be and often is for readability. Similarly, ' 
only has to be escaped in situations where it would cause parsing 
problems, specifically when it's inside an attribute which itself was 
quoted with ' characters.
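
One way to check this is to parse both forms and compare the resulting text. The following is a minimal sketch, not part of the original sample: the class name EscapeCheck and the helper rootText are hypothetical, and it assumes a JAXP parser (javax.xml.parsers) is on the classpath. It parses the <characters> element as the serializer actually wrote it and as the original post expected it, then compares the two.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for this illustration.
public class EscapeCheck {
    /** Parses the XML string and returns the text content of its root element. */
    static String rootText(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The form the serializer actually produced: ' and > left unescaped.
        String minimal = "<characters>' &quot; &lt; > &amp;</characters>";
        // The fully escaped form the original post expected.
        String full = "<characters>&apos; &quot; &lt; &gt; &amp;</characters>";

        System.out.println(rootText(minimal));
        System.out.println(rootText(full));
        // Both inputs are well-formed and yield the same character data.
        System.out.println(rootText(minimal).equals(rootText(full)));
    }
}
```

If both strings parse without error and the comparison prints true, the two serializations are equivalent as far as a parser is concerned, and the shorter one is simply the minimal escaping the spec requires for text content.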

______________________________________
Joe Kesselman  / IBM Research
