Posted to j-users@xerces.apache.org by "McEvoy, Peter" <pe...@iona.com> on 2002/11/06 18:55:40 UTC
Stoopid question about XMLSerializer and OutputFormat
Folks,
I've noticed something when using XMLSerializer to write out DOMs. Something seems to be wrong with the way entities are being written out. To demonstrate, the following is a slight modification of the DOMGenerate example that is in the samples dir of the Xerces 2.2 distribution.
The expected output would be:
STRXML = <?xml version="1.0" encoding="UTF-8"?>
<root>
<characters>&apos; &quot; &lt; &gt; &amp;</characters>
<entities>&amp;apos; &amp;quot; &amp;lt; &amp;gt; &amp;amp;</entities>
</root>
But instead the output is
STRXML = <?xml version="1.0" encoding="UTF-8"?>
<root>
<characters>' &quot; &lt; > &amp;</characters>
<entities>&amp;apos; &amp;quot; &amp;lt; &amp;gt; &amp;amp;</entities>
</root>
Why are the ' and > characters not getting converted to entities? How can the serialized XML be valid XML if they are not output as entities? Perhaps I have done something wrong, but this is only a slight modification of the DOMGenerate example (and I really want to move to convenience methods for outputting XML instead of needing to write the likes of the write method in the DOMWriter example).
Can someone help me out....?
Here is the code for the above
import java.io.StringWriter;

import org.apache.xerces.dom.DocumentImpl;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/**
 * Simple Sample that:
 * - Generate a DOM from Scratch.
 * - Output DOM to a String using Serializer
 * @author Jeffrey Rodriguez
 * @version $Id: DOMGenerate.java,v 1.5 2002/01/29 01:15:05 lehors Exp $
 */
public class DOMGenerate {
    public static void main( String[] argv ) {
        try {
            Document doc = new DocumentImpl();
            Element root = doc.createElement("root"); // Create Root Element
            Element characters = doc.createElement("characters"); // Element holding the raw characters
            characters.appendChild( doc.createTextNode("' \" < > &") );
            root.appendChild( characters ); // Add <characters> to Root
            Element entities = doc.createElement("entities"); // Element holding entity references as literal text
            entities.appendChild( doc.createTextNode("&apos; &quot; &lt; &gt; &amp;") );
            root.appendChild( entities ); // Add <entities> to Root
            doc.appendChild( root ); // Add Root to Document
            OutputFormat format = new OutputFormat( doc ); // Serialize DOM
            format.setIndenting(true);
            StringWriter stringOut = new StringWriter(); // Writer will be a String
            XMLSerializer serial = new XMLSerializer( stringOut, format );
            serial.asDOMSerializer(); // As a DOM Serializer
            serial.serialize( doc.getDocumentElement() );
            System.out.println( "STRXML = " + stringOut.toString() ); // Spit out DOM as a String
        } catch ( Exception ex ) {
            ex.printStackTrace();
        }
    }
}
Re: Stoopid question about XMLSerializer and OutputFormat
Posted by Joseph Kesselman <ke...@us.ibm.com>.
See the XML spec. The > character does not have to be escaped in most
circumstances, though it can be and often is for readability. Similarly, '
only has to be escaped in situations where it would cause parsing
problems, specifically when it's inside an attribute which itself was
quoted with ' characters.
______________________________________
Joe Kesselman / IBM Research
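
The point above can be checked without Xerces-specific classes. The following is a minimal sketch, assuming only the standard JAXP parser that ships with the JDK: it parses a string shaped like the serializer output reported in the question (with ' and > left unescaped) and reads the text back. The class name EscapeCheck, the helpers textOf and attrOf, and the <attr> element are hypothetical names introduced for illustration; they are not part of the original sample.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class: shows that unescaped ' and > in element content
// still parse as well-formed XML and round-trip to the original characters.
class EscapeCheck {

    // Parses the given XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            // A well-formedness error would surface here.
            throw new IllegalStateException("not well-formed: " + ex.getMessage(), ex);
        }
    }

    // Returns the text content of the first element named tagName.
    static String textOf(String xml, String tagName) {
        return parse(xml).getElementsByTagName(tagName).item(0).getTextContent();
    }

    // Returns the value of attribute attrName on the first element named tagName.
    static String attrOf(String xml, String tagName, String attrName) {
        Element e = (Element) parse(xml).getElementsByTagName(tagName).item(0);
        return e.getAttribute(attrName);
    }

    public static void main(String[] args) {
        String xml =
            "<root>"
          + "<characters>' &quot; &lt; > &amp;</characters>"
          + "<entities>&amp;apos; &amp;quot; &amp;lt; &amp;gt; &amp;amp;</entities>"
          // ' must be escaped only inside an attribute quoted with '
          + "<attr a='it&apos;s' b=\"it's\"/>"
          + "</root>";

        System.out.println(textOf(xml, "characters")); // prints: ' " < > &
        System.out.println(textOf(xml, "entities"));   // prints: &apos; &quot; &lt; &gt; &amp;
        System.out.println(attrOf(xml, "attr", "a"));  // prints: it's
        System.out.println(attrOf(xml, "attr", "b"));  // prints: it's
    }
}
```

Both elements parse without error and yield the same strings that were passed to createTextNode in the question's code, which suggests the serializer output is well-formed even though ' and > are written literally.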