Posted to j-dev@xerces.apache.org by bu...@apache.org on 2002/05/08 01:10:48 UTC

DO NOT REPLY [Bug 8893] New: - Creation of DOM containing invalid xml characters.

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=8893>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=8893

           Summary: Creation of DOM containing invalid xml characters.
           Product: Xerces-J
           Version: 1.4.3
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Critical
          Priority: Other
         Component: Core
        AssignedTo: xerces-j-dev@xml.apache.org
        ReportedBy: abhilash.koneri@bestbuy.com


I am using Xerces to build a DOM document from character data. The 
character data contains some characters that are not legal XML characters, 
but no exception is thrown while the DOM is being created. 
When I serialize the DOM (to a string) and then re-parse it to
obtain the DOM again, I get a SAX exception reporting the invalid XML character.

The code used is attached below.
-----

import java.io.*;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.w3c.dom.*;

import org.apache.xml.serialize.XMLSerializer;
import org.apache.xml.serialize.OutputFormat;


public class ItemTest 
{
    public static void main(String[] args) throws Throwable
    {
         String illegalUnicodeString = "BLACK  ";

         char[] chars = illegalUnicodeString.toCharArray();
         for(int i=0;i<chars.length; i++)
         {
             System.out.println("character "+chars[i]+":"+Character.isLetter(chars[i]));

         }
         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

         factory.setValidating(false);
         DocumentBuilder builder = factory.newDocumentBuilder();

         Document dom = builder.newDocument();
         Element rootElement= dom.createElement("ITEMRECORD");
         rootElement.appendChild(dom.createTextNode(illegalUnicodeString));
         dom.appendChild(rootElement);

         String domString = getXmlAsString(dom, false);

         System.out.println("The serialized dom string is \n"+domString);
         StringReader reader = new StringReader(domString);
         InputSource is = new InputSource(reader);
         dom = builder.parse(is);
    }

     public static String getXmlAsString(Document dom, boolean supressHeader) 
     {
         String xmlString = null;

         try 
         { 
             OutputFormat format = new OutputFormat(dom);
             format.setPreserveSpace(true);
             // skip boilerplate at top of XML document
             format.setOmitXMLDeclaration(supressHeader);
             StringWriter sOut = new StringWriter();
             XMLSerializer serializer = new XMLSerializer(sOut,format);
             serializer.asDOMSerializer();
             serializer.serialize(dom.getDocumentElement());
             xmlString = sOut.getBuffer().toString();
         }
         catch(DOMException domE) 
         {
             domE.printStackTrace();
         }
         catch(IOException e) 
         {
             e.printStackTrace();
         }
         return xmlString;

     }
}
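
-----

Since the report says that creating the text node does not reject illegal
characters, one possible workaround is to check the string against the XML 1.0
Char production before putting it into the DOM. The following is only a
minimal sketch of such a check, not part of Xerces: the class name
XmlCharCheck and its methods are hypothetical, it assumes a Java version with
String.codePointAt (Java 5 or later), and the control character U+0001 in the
sample is an assumption, since the exact characters in the reporter's string
are not visible in this message.

```java
public final class XmlCharCheck {
    private XmlCharCheck() {}

    /**
     * Returns true if the code point matches the XML 1.0 Char production:
     * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
     */
    public static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns the index of the first char that starts an illegal code point,
     * or -1 if every code point in the string is a legal XML 1.0 character.
     * An unpaired surrogate is reported as illegal.
     */
    public static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegalXml10(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical sample: "BLACK" followed by the control character U+0001.
        String sample = "BLACK\u0001";
        System.out.println(firstIllegalIndex(sample)); // prints 5
    }
}
```

A caller could run firstIllegalIndex on the string before calling
createTextNode and then reject or strip the offending characters, so that the
problem surfaces when the DOM is built rather than when the serialized
document is re-parsed.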
