Posted to dev@xalan.apache.org by bu...@apache.org on 2002/05/18 00:58:53 UTC

DO NOT REPLY [Bug 9215] New: - XML that contains a large amount of CDATA Sections is parsed incorrectly

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9215>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

           Summary: XML that contains a large amount of CDATA Sections is
                    parsed incorrectly
           Product: XalanJ2
           Version: 2.3
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Major
          Priority: Other
         Component: Xalan
        AssignedTo: xalan-dev@xml.apache.org
        ReportedBy: matt73@aracnet.com
                CC: havlovick.matthew@cfwy.com


For my work, I retrieve a large amount of data as an XML string and use a 
DocumentBuilder to parse a ByteArrayInputStream containing that XML. The XML 
contains many CDATA sections and occasionally, depending on the data, the 
resulting document tree has nodes that contain incorrect data. 

I have found that if I put crimson.jar ahead of xercesImpl.jar in the 
classpath, the document tree comes out correct, but not if xercesImpl.jar 
comes before crimson.jar.
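
Because the result depends on classpath order, it can help to confirm which 
JAXP implementation is actually picked up in each run. The following is only a 
minimal sketch: the class name ParserCheck and its methods are hypothetical, 
and it uses nothing beyond the standard javax.xml.parsers API. Running it once 
with each classpath order should show which factory and builder are in use.

```java
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper (not part of the original report): prints the JAXP
// implementation classes that are selected for the current classpath.
class ParserCheck {

    // Class name of the DocumentBuilderFactory implementation JAXP selected.
    static String factoryClassName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Class name of the DocumentBuilder created by that factory.
    static String builderClassName() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().getClass().getName();
        } catch (Exception e) {
            // Wrap so callers do not need to handle a checked exception.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("factory: " + factoryClassName());
        System.out.println("builder: " + builderClassName());
    }
}
```

The printed names depend entirely on the jars and JDK in use, so no particular 
output is assumed here.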

Since we use such a large string of XML data, having you reproduce the problem 
with our data would be difficult. However, I was able to write a small program 
that produces the same kind of incorrect results.

import org.w3c.dom.*;
import javax.xml.parsers.*; 
import java.io.*;

class xmltest{
    public static void main(String args[]){

        // Build <LETTERS> with 101 <LETTER> children. Child y holds one CDATA
        // section made of y+1 copies of a single lowercase letter.
        StringBuffer xml = new StringBuffer();
        xml.append("<LETTERS>");
        for (int y=0;y<=100;y++){
            xml.append("<LETTER><![CDATA[");
            for (int z=0;z<=y;z++) xml.append((char)((y%26)+97));
            xml.append("]]></LETTER>");
        }
        xml.append("</LETTERS>");

        // Parse the XML from a byte stream with the default JAXP parser.
        byte[] b = xml.toString().getBytes();
        InputStream is = new ByteArrayInputStream(b);
        Document doc = null;
        try {
            DocumentBuilderFactory docBuilderFactory =
                DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder =
                docBuilderFactory.newDocumentBuilder();
            doc = docBuilder.parse(is);
        } catch (Exception e){
            // Report the failure instead of continuing with a null document.
            e.printStackTrace();
            return;
        }

        // Print the value of the first child of each <LETTER> element.
        NodeList nodelist =  doc.getDocumentElement().getChildNodes();
        for (int idx=0; idx<nodelist.getLength();idx++){
            Node node = nodelist.item(idx);
            System.out.println(node.getFirstChild().getNodeValue());
        }
    }
}

At least in my testing, when the loop reaches the 65th item in the node list, 
the node value is incorrect. Instead of containing a single repeated letter, 
the node looks like a concatenation of the contents of many of the other nodes.
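
Rather than reading the printed output by eye, the expected content of each 
node can be checked programmatically. What follows is a rough sketch only: the 
class CdataCheck and its method names are hypothetical and not part of the 
program above. It rebuilds the same XML, parses it with whatever JAXP parser is 
first on the classpath, and reports the index of the first LETTER element whose 
text differs from what was written. Unlike the program above, it concatenates 
all child nodes of each element, on the assumption that a parser may split the 
text across several nodes.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical checker (names are illustrative, not from the original report).
class CdataCheck {

    // Text written into the y-th <LETTER>: y+1 copies of one lowercase letter.
    static String expectedText(int y) {
        StringBuffer sb = new StringBuffer();
        for (int z = 0; z <= y; z++) sb.append((char) ((y % 26) + 97));
        return sb.toString();
    }

    // Same document shape as the test program: one CDATA section per <LETTER>.
    static String buildXml(int count) {
        StringBuffer xml = new StringBuffer("<LETTERS>");
        for (int y = 0; y < count; y++) {
            xml.append("<LETTER><![CDATA[").append(expectedText(y))
               .append("]]></LETTER>");
        }
        return xml.append("</LETTERS>").toString();
    }

    // Concatenate the values of all child nodes of an element.
    static String textOf(Node element) {
        StringBuffer sb = new StringBuffer();
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            sb.append(c.getNodeValue());
        }
        return sb.toString();
    }

    // Index of the first <LETTER> whose text is wrong or missing, or -1 if all match.
    static int firstMismatch(int count) {
        try {
            byte[] bytes = buildXml(count).getBytes("UTF-8");
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
            NodeList letters = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < count; i++) {
                if (i >= letters.getLength()
                        || !expectedText(i).equals(textOf(letters.item(i)))) {
                    return i;
                }
            }
            return -1;
        } catch (Exception e) {
            // Wrap so callers do not need to handle a checked exception.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int idx = firstMismatch(101);
        System.out.println(idx < 0 ? "all nodes match" : "first mismatch at index " + idx);
    }
}
```

With a parser that handles the CDATA sections correctly, the check should 
report that all nodes match. Under the behaviour described above, a mismatch 
would be expected somewhere around the 65th item.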

Thanks,

Matt Havlovick
Consolidated Freightways