Posted to j-dev@xerces.apache.org by bu...@apache.org on 2001/12/12 17:46:51 UTC

DO NOT REPLY [Bug 5382] New: - Limitation of the number of namespace declarations

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5382>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5382

Limitation of the number of namespace declarations

           Summary: Limitation of the number of namespace declarations
           Product: Xerces-J
           Version: 1.4.4
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: DOM
        AssignedTo: xerces-j-dev@xml.apache.org
        ReportedBy: bbeauvoir@yahoo.com


Xerces-J 1.4.4 cannot parse XML documents with a very large number of namespace
declarations (i.e. attributes such as xmlns:prefix="uri"). This bug was
encountered in a production system that uses XSLT (Xalan) to process very large
XML documents.
I propose a simple fix for this problem (see below).


HOW TO REPRODUCE THIS BUG:
~~~~~~~~~~~~~~~~~~~~~~~~~~

To reproduce this bug try to parse an XML document structured as follows:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <a:para xmlns:a="urn:a"/>          
    <a:para xmlns:a="urn:a"/>
    <a:para xmlns:a="urn:a"/>
    ... 
    <a:para xmlns:a="urn:a"/>
    <a:para xmlns:a="urn:a"/>
    <a:para xmlns:a="urn:a"/>

    <b:test xmlns:b="urn:b"/>
</root>

The <root> element should have 16360 <a:para xmlns:a="urn:a"/> child elements
to reproduce the bug. The whitespace-only text nodes used for indentation are
important.
If you try to parse this kind of XML document with Xerces-J 1.4.4, the following
NullPointerException is thrown:

java.lang.NullPointerException
	at org.apache.xerces.dom.DeferredElementNSImpl.synchronizeData(DeferredElementNSImpl.java:154)
	at org.apache.xerces.dom.ElementImpl.getNodeName(ElementImpl.java:144)
	at NSLimitationBug.main(NSLimitationBug.java:26)
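
A document of this shape is tedious to write by hand. The following is a
minimal sketch of a generator, assuming the layout shown above. The class name
LargeNsDocGenerator and the default output file name large-in.xml are
illustrative and not part of the original report. The count 16360 is the value
quoted above; whether this exact whitespace layout triggers the failure has not
been verified here.

```java
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical helper, not part of Xerces or of the original report.
public class LargeNsDocGenerator {

    /** Builds a test document with the given number of a:para elements. */
    public static String build(int paraCount) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<root>\n");
        // Each child redeclares the same namespace prefix and is indented,
        // so a whitespace-only text node sits between consecutive elements.
        for (int i = 0; i < paraCount; i++) {
            sb.append("    <a:para xmlns:a=\"urn:a\"/>\n");
        }
        sb.append("\n");
        sb.append("    <b:test xmlns:b=\"urn:b\"/>\n");
        sb.append("</root>\n");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The output path is an assumption; pass another one as the first argument.
        String path = args.length > 0 ? args[0] : "large-in.xml";
        try (PrintWriter out = new PrintWriter(path, "UTF-8")) {
            out.print(build(16360));
        }
    }
}
```

The resulting file can then be used as the input of the test class below.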

Here is the source code of my NSLimitationBug class that produces the 
NullPointerException. You should change the path of the XML file to load.

import java.io.*;
import org.apache.xerces.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.InputSource;

public class NSLimitationBug
{
    public static void main(String[] args)
    {
        try
        {
            File file = new File("E:\\XercesBug\\large-in.xml");
            Reader reader = new BufferedReader(new FileReader(file));

            DOMParser parser = new DOMParser();
            parser.setFeature("http://xml.org/sax/features/validation", false);
            parser.setFeature(
                "http://apache.org/xml/features/dom/defer-node-expansion", true);
            InputSource source = new InputSource(reader);
            parser.parse(source);
            Document doc = parser.getDocument();

            NodeList children = doc.getDocumentElement().getChildNodes();
            int count = children.getLength();
            // The last child is a text node, so the last element is at count - 2.
            Element lastElem = (Element) children.item(count - 2);

            System.out.println("Name: '" + lastElem.getNodeName() + "'");
            System.out.println("Namespace URI: '" + lastElem.getNamespaceURI()
                + "'");
        }
        catch (Throwable t)
        {
            t.printStackTrace();
        }
    }
}

PROPOSED FIX:
~~~~~~~~~~~~~
This bug appears to be caused by a coding error in the
org.apache.xerces.dom.DeferredDocumentImpl class.
The method org.apache.xerces.dom.DeferredDocumentImpl#getNodeURI(int nodeIndex,
boolean free) casts an int down to a short for no apparent reason. For the last
element child of the <root> element, the int to cast is 32768. Since the maximum
short value is 32767, the int 32768 becomes the short -32768; that is,
(short)32768 == -32768.
This negative value later results in the NullPointerException.
To fix this bug, simply remove the cast to short and change the return type of
the two getNodeURI methods to int.
Here is the code of these two methods after applying this fix:


    /** Returns the URI of the given node. */
    public int getNodeURI(int nodeIndex) {
        return getNodeURI(nodeIndex, true);
    }

    /**
     * Returns the URI of the given node.
     * @param free True to free URI index.
     */
    public int getNodeURI(int nodeIndex, boolean free) {

        if (nodeIndex == -1) {
            return -1;
        }

        int chunk = nodeIndex >> CHUNK_SHIFT;
        int index = nodeIndex & CHUNK_MASK;
        if (free) {
            return clearChunkIndex(fNodeURI, chunk, index);
        }
        return getChunkIndex(fNodeURI, chunk, index);

    } // getNodeURI(int,boolean):int
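
The narrowing behaviour described above can be checked in isolation. The
following sketch is not Xerces code: the class ShortNarrowingDemo and its two
methods are hypothetical stand-ins that only illustrate what happens when an
int index is returned through a short, and what changes when the cast is
removed.

```java
// Hypothetical demonstration class, not part of Xerces.
public class ShortNarrowingDemo {

    /** Mimics a getter that narrows an int index to short before returning it. */
    public static short narrowed(int index) {
        return (short) index; // keeps only the low 16 bits, read as a signed value
    }

    /** The same getter with the cast removed and an int return type. */
    public static int unchanged(int index) {
        return index;
    }

    public static void main(String[] args) {
        System.out.println(narrowed(32767));  // prints 32767, the largest short
        System.out.println(narrowed(32768));  // prints -32768, the index is corrupted
        System.out.println(unchanged(32768)); // prints 32768
    }
}
```

Any index of 32768 or more is altered by the cast, which is why the failure
only shows up once a document is large enough.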


NOTE:
~~~~~
I first noticed this bug in Xerces-J 1.2.0. With that version there is no
NullPointerException; however, the namespace URI of the last element is 'null'
instead of 'urn:b'.
