You are viewing a plain text version of this content. The canonical link for it is here.
Posted to j-dev@xerces.apache.org by ma...@lutris.com on 2000/08/26 00:57:44 UTC

Problem with Xerces DOM

The way org.apache.xerces.dom.NodeImpl implements getChildNodes() is very
problematic and breaks with the HTML DOM.  It uses a `hidden' implement of the
NodeList interface to do this.  Hidden since NodeImpl implements NodeList,
however NodeList is not part of the org.w3c.dom.Node API use to access
NodeImpl.

The problem with this is that an class sub-classing a NodeImpl my unknownly
override the NodeList method.

This occurs in practice with two of the HTML DOM Elements that have
getLength() methods: HTMLFormElement and HTMLSelectElement.  Calling
getChildNodes().getLength() on either of these elements returns incorrect
values, as its calling the derived getElement() is used, not the one intended
to return the number of children.

I locally modified Xerces 1.0.2 to fix this with the enclosed patch.  It will
create an instance of an inner class implementing NodeList on the first call
to getChildNodes().  This object is saved and used on all subsequent calls to
getChildNodes().  This patch does not work with the current code, as the
structure of NodeImpl, etc, has changed.  I would be happy to port this to the
newest Xerces if there is consensus on this solution.

Mark


*** NodeContainer.java	2000/02/20 03:14:08	1.1.1.1
--- NodeContainer.java	2000/08/23 23:57:41	1.1.1.1.2.4
***************
*** 77,86 ****
   * IMPLEMENTATION DETAIL; applications should _never_ assume that
   * this identity exists.
   * <P>
   */
  public abstract class NodeContainer
!     extends NodeImpl
!     implements NodeList {
  
      /** Serialization version. */
      static final long serialVersionUID = 2815829867152120872L;
--- 77,88 ----
   * IMPLEMENTATION DETAIL; applications should _never_ assume that
   * this identity exists.
   * <P>
+  * FIXME: Implementing NodeList breaks the HTML DOM, this implementaion
+  * was modified in enhydra.
+  * <P>
   */
  public abstract class NodeContainer
!     extends NodeImpl {
  
      /** Serialization version. */
      static final long serialVersionUID = 2815829867152120872L;
***************
*** 97,103 ****
      protected transient int nodeListChanges = -1;
  
      /** Cached node list length. */
!     protected transient int nodeListLength;
  
      /** Last requested node. */
      protected transient NodeImpl nodeListNode;
--- 99,105 ----
      protected transient int nodeListChanges = -1;
  
      /** Cached node list length. */
!     protected transient int nodeListLength = -1;
  
      /** Last requested node. */
      protected transient NodeImpl nodeListNode;
***************
*** 105,110 ****
--- 107,115 ----
      /** Last requested node index. */
      protected transient int nodeListIndex;
  
+     /** NodeList object, allocated on first use */
+     private transient NodeList nodeList;
+ 
      // lazy-evaluation info
      /** Synchronization of child nodes needed. */
      protected transient boolean syncChildren;
***************
*** 112,117 ****
--- 117,184 ----
      /** Table for quick check of child insertion. */
      protected static int[] kidOK;
  
+     /**
+      * Implementation of node list.
+      */
+     private final class NodeListImpl implements NodeList {
+         /**
+          * NodeList method: Count the immediate children of this node
+          * @return int
+          */
+         public int getLength() {
+ 
+             if ((nodeListChanges != changes) || (nodeListLength == -1)) {
+                 nodeListChanges = changes;
+                 nodeListLength = 0;
+                 nodeListIndex = 0;
+                 nodeListNode = firstChild;
+                 for (NodeImpl node = firstChild; node != null; node = node.nextSibling) {
+                     nodeListLength++;
+                 }
+             }
+ 
+             return nodeListLength;
+ 
+         } // getLength():int
+ 
+         /**
+          * NodeList method: Return the Nth immediate child of this node, or
+          * null if the index is out of bounds.
+          * @return org.w3c.dom.Node
+          * @param Index int
+          */
+         public Node item(int index) {
+             // short way
+             if (nodeListChanges == changes) {
+                 if (nodeListIndex < index) {
+                     while (nodeListIndex < index && nodeListNode != null) {
+                         nodeListIndex++;
+                         nodeListNode = nodeListNode.nextSibling;
+                     }
+                 }
+                 else if (nodeListIndex > index) {
+                     while (nodeListIndex > index && nodeListNode != null) {
+                         nodeListIndex--;
+                         nodeListNode = nodeListNode.previousSibling;
+                     }
+                 }
+                 return nodeListNode;
+             }
+ 
+             // long way
+             nodeListChanges = changes;
+             nodeListLength = -1;
+             nodeListNode = firstChild;
+             for (nodeListIndex = 0; 
+                  nodeListIndex < index && nodeListNode != null; 
+                  nodeListIndex++) {
+                 nodeListNode = nodeListNode.nextSibling;
+             }
+             return nodeListNode;
+ 
+         } // item(int):Node
+     }
+ 
      //
      // Static initialization
      //
***************
*** 218,223 ****
--- 285,292 ----
      	// Need to break the association w/ original kids
      	newnode.firstChild      = null;
          newnode.lastChild       = null;
+         newnode.nodeList        = null;
+         newnode.nodeListChanges = -1;
  
      	// Then, if deep, clone the kids too.
      	if (deep) {
***************
*** 262,268 ****
          if (syncChildren) {
              synchronizeChildren();
          }
!         return this;
  
      } // getChildNodes():NodeList
  
--- 331,341 ----
          if (syncChildren) {
              synchronizeChildren();
          }
!         if (nodeList == null) {
!             // Don't need to synchronize here, doesn't hurt to allocate two.
!             nodeList = new NodeListImpl();
!         }
!         return nodeList;
  
      } // getChildNodes():NodeList
  
***************
*** 722,794 ****
  		internalInsertBefore(newChild, oldChild,MUTATION_LOCAL);
  		internalRemoveChild(oldChild,MUTATION_LOCAL);
  		
! 		if(MUTATIONEVENTS)
  		{
  		    dispatchAggregateEvents(enclosingAttr);
  		}
  		
  		return oldChild;
      }
- 
-     //
-     // NodeList methods
-     //
- 
-     /**
-      * NodeList method: Count the immediate children of this node
-      * @return int
-      */
-     public int getLength() {
- 
-         if (nodeListChanges != changes) {
-             nodeListChanges = changes;
-             nodeListLength = 0;
-             nodeListIndex = 0;
-             nodeListNode = firstChild;
-             for (NodeImpl node = firstChild; node != null; node = node.nextSibling) {
-                 nodeListLength++;
-             }
-         }
-         
-         return nodeListLength;
- 
-     } // getLength():int
- 
-     /**
-      * NodeList method: Return the Nth immediate child of this node, or
-      * null if the index is out of bounds.
-      * @return org.w3c.dom.Node
-      * @param Index int
-      */
-     public Node item(int index) {
-         // short way
-         if (nodeListChanges == changes) {
-             if (nodeListIndex < index) {
-                 while (nodeListIndex < index && nodeListNode != null) {
-                     nodeListIndex++;
-                     nodeListNode = nodeListNode.nextSibling;
-                 }
-             }
-             else if (nodeListIndex > index) {
-                 while (nodeListIndex > index && nodeListNode != null) {
-                     nodeListIndex--;
-                     nodeListNode = nodeListNode.previousSibling;
-                 }
-             }
-             return nodeListNode;
-         }
- 
-         // long way
-         nodeListChanges = changes;
-         nodeListNode = firstChild;
-         for (nodeListIndex = 0; 
-              nodeListIndex < index && nodeListNode != null; 
-              nodeListIndex++) {
-             nodeListNode = nodeListNode.nextSibling;
-         }
-         return nodeListNode;
- 
-     } // item(int):Node
  
      //
      // DOM2: methods, getters, setters
--- 795,807 ----
  		internalInsertBefore(newChild, oldChild,MUTATION_LOCAL);
  		internalRemoveChild(oldChild,MUTATION_LOCAL);
  		
! 		if(ownerDocument.mutationEvents)
  		{
  		    dispatchAggregateEvents(enclosingAttr);
  		}
  		
  		return oldChild;
      }
  
      //
      // DOM2: methods, getters, setters



Re: Problem with Xerces DOM

Posted by Assaf Arkin <ar...@intalio.com>.
Like you I don't want to see the core DOM getting bigger for the same of
the HTML DOM, but we need to find some way to get code reusability.

Currently the HTML DOM tries to build on top of the XML DOM so it
benefits from the same structure and DOM Level 2 features without
excessive code rewriting.

Any clue on how to get that done?

arkin

Arnaud Le Hors wrote:
> 
> I understand the problem but I disagree with the solution. It definitely
> fixes the problem but not the right way.
> The reason NodeImpl implements both Node and NodeList is to save memory.
> We avoid allocating an extra object and having an extra reference per
> node. I don't like the idea of making the DOM Core (XML) bigger just for
> the sake of HTML DOM, especially when most people use the former and not
> the latter.
> Instead I suggest you use a solution similar to the one you're proposing
> but in the HTML DOM implementation itself. This can be done by
> overriding the appropriate methods there. If you come up with such a
> patch I'll be happy to commit it for you!
> Thanks for your report.
> --
> Arnaud  Le Hors - IBM Cupertino, XML Technology Group
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-j-dev-help@xml.apache.org

-- 
----------------------------------------------------------------------
Assaf Arkin                                            www.intalio.com
CTO, Intalio Inc.                                       www.exolab.org

Re: Problem with Xerces DOM

Posted by Arnaud Le Hors <le...@us.ibm.com>.
I understand the problem but I disagree with the solution. It definitely
fixes the problem but not the right way.
The reason NodeImpl implements both Node and NodeList is to save memory.
We avoid allocating an extra object and having an extra reference per
node. I don't like the idea of making the DOM Core (XML) bigger just for
the sake of HTML DOM, especially when most people use the former and not
the latter.
Instead I suggest you use a solution similar to the one you're proposing
but in the HTML DOM implementation itself. This can be done by
overriding the appropriate methods there. If you come up with such a
patch I'll be happy to commit it for you!
Thanks for your report.
-- 
Arnaud  Le Hors - IBM Cupertino, XML Technology Group