You are viewing a plain text version of this content. The canonical link for it is here.
Posted to c-dev@xerces.apache.org by Mike Morearty <mm...@macromedia.com> on 2004/07/21 21:27:10 UTC

DOMNodeListImpl performance

I am new to this alias.  My team has been using Xerces for a month or two.

One of the developers in my group wrote some code like this (I'm writing
this on the fly, so please ignore any typos):

	DOMNodeList* children = myelem->getChildNodes();
	int length = children->getLength();
	for (int i=0; i<length; ++i)
	{
		DOMNode* child = children->item(i);
		// process child ...
	}

This turned out to a surprisingly big performance bottleneck.  When we
investigated, we realized that Xerces stores lists of children in a linked
list, so the above code is O(n^2).  ("n" for my "for" loop, times "n" for
the loop inside the implementation of the item() function.)

We figured out the workaround -- which I found references to when searching
the archives of this alias -- which is to use getFirstChild() and
getNextSibling() instead.  That worked fine.  But the reason I am writing to
the alias is to request that DOMNodeListImpl be changed to optimize for this
very common scenario.

The gist of the optimization would be:

* item(n) would see if "n" is close to the previously requested item.  In
almost all cases, "n" will be either equal to, one less than, or one greater
than the previously requested item.  In that case, it would call
getPreviousSibling() or getNextSibling() as needed.
* getLength() would cache its result.
* If the node-list changes, the DOMNodeListImpl would discard (or adjust)
its cached information.

I see that you already have similar optimization code in
DOMDeepNodeListImpl.  If DOMNodeList could do something along the same
lines, that would be great.

There are two reasons this optimization would be worthwhile:

(1) Other people might write code like the code shown above, and never
discover -- as we did -- that it is slowing their program down.

(2) Perhaps a more convincing argument: This characteristic of
DOMNodeListImpl means that if I want to make sure my code is efficient, it
is not possible for me to pass a "DOMNodeList*" to someone, and have them
use it as a nice, clean interface.

Here is what I mean by that second point.  Let's say I wanted this:

	void DoSomethingUseful(DOMNodeList* nodeList);

Well, the only functions that DOMNodeList exposes -- as per the W3C DOM spec
-- are getLength() and item().  But that means that I *cannot* use the
getFirstChild()/getNextSibling() technique here.

Nor can I do this inside DoSomethingUseful:

	void DoSomethingUseful(DOMNodeList* nodeList)
	{
		DOMNode* n = nodeList->item(0); // get first
		while (n)
		{
			// process node

			n = n->getNextSibling(); // bad code!
		}
	}

That will definitely not work -- a DOMNodeList does not necessarily contain
a list of siblings.  For example, getElementsByName() returns a DOMNodeList
where the members are not siblings.

Because of the performance issues with DOMNodeListImpl, in a way that makes
DOMNodeList a not-very-useful interface.  Essentially it's a bad idea for
anyone to ever call the Xerces getChildNodes(), unless all they want to do
is count the children.

Thanks  - Mike Morearty

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


Re: DOMNodeListImpl performance

Posted by Michael Glavassevich <mr...@apache.org>.
Hello Mike,

What you describe has existed for quite some time in Xerces-J's DOM 
implementation. See nodeListItem() and nodeListGetLength() [1].

[1] http://cvs.apache.org/viewcvs.cgi/xml-xerces/java/src/org/apache/xerces/dom/ParentNode.java?view=markup

On Wed, 21 Jul 2004, Mike Morearty wrote:

> The gist of the optimization would be:
>
> * item(n) would see if "n" is close to the previously requested item.  In
> almost all cases, "n" will be either equal to, one less than, or one greater
> than the previously requested item.  In that case, it would call
> getPreviousSibling() or getNextSibling() as needed.
> * getLength() would cache its result.
> * If the node-list changes, the DOMNodeListImpl would discard (or adjust)
> its cached information.
>

---------------------------
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org