You are viewing a plain text version of this content. The canonical link for it is here.
Posted to c-dev@xerces.apache.org by David Hoffer <da...@iserv.net> on 2002/11/23 01:04:24 UTC

DOM search optimization questions

I have the following DOM structure.  My application has to make repeated
requests to find the Label element value for each Patch element in a Side
element.  I may have up to 2000 Patches per Side.  All my application knows
is the Side Index and the RowIndex & ColIndex.  Because these are
attributes, I am getting a list of Patch elements and then for each Patch, I
get the list of attributes and then I read each attribute value to see if it
matches the one I am looking for.

Needless to say, this is in-efficient.  My requests take quite a long time
on a fast PC.  I am wondering if I either have bad search code(see below) or
should I not store the data this way?  I have control over both.  It seems
it would be faster to search if I changed the Patch element name to contain
the RowIndex and ColIndex, such as Patch_Row#_Col#.

Can anyone give me some insight into the best way of handling this problem?

<Side Index="0">
	<NumRows>21</NumRows>
	<NumCols>55</NumCols>
	<Patch RowIndex="20" ColIndex="0">
		<Label>1101</Label>
	</Patch>
	<Patch RowIndex="20" ColIndex="1">
		<Label>1102</Label>
	</Patch>
	<Patch RowIndex="20" ColIndex="2">
		<Label>1103</Label>
	</Patch>
...lots more Patch elements...
</Side>
<Side Index="1">
	<NumRows>21</NumRows>
	<NumCols>55</NumCols>
	<Patch RowIndex="20" ColIndex="0">
		<Label>1101</Label>
	</Patch>
	<Patch RowIndex="20" ColIndex="1">
		<Label>1102</Label>
	</Patch>
	<Patch RowIndex="20" ColIndex="2">
		<Label>1103</Label>
	</Patch>
...lots more Patch elements...
</Side>
...lots more Side elements...

Code...
DOMNode* CTarget::GetPatchNode(DOMNode* pSideNode, int iRowIndex, int
iColIndex)
{
	DOMNodeList* pPatchNodeList =
((DOMElement*)pSideNode)->getElementsByTagNameNS(NULL, L"Patch");
	if (pPatchNodeList)
	{
		int iNumPatchNodes = pPatchNodeList->getLength();

		for (XMLSize_t p=0; p<iNumPatchNodes; p++)
		{
			DOMNode* pPatchNode = pPatchNodeList->item(p);
			if (pPatchNode)
			{
				bool bRowFound = false;
				bool bColFound = false;

				DOMNamedNodeMap* pDOMNamedNodeMap = pPatchNode->getAttributes();

				int iNumAttributes = pDOMNamedNodeMap->getLength();

				for (XMLSize_t a=0; a<iNumAttributes; a++)
				{
					DOMNode* pDOMNode = pDOMNamedNodeMap->item(a);
					if (pDOMNode)
					{
						const XMLCh* pwszNodeName = pDOMNode->getNodeName();

						if (wcscmp(L"ColIndex", pwszNodeName) == 0)
						{
							const XMLCh* wszNodeValue = pDOMNode->getNodeValue();
							ASSERT(wszNodeValue);

							if (_ttoi(wszNodeValue) == iColIndex)
								bColFound = true;
							else
								break;
						}

						if (wcscmp(L"RowIndex", pwszNodeName) == 0)
						{
							const XMLCh* wszNodeValue = pDOMNode->getNodeValue();
							ASSERT(wszNodeValue);

							if (_ttoi(wszNodeValue) == iRowIndex)
								bRowFound = true;
							else
								break;
						}

						if (bRowFound && bColFound)
							return pPatchNode;
					}
				}
			}
		}
	}

	return NULL;
}


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


Re: DOM search optimization questions

Posted by Gareth Reakes <ga...@decisionsoft.com>.
Hi,

	Ive added a couple of comments below. Its worth trying 
DOMNodeFilter. You could write one with the filtering criteria in it. I'm 
not sure how much faster it would be.


Gareth

> Needless to say, this is in-efficient.  My requests take quite a long time
> on a fast PC.  I am wondering if I either have bad search code(see below) or
> should I not store the data this way?  I have control over both.  It seems
> it would be faster to search if I changed the Patch element name to contain
> the RowIndex and ColIndex, such as Patch_Row#_Col#.

Sounds like it would be a bit faster to me as well.

> Code...
> DOMNode* CTarget::GetPatchNode(DOMNode* pSideNode, int iRowIndex, int
> iColIndex)
> {
> 	DOMNodeList* pPatchNodeList =
> ((DOMElement*)pSideNode)->getElementsByTagNameNS(NULL, L"Patch");

This does a recursive search. Better to just iterate over the children of 
1 element if you know that the patch elements are all at one level.


> 	if (pPatchNodeList)
> 	{
> 		int iNumPatchNodes = pPatchNodeList->getLength();
> 
> 		for (XMLSize_t p=0; p<iNumPatchNodes; p++)
> 		{
> 			DOMNode* pPatchNode = pPatchNodeList->item(p);
> 			if (pPatchNode)
> 			{
> 				bool bRowFound = false;
> 				bool bColFound = false;
> 
> 				DOMNamedNodeMap* pDOMNamedNodeMap = pPatchNode->getAttributes();
> 
> 				int iNumAttributes = pDOMNamedNodeMap->getLength();
> 
> 				for (XMLSize_t a=0; a<iNumAttributes; a++)
> 				{
> 					DOMNode* pDOMNode = pDOMNamedNodeMap->item(a);
> 					if (pDOMNode)
> 					{
> 						const XMLCh* pwszNodeName = pDOMNode->getNodeName();


Could you enforce the attributes exist with a DTD? If so then you could 
speed this up by just checking the first letter of the first attribute. If 
you only have one attribute then you could forgo this check. It follows 
that you could rewrite this to eliminate the getLength call.

> 
> 						if (wcscmp(L"ColIndex", pwszNodeName) == 0)
> 						{
> 							const XMLCh* wszNodeValue = pDOMNode->getNodeValue();
> 							ASSERT(wszNodeValue);
> 
> 							if (_ttoi(wszNodeValue) == iColIndex)
> 								bColFound = true;
> 							else
> 								break;
> 						}
> 
> 						if (wcscmp(L"RowIndex", pwszNodeName) == 0)
> 						{
> 							const XMLCh* wszNodeValue = pDOMNode->getNodeValue();
> 							ASSERT(wszNodeValue);
> 
> 							if (_ttoi(wszNodeValue) == iRowIndex)
> 								bRowFound = true;
> 							else
> 								break;
> 						}
> 
> 						if (bRowFound && bColFound)
> 							return pPatchNode;
> 					}
> 				}
> 			}
> 		}
> 	}
> 
> 	return NULL;
> }
> 

Gareth

-- 
Gareth Reakes, Head of Product Development  
DecisionSoft Ltd.            http://www.decisionsoft.com
Office: +44 (0) 1865 203192



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org