Posted to c-users@xerces.apache.org by "Altman, Drora" <da...@nds.com> on 2009/07/15 14:22:48 UTC
Right usage of xerces
Hi,
We have a project that uses xerces (2.8.0) in order to parse xml files. We allocate the data as a DOM object in memory and work on it (adding/deleting nodes).
The XML files are quite big.
It seems that we miss something in the right usage of xerces, since the application consumes a lot of memory and it does not seem to be released properly.
During our investigation of the code, I realized that the code uses the importNode function, which, as far as I understand, allocates the memory. This brings me to the following questions:
1. What's the difference between the importNode() & cloneNode() functions?
2. How can we actually delete the memory allocated by import\cloneNode if we delete this node afterwards?
3. What are your general recommendations, regarding cleanup memory at the end of the usage of xerces?
Thanks in advance,
Drora Altman
________________________________
This message is confidential and intended only for the addressee. If you have received this message in error, please immediately notify the postmaster@nds.com and delete it from your system as well as any copies. The content of e-mails as well as traffic data may be monitored by NDS for employment and security purposes.
To protect the environment please do not print this e-mail unless necessary.
An NDS Group Limited company. www.nds.com
Re: Right usage of xerces
Posted by David Bertoni <db...@apache.org>.
Altman, Drora wrote:
> Hi,
>
> I tried to do what was suggested to me: I adopted the document and released the parser, but it didn't do much to reduce the memory consumption.
> So I made the following attempts, which were not very successful. Could you please help me figure out what I'm doing wrong and how I should do it correctly?
> 1. I tried to import the root node and release the document.
> After the call to DOMDocumentImpl::release() some memory is released, but:
> a. not all memory (used memory is more than what was before starting to use xerces)
This could be memory being held by the C++ run-time heap, which will be
used for later allocations. You might check your pl
> b. after releasing the memory, I couldn't use the root node anymore since it's corrupted after the document's released.
Yes, because releasing the document destroys everything. What I
suggested you do to compact a document is to clone the entire document:
newdoc = static_cast<DOMDocument*>(doc->cloneNode(true));
doc->release();
doc = newdoc;
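The clone-then-release sequence above can be wrapped in a small helper so that the order of the steps is hard to get wrong. This is only a sketch: `compactDocument` is a hypothetical name, and `MockNode` and `MockDocument` are stand-ins that exist purely so the example compiles without Xerces. The helper assumes nothing beyond a `cloneNode(bool)` and a `release()` member, as used in the snippet above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: deep-clone the document, release the original,
// and return the clone. Works with any type offering cloneNode(bool)
// and release().
template <typename Doc>
Doc* compactDocument(Doc* doc) {
    assert(doc != nullptr);
    // Clone first, while the original is still valid.
    Doc* fresh = static_cast<Doc*>(doc->cloneNode(true));
    // Releasing destroys the original and everything it owns, so no
    // pointer into the old document may be used after this line.
    doc->release();
    return fresh;
}

// Stand-in types for the demonstration only; not part of Xerces.
struct MockNode {
    virtual ~MockNode() = default;
    virtual MockNode* cloneNode(bool deep) const = 0;
    virtual void release() = 0;
};

struct MockDocument : MockNode {
    static int liveDocuments;        // number of documents currently alive
    std::vector<std::string> nodes;  // names of the nodes in this document

    MockDocument() { ++liveDocuments; }
    ~MockDocument() override { --liveDocuments; }

    MockNode* cloneNode(bool deep) const override {
        MockDocument* copy = new MockDocument;
        if (deep) copy->nodes = nodes;  // a deep clone copies the content
        return copy;
    }
    void release() override { delete this; }
};

int MockDocument::liveDocuments = 0;
```

With Xerces the call would presumably read `doc = compactDocument(doc);`, on the assumption that `doc` is a `DOMDocument*` your code owns rather than one still owned by the parser. Any node pointers obtained from the old document have to be looked up again in the new one.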
> 2. I saw that there is an adoptNode() function, so I tried to use it: adopt the root node and then release
> the document.
> However, when trying this I got an exception, so I looked at the source code and found that in xerces 2.8.0 adoptNode() has the following implementation:
>
> DOMNode* DOMDocumentImpl::adoptNode(DOMNode*) {
> throw DOMException(DOMException::NOT_SUPPORTED_ERR, 0, getMemoryManager());
> return 0;
> }
>
> I looked at the source code of xerces 3.0.1, which has a different implementation of that function. However, it didn't seem to do what I expected it to do (bottom line: using it didn't help either).
adoptNode() is used to bring a node in from another document, which
Xerces-C doesn't support.
>
> - In the end, I still don't know how I can release the memory allocated by xerces.
You usually cannot control how the C++ run-time heap relinquishes memory
to the operating system. You might want to investigate what sort of heap
control and debugging functions exist on your platform to see if there
are diagnostics that will help you figure out memory usage.
>
> Another issue that I realized is: I saw that we use (quite a lot) the function:
> DOMNodeList *DOMElementImpl::getElementsByTagName(const XMLCh *tagname) const
> I wondered when (and by whom) the returned DOMNodeList* should be deleted. I saw an answer to this question at the following link:
> http://www.mail-archive.com/c-dev@xerces.apache.org/msg02511.html
> but I didn't understand the meaning of "Memory for any returned object ... are owned by implementation". Whose implementation?
> I think that releasing these lists after we have finished using them might help us as well (I saw that the fNodeListPool is cleaned up when the document is deleted, but it is not explicitly deleted in any place I could find).
These are kept in a pool within the document, so you cannot release
them. If you are calling this often with different tag names, that could
explain some increased memory usage. You might want to file an
enhancement request that allows for clearing this pool. If you use the
approach I outlined of cloning the entire document, that will take care
of cleaning up this pool.
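To make the pooling behaviour described above concrete, here is a toy model, not Xerces code. `ToyDocument` and its members are hypothetical names, and the real pool is an internal detail that may be keyed differently. The sketch assumes only what is stated above: the lists belong to the document, the caller cannot free them, and each new tag name adds an entry that stays until the document goes away. Unlike DOM node lists, the lists in this toy are snapshots rather than live views.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only: a document that pools the lists it hands out.
class ToyDocument {
public:
    typedef std::vector<std::string> NodeList;

    void addElement(const std::string& tag) { elements_.push_back(tag); }

    // Returns a list owned by the document's pool. The caller must not
    // delete it; it lives until the document itself is destroyed.
    const NodeList* getElementsByTagName(const std::string& tag) {
        assert(!tag.empty());
        auto it = pool_.find(tag);
        if (it == pool_.end()) {
            // First request for this tag name: build a list and pool it.
            NodeList matches;
            for (const std::string& element : elements_) {
                if (element == tag) matches.push_back(element);
            }
            it = pool_.insert(std::make_pair(tag, matches)).first;
        }
        return &it->second;
    }

    // Number of pooled lists; grows by one per distinct tag name.
    std::size_t pooledListCount() const { return pool_.size(); }

private:
    std::vector<std::string> elements_;
    std::map<std::string, NodeList> pool_;
};
```

In this model, asking for the same tag name twice returns the same pooled list, while every distinct tag name costs one more entry. If the real pool behaves similarly, fetching a list once and reusing the pointer is cheaper than querying with many different names.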
Dave
RE: Right usage of xerces
Posted by "Altman, Drora" <da...@nds.com>.
Hi,
I tried to do what was suggested to me: I adopted the document and released the parser, but it didn't do much to reduce the memory consumption.
So I made the following attempts, which were not very successful. Could you please help me figure out what I'm doing wrong and how I should do it correctly?
1. I tried to import the root node and release the document.
After the call to DOMDocumentImpl::release() some memory is released, but:
a. not all memory (used memory is more than what was before starting to use xerces)
b. after releasing the memory, I couldn't use the root node anymore since it's corrupted after the document's released.
2. I saw that there is an adoptNode() function, so I tried to use it: adopt the root node and then release
the document.
However, when trying this I got an exception, so I looked at the source code and found that in xerces 2.8.0 adoptNode() has the following implementation:
DOMNode* DOMDocumentImpl::adoptNode(DOMNode*) {
throw DOMException(DOMException::NOT_SUPPORTED_ERR, 0, getMemoryManager());
return 0;
}
I looked at the source code of xerces 3.0.1, which has a different implementation of that function. However, it didn't seem to do what I expected it to do (bottom line: using it didn't help either).
- In the end, I still don't know how I can release the memory allocated by xerces.
Another issue that I realized is: I saw that we use (quite a lot) the function:
DOMNodeList *DOMElementImpl::getElementsByTagName(const XMLCh *tagname) const
I wondered when (and by whom) the returned DOMNodeList* should be deleted. I saw an answer to this question at the following link:
http://www.mail-archive.com/c-dev@xerces.apache.org/msg02511.html
but I didn't understand the meaning of "Memory for any returned object ... are owned by implementation". Whose implementation?
I think that releasing these lists after we have finished using them might help us as well (I saw that the fNodeListPool is cleaned up when the document is deleted, but it is not explicitly deleted in any place I could find).
Thanks in advance,
Drora Altman
-----Original Message-----
From: David Bertoni [mailto:dbertoni@apache.org]
Sent: Wednesday, July 15, 2009 9:03 PM
To: c-users@xerces.apache.org
Subject: Re: Right usage of xerces
Altman, Drora wrote:
> Hi,
>
> We have a project that uses xerces (2.8.0) in order to parse xml files. We allocate the data as a DOM object in memory and work on it (adding/deleting nodes).
> The XML files are quite big.
>
> It seems that we miss something in the right usage of xerces, since the application consumes a lot of memory and it does not seem to be released properly.
> During our investigation of the code, I realized that the code uses the importNode function, which, as far as I understand, allocates the memory. This brings me to the following questions:
>
> 1. What's the difference between the importNode() & cloneNode() functions?
importNode() essentially clones a node from another document so you can add it to the target document.
> 2. How can we actually delete the memory allocated by import\cloneNode if we delete this node afterwards?
All of the memory for nodes is allocated from a pool owned by the document instance. There is no way to recover memory for individual nodes.
> 3. What are your general recommendations, regarding cleanup memory at the end of the usage of xerces?
If your usage model involves creating and releasing lots of nodes, you might want to consider "compacting" a document by cloning the entire document node and releasing the original document. If you created the original document using a parser, remember the document instance itself is owned by the parser, unless you call adoptDocument() on the parser instance.
Dave
Re: Right usage of xerces
Posted by David Bertoni <db...@apache.org>.
Altman, Drora wrote:
> Hi,
>
> We have a project that uses xerces (2.8.0) in order to parse xml files. We allocate the data as a DOM object in memory and work on it (adding/deleting nodes).
> The XML files are quite big.
>
> It seems that we miss something in the right usage of xerces, since the application consumes a lot of memory and it does not seem to be released properly.
> During our investigation of the code, I realized that the code uses the importNode function, which, as far as I understand, allocates the memory. This brings me to the following questions:
>
> 1. What's the difference between the importNode() & cloneNode() functions?
importNode() essentially clones a node from another document so you can
add it to the target document.
> 2. How can we actually delete the memory allocated by import\cloneNode if we delete this node afterwards?
All of the memory for nodes is allocated from a pool owned by the
document instance. There is no way to recover memory for individual nodes.
> 3. What are your general recommendations, regarding cleanup memory at the end of the usage of xerces?
If your usage model involves creating and releasing lots of nodes, you
might want to consider "compacting" a document by cloning the entire
document node and releasing the original document. If you created the
original document using a parser, remember the document instance itself
is owned by the parser, unless you call adoptDocument() on the parser
instance.
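The effect of pool-owned allocation, and why cloning compacts a document, can be illustrated with a toy model. This is not how Xerces is implemented: `PooledDocument` and its members are hypothetical, and the pool is reduced to a vector of slots. It assumes only what is described above, namely that node memory comes from a pool owned by the document and cannot be recovered node by node.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: every node is carved out of a pool owned by the document.
class PooledDocument {
public:
    // Allocates a node from the pool, attaches it, and returns its slot id.
    std::size_t createNode(const std::string& name) {
        pool_.push_back(Node{name, true});
        return pool_.size() - 1;
    }

    // Detaches a node from the tree. Its pool slot is NOT reclaimed.
    void removeNode(std::size_t id) {
        assert(id < pool_.size());
        pool_[id].attached = false;
    }

    // Slots held by the pool, including those of detached nodes.
    std::size_t pooledNodes() const { return pool_.size(); }

    // Nodes that are still part of the tree.
    std::size_t attachedNodes() const {
        std::size_t count = 0;
        for (const Node& node : pool_) {
            if (node.attached) ++count;
        }
        return count;
    }

    // Deep clone: only attached nodes are copied, so the new document's
    // pool contains no dead slots.
    PooledDocument clone() const {
        PooledDocument copy;
        for (const Node& node : pool_) {
            if (node.attached) copy.createNode(node.name);
        }
        return copy;
    }

private:
    struct Node {
        std::string name;
        bool attached;
    };
    std::vector<Node> pool_;
};
```

In this model, creating 100 nodes and removing 90 of them leaves the pool at 100 slots, while a clone of that document holds only the 10 attached nodes. Destroying the original then gives back the whole pool at once, which is the only point at which the dead slots are recovered.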
Dave