Posted to c-users@xalan.apache.org by "Themmen, Joel" <Jo...@naviplan.com> on 2002/11/25 21:38:18 UTC

Is Xalan the appropriate tool?

Hi,

	I will give a little background to supply some context. Please be
patient with the OT references to Xerces, since I really want to use both
products.

I need a cross-OS XML parser and XPath evaluator. We originally used the
MSXML parser, but that obviously was not available on non-Microsoft OSes. We
have used Xerces internally for a number of other projects and are quite
pleased with it. The only gap has been that we had to write an internal
XPath parser. This was not fun, since the specification is relatively large
and complex. We chose to implement a small section of the specification. It
is OK, but that is all.

	I would really like to use Xalan to do the XPath evaluation (for its
completeness and, hopefully, robustness). In essence, we need a DOM
document since we are doing many manual modifications of the document. We
use XPath to find nodes and then we may modify the node(s), add nodes,
remove nodes, etc. In other words, we really need to use the DOM
representation of the XML document since we do so much manual work on it.
Can I do this successfully with Xerces/Xalan? Is this the correct tool set?

	Issues that I have had:

	I cannot use the native Xerces/Xalan data structures since we have
an internal wrapper that I must comply with. We wrap the Xerces/Xalan data
structures in structures that are already in use within our code base. Is
this going to cause problems due to destructors that must be called? (It
should not, but perhaps someone has more experience in this than I do.)

	Memory leaks - I have been using the Xerces code plus my XPath
parser and I am experiencing quite a few memory leaks. I believe that the
vast majority of leaks are of my own making; however, I am unclear on a few
of the basic premises of the Xerces/Xalan documents, in particular parsers
and documents:

		1.)	I will need to create approximately 15 documents
in any one instantiation. Should I be using the same parser for
each and every document and then freeing the parser as I leave the code?

		Should I be calling doc->release() as soon as I am done with a
document?

		2.) 	What does parser->resetDocumentPool() really do?
Does it free all documents created by that parser (even if doc->release()
has not been called)?

	Can I create a Xalan document and modify it as I would a Xerces
document? Or should I be creating a Xerces document?

	I am sorry that this has wandered off topic but it seems that many
of my questions need input from both Xerces and Xalan. I will post over
there if I need to but would prefer to get scolded here :) prior to
cross-posting.

Thanks in advance for help/pointers,

Joel

Re: Is Xalan the appropriate tool?

Posted by David N Bertoni/Cambridge/IBM <da...@us.ibm.com>.

"Themmen, Joel" <Jo...@naviplan.com> wrote:
> I need a cross-OS XML parser and XPath evaluator. We originally used the
> MSXML parser, but that obviously was not available on non-Microsoft OSes.
> We have used Xerces internally for a number of other projects and are
> quite pleased with it. The only gap has been that we had to write an
> internal XPath parser. This was not fun, since the specification is
> relatively large and complex. We chose to implement a small section of
> the specification. It is OK, but that is all.
>
> I would really like to use Xalan to do the XPath evaluation (for its
> completeness and, hopefully, robustness). In essence, we need a DOM
> document since we are doing many manual modifications of the document. We
> use XPath to find nodes and then we may modify the node(s), add nodes,
> remove nodes, etc. In other words, we really need to use the DOM
> representation of the XML document since we do so much manual work on it.
> Can I do this successfully with Xerces/Xalan? Is this the correct tool
> set?

There is an implementation of a wrapper around the Xerces DOM in Xalan.  It
does allow you to evaluate XPath expressions using the Xerces DOM.  The
biggest problem is that, for large trees, it can be inefficient.  Your
processing model adds more inefficiency, because you want to modify the
DOM.  That means the wrapper cannot cache any information about its child,
parent, or sibling nodes, and that it cannot be indexed.  Depending on the
kinds of expressions you're going to evaluate, and the size of the
document, that can be very slow, since node-sets must be returned in
document order.
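
To illustrate why document order is costly when nothing can be cached, here
is a minimal sketch. It assumes a toy tree: Node, pathFromRoot and precedes
are hypothetical names for illustration and are not Xerces or Xalan types.
With no stored index, each comparison has to re-walk both nodes' ancestors
and scan sibling lists to recover their positions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a mutable DOM node (not a Xerces/Xalan class).
struct Node {
    Node* parent = nullptr;
    std::vector<Node*> children;
};

// Recover the list of child positions leading from the root down to n.
// Nothing is cached, so every call walks the ancestors and scans siblings.
std::vector<std::size_t> pathFromRoot(const Node* n) {
    std::vector<std::size_t> path;
    while (n->parent != nullptr) {
        const std::vector<Node*>& sibs = n->parent->children;
        auto it = std::find(sibs.begin(), sibs.end(), n);
        assert(it != sibs.end());  // the tree must be consistently linked
        path.push_back(static_cast<std::size_t>(it - sibs.begin()));
        n = n->parent;
    }
    std::reverse(path.begin(), path.end());
    return path;
}

// True if a comes before b in document order. Paths compare
// lexicographically, so an ancestor precedes its descendants.
bool precedes(const Node* a, const Node* b) {
    return pathFromRoot(a) < pathFromRoot(b);
}
```

Sorting a node-set with precedes as the comparator repeats that walk for
every comparison. A read-only tree could instead number its nodes once and
compare two integers, which is the kind of indexing a modifiable DOM rules
out.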

There will be an interim build of Xalan coming in the next week or two,
which will have some fixes to the wrapper layer, and a more efficient
mapping of nodes between the Xerces DOM and Xalan's internal interfaces.
You might want to look for that and give it a test run.

> I cannot use the native Xerces/Xalan data structures since we have
> an internal wrapper that I must comply with. We wrap the Xerces/Xalan data
> structures in structures that are already in use within our code base. Is
> this going to cause problems due to destructors that must be called? (It
> should not, but perhaps someone has more experience in this than I do.)

I have no idea -- can you give more information about which native data
structures you cannot use and what you're replacing them with?

> Memory leaks - I have been using the Xerces code plus my XPath
> parser and I am experiencing quite a few memory leaks. I believe that the
> vast majority of leaks are of my own making; however, I am unclear on a
> few of the basic premises of the Xerces/Xalan documents, in particular
> parsers and documents:
>
>   1.) I will need to create approximately 15 documents in any one
> instantiation. Should I be using the same parser for
> each and every document and then freeing the parser as I leave the code?
>
> Should I be calling doc->release() as soon as I am done with a
> document?

I suspect you should.  The memory a document uses is never released until
you release the document itself.
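
One way to make sure release() always runs, including inside a wrapper
layer, is to tie it to a scope. The sketch below is an assumption-laden
illustration, not Xerces code: FakeDocument is a hypothetical stand-in for
any object that must be freed with release() rather than delete, and
Releaser and ReleasePtr are names invented here.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a document freed via release(), not delete.
// The counter exists only so the effect of release() can be observed.
class FakeDocument {
public:
    explicit FakeDocument(int* liveCount) : live_(liveCount) { ++*live_; }
    void release() {
        --*live_;
        delete this;
    }
private:
    ~FakeDocument() = default;  // private: callers must go through release()
    int* live_;
};

// Deleter that calls release() instead of delete.
struct Releaser {
    template <typename T>
    void operator()(T* p) const { p->release(); }
};

// Owning pointer that releases its object when it goes out of scope.
template <typename T>
using ReleasePtr = std::unique_ptr<T, Releaser>;
```

A wrapper class that holds a ReleasePtr member gets the release() call from
its own destructor automatically, so a forgotten release on an early return
or an exception path stops being a leak. This only applies to documents the
caller actually owns.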

> 2.) What does parser->resetDocumentPool() really do?
> Does it free all documents created by that parser
> (even if doc->release() has not been called)?

You should ask this on the Xerces list, but I think it destroys any
documents the parser has created, unless you've specifically adopted them
yourself, through XercesDOMParser::adoptDocument().  You could also take a
look at the source code to confirm this.

> Can I create a Xalan document and modify it as I would a Xerces
> document? Or should I be creating a Xerces document?

No.  Since XSLT views the source tree as immutable, Xalan's default
implementation is read-only.  This makes it more efficient for many things,
but precludes allowing modification.

Dave