Posted to dev@xalan.apache.org by Erik Onnen <EO...@c-bridge.com> on 2001/03/26 22:19:38 UTC

Xalan Scalability and Extensibility

All:

Sorry if these topics have already been discussed, and sorry for the long
email. I tried searching the archives (manually; what happened to the search
feature?) and didn't find much, so I thought I would raise a few questions.
If there are threads covering these issues, please point me in the right
direction. I have two general questions:

1) Has anyone successfully used a custom-built SAX parser to drive
transformations in Xalan (or any other processor, for that matter), as
opposed to first creating a DOM tree and then traversing it? By "custom", I
mean a parser that simulates the callbacks to the ContentHandler (or, in the
case of TrAX, the TransformerHandler) as if it were reading a document, when
the actual underlying data lives in something that is neither XML nor a
hierarchical node structure, such as a Java ResultSet. I got the idea for
this from Michael Kay's XSLT book and made it work for my own purposes, but
I ran into a load of problems making the custom implementations fit the
JAXP/TrAX framework.
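
A minimal sketch of the push approach described above, assuming a JAXP 1.1+
environment whose TransformerFactory supports SAX input. The class
`TableToSax`, its `emit` helper, and the element names `rows`/`row` are
hypothetical, and a `List<String[]>` stands in for the ResultSet so the
example runs without a database. Events go straight into an identity
`TransformerHandler`; with a compiled stylesheet you would instead call
`newTransformerHandler(templates)` on the same factory.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class TableToSax {
    private static final AttributesImpl NO_ATTRS = new AttributesImpl();

    // Hypothetical helper: replays tabular data (a stand-in for a ResultSet)
    // as SAX events, so no DOM tree and no serialized XML is ever built.
    // Column names become element names, so they must be valid XML names.
    static void emit(List<String> columns, List<String[]> rows, ContentHandler out)
            throws SAXException {
        out.startDocument();
        out.startElement("", "rows", "rows", NO_ATTRS);
        for (String[] row : rows) {
            out.startElement("", "row", "row", NO_ATTRS);
            for (int i = 0; i < columns.size(); i++) {
                String name = columns.get(i);
                String value = row[i] == null ? "" : row[i]; // SQL NULL -> empty element
                out.startElement("", name, name, NO_ATTRS);
                out.characters(value.toCharArray(), 0, value.length());
                out.endElement("", name, name);
            }
            out.endElement("", "row", "row");
        }
        out.endElement("", "rows", "rows");
        out.endDocument();
    }

    // Pushes the events into an identity TransformerHandler and returns the serialized text.
    static String toXml(List<String> columns, List<String[]> rows) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("TransformerFactory does not accept SAX input");
        }
        TransformerHandler handler = ((SAXTransformerFactory) tf).newTransformerHandler();
        // Output properties must be set before setResult() for the identity handler.
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        handler.setResult(new StreamResult(sw));
        emit(columns, rows, handler);
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        List<String> cols = Arrays.asList("id", "name");
        List<String[]> rows = Arrays.asList(new String[] {"1", "Ada"}, new String[] {"2", null});
        System.out.println(toXml(cols, rows));
    }
}
```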

For example, the abstract SAXParser class (necessary for using a
SAXParserFactory implementation) requires that the parse method take an
InputSource and a DefaultHandler. Unfortunately, InputSource doesn't really
fit reading from a ResultSet; the closest fit is getByteStream(), which
leaves you reserializing the result set in your implementation (no thanks).
The current JAXP SAXParser seems heavily geared toward reading files, which
seems extremely limiting to me. Does anybody else feel this way?
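
One way to sidestep the SAXParser/InputSource mismatch, offered as an
assumption rather than a recipe tested against Xalan-J 2.0.1: implement
`org.xml.sax.XMLReader` directly and hand it to the transformer inside a
`javax.xml.transform.sax.SAXSource`. The `InputSource` then becomes a
placeholder that the reader ignores, so nothing is reserialized, and no
SAXParserFactory is involved. `RowsXMLReader` and the attribute names `c0`,
`c1`, ... are hypothetical.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical XMLReader that ignores its InputSource and replays in-memory rows. */
public class RowsXMLReader implements XMLReader {
    private static final String NS = "http://xml.org/sax/features/namespaces";
    private static final String NS_PREFIXES = "http://xml.org/sax/features/namespace-prefixes";

    private final List<String[]> rows; // stand-in for a ResultSet
    private ContentHandler contentHandler;
    private ErrorHandler errorHandler;
    private DTDHandler dtdHandler;
    private EntityResolver entityResolver;

    public RowsXMLReader(List<String[]> rows) { this.rows = rows; }

    @Override
    public void parse(InputSource ignored) throws SAXException {
        if (contentHandler == null) return; // nobody is listening
        contentHandler.startDocument();
        contentHandler.startElement("", "rows", "rows", new AttributesImpl());
        for (String[] row : rows) {
            AttributesImpl attrs = new AttributesImpl();
            for (int i = 0; i < row.length; i++) {
                String value = row[i] == null ? "" : row[i]; // SQL NULL -> empty attribute
                attrs.addAttribute("", "c" + i, "c" + i, "CDATA", value);
            }
            contentHandler.startElement("", "row", "row", attrs);
            contentHandler.endElement("", "row", "row");
        }
        contentHandler.endElement("", "rows", "rows");
        contentHandler.endDocument();
    }

    @Override
    public void parse(String systemId) throws SAXException { parse(new InputSource(systemId)); }

    // Fixed SAX2 defaults: namespace processing on, xmlns attributes not reported.
    @Override
    public boolean getFeature(String name) throws SAXNotRecognizedException {
        if (NS.equals(name)) return true;
        if (NS_PREFIXES.equals(name)) return false;
        throw new SAXNotRecognizedException(name);
    }

    @Override
    public void setFeature(String name, boolean value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        if (getFeature(name) != value) throw new SAXNotSupportedException(name);
    }

    // No lexical or declaration handler support.
    @Override
    public Object getProperty(String name) throws SAXNotRecognizedException {
        throw new SAXNotRecognizedException(name);
    }

    @Override
    public void setProperty(String name, Object value) throws SAXNotRecognizedException {
        throw new SAXNotRecognizedException(name);
    }

    @Override public void setContentHandler(ContentHandler h) { contentHandler = h; }
    @Override public ContentHandler getContentHandler() { return contentHandler; }
    @Override public void setErrorHandler(ErrorHandler h) { errorHandler = h; }
    @Override public ErrorHandler getErrorHandler() { return errorHandler; }
    @Override public void setDTDHandler(DTDHandler h) { dtdHandler = h; }
    @Override public DTDHandler getDTDHandler() { return dtdHandler; }
    @Override public void setEntityResolver(EntityResolver r) { entityResolver = r; }
    @Override public EntityResolver getEntityResolver() { return entityResolver; }

    // Runs the rows through an identity Transformer; the empty InputSource is never read.
    static String toXml(List<String[]> rows) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new SAXSource(new RowsXMLReader(rows), new InputSource()),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(Arrays.asList(new String[] {"1", "Ada"}, new String[] {"2", null})));
    }
}
```

Passing the same `SAXSource` to a `Transformer` obtained from
`templates.newTransformer()` should work the same way, though a stylesheet
transformer may probe more features and properties than the identity one.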

Additionally, the base SAXParser implementation in the JAXP source takes the
DefaultHandler passed into the parse method and blows away any
ContentHandler you have already established (i.e., the TransformerHandler).
Why does a parser expect a DefaultHandler object? Shouldn't it be more
flexible, accepting separate ContentHandler and ErrorHandler implementations
rather than a single concrete class that implements several interfaces? In
the end I had to override virtually every method in DefaultHandler and
SAXParser to prevent developer error, and the actual application has to
downcast all the way to the specific parser implementation to use it
properly. The JAXP framework turned out to be a hindrance, not a timesaver.
The upside was that the SAX parser outperforms the DOM parser in linear
tests by several orders of magnitude, not to mention the time saved by not
building big, bulky DOM trees.
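
The handler replacement is easy to reproduce. This sketch assumes the JDK's
bundled JAXP parser, whose `SAXParser.parse(InputSource, DefaultHandler)`
installs the DefaultHandler as the reader's content, entity, error, and DTD
handler before parsing; the class and counter names are made up. Calling
`getXMLReader().parse(...)` directly leaves a previously registered
ContentHandler (such as a TransformerHandler) in place.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerClobberDemo {
    // Counts startElement callbacks so we can see which handler actually received events.
    static class Counter extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    static final String DOC = "<a><b/><c/></a>";

    // Returns {elements seen by the pre-registered handler via SAXParser.parse,
    //          elements seen when driving the XMLReader directly}.
    static int[] run() throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        SAXParser parser = spf.newSAXParser();
        XMLReader reader = parser.getXMLReader();

        // 1) Register our own handler (think: a TransformerHandler), then use the
        //    JAXP convenience method with a plain DefaultHandler.
        Counter registered = new Counter();
        reader.setContentHandler(registered);
        parser.parse(new InputSource(new StringReader(DOC)), new DefaultHandler());
        int viaSaxParser = registered.elements; // the DefaultHandler took its place

        // 2) Bypass SAXParser.parse and drive the XMLReader directly.
        Counter direct = new Counter();
        reader.setContentHandler(direct);
        reader.parse(new InputSource(new StringReader(DOC)));
        return new int[] {viaSaxParser, direct.elements};
    }

    public static void main(String[] args) throws Exception {
        int[] counts = run();
        System.out.println("via SAXParser.parse: " + counts[0]
                + ", via XMLReader.parse: " + counts[1]);
    }
}
```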


2) What is the current feeling on how Xalan (Xalan-J 2.0.1 with Xerces
1.2.3, specifically) scales under a multi-threaded load? In my own linear
testing it seems to perform very well, actually improving quite nicely as the
HotSpot optimizations kick in. But when I run multiple concurrent threads,
processing crawls. I am using the parser I discussed above and setting the
ContentHandler in the XMLReader to be the TransformerHandler derived from
the Templates object, so there is no recompilation of the stylesheets. I can
tell that the parser is not the bottleneck by forcing the threads to queue
for a single parser and timing the total thread run time versus the
transformation time. If there is only a single parser in the pool, the
transformation performs the same as in the linear tests (although the mean
thread execution time crawls, since every thread has to wait for that one
parser). But if I add 30 parsers to the pool and have 30 Templates objects
creating new TransformerHandlers, the transformation takes forever. Has
anybody else seen this behavior, or have I missed something?
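
A small harness for isolating the concurrency question, assuming the
standard TrAX thread-safety contract: a compiled Templates object may be
shared by many threads, while each TransformerHandler is used by exactly one.
Unlike the setup described above (30 Templates objects), it compiles the
stylesheet once and times only the push-and-transform step inside each task,
excluding time spent waiting in the pool queue. The stylesheet, class name,
and sizes are hypothetical, and the sketch does not by itself explain the
slowdown; comparing the mean time at 1 thread versus 30 threads is one way
to see whether contention sits in the transformer rather than the parser
pool.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.AttributesImpl;

public class SharedTemplatesLoad {
    // Hypothetical stylesheet: counts the <item> elements in the pushed document.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>count=<xsl:value-of select='count(//item)'/></xsl:template>"
        + "</xsl:stylesheet>";

    static final AtomicLong transformNanos = new AtomicLong(); // summed over all tasks

    static List<String> run(int threads, int tasks, int itemsPerDoc) throws Exception {
        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        // Compile once; Templates objects are meant to be shared across threads.
        Templates templates = factory.newTemplates(new StreamSource(new StringReader(XSL)));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                futures.add(pool.submit(() -> {
                    TransformerHandler handler;
                    // TransformerFactory is not guaranteed thread-safe, so serialize access to it.
                    synchronized (factory) {
                        handler = factory.newTransformerHandler(templates); // one per task
                    }
                    StringWriter out = new StringWriter();
                    handler.setResult(new StreamResult(out));
                    long start = System.nanoTime(); // excludes time queued in the pool
                    AttributesImpl none = new AttributesImpl();
                    handler.startDocument();
                    handler.startElement("", "items", "items", none);
                    for (int i = 0; i < itemsPerDoc; i++) {
                        handler.startElement("", "item", "item", none);
                        handler.endElement("", "item", "item");
                    }
                    handler.endElement("", "items", "items");
                    handler.endDocument();
                    transformNanos.addAndGet(System.nanoTime() - start);
                    return out.toString();
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) results.add(f.get());
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int tasks = 300;
        List<String> results = run(30, tasks, 1000);
        System.out.println(results.get(0) + ", mean transform time: "
                + transformNanos.get() / tasks / 1000 + " us");
    }
}
```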

Again, sorry for the long email; I appreciate any thoughts.

-Erik