Posted to j-users@xerces.apache.org by bruno <br...@inicia.es> on 2004/10/25 20:52:09 UTC
And whats about the Andy Clark pull?
It's not clear to me whether the James Clark "tools" are official "Apache" or not.
Thank you, Kesselman
Bruno
---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org
Re: And whats about the Andy Clark pull?
Posted by Andy Clark <an...@cyberneko.net>.
bruno wrote:
> Sorry. I wanted to say the "Andy Clark tools"
That's alright. My first name is actually James.
I do have a simple low-level pull API based on XNI that
is part of my CyberNeko Tools for XNI, available at the
following URL:
http://www.apache.org/~andyc/neko/
The NekoPull parser is very minimal. I designed it as a
prototype of how to implement pull at the XNI level. And
it could possibly even be used as the foundation of other
higher-level pull APIs that use Xerces2 as the parser.
Xerces2 is already implemented as a "burst push" parser,
as Joe mentioned. Because the parser is a pipeline of
components, there's no way to guarantee that one and only
one callback is performed per call to parse-next-piece.
My solution for this problem in NekoPull was to buffer
the callback information.
The good part of this approach is that I can guarantee
that even if the underlying parser bursts a couple of
events each time I call parse, the user of the pull API
will only see one at a time. And since the number of
buffered events is minimal, the performance hit isn't
too bad. Plus, you don't need multiple threads.
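
A minimal sketch of the buffering idea described above. This is
not the NekoPull source: BurstParser, Handler and PullParser are
hypothetical names invented for illustration, and the "parser"
just replays canned events. It only shows how a queue can turn
bursts of callbacks into one event per call to next().

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class BufferedPullSketch {
    /** Hypothetical callback interface standing in for a push handler. */
    interface Handler {
        void event(String description);
    }

    /** Hypothetical push parser that may fire several callbacks per step. */
    static class BurstParser {
        private final String[][] bursts;
        private int step = 0;

        BurstParser(String[][] bursts) {
            this.bursts = bursts;
        }

        /** Parses the next piece; returns false when nothing was left to parse. */
        boolean parseNextPiece(Handler handler) {
            if (step >= bursts.length) {
                return false;
            }
            for (String e : bursts[step++]) {
                handler.event(e);
            }
            return true;
        }
    }

    /** Pull facade: buffers each burst and hands out one event per call. */
    static class PullParser {
        private final BurstParser parser;
        private final Queue<String> buffer = new ArrayDeque<>();

        PullParser(BurstParser parser) {
            this.parser = parser;
        }

        /** Returns the next event, or null at end of input. */
        String next() {
            // Only drive the underlying parser when the buffer has run dry.
            while (buffer.isEmpty()) {
                if (!parser.parseNextPiece(buffer::add)) {
                    return null;
                }
            }
            return buffer.poll();
        }
    }

    public static void main(String[] args) {
        BurstParser push = new BurstParser(new String[][] {
            {"startDocument", "startElement:a"},
            {"characters:hi"},
            {"endElement:a", "endDocument"}
        });
        PullParser pull = new PullParser(push);
        for (String e = pull.next(); e != null; e = pull.next()) {
            System.out.println(e);
        }
    }
}
```

The buffer never holds more than one burst at a time, which is
why the cost stays small and no second thread is needed.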
The bad side is that there *is* overhead in copying the
callback information in order to buffer it. The idea I've
had for a long time is to add a mechanism into Xerces2 by
which an application -- in this case, the NekoPull parser
-- could control the use of the character buffers and
other reusable structures inside of the parser. This
would obviate the need to copy things at the higher
level.
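
To illustrate why the copy is needed at all, here is a small
sketch under the assumption that the parser hands callbacks a
view into a character buffer it later overwrites. CharSpan is a
hypothetical stand-in for such a view, not a Xerces class, and
the loop only simulates two callbacks sharing one buffer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReusedBufferSketch {
    /** Hypothetical view into a parser-owned character buffer. */
    static class CharSpan {
        char[] ch;
        int offset;
        int length;

        @Override
        public String toString() {
            return new String(ch, offset, length);
        }
    }

    /** Copies the span so it survives after the parser reuses its buffer. */
    static CharSpan copyOf(CharSpan src) {
        CharSpan copy = new CharSpan();
        copy.ch = Arrays.copyOfRange(src.ch, src.offset, src.offset + src.length);
        copy.offset = 0;
        copy.length = src.length;
        return copy;
    }

    /** Simulates two callbacks that reuse one buffer, buffering each event. */
    static List<String> run(boolean copy) {
        char[] shared = new char[8];   // reused by the simulated parser
        CharSpan span = new CharSpan(); // reused span object, as well
        span.ch = shared;
        List<CharSpan> buffered = new ArrayList<>();
        for (String text : new String[] {"first", "other"}) {
            text.getChars(0, text.length(), shared, 0);
            span.offset = 0;
            span.length = text.length();
            buffered.add(copy ? copyOf(span) : span);
        }
        List<String> seen = new ArrayList<>();
        for (CharSpan s : buffered) {
            seen.add(s.toString());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [other, other]
        System.out.println(run(true));  // prints [first, other]
    }
}
```

Without the copy, the first buffered event is silently
overwritten by the second. Letting the application control the
parser's buffers would remove the need for copyOf().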
Joe made a very astute observation, though, that most
people who haven't implemented an XML parser miss: every
layer wants to be the inner loop. When pull parsing was
the new big thing, the obvious question was why not re-
implement the Xerces parser as pull on the inside to
take advantage of the performance benefits.
At that time, I went over it many times in my head but
the same problem kept recurring -- the pull approach
breaks down when you want to build a modular, component-
based parser. The overhead of making every component a
pull-parser and having to manage that was too difficult.
It's easier to do a push parser with some buffering or a
threaded throttle on top of the Xerces2 "push" parser,
just as Steve mentioned. I take the former approach in
my NekoPull parser.
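
For comparison, the threaded-throttle alternative might look
roughly like the sketch below. All names are hypothetical and
pushParse() merely replays canned events. The push parser runs
on its own thread and blocks inside each callback until the
consumer asks for the next event.

```java
import java.util.concurrent.SynchronousQueue;

public class ThreadedThrottleSketch {
    /** Hypothetical callback interface standing in for a push handler. */
    interface Handler {
        void event(String description);
    }

    /** Hypothetical push parser that fires every callback in one call. */
    static void pushParse(Handler handler) {
        String[] events = {"startElement:a", "characters:hi", "endElement:a"};
        for (String e : events) {
            handler.event(e);
        }
    }

    /** Pull facade that throttles the parser thread one event at a time. */
    static class ThrottledPull {
        // Distinct object used as an end-of-input marker (compared by identity).
        private static final String END = new String("END");
        private final SynchronousQueue<String> handoff = new SynchronousQueue<>();
        private boolean done = false;

        ThrottledPull() {
            Thread parserThread = new Thread(() -> {
                try {
                    pushParse(this::hand);
                } finally {
                    hand(END);
                }
            });
            parserThread.setDaemon(true);
            parserThread.start();
        }

        /** Blocks the parser thread until the consumer takes the event. */
        private void hand(String event) {
            try {
                handoff.put(event);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            }
        }

        /** Returns the next event, or null at end of input. */
        String next() {
            if (done) {
                return null;
            }
            try {
                String event = handoff.take();
                if (event == END) {
                    done = true;
                    return null;
                }
                return event;
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            }
        }
    }

    public static void main(String[] args) {
        ThrottledPull pull = new ThrottledPull();
        for (String e = pull.next(); e != null; e = pull.next()) {
            System.out.println(e);
        }
    }
}
```

This avoids buffering more than one event, at the price of a
second thread and a hand-off on every callback.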
--
Andy Clark * andyc@apache.org
Re: And whats about the Andy Clark pull?
Posted by bruno <br...@inicia.es>.
Sorry. I wanted to say the "Andy Clark tools"
Bruno