Posted to j-users@xerces.apache.org by bruno <br...@inicia.es> on 2004/10/25 20:52:09 UTC

And what about the Andy Clark pull?

It's not clear to me whether the James Clark "tools" are official "Apache" or not.

Thank you, Kesselman

Bruno


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org

Re: And what about the Andy Clark pull?

Posted by Andy Clark <an...@cyberneko.net>.

bruno wrote:
> Sorry. I wanted to say the "Andy Clark tools"

That's alright. My first name is actually James.

I do have a simple low-level pull API based on XNI that
is part of my CyberNeko Tools for XNI, available at the
following URL:

   http://www.apache.org/~andyc/neko/

The NekoPull parser is very minimal. I designed it as a
prototype of how to implement pull at the XNI level. And
it could possibly even be used as the foundation of other
higher-level pull APIs that use Xerces2 as the parser.

Xerces2 is already implemented as a "burst push" parser,
as Joe mentioned. Because the parser is a pipeline of
components, there's no way to guarantee that one and only
one callback is performed per call to parse-next-piece.
My solution for this problem in NekoPull was to buffer
the callback information.

The good part of this approach is that I can guarantee
that even if the underlying parser bursts a couple of
events each time I call parse, the user of the pull API
will only see one at a time. And since the number of
buffered events is minimal, the performance hit isn't
too bad. Plus, you don't need multiple threads.
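
The buffering idea described above can be sketched roughly as
follows. This is only an illustration under stated assumptions:
BurstPushParser, EventHandler, and BufferedPullParser are
hypothetical stand-ins invented for this example, not the actual
NekoPull or Xerces2 classes, and events are reduced to plain
strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical callback interface (not part of XNI).
interface EventHandler {
    void event(String description);
}

// Hypothetical "burst push" parser: each call to parseNextPiece may
// fire zero, one, or several callbacks before it returns.
final class BurstPushParser {
    private final List<List<String>> bursts;
    private int next = 0;

    BurstPushParser(List<List<String>> bursts) {
        this.bursts = bursts;
    }

    // Returns false once the input is exhausted.
    boolean parseNextPiece(EventHandler handler) {
        if (next >= bursts.size()) {
            return false;
        }
        for (String e : bursts.get(next++)) {
            handler.event(e);
        }
        return true;
    }
}

// Pull facade: buffers whatever a burst delivers and hands the
// caller exactly one event per call, on a single thread.
final class BufferedPullParser {
    private final BurstPushParser parser;
    private final Deque<String> buffer = new ArrayDeque<>();

    BufferedPullParser(BurstPushParser parser) {
        this.parser = parser;
    }

    // Returns the next event, or null at end of input.
    String nextEvent() {
        while (buffer.isEmpty()) {
            if (!parser.parseNextPiece(buffer::addLast)) {
                return null;
            }
        }
        return buffer.pollFirst();
    }
}

final class BufferedPullDemo {
    public static void main(String[] args) {
        BurstPushParser push = new BurstPushParser(List.of(
                List.of("startDocument", "startElement:a"),
                List.<String>of(), // a piece that produced no callbacks
                List.of("characters:hi", "endElement:a", "endDocument")));
        BufferedPullParser pull = new BufferedPullParser(push);
        for (String e = pull.nextEvent(); e != null; e = pull.nextEvent()) {
            System.out.println(e);
        }
    }
}
```

The buffer only ever holds the remainder of the most recent burst,
which is why the number of buffered events stays small.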

The bad side is that there *is* overhead in copying the
callback information in order to buffer it. The idea I've
had for a long time is to add a mechanism into Xerces2 by
which an application -- in this case, the NekoPull parser
-- could control the use of the character buffers and
other reusable structures inside of the parser. This
would obviate the need to copy things at the higher
level.
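
To illustrate why buffering forces a copy when the parser reuses
its internal structures, here is a small sketch. It assumes a
hypothetical CharSpan view (array, offset, length) and a
hypothetical ReusingScanner that delivers every chunk through the
same array; both are invented for this example and are not the
Xerces2 classes. It also assumes each chunk fits in the 16-char
shared buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical view into a character buffer owned by the parser.
final class CharSpan {
    char[] ch;
    int offset;
    int length;

    void set(char[] ch, int offset, int length) {
        this.ch = ch;
        this.offset = offset;
        this.length = length;
    }

    // Copies the viewed characters into a new String.
    @Override
    public String toString() {
        return new String(ch, offset, length);
    }
}

// Hypothetical scanner that reuses one array and one span object
// for every callback, so a span is only valid during its callback.
final class ReusingScanner {
    private final char[] shared = new char[16];
    private final CharSpan span = new CharSpan();

    void scan(List<String> chunks, Consumer<CharSpan> handler) {
        for (String chunk : chunks) {
            chunk.getChars(0, chunk.length(), shared, 0); // overwrite buffer
            span.set(shared, 0, chunk.length());
            handler.accept(span);
        }
    }
}

final class CopyOverheadDemo {
    public static void main(String[] args) {
        List<CharSpan> byReference = new ArrayList<>();
        List<String> byCopy = new ArrayList<>();
        new ReusingScanner().scan(List.of("first", "second"), span -> {
            byReference.add(span);       // keeps a view that gets overwritten
            byCopy.add(span.toString()); // pays for a copy, but stays valid
        });
        System.out.println(byReference); // prints [second, second]
        System.out.println(byCopy);      // prints [first, second]
    }
}
```

If the application could hand the parser a fresh buffer for each
event it intends to keep, the copy in byCopy would not be needed.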

Joe made a very astute observation, though, that most
people who haven't implemented an XML parser miss: every
layer wants to be the inner loop. When pull parsing was
the new big thing, the obvious question was why not
reimplement the Xerces parser as pull on the inside to
take advantage of the performance benefits.

At that time, I went over it many times in my head but
the same problem kept recurring -- the pull approach
breaks down when you want to build a modular,
component-based parser. Making every component a pull
parser and then having to manage that added too much
overhead and difficulty.

It's easier to do a pull parser with some buffering or a
threaded throttle on top of the Xerces2 "push" parser,
just as Steve mentioned. I take the former approach in
my NekoPull parser.
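
For comparison, the threaded-throttle alternative can be sketched
like this. It is a minimal illustration, not what NekoPull does:
ThreadedThrottle, PushParse, and EventSink are hypothetical names
invented for this example, and events are plain strings. The push
parse runs on its own thread and blocks inside each callback until
the consumer has taken the previous event.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical callback sink; a callback may block on the queue.
interface EventSink {
    void event(String e) throws InterruptedException;
}

// Hypothetical stand-in for a complete push-style parse.
interface PushParse {
    void run(EventSink sink) throws InterruptedException;
}

final class ThreadedThrottle {
    // Sentinel compared by identity, so it cannot clash with real events.
    private static final String END = new String("END");

    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
    private boolean done = false;

    private ThreadedThrottle() {
    }

    static ThreadedThrottle start(PushParse parse) {
        ThreadedThrottle t = new ThreadedThrottle();
        Thread producer = new Thread(() -> {
            try {
                try {
                    parse.run(t.queue::put); // each callback blocks when full
                } finally {
                    t.queue.put(END);        // always signal end of input
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
        return t;
    }

    // Returns the next event, or null at end of input.
    String nextEvent() {
        if (done) {
            return null;
        }
        try {
            String e = queue.take();
            if (e == END) {
                done = true;
                return null;
            }
            return e;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", ex);
        }
    }
}

final class ThreadedThrottleDemo {
    public static void main(String[] args) {
        ThreadedThrottle pull = ThreadedThrottle.start(sink -> {
            sink.event("startElement:a");
            sink.event("characters:hi");
            sink.event("endElement:a");
        });
        for (String e = pull.nextEvent(); e != null; e = pull.nextEvent()) {
            System.out.println(e);
        }
    }
}
```

Compared with the buffering approach, this needs a second thread
and a hand-off per event, which is the cost the single-threaded
buffer avoids.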

-- 
Andy Clark * andyc@apache.org


Re: And what about the Andy Clark pull?

Posted by bruno <br...@inicia.es>.

Sorry. I wanted to say the "Andy Clark tools"

Bruno
