Posted to java-dev@axis.apache.org by Ted Leung <tw...@sauria.com> on 2001/08/14 08:34:40 UTC

Re: [Xerces2] Pull Parsing

I'm copying axis-dev in case they still care.
----- Original Message -----
From: "Andy Clark" <an...@apache.org>
To: <xe...@xml.apache.org>
Sent: Sunday, August 12, 2001 10:35 PM
Subject: [Xerces2] Pull Parsing

> [I'm forwarding this message from Ted to the mailing list.]
>
> Ted Leung wrote:
> > I sat down to work on a pull parser atop X2 today, and realized that
> > parseSome, etc. are no longer exposed.  As far as I can tell, they got
> > pushed down into an argument on XMLDocumentScanner.scanDocument.  It
> > seems to me that the only way to write a pull parser is to create a new
> > parser configuration.  Am I missing something?  If not, I'll start a
> > thread on this in xerces-j-dev.
>
> Ted, you're not missing anything. You've realized a deficiency
> in XNI. While it *is* possible to write a pull parser, you are
> right that you would have to write a new parser configuration.

Ok, it's nice to know I'm not getting too old...

> While we now have the ability to do pull-parse scanning through
> the new document and DTD scanners, this functionality is hidden
> behind the single parse(XMLInputSource) method in the parser
> configuration. Therefore, I think we need to make a minor change
> to the XMLParserConfiguration interface. So the one method
>
>   parse(XMLInputSource):void
>
> should become the two
>
>   setInputSource(XMLInputSource)
>   parseDocument(boolean):boolean
>
> Which would then cascade to the base parser implementations. And
> the current DOM and SAX parsers which have the parse(InputSource)
> method would first call setInputSource and then call parseDocument
> with a true value to tell the configuration to parse completely.
>
> Whatcha think?

This would be fine by me, because it would solve my problem.  But here's
my concern.  Does it make sense to surface all of these kinds of details up
through XNI?  Or does it make sense to solve this some other way, like via
an object returned as a property?
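
To make the proposed split concrete, here is a minimal sketch of how it
could look. Only the method names setInputSource and parseDocument(boolean)
come from Andy's proposal. Everything else is an assumption for
illustration: the Sketch* types are hypothetical stand-ins (not XNI
classes), the "document" is just a list of pre-split tokens, and I'm
assuming the boolean return value means "there is more left to scan".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for XMLInputSource; it only carries pre-split tokens.
final class SketchInputSource {
    final String[] tokens;
    SketchInputSource(String... tokens) { this.tokens = tokens; }
}

// Hypothetical interface mirroring the proposed two-method split.
interface SketchPullConfiguration {
    void setInputSource(SketchInputSource source);

    // complete == true:  scan to the end of the input.
    // complete == false: scan one chunk and return.
    // Assumed meaning of the result: true while more input remains.
    boolean parseDocument(boolean complete);
}

final class SketchConfiguration implements SketchPullConfiguration {
    private String[] tokens = new String[0];
    private int next = 0;
    final List<String> seen = new ArrayList<>();  // records what was "scanned"

    public void setInputSource(SketchInputSource source) {
        tokens = source.tokens;
        next = 0;
    }

    public boolean parseDocument(boolean complete) {
        do {
            if (next >= tokens.length) return false;  // nothing left
            seen.add(tokens[next++]);                 // "scan" one chunk
        } while (complete);
        return next < tokens.length;
    }

    // What an existing parse(InputSource)-style method would do:
    // set the source, then ask for a complete parse.
    void parse(SketchInputSource source) {
        setInputSource(source);
        parseDocument(true);
    }
}
```

A pull-style caller would call setInputSource once and then loop on
parseDocument(false) until it returns false; the existing DOM and SAX
parsers would keep their one-shot behavior through parse().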

> Ted, have you put some thought into what kind of API that should
> be on a Xerces2 based pull-parser? I would like to see an API
> that is simple enough to use for pull parsing but can communicate
> all of the information that XNI provides through its handler
> interfaces.

I've looked at XPP and KXML as alternative pull parsers.

XPP returns an int-based typecode and then you get to call one of
a number of methods and supply a data structure to be filled in.  It's very
C-like, and because of that, it's likely to be efficient in nice ways.
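
Roughly, that style looks like the sketch below. This is not XPP's actual
API; all of the names (TypecodePullSketch, TagInfo, the constants, the
read* methods) are hypothetical, and pre-split tokens stand in for a real
scanner. The point is only the shape: next() returns an int, and the caller
hands in a reusable structure to be filled.

```java
// Hypothetical names throughout; this only illustrates the typecode style.
final class TagInfo {
    String name;  // caller-owned and reused, so no per-event allocation
}

final class TypecodePullSketch {
    static final int START_TAG = 1;
    static final int END_TAG = 2;
    static final int TEXT = 3;
    static final int END_DOCUMENT = 4;

    private final String[] tokens;  // e.g. "<a>", "hi", "</a>"
    private int pos = -1;

    TypecodePullSketch(String... tokens) { this.tokens = tokens; }

    // Advance to the next event and report its type as an int.
    int next() {
        if (pos < tokens.length) pos++;
        if (pos >= tokens.length) return END_DOCUMENT;
        String t = tokens[pos];
        if (t.startsWith("</")) return END_TAG;
        if (t.startsWith("<")) return START_TAG;
        return TEXT;
    }

    // Fill the caller's structure; valid only right after START_TAG.
    void readStartTag(TagInfo out) {
        String t = tokens[pos];
        out.name = t.substring(1, t.length() - 1);
    }

    // Fill the caller's structure; valid only right after END_TAG.
    void readEndTag(TagInfo out) {
        String t = tokens[pos];
        out.name = t.substring(2, t.length() - 1);
    }

    // Valid only right after TEXT.
    String readText() { return tokens[pos]; }
}
```

The caller ends up with a switch on the typecode, which is where the
"kind of ugly" usage comes from.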

KXML returns objects for start and end tags, and it fills in the parent
child relationships for those objects as it goes.
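
For contrast, the object-returning style could be sketched like this.
Again, these are not kXML's real classes; StartTagEvent, EndTagEvent and
ObjectPullSketch are hypothetical names, and the input is pre-split tokens.
The sketch only shows events coming back as objects with the parent link
filled in as parsing proceeds.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical event objects for illustration only.
final class StartTagEvent {
    final String name;
    final StartTagEvent parent;  // null for the root element
    StartTagEvent(String name, StartTagEvent parent) {
        this.name = name;
        this.parent = parent;
    }
}

final class EndTagEvent {
    final StartTagEvent start;   // the start tag this end tag closes
    EndTagEvent(StartTagEvent start) { this.start = start; }
}

final class ObjectPullSketch {
    private final String[] tokens;
    private int pos = 0;
    private final Deque<StartTagEvent> open = new ArrayDeque<>();

    ObjectPullSketch(String... tokens) { this.tokens = tokens; }

    // Returns a StartTagEvent, an EndTagEvent, a String of text,
    // or null once the input is exhausted.
    Object next() {
        if (pos >= tokens.length) return null;
        String t = tokens[pos++];
        if (t.startsWith("</")) {
            return new EndTagEvent(open.pop());  // throws if tags are unbalanced
        }
        if (t.startsWith("<")) {
            StartTagEvent s =
                new StartTagEvent(t.substring(1, t.length() - 1), open.peek());
            open.push(s);
            return s;
        }
        return t;
    }
}
```

The cost relative to the typecode style is an allocation per tag; the
benefit is that a caller can walk up from any start tag to its ancestors.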

I like the XPP approach in a lot of ways, because it's likely to be
efficient.  But I think the usage is kind of ugly.  I completely agree that
it should be based on XNI.  I'm still debating over callbacks vs objects.
I'm used to callbacks, but a number of people that I've polled don't want
to deal with callbacks, they want to deal with objects.  But I'm not sure
that makes sense.  The nice thing about a pull parser is that you can pass
the parser around to parts of your program and the parts that know what
they need from it can ask the parser to get them.  That kind of gets you
out of encoding huge state machines into SAX handlers.
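
Here is a small sketch of the "pass the parser around" point. TokenPuller is
a hypothetical, deliberately tiny pull source over pre-split tokens, and
Order and Address are made-up application classes. Each class pulls only the
part of the input it understands and hands the parser to whoever
understands the next part, so the nesting lives in the call stack instead
of in a handler's state variables.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical minimal pull source: hands out pre-split tokens one at a time.
final class TokenPuller {
    private final Iterator<String> it;
    TokenPuller(String... tokens) { it = Arrays.asList(tokens).iterator(); }
    boolean hasNext() { return it.hasNext(); }
    String next() { return it.next(); }
}

final class Address {
    final String city;
    Address(String city) { this.city = city; }

    // Knows only the shape of <address>; pulls until its own end tag.
    static Address read(TokenPuller p) {
        String city = null;
        while (p.hasNext()) {
            String t = p.next();
            if (t.equals("</address>")) break;
            if (t.equals("<city>")) city = p.next();  // assumes text follows
        }
        return new Address(city);
    }
}

final class Order {
    final String id;
    final Address shipTo;
    Order(String id, Address shipTo) { this.id = id; this.shipTo = shipTo; }

    // Knows the shape of <order> and delegates <address> to Address.
    static Order read(TokenPuller p) {
        String id = null;
        Address shipTo = null;
        while (p.hasNext()) {
            String t = p.next();
            if (t.equals("</order>")) break;
            if (t.equals("<id>")) id = p.next();      // assumes text follows
            else if (t.equals("<address>")) shipTo = Address.read(p);
        }
        return new Order(id, shipTo);
    }
}
```

With a SAX handler, the same logic needs explicit "am I inside address?"
flags, which is the state machine the pull style avoids.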

If anybody over in Axis still remembers or cares, I'd love to have some
input on what kind of API is desirable.

> --
> Andy Clark * IBM, TRL - Japan * andyc@apache.org