Posted to j-dev@xerces.apache.org by Joseph Kesselman <ke...@us.ibm.com> on 2003/03/21 19:00:56 UTC

While we're discussing XNI changes...

Actually, while we're talking about alternatives to XNI... Events are 
wonderful for UI and other realtime-driven stuff... but I am starting to 
conclude that they're the wrong model for parsing. I'm becoming more and 
more interested in genuine pull-parser APIs (essentially, treat the parser 
as an iterator with a next-node operator that either yields an accessor 
object or IS an accessor object for the node's properties).
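
A minimal sketch of the "iterator that IS the accessor" style described above. Every name here (NodeCursor, ListCursor, CursorDemo) is hypothetical and is not part of Xerces or XNI, and a hand-written token list stands in for a real parser:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical cursor: next() advances, and the cursor itself exposes the current node. */
interface NodeCursor {
    boolean next();   // advance; returns false once the input is exhausted
    String kind();    // "start", "text", or "end" for the current node
    String value();   // element name or character data of the current node
}

/** Toy stand-in for a parser: walks a pre-tokenized list instead of real XML. */
final class ListCursor implements NodeCursor {
    private final List<String[]> nodes;   // each entry is {kind, value}
    private int pos = -1;                 // parser-side state kept between next() calls

    ListCursor(List<String[]> nodes) { this.nodes = nodes; }

    public boolean next() {
        if (pos + 1 >= nodes.size()) return false;
        pos++;
        return true;
    }
    public String kind()  { return nodes.get(pos)[0]; }
    public String value() { return nodes.get(pos)[1]; }
}

final class CursorDemo {
    /** Tokens for <a><b>hi</b></a>, written out by hand (an assumption of this sketch). */
    static List<String[]> sample() {
        return Arrays.asList(
            new String[] {"start", "a"},
            new String[] {"start", "b"},
            new String[] {"text", "hi"},
            new String[] {"end", "b"},
            new String[] {"end", "a"});
    }

    /** The application owns the loop and asks for the next node when it is ready. */
    static int countStartElements(NodeCursor c) {
        int n = 0;
        while (c.next()) {
            if (c.kind().equals("start")) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countStartElements(new ListCursor(sample())));
    }
}
```

The alternative shape, where next() returns a separate accessor object per node, would differ only in who holds the kind() and value() methods.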

This approach has several benefits:

1) The iterator model is a lot easier to treat as a "tokenizer", 
simplifying its use in traditional recursive-descent grammars and the like 
where next-token requests may occur in multiple places.

2) The use of an accessor object allows more scope for "lazy" evaluation. 
We already get some of those benefits by passing the list of attributes as 
an object so they can simply be skipped over if they aren't examined, so 
there might not be a great deal of gain here -- EXCEPT in the case of 
serializing some other data representation; in that situation there might 
be significant advantages to not preparing the node name (for example) 
until it's called for. In some sense, this combines the advantages of the 
DOM approach with those of the event systems; it makes writing a 
thin-layer adapter much, much easier.

3) If someone really wants an event stream, it isn't hard to write a 
driver loop which pulls nodes from the iterator and generates events. It's 
much harder, as we've seen, to take an event-based system such as SAX or 
XNI and "throttle" it to yield one event at a time.

4) The pull approach can be generalized to cover processing models other 
than parsing-in-document-order. I'm investigating using something along 
these lines to implement the other XPath Axes.
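
Points 1 and 3 above can be sketched together, under stated assumptions: the tokens are plain strings written out by hand, Listener is a hypothetical callback interface standing in for SAX or XNI handlers, and all method names are illustrative only. The drive() loop turns a pull source into events, and the two parse methods show a recursive-descent grammar requesting tokens from more than one place:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class PullSketch {
    /** Hypothetical push-style callbacks, standing in for SAX or XNI handlers. */
    interface Listener {
        void start(String name);
        void text(String data);
        void end(String name);
    }

    /** Hand-written tokens for <a><b>hi</b></a>; a real tokenizer would produce these. */
    static List<String> sample() {
        return Arrays.asList("<a>", "<b>", "hi", "</b>", "</a>");
    }

    /** Point 3: a driver loop that pulls tokens and re-emits them as events. */
    static void drive(Iterator<String> tokens, Listener out) {
        while (tokens.hasNext()) {
            String t = tokens.next();
            if (t.startsWith("</")) out.end(t.substring(2, t.length() - 1));
            else if (t.startsWith("<")) out.start(t.substring(1, t.length() - 1));
            else out.text(t);
        }
    }

    /** Runs the driver loop against a listener that just records what it is told. */
    static List<String> record(Iterator<String> tokens) {
        final List<String> log = new ArrayList<>();
        drive(tokens, new Listener() {
            public void start(String name) { log.add("start " + name); }
            public void text(String data)  { log.add("text " + data); }
            public void end(String name)   { log.add("end " + name); }
        });
        return log;
    }

    /** Point 1: the top-level rule requests the first token itself... */
    static String parseDocument(Iterator<String> tokens) {
        String first = tokens.next();
        return parseElement(first, tokens);
    }

    /** ...and the element rule requests more tokens, recursing for nested elements. */
    static String parseElement(String openTag, Iterator<String> tokens) {
        String name = openTag.substring(1, openTag.length() - 1);
        List<String> children = new ArrayList<>();
        while (true) {
            String t = tokens.next();
            if (t.equals("</" + name + ">")) {
                return name + children;                // name followed by its child list
            } else if (t.startsWith("</")) {
                throw new IllegalStateException("mismatched end tag " + t);
            } else if (t.startsWith("<")) {
                children.add(parseElement(t, tokens)); // nested element
            } else {
                children.add(t);                       // character data
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parseDocument(sample().iterator()));
        System.out.println(record(sample().iterator()));
    }
}
```

Going the other way, from callbacks to one-at-a-time delivery, typically needs a buffer or a second thread, which is the "throttling" difficulty noted in point 3.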


Downside: Operating as an iterator would require that the parser save 
state between calls to next-node. On the other hand, in the event approach 
we're generally asking the application code to save state between events. 
I'm not convinced that the iterator approach involves more computation; it 
certainly seems to involve less coding effort for the user.
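
To make the state trade-off concrete, here is a rough comparison under simplifying assumptions: nodes are hand-written {kind, value} pairs, the Listener interface and all class names are hypothetical, and the title element is assumed to contain only text. In the event version the application carries a flag between callbacks; in the pull version the iterator remembers the input position and the application's state is simply where it is in its own code:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class StateSketch {
    /** Hypothetical push-style callbacks. */
    interface Listener {
        void start(String name);
        void text(String data);
        void end(String name);
    }

    /** Event style: the application saves state in fields between events. */
    static final class TitleHandler implements Listener {
        private boolean inTitle;                          // carried across callbacks
        private final StringBuilder title = new StringBuilder();
        public void start(String name) { if (name.equals("title")) inTitle = true; }
        public void text(String data)  { if (inTitle) title.append(data); }
        public void end(String name)   { if (name.equals("title")) inTitle = false; }
        String title() { return title.toString(); }
    }

    /** Pushes every node at the handler, as an event-based parser would. */
    static String eventTitle(List<String[]> nodes) {
        TitleHandler h = new TitleHandler();
        for (String[] n : nodes) {
            if (n[0].equals("start")) h.start(n[1]);
            else if (n[0].equals("text")) h.text(n[1]);
            else h.end(n[1]);
        }
        return h.title();
    }

    /** Pull style: the iterator holds the position; the code path is the state. */
    static String pullTitle(Iterator<String[]> nodes) {
        while (nodes.hasNext()) {
            String[] n = nodes.next();
            if (n[0].equals("start") && n[1].equals("title")) {
                StringBuilder title = new StringBuilder();
                while (nodes.hasNext()) {
                    n = nodes.next();
                    if (n[0].equals("end")) break;        // assumes title holds only text
                    if (n[0].equals("text")) title.append(n[1]);
                }
                return title.toString();
            }
        }
        return null;                                      // no title element found
    }

    /** Hand-written nodes for <doc><title>Pull parsing</title>body</doc>. */
    static List<String[]> sample() {
        return Arrays.asList(
            new String[] {"start", "doc"},
            new String[] {"start", "title"},
            new String[] {"text", "Pull "},
            new String[] {"text", "parsing"},
            new String[] {"end", "title"},
            new String[] {"text", "body"},
            new String[] {"end", "doc"});
    }

    public static void main(String[] args) {
        System.out.println(eventTitle(sample()));
        System.out.println(pullTitle(sample().iterator()));
    }
}
```

The pull version can also stop as soon as it has the title, whereas the event version here keeps receiving callbacks until the input ends.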



I presume the Xerces team is already keeping an eye on this topic, since 
some other parsers have implemented true pull systems. But I figured I'd 
toss it out for brainstorming and let folks tell me where I'm mistaken... 
<smile/>


______________________________________
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more. 
"may'ron DaroQbe'chugh vaj bIrIQbej"  ("Put down the squeezebox and nobody 
gets hurt.")


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org

Re: While we're discussing XNI changes...

Posted by Andy Clark <an...@apache.org>.

Joseph Kesselman wrote:
> I presume the Xerces team is already keeping an eye on this topic, since 
> some other parsers have implemented true pull systems. But I figured I'd 
> toss it out for brainstorming and let folks tell me where I'm mistaken... 
> <smile/>

I am the Apache representative to JSR-173, which is working
on an XML pull-parsing API for Java. Granted, I joined the
process rather late, but I have been following (and thinking
about) pull-parsing for quite a while. But I have concerns
about how such a beast would be implemented in Xerces.

There are a lot of nice properties to a pull-parsing API
in regard to the application developer. However, when trying
to make a modular, configurable parser around this paradigm,
you quickly run into problems. From what I see, the easiest
way around these problems is to make your parser a single,
monolithic component (as is the case in the existing
pull-parser implementations), which is something we've tried
to avoid in Xerces2...

As this paradigm increases in popularity, though, it
is clear that we'll need to put more thought and effort
into a native Xerces implementation.

-- 
Andy Clark * andyc@apache.org
