Posted to j-dev@xerces.apache.org by Andy Clark <an...@apache.org> on 2001/08/24 06:33:42 UTC

[Xerces2] Pull Parsing Summary and Sample

In case you weren't following the "Pull Parsing" thread, here
is where we stand on pull parsing:

1) I made sure that the Xerces2 reference implementation of
   the document scanner and DTD scanner could be driven in
   a pull parser fashion.

2) I added an XMLPullParserConfiguration interface to XNI.
   This interface extends the XMLParserConfiguration to add
   methods appropriate for pull parsing.

3) I updated the Xerces2 standard parser configuration to
   implement the XMLPullParserConfiguration.
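
To make the idea in (2) and (3) concrete, here is a minimal,
self-contained sketch of a pull interface that extends a push-style
one. Everything in it is hypothetical and made up for illustration
(ParserConfig, PullParserConfig, ToyConfig, setInput, parseStep);
it is not the XNI API, and the toy "scanner" just treats each
whitespace-separated token as one step.

```java
import java.util.ArrayList;
import java.util.List;

public class PullSketch {
    /** Hypothetical push-style configuration: parses everything in one call. */
    interface ParserConfig {
        void parse(String input);
    }

    /** Hypothetical pull extension: set the input once, then step through it. */
    interface PullParserConfig extends ParserConfig {
        void setInput(String input);

        /** Scans one step; returns true while more input remains. */
        boolean parseStep();
    }

    /** Toy implementation: each whitespace-separated token is one scanning step. */
    static class ToyConfig implements PullParserConfig {
        final List<String> events = new ArrayList<>();
        private String[] tokens = new String[0];
        private int next = 0;

        public void setInput(String input) {
            String trimmed = input.trim();
            tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            next = 0;
        }

        public boolean parseStep() {
            if (next < tokens.length) {
                events.add(tokens[next++]);
            }
            return next < tokens.length;
        }

        // Push-style parsing is just the pull loop run to completion.
        public void parse(String input) {
            setInput(input);
            while (parseStep()) {
                // keep stepping until the input is exhausted
            }
        }
    }

    public static void main(String[] args) {
        ToyConfig config = new ToyConfig();
        config.setInput("<a> text </a>");
        int steps = 0;
        boolean more;
        do {
            // The caller decides when the next step happens.
            more = config.parseStep();
            steps++;
            System.out.println("step " + steps + ": " + config.events);
        } while (more);
    }
}
```

The point of the design is that the application owns the loop: it
can stop, inspect state, or do other work between steps, while the
ordinary parse call remains available for callers who want the
whole document processed at once.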

The current open issues are the following:

A) The breakdown of the pull parsing callbacks. At the
   moment you aren't guaranteed to get exactly one
   callback per step. You may get zero or you may get
   multiple callbacks per step.

   In some cases, we can't avoid this because stages
   downstream from the scanner may introduce additional
   events. For example, the namespace binder adds the
   start/endPrefixMapping events to the event stream.

   In other cases, such as DTD scanning, the scanner
   implementation isn't broken out finely enough to allow
   proper pull parsing, so you get a lot of callbacks per
   step. However, since pull parsing the DTD is not the
   most common use case, the DTD will be scanned completely
   when called by the document scanner, even when the
   document is being pull parsed. (Make sense?)

B) Should we have a Xerces2 PullParser class? And what API
   should it have?

C) Anything else?
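
Issue A above can be illustrated with a small, self-contained toy.
It is only a sketch under stated assumptions: Handler, ToyBinder
and callbacksPerStep are hypothetical names, not XNI classes, an
empty string stands for a scanning step that produces no event,
and the toy binder adds prefix-mapping events around every element
(a real namespace binder would only do so for elements that declare
namespaces).

```java
import java.util.ArrayList;
import java.util.List;

public class CallbackCountSketch {
    /** Hypothetical handler; real handler interfaces are much richer. */
    interface Handler {
        void event(String name);
    }

    /** Toy downstream stage that wraps element events in prefix-mapping events. */
    static class ToyBinder implements Handler {
        private final Handler next;

        ToyBinder(Handler next) {
            this.next = next;
        }

        public void event(String name) {
            if (name.equals("startElement")) {
                next.event("startPrefixMapping");
            }
            next.event(name);
            if (name.equals("endElement")) {
                next.event("endPrefixMapping");
            }
        }
    }

    /** Returns how many callbacks reached the application for each step. */
    static List<Integer> callbacksPerStep(String[] steps) {
        List<String> received = new ArrayList<>();
        Handler binder = new ToyBinder(received::add);
        List<Integer> counts = new ArrayList<>();
        for (String step : steps) {
            int before = received.size();
            // An empty step models the scanner consuming input without an event.
            if (!step.isEmpty()) {
                binder.event(step);
            }
            counts.add(received.size() - before);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] steps = {"startElement", "", "characters", "endElement"};
        System.out.println(callbacksPerStep(steps)); // prints [2, 0, 1, 2]
    }
}
```

So even with a scanner that emits at most one event per step, the
application can see zero, one, or two callbacks per step once a
downstream stage is in the pipeline.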

Anyway, to highlight how the pull parsing mechanism can be
used, I've attached a sample program. This program uses the
xni.DocumentTracer sample so that you can see the events
coming through each step. So run the program specifying
your own XML files on the command line to see how it works.

The next beta release (maybe later today) will include the
changes I've made for pull parsing in Xerces2. Anyone who
is interested should use Xerces 2.0.0 (beta2), once it's
available, to start working with the pull parsing
capability. Enjoy!

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org