Posted to j-dev@xerces.apache.org by James M Snell <ja...@us.ibm.com> on 2001/03/22 18:51:41 UTC

Re: Reasons for pull parser for streaming

I think that a perfect way to resolve this would be to mature and simplify 
the interface of the Xerces pull-parser to make it more directly usable. 
Layer the SAX layer on top of that, and the DOM layer on top of that. I'd
gladly work with anybody on the Xerces team to isolate what our 
requirements are for the pull-parser and work with them on interface 
issues to get something put together.   Any takers?

- James Snell
     Software Engineer, Emerging Technologies, IBM
     jasnell@us.ibm.com (online)
     jsnell@lemoorenet.com (offline)

Please respond to axis-dev@xml.apache.org 
To:     axis-dev@xml.apache.org
cc: 
Subject:        Reasons for pull parser for streaming



 Hello Sanjiva. 8-)

 I changed the subject to the best fit for this message. 8-)

 I think a move from DOM to JDOM in a codebase that is so far mostly
experimental is of little impact.

 I agree that moving from a tree to a stream is a bigger issue. We
just thought that the requirements stated that we need to be able to
do some kind of parsing-by-parts (not having the entire message in
memory) so we cannot use tree implementations. Even though somebody
said that DOM is just an interface, I think that this interface
implies having the whole message in memory if we ever touch the last
element in the tree.

 As to SAX: I object to using SAX as the base of Axis because once
it's SAX it can't change. SAX implies a very strange pipelined
streaming programming model, whereas a pull stream parser (as well as
tree parsers) is fully controlled by the application.
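
 To make "fully controlled by the application" concrete, here is a
minimal sketch of a pull loop. It uses the javax.xml.stream (StAX) API
that later shipped with the JDK, purely as a stand-in: it is not XPP or
the Xerces pull interface, and the names PullSketch and namesUntil are
made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullSketch {
    /** Collects start-tag local names, stopping once stopAt has been seen. */
    static List<String> namesUntil(String xml, String stopAt) {
        List<String> seen = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            // The application owns this loop: it asks for the next event
            // only when it wants one.
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT) {
                    seen.add(r.getLocalName());
                    if (r.getLocalName().equals(stopAt)) {
                        break; // stop here; no further events are requested
                    }
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String msg = "<Envelope><Header><id>7</id></Header>"
                + "<Body><big/></Body></Envelope>";
        System.out.println(namesUntil(msg, "Header")); // prints [Envelope, Header]
    }
}
```

 The loop belongs to the caller, so it can stop asking for events as
soon as it has what it needs. With a callback model the parser owns
the loop and the handler has to track its own state between calls.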

 Yes the purpose of v3 is dealing with the hard parts, but the hard
part is streaming, and if it can be handled with a pull-style parser
as well as with SAX, I vote for pulling.

 Also, SAX can easily be built on top of anything else, and so can tree
parsers, but a pull parser cannot easily be built on top of SAX (OTOH
it can easily be built on top of a tree).
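
 As a rough illustration of why SAX layers easily over a pull parser,
the sketch below pulls events in a loop and pushes each one into an
org.xml.sax.ContentHandler. The pull side is again the JDK's later
javax.xml.stream (StAX) API used as a stand-in, not XPP or Xerces; the
class SaxOverPull and its methods are hypothetical, and prefixes,
comments and processing instructions are deliberately ignored.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class SaxOverPull {
    private static String nz(String s) { return s == null ? "" : s; }

    /** Pulls events from the reader and pushes them into a SAX handler. */
    static void drive(XMLStreamReader r, ContentHandler h)
            throws XMLStreamException, SAXException {
        h.startDocument();
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    AttributesImpl atts = new AttributesImpl();
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        String name = r.getAttributeLocalName(i);
                        atts.addAttribute(nz(r.getAttributeNamespace(i)), name, name,
                                "CDATA", r.getAttributeValue(i));
                    }
                    // Simplification: the prefix is dropped, so qName == local name.
                    h.startElement(nz(r.getNamespaceURI()), r.getLocalName(),
                            r.getLocalName(), atts);
                    break;
                }
                case XMLStreamConstants.END_ELEMENT:
                    h.endElement(nz(r.getNamespaceURI()), r.getLocalName(),
                            r.getLocalName());
                    break;
                case XMLStreamConstants.CHARACTERS:
                    h.characters(r.getTextCharacters(), r.getTextStart(),
                            r.getTextLength());
                    break;
                default:
                    break; // comments, PIs etc. are skipped in this sketch
            }
        }
        h.endDocument();
    }

    /** Demo handler: rebuilds the element structure and text it is pushed. */
    static String trace(String xml) {
        final StringBuilder out = new StringBuilder();
        DefaultHandler h = new DefaultHandler() {
            @Override public void startElement(String uri, String local,
                                               String qName, Attributes a) {
                out.append('<').append(local).append('>');
            }
            @Override public void endElement(String uri, String local, String qName) {
                out.append("</").append(local).append('>');
            }
            @Override public void characters(char[] ch, int start, int len) {
                out.append(ch, start, len);
            }
        };
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            drive(r, h);
            r.close();
        } catch (XMLStreamException | SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace("<a x=\"1\"><b>hi</b></a>")); // prints <a><b>hi</b></a>
    }
}
```

 Going the other way is harder because a SAX parser does not return
until the document is finished, so a pull facade over it typically
needs a second thread or a buffer of already-delivered events.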

                            Jacek Kopecky
                               Idoox



On Thu, 22 Mar 2001, Sanjiva Weerawarana wrote:

 > I'd like to register a concern with the process that the Axis team
 > is using. I am not an active participant, but I do have much interest
 > in Axis.
 >
 > There was a long discussion early on about streaming / non-streaming
 > and then the impl was done with DOM. Then, after the face-to-face
 > involving a small group, it was suddenly switched to JDOM. Now James'
 > note says that we've pretty much decided to go to something else.
 > Then Glen says
 > "um, I didn't agree with that". Given the recent vote for Glen as the
 > project manager (which I fully support with a +100), that's a major
 > breakdown in the process .. the note even went to Xerces folks!
 >
 > ARGH.
 >
 > Speaking for myself, I don't view this process as being very open
 > or very constructive. The DOM -> JDOM stuff was done without much open
 > discussion. Now there seems to be an effort to move from JDOM without
 > much (any?) open discussion.
 >
 > Can we please open up a bit more?
 >
 > Architecturally, I'd like to see us using SAX (and DOM, where needed).
 > I know using SAX is hard as hell for this, but one of the main reasons
 > for the total re-write is to address hard issues.
 >
 > Sanjiva.
 >
 > ----- Original Message -----
 > From: <gd...@allaire.com>
 > To: <ax...@xml.apache.org>
 > Sent: Wednesday, March 21, 2001 5:32 PM
 > Subject: RE: The Great Debate: Xml Parsers
 >
 >
 > >
 > > >    1  Axis must not force the entire message object model to be in
 > > > memory at one time.  In other words, DOM is out.
 > >
 > > OK, hang on a sec.  There are some pretty massive concerns around
 > > dealing with any kind of streaming model, concerns which I don't
 > > believe have been adequately addressed yet.  Until we resolve how
 > > we're building this, and what the object model for the messages
 > > really looks like, I am not personally ruling out using DOM or
 > > something much like it.
 > >
 > > From my point of view, it is MUCH more important to get v1.0 out the
 > > door than it is to get *all* of the requirements met.  In particular
 > > I've been thinking about this one, and frankly I'm willing to give it
 > > up and just use JDOM/DOM internally if that gets us a working engine
 > > in the nearer term.  This is not to say I don't support the goal, I
 > > just don't see it happening yet and I'm more leaning towards an
 > > "extreme programming" type viewpoint on this project; get v1.0 out,
 > > collect feedback, refactor for v2.  I'm willing to be convinced
 > > otherwise, if we can make good progress.
 > >
 > > >    2  Axis must be very fast and very scalable in order to be
 > > > widely adopted over other Web Service implementation platforms
 > >
 > > Yes, although what "fast enough" and "scalable enough" mean is
 > > somewhat open to debate.
 > >
 > > >    3  We must be able to independently parse individual elements
 > > > of the message either as raw bits, SAX, the Axis defined Message
 > > > API, DOM or whatever else the user wants.
 > >
 > > OK, yes.  +1!
 > >
 > > >    4  We must be able to fully support SOAP semantics (i.e.
 > > > multiref elements, id/href, etc) without an overly negative impact
 > > > on performance (see number 1 and 2)
 > >
 > > Yeah baby!
 > >
 > > > We've looked at Xerces, we've looked at JDOM, and most recently
 > > > I've been doing some work with a new Xml Pull Parser developed
 > > > originally by Aleksander Slominski as part of a research project
 > > > for Indiana Univ. Below is a basic summary of our thoughts thus
 > > > far:
 > > >
 > > > Xerces 1.x ->  Our concern with Xerces 1.x DOM is that it is slow,
 > > > huge, and complicated.  These are the standard complaints with DOM
 > > > that we've all heard (note to the Xerces guys:  I eagerly await the
 > > > release of Xerces2 ! :-) ....)  It just won't scale well in the
 > > > types of environments that we foresee Axis being deployed in, which
 > > > include limited capacity devices such as handhelds (in which case
 > > > it probably wouldn't work at all due simply to its size).
 > > >
 > > > We also looked at SAX as an alternative but quickly determined that
 > > > SAX just was not adequate for proper SOAP processing that also met
 > > > the requirements mentioned above.  (For those of you who weren't
 > > > part of that discussion, I will not rehash it here; ping me later
 > > > and I'll give you the rundown.)
 > > >
 > > > JDOM -> While JDOM is smaller and faster than Xerces and DOM, which
 > > > is nice, it still does not meet our requirements listed above.  An
 > > > additional issue raised internally at IBM was that JDOM is nowhere
 > > > near being a standard yet.  (As some of you may know, the current
 > > > Axis codebase uses JDOM for its message processing).  We've all
 > > > pretty much decided already that JDOM should be removed from the
 > > > core and should be replaced with a lightweight XML parser that
 > > > meets the requirements.
 > >
 > > Just speaking for myself, I haven't decided that yet.
 > >
 > > > Xml Pull Parser (XPP) -> XPP is a lightweight (23k) pull parser
 > > > that is completely namespace aware and XML 1.0 compliant.  Its
 > > > interface needs quite a bit of work so I've been working with the
 > > > author on getting it cleaned up.  XPP has two advantages: 1. it's
 > > > small, 2. it's fast.  The parser was originally implemented as part
 > > > of a research project comparing the performance of various parsers
 > > > in relation to SOAP-deserialization.  I'll have to try to dig up
 > > > the results of their tests again, but XPP outperformed nearly
 > > > everything else available.   XPP would meet each of our
 > > > requirements once the interface redesign is complete.  This
 > > > interface redesign includes building a SAX layer over the parser's
 > > > primary interface.
 > > >
 > > > Now, here's what we need to decide:
 > > >
 > > > Which is more important: Performance/Scalability or Standards
 > > > support?
 > >
 > > My opinion - if you can get the same product out, and it meets the
 > > goals outlined above, with either but not both of these things, I'd
 > > certainly pick performance/scalability.  However, as mentioned above,
 > > getting the product out is priority 1.
 > >
 > > > From earlier decisions, I believe that we have agreed that
 > > > performance and scalability in the case of Axis far outweigh
 > > > standards support within the core engine itself as long as there
 > > > are hooks specifically designed into the engine that allow full
 > > > standards support if the developer wishes it.  Thus the reason we
 > > > were going to provide our own Axis Message API with hooks for
 > > > optionally processing the message with SAX or DOM.  (i.e. if the
 > > > developer wants to tank their performance by using DOM, so be it)
 > >
 > > +1
 > >
 > > > I would like to invite the Xerces guys to join this discussion so
 > > > that we may figure out how to resolve this issue.  I understand now
 > > > that Xerces 2 includes a Pull Parser interface of its own along
 > > > with a low level interface that enables modularization, but many of
 > > > us here either haven't heard of it yet or aren't quite sure what it
 > > > could mean for Axis.  Could anybody on the Xerces team explain this
 > > > in greater depth for us?
 > > >
 > > > - James Snell
 > > >      Software Engineer, Emerging Technologies, IBM
 > > >      jasnell@us.ibm.com (online)
 > > >      jsnell@lemoorenet.com (offline)
 > > >
 >





---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


Re: Reasons for pull parser for streaming

Posted by Ted Leung <tw...@sauria.com>.
I'm interested.  I believe Andy Clark will probably be interested, too.
Also, Scott Boag mentioned some interest in this topic a while ago,
so as long as we're doing this, I'm going to copy Scott.   Scott if you
want this to go onto the Xalan lists, please feel free.

Ted



Re: Reasons for pull parser for streaming

Posted by Andy Clark <an...@apache.org>.
Oops, I responded to this over in xerces-j-dev and didn't
notice that it was cross-posted. But that's alright since
this is an Axis thread anyway. :)

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

Re: Reasons for pull parser for streaming

Posted by Andy Clark <an...@apache.org>.
James M Snell wrote:
> gladly work with anybody on the Xerces team to isolate what our
> requirements are for the pull-parser and work with them on interface
> issues to get something put together.   Any takers?

Well, in Xerces 1.x, you should be able to enable the pull
parsing with the following code:

  parser.parseSomeSetup(inputSource);
  while (parser.parseSome()) {
    // no-op
  }

However, even though Xerces2 shares the same state machine
parsing mechanism, we haven't thought much about what the
real API should be for pull parsing. So we'd appreciate any
input you have. Although, I figure that it'll be somewhat
similar to what is in Xerces 1.x because I don't think we
plan to make a fundamental change to become similar to the 
way that XPP works.

                           * * *

On a completely different topic, can we please just quote the
relevant text instead of dropping the entire thread inline
into every response?

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org