Posted to java-dev@axis.apache.org by Dennis Sosnoski <dm...@sosnoski.com> on 2004/11/02 10:13:21 UTC
[Axis2] OM
I spent the weekend catching up with the last couple of months of Axis
emails and saw some of the activity around the OM. I have a few thoughts
I wanted to offer on this.
First off, if you really want to keep performance high then I urge you
not to build a model. I'd instead suggest something like a parse event
store that you can replay on demand using StAX, SAX, or custom APIs.
Models are expensive in terms of both time and memory. There's been talk
of integrating in XMLBeans, and I know XMLBeans already has some form of
backing event store for everything it does. I haven't looked into the
performance of XMLBeans, but something like that backing store would
probably be a great basis for what you need (and even has XPath and such
already implemented on top of it).
I've also implemented a simple parse event store for my XBIS project
(http://www.xbis.org - the parse event store is currently designed
around SAX, and can be found in the eventstore package
http://xbis.sourceforge.net/api/index.html). This gave excellent
performance (I think replaying the event stream ran at least 10X parser
speed) at a reasonable memory cost (about 2X the actual size of the
document text for the cases I looked at) without much work on
optimization. Working with even an efficient document model is likely
going to be both considerably slower and considerably heavier in memory
usage.
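A minimal sketch of the parse event store idea described above, assuming a SAX front end. The class and method names (SaxEventStore, record, replay) are made up for illustration and are not the XBIS eventstore API; only element and character events are recorded, and no attempt is made at the compact storage a real store would want.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX event recorder; not the XBIS eventstore API. */
class SaxEventStore extends DefaultHandler {
    /** One recorded event that can re-deliver itself to any handler. */
    interface Event {
        void replay(ContentHandler target) throws SAXException;
    }

    private final List<Event> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // The parser may reuse its Attributes object, so take a copy.
        Attributes copy = new AttributesImpl(atts);
        events.add(h -> h.startElement(uri, local, qName, copy));
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        events.add(h -> h.endElement(uri, local, qName));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The character buffer is only valid during the callback, so copy the range.
        char[] text = new char[length];
        System.arraycopy(ch, start, text, 0, length);
        events.add(h -> h.characters(text, 0, text.length));
    }

    /** Parse the document once, recording its events. */
    static SaxEventStore record(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SaxEventStore store = new SaxEventStore();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), store);
        return store;
    }

    /** Replay the stored events, in order, without touching the parser again. */
    void replay(ContentHandler target) throws SAXException {
        target.startDocument();
        for (Event e : events) {
            e.replay(target);
        }
        target.endDocument();
    }

    public static void main(String[] args) throws Exception {
        SaxEventStore store = record("<a x='1'><b>hi</b></a>");
        StringBuilder out = new StringBuilder();
        store.replay(new DefaultHandler() {
            @Override public void startElement(String u, String l, String q, Attributes a) {
                out.append('<').append(l).append('>');
            }
            @Override public void endElement(String u, String l, String q) {
                out.append("</").append(l).append('>');
            }
            @Override public void characters(char[] c, int s, int n) {
                out.append(c, s, n);
            }
        });
        System.out.println(out); // prints <a><b>hi</b></a>
    }
}
```

The same recorded list can be replayed any number of times to different handlers, which is the property that makes a store cheaper than re-parsing.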
The real limitation I saw for a parse event store was just that the
parser APIs are inefficient for working with the data - attributes have
to be kept as memory-consuming Strings rather than just character
ranges, and in the case of SAX have to be organized into structures for
reporting; namespaces are passed in the form of URIs and prefixes rather
than objects (forcing applications to go through the same work the
parser has done to associate the two); etc. If you actually designed a
parse event stream interface rather than working with either SAX or StAX
you could probably push the efficiency even higher (in other words, use
the event store as an adapter between the parser and your own internal
event stream API).
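To make the last point concrete, here is a rough sketch of what such an internal event stream interface might look like, assuming attribute values and text are handed over as ranges of a shared character buffer and namespaces as shared objects. All of the names (EventSink, Namespace, CharRange) are hypothetical; nothing here is an existing SAX, StAX, or XBIS type.

```java
import java.util.HashMap;
import java.util.Map;

/** Demo entry point. Every name in this file is hypothetical. */
class EventStreamSketch {
    public static void main(String[] args) {
        char[] doc = "<s:a id=\"42\">hello</s:a>".toCharArray();
        Namespace ns = Namespace.of("s", "urn:example");
        EventSink printer = new EventSink() {
            public void startElement(Namespace n, String local) {
                System.out.println("start {" + n.uri + "}" + local);
            }
            public void attribute(Namespace n, String local, CharSequence v) {
                System.out.println("attr " + local + "=" + v);
            }
            public void text(CharSequence v) { System.out.println("text " + v); }
            public void endElement() { System.out.println("end"); }
        };
        printer.startElement(ns, "a");
        printer.attribute(null, "id", new CharRange(doc, 9, 2)); // the "42" in doc
        printer.text(new CharRange(doc, 13, 5));                 // the "hello" in doc
        printer.endElement();
    }
}

/** One shared instance per (prefix, uri) pair, so events carry a reference, not two Strings. */
final class Namespace {
    final String prefix;
    final String uri;
    private static final Map<String, Namespace> CACHE = new HashMap<>();

    private Namespace(String prefix, String uri) { this.prefix = prefix; this.uri = uri; }

    static synchronized Namespace of(String prefix, String uri) {
        // Prefixes cannot contain spaces, so a space is a safe key separator.
        return CACHE.computeIfAbsent(prefix + " " + uri, k -> new Namespace(prefix, uri));
    }
}

/** A view onto part of a shared buffer; no String is built until toString() is called. */
final class CharRange implements CharSequence {
    private final char[] buf;
    private final int start;
    private final int len;

    CharRange(char[] buf, int start, int len) {
        if (start < 0 || len < 0 || start + len > buf.length) {
            throw new IndexOutOfBoundsException();
        }
        this.buf = buf; this.start = start; this.len = len;
    }
    public int length() { return len; }
    public char charAt(int i) {
        if (i < 0 || i >= len) throw new IndexOutOfBoundsException();
        return buf[start + i];
    }
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > len || from > to) throw new IndexOutOfBoundsException();
        return new CharRange(buf, start + from, to - from);
    }
    @Override public String toString() { return new String(buf, start, len); }
}

/** The internal event stream a store could replay into. */
interface EventSink {
    void startElement(Namespace ns, String localName);
    void attribute(Namespace ns, String localName, CharSequence value);
    void text(CharSequence value);
    void endElement();
}
```

A consumer that only compares namespaces can use ==, and one that never looks at a value never pays for a String.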
- Dennis
Re: [Axis2] OM
Posted by Srinath Perera <he...@gmail.com>.
Thanks Dennis; let me add a few thoughts.
1) We store the events only when it is absolutely required. In Axis, if
the handlers do not access the body, the events come directly from the
stream.
2) The idea of the OM is to store StAX events and let the user access the
events as a model if they wish to. I think the OM comes close to being a
store of pull events to a great extent, but the only way to be sure is
to test it :). One of the implementations of the OM (the one based on a
table model) is actually a table of StAX events, which are replayed
to generate the events later. Additionally, the table lets the user
traverse the events as a model.
Maybe we should concentrate more on making the OM a StAX event
store if that pays off :) But we should do a bit of comparing first.
Thoughts?
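A rough sketch of the "table of StAX events" idea, assuming one row per event holding the StAX event code and the name or text it carried. StaxEventTable and its methods are hypothetical and are not the actual OM table model; attributes, namespaces, and text escaping are left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical table of pull events; not the actual Axis2 OM implementation. */
class StaxEventTable {
    // One row per event: the StAX event code plus the name or text it carried.
    private final List<Integer> types = new ArrayList<>();
    private final List<String> values = new ArrayList<>();

    /** Pull every event from the parser once and store it as a row. */
    static StaxEventTable record(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask for adjacent character data to be reported as a single event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
        StaxEventTable table = new StaxEventTable();
        while (r.hasNext()) {
            int type = r.next();
            if (type == XMLStreamConstants.START_ELEMENT || type == XMLStreamConstants.END_ELEMENT) {
                table.add(type, r.getLocalName());
            } else if (type == XMLStreamConstants.CHARACTERS) {
                table.add(type, r.getText());
            } // other event kinds are ignored in this sketch
        }
        r.close();
        return table;
    }

    private void add(int type, String value) {
        types.add(type);
        values.add(value);
    }

    /** Replay the rows in order, here by writing them back out as (unescaped) text. */
    String replay() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < types.size(); i++) {
            int type = types.get(i);
            if (type == XMLStreamConstants.START_ELEMENT) {
                out.append('<').append(values.get(i)).append('>');
            } else if (type == XMLStreamConstants.END_ELEMENT) {
                out.append("</").append(values.get(i)).append('>');
            } else {
                out.append(values.get(i));
            }
        }
        return out.toString();
    }

    /** Model-style access: the text directly following the first start tag with this name. */
    String firstText(String elementName) {
        for (int i = 0; i + 1 < types.size(); i++) {
            if (types.get(i) == XMLStreamConstants.START_ELEMENT
                    && values.get(i).equals(elementName)
                    && types.get(i + 1) == XMLStreamConstants.CHARACTERS) {
                return values.get(i + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        StaxEventTable table = record("<a><b>hi</b><c/></a>");
        System.out.println(table.replay());       // prints <a><b>hi</b><c></c></a>
        System.out.println(table.firstText("b")); // prints hi
    }
}
```

The same rows serve both uses: sequential replay as an event stream, and indexed lookups when a handler wants to treat them as a model.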
Cheers
Srinath
Re: [Axis2] OM
Posted by Dennis Sosnoski <dm...@sosnoski.com>.
Sanjiva Weerawarana wrote:
>Hi Dennis,
>
>One of the reasons I'm very gung-ho about using a typed StAX approach
>is to enable one to do a schema-aware binary serialization of the
>type-aware XML Infoset. Is the binary serialization you've defined
>schema-aware? I know that's somewhat controversial (at least in IBM!)...
>so I'd be curious to know where you stand on it.
>
>
I don't personally see that much need for schema-aware binary
serializations. They can certainly be more efficient than generic XML
serializations, but the cost of giving up XML compatibility seems high
to me. Sun made a big deal of their performance gains from schema-aware
serializations in Fast Web Services, but I get even better performance
out of my JibxSoap framework while still using ordinary text XML (as far
as I can tell - they don't publish source code, so I can't actually try
it with the same data). See
http://www.sosnoski.com/presents/cleansoap/comparing.html for details
(which I'm planning to update with Axis 1.2 final whenever that's out).
I guess the moral is that there are many other areas where performance
can be improved before text becomes a real bottleneck in most systems.
I would like to see a format like XBIS that could be plugged in just
like gzip compression over the wire. I think for many applications the
tradeoffs of a text-preserving format that's both faster and more
compact than plain text would be very useful. Unfortunately, Sun and
that crowd are pushing the schema-aware serializations, while other
groups are unwilling to see anything other than text (or gzipped text).
>In any case, would you consider getting involved to help implement
>some of those alternate serialization possibilities? Your help would
>be greatly appreciated!
>
>
I don't think I can take time for actual development work on Axis2. I've
got my own JiBX data binding and JibxSoap web services to extend, as
well as the paid projects that keep me going. If something's especially
interesting the priorities are always subject to change, though!
- Dennis
Re: [Axis2] OM
Posted by Sanjiva Weerawarana <sa...@opensource.lk>.
Hi Dennis,
One of the reasons I'm very gung-ho about using a typed StAX approach
is to enable one to do a schema-aware binary serialization of the
type-aware XML Infoset. Is the binary serialization you've defined
schema-aware? I know that's somewhat controversial (at least in IBM!)...
so I'd be curious to know where you stand on it.
In any case, would you consider getting involved to help implement
some of those alternate serialization possibilities? Your help would
be greatly appreciated!
Sanjiva.
----- Original Message -----
From: "Dennis Sosnoski" <dm...@sosnoski.com>
To: <ax...@ws.apache.org>
Sent: Tuesday, November 02, 2004 11:35 PM
Subject: Re: [Axis2] OM
> Aleksander Slominski wrote:
>
> > Dennis Sosnoski wrote:
> >
> >> First off, if you really want to keep performance high then I urge
> >> you not to build a model. I'd instead suggest something like a parse
> >> event store that you can replay on demand using StAX, SAX, or custom
> >> APIs. ...
> >
> >
> > XmlBeans uses a model too (a DOM2-like store). it would be interesting to
> > see independent performance results, including memory footprint.
>
> I've been meaning to run a test (and include .NET's data binding in
> hopes of getting permission to show the results - I think there have been
> some legal changes that mean they'll be removing their restrictions in
> the future). Hopefully I'll get this done in the next couple of weeks.
>
> >
> >>
> >> I've also implemented a simple parse event store for my XBIS project
> >> (http://www.xbis.org - the parse event store is currently designed
> >> around SAX, and can be found in the eventstore package
> >> http://xbis.sourceforge.net/api/index.html). This gave excellent
> >> performance (I think replaying the event stream ran at least 10X parser
> >> speed) at a reasonable memory cost (about 2X the actual size of the
> >> document text for the cases I looked at) without much work on
> >> optimization. Working with even an efficient document model is likely
> >> going to be both considerably slower and considerably heavier in
> >> memory usage.
> >
> >
> > i think that is what we are trying to do, but optimized for SOAP, where
> > SOAP headers are kept in memory and accessible through whatever API is
> > needed/standard such as DOM, SAAJ, etc. - we have to optimize for this
> > common case; however, the SOAP body can be retrieved by the app directly
> > as an event stream if required.
> >
> > current thinking is around using SAX, StAX, or XPP streams, but having
> > an even more optimized stream should only make things better :)
>
> Both Srinath's reply and this one clarified this for me. I guess I'd
> taken the discussion of a "model" too literally and not looked into the
> details of the implementation. Sounds like you're heading in a good
> direction (including with the OMNamespace)!
>
> >
> >> If you actually designed a parse event stream interface rather than
> >> working with either SAX or StAX you could probably push the
> >> efficiency even higher (in other words, use the event store as an
> >> adapter between the parser and your own internal event stream API).
> >
> >
> >
> > if you have done all the hard work then i see no reason not to try using it :)
> >
> > what is the license for your API/source code? can it be included in
> > AXIS2 (either in OM core or as an optional component)?
> >
> My code's BSD in any case, but I'd gladly donate it to the project if
> you think it's useful. The code's in the download and also in
> SourceForge CVS from the project site (http://www.xbis.org). I wasn't
> referring to my code in particular as the event store, though.
>
> Except for the namespace issue I think the XMLPull API is great for
> working with this type of store. If you'd extend the XMLPull API to use
> namespace objects (which I'd love to see) you could use that directly. I
> don't know about StAX - certainly the object API version is going to add
> a lot of overhead, and I don't know the details of how the StAX
> low-level API differs from XMLPull.
>
> - Dennis
>
[Axis2] tomorrow chat and my take on AXIOM [Re: [Axis2] OM
Posted by Aleksander Slominski <as...@cs.indiana.edu>.
Dennis Sosnoski wrote:
> BTW, the XBIS code also contains timing test implementations so you
> can see how long it takes to parse a document directly, to parse to
> the XBIS event store, and to replay the parse from the event store (as
> well as the document format tests that are the main focus of XBIS). It
> only supports SAX parser comparisons at present (and SAX parse event
> stream replay) because that was what I wanted to test against (most
> widely used, and also the only form supported by the JAXP transform
> for generating output document text).
i will take a look at XBIS in the context of AXIOM, and i will also look at
minimizing memory footprint and at reuse, such as OMNamespace and directly
using String as children of OMNode - that should not be too difficult with
java.util.List, as it is a very flexible (LinkedList and ArrayList) and
well-tested storage mechanism.
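a small sketch of those two ideas together, assuming a java.util.List of children that holds Strings directly alongside child elements, and one namespace object shared by reference. SketchElement and Ns are hypothetical stand-ins, not the AXIOM OMNode/OMNamespace API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical element type; not the AXIOM OMElement/OMNode API. */
class SketchElement {
    /** Hypothetical namespace object, created once and shared by reference. */
    static final class Ns {
        final String prefix;
        final String uri;
        Ns(String prefix, String uri) { this.prefix = prefix; this.uri = uri; }
    }

    final Ns ns;
    final String localName;
    // Children are either SketchElement or String; text needs no wrapper node.
    private final List<Object> children = new ArrayList<>();

    SketchElement(Ns ns, String localName) {
        this.ns = ns;
        this.localName = localName;
    }

    SketchElement add(SketchElement child) { children.add(child); return this; }

    SketchElement add(String text) { children.add(text); return this; }

    List<Object> children() { return Collections.unmodifiableList(children); }

    /** Concatenate the text of this element and all descendants, in document order. */
    String text() {
        StringBuilder sb = new StringBuilder();
        for (Object c : children) {
            if (c instanceof String) {
                sb.append((String) c);
            } else {
                sb.append(((SketchElement) c).text());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Ns soap = new Ns("soapenv", "http://schemas.xmlsoap.org/soap/envelope/");
        SketchElement body = new SketchElement(soap, "Body");
        SketchElement env = new SketchElement(soap, "Envelope").add(body);
        body.add("hello ").add(new SketchElement(null, "b").add("world"));
        System.out.println(env.text());        // prints hello world
        System.out.println(env.ns == body.ns); // prints true
    }
}
```

the cost of the List<Object> approach is an instanceof check on traversal; the saving is one object per text node.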
one of the remaining OM API issues is how the API to expose the SOAP:Body
stream should look, and what we want to do to support MTOM (i see it as high
priority) and SAAJ/JAXM (i think it must be layered and not baked into
AXIOM ...)
thanks,
alek
--
The best way to predict the future is to invent it - Alan Kay
Re: [Axis2] OM
Posted by Dennis Sosnoski <dm...@sosnoski.com>.
BTW, the XBIS code also contains timing test implementations so you can
see how long it takes to parse a document directly, to parse to the XBIS
event store, and to replay the parse from the event store (as well as
the document format tests that are the main focus of XBIS). It only
supports SAX parser comparisons at present (and SAX parse event stream
replay) because that was what I wanted to test against (most widely
used, and also the only form supported by the JAXP transform for
generating output document text).
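For anyone wanting to try a comparison of this kind without the XBIS code, a very rough sketch of such a timing harness follows. It is not the XBIS test code: ParseTiming and its methods are hypothetical, the "store" is just a list of element names standing in for a real event store, and the numbers it prints depend entirely on the JVM and machine.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical timing harness; not the XBIS timing tests. */
class ParseTiming {
    /** Average wall-clock nanoseconds per run of the task. */
    static long averageNanos(int runs, Callable<?> task) throws Exception {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.call();
        }
        return (System.nanoTime() - start) / runs;
    }

    /** Parse with SAX, keeping the element names as a stand-in for an event store. */
    static List<String> parseToStore(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                names.add(qName);
            }
        });
        return names;
    }

    /** Walk the stored names, standing in for replaying events to a consumer. */
    static int replay(List<String> store) {
        int count = 0;
        for (String name : store) {
            if (!name.isEmpty()) count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><item>1</item><item>2</item></root>";
        List<String> store = parseToStore(xml);
        averageNanos(200, () -> parseToStore(xml)); // warm-up before measuring
        long parseNs = averageNanos(1000, () -> parseToStore(xml));
        long replayNs = averageNanos(1000, () -> replay(store));
        System.out.println("parse to store: " + parseNs + " ns/run");
        System.out.println("replay:         " + replayNs + " ns/run");
    }
}
```

A serious comparison would use realistic documents and a proper benchmarking harness; this only shows the shape of the measurement.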
- Dennis
Re: [Axis2] OM
Posted by Dennis Sosnoski <dm...@sosnoski.com>.
Aleksander Slominski wrote:
> Dennis Sosnoski wrote:
>
>> First off, if you really want to keep performance high then I urge
>> you not to build a model. I'd instead suggest something like a parse
>> event store that you can replay on demand using StAX, SAX, or custom
>> APIs. ...
>
>
> XmlBeans uses a model too (a DOM2-like store). it would be interesting to
> see independent performance results, including memory footprint.
I've been meaning to run a test (and include .NET's data binding in
hopes of getting permission to show the results - I think there have been
some legal changes that mean they'll be removing their restrictions in
the future). Hopefully I'll get this done in the next couple of weeks.
>
>>
>> I've also implemented a simple parse event store for my XBIS project
>> (http://www.xbis.org - the parse event store is currently designed
>> around SAX, and can be found in the eventstore package
>> http://xbis.sourceforge.net/api/index.html). This gave excellent
>> performance (I think replaying the event stream ran at least 10X parser
>> speed) at a reasonable memory cost (about 2X the actual size of the
>> document text for the cases I looked at) without much work on
>> optimization. Working with even an efficient document model is likely
>> going to be both considerably slower and considerably heavier in
>> memory usage.
>
>
> i think that is what we are trying to do, but optimized for SOAP, where
> SOAP headers are kept in memory and accessible through whatever API is
> needed/standard such as DOM, SAAJ, etc. - we have to optimize for this
> common case; however, the SOAP body can be retrieved by the app directly
> as an event stream if required.
>
> current thinking is around using SAX, StAX, or XPP streams, but having
> an even more optimized stream should only make things better :)
Both Srinath's reply and this one clarified this for me. I guess I'd
taken the discussion of a "model" too literally and not looked into the
details of the implementation. Sounds like you're heading in a good
direction (including with the OMNamespace)!
>
>> If you actually designed a parse event stream interface rather than
>> working with either SAX or StAX you could probably push the
>> efficiency even higher (in other words, use the event store as an
>> adapter between the parser and your own internal event stream API).
>
>
>
> if you have done all the hard work then i see no reason not to try using it :)
>
> what is the license for your API/source code? can it be included in
> AXIS2 (either in OM core or as an optional component)?
>
My code's BSD in any case, but I'd gladly donate it to the project if
you think it's useful. The code's in the download and also in
SourceForge CVS from the project site (http://www.xbis.org). I wasn't
referring to my code in particular as the event store, though.
Except for the namespace issue I think the XMLPull API is great for
working with this type of store. If you'd extend the XMLPull API to use
namespace objects (which I'd love to see) you could use that directly. I
don't know about StAX - certainly the object API version is going to add
a lot of overhead, and I don't know the details of how the StAX
low-level API differs from XMLPull.
- Dennis
Re: [Axis2] OM
Posted by Aleksander Slominski <as...@cs.indiana.edu>.
Dennis Sosnoski wrote:
> I spent the weekend catching up with the last couple of months of Axis
> emails and saw some of the activity around the OM. I have a few
> thoughts I wanted to offer on this.
>
> First off, if you really want to keep performance high then I urge you
> not to build a model. I'd instead suggest something like a parse event
> store that you can replay on demand using StAX, SAX, or custom APIs.
> Models are expensive in terms of both time and memory. There's been
> talk of integrating in XMLBeans, and I know XMLBeans already has some
> form of backing event store for everything it does. I haven't looked
> into the performance of XMLBeans, but something like that backing
> store would probably be a great basis for what you need (and even has
> XPath and such already implemented on top of it).
XmlBeans uses a model too (a DOM2-like store). it would be interesting to
see independent performance results, including memory footprint.
>
> I've also implemented a simple parse event store for my XBIS project
> (http://www.xbis.org - the parse event store is currently designed
> around SAX, and can be found in the eventstore package
> http://xbis.sourceforge.net/api/index.html). This gave excellent
> performance (I think replaying the event stream ran at least 10X parser
> speed) at a reasonable memory cost (about 2X the actual size of the
> document text for the cases I looked at) without much work on
> optimization. Working with even an efficient document model is likely
> going to be both considerably slower and considerably heavier in
> memory usage.
i think that is what we are trying to do, but optimized for SOAP, where SOAP
headers are kept in memory and accessible through whatever API is
needed/standard such as DOM, SAAJ, etc. - we have to optimize for this
common case; however, the SOAP body can be retrieved by the app directly as
an event stream if required.
current thinking is around using SAX, StAX, or XPP streams, but having an
even more optimized stream should only make things better :)
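a minimal sketch of that split, assuming a StAX reader and SOAP 1.1: header blocks are pulled into memory (here only their names, to keep it short) and the reader is handed back still positioned on the Body start tag, so the app can stream the body itself. SoapSplit and its fields are hypothetical and not part of Axis2 or AXIOM.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper; not part of Axis2 or AXIOM. */
class SoapSplit {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Header blocks kept in memory; a real version would keep whatever the handlers need. */
    final List<String> headerBlocks = new ArrayList<>();
    /** Reader left positioned on the Body start tag, or null if there was no Body. */
    XMLStreamReader body;

    static SoapSplit read(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        SoapSplit result = new SoapSplit();
        boolean inHeader = false;
        int depth = 0; // 1 = Envelope, 2 = Header or Body, 3 = header block
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                boolean soap = SOAP_NS.equals(r.getNamespaceURI());
                String name = r.getLocalName();
                if (depth == 2 && soap && name.equals("Header")) {
                    inHeader = true;
                } else if (depth == 2 && soap && name.equals("Body")) {
                    result.body = r; // stop here; the body is not consumed
                    return result;
                } else if (inHeader && depth == 3) {
                    result.headerBlocks.add(name);
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (depth == 2) inHeader = false;
                depth--;
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<e:Envelope xmlns:e='" + SOAP_NS + "'>"
                + "<e:Header><To>urn:svc</To><Action>urn:ping</Action></e:Header>"
                + "<e:Body><ping>now</ping></e:Body></e:Envelope>";
        SoapSplit split = read(xml);
        System.out.println(split.headerBlocks); // prints [To, Action]
        split.body.nextTag();                   // move from Body to its first child
        System.out.println(split.body.getLocalName() + "=" + split.body.getElementText()); // prints ping=now
    }
}
```

if no handler asks for the body, nothing past the Body start tag is ever parsed or stored.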
>
> The real limitation I saw for a parse event store was just that the
> parser APIs are inefficient for working with the data - attributes
> have to be kept as memory-consuming Strings rather than just character
> ranges, and in the case of SAX have to be organized into structures
> for reporting; namespaces are passed in the form of URIs and prefixes
> rather than objects (forcing applications to go through the same work
> the parser has done to associate the two); etc.
that is precisely the point of having OMNamespace in the OM API.
> If you actually designed a parse event stream interface rather than
> working with either SAX or StAX you could probably push the efficiency
> even higher (in other words, use the event store as an adapter between
> the parser and your own internal event stream API).
if you have done all the hard work then i see no reason not to try using it :)
what is the license for your API/source code? can it be included in
AXIS2 (either in OM core or as an optional component)?
thanks,
alek
--
The best way to predict the future is to invent it - Alan Kay