
Posted to java-dev@axis.apache.org by Dennis Sosnoski <dm...@sosnoski.com> on 2004/11/02 10:13:21 UTC

[Axis2] OM

I spent the weekend catching up with the last couple of months of Axis 
emails and saw some of the activity around the OM. I have a few thoughts 
I wanted to offer on this.

First off, if you really want to keep performance high then I urge you 
not to build a model. I'd instead suggest something like a parse event 
store that you can replay on demand using StAX, SAX, or custom APIs. 
Models are expensive in terms of both time and memory. There's been talk 
of integrating in XMLBeans, and I know XMLBeans already has some form of 
backing event store for everything it does. I haven't looked into the 
performance of XMLBeans, but something like that backing store would 
probably be a great basis for what you need (and even has XPath and such 
already implemented on top of it).
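
To make the idea concrete, here is a minimal sketch in Java of a parse event store built on SAX. It is an illustration only, not the XBIS or XMLBeans code: the SaxEventStore class and its methods are names made up for this example, and it records just element and character events (no prefix mappings, comments, or processing instructions).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical recorder: captures SAX events once, replays them many times. */
class SaxEventStore extends DefaultHandler {
    /** One recorded event; replay() pushes it into a target handler. */
    private interface Event {
        void replay(ContentHandler target) throws SAXException;
    }

    private final List<Event> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // The parser may reuse its Attributes object, so take a snapshot.
        final Attributes copy = new AttributesImpl(atts);
        events.add(t -> t.startElement(uri, local, qName, copy));
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        events.add(t -> t.endElement(uri, local, qName));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Copy the range: the parser's buffer is only valid during the callback.
        final char[] text = new char[length];
        System.arraycopy(ch, start, text, 0, length);
        events.add(t -> t.characters(text, 0, text.length));
    }

    /** Number of recorded events. */
    int size() {
        return events.size();
    }

    /** Replays the stored events, in order, into any SAX ContentHandler. */
    void replay(ContentHandler target) throws SAXException {
        target.startDocument();
        for (Event e : events) {
            e.replay(target);
        }
        target.endDocument();
    }

    /** Parses the document once and returns the filled store. */
    static SaxEventStore record(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SaxEventStore store = new SaxEventStore();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), store);
        return store;
    }
}
```

A stored document can then be replayed into any ContentHandler as often as needed, e.g. SaxEventStore.record(xml).replay(handler), without going back to the parser. The sketch makes no claim about the speed or memory figures discussed in this thread.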

I've also implemented a simple parse event store for my XBIS project 
(http://www.xbis.org - the parse event store is currently designed 
around SAX, and can be found in the eventstore package 
http://xbis.sourceforge.net/api/index.html). This gave excellent 
performance (I think replaying the event stream was at least 10X parser 
speed) at a reasonable memory cost (about 2X the actual size of the 
document text for the cases I looked at) without much work on 
optimization. Working with even an efficient document model is likely 
going to be both considerably slower and considerably heavier in memory 
usage.

The real limitation I saw for a parse event store was just that the 
parser APIs are inefficient for working with the data - attributes have 
to be kept as memory-consuming Strings rather than just character 
ranges, and in the case of SAX have to be organized into structures for 
reporting; namespaces are passed in the form of URIs and prefixes rather 
than objects (forcing applications to go through the same work the 
parser has done to associate the two); etc. If you actually designed a 
parse event stream interface rather than working with either SAX or StAX 
you could probably push the efficiency even higher (in other words, use 
the event store as an adapter between the parser and your own internal 
event stream API).
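
As a rough sketch of what such an interface might look like in Java: namespaces are handed over as shared objects, and names and values as character ranges over the document buffer rather than Strings. All of the names here (NamespaceRef, NamespaceTable, CharRange, EventSink) are hypothetical and not taken from XBIS, SAX, StAX, or Axis2.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical namespace object: one shared instance per (prefix, URI) pair. */
final class NamespaceRef {
    final String prefix;
    final String uri;

    NamespaceRef(String prefix, String uri) {
        this.prefix = prefix;
        this.uri = uri;
    }
}

/** Hands out canonical NamespaceRef instances so consumers can compare with ==. */
final class NamespaceTable {
    private final Map<String, NamespaceRef> known = new HashMap<>();

    NamespaceRef intern(String prefix, String uri) {
        // NUL cannot occur in an XML prefix or URI, so it is a safe key separator.
        return known.computeIfAbsent(prefix + '\0' + uri, k -> new NamespaceRef(prefix, uri));
    }
}

/** A view onto a slice of a shared buffer; no String is built until toString(). */
final class CharRange implements CharSequence {
    private final char[] buf;
    private final int start;
    private final int len;

    CharRange(char[] buf, int start, int len) {
        if (start < 0 || len < 0 || start + len > buf.length) {
            throw new IndexOutOfBoundsException();
        }
        this.buf = buf;
        this.start = start;
        this.len = len;
    }

    @Override
    public int length() {
        return len;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= len) {
            throw new IndexOutOfBoundsException();
        }
        return buf[start + index];
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > len || from > to) {
            throw new IndexOutOfBoundsException();
        }
        return new CharRange(buf, start + from, to - from);
    }

    @Override
    public String toString() {
        return new String(buf, start, len);
    }
}

/** Hypothetical event stream callbacks using the types above. */
interface EventSink {
    void startElement(NamespaceRef ns, CharSequence localName);
    void attribute(NamespaceRef ns, CharSequence localName, CharSequence value);
    void text(CharSequence chars);
    void endElement(NamespaceRef ns, CharSequence localName);
}
```

With this shape a consumer can test "is this the SOAP namespace?" with a reference comparison, and an attribute value costs three fields instead of a copied String.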

  - Dennis

Re: [Axis2] OM

Posted by Srinath Perera <he...@gmail.com>.

Thanks Dennis; let me try to add a few thoughts.

1) We store the events only when it is absolutely required. In Axis, if
the handlers do not access the body, the events come directly from the
stream.
2) The idea of the OM is to store StAX events and let the user access the
events as a model if they wish to. I think the OM comes quite close to a
store of pull events, but the only way to tell is to test it :). One of
the implementations of the OM (the one based on a table model) is
actually a table of StAX events, which are replayed to generate the
events later. Additionally, the table lets the user traverse the events
as a model.
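
A minimal sketch of what point 2 describes, assuming a much simplified table: one row per StAX event, with a parent column so the same rows can be replayed in order or walked as a tree. StaxEventTable is a made-up name for illustration, not the Axis2 OM code, and attributes and namespaces are left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical table of pull events; row i describes the i-th recorded event. */
class StaxEventTable {
    private final List<Integer> type = new ArrayList<>();   // StAX event type
    private final List<String> value = new ArrayList<>();   // local name or text
    private final List<Integer> parent = new ArrayList<>(); // row of enclosing element, -1 at top

    private int add(int eventType, String v, int parentRow) {
        type.add(eventType);
        value.add(v);
        parent.add(parentRow);
        return type.size() - 1;
    }

    /** Pulls every event from the parser once and stores it as a row. */
    static StaxEventTable record(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        StaxEventTable table = new StaxEventTable();
        int open = -1; // row of the innermost element that is still open
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    open = table.add(XMLStreamConstants.START_ELEMENT, reader.getLocalName(), open);
                    break;
                case XMLStreamConstants.CHARACTERS:
                    table.add(XMLStreamConstants.CHARACTERS, reader.getText(), open);
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    open = table.parent.get(open);
                    table.add(XMLStreamConstants.END_ELEMENT, reader.getLocalName(), open);
                    break;
                default:
                    break; // other event types are ignored in this sketch
            }
        }
        reader.close();
        return table;
    }

    /** Stream view: walks the rows in order (text is written without escaping). */
    String replay() {
        StringBuilder out = new StringBuilder();
        for (int row = 0; row < type.size(); row++) {
            int t = type.get(row);
            if (t == XMLStreamConstants.START_ELEMENT) {
                out.append('<').append(value.get(row)).append('>');
            } else if (t == XMLStreamConstants.END_ELEMENT) {
                out.append("</").append(value.get(row)).append('>');
            } else {
                out.append(value.get(row));
            }
        }
        return out.toString();
    }

    /** Model view: names of the element children of the element at the given row. */
    List<String> childElements(int elementRow) {
        List<String> names = new ArrayList<>();
        for (int row = elementRow + 1; row < type.size(); row++) {
            if (parent.get(row) == elementRow && type.get(row) == XMLStreamConstants.START_ELEMENT) {
                names.add(value.get(row));
            }
        }
        return names;
    }
}
```

The linear scan in childElements is deliberately naive; a real table would likely keep first-child and next-sibling columns so that model-style traversal does not have to visit every row.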

Maybe we should concentrate on making the OM more of a StAX event
store if that pays off :) But we should do a bit of comparing first.
Thoughts?
Cheers
Srinath







Re: [Axis2] OM

Posted by Dennis Sosnoski <dm...@sosnoski.com>.

Sanjiva Weerawarana wrote:

>Hi Dennis,
>
>One of the reasons I'm very gung-ho about using a typed StAX approach
>is to enable one to do a schema-aware binary serialization of the
>type aware XML Infoset. Is the binary serialization you've defined
>schema aware? I know that's somewhat controversial (at least in IBM!)..
>so I'd be curious to know where you stand on it.
>  
>
I don't personally see that much need for schema-aware binary 
serializations. They can certainly be more efficient than generic XML 
serializations, but the cost of giving up XML compatibility seems high 
to me. Sun made a big deal of their performance gains from schema-aware 
serializations in Fast Web Services, but I get even better performance 
out of my JibxSoap framework while still using ordinary text XML (as far 
as I can tell - they don't publish source code, so I can't actually try 
it with the same data). See 
http://www.sosnoski.com/presents/cleansoap/comparing.html for details 
(which I'm planning to update with Axis 1.2 final whenever that's out). 
I guess the moral is that there are many other areas where performance 
can be improved before text becomes a real bottleneck in most systems.

I would like to see a format like XBIS that could be plugged in just 
like gzip compression over the wire. I think for many applications the 
tradeoffs of a text-preserving format that's both faster and more 
compact than plain text would be very useful. Unfortunately, Sun and 
that crowd are pushing the schema-aware serializations, while other 
groups are unwilling to see anything other than text (or gzipped text).

>In any case, would you consider getting involved to help implement 
>some of those alternate serializations possibilities? Your help would
>be greatly appreciated!
>  
>
I don't think I can take time for actual development work on Axis2. I've 
got my own JiBX data binding and JibxSoap web services to extend, as 
well as the paid projects that keep me going. If something's especially 
interesting the priorities are always subject to change, though!

  - Dennis

Re: [Axis2] OM

Posted by Sanjiva Weerawarana <sa...@opensource.lk>.

Hi Dennis,

One of the reasons I'm very gung-ho about using a typed StAX approach
is to enable one to do a schema-aware binary serialization of the
type aware XML Infoset. Is the binary serialization you've defined
schema aware? I know that's somewhat controversial (at least in IBM!)..
so I'd be curious to know where you stand on it.

In any case, would you consider getting involved to help implement 
some of those alternate serialization possibilities? Your help would
be greatly appreciated!

Sanjiva.

----- Original Message ----- 
From: "Dennis Sosnoski" <dm...@sosnoski.com>
To: <ax...@ws.apache.org>
Sent: Tuesday, November 02, 2004 11:35 PM
Subject: Re: [Axis2] OM


> Aleksander Slominski wrote:
> 
> > Dennis Sosnoski wrote:
> >
> >> First off, if you really want to keep performance high then I urge 
> >> you not to build a model. I'd instead suggest something like a parse 
> >> event store that you can replay on demand using StAX, SAX, or custom 
> >> APIs. ...
> >
> >
> > XmlBeans uses model too (DOM2 like store). it would be interesting to 
> > see independent performance results including memory footprint
> 
> I've been meaning to run a test (and include .NET's data binding in 
> hopes of getting permission to show the results - I think there's been 
> some legal changes that mean they'll be removing their restrictions in 
> the future). Hopefully I'll get this done in the next couple of weeks.
> 
> >
> >>
> >> I've also implemented a simple parse event store for my XBIS project 
> >> (http://www.xbis.org - the parse event store is currently designed 
> >> around SAX, and can be found in the eventstore package 
> >> http://xbis.sourceforge.net/api/index.html). This gave excellent 
> >> performance (I think replaying the event stream at least 10X parser 
> >> speed) at a reasonable memory cost (about 2X the actual size of the 
> >> document text for the cases I looked at) without much work on 
> >> optimization. Working with even an efficient document model is likely 
> >> going to be both considerably slower and considerably heavier in 
> >> memory usage.
> >
> >
> > i think that is what we try to do but optimized for SOAP where SOAP 
> > headers are kept in memory and accessible in whatever API is 
> > needed/standard such as DOM, SAAJ, etc. - we have to optimize for this 
> > common case however SOAP body can be retrieved by app directly as 
> > event stream if required.
> >
> > currently thinking is around using SAX, StAX, XPP streams but having 
> > even more optimized stream should only make things better :)
> 
> Both Srinath's reply and this one clarified this for me. I guess I'd 
> taken the discussion of a "model" too literally and not looked into the 
> details of the implementation. Sounds like you're heading in a good 
> direction (including with the OMNamespace)!
> 
> >
> >> If you actually designed a parse event stream interface rather than 
> >> working with either SAX or StAX you could probably push the 
> >> efficiency even higher (in other words, use the event store as an 
> >> adapter between the parser and your own internal event stream API).
> >
> >
> >
> > if you have done all hard work then i see why not try to use it :)
> >
> > what is the license for your API/source code? can it be included in 
> > AXIS2 (either in OM core or as optional component)?
> >
> My code's BSD in any case, but I'd gladly donate it to the project if 
> you think it's useful. The code's in the download and also in 
> SourceForge CVS from the project site (http://www.xbis.org). I wasn't 
> referring to my code in particular as the event store, though.
> 
> Except for the namespace issue I think the XMLPull API is great for 
> working with this type of store. If you'd extend the XMLPull API to use 
> namespace objects (which I'd love to see) you could use that directly. I 
> don't know about StAX - certainly the object API version is going to add 
> a lot of  overhead, and I don't know the details of how the StAX 
> low-level API differs from XMLPull.
> 
>   - Dennis
>

[Axis2] tomorrow chat and my take on AXIOM [Re: [Axis2] OM

Posted by Aleksander Slominski <as...@cs.indiana.edu>.

Dennis Sosnoski wrote:

> BTW, the XBIS code also contains timing test implementations so you 
> can see how long it takes to parse a document directly, to parse to 
> the XBIS event store, and to replay the parse from the event store (as 
> well as the document format tests that are the main focus of XBIS). It 
> only supports SAX parser comparisons at present (and SAX parse event 
> stream replay) because that was what I wanted to test against (most 
> widely used, and also the only form supported by the JAXP transform 
> for generating output document text).

I will take a look at XBIS in the context of AXIOM, and I will also look at 
minimizing memory footprint and at reuse, such as OMNamespace and directly 
using String as children of OMNode - that should not be too difficult if 
java.util.List is used, as it is a very flexible (LinkedList and ArrayList) 
and well-tested storage mechanism.
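
A small sketch of that idea, assuming a node type whose child list holds either plain String objects (text) or nested elements, with no wrapper node for text. SketchElement is a hypothetical name and is not the AXIOM OMNode/OMElement API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical element node; children live in a plain java.util.List. */
class SketchElement {
    final String localName;
    // Each child is either a String (text) or a SketchElement.
    private final List<Object> children = new ArrayList<>();

    SketchElement(String localName) {
        this.localName = localName;
    }

    SketchElement addText(String text) {
        children.add(text); // stored directly, no text-node wrapper
        return this;
    }

    SketchElement addChild(SketchElement child) {
        children.add(child);
        return this;
    }

    /** Read-only view of the children, in document order. */
    List<Object> children() {
        return Collections.unmodifiableList(children);
    }

    /** Concatenated text of this element and all of its descendants. */
    String textContent() {
        StringBuilder out = new StringBuilder();
        for (Object child : children) {
            if (child instanceof String) {
                out.append((String) child);
            } else {
                out.append(((SketchElement) child).textContent());
            }
        }
        return out.toString();
    }
}
```

Swapping ArrayList for LinkedList changes only the field initializer; the price of storing bare Strings is an instanceof check wherever the children are walked.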

One of the remaining OM API issues is what the API to expose the SOAP:Body 
stream should look like, and what we want to do to support MTOM (I see it as 
high priority) and SAAJ/JAXM (I think it must be layered and not baked into 
AXIOM ...)

thanks,

alek


-- 
The best way to predict the future is to invent it - Alan Kay

Re: [Axis2] OM

Posted by Dennis Sosnoski <dm...@sosnoski.com>.

BTW, the XBIS code also contains timing test implementations so you can 
see how long it takes to parse a document directly, to parse to the XBIS 
event store, and to replay the parse from the event store (as well as 
the document format tests that are the main focus of XBIS). It only 
supports SAX parser comparisons at present (and SAX parse event stream 
replay) because that was what I wanted to test against (most widely 
used, and also the only form supported by the JAXP transform for 
generating output document text).

  - Dennis

Re: [Axis2] OM

Posted by Dennis Sosnoski <dm...@sosnoski.com>.

Aleksander Slominski wrote:

> Dennis Sosnoski wrote:
>
>> First off, if you really want to keep performance high then I urge 
>> you not to build a model. I'd instead suggest something like a parse 
>> event store that you can replay on demand using StAX, SAX, or custom 
>> APIs. ...
>
>
> XmlBeans uses model too (DOM2 like store). it would be interesting to 
> see independent performance results including memory footprint

I've been meaning to run a test (and include .NET's data binding in 
hopes of getting permission to show the results - I think there have been 
some legal changes that mean they'll be removing their restrictions in 
the future). Hopefully I'll get this done in the next couple of weeks.

>
>>
>> I've also implemented a simple parse event store for my XBIS project 
>> (http://www.xbis.org - the parse event store is currently designed 
>> around SAX, and can be found in the eventstore package 
>> http://xbis.sourceforge.net/api/index.html). This gave excellent 
>> performance (I think replaying the event stream at least 10X parser 
>> speed) at a reasonable memory cost (about 2X the actual size of the 
>> document text for the cases I looked at) without much work on 
>> optimization. Working with even an efficient document model is likely 
>> going to be both considerably slower and considerably heavier in 
>> memory usage.
>
>
> i think that is what we try to do but optimized for SOAP where SOAP 
> headers are kept in memory and accessible in whatever API is 
> needed/standard such as DOM, SAAJ, etc. - we have to optimize for this 
> common case however SOAP body can be retrieved by app directly as 
> event stream if required.
>
> currently thinking is around using SAX, StAX, XPP streams but having 
> even more optimized stream should only make things better :)

Both Srinath's reply and this one clarified this for me. I guess I'd 
taken the discussion of a "model" too literally and not looked into the 
details of the implementation. Sounds like you're heading in a good 
direction (including with the OMNamespace)!

>
>> If you actually designed a parse event stream interface rather than 
>> working with either SAX or StAX you could probably push the 
>> efficiency even higher (in other words, use the event store as an 
>> adapter between the parser and your own internal event stream API).
>
>
>
> if you have done all hard work then i see why not try to use it :)
>
> what is the license for your API/source code? can it be included in 
> AXIS2 (either in OM core or as optional component)?
>
My code's BSD in any case, but I'd gladly donate it to the project if 
you think it's useful. The code's in the download and also in 
SourceForge CVS from the project site (http://www.xbis.org). I wasn't 
referring to my code in particular as the event store, though.

Except for the namespace issue I think the XMLPull API is great for 
working with this type of store. If you'd extend the XMLPull API to use 
namespace objects (which I'd love to see) you could use that directly. I 
don't know about StAX - certainly the object API version is going to add 
a lot of overhead, and I don't know the details of how the StAX 
low-level API differs from XMLPull.

  - Dennis

Re: [Axis2] OM

Posted by Aleksander Slominski <as...@cs.indiana.edu>.

Dennis Sosnoski wrote:

> I spent the weekend catching up with the last couple of months of Axis 
> emails and saw some of the activity around the OM. I have a few 
> thoughts I wanted to offer on this.
>
> First off, if you really want to keep performance high then I urge you 
> not to build a model. I'd instead suggest something like a parse event 
> store that you can replay on demand using StAX, SAX, or custom APIs. 
> Models are expensive in terms of both time and memory. There's been 
> talk of integrating in XMLBeans, and I know XMLBeans already has some 
> form of backing event store for everything it does. I haven't looked 
> into the performance of XMLBeans, but something like that backing 
> store would probably be a great basis for what you need (and even has 
> XPath and such already implemented on top of it).

XmlBeans uses a model too (a DOM2-like store). It would be interesting to 
see independent performance results, including memory footprint.

>
> I've also implemented a simple parse event store for my XBIS project 
> (http://www.xbis.org - the parse event store is currently designed 
> around SAX, and can be found in the eventstore package 
> http://xbis.sourceforge.net/api/index.html). This gave excellent 
> performance (I think replaying the event stream at least 10X parser 
> speed) at a reasonable memory cost (about 2X the actual size of the 
> document text for the cases I looked at) without much work on 
> optimization. Working with even an efficient document model is likely 
> going to be both considerably slower and considerably heavier in 
> memory usage.

I think that is what we are trying to do, but optimized for SOAP, where SOAP 
headers are kept in memory and accessible through whatever API is 
needed/standard, such as DOM, SAAJ, etc. - we have to optimize for this 
common case; however, the SOAP body can be retrieved by the app directly as 
an event stream if required.

Current thinking is around using SAX, StAX, and XPP streams, but having 
an even more optimized stream should only make things better :)
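
As a sketch of that split, using StAX: the header blocks are consumed and kept in memory, while the reader is handed back positioned on the Body so the application can pull the rest itself. SoapEnvelopeSplit is a hypothetical class for illustration, not the AXIOM design. It keeps only the header block names and matches on local names; a real implementation would also check the SOAP namespace URI and store the full header content.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical split of a SOAP message: buffered headers, streamed body. */
class SoapEnvelopeSplit {
    /** Local names of the header blocks, held in memory. */
    final List<String> headerBlocks;
    /** Reader positioned on the Body start tag; nothing past it has been read. */
    final XMLStreamReader body;

    private SoapEnvelopeSplit(List<String> headerBlocks, XMLStreamReader body) {
        this.headerBlocks = headerBlocks;
        this.body = body;
    }

    static SoapEnvelopeSplit open(String xml) throws XMLStreamException {
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> headers = new ArrayList<>();
        boolean inHeader = false;
        int depth = 0; // nesting depth below the Header element
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                if (!inHeader && "Body".equals(name)) {
                    // Stop here: the body stays unparsed until the app pulls it.
                    return new SoapEnvelopeSplit(headers, reader);
                }
                if (!inHeader && "Header".equals(name)) {
                    inHeader = true;
                    depth = 0;
                } else if (inHeader) {
                    if (depth == 0) {
                        headers.add(name); // a direct child of Header
                    }
                    depth++;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT && inHeader) {
                if (depth == 0) {
                    inHeader = false; // this is the end of Header itself
                } else {
                    depth--;
                }
            }
        }
        throw new XMLStreamException("no Body element found");
    }
}
```

Handlers that only look at headers never cause the body to be parsed; a service that wants the payload calls body.nextTag() and continues pulling events from there.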

>
> The real limitation I saw for a parse event store was just that the 
> parser APIs are inefficient for working with the data - attributes 
> have to be kept as memory-consuming Strings rather than just character 
> ranges, and in the case of SAX have to be organized into structures 
> for reporting; namespaces are passed in the form of URIs and prefixes 
> rather than objects (forcing applications to go through the same work 
> the parser has done to associate the two); etc. 

That is precisely the point of having OMNamespace in the OM API.

> If you actually designed a parse event stream interface rather than 
> working with either SAX or StAX you could probably push the efficiency 
> even higher (in other words, use the event store as an adapter between 
> the parser and your own internal event stream API).


If you have done all the hard work then I don't see why we shouldn't try 
to use it :)

What is the license for your API/source code? Can it be included in 
Axis2 (either in the OM core or as an optional component)?

thanks,

alek

-- 
The best way to predict the future is to invent it - Alan Kay