Posted to java-user@axis.apache.org by Martin Jericho <ma...@radiocity.com.au> on 2002/09/23 06:31:02 UTC

Streaming RPC calls

I have implemented a special class which allows streaming of complex bean types in an RPC call, but have found that AXIS stores the whole message in memory before sending / deserializing it anyway, which defeats the whole purpose of what I am trying to do.

The first problem is that AXIS doesn't support chunked HTTP messages, so the content length must be known before sending it.

QUESTION 1:  Are there any plans to implement chunked transfer encoding soon?
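For reference, HTTP/1.1 chunked transfer coding (RFC 2616, section 3.6.1) frames the body as a series of chunks, each prefixed with its length in hex, ended by a zero-length chunk. That framing is what removes the need to know the Content-Length before sending. The hypothetical `ChunkedOutputStream` below is a minimal Java sketch of the framing only (no trailers or chunk extensions). It is not an Axis class, and on current JDKs `HttpURLConnection.setChunkedStreamingMode(int)` does the framing for you.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: frames every write() as one HTTP/1.1 chunk. */
public class ChunkedOutputStream extends FilterOutputStream {
    private static final byte[] CRLF = {'\r', '\n'};

    public ChunkedOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (len == 0) return; // a zero-length chunk would end the body early
        out.write(Integer.toHexString(len).getBytes(StandardCharsets.US_ASCII));
        out.write(CRLF);
        out.write(b, off, len);
        out.write(CRLF);
    }

    /** Writes the terminating zero-length chunk; the total length was never needed. */
    public void finish() throws IOException {
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    /** Frames two writes of different sizes and returns what would go on the wire. */
    public static String demo() throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        ChunkedOutputStream chunked = new ChunkedOutputStream(wire);
        chunked.write("<item/>".getBytes(StandardCharsets.US_ASCII));
        chunked.write("<item/><item/>".getBytes(StandardCharsets.US_ASCII));
        chunked.finish();
        return wire.toString("US-ASCII");
    }
}
```

Each chunk can be sent as soon as the serializer produces it, which is exactly what a fixed Content-Length header rules out.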

The SOAPPart class stores the entire message as a byte array before it is sent across the wire, which, apart from the reason above, is unnecessary.

QUESTION 2:  Are there any plans to clean this up to allow direct streaming from the serializers onto the wire?
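A minimal sketch of what streaming "directly from the serializers onto the wire" could look like, assuming the StAX `XMLStreamWriter` API (javax.xml.stream, part of the JDK since Java 6, so well after this thread). This is not Axis's serializer API; the operation name and the `<item>` element are hypothetical. Each record is written to the `OutputStream` as it is produced, so nothing forces the whole envelope into a byte array first.

```java
import java.io.OutputStream;
import java.util.Iterator;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StreamingEnvelopeWriter {
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    /**
     * Writes an RPC-style envelope whose body holds one <item> per element of `items`.
     * Each item goes to `out` as soon as it is produced; nothing accumulates in a byte array.
     */
    public static void write(OutputStream out, String operation, Iterator<String> items)
            throws XMLStreamException {
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
        w.writeStartDocument("UTF-8", "1.0");
        w.writeStartElement("soapenv", "Envelope", SOAP_ENV);
        w.writeNamespace("soapenv", SOAP_ENV);
        w.writeStartElement("soapenv", "Body", SOAP_ENV);
        w.writeStartElement(operation);
        while (items.hasNext()) {
            w.writeStartElement("item");
            w.writeCharacters(items.next()); // the writer escapes markup characters
            w.writeEndElement();
            w.flush(); // hand each record to the underlying stream right away
        }
        w.writeEndElement(); // operation
        w.writeEndElement(); // Body
        w.writeEndElement(); // Envelope
        w.writeEndDocument();
        w.flush();
        w.close(); // per the StAX contract, this does not close `out`
    }
}
```

If `out` were an HTTP request body sent with chunked encoding, client memory use would stay roughly constant however many items the iterator yields.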

I have only researched the client side to see how and why the entire message is stored in memory before sending it.  I have also noticed that the server stores it all in memory before deserializing it, but I haven't investigated whether there is any good reason for this.

QUESTION 3:  If AXIS won't support this any time soon, which I assume is the case, does anyone know whether the Sun JAXRPC reference implementation suffers the same problem?  I would also like to know whether the MS .NET implementation has the same problem if anyone knows about it.

QUESTION 4:  All I really want to do is send a large array of structured data in a platform independent way.  I would like to use the standard RPC encoding of SOAP to avoid having to define my own XML schemas for the data, but I'm not sure whether today's SOAP implementations are mature enough to use for this purpose.  What do other people do in this situation?  I can't imagine I'm the only one.

I would be more than happy to share my streaming serialization/deserialization code if anyone is interested, although it is not of much use with the current version of Axis.

Thanks
Martin Jericho




RE: Streaming RPC calls

Posted by Ricky Ho <ri...@cisco.com>.
Hi Graham,

Can an interceptor be "stream based" as well (something like a SAX callback
API)?
I agree with you about the added complexity and technical challenges, but I
also see that it would be valuable in some cases.
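One possible shape for a "stream based" interceptor is a SAX filter: it sees each parse event as it passes and forwards it downstream, without ever holding the whole message. The sketch below uses the standard `org.xml.sax.helpers.XMLFilterImpl`; the `CountingInterceptor` name and its item-counting behaviour are hypothetical, chosen only to show where interceptor logic would go.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical streaming interceptor: observes (or could rewrite) events as they flow past. */
public class CountingInterceptor extends XMLFilterImpl {
    private final String watched;
    private int count;

    public CountingInterceptor(XMLReader parent, String watched) {
        super(parent);
        this.watched = watched;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (watched.equals(localName)) count++;
        super.startElement(uri, localName, qName, atts); // forward the event unchanged
    }

    public int count() {
        return count;
    }

    /** Parses `xml`, passes every event through the interceptor, and re-serializes to `out`. */
    public static int run(String xml, StringWriter out) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();
        CountingInterceptor interceptor = new CountingInterceptor(parser, "item");
        TransformerFactory.newInstance().newTransformer().transform(
                new SAXSource(interceptor, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return interceptor.count();
    }
}
```

The trade-off Graham raises still applies: a filter like this cannot look ahead or revisit earlier elements the way an XSLT transform over a DOM can.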

Rgds, Ricky

At 02:54 AM 9/23/2002 -0500, you wrote:
>hi martin,
>
>glue currently creates a dom representation of a soap message before
>streaming.
>funnily enough, when doing benchmarking of soap messages that have an
>on-the-wire representation > 1Mbyte, this approach seems to be faster than
>vendors that tout a pure streaming approach.
>
>i question whether your requirement of streaming millions of records in a
>single message is appropriate. is there not a better way in which the client
>requests a subset during a particular operation? alternatively, passing the
>records as a SOAP attachment bypasses this problem completely.
>
>the main drawback that we've found regarding a pure streaming architecture
>(we have considered this many times in the past) is the extra complexity it
>injects at the API level for those who want to add interceptors, XSLT
>transforms, etc.
>
>cheers,
>graham
>-----Original Message-----
>From: Martin Jericho [mailto:martin.j@radiocity.com.au]
>Sent: Monday, September 23, 2002 1:29 AM
>To: axis-user@xml.apache.org
>Subject: Re: Streaming RPC calls
>
>Hi Graham
>
>Thanks for your interest.  I have included some source files to show you 
>how I did the streaming.  It's not a complete set so it doesn't compile, 
>but it should give you an idea how the client side works at least.  (Sorry 
>for posting an attachment everyone!  It's only 10K)
>
>The original BatchItemStreamArray class as generated by WSDL2Java simply 
>contains an array of BatchItem objects called batchItems.  Either the 
>client or server can use this default implementation, at the risk of 
>running into memory problems if the array is large.  The new 
>BatchItemStreamArray class emulates the original class, but extends the 
>StreamArray class, which contains all the hooks necessary to 
>serialize/deserialize on the fly without having to actually create the 
>array.  It's all a bit unrefined at the moment, since I'm just doing a 
>proof of concept as a first step.
>
>The application I am designing must be able to cope with hundreds of 
>thousands or even millions of records.  The size of each record is 
>dynamic, so I can't put an exact figure on the number of MB.
>
>Does Glue have the same limitations as axis in this regard?
>
>----- Original Message -----
>From: graham glass <ma...@themindelectric.com>
>To: axis-user@xml.apache.org
>Sent: Monday, September 23, 2002 2:38 PM
>Subject: RE: Streaming RPC calls
>
>hi martin,
>
>out of interest, how would you write an interceptor to process inputs and
>outputs during the streaming process? one advantage of having a DOM
>representation is that it's easy to manipulate during inbound/outbound
>messaging using tools like xalan. would you provide some kind of streaming
>interceptor interface?
>
>also, out of interest, how large is the data you are sending (on the wire)?
>
>cheers,
>graham
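The attached source is not reproduced in the archive, so the following is only a hypothetical reconstruction of the idea Martin describes, not his code: an object that presents a fixed-length array of `BatchItem`s to the serializer but produces each element on demand. The length is kept separately because a SOAP 1.1 encoded array declares its size up front in the `arrayType` attribute. `StreamArray` and `BatchItem` here are stand-ins.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

public class StreamArrayDemo {
    /** Hypothetical stand-in for a generated bean such as BatchItem. */
    public static final class BatchItem {
        final int id;

        BatchItem(int id) {
            this.id = id;
        }
    }

    /**
     * Hypothetical: looks like an array of known length to the serializer, but each element
     * is produced on demand, so only the element being serialized needs to exist in memory.
     */
    public static final class StreamArray<T> implements Iterable<T> {
        private final int length;
        private final IntFunction<T> producer;

        public StreamArray(int length, IntFunction<T> producer) {
            this.length = length;
            this.producer = producer;
        }

        /** Known before any element exists, as an encoded array's arrayType requires. */
        public int getLength() {
            return length;
        }

        @Override
        public Iterator<T> iterator() {
            return new Iterator<T>() {
                private int next = 0;

                @Override
                public boolean hasNext() {
                    return next < length;
                }

                @Override
                public T next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    return producer.apply(next++);
                }
            };
        }
    }

    /** Walks n items the way a streaming serializer would, one at a time. */
    public static long sumIds(int n) {
        StreamArray<BatchItem> items = new StreamArray<>(n, BatchItem::new);
        long sum = 0;
        for (BatchItem item : items) sum += item.id; // each item is garbage once "written"
        return sum;
    }
}
```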

Re: Streaming RPC calls

Posted by Ricky Ho <ri...@cisco.com>.
"Streaming" also means the processing is "SEQUENTIAL".

Rgds, Ricky
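To make the "sequential" point concrete: with a pull parser such as StAX's `XMLStreamReader` (standard in Java 6 and later), each `<item>` is handled as it is read and can then be discarded, but there is no going back to an earlier item. A minimal sketch, with hypothetical element names and a trivial "process" step (summing the values):

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class SequentialReader {
    /** Visits each <item> once, in document order; only the current item's text is held. */
    public static long sumItems(Reader in) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        long sum = 0;
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT && "item".equals(r.getLocalName())) {
                    // getElementText() consumes up to </item>; earlier items are already gone.
                    sum += Long.parseLong(r.getElementText().trim());
                }
            }
        } finally {
            r.close(); // does not close the underlying Reader
        }
        return sum;
    }
}
```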


At 07:44 PM 9/23/2002 -0300, Rogerio Saran wrote:
>Martin, forget webservices. Nowadays they are all about "pull" or 
>"submit", and you want to "push" data.


Re: Performance problems with RPC messages over 20k

Posted by Dennis Sosnoski <dm...@sosnoski.com>.
WJCarpenter wrote:

>>the times). These figures are from Sun JRE 1.3.1 on Linux, running on a
>>PIIIm with 256MB RAM. I used "-Xmx64M -Xms64M" options on the Java
>>command line to avoid a lot of thrashing as the heap grew; running with
>
>I am curious if you measured heap use and if 64 MB is enough?  I haven't
>done any testing of this sort with Axis, but in Apache SOAP I routinely
>use 256 MB and it is often worth it.  Anyhow, it would be interesting
>to hear about memory figures for Axis, too.

From a quick look with "-verbose:gc" on the client JVM the 320KB
messages are showing several partial garbage collections for each 
request/response round trip, with a total of about 12MB collected. In a 
run of 11 round trips I had one full garbage collection after the 10th 
round trip, which collected about 58MB. Judging from this it looks like 
the total trash generated on the client side is about 18MB for each 
round trip of my 320KB message.
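The 18MB figure appears to come from adding the roughly 12MB of partial-collection garbage per round trip to the 58MB full collection spread over the ten round trips before it: 12 + 58/10 = 17.8, or about 18MB. The sketch below reproduces that arithmetic and, as an alternative to reading `-verbose:gc` output, measures allocation directly with `com.sun.management.ThreadMXBean.getThreadAllocatedBytes`. That method is a HotSpot/OpenJDK extension, so the sketch returns -1 where it is unavailable; the `Runnable` is a hypothetical stand-in for a real request/response.

```java
import java.lang.management.ManagementFactory;

public class GarbagePerRoundTrip {
    /** Partial-GC garbage per trip plus one full GC's volume spread over the trips it covered. */
    public static double estimateMB(double partialPerTripMB, double fullGcMB, int tripsCovered) {
        return partialPerTripMB + fullGcMB / tripsCovered;
    }

    /**
     * Bytes allocated by the current thread while running one round trip.
     * Uses a HotSpot/OpenJDK extension; returns -1 if it is unsupported or disabled.
     */
    public static long allocatedBytes(Runnable roundTrip) {
        java.lang.management.ThreadMXBean std = ManagementFactory.getThreadMXBean();
        if (!(std instanceof com.sun.management.ThreadMXBean)) return -1;
        com.sun.management.ThreadMXBean bean = (com.sun.management.ThreadMXBean) std;
        if (!bean.isThreadAllocatedMemorySupported() || !bean.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long id = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(id);
        roundTrip.run();
        return bean.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        System.out.println(estimateMB(12, 58, 10) + " MB per round trip (estimate)");
    }
}
```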

That's pretty high, but consistent with what I've seen of the code. 
There's a lot of short-lived object creation. A lot of it looks tied to 
the JAX-RPC interface, and that's going to be difficult to change.

From looking at these figures it doesn't look like adding more memory
is going to help on the client side, unless you're sending really huge 
(multi-MB) messages - in which case you should probably not be using 
SOAP. :-) If you run with the default JVM settings you start with only 
2MB, which is definitely not enough, but setting "-Xms32M" or "-Xms64M" 
should be more than enough for any practical client applications. For 
the server more memory is definitely going to be useful, especially for 
real world applications with multiple overlapping requests. Just how 
much depends on the message size and rate, as well as other demands on 
the server - it's probably good to start with at least "-Xmx64M -Xms64M" 
and try going up from there to see if it helps your particular environment.
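To confirm what heap a JVM actually received from `-Xms`/`-Xmx`, the standard `Runtime` methods are enough. A small sketch; note that the reported maximum can be slightly below the `-Xmx` value on some collectors.

```java
public class HeapCheck {
    public static long mb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (about -Xmx):   " + mb(rt.maxMemory()) + " MB");
        System.out.println("heap currently reserved: " + mb(rt.totalMemory()) + " MB");
        System.out.println("free within that heap:   " + mb(rt.freeMemory()) + " MB");
    }
}
```

Running it as `java -Xms64M -Xmx64M HeapCheck` should report a reserved heap near 64 MB from the start.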

  - Dennis

Dennis M. Sosnoski
Enterprise Java, XML, and Web Services Support
http://www.sosnoski.com


Re: Performance problems with RPC messages over 20k

Posted by WJCarpenter <bi...@carpenter.org>.
> the times). These figures are from Sun JRE 1.3.1 on Linux, running on a
> PIIIm with 256MB RAM. I used "-Xmx64M -Xms64M" options on the Java
> command line to avoid a lot of thrashing as the heap grew; running with

I am curious if you measured heap use and if 64 MB is enough?  I haven't
done any testing of this sort with Axis, but in Apache SOAP I routinely
use 256 MB and it is often worth it.  Anyhow, it would be interesting
to hear about memory figures for Axis, too.




Re: Performance problems with RPC messages over 20k

Posted by Dennis Sosnoski <dm...@sosnoski.com>.
I investigated this further and found that there definitely is a problem 
in 1.0 with large messages using RPC encoding. With my particular test 
data it started showing up at the 320KB message size and got 
exponentially worse with larger sizes. I think I've tracked this down, 
and have entered a bug report and fix against the offending code. In my 
tests the fix keeps performance stable at least into the 1.3MB range.

That done, I figured I should correct my earlier, overly- (or at least 
prematurely-) optimistic, statement about Axis performance with large 
messages. :-)

  - Dennis

Dennis M. Sosnoski
Enterprise Java, XML, and Web Services Support
http://www.sosnoski.com



Re: Performance problems with RPC messages over 20k

Posted by Dennis Sosnoski <dm...@sosnoski.com>.
Martin,

I noticed this email a while ago and wanted to look into it. I see by 
your recent email that you're now getting away from using Axis, but 
thought it might be of interest to other people on the list anyway.

Assuming you were using separate client and server systems for this 
test, I suspect you'd see a similar curve using any SOAP implementation. 
If you consider how this works, when you send all the data as a single 
message you have a strictly sequential process - the request is generated
as text on the client, then sent to the server, then converted back into 
objects on the server, and finally processed by your server code. The 
response then goes through the same series of steps getting back to the 
client. When you break your data up into several requests you allow 
several of these steps to be executed in parallel. In particular, your 
client can be working on one request while an earlier request is being 
transmitted to the server, the server is working on an earlier request 
or response, and a still earlier response is being transmitted back to 
the client.
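The overlap described above can be sketched on the client side as a bounded pipeline: batches are built from an iterator and at most `inFlight` calls are outstanding at once. This is a hypothetical Java 8+ sketch, not an Axis feature; `sendBatch` stands in for whatever stub method actually sends one batch and returns the server's response.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

public class PipelinedSender {
    /**
     * Sends items in batches with up to `inFlight` calls outstanding, so the client can build
     * batch n+1 while batch n is on the wire or being processed by the server.
     */
    public static <T, R> List<R> sendAll(Iterator<T> items, int batchSize, int inFlight,
                                         Function<List<T>, R> sendBatch)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(inFlight);
        Semaphore slots = new Semaphore(inFlight);
        List<Future<R>> pending = new ArrayList<>();
        try {
            while (items.hasNext()) {
                List<T> batch = new ArrayList<>(batchSize);
                while (batch.size() < batchSize && items.hasNext()) batch.add(items.next());
                slots.acquire(); // wait until fewer than inFlight calls are outstanding
                pending.add(pool.submit(() -> {
                    try {
                        return sendBatch.apply(batch);
                    } finally {
                        slots.release(); // free the slot even if the call fails
                    }
                }));
            }
            List<R> results = new ArrayList<>(pending.size());
            for (Future<R> f : pending) results.add(f.get()); // results come back in batch order
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

`inFlight` caps memory at roughly `inFlight + 1` batches; with an `inFlight` of 1, only the building of the next batch overlaps the current call.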

If you ran your tests with client and server on the same system, I
wouldn't expect to see the kind of results you found. Let me know if
this is the case; perhaps there are some unusual aspects to your data
that account for the differences.

Seeing this did make me curious about Axis performance, though. In my 
own tests (running client and server on a single system) I found Axis 
performance went up at first as I increased the message size, then 
basically leveled off. Here's what my raw results look like:

Message size   Round-trip time (ms)
10KB           107
20KB           162
40KB           289
80KB           491
160KB          981
320KB          2000

Message sizes are the actual character count for the request and 
response, times are the average over 11 requests and responses, 
excluding the first request and response (to avoid bringing in class 
loading overhead and such - this is basically a constant added to all 
the times). These figures are from Sun JRE 1.3.1 on Linux, running on a 
PIIIm with 256MB RAM. I used "-Xmx64M -Xms64M" options on the Java 
command line to avoid a lot of thrashing as the heap grew; running with
the default settings will add more overhead to the handling time of 
larger messages initially, until the JVM gets enough memory to run 
efficiently.
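The measurement method described here (discard the first round trip, average the rest) is easy to reproduce. A hypothetical harness sketch in Java; the `Runnable` stands in for one request/response through the service stub, and `System.nanoTime` is used for wall-clock timing.

```java
public class RoundTripTimer {
    /**
     * Runs one untimed warm-up call (class loading, JIT and similar one-off costs), then
     * returns the mean wall-clock time of `timedCalls` further calls, in milliseconds.
     */
    public static double averageMillis(Runnable roundTrip, int timedCalls) {
        if (timedCalls <= 0) throw new IllegalArgumentException("timedCalls must be positive");
        roundTrip.run(); // warm-up, excluded from the average
        long start = System.nanoTime();
        for (int i = 0; i < timedCalls; i++) roundTrip.run();
        return (System.nanoTime() - start) / 1_000_000.0 / timedCalls;
    }

    public static void main(String[] args) {
        // Hypothetical workload standing in for a real service call.
        double ms = averageMillis(() -> new StringBuilder().append(Math.random()).toString(), 10);
        System.out.printf("average round trip: %.3f ms%n", ms);
    }
}
```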

My data consists of an object graph with variable numbers of objects. 
There are a lot of links between objects, so this might not be typical 
of what you'd see working with flatter data structures. My actual 
service processing just reverses the order of elements in arrays, so it 
doesn't contribute anything significant to the overall time.
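Dividing each round-trip time in the table by the message size expresses the "went up at first, then leveled off" pattern as a cost per kilobyte. A short sketch using only the figures above:

```java
public class CostPerKB {
    // Figures from the table above: message size (KB) and average round-trip time (ms).
    static final int[] SIZE_KB = {10, 20, 40, 80, 160, 320};
    static final int[] ROUND_TRIP_MS = {107, 162, 289, 491, 981, 2000};

    public static double[] msPerKB() {
        double[] cost = new double[SIZE_KB.length];
        for (int i = 0; i < cost.length; i++) cost[i] = (double) ROUND_TRIP_MS[i] / SIZE_KB[i];
        return cost;
    }

    public static void main(String[] args) {
        double[] cost = msPerKB();
        for (int i = 0; i < cost.length; i++) {
            System.out.printf("%4d KB  %6.2f ms/KB%n", SIZE_KB[i], cost[i]);
        }
    }
}
```

The cost falls from 10.7 ms/KB at 10KB to about 6.1 ms/KB at 80KB and 160KB, then sits at 6.25 ms/KB at 320KB.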

  - Dennis

Dennis M. Sosnoski
Enterprise Java, XML, and Web Services Support
http://www.sosnoski.com



Performance problems with RPC messages over 20k

Posted by Martin Jericho <ma...@radiocity.com.au>.
I was doing some benchmarking to test how much it would impact on
performance to break up a single, large request into several smaller ones.
I was expecting of course that for a fixed volume of data, dividing it into
more separate messages would increase the overheads and make things slower.

What I found was quite surprising.  It seems that once a message gets bigger
than 20KB, the response time increases at a rate much greater than the
linear relationship one would expect.  I did some tests with a bean
containing an array of other beans.  The size of the message with no array
elements is 1571 bytes, and each array element is 772 bytes.

The times recorded are from calling the service method on the client until
receiving the response back from the server (the response is just a string).

The results were as follows:

Number of calls   Array items per call   Total response time (s)
1                 1000                   20.7
2                 500                    13.6
4                 250                    9.9
5                 200                    9.7
10                100                    7.6
20                50                     7.2
40                25                     6.8
50                20                     6.9
100               10                     7.3
200               5                      9.4
250               4                      10.5
500               2                      15.4
1000              1                      25.6

So the most efficient way to send my 1000 beans was in 40 separate messages
each containing 25 beans, each of about 20KB in size.
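Assuming message size grows linearly with the quoted figures (1571 bytes empty, 772 bytes per array element), a 25-item message is 1571 + 25 * 772 = 20,871 bytes, which matches the roughly 20KB per message mentioned above. The sketch below tabulates that model against the measured times; it uses only the numbers from this post.

```java
public class MessageSize {
    // Figures from the post; assumes size grows linearly with the array length.
    static final int EMPTY_MESSAGE_BYTES = 1571;
    static final int BYTES_PER_ITEM = 772;

    static final int[] ITEMS_PER_CALL = {1000, 500, 250, 200, 100, 50, 25, 20, 10, 5, 4, 2, 1};
    static final double[] TOTAL_SECONDS =
            {20.7, 13.6, 9.9, 9.7, 7.6, 7.2, 6.8, 6.9, 7.3, 9.4, 10.5, 15.4, 25.6};

    public static int messageBytes(int items) {
        return EMPTY_MESSAGE_BYTES + BYTES_PER_ITEM * items;
    }

    /** Total bytes sent when the 1000 items are split into calls of `items` each. */
    public static long totalBytes(int items) {
        return (1000L / items) * messageBytes(items);
    }

    public static int fastestItemsPerCall() {
        int best = 0;
        for (int i = 1; i < TOTAL_SECONDS.length; i++) {
            if (TOTAL_SECONDS[i] < TOTAL_SECONDS[best]) best = i;
        }
        return ITEMS_PER_CALL[best];
    }

    public static void main(String[] args) {
        for (int i = 0; i < ITEMS_PER_CALL.length; i++) {
            int n = ITEMS_PER_CALL[i];
            System.out.printf("%4d items/call %7d bytes/message %8d bytes total %5.1f s%n",
                    n, messageBytes(n), totalBytes(n), TOTAL_SECONDS[i]);
        }
        System.out.println("fastest: " + fastestItemsPerCall() + " items per call");
    }
}
```

Under this model, 40 calls of 25 items send about 835KB in total against about 774KB for a single call, and 1000 single-item calls send about 2.3MB. The extra envelope bytes therefore do not explain why the single large call was slow.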

Does anyone know an explanation for this?  It seems to me that there must be
something in axis which has been very poorly implemented to cause this
blowout in performance.



Re: Streaming RPC calls

Posted by Vikas Manocha <vi...@yahoo.com>.
Martin,

I have a similar design problem. I need to transfer
huge amounts of data in a fast, efficient, scalable,
reliable manner. I am using SOAP to communicate
between the two systems and I would like to use SOAP
to do the data transfer also, rather than look at
something else.

Looking at where SOAP & Axis currently are, I have
decided to chunk the data at application level myself
and then send that as SOAP Attachments (ugly but
should work). In order to make efficient use of
bandwidth, I am also planning to use the zlib library
to compress the attachment.
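Compressing each application-level chunk with zlib needs nothing beyond `java.util.zip`, whose `Deflater` produces zlib-format data by default. A minimal sketch; `ChunkCompressor` is a hypothetical name, and how the compressed bytes are then attached to the SOAP message is left out because it depends on the attachment API in use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterOutputStream;

public class ChunkCompressor {
    /** Compresses one chunk into zlib format, trading some ratio for speed. */
    public static byte[] compress(byte[] chunk) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try (DeflaterOutputStream z = new DeflaterOutputStream(buf, deflater)) {
            z.write(chunk);
        } finally {
            deflater.end(); // a caller-supplied Deflater is not ended by close()
        }
        return buf.toByteArray();
    }

    /** Reverses compress(). */
    public static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (InflaterOutputStream z = new InflaterOutputStream(buf)) {
            z.write(compressed);
        }
        return buf.toByteArray();
    }
}
```

Repetitive XML, such as a run of similarly structured records, usually compresses well, so the CPU cost per chunk is often repaid in bandwidth.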

Ideally, I would have liked for SOAP & Axis to have
built in support to do this. I think when DIME is
accepted as a standard, that may be the right
solution. Currently you can get more information on
DIME at
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnglobspec/html/dimeindex.asp.

I am certainly interested in any alternative solutions
that other people can recommend. Also would love to
hear from you on what you finally decide to go with.

thx,

Vikas.




Re: Streaming RPC calls

Posted by Martin Jericho <ma...@radiocity.com.au>.
----- Original Message -----
From: "Rogerio Saran" <rs...@organox.com.br>
To: <ax...@xml.apache.org>
Sent: Tuesday, September 24, 2002 8:44 AM
Subject: Re: Streaming RPC calls


> Martin, forget webservices. Nowadays they are all about "pull" or
> "submit", and you want to "push" data.

I'm not too sure why you say I'm trying to "push" data.  All interaction is
initiated by the client.  Do I have an outdated notion of the terms push and
pull?

> Did you consider a messaging system?

No I haven't.  I thought SOAP would be the most appropriate because of the
cross-platform data encoding standard.  I haven't had much experience with
messaging.  Are there messaging systems out there that are cross-platform
and language independent?  I thought they were all tied to a particular
platform/language.  A built-in data encoding standard would also be very
nice, but not essential.  Do you have one in particular in mind?

>
> First, you probably want to chunk it to make each "delivery unit" more
> manageable. Next, you want them to be delivered as chunks, so a broken
> connection will not compromise the whole transfer. Finally you want it
> fast so you need to open several sockets at once.

I thought about this as well.  It was a compromise I was willing to make to
gain the benefits of SOAP.  I would certainly want the chunking to be
abstracted out of the API though.  I don't want to have to worry about it on
the application level.

>
> If you need to exchange a lot of data it does not makes sense to force
> it through a single TCP connection. Most servers have a limited ability
>   to reach high speed in a single connection due to TCP handshaking
> limitations.
>
> A quick and dirty recipe to transfer a zillion records of structured
> data as a stream:
>
> a) Again, forget webservices and RPC.
>     This is a raw data transfer problem.

That is exactly what it isn't.  It's not raw data, it's structured data.  It
has to be understood by different systems.  If I could borrow the XML
encoding of SOAP and use it in a messaging system I would be all for it, but
that sounds suspiciously like SOAP again, only using a different transport
protocol.

>
> b) Chunk your data and pack it in well formed XML documents, making them
> suitable to be handled in lots of platforms.
>
> c) Send them through a messaging system. The simpler, the better. Is
> SMTP good enough? Go for it. If you want a sophisticated, "dog wags the
> tail" solution, there are also plenty of message-queue servers around.
>
> d) Need to ensure chunk sequence? Here you will cover your hands with
> some dirt to write a transport handshake mechanism, like windows in TCP.

Sequence doesn't matter too much, but the client needs to get an answer back
when the entire message has been processed successfully.  I did not want to
have the client acting as a server as well because of firewall issues, and
polling for the response is unresponsive and clunky, so SMTP is definitely
out of the question.

I'm certainly not committed to SOAP, but I would appreciate it if you could
give me a pointer to even one example of an appropriate messaging system.

Thanks
Martin



Re: Streaming RPC calls

Posted by Rogerio Saran <rs...@organox.com.br>.
Martin, forget webservices. Nowadays they are all about "pull" or 
"submit", and you want to "push" data.

Did you consider a messaging system?

First, you probably want to chunk it to make each "delivery unit" more 
manageable. Next, you want them to be delivered as chunks, so a broken 
connection will not compromise the whole transfer. Finally you want it 
fast so you need to open several sockets at once.

If you need to exchange a lot of data it does not make sense to force
it through a single TCP connection. Most servers have a limited ability
to reach high speed in a single connection due to TCP handshaking
limitations.

A quick and dirty recipe to transfer a zillion records of structured 
data as a stream:

a) Again, forget webservices and RPC.
    This is a raw data transfer problem.

b) Chunk your data and pack it in well formed XML documents, making them 
suitable to be handled in lots of platforms.

c) Send them through a messaging system. The simpler, the better. Is 
SMTP good enough? Go for it. If you want a sophisticated, "dog wags the 
tail" solution, there are also plenty of message-queue servers around.

d) Need to ensure chunk sequence? Here you will have to get your hands
dirty and write a transport handshake mechanism, like windows in TCP.
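Steps b) and d) can be sketched together in Java. ChunkCodec packs a chunk into a small, standalone XML document that carries its sequence number in a seq attribute. The element names are made up for illustration. ReorderBuffer is a simple receiving-side buffer that holds chunks arriving early and releases them only once the gap before them is filled. That is roughly the receiving half of a TCP-style window. Acknowledgements and retransmission, the sending half, are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class ChunkCodec {
    /** Minimal XML escaping for text content and attribute values. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }

    /** Packs one chunk as a standalone, well-formed XML document. */
    static String toXml(long seq, List<String> records) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<chunk seq=\"").append(seq).append("\">");
        for (String r : records) {
            sb.append("<record>").append(escape(r)).append("</record>");
        }
        return sb.append("</chunk>").toString();
    }
}

/** Releases chunks strictly in sequence order, buffering any that arrive early. */
final class ReorderBuffer {
    private final TreeMap<Long, String> early = new TreeMap<>();
    private long nextExpected = 0;

    /** Accepts one chunk and returns every chunk that is now deliverable in order. */
    List<String> accept(long seq, String doc) {
        List<String> ready = new ArrayList<>();
        if (seq < nextExpected) {
            return ready;   // duplicate of a chunk already delivered
        }
        early.put(seq, doc);
        while (!early.isEmpty() && early.firstKey() == nextExpected) {
            ready.add(early.pollFirstEntry().getValue());
            nextExpected++;
        }
        return ready;
    }

    /** Number of chunks waiting for an earlier gap to be filled. */
    int pending() {
        return early.size();
    }
}
```

The escaping only covers &, <, > and the double quote; characters that are illegal in XML 1.0 (most control characters) would need separate handling.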


*Saran

Martin Jericho wrote:
>                  
>                 QUESTION 4:  All I really want to do is send a large
>                 array of structured data in a platform independent way. 
>                 I would like to use the standard RPC encoding of SOAP to
>                 avoid having to define my own XML schemas for the data,
>                 but I'm not sure whether today's SOAP implementations
>                 are mature enough to use for this purpose.  What do
>                 other people do in this situation?  I can't imagine I'm
>                 the only one.
>                  
>                  
>                 Thanks
>                 Martin Jericho
>                  
>                  
>                  
> 



Re: Streaming RPC calls

Posted by Martin Jericho <ma...@radiocity.com.au>.
Graham,

Could you let me know which are the vendors that tout a pure streaming approach?  (I realise you probably don't want to promote competitors too much, don't tell the boss!)

The requirement to stream millions of records is quite appropriate: the client isn't just a dumb user interface, and it has to send as well as receive large amounts of data.  Any attempt to chop it up would be purely an implementation workaround.

If I send it as an attachment I may as well abandon SOAP, since I would have to invent some way of encoding it in a cross-platform way.

Is there really no-one else who has faced this issue?  No comment from anyone else?

Martin

  ----- Original Message ----- 
  From: graham glass 
  To: axis-user@xml.apache.org 
  Sent: Monday, September 23, 2002 5:54 PM
  Subject: RE: Streaming RPC calls


  hi martin,
   
  glue currently creates a dom representation of a soap message before streaming.
  funnily enough, when doing benchmarking of soap messages that have an on-the-wire
  representation > 1Mbyte, this approach seems to be faster than vendors that
  tout a pure streaming approach.
   
  i question whether your requirement of streaming millions of records in a single
  message is appropriate. is there not a better way in which the client requests a
  subset during a particular operation? alternatively, passing the records as a SOAP
  attachment bypasses this problem completely.
   
  the main drawback that we've found regarding a pure streaming architecture
  (we have considered this many times in the past) is the extra complexity it
  injects at the API level for those who want to add interceptors, XSLT transforms, etc.
   
  cheers,
  graham




RE: Streaming RPC calls

Posted by graham glass <gr...@themindelectric.com>.
hi martin,

glue currently creates a dom representation of a soap message before streaming.
funnily enough, when doing benchmarking of soap messages that have an on-the-wire
representation > 1Mbyte, this approach seems to be faster than vendors that
tout a pure streaming approach.

i question whether your requirement of streaming millions of records in a single
message is appropriate. is there not a better way in which the client requests a
subset during a particular operation? alternatively, passing the records as a SOAP
attachment bypasses this problem completely.

the main drawback that we've found regarding a pure streaming architecture
(we have considered this many times in the past) is the extra complexity it
injects at the API level for those who want to add interceptors, XSLT transforms, etc.

cheers,
graham
  -----Original Message-----
  From: Martin Jericho [mailto:martin.j@radiocity.com.au]
  Sent: Monday, September 23, 2002 1:29 AM
  To: axis-user@xml.apache.org
  Subject: Re: Streaming RPC calls


  Hi Graham

  Thanks for your interest.  I have included some source files to show you how I did the streaming.  It's not a complete set so it doesn't compile, but it should give you an idea how the client side works at least.  (Sorry for posting an attachment everyone!  It's only 10K)

  The original BatchItemStreamArray class as generated by WSDL2Java simply contains an array of BatchItem objects called batchItems.  Either the client or server can use this default implementation, at the risk of running into memory problems if the array is large.  The new BatchItemStreamArray class emulates the original class, but extends the StreamArray class, which contains all the hooks necessary to serialize/deserialize on the fly without having to actually create the array.  It's all a bit unrefined at the moment, since I'm just doing a proof of concept as a first step.

  The application I am designing must be able to cope with hundreds of thousands or even millions of records.  The size of each record is dynamic, so I can't put an exact figure on the number of MB.

  Does Glue have the same limitations as axis in this regard?





Re: Streaming RPC calls

Posted by Martin Jericho <ma...@radiocity.com.au>.
Hi Graham

Thanks for your interest.  I have included some source files to show you how I did the streaming.  It's not a complete set so it doesn't compile, but it should give you an idea how the client side works at least.  (Sorry for posting an attachment everyone!  It's only 10K)

The original BatchItemStreamArray class as generated by WSDL2Java simply contains an array of BatchItem objects called batchItems.  Either the client or server can use this default implementation, at the risk of running into memory problems if the array is large.  The new BatchItemStreamArray class emulates the original class, but extends the StreamArray class, which contains all the hooks necessary to serialize/deserialize on the fly without having to actually create the array.  It's all a bit unrefined at the moment, since I'm just doing a proof of concept as a first step.
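The StreamArray source is not reproduced in this archive, so the following only illustrates the general idea and is not Martin's code. It is a Java sketch using the StAX API (javax.xml.stream, part of the JDK since Java 6, i.e. later than this thread). Items are written straight from an Iterator, and each parsed item is handed to a callback, so the full array is never materialized on either side. The element names batchItems and item, and the use of plain String items, are assumptions.

```java
import java.io.Reader;
import java.io.Writer;
import java.util.Iterator;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

final class StreamingArray {
    /** Writes items one at a time; only the current item is in memory. */
    static void write(Iterator<String> items, Writer out) throws XMLStreamException {
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeStartElement("batchItems");
        while (items.hasNext()) {
            w.writeStartElement("item");
            w.writeCharacters(items.next());   // the writer escapes &, < and >
            w.writeEndElement();
        }
        w.writeEndElement();
        w.writeEndDocument();
        w.flush();   // push buffered output through to `out`
    }

    /** Parses the array, handing each item to `sink` as soon as it is complete. */
    static long read(Reader in, Consumer<String> sink) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        long count = 0;
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT && "item".equals(r.getLocalName())) {
                    sink.accept(r.getElementText());   // reads up to the matching end tag
                    count++;
                }
            }
        } finally {
            r.close();
        }
        return count;
    }
}
```

This does not by itself solve the problem raised in this thread. If the framework buffers the whole message anyway (as described for SOAPPart), memory use still grows with the message size.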

The application I am designing must be able to cope with hundreds of thousands or even millions of records.  The size of each record is dynamic, so I can't put an exact figure on the number of MB.

Does Glue have the same limitations as axis in this regard?

  ----- Original Message ----- 
  From: graham glass 
  To: axis-user@xml.apache.org 
  Sent: Monday, September 23, 2002 2:38 PM
  Subject: RE: Streaming RPC calls


  hi martin,
   
  out of interest, how would you write an interceptor to process inputs and outputs during 
  the streaming process? one advantage of having a DOM representation is that it's easy to
  manipulate during inbound/outbound messaging using tools like xalan. would you provide 
  some kind of streaming interceptor interface?
   
  also, out of interest, how large is the data you are sending (on the wire)?
   
  cheers,
  graham




RE: Streaming RPC calls

Posted by graham glass <gr...@themindelectric.com>.
hi martin,

out of interest, how would you write an interceptor to process inputs and outputs during
the streaming process? one advantage of having a DOM representation is that it's easy to
manipulate during inbound/outbound messaging using tools like xalan. would you provide
some kind of streaming interceptor interface?

also, out of interest, how large is the data you are sending (on the wire)?
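As one possible answer to the interceptor question, SAX already has a filter abstraction. org.xml.sax.helpers.XMLFilterImpl sits between a parser and a ContentHandler and can change, drop or add events as they stream past, and filters can be chained. The Java sketch below is a hypothetical interceptor that masks the text of one named element (password in the example). It also shows the trade-off raised above: each interceptor sees every event once and cannot look ahead or back. That is why tree-based tools such as most XSLT processors are generally harder to use in a pure streaming pipeline.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical streaming interceptor: replaces the text of one element with "****". */
final class MaskingInterceptor extends XMLFilterImpl {
    private final String elementToMask;
    private boolean inside = false;   // currently inside the masked element
    private boolean masked = false;   // placeholder already emitted for this element

    MaskingInterceptor(XMLReader parent, String elementToMask) {
        super(parent);
        this.elementToMask = elementToMask;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (localName.equals(elementToMask)) {
            inside = true;
            masked = false;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (localName.equals(elementToMask)) {
            inside = false;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (!inside) {
            super.characters(ch, start, length);   // pass everything else through unchanged
        } else if (!masked) {
            // The parser may split text into several calls; emit the placeholder once.
            char[] stars = "****".toCharArray();
            super.characters(stars, 0, stars.length);
            masked = true;
        }
    }

    /** Runs a parser -> interceptor -> handler pipeline and returns the text the handler saw. */
    static String textSeenDownstream(String xml, String elementToMask) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // so localName is populated
        XMLReader parser = factory.newSAXParser().getXMLReader();
        MaskingInterceptor interceptor = new MaskingInterceptor(parser, elementToMask);
        StringBuilder seen = new StringBuilder();
        interceptor.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                seen.append(ch, start, length);
            }
        });
        interceptor.parse(new InputSource(new StringReader(xml)));
        return seen.toString();
    }
}
```

The sketch assumes the masked element contains only text, not child elements.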

cheers,
graham
  -----Original Message-----
  From: Martin Jericho [mailto:martin.j@radiocity.com.au]
  Sent: Sunday, September 22, 2002 11:31 PM
  To: axis-user@xml.apache.org
  Subject: Streaming RPC calls


  I have implemented a special class which allows streaming of complex bean types in an RPC call, but have found that AXIS stores the whole message in memory before sending / deserializing it anyway, which defeats the whole purpose of what I am trying to do.

  The first problem is that AXIS doesn't support chunked http messages, so the content length must be known before sending it.

  QUESTION 1:  Are there any plans to implement chunked transfer encoding soon?

  The SOAPPart class stores the entire message as a byte array before it is sent across the wire, which apart from the reason above is unnecessary.

  QUESTION 2:  Are there any plans to clean this up to allow direct streaming from the serializers onto the wire?

  I have only researched the client side to see how and why the entire message is stored in memory before sending it.  I have also noticed that the server stores it all in memory before deserializing it, but I haven't investigated whether there is any good reason for this.

  QUESTION 3:  If AXIS won't support this any time soon, which I assume is the case, does anyone know whether the Sun JAXRPC reference implementation suffers the same problem?  I would also like to know whether the MS .NET implementation has the same problem if anyone knows about it.

  QUESTION 4:  All I really want to do is send a large array of structured data in a platform independent way.  I would like to use the standard RPC encoding of SOAP to avoid having to define my own XML schemas for the data, but I'm not sure whether today's SOAP implementations are mature enough to use for this purpose.  What do other people do in this situation?  I can't imagine I'm the only one.

  I would be more than happy to share my streaming serialization/deserialization code if anyone is interested, although it is not of much use with the current version of axis.

  Thanks
  Martin Jericho
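On Question 1: with HTTP/1.1 chunked transfer coding (RFC 2616, section 3.6.1), a body is sent as a series of chunks. Each chunk is preceded by its size in hexadecimal, and a zero-size chunk ends the body. So the Content-Length need not be known in advance, and a serializer could write straight to the connection instead of into a byte array. The Java sketch below shows only the framing, as a FilterOutputStream. The class name ChunkedOutputStream is hypothetical, and the request line and the "Transfer-Encoding: chunked" header are assumed to be written separately. Chunk extensions and trailer headers are omitted. (Later JDKs, from Java 5 onward, also offer HttpURLConnection.setChunkedStreamingMode, which postdates this thread.)

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stream that frames its output as HTTP/1.1 chunked transfer coding. */
final class ChunkedOutputStream extends FilterOutputStream {
    private final byte[] buf;
    private int count = 0;
    private boolean closed = false;

    ChunkedOutputStream(OutputStream out, int chunkSize) {
        super(out);
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.buf = new byte[chunkSize];
    }

    @Override
    public void write(int b) throws IOException {
        if (count == buf.length) {
            writeChunk();
        }
        buf[count++] = (byte) b;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        while (len > 0) {
            if (count == buf.length) {
                writeChunk();
            }
            int n = Math.min(len, buf.length - count);
            System.arraycopy(b, off, buf, count, n);
            count += n;
            off += n;
            len -= n;
        }
    }

    /** Emits the buffered bytes as one chunk: hex size, CRLF, data, CRLF. */
    private void writeChunk() throws IOException {
        if (count == 0) {
            return;   // a zero-size chunk would end the body prematurely
        }
        out.write((Integer.toHexString(count) + "\r\n").getBytes(StandardCharsets.US_ASCII));
        out.write(buf, 0, count);
        out.write("\r\n".getBytes(StandardCharsets.US_ASCII));
        count = 0;
    }

    @Override
    public void flush() throws IOException {
        writeChunk();
        out.flush();
    }

    /** Ends the body with the zero-size last chunk and an empty trailer; leaves `out` open. */
    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        writeChunk();
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }
}
```

With a chunk size of 4, writing "hello world" and closing the stream produces three data chunks (sizes 4, 4 and 3) followed by the terminating zero-size chunk. The underlying stream is deliberately left open so the connection can be reused.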