Posted to soap-dev@ws.apache.org by WJCarpenter <bi...@carpenter.ORG> on 2004/02/28 02:11:20 UTC

eliminating one or two in memory copies of the response payload

I'm developing an obsession with getting rid of unnecessary copies
during response processing.  I believe the most common case still has
two in-memory versions of the response payload (before the parsing is
done, which you could count as a 3rd version).  I also think that in
the most common case at least one and maybe both of these could be
eliminated.  By "most common case", I mean an RPC call where the
response is a single-part XML (I didn't look at Message).  These
copies don't matter when the response is small, but I have some
multi-megabyte use cases, as do others I've heard mentioned here.
Those start to hurt under load.

Here are some notes I took while step debugging through the current
CVS code.

1.  Constructor to TransportMessage reads entire after-header response into
    a byte[] (starting around line #168).  In fact, for large
    responses, you get lots of collectible memory since the target
    buffer gets reallocated as it grows.  It starts at 4k and doubles
    on reallocation.  As a finishing stroke, the entire thing is
    reallocated at the end to make the size of the byte[] array
    exactly match the byte count.  So, for a "just under" 2 MB
    envelope, this will be scratch buffers of 4k, 8k, 16k, 32k, 64k,
    128k, 256k, 512k, 1024k, and 2048k; that's in addition to the
    non-collectible 2 MB byte[] for the final results.

2.  HTTPUtils.post, near line #675, calls TransportMessage.read() but
    ignores the String return value.  The innards of
    TransportMessage.read, near line #342, don't actually "read"
    anything, but constructs and keeps a reference to the SOAP
    envelope as a String (from the already read byte[]) and also calls
    SOAPContext.setRootPart with that same String.  Obviously, this
    makes a copy of the payload inside the String object.

3.  Call.invoke, near line #334, makes a call to
    Call.getEnvelopeString, which in turn, near line #261, calls
    SOAPContext.getEnvelope.  The reason for calling
    Call.getEnvelopeString is so that the String is available for use
    as part of reporting a parsing problem.  (The actual parsing
    exception is discarded in that case.)

4.  Call.invoke then passes the String from item 3 to
    XMLParseUtils.parse.
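
As a rough illustration of item 1, here is a minimal sketch of a
read-everything loop with a 4k starting buffer that doubles when full
and is trimmed to size at the end.  It is not the TransportMessage
source; the class name DoublingReadDemo and the allocations list are
made up for the example, and the exact point at which the real code
reallocates is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; this is not the Apache SOAP source.
class DoublingReadDemo {
    // Size in bytes of every array allocated by the last readAll call.
    static final List<Integer> allocations = new ArrayList<>();

    static byte[] readAll(InputStream in) throws IOException {
        allocations.clear();
        byte[] buf = new byte[4096];               // starts at 4k
        allocations.add(buf.length);
        int count = 0;
        int n;
        while ((n = in.read(buf, count, buf.length - count)) != -1) {
            count += n;
            if (count == buf.length) {             // full: double and copy
                byte[] bigger = new byte[buf.length * 2];
                allocations.add(bigger.length);
                System.arraycopy(buf, 0, bigger, 0, count);
                buf = bigger;                      // old buffer is now garbage
            }
        }
        byte[] result = new byte[count];           // final exact-size copy
        allocations.add(result.length);
        System.arraycopy(buf, 0, result, 0, count);
        return result;
    }
}
```

Feeding this a payload one byte short of 2 MB records ten scratch
arrays, from 4k up to 2048k, followed by the final exact-size array.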

Given the above, I'm thinking of cranking out a patch to do these
things:

A.  Constructor to TransportMessage will keep a reference to the
    InputStream and only read it into a byte[] when it has to.  In the
    usual case, I think it will never have to.

B.  HTTPUtils.post won't call TransportMessage.read() but will instead
    call some new method that returns void.  It will still have the
    side effect of calling SOAPContext.setRootPart, but instead of
    passing a byte[], it will use one of the overloads and pass a
    MimeBodyPart (constructed from a DataSource in turn constructed
    from the original SocketInputStream).

C.  Skip the call to Call.getEnvelopeString from item #3 above.
    Having the text of the SOAP envelope in the message about a
    parsing problem seems to me of frankly dubious value and so not
    worth forcing the read into a byte[] and conversion to String.

D.  Make the call to the overload of XMLParseUtils.parse that takes an
    InputSource, where that InputSource would be constructed from the
    original SocketInputStream.
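
To make item D concrete, here is a minimal sketch of parsing straight
from an InputStream by wrapping it in an InputSource.  It uses plain
JAXP as a stand-in for XMLParseUtils.parse, whose exact signature is
not shown here; the class name StreamParseDemo and the sample envelope
are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the InputSource overload of XMLParseUtils.parse.
class StreamParseDemo {
    static Document parse(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // The parser pulls bytes from the stream as it goes; no byte[] or
        // String copy of the whole payload is made by this code.
        return builder.parse(new InputSource(in));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Envelope><Body><result>42</result></Body></Envelope>";
        // A ByteArrayInputStream stands in for the socket stream here.
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Document doc = parse(in);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Note that once the parser has consumed the stream, nothing else can
read it again, which is why item C matters for this approach.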

I believe all of the above can be done with a reasonably small patch,
and for the usual case, the XML parser will be reading directly from
the SocketInputStream.  I'm imagining a few places where the state of
the payload can be one of InputStream, byte[], or String, with
on-demand conversion through that progression.  I also believe I will
be able to do this such that the non-usual cases won't suffer (they'll
really just end up forcing the conversions on demand where they would
have happened unconditionally in the current code).
-- 
bill@carpenter.ORG (WJCarpenter)    PGP 0x91865119
38 95 1B 69 C9 C6 3D 25    73 46 32 04 69 D6 ED F3

Re: eliminating one or two in memory copies of the response payload

Posted by WJCarpenter <bi...@carpenter.ORG>.

sn> One thing I notice.  In B you say "pass a MimeBodyPart
sn> (constructed from a DataSource in turn constructed from the
sn> original SocketInputStream)" and in D you say "where that
sn> InputSource would be constructed from the original
sn> SocketInputStream".  Since you can only read the InputStream once,
sn> you would need to construct the InputSource for D from the part
sn> you created in B.

Yeah, actually it's a little worse than that.  I found out that the
MimeBodyPart stuff reads the content immediately rather than streaming
it in/out as needed (at least I deduce that from clues in the JavaDoc
since I don't have sources for mailapi.jar), and my whole idea is to
avoid reading the stream until the parser does it.

Anyhow, I'll keep casting around until I find some handy class that
does that or maybe make a simple class that starts with a stream and
produces a stream, byte[], or String on demand.  At some logical layer
above that class, I'll make sure nobody needs to read the stream twice
(ie, nobody needs to get the byte[] or String after the stream has
already been read by the XML parser).
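
The "simple class" idea could look something like the sketch below.
This is only one possible shape, not the class from any actual patch;
the name LazyPayload, its methods, and the choice to throw
IllegalStateException on a second read are all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical helper; the name and API are assumptions, not Apache SOAP code.
class LazyPayload {
    private final InputStream stream;
    private final Charset charset;
    private boolean streamHandedOut;  // true once the raw stream was given away
    private byte[] bytes;             // filled in only on demand
    private String text;              // filled in only on demand

    LazyPayload(InputStream stream, Charset charset) {
        this.stream = stream;
        this.charset = charset;
    }

    // Gives the raw stream to one consumer (for example, the XML parser).
    InputStream getStream() {
        if (bytes != null) {
            return new ByteArrayInputStream(bytes);   // already buffered: re-readable
        }
        if (streamHandedOut) {
            throw new IllegalStateException("stream already handed out");
        }
        streamHandedOut = true;
        return stream;
    }

    // Forces the stream into memory; fails if the stream was already given away.
    byte[] getBytes() throws IOException {
        if (bytes == null) {
            if (streamHandedOut) {
                throw new IllegalStateException("stream already consumed");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            bytes = out.toByteArray();
        }
        return bytes;
    }

    // Converts the buffered bytes to a String on first use.
    String getString() throws IOException {
        if (text == null) {
            text = new String(getBytes(), charset);
        }
        return text;
    }
}
```

In the usual case only getStream() would be called, so neither the
byte[] nor the String is ever created; callers that need the other
forms pay for the conversion at that point.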

I may wail on this if I can get an undistracted few hours over the
next couple weeks.  If I don't make it by then, my guess is I'll never
get to it.
-- 
bill@carpenter.ORG (WJCarpenter)    PGP 0x91865119
38 95 1B 69 C9 C6 3D 25    73 46 32 04 69 D6 ED F3

Re: eliminating one or two in memory copies of the response payload

Posted by Scott Nichol <sn...@scottnichol.com>.

I encourage you to pursue this!

One thing I notice.  In B you say "pass a MimeBodyPart (constructed from a DataSource in turn constructed from the original SocketInputStream)" and in D you say "where that InputSource would be constructed from the original SocketInputStream".  Since you can only read the InputStream once, you would need to construct the InputSource for D from the part you created in B.

Scott Nichol

Do not send e-mail directly to this e-mail address,
because it is filtered to accept only mail from
specific mail lists.

RE: eliminating one or two in memory copies of the response payload

Posted by WJCarpenter <bi...@carpenter.ORG>.

wjc> I ran a series of tests where the response payload varied in 10
wjc> steps from just under 200 kbytes to just under 2 mbytes

wjc> The difference in collectible memory footprint is a bigger
wjc> mystery, though.

wjc> For example, at the 1 mbyte payload step, the difference is about
wjc> 9 mbytes, whereas the expected difference would be only about 5
wjc> mbytes.

Good news.  The mystery actually turned out to be a blunder on my
part.  I measured the payload in characters (String.length()) when I
had intended to count bytes.  So, adjusting for that blunder, the
expected worst case difference is 5x the payload size, and the
observed difference is consistently about 4.5x.  In other words, the
memory footprint improvement meets expectations.

Sorry for the noise.
-- 
bill@carpenter.ORG (WJCarpenter)    PGP 0x91865119
38 95 1B 69 C9 C6 3D 25    73 46 32 04 69 D6 ED F3

RE: eliminating one or two in memory copies of the response payload

Posted by WJCarpenter <bi...@carpenter.ORG>.

wjc> I'm developing an obsession with getting rid of unnecessary
wjc> copies during response processing.

A patch for this is attached to bug SOAP-166:

   http://nagoya.apache.org/jira/browse/SOAP-166

I'd appreciate it if anyone who can would give it a try and report pro
or con.  For me, it gave 10-15% speed improvement and a pretty good
memory footprint improvement.
-- 
bill@carpenter.ORG (WJCarpenter)    PGP 0x91865119
38 95 1B 69 C9 C6 3D 25    73 46 32 04 69 D6 ED F3

RE: eliminating one or two in memory copies of the response payload

Posted by WJCarpenter <bi...@carpenter.ORG>.

wjc> I'm developing an obsession with getting rid of unnecessary
wjc> copies during response processing.

To refresh your memories, I'm working on a patch that will allow
streaming the response payload directly into the XML parser inside
Call.invoke(), and I think I can also have a real stream inside the
DataSource returned by Message.receive().  A little while back, I
reported time improvements of about 5% for this in single-threaded
testing.  Now that I've done more organized testing, the difference is
more in the ballpark of 10-15% for time and 4-5x payload size
difference in scratch memory footprint (a 100 byte response on the
wire translates to a 100 character response in memory, which is 200
bytes in memory; the difference in the example would be 800-1000
bytes).

In other words, a respectable improvement in both factors.

It's mostly easy to make it completely transparent to stream response
payloads to the receiver, but it can't be 100% because of the
following factors, which lead to a couple of upward compatibility
questions for interested parties:

* As I mentioned in my original message, there's one place inside
Call.invoke where a scratch copy of the response is tucked away "just
in case" for possible use in exception text.  My guess is nobody
really cares about that case.

* SOAPContext contains a protected field "rootPartString" that holds a
reference to the response payload as a String.  The field is not
referenced outside its class, so the only upward compatibility problem
is for subclasses of SOAPContext.  There are no subclasses of
SOAPContext in the Apache SOAP distribution.  There is also an
accessor method, "getEnvelope", which returns the same thing.  So, is
this field (and the dozen or so other fields) protected for the
benefit of subclasses, or was it just somebody's habit to code fields
as protected?  More importantly, does it matter if I get rid of that
particular field?  My guess is that it doesn't matter.

* Same question as above for TransportMessage, except it contains two
such protected fields.  "bytes" is a reference to the payload as a
byte[] and "envelope" is a reference to it as a String (yeah, I
thought the same thing about that :-).  There are no subclasses of
TransportMessage in the Apache SOAP distribution.  So, likewise, were
those fields intentionally "protected", and does it matter if I get
rid of them?  Again, my guess is that it doesn't matter.

* Class TransportMessage implements Serializable.  Hrmmm?  Any
particular reason for this?  My guess is that it's completely
extraneous and a TransportMessage is never actually serialized (in the
java.io.Serializable sense).  The reason I make that bold claim is
because one of the fields inside TransportMessage is a SOAPContext,
and SOAPContext does *not* implement Serializable, which means any
attempt to actually serialize TransportMessage would blow up.  So,
does this "implements Serializable" matter?  As I said, my guess is
not. (The reason I ask is because I want to add another field inside
TransportMessage, and I don't want to go to the bother of properly
implementing serialization in that referenced class.)
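
The Serializable point can be checked with a small stand-alone sketch.
The classes below are stand-ins, not the real TransportMessage and
SOAPContext; all names are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-ins for TransportMessage and SOAPContext.
class SerializableFieldDemo {
    // Like SOAPContext: does NOT implement Serializable.
    static class Context { }

    // Like TransportMessage: Serializable, with a non-serializable field.
    static class Message implements Serializable {
        private static final long serialVersionUID = 1L;
        final Context ctx;
        Message(Context ctx) { this.ctx = ctx; }
    }

    // Returns true if the object serializes, false on NotSerializableException.
    static boolean canSerialize(Object o) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canSerialize(new Message(new Context())));  // false
        System.out.println(canSerialize(new Message(null)));           // true
    }
}
```

One caveat the sketch makes visible: Java serialization only fails when
the non-serializable field actually holds an object.  A null reference
(or a transient field) serializes without complaint, so "would blow up"
assumes the SOAPContext field is populated.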

Finally, a more general sort of question:  What's the confidence in
the Apache SOAP test code in detecting regression errors?  The change
I'm making is not conceptually complicated, but it is fairly invasive,
mostly because of the use of the fields mentioned above.  Not counting
a new utility class, the unified patch is about 27k, and a lot of that
is couple-of-lines changes scattered around.  I'm pretty confident in
being right for the code paths I test directly, but there are code
paths I don't test at all (and perhaps am not even aware of).  Is
running the test code likely to shake out any problems?
-- 
bill@carpenter.ORG (WJCarpenter)    PGP 0x91865119
38 95 1B 69 C9 C6 3D 25    73 46 32 04 69 D6 ED F3

RE: eliminating one or two in memory copies of the response payload

Posted by WJCarpenter <bi...@carpenter.ORG>.

wjc> I'm developing an obsession with getting rid of unnecessary
wjc> copies during response processing.

I got a chance to work on this over the weekend, and things went well.
(I have it working perfectly for the case that I care about, but I
haven't yet proved to myself that I didn't goof up cases that I don't
run.  How's the coverage in the Apache SOAP regression test suite?)
Anyhow, I thought I would report some initial numerical results
because they contain a bit of a puzzle.

I ran a series of tests where the response payload varied in 10 steps
from just under 200 kbytes to just under 2 mbytes (actually, just dumb
luck that it was nice round numbers).  My response data is fairly
typical SOAP encoding: "lots of little nested elements".  I compared
the results of running with my modification ("stream") versus the
status quo ("preread").  To avoid having made a blunder in the
"preread" case, I ran that test pass using an unmodified soap.jar
built from a current CVS snapshot.

To recap, for a simple single-part RPC call, "preread" reads the
entire response payload into a byte[] and then converts it to a String
before parsing.  The "stream" modification avoids the byte[] and
String versions and feeds the original InputStream into the XML
parser.

As you might guess, processing times are slightly better with "stream"
since the time to copy into memory is eliminated.  In my tests, the
ratio was pretty linear and averaged out to just under 4%.

The difference in collectible memory footprint is a bigger mystery,
though.  For this, I forced GC and then recorded free memory before
and after my individual stepped test calls.  I ran with a large
min/max heap to avoid spurious GCs inside a test call.  Given Java's
16-bit characters and the two in-memory copies described above, one
would've naively expected the difference between "stream" and
"preread" runs to be about 3 times the payload size.  That's a slight
miscalculation because the copy into a byte[] uses a doubling
algorithm and then resizes at the end of it all, so that's about 2x in
scratch buffers, for a total of 5x collectible memory. In fact, the
measured difference is pretty consistent across all the steps and is
approximately 9 times the payload size.

For example, at the 1 mbyte payload step, the difference is about 9
mbytes, whereas the expected difference would be only about 5 mbytes.
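
The 5x expectation can be reproduced with a little arithmetic.  The
sketch below is a model only, assuming a 4k starting buffer that
doubles whenever it fills, one char per payload byte, and 2 bytes per
char; the class and method names are made up for the example.

```java
// Hypothetical model of the "preread" collectible memory; not measured data.
class FootprintEstimate {
    // Sum of all doubling scratch buffers needed to hold payloadBytes.
    static long scratchBytes(long payloadBytes) {
        long cap = 4096;
        long total = cap;
        while (payloadBytes >= cap) {   // buffer fills up, so it is doubled
            cap *= 2;
            total += cap;
        }
        return total;
    }

    // Scratch buffers + final exact-size byte[] + String copy (2 bytes/char).
    static long expectedCollectible(long payloadBytes) {
        long finalArray = payloadBytes;
        long stringCopy = 2 * payloadBytes;
        return scratchBytes(payloadBytes) + finalArray + stringCopy;
    }

    public static void main(String[] args) {
        long justUnder1Mb = 1024 * 1024 - 1;
        long total = expectedCollectible(justUnder1Mb);
        System.out.println(total + " bytes, ratio "
                + (double) total / justUnder1Mb);
    }
}
```

For a payload just under 1 MB the model gives 2,093,056 bytes of
scratch buffers plus 3 x 1,048,575 bytes for the two copies, which is
5,238,781 bytes or roughly 5x the payload.  Under the same assumptions
the scratch term grows for a payload just over a power of two, so the
ratio depends on where the payload size falls.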

I don't know what to conclude about this except that maybe the XML
parser is a lot more efficient in memory at parsing from an
InputSource backed by an InputStream than it is parsing from one
backed by a String.  (It's possible that I've missed some in-memory
copying in soap.jar before getting to the XML parsing, but that seems
unlikely given the number of times I've step-debugged through this
recently.)  Anyhow, the good news is that the results for "stream" are
even better than expected.
-- 
bill@carpenter.ORG (WJCarpenter)    PGP 0x91865119
38 95 1B 69 C9 C6 3D 25    73 46 32 04 69 D6 ED F3