You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-dev@axis.apache.org by Archit Shah <ar...@alum.mit.edu> on 2004/11/12 21:27:12 UTC

Axis memory usage and performance

Inspired by jira issue AXIS-1391, I have been investigating server side 
performance and memory consumption by looking at the memory allocations 
(using the JVM's -Xrunhprof:heap=sites) and using the SOAP benchmark 
from Alek (http://www.extreme.indiana.edu/xgws/soap_bench/) and 
SimpleAxisServer. The most illustrative test involved 10 calls to a 
simple SOAP method:

public int receiveDoubles(double[] input) throws Exception {
     return input.length;
}

with an input array of length 5000. Memory profiling seems to show that 
the data is copied repeatedly during deserialization and that there is 
a lot of per-array element work (about 40 objects allocated). Reducing 
this work could help boost performance and reduce memory footprint. 
Here is what I think is happening:

Data starts in char[] buffer.
1) sent to the SAX2EventRecorder where it is converted to a string 
(line 91)
2) during replay it is converted back to char[] in order to use the 
ContentHandler API
3) the replay client (SimpleDeserializer, I think) copies the data to 
its internal CharArrayWriter
4) SimpleDeserializer.onEndElement converts its CharArrayWriter to a 
String

That covers 4 copies of each piece of data that needs deserialization. 
In addition, I see significant allocations from SOAPHander.onEndElement 
(lines 114 and 118). SOAPHandler.onEndElement calls addTextNode, first 
converting to String from a CharArrayWriter. addTextNode ends up 
calling into crimson which converts the string back to a char[]. I 
believe this happens before the replay. I am not sure why this is 
necessary, as the data is stored in the event recorder.

In addition to the input data being copied, there are other sources of 
per-array element allocations:

- SimpleDeserializer initialization (1 allocated per array element)

- NSStack (1 call to cloneFrame per array element)

- NodeImpl.initializeChildren (3 array list allocations per array 
element)

- DeserializerImpl.addChildDeserializer (1 call per array element, 
allocating 1 CallbackTarget and 1 vector)

Here is the the output from the allocation profiler showing the per 
array element allocations (the full output is available on jira as an 
attachment to issue 1391 (it will be once I can successfully upload, 
something seems to be broken right now):

           percent         live       alloc'ed  stack class
  rank   self  accum    bytes objs   bytes objs trace name
     1  5.83%  5.83%   520104 5001 5201040 50010 12694 
org.apache.axis.message.MessageElement
     2  5.74% 11.57%   512016    1 7424400   25 13463 java.lang.Object
     3  4.57% 16.14%   407328 4243 4798464 49984 13160 
org.apache.axis.encoding.ser.SimpleDeserializer
     4  3.81% 19.94%   339440 4243 3998720 49984 13165 
java.util.HashMap$Entry
     5  3.81% 23.75%   339440 4243 3998720 49984 13162 [C
     6  3.81% 27.55%   339440 4243 3998720 49984 13167 [C
     7  3.14% 30.69%   280056 5001 2800560 50010 12696 java.lang.Object
     8  3.14% 33.83%   280000 5000 2800000 50000 12750 java.lang.Object
     9  3.14% 36.97%   280000 5000 2800000 50000 12724 java.lang.Object
    10  2.69% 39.66%   240000 5000 2400000 50000 12714 
org.apache.axis.message.Text
    11  2.66% 42.33%   237608 4243 2800000 50000 13066 java.lang.Object
    12  1.90% 44.23%   169720 4243 1999360 49984 13164 java.util.HashMap
    13  1.90% 46.13%   169720 4243 1999360 49984 13171 
java.lang.FloatingDecimal
    14  1.79% 47.92%   160000 5000 1600000 50000 12721 
org.apache.crimson.tree.TextNode
    15  1.79% 49.72%   160000 5000 1520000 50000 12722 [C
    16  1.79% 51.51%   159920 4999 1519320 49991 12703 [C
    17  1.52% 53.03%   135808 4246 1520384 50036 13075 [C
    18  1.52% 54.56%   135776 4243 1519616 49984 13170 [C
    19  1.52% 56.08%   135776 4243 1520000 50000 13077 [C
    20  1.52% 57.60%   135776 4243 1599488 49984 13161 
java.io.CharArrayWriter
    21  1.52% 59.12%   135776 4243 1599488 49984 13166 
java.io.CharArrayWriter
    22  1.35% 60.47%   120024 5001 1200240 50010 12695 
java.util.ArrayList
    23  1.35% 61.81%   120000 5000 1200000 50000 12749 
java.util.ArrayList
    24  1.35% 63.16%   120000 5000 1200000 50000 12723 
java.util.ArrayList
    25  1.35% 64.50%   119976 4999 1199784 49991 12702 java.lang.String
    26  1.14% 65.65%   101832 4243 1200000 50000 13065 java.util.Vector
    27  1.14% 66.79%   101832 4243 1200000 50000 13076 java.lang.String
    28  1.14% 67.93%   101832 4243 1200000 50000 13044 java.lang.Object
    29  1.14% 69.07%   101832 4243 1200000 50000 13067 
java.util.HashMap$Entry
    30  1.14% 70.21%   101808 4242 1200000 50000 13020 
org.apache.axis.encoding.TypeMappingImpl$Pair
    31  1.14% 71.35%   101808 4242 1200000 50000 13021 
org.apache.axis.encoding.TypeMappingImpl$Pair
    32  0.99% 72.35%    88648  865   88768  867  2811 [C
    33  0.90% 73.24%    80000 5000  799744 49984 13168 java.lang.Double
    34  0.88% 74.13%    78768  478   78768  478   558 [B
    35  0.76% 74.89%    67888 4243  799744 49984 13163 java.util.HashSet
    36  0.76% 75.65%    67888 4243  800000 50000 13058 
org.apache.axis.encoding.DeserializerTarget
    37  0.76% 76.41%    67888 4243  800000 50000 13100 java.util.Vector$1
    38  0.76% 77.17%    67888 4243  800000 50000 13068 
org.apache.axis.encoding.CallbackTarget
    39  0.76% 77.93%    67888 4243  799744 49984 13169 java.lang.Double
    40  0.76% 78.69%    67888 4243  800000 50000 13058 java.lang.Integer
    41  0.76% 79.45%    67888 4243  800000 50000 13081 java.lang.Object
    42  0.76% 80.22%    67888 4243  800000 50000 13022 
org.apache.axis.utils.cache.MethodCache$MethodKey

  -- Archit


Re: Axis memory usage and performance

Posted by Davanum Srinivas <da...@gmail.com>.
Archit,

checked in your patch. Thanks for the same. If you see something else
fishy, give us a holler.

-- dims


On Tue, 16 Nov 2004 11:09:14 -0500, Archit Shah
<ar...@alum.mit.edu> wrote:
> I forgot to mention that the revised version gives the same performance
> boost as before, about 5-10% in the benchmark from issue 1391.
> 
> 
> 
>   -- Archit
> 
> On Tuesday, Nov 16, 2004, at 10:50 US/Eastern, Davanum Srinivas wrote:
> 
> > this patch looks good....How much perf boost are u getting with this?
> > (same as before 5-10%)
> 
> 


-- 
Davanum Srinivas - http://webservices.apache.org/~dims/

Re: Axis memory usage and performance

Posted by Archit Shah <ar...@alum.mit.edu>.
I forgot to mention that the revised version gives the same performance 
boost as before, about 5-10% in the benchmark from issue 1391.

  -- Archit

On Tuesday, Nov 16, 2004, at 10:50 US/Eastern, Davanum Srinivas wrote:

> this patch looks good....How much perf boost are u getting with this?
> (same as before 5-10%)



Re: Axis memory usage and performance

Posted by Davanum Srinivas <da...@gmail.com>.
Archit,

this patch looks good....How much perf boost are u getting with this?
(same as before 5-10%)?

thanks,
dims


On Tue, 16 Nov 2004 10:19:29 -0500, Archit Shah
<ar...@alum.mit.edu> wrote:
> Revised patch attached. It does not touch the Deserializer interface
> and does not make private the targets field of the public
> DeserializerImpl class.
> 
>   -- Archit
> 
> 
> 
> On Tuesday, Nov 16, 2004, at 06:55 US/Eastern, Davanum Srinivas wrote:
> 
> > problem is...it's too late to change method signatures...can you take
> > a stab at keeping the method signatures?
> >
> > On Tue, 16 Nov 2004 01:34:20 -0500, Archit Shah
> > <ar...@alum.mit.edu> wrote:
> >> I have attached a patch that changes NodeListImpl and DeserializerImpl
> >> to use ArrayList instead of Vector as Vector should only be used when
> >> synchronization is needed. The patch also changes the targets field of
> >> DeserializerImpl to be private instead of protected and removes the
> >> overrides of removeValueTargets in subclasses SimpleDeserializer and
> >> SimpleListDeserializer. In 2 places I have switched to using index
> >> based for loops instead of iteration. Altogether, this should provide
> >> a
> >> 5-10% improvement for the benchmark in issue 1391.
> 
> 
> 
> 
> 
> 
> 


-- 
Davanum Srinivas - http://webservices.apache.org/~dims/

Re: Axis memory usage and performance

Posted by Archit Shah <ar...@alum.mit.edu>.
Revised patch attached. It does not touch the Deserializer interface 
and does not make private the targets field of the public 
DeserializerImpl class.

  -- Archit

On Tuesday, Nov 16, 2004, at 06:55 US/Eastern, Davanum Srinivas wrote:

> problem is...it's too late to change method signatures...can you take
> a stab at keeping the method signatures?
>
> On Tue, 16 Nov 2004 01:34:20 -0500, Archit Shah
> <ar...@alum.mit.edu> wrote:
>> I have attached a patch that changes NodeListImpl and DeserializerImpl
>> to use ArrayList instead of Vector as Vector should only be used when
>> synchronization is needed. The patch also changes the targets field of
>> DeserializerImpl to be private instead of protected and removes the
>> overrides of removeValueTargets in subclasses SimpleDeserializer and
>> SimpleListDeserializer. In 2 places I have switched to using index
>> based for loops instead of iteration. Altogether, this should provide 
>> a
>> 5-10% improvement for the benchmark in issue 1391.


Re: Axis memory usage and performance

Posted by Davanum Srinivas <da...@gmail.com>.
Archit,

problem is...it's too late to change method signatures...can you take
a stab at keeping the method signatures?

thanks,
dims


On Tue, 16 Nov 2004 01:34:20 -0500, Archit Shah
<ar...@alum.mit.edu> wrote:
> I have attached a patch that changes NodeListImpl and DeserializerImpl
> to use ArrayList instead of Vector as Vector should only be used when
> synchronization is needed. The patch also changes the targets field of
> DeserializerImpl to be private instead of protected and removes the
> overrides of removeValueTargets in subclasses SimpleDeserializer and
> SimpleListDeserializer. In 2 places I have switched to using index
> based for loops instead of iteration. Altogether, this should provide a
> 5-10% improvement for the benchmark in issue 1391.
> 
>   -- Archit
> 
> 
> 
> On Friday, Nov 12, 2004, at 15:27 US/Eastern, Archit Shah wrote:
> 
> > with an input array of length 5000. Memory profiling seems to show
> > that the data is copied repeatedly during deserialization and that
> > there is a lot of per-array element work (about 40 objects allocated).
> > Reducing this work could help boost performance and reduce memory
> > footprint. Here is what I think is happening:
> 
> 
> 
> 
> 
> 
> 


-- 
Davanum Srinivas - http://webservices.apache.org/~dims/

Re: Axis memory usage and performance

Posted by Archit Shah <ar...@alum.mit.edu>.
I have attached a patch that changes NodeListImpl and DeserializerImpl 
to use ArrayList instead of Vector as Vector should only be used when 
synchronization is needed. The patch also changes the targets field of 
DeserializerImpl to be private instead of protected and removes the 
overrides of removeValueTargets in subclasses SimpleDeserializer and 
SimpleListDeserializer. In 2 places I have switched to using index 
based for loops instead of iteration. Altogether, this should provide a 
5-10% improvement for the benchmark in issue 1391.

  -- Archit

On Friday, Nov 12, 2004, at 15:27 US/Eastern, Archit Shah wrote:

> with an input array of length 5000. Memory profiling seems to show 
> that the data is copied repeatedly during deserialization and that 
> there is a lot of per-array element work (about 40 objects allocated). 
> Reducing this work could help boost performance and reduce memory 
> footprint. Here is what I think is happening:


Re: Axis memory usage and performance

Posted by Samuel Meder <me...@mcs.anl.gov>.
On Sun, 2004-11-14 at 17:37 -0500, Davanum Srinivas wrote:
> Archit,
> 
> Sam/Jarek just checked in a bunch of changes...

These are all Jarek's, so no blaming me ;)

/Sam

> Can you please re-run
> your tests and send us the latest hprof output? I will check on JIRA
> outage.
> 
> thanks,
> dims
> 
> 
> On Sun, 14 Nov 2004 12:26:17 -0500, Archit Shah
> <ar...@alum.mit.edu> wrote:
> > Not sure why I can't upload to jira. I get a 502 error. "The proxy
> > server received an invalid response from an upstream server. The proxy
> > server could not handle the request <em><a
> > href="/jira/secure/AttachFile.jspa">POST&nbsp;/jira/secure/
> > AttachFile.jspa</a></em>.<p> Reason: <strong>Error reading from remote
> > server</strong></p>"
> > 
> > The file is attached here instead.
> > 
> >   -- Archit
> > 
> > On Sunday, Nov 14, 2004, at 02:07 US/Eastern, Jarek Gawor wrote:
> > 
> > 
> > 
> > >> -----Original Message-----
> > >> Here is the the output from the allocation profiler showing the per
> > >> array element allocations (the full output is available on jira as an
> > >> attachment to issue 1391 (it will be once I can successfully upload,
> > >> something seems to be broken right now):
> > >
> > > Where is that attachment? I can't find it with that bug.
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> 
> 


Re: Axis memory usage and performance

Posted by Archit Shah <ar...@alum.mit.edu>.
Jira is working fine now. File uploaded.

http://nagoya.apache.org/jira/secure/attachment/18141/rd2.hprof.txt.gz

  -- Archit

On Sunday, Nov 14, 2004, at 17:37 US/Eastern, Davanum Srinivas wrote:

> Sam/Jarek just checked in a bunch of changes...Can you please re-run
> your tests and send us the latest hprof output? I will check on JIRA
> outage.


Re: Axis memory usage and performance

Posted by Davanum Srinivas <da...@gmail.com>.
Archit,

Sam/Jarek just checked in a bunch of changes...Can you please re-run
your tests and send us the latest hprof output? I will check on JIRA
outage.

thanks,
dims


On Sun, 14 Nov 2004 12:26:17 -0500, Archit Shah
<ar...@alum.mit.edu> wrote:
> Not sure why I can't upload to jira. I get a 502 error. "The proxy
> server received an invalid response from an upstream server. The proxy
> server could not handle the request <em><a
> href="/jira/secure/AttachFile.jspa">POST&nbsp;/jira/secure/
> AttachFile.jspa</a></em>.<p> Reason: <strong>Error reading from remote
> server</strong></p>"
> 
> The file is attached here instead.
> 
>   -- Archit
> 
> On Sunday, Nov 14, 2004, at 02:07 US/Eastern, Jarek Gawor wrote:
> 
> 
> 
> >> -----Original Message-----
> >> Here is the the output from the allocation profiler showing the per
> >> array element allocations (the full output is available on jira as an
> >> attachment to issue 1391 (it will be once I can successfully upload,
> >> something seems to be broken right now):
> >
> > Where is that attachment? I can't find it with that bug.
> 
> 
> 
> 
> 
> 
> 


-- 
Davanum Srinivas - http://webservices.apache.org/~dims/

Re: Axis memory usage and performance

Posted by Archit Shah <ar...@alum.mit.edu>.
Not sure why I can't upload to jira. I get a 502 error. "The proxy  
server received an invalid response from an upstream server. The proxy  
server could not handle the request <em><a  
href="/jira/secure/AttachFile.jspa">POST&nbsp;/jira/secure/ 
AttachFile.jspa</a></em>.<p> Reason: <strong>Error reading from remote  
server</strong></p>"

The file is attached here instead.

  -- Archit

On Sunday, Nov 14, 2004, at 02:07 US/Eastern, Jarek Gawor wrote:

>> -----Original Message-----
>> Here is the the output from the allocation profiler showing the per
>> array element allocations (the full output is available on jira as an
>> attachment to issue 1391 (it will be once I can successfully upload,
>> something seems to be broken right now):
>
> Where is that attachment? I can't find it with that bug.


RE: Axis memory usage and performance

Posted by Jarek Gawor <ga...@mcs.anl.gov>.

> -----Original Message-----
> From: Archit Shah [mailto:archit.shah@alum.mit.edu] 
> Sent: Friday, November 12, 2004 3:27 PM
> To: axis-dev@ws.apache.org
> Subject: Axis memory usage and performance
> 
> 
> Inspired by jira issue AXIS-1391, I have been investigating 
> server side 
> performance and memory consumption by looking at the memory 
> allocations 
> (using the JVM's -Xrunhprof:heap=sites) and using the SOAP benchmark 
> from Alek (http://www.extreme.indiana.edu/xgws/soap_bench/) and 
> SimpleAxisServer. The most illustrative test involved 10 calls to a 
> simple SOAP method:
> 
> public int receiveDoubles(double[] input) throws Exception {
>      return input.length;
> }
> 
> with an input array of length 5000. Memory profiling seems to 
> show that 
> the data is copied repeatedly during deserialization and that 
> there is 
> a lot of per-array element work (about 40 objects allocated). 
> Reducing 
> this work could help boost performance and reduce memory footprint. 
> Here is what I think is happening:
> 
> Data starts in char[] buffer.
> 1) sent to the SAX2EventRecorder where it is converted to a string 
> (line 91)
> 2) during replay it is converted back to char[] in order to use the 
> ContentHandler API
> 3) the replay client (SimpleDeserializer, I think) copies the data to 
> its internal CharArrayWriter
> 4) SimpleDeserializer.onEndElement converts its CharArrayWriter to a 
> String

I updated the SAX2EventRecorder not to convert to String and then back to
char[]. The input char[] still needs to be cloned, but that's one less clone
then before.

> That covers 4 copies of each piece of data that needs 
> deserialization. 
> In addition, I see significant allocations from 
> SOAPHander.onEndElement 
> (lines 114 and 118). SOAPHandler.onEndElement calls 
> addTextNode, first 
> converting to String from a CharArrayWriter. addTextNode ends up 
> calling into crimson which converts the string back to a char[]. I 
> believe this happens before the replay. I am not sure why this is 
> necessary, as the data is stored in the event recorder.

I'm not sure why addTextNode is called in the first place either. I ran all
axis tests and tests in our software and saw no problems with that code
removed.

> In addition to the input data being copied, there are other 
> sources of 
> per-array element allocations:
> 
> - SimpleDeserializer initialization (1 allocated per array element)
> 
> - NSStack (1 call to cloneFrame per array element)
> 
> - NodeImpl.initializeChildren (3 array list allocations per array 
> element)
> 
> - DeserializerImpl.addChildDeserializer (1 call per array element, 
> allocating 1 CallbackTarget and 1 vector)
> 
> Here is the the output from the allocation profiler showing the per 
> array element allocations (the full output is available on jira as an 
> attachment to issue 1391 (it will be once I can successfully upload, 
> something seems to be broken right now):

Where is that attachment? I can't find it with that bug.

Jarek