Posted to dev@cxf.apache.org by Alessio Soldano <as...@redhat.com> on 2015/02/03 14:11:56 UTC

Re: JAX-WS client performance

A brief update: I've committed the workaround for the HTTP connection
issue (thanks Dan for the help on that!) as well as some other
straightforward optimizations for things that popped up while profiling.

Now, the next "big" topic I found is the way we get and set properties
in the message context. We spend a significant amount of time creating
HashMap instances, and especially in MessageImpl#calcContextCache, which
copies (putAll) all the Bus, Service, Endpoint, etc. properties into the
context cache. You can see [1] the CPU hotspot view I currently get with
the previously mentioned test app. AFAICS from the source history, there
used to be a different way of dealing with message properties in the
past [2]; then the cache mechanism was added. So I'm wondering whether
some kind of profiling / perf testing was performed in the past and led
to these changes. I might simply be testing an edge scenario, with very
few properties being looked up and hence not justifying the caching
mechanism.
Any comment / idea / suggestion?
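
To make the trade-off concrete, here is a minimal illustrative sketch
(simplified and with made-up names, not the actual MessageImpl code) of
an eagerly built context cache: every scope map gets copied up front so
that later gets are single-map lookups, which is exactly the putAll cost
showing up in the profile when only a handful of properties are ever
read.

    // Illustrative sketch only -- simplified, not copied from
    // org.apache.cxf.message.MessageImpl#calcContextCache.
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class EagerContextCacheSketch {
        // Scope maps ordered message, exchange, endpoint, service, bus, ...
        private final List<Map<String, Object>> scopes;
        private Map<String, Object> contextCache;

        EagerContextCacheSketch(List<Map<String, Object>> scopes) {
            this.scopes = scopes;
        }

        Object getContextualProperty(String key) {
            if (contextCache == null) {
                // The expensive part seen in the profile: allocate a map and
                // copy every property from every scope, even if only a few
                // keys will ever be looked up on this message.
                contextCache = new HashMap<>();
                for (int i = scopes.size() - 1; i >= 0; i--) {
                    contextCache.putAll(scopes.get(i)); // narrower scopes win
                }
            }
            // After that, every lookup is a single-map get.
            return contextCache.get(key);
        }
    }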

Cheers
Alessio

[1] http://pasteboard.co/QgiD4Af.png
[2]

On 27/01/15 18:14, Alessio Soldano wrote:
> Hi,
> my attention has recently been brought to a scenario in which an
> Apache CXF client invokes an endpoint operation in a loop and the
> number of invocations performed in a given amount of time (say, 2
> minutes) is used as a benchmark for measuring WS stack performance.
> It's actually a very simplistic scenario, with a plain JAX-WS
> single-threaded client sending and receiving small RPC/Lit SOAP
> messages [1]. The reason I've been asked to have a look is that, with
> default settings, the Apache CXF JAX-WS impl seems to perform
> *shamefully* badly compared to the Metro (JAX-WS RI) implementation.
> I initially blamed the user's log configuration, etc., but when I
> eventually tried it myself I could actually reproduce the bad results.
> I've been profiling a bit and found a few hot spot areas where CXF
> could possibly be optimized, but the big issue really seems to be at
> the HTTPConduit / HttpURLConnection level.
> I found that almost all the invocations end up in
> sun.net.www.http.HttpClient.New(..) calling the available() method [2]
> as part of the process for re-using cached connections [3]; that goes
> to the wire to attempt a read and takes a lot of time.
> When the RI performs the equivalent operation, the available() method
> is not called [4], resulting in much better performance.
> Looking at the JDK code, it seems to me that the problem boils down to
> sun.net.www.protocol.http.HttpURLConnection#streaming() [5] returning
> different values, as a consequence of the fixedContentLength attribute
> being set to a value other than -1 only when running on CXF. As a
> matter of fact, that is set when
> HTTPConduit.WrappedOutputStream#thresholdNotReached() is called, i.e.
> whenever a message is completely written to the output stream buffer
> before the chunking threshold is reached (at least AFAIU). By contrast,
> I've searched through the JAX-WS RI and could not find any place where
> setFixedLengthStreamingMode is called on the connection.
> So, I've performed two quick-and-dirty experiments: the first time I
> forced allowChunking = false on the client policy, the second time I
> commented out the code in
> HTTPConduit.WrappedOutputStream#thresholdNotReached(). In both cases I
> managed to get performance comparable to what I get with the JAX-WS RI.
> Now, a few questions:
> - are we really required to call setFixedLengthStreamingMode as we
> currently do? What's the drawback of not calling it?
> - should we actually do something to get decent performance by default
> in this scenario? (I'm not sure expecting the user to disable chunking
> is really an option...)
> As a side note, the relevant part of the JDK HttpClient code changed
> between JDK 6 and JDK 7, so things have not always been as explained
> above...
>
> Cheers
> Alessio
>
>
> [1] http://www.fpaste.org/176166/14223765/
> [2] http://pasteboard.co/FR5QVrP.png
> [3] 
> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/sun/net/www/http/HttpClient.java#276
> [4] http://pasteboard.co/FR8okYM.png
> [5] 
> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/sun/net/www/protocol/http/HttpURLConnection.java#HttpURLConnection.streaming%28%29
>
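
On the JDK side, the distinction the above analysis hinges on is whether
the request body is sent with a known fixed length or chunked. A hedged
sketch of the two streaming modes on a raw HttpURLConnection (the URL
and payload below are placeholders, not taken from the benchmark app):

    // Hedged sketch of the two HttpURLConnection streaming modes discussed
    // above; the URL and payload are placeholders.
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    final class StreamingModes {
        static void post(byte[] payload, boolean chunked) throws Exception {
            HttpURLConnection conn =
                (HttpURLConnection) new URL("http://localhost:8080/test").openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            if (chunked) {
                conn.setChunkedStreamingMode(4096);               // Transfer-Encoding: chunked
            } else {
                conn.setFixedLengthStreamingMode(payload.length); // Content-Length known up front
            }
            try (OutputStream out = conn.getOutputStream()) {
                out.write(payload);
            }
            conn.getResponseCode(); // response body handling omitted in this sketch
        }
    }

And for anyone wanting to reproduce the "allowChunking = false"
experiment mentioned above, the client policy can be tweaked roughly
like this (again a sketch only; the jaxwsProxy argument stands in for
whatever port object the benchmark app uses, and this is not the
committed workaround):

    // Hedged sketch of forcing chunking off on a JAX-WS proxy, as in the
    // "allowChunking = false" experiment described above.
    import org.apache.cxf.endpoint.Client;
    import org.apache.cxf.frontend.ClientProxy;
    import org.apache.cxf.transport.http.HTTPConduit;
    import org.apache.cxf.transports.http.configuration.HTTPClientPolicy;

    final class ChunkingConfig {
        static void disableChunking(Object jaxwsProxy) {
            Client client = ClientProxy.getClient(jaxwsProxy);
            HTTPConduit conduit = (HTTPConduit) client.getConduit();
            HTTPClientPolicy policy = new HTTPClientPolicy();
            policy.setAllowChunking(false); // skip the chunking/threshold path discussed above
            conduit.setClient(policy);
        }
    }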


-- 
Alessio Soldano
Web Service Lead, JBoss


Re: JAX-WS client performance

Posted by Alessio Soldano <as...@redhat.com>.
Hi Dan,
thanks for the feedback.

On 05/02/15 23:01, Daniel Kulp wrote:
> 1) Can you save/print a list of everything that is a “get” from the non-message contexts?   I’m mostly concerned about Bus/BindingOperation/Endpoint objects.   A long long time ago (2.0/2.1 timeframe), those objects were stored as contextual properties and the only way to retrieve them was via the “getContextualProperty” things.   Since we use those all over the place, I added direct accessors to the Exchange, but for compatibility still stored them as properties.   However, I don’t remember if I went and found all the places that used to look those up via the “get” methods and replaced them.   We probably could remove those from being stored as contextual properties and document that. (should have been done long ago)  Would reduce the number of properties to deal with.
I've added a few simple System.out calls in the get method of the
"array" version and run the jbossws testsuite. No gets of
Bus/BindingOperation/Endpoint objects AFAICS; so I tried grepping for
getContextualProperty in the whole CXF sources, and the only thing I
could find is
org/apache/cxf/rs/security/saml/sso/AbstractRequestAssertionConsumerHandler
getting the Bus from the context.
I'm sending you the zip with the logs.
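
For context, the "direct accessors" Dan mentions above are the typed
getters on the Exchange. A minimal sketch of the difference (the
property key used in the legacy-style lookup is an assumption here,
shown purely for illustration):

    // Hedged sketch: typed Exchange accessor vs. a contextual-property lookup.
    // The string key below is an assumed example, not necessarily the key CXF
    // actually stores the Bus under.
    import org.apache.cxf.Bus;
    import org.apache.cxf.message.Message;

    final class BusLookupSketch {
        static Bus direct(Message message) {
            return message.getExchange().getBus();        // direct accessor on the Exchange
        }

        static Bus viaContextualProperty(Message message) {
            return (Bus) message.getContextualProperty(Bus.class.getName()); // legacy-style lookup
        }
    }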


> 2) In the “array” version, you could loop through once and just do an == compare and then only do the .equals if that ends up null.   I’m willing to bet most of the strings we use as keys are interned.
Tried that, but it's not improving performance. To be honest, though,
my bench app is likely not stressing the 'get' path that much, as it
doesn't use WS-Security, etc.
Anyway, my gut feeling is that's also because the cache misses far
outnumber the cache hits (for instance, on the client side in the
jbossws testsuite I see ~5700 props found in the message context, ~350
props found in non-message contexts and ~72000 props not found).

Cheers
Alessio

-- 
Alessio Soldano
Web Service Lead, JBoss


Re: JAX-WS client performance

Posted by Daniel Kulp <dk...@apache.org>.
Alessio,

Couple “thoughts”:

1) Can you save/print a list of everything that is a “get” from the non-message contexts?   I’m mostly concerned about Bus/BindingOperation/Endpoint objects.   A long long time ago (2.0/2.1 timeframe), those objects were stored as contextual properties and the only way to retrieve them was via the “getContextualProperty” things.   Since we use those all over the place, I added direct accessors to the Exchange, but for compatibility still stored them as properties.   However, I don’t remember if I went and found all the places that used to look those up via the “get” methods and replaced them.   We probably could remove those from being stored as contextual properties and document that. (should have been done long ago)  Would reduce the number of properties to deal with.

2) In the “array” version, you could loop through once and just do an == compare and then only do the .equals if that ends up null.   I’m willing to bet most of the strings we use as keys are interned.
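
A minimal sketch of what suggestion 2) could look like against a flat
key/value array (purely illustrative; not taken from the patches
attached to CXF-6227):

    // Hedged sketch of the "== first, .equals only on a miss" scan over a
    // flat key/value Object[]; illustrative only.
    final class ArrayPropsSketch {
        private Object[] kv = new Object[0]; // even slots hold keys, odd slots hold values

        Object get(String key) {
            // First pass: identity compares only -- cheap, and enough when the
            // key strings are interned.
            for (int i = 0; i < kv.length; i += 2) {
                if (kv[i] == key) {
                    return kv[i + 1];
                }
            }
            // Second pass: full equals, only when the identity scan found nothing.
            // Note that a genuine miss pays for both passes.
            for (int i = 0; i < kv.length; i += 2) {
                if (key.equals(kv[i])) {
                    return kv[i + 1];
                }
            }
            return null;
        }
    }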


Dan



-- 
Daniel Kulp
dkulp@apache.org - http://dankulp.com/blog
Talend Community Coder - http://coders.talend.com


Re: JAX-WS client performance

Posted by Alessio Soldano <as...@redhat.com>.
I've posted two comments, each with a patch file, on
https://issues.apache.org/jira/browse/CXF-6227 .
If you have any feedback, please let me know :-)

Cheers
Alessio

On 04/02/15 11:13, Alessio Soldano wrote:
> On 03/02/15 23:52, Alessio Soldano wrote:
>> so we'd have a get method computing the requested key's hash once and 
>> not for each sub-map, etc. I don't really know yet if this is doable,
> it isn't... as each map has its own hash seed. Will go on thinking 
> about alternatives though.
>
> Alessio


-- 
Alessio Soldano
Web Service Lead, JBoss


Re: JAX-WS client performance

Posted by Alessio Soldano <as...@redhat.com>.
On 03/02/15 23:52, Alessio Soldano wrote:
> so we'd have a get method computing the requested key's hash once and 
> not for each sub-map, etc. I don't really know yet if this is doable,
it isn't... as each map has its own hash seed. Will go on thinking about 
alternatives though.

Alessio

Re: JAX-WS client performance

Posted by Alessio Soldano <as...@redhat.com>.
Hi Dan,

On 03/02/15 21:02, Daniel Kulp wrote:
>> On Feb 3, 2015, at 8:11 AM, Alessio Soldano <as...@redhat.com> wrote:
>>
>> A brief update: I've committed the workaround for the HTTP connection issue (thanks Dan for the help on that!) as well as some other straightforward optimizations for things that popped up while profiling.
>>
>> Now, the next "big" topic I found is the way we get and set properties in the message context. We spend a significant amount of time creating HashMap instances, and especially in MessageImpl#calcContextCache, which copies (putAll) all the Bus, Service, Endpoint, etc. properties into the context cache. You can see [1] the CPU hotspot view I currently get with the previously mentioned test app. AFAICS from the source history, there used to be a different way of dealing with message properties in the past [2]; then the cache mechanism was added. So I'm wondering whether some kind of profiling / perf testing was performed in the past and led to these changes. I might simply be testing an edge scenario, with very few properties being looked up and hence not justifying the caching mechanism.
>> Any comment / idea / suggestion?
> At one point, every “get” of a property would end up checking 4 or 5 hash maps which resulted in the keys being hashCoded many times, lots of checks, etc…    When you get into the WS-Security cases and some of the HTTP configuration cases where there are a bunch of keys being looked up, there was a LOT of time being spent on the lookups.
OK, I see, thanks; perhaps calls to "get" became a problem when you had
a wide range of properties that could have been in the context but
actually were not, so you went through all the maps just to eventually
figure out the prop was not there... I wonder whether a different cache
approach (saving the cache hit and cache miss entries in two maps as
they're requested) could make sense.
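
A minimal sketch of that hit/miss idea (purely illustrative; it glosses
over invalidating the cache when the underlying maps change, which real
code would have to handle):

    // Illustrative sketch of lazily caching hits and misses per requested key;
    // the scope list is a simplification of the Message/Exchange/Endpoint/etc. maps.
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    class LazyContextCacheSketch {
        private final List<Map<String, Object>> scopes;
        private final Map<String, Object> hits = new HashMap<>();
        private final Set<String> misses = new HashSet<>();

        LazyContextCacheSketch(List<Map<String, Object>> scopes) {
            this.scopes = scopes;
        }

        Object getContextualProperty(String key) {
            Object cached = hits.get(key);
            if (cached != null) {
                return cached;
            }
            if (misses.contains(key)) {
                return null;                // known miss: don't walk the scopes again
            }
            for (Map<String, Object> scope : scopes) {
                Object value = scope.get(key);
                if (value != null) {
                    hits.put(key, value);   // remember the hit
                    return value;
                }
            }
            misses.add(key);                // remember the miss as well
            return null;
        }
    }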

>     For the most part, at the time, the maps were relatively small and the cost to build a single “context” map was small in comparison which is why this was done.
>
> That said, the size of the cache map is likely fairly small as well.   Maybe a dozen keys?  (and I'm willing to bet most of the keys are interned where a == would catch it)  Might be simpler to just use an Object[] or something.
I've tried printing the cache size while running the jbossws testsuite
and on average I get values around 40~50 (with most of the entries being
in the exchange and message maps). So I'm not sure a plain array would
be a better solution in terms of performance.
What I've been thinking about (but haven't actually tried, at least not
yet) is a custom HashMap extension that would work as a composition of
multiple HashMap instances; so we'd have a get method computing the
requested key's hash once and not for each sub-map, etc. I don't really
know yet if this is doable; perhaps I can try providing something
working that avoids all the map copies (btw, I bet we have basically no
key overlaps among the various maps for the same context).

Thanks for now
Alessio

-- 
Alessio Soldano
Web Service Lead, JBoss


Re: JAX-WS client performance

Posted by Daniel Kulp <dk...@apache.org>.
> On Feb 3, 2015, at 8:11 AM, Alessio Soldano <as...@redhat.com> wrote:
> 
> A brief update: I've committed the workaround for the HTTP connection issue (thanks Dan for the help on that!) as well as some other straightforward optimizations for things that popped up while profiling.
> 
> Now, the next "big" topic I found is the way we get and set properties in the message context. We spend a significant amount of time creating HashMap instances, and especially in MessageImpl#calcContextCache, which copies (putAll) all the Bus, Service, Endpoint, etc. properties into the context cache. You can see [1] the CPU hotspot view I currently get with the previously mentioned test app. AFAICS from the source history, there used to be a different way of dealing with message properties in the past [2]; then the cache mechanism was added. So I'm wondering whether some kind of profiling / perf testing was performed in the past and led to these changes. I might simply be testing an edge scenario, with very few properties being looked up and hence not justifying the caching mechanism.
> Any comment / idea / suggestion?

At one point, every “get” of a property would end up checking 4 or 5 hash maps which resulted in the keys being hashCoded many times, lots of checks, etc…    When you get into the WS-Security cases and some of the HTTP configuration cases where there are a bunch of keys being looked up, there was a LOT of time being spent on the lookups.   For the most part, at the time, the maps were relatively small and the cost to build a single “context” map was small in comparison which is why this was done.   

That said, the size of the cache map is likely fairly small as well.   Maybe a dozen keys?  (and I'm willing to bet most of the keys are interned where a == would catch it)  Might be simpler to just use an Object[] or something.

Dan




-- 
Daniel Kulp
dkulp@apache.org - http://dankulp.com/blog
Talend Community Coder - http://coders.talend.com