Posted to httpclient-users@hc.apache.org by Jared Jacobs <ja...@kaching.com> on 2009/11/05 02:25:12 UTC

Poor performance of pooled connections worse in 4.0

Hi there. I'm new to the group. Just upgraded from 3.1 to 4.0 for a
high-traffic production server cluster and noticed a drop in performance.
Requests are consistently taking ~40% longer. Disabling
*http.connection.stalecheck* had little impact.

While investigating the issue, I noticed that switching from a shared
HttpClient with a ThreadSafeClientConnManager to a new simple HttpClient per
request cuts down minimum and average request times dramatically (over 80%).

It seems the overhead for pooling and reusing connections dwarfs the
overhead of establishing HTTP connections. Is this just me? Anyone else seen
this?

Jared

P.S.

Here's some of my raw benchmarking data. These numbers are for simple GETs
to http://www.google.com. The results are nearly identical for our
production situation (talking to a specific low-latency, non-Google web
service).

My benchmark just makes 50 requests to the same URL either serially or in
parallel. The timed code block is simply this:
   client.execute(newHttpRequest()).getEntity().consumeContent();
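
A minimal sketch of how timings in the N/avg/min/max form shown below could be collected, assuming a plain `Runnable` stands in for the timed `client.execute(...)` call. The class name `LatencyStats` and its methods are hypothetical and are not taken from the actual benchmark source, which was only shared off-list.

```java
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical helper: times a task repeatedly and prints N/avg/min/max. */
final class LatencyStats {

    /** Runs the task n times and returns each elapsed time in milliseconds. */
    static double[] time(int n, Runnable task) {
        double[] elapsedMs = new double[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            task.run(); // stand-in for the timed HTTP request
            elapsedMs[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return elapsedMs;
    }

    /** Formats the samples the same way as the result lines in this thread. */
    static String summarize(double[] elapsedMs) {
        double avg = Arrays.stream(elapsedMs).average().orElse(Double.NaN);
        double min = Arrays.stream(elapsedMs).min().orElse(Double.NaN);
        double max = Arrays.stream(elapsedMs).max().orElse(Double.NaN);
        return String.format(Locale.ROOT, "N=%d, avg=%.1fms, min=%.0f, max=%.0f",
                elapsedMs.length, avg, min, max);
    }

    public static void main(String[] args) {
        // A short sleep is used here purely as a placeholder workload.
        double[] samples = time(50, () -> {
            try {
                Thread.sleep(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println(summarize(samples));
    }
}
```

Reporting the minimum alongside the average is useful here because a high minimum means that even the fastest request in a run was slow.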

HttpClient 4.0

*ThreadSafeClientConnManager*
N=50, avg=305.8ms, min=218, max=444
N=50, avg=323.5ms, min=221, max=564
N=50, avg=519.9ms, min=223, max=1102
N=50, avg=410.2ms, min=197, max=693
N=50, avg=313.0ms, min=204, max=449

*SingleClientConnManager*
N=50, avg=36.1ms, min=20, max=474
N=50, avg=39.0ms, min=27, max=395
N=50, avg=37.9ms, min=28, max=368

HttpClient 3.1 (for comparison)

*MultiThreadedHttpConnectionManager*

N=50, avg=221.7ms, min=122, max=350
N=50, avg=215.3ms, min=133, max=303
N=50, avg=205.1ms, min=132, max=284
N=50, avg=170.6ms, min=105, max=250
N=50, avg=276.3ms, min=102, max=525

*SimpleHttpConnectionManager*

N=50, avg=37.9ms, min=29, max=173
N=50, avg=29.8ms, min=19, max=198
N=50, avg=26.1ms, min=18, max=143
N=50, avg=27.7ms, min=18, max=147
N=50, avg=29.7ms, min=20, max=189

Re: Poor performance of pooled connections worse in 4.0

Posted by Jared Jacobs <ja...@kaching.com>.
Hi Tony.

Oleg was right to question my initial benchmark. In the connection pool
benchmark, I was only issuing one request per thread, and contrary to my
intuition, each thread runs slowly at first, even if other threads have
already run the same code (i.e. it has already been JIT compiled).
Subsequent requests in each thread were much faster. It didn't take anywhere
near 10,000 requests to reach stable, optimal performance, though. The
second and third requests in each thread, for example, were just as fast as
the rest.
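
One way to separate the slow first request of each thread from the later ones is to record timings by their position within the thread. The following is a rough sketch under that assumption; the class `PerThreadWarmup` and the placeholder workload are hypothetical, and a real run would substitute the HTTP request for the `Runnable`.

```java
/** Hypothetical harness: records the time of the i-th request on every thread. */
final class PerThreadWarmup {

    /** Returns elapsedMs[t][i]: milliseconds taken by request i on worker thread t. */
    static double[][] measure(int threads, int requestsPerThread, Runnable request) {
        double[][] elapsedMs = new double[threads][requestsPerThread];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final double[] row = elapsedMs[t]; // each thread writes only its own row
            workers[t] = new Thread(() -> {
                for (int i = 0; i < row.length; i++) {
                    long start = System.nanoTime();
                    request.run();
                    row[i] = (System.nanoTime() - start) / 1_000_000.0;
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join makes each worker's writes visible to the caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return elapsedMs;
    }

    /** Average time of the request at the given position, taken across all threads. */
    static double averageAt(double[][] elapsedMs, int index) {
        double sum = 0;
        for (double[] row : elapsedMs) {
            sum += row[index];
        }
        return sum / elapsedMs.length;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the real request to reproduce the effect.
        double[][] ms = measure(10, 5, () -> String.valueOf(Math.sqrt(System.nanoTime())));
        for (int i = 0; i < 5; i++) {
            System.out.printf("request #%d in each thread: avg=%.3fms%n", i + 1, averageAt(ms, i));
        }
    }
}
```

Comparing the average for position 1 with the averages for later positions shows whether the first request per thread is an outlier, independently of connection reuse.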

You might be wondering if the slowness of the first request that I'm talking
about can be attributed to establishing an HTTP connection that is then
reused by subsequent requests. Nope. I saw the same effect even when using a
new HttpClient per request (i.e. no connection pooling or reuse).

Initially, my observations on one of our production servers raised two
questions in my mind:
1) Is the overhead for pooling and reusing connections greater than the
overhead of establishing new HTTP connections?
2) Is the performance of pooled connections significantly worse in
HttpClient 4.0 than in 3.1?

After improving my benchmarks, I've concluded that, in general, the answer
to both questions is "no" (as one would expect). Reusing connections
increases throughput a bit when there's a steady stream of requests that
need to be made to a particular host, and 4.0's ThreadSafeClientConnManager
performs roughly as well as 3.1's MultiThreadedHttpConnectionManager. Our
server that
was having problems was both low on memory and occasionally CPU bound. I
believe it was those conditions that made 4.0's ThreadSafeClientConnManager
slow for us when we first switched to it in production.

I'm still not certain which configuration minimizes request latency when
requests need to be made to a single host pretty often, but at random times. I
hope to have the time to answer this question for our use case by
experimentation in the coming week.

Regards,
Jared


On Thu, Nov 12, 2009 at 5:38 AM, Tony Poppleton
<to...@wanadoo.fr> wrote:

> Hi Jared,
>
> I would be very interested to know if you have made any further progress on
> this potential problem.
>
> Thanks,
> Tony
>
> Jared Jacobs wrote:
>
>> Thanks again for your response, Oleg.
>>
>> We're dealing with a latency problem. The total duration of each and every
>> request needs to be reliably small. Throughput isn't very important in our
>> application.
>>
>> I'll get to the bottom of the issue with some profiling.
>>
>> Cheers,
>> Jared
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
> For additional commands, e-mail: httpclient-users-help@hc.apache.org
>
>

Re: Poor performance of pooled connections worse in 4.0

Posted by Tony Poppleton <to...@wanadoo.fr>.
Hi Jared,

I would be very interested to know if you have made any further progress 
on this potential problem.

Thanks,
Tony

Jared Jacobs wrote:
> Thanks again for your response, Oleg.
>
> We're dealing with a latency problem. The total duration of each and every
> request needs to be reliably small. Throughput isn't very important in our
> application.
>
> I'll get to the bottom of the issue with some profiling.
>
> Cheers,
> Jared
>


Re: Poor performance of pooled connections worse in 4.0

Posted by Jared Jacobs <ja...@kaching.com>.
Thanks again for your response, Oleg.

We're dealing with a latency problem. The total duration of each and every
request needs to be reliably small. Throughput isn't very important in our
application.

I'll get to the bottom of the issue with some profiling.

Cheers,
Jared

Re: Poor performance of pooled connections worse in 4.0

Posted by Oleg Kalnichevski <ol...@apache.org>.
Jared Jacobs wrote:
>> What is the point of having 10 connection limit and using 50 worker
>> threads?
>>
> 
> To simulate heavy load (more demand than supply). It's irrelevant. Make the
> numbers the same and you'll still see the latency problem.
> 

With 50 requests you will not even warm up the JIT compiler. You have to 
be doing 200,000 requests at the very least.

> 
>>> Regardless of what the max connections per host limit is set to, though, at
>>> least the first request should not block at all.
>>>
>> Why?
> 
> 
> Because the pool is empty the first time around. To be clear, I meant block
> *waiting for a connection from the pool to become available*.
> 

It takes some time to set up a pool and open a connection.

>> All these numbers are meaningless given such a small number of requests.
>> You should be executing 10,000 HTTP requests in order to get any meaningful
>> performance data.
>
> In our production environment, we're talking to reliable services with
> strict SLAs over fast, reliable connections. We do tens of thousands of
> requests, and as I mentioned in my first email, we saw a large increase in
> request times when we upgraded from httpclient 3.1 to 4.0.
> 
> Some things become clearer when you isolate factors. Hence my benchmarks.
> Any statistician will tell you that when the minimum of 50 samples increases
> by 10x, it is statistically significant. The samples are coming from two
> very different populations.
> 
> Oleg, I've sent you my benchmark source code off-list. Feel free to use it,
> or not. I would be interested in any performance measurements that you or
> someone else at Apache has done.
> 

Your benchmark is meaningless. You are comparing the execution speed of
50 requests over 50 connections to 50 requests over 10 connections. WHAT
IS IT _EXACTLY_ you are trying to measure with your benchmark?

Here's the code I used to compare _throughput_ in terms of requests per 
second of different HTTP client transports.

Feel free to adapt this code to your particular needs

http://wiki.apache.org/HttpComponents/HttpClient3vsHttpClient4vsHttpCore

Oleg

> Best regards,
> Jared
> 




Re: Poor performance of pooled connections worse in 4.0

Posted by Jared Jacobs <ja...@kaching.com>.
>
> What is the point of having 10 connection limit and using 50 worker
> threads?
>

To simulate heavy load (more demand than supply). It's irrelevant. Make the
numbers the same and you'll still see the latency problem.


>> Regardless of what the max connections per host limit is set to, though, at
>> least the first request should not block at all.
>
> Why?


Because the pool is empty the first time around. To be clear, I meant block
*waiting for a connection from the pool to become available*.

> All these numbers are meaningless given such a small number of requests.
> You should be executing 10,000 HTTP requests in order to get any meaningful
> performance data.

In our production environment, we're talking to reliable services with
strict SLAs over fast, reliable connections. We do tens of thousands of
requests, and as I mentioned in my first email, we saw a large increase in
request times when we upgraded from httpclient 3.1 to 4.0.

Some things become clearer when you isolate factors. Hence my benchmarks.
Any statistician will tell you that when the minimum of 50 samples increases
by 10x, it is statistically significant. The samples are coming from two
very different populations.

Oleg, I've sent you my benchmark source code off-list. Feel free to use it,
or not. I would be interested in any performance measurements that you or
someone else at Apache has done.

Best regards,
Jared

Re: Poor performance of pooled connections worse in 4.0

Posted by Oleg Kalnichevski <ol...@apache.org>.
Jared Jacobs wrote:
> Thanks for the response, Oleg.
> 
> Have you increased the max limit on connections per host, which is set to 2
>> per default?
> 
> 
> Yes, I did increase the limit. Here's how I initialized the HttpClient:
> 
>   private static HttpClient newMultiThreadedHttpClient() {
>     return new DefaultHttpClient(
>         new ThreadSafeClientConnManager(
>             new BasicHttpParams()
>               .setParameter(STALE_CONNECTION_CHECK, false)
>               .setParameter(MAX_TOTAL_CONNECTIONS, 10)
>               .setParameter(MAX_CONNECTIONS_PER_ROUTE, new ConnPerRoute() {
>                 public int getMaxForRoute(HttpRoute route) {
>                   return 10;
>                 }}),
>             createSchemeRegistry()),
>         null);
>   }
> 

What is the point of having 10 connection limit and using 50 worker 
threads?

> Regardless of what the max connections per host limit is set to, though, at
> least the first request should not block at all.

Why?

> Notice that the *minimum* elapsed
> time of the N=50 requests done in each of my trial runs are all very high
> when using pooled connections. This means that even the first request is
> consistently slow.
> 

All these numbers are meaningless given such a small number of requests. 
  You should be executing 10,000 HTTP requests in order to get any 
meaningful performance data.

> I'd be happy to send you my benchmark source code. It's a single file, 100
> lines.
> 

Send the log of the session with connection pooling.

> For now, we're content using a disposable HttpClient per request. I hope to
> have time to profile and investigate the connection pooling issue further
> soon.
> 
> My main reason for posting to this list was the hope that someone would be
> able to contradict me, ideally with measurements of their own.
> 

I suspect your measurements are flawed, mainly because they are based on
an unrepresentative number of requests.

Oleg

> Regards,
> Jared
> 
> 
> On Thu, Nov 5, 2009 at 1:18 PM, Oleg Kalnichevski <ol...@apache.org> wrote:
> 
>> Jared Jacobs wrote:
>>
>>> Hi there. I'm new to the group. Just upgraded from 3.1 to 4.0 for a
>>> high-traffic production server cluster and noticed a drop in performance.
>>> Requests are consistently taking ~40% longer. Disabling *
>>> http.connection.stalecheck* had little impact.
>>>
>>> While investigating the issue, I noticed that switching from a shared
>>> HttpClient with a ThreadSafeClientConnManager to a new simple HttpClient
>>> per
>>> request cuts down minimum and average request times dramatically (over
>>> 80%).
>>>
>>> It seems the overhead for pooling and reusing connections dwarfs the
>>> overhead of establishing HTTP connections. Is this just me? Anyone else
>>> seen
>>> this?
>>>
>>> Jared
>>>
>>>
>> Jared
>>
>> Have you increased the max limit on connections per host, which is set to 2
>> per default? Most likely your 50 worker threads spend most of their time
>> blocked waiting for one of those two connections to become available.
>>
>> You can see what exactly is happening with the connection pool using the
>> following logging config:
>>
>> -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.SimpleLog
>> -Dorg.apache.commons.logging.simplelog.showdatetime=true
>> -Dorg.apache.commons.logging.simplelog.log.org.apache.http.impl.conn=DEBUG
>>
>> Hope this helps
>>
>> Oleg
>>
> 




Re: Poor performance of pooled connections worse in 4.0

Posted by Jared Jacobs <ja...@kaching.com>.
Thanks for the response, Oleg.

> Have you increased the max limit on connections per host, which is set to 2
> per default?


Yes, I did increase the limit. Here's how I initialized the HttpClient:

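  // The constants are presumably static imports: STALE_CONNECTION_CHECK from
  // CoreConnectionPNames, the two MAX_* names from ConnManagerPNames.
  private static HttpClient newMultiThreadedHttpClient() {
    return new DefaultHttpClient(
        new ThreadSafeClientConnManager(
            new BasicHttpParams()
              .setParameter(STALE_CONNECTION_CHECK, false)
              .setParameter(MAX_TOTAL_CONNECTIONS, 10)
              // Allow up to 10 connections to every route.
              .setParameter(MAX_CONNECTIONS_PER_ROUTE, new ConnPerRoute() {
                public int getMaxForRoute(HttpRoute route) {
                  return 10;
                }}),
            createSchemeRegistry()),  // helper not shown here
        null);  // no client-level HttpParams
  }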

Regardless of what the max connections per host limit is set to, though, at
least the first request should not block at all. Notice that the *minimum*
elapsed times of the N=50 requests in each of my trial runs are all very high
when using pooled connections. This means that even the first request is
consistently slow.

I'd be happy to send you my benchmark source code. It's a single file, 100
lines.

For now, we're content using a disposable HttpClient per request. I hope to
have time to profile and investigate the connection pooling issue further
soon.

My main reason for posting to this list was the hope that someone would be
able to contradict me, ideally with measurements of their own.

Regards,
Jared


On Thu, Nov 5, 2009 at 1:18 PM, Oleg Kalnichevski <ol...@apache.org> wrote:

> Jared Jacobs wrote:
>
>> Hi there. I'm new to the group. Just upgraded from 3.1 to 4.0 for a
>> high-traffic production server cluster and noticed a drop in performance.
>> Requests are consistently taking ~40% longer. Disabling *
>> http.connection.stalecheck* had little impact.
>>
>> While investigating the issue, I noticed that switching from a shared
>> HttpClient with a ThreadSafeClientConnManager to a new simple HttpClient
>> per
>> request cuts down minimum and average request times dramatically (over
>> 80%).
>>
>> It seems the overhead for pooling and reusing connections dwarfs the
>> overhead of establishing HTTP connections. Is this just me? Anyone else
>> seen
>> this?
>>
>> Jared
>>
>>
> Jared
>
> Have you increased the max limit on connections per host, which is set to 2
> per default? Most likely your 50 worker threads spend most of their time
> blocked waiting for one of those two connections to become available.
>
> You can see what exactly is happening with the connection pool using the
> following logging config:
>
> -Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.SimpleLog
> -Dorg.apache.commons.logging.simplelog.showdatetime=true
> -Dorg.apache.commons.logging.simplelog.log.org.apache.http.impl.conn=DEBUG
>
> Hope this helps
>
> Oleg
>

Re: Poor performance of pooled connections worse in 4.0

Posted by Oleg Kalnichevski <ol...@apache.org>.
Jared Jacobs wrote:
> Hi there. I'm new to the group. Just upgraded from 3.1 to 4.0 for a
> high-traffic production server cluster and noticed a drop in performance.
> Requests are consistently taking ~40% longer. Disabling *
> http.connection.stalecheck* had little impact.
> 
> While investigating the issue, I noticed that switching from a shared
> HttpClient with a ThreadSafeClientConnManager to a new simple HttpClient per
> request cuts down minimum and average request times dramatically (over 80%).
> 
> It seems the overhead for pooling and reusing connections dwarfs the
> overhead of establishing HTTP connections. Is this just me? Anyone else seen
> this?
> 
> Jared
> 

Jared

Have you increased the max limit on connections per host, which is set 
to 2 per default? Most likely your 50 worker threads spend most of their 
time blocked waiting for one of those two connections to become available.
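
The blocking described above can be illustrated with a toy model that does not use HttpClient at all: a fair `java.util.concurrent.Semaphore` stands in for the per-host connection limit, and a sleep stands in for the request. This is only a sketch under those assumptions; the class `PoolLimitModel` and its parameters are hypothetical, and the real connection manager may schedule waiters differently.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Toy model: worker threads competing for a limited number of "connections". */
final class PoolLimitModel {

    /** Returns {longest wait for a connection in ms, peak connections in use}. */
    static long[] run(int workers, int maxPerRoute, long holdMs) {
        Semaphore connections = new Semaphore(maxPerRoute, true);
        CountDownLatch startGate = new CountDownLatch(1);
        AtomicLong longestWaitNanos = new AtomicLong();
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger peakInUse = new AtomicInteger();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    startGate.await(); // release all workers at the same moment
                    long start = System.nanoTime();
                    connections.acquire(); // blocks while the limit is reached
                    longestWaitNanos.accumulateAndGet(System.nanoTime() - start, Math::max);
                    peakInUse.accumulateAndGet(inUse.incrementAndGet(), Math::max);
                    try {
                        Thread.sleep(holdMs); // stand-in for the HTTP exchange
                    } finally {
                        inUse.decrementAndGet();
                        connections.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        startGate.countDown();
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return new long[] {longestWaitNanos.get() / 1_000_000, peakInUse.get()};
    }

    public static void main(String[] args) {
        long[] limited = run(50, 2, 20);
        long[] unlimited = run(50, 50, 20);
        System.out.println("limit 2:  longest wait ms=" + limited[0] + ", peak in use=" + limited[1]);
        System.out.println("limit 50: longest wait ms=" + unlimited[0] + ", peak in use=" + unlimited[1]);
    }
}
```

With 50 workers and a limit of 2, most of the elapsed time per worker in this model is spent waiting for a permit rather than doing the simulated request.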

You can see what exactly is happening with the connection pool using the 
following logging config:

-Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.SimpleLog
-Dorg.apache.commons.logging.simplelog.showdatetime=true
-Dorg.apache.commons.logging.simplelog.log.org.apache.http.impl.conn=DEBUG
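
If passing `-D` flags on the command line is inconvenient, the same three properties could in principle be set programmatically. This is a sketch only: the class name `PoolDebugLogging` is hypothetical, and it assumes the call happens before Commons Logging creates its first logger, since properties set later are likely to be ignored.

```java
/** Hypothetical helper that mirrors the -D flags above as system properties. */
final class PoolDebugLogging {

    /** Call early, before any HttpClient class has been loaded. */
    static void enable() {
        System.setProperty("org.apache.commons.logging.Log",
                "org.apache.commons.logging.impl.SimpleLog");
        System.setProperty("org.apache.commons.logging.simplelog.showdatetime", "true");
        System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.http.impl.conn",
                "DEBUG");
    }

    public static void main(String[] args) {
        enable();
        System.out.println(System.getProperty("org.apache.commons.logging.Log"));
    }
}
```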

Hope this helps

Oleg
