Posted to httpclient-users@hc.apache.org by Yan Cheng Cheok <yc...@yahoo.com> on 2009/08/16 15:11:33 UTC

Best Practice to Use HttpClient in Multithreaded Environment

Hi all,

For a while now I have been using HttpClient in a multithreaded environment. Whenever a thread initiates a connection, it creates a completely new HttpClient instance.

Recently I discovered that this approach can leave the user with too many open ports, with most of the connections in the TIME_WAIT state.

http://www.opensubscriber.com/message/commons-httpclient-dev@jakarta.apache.org/86045.html

So, instead of having each thread do:
HttpClient c = new HttpClient();
try {
    c.executeMethod(method);
}
catch (IOException e) {
    // handle the failure (HttpException is a subclass of IOException)
}
finally {
    method.releaseConnection();
}


We plan to have:

[METHOD A]

// global_c is created once and shared by all threads:
// HttpClient global_c = new HttpClient(new MultiThreadedHttpConnectionManager());

try {
    global_c.executeMethod(method);
}
catch (IOException e) {
    // handle the failure (HttpException is a subclass of IOException)
}
finally {
    method.releaseConnection();
}

Under normal load, global_c will be accessed by 50+ threads concurrently. I am wondering whether this will cause any performance issues. Does MultiThreadedHttpConnectionManager use a lock-free mechanism to implement its thread safety?

If 10 threads are using global_c, will the other 40 threads be blocked?

Or would it be better to create a new HttpClient instance in every thread, but shut down its connection manager explicitly?

[METHOD B]
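HttpClient c = new HttpClient();
try {
    c.executeMethod(method);
}
catch (IOException e) {
    // handle the failure (HttpException is a subclass of IOException)
}
finally {
    method.releaseConnection();
    // new HttpClient() uses a SimpleHttpConnectionManager by default; shutdown()
    // is defined on that class rather than on the HttpConnectionManager interface
    ((SimpleHttpConnectionManager) c.getHttpConnectionManager()).shutdown();
}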

Does shutting down the connection manager after every request cause performance issues?

Which method (A or B) is better for an application using 50+ threads?

I am using HttpClient 3.1.
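A toy model (not HttpClient's code) of why Method A helps with the TIME_WAIT problem: a pool hands out an idle connection when one is available and opens a new socket only when none is free. A shared pool therefore keeps reusing a few sockets, while a fresh client per request opens, and later closes, one socket per request; each closed TCP socket can linger in TIME_WAIT on the side that closed it. ToyPool and ReuseDemo are hypothetical names for this sketch, and it runs requests one after another rather than concurrently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy connection pool: a "connection" is just an int id, and opening one
// stands in for creating a new TCP socket. Not HttpClient's implementation.
class ToyPool {
    private final Deque<Integer> idle = new ArrayDeque<>();
    private int opened = 0; // sockets this pool has ever created

    // Reuse an idle connection if there is one; otherwise open a new one.
    int acquire() {
        Integer conn = idle.pollFirst();
        if (conn != null) {
            return conn;
        }
        opened++;
        return opened;
    }

    // Hand the connection back so the next request can reuse it.
    void release(int conn) {
        idle.addFirst(conn);
    }

    int socketsOpened() {
        return opened;
    }
}

class ReuseDemo {
    // Method A: one pool shared by every request.
    static int sharedPool(int requests) {
        ToyPool pool = new ToyPool();
        for (int i = 0; i < requests; i++) {
            int conn = pool.acquire();
            pool.release(conn);
        }
        return pool.socketsOpened();
    }

    // Original approach / Method B: a fresh client (and pool) per request,
    // discarded afterwards, so no connection is ever reused.
    static int poolPerRequest(int requests) {
        int total = 0;
        for (int i = 0; i < requests; i++) {
            ToyPool pool = new ToyPool();
            int conn = pool.acquire();
            pool.release(conn);
            total += pool.socketsOpened();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("shared pool:      " + sharedPool(1000) + " socket(s) opened");
        System.out.println("pool per request: " + poolPerRequest(1000) + " socket(s) opened");
    }
}
```

For 1000 sequential requests, the shared pool opens 1 socket and the per-request pools open 1000.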

Thanks and Regards
Yan Cheng Cheok



      

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org


Re: Best Practice to Use HttpClient in Multithreaded Environment

Posted by Ken Krugler <kk...@transpac.com>.
On Aug 16, 2009, at 7:45pm, yccheok wrote:

>
> Hi Ken,
>
> Can you elaborate more on "and if these exceeds a (configurable) limit,
> you'll get an exception."?

At least in HttpClient 4.0, if you make a request, and there's no free  
connection in the pool, then your request blocks until a connection  
becomes available (because some other request completed). If too much  
time passes with the request blocked, the blocked request throws an  
exception.
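The block-then-timeout behavior described above can be modeled with a java.util.concurrent.Semaphore whose permits stand for pooled connections: tryAcquire waits up to a timeout, and the request fails if no permit frees up in time. This is only a sketch of the idea, not HttpClient's code. HttpClient signals this case with its own ConnectionPoolTimeoutException; BoundedPool, PoolTimeoutException and TimeoutDemo below are hypothetical names for this example.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the pool-timeout exception a real client throws.
class PoolTimeoutException extends RuntimeException {
    PoolTimeoutException(String message) {
        super(message);
    }
}

// Toy model: at most maxConnections callers can hold a "connection" at once.
class BoundedPool {
    private final Semaphore permits;
    private final long timeoutMillis;

    BoundedPool(int maxConnections, long timeoutMillis) {
        this.permits = new Semaphore(maxConnections, true); // fair: waiters served in FIFO order
        this.timeoutMillis = timeoutMillis;
    }

    // Blocks until a connection is free, but gives up after timeoutMillis.
    void acquire() {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new PoolTimeoutException("interrupted while waiting for a connection");
        }
        if (!acquired) {
            throw new PoolTimeoutException("no free connection after " + timeoutMillis + " ms");
        }
    }

    // Must always run (e.g. in a finally block), like method.releaseConnection().
    void release() {
        permits.release();
    }
}

class TimeoutDemo {
    public static void main(String[] args) {
        BoundedPool pool = new BoundedPool(1, 100);
        pool.acquire(); // the first request takes the only connection
        try {
            pool.acquire(); // the second request waits 100 ms, then gives up
        } catch (PoolTimeoutException e) {
            System.out.println("second request failed: " + e.getMessage());
        }
        pool.release(); // the first request finishes and frees its connection
        pool.acquire(); // a new request now succeeds immediately
        System.out.println("third request got a connection");
        pool.release();
    }
}
```

In HttpClient 3.1, the equivalent wait is configured through HttpClientParams.setConnectionManagerTimeout(long).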

-- Ken


Re: Best Practice to Use HttpClient in Multithreaded Environment

Posted by yccheok <yc...@yahoo.com>.
Hi Ken,

Can you elaborate more on "and if these exceeds a (configurable) limit,
you'll get an exception."?

I don't quite understand it.


-- 
View this message in context: http://www.nabble.com/Best-Practice-to-Use-HttpClient-in-Multithreaded-Environment-tp24993345p25000299.html
Sent from the HttpClient-User mailing list archive at Nabble.com.


Re: Best Practice to Use HttpClient in Multithreaded Environment

Posted by Ken Krugler <kk...@transpac.com>.
Hi Yan Cheng,

See below, but with one caveat: Oleg could very well correct all of my
comments :)

On Aug 16, 2009, at 6:17pm, yccheok wrote:

> Hi Ken,
>
> So, in my case, I should set
>
> httpConnectionManagerParams.setDefaultMaxConnectionsPerHost(50);

Yes, if all of your requests will be coming from the same domain, and  
you're going to be hitting it with all 50 threads at the same time.  
But that's not a normal use case - hope you're really good friends  
with that site's ops team :)

E.g. in Bixo we configure HttpClient for one thread per host, as  
that's what you need for polite crawling.

> httpConnectionManagerParams.setMaxTotalConnections(50);
> // hostConfiguration will be obtained from HttpClient itself.
> httpConnectionManagerParams.setMaxConnectionsPerHost(hostConfiguration, 50);
>
> Is there any side effect of setting the number of too high, like 1000?

I don't know the details of how HttpClient (3.x or 4.x) allocates
connections in the pool, but I assume it only creates a connection
when one is needed, no free connection is available, and the total
number of connections is below this limit.

So leaving aside issues of memory requirements, max # of open sockets,  
etc. that you'd hit with 1000 active connections, I don't think there  
would be any issue with using a large value.

> If compared to 100 HttpClient with maxConnection = 10 each, will single
> HttpClient with maxConnection = 1000 performs better? Or it depends case by
> case situation?

I think performance will mostly depend on the servers that you're  
accessing.

See http://ken-blog.krugler.org/2009/05/19/performance-problems-with-verticalfocused-web-crawling/
for a blog post I wrote about crawl performance. This was using Bixo
and HttpClient 4.0.

> I know HttpClient does maintain its own connection pool. Does "this figure"
> (1000) affect "number of simultaneous connections allowed" in a given time?
> or "this figure" itself is the number of connections allowed in HttpClient
> connection pool?

There are two HttpClient-based limits on the number of simultaneous
connections: the max connections per host and the max total
connections. Assuming you are hitting 1000 different hosts (so the
per-host limit never comes into play) and the total limit is at least
1000, you could have 1000 simultaneous connections. Though you'll also
typically run into other limits, like running out of system memory due
to the amount of stack space used per thread, or DNS lookups becoming
slow, etc.
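The two limits can be pictured with a small model: a new connection is allowed only while both the per-host count and the total count are below their caps, so 50 threads aimed at one host are throttled by the per-host limit however high the total limit is set. TwoLimitPool is a hypothetical class for this sketch, not HttpClient's pool.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy model of a pool with a per-host cap and a total cap. Not HttpClient's code.
class TwoLimitPool {
    private final int maxPerHost;
    private final int maxTotal;
    private final Map<String, Integer> openPerHost = new HashMap<>();
    private int openTotal = 0;

    TwoLimitPool(int maxPerHost, int maxTotal) {
        this.maxPerHost = maxPerHost;
        this.maxTotal = maxTotal;
    }

    // Opens a connection only if BOTH limits still have room; otherwise the
    // caller would have to wait (this model just reports false).
    boolean tryOpen(String host) {
        int current = openPerHost.getOrDefault(host, 0);
        if (current >= maxPerHost || openTotal >= maxTotal) {
            return false;
        }
        openPerHost.put(host, current + 1);
        openTotal++;
        return true;
    }

    // Releases one connection to the given host.
    void close(String host) {
        int current = openPerHost.getOrDefault(host, 0);
        if (current == 0) {
            throw new IllegalStateException("no open connection to " + host);
        }
        openPerHost.put(host, current - 1);
        openTotal--;
    }

    // How many of these requests could hold a connection at the same moment.
    static int simultaneous(int maxPerHost, int maxTotal, String... hosts) {
        TwoLimitPool pool = new TwoLimitPool(maxPerHost, maxTotal);
        int granted = 0;
        for (String host : hosts) {
            if (pool.tryOpen(host)) {
                granted++;
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        String[] sameHost = new String[50];
        Arrays.fill(sameHost, "example.com");
        // 50 concurrent requests to one host with a per-host cap of 2: only 2 run.
        System.out.println(simultaneous(2, 20, sameHost));
        // Same load with both caps raised to 50: all 50 run at once.
        System.out.println(simultaneous(50, 50, sameHost));
        // Three different hosts, one connection each, but a total cap of 2.
        System.out.println(simultaneous(1, 2, "a.example", "b.example", "c.example"));
    }
}
```

Running main prints 2, 50 and 2. The first case matches the defaults of HttpClient 3.1's MultiThreadedHttpConnectionManager (2 connections per host, 20 in total), which is why those limits need raising when many threads talk to the same host.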

-- Ken


Re: Best Practice to Use HttpClient in Multithreaded Environment

Posted by yccheok <yc...@yahoo.com>.
Hi Ken,

So, in my case, I should set:

httpConnectionManagerParams.setDefaultMaxConnectionsPerHost(50);
httpConnectionManagerParams.setMaxTotalConnections(50);
// hostConfiguration will be obtained from HttpClient itself.
httpConnectionManagerParams.setMaxConnectionsPerHost(hostConfiguration, 50);

Is there any side effect of setting the number too high, say 1000?

Compared to 100 HttpClient instances with maxConnection = 10 each, will a
single HttpClient with maxConnection = 1000 perform better? Or does it
depend on the situation?

I know HttpClient maintains its own connection pool. Does this figure
(1000) limit the number of simultaneous connections allowed at any given
time, or is it the number of connections allowed in HttpClient's
connection pool?

Thanks!


Re: Best Practice to Use HttpClient in Multithreaded Environment

Posted by Ken Krugler <kk...@transpac.com>.
Hi Yan Cheng,

I haven't used HttpClient 3.x for a while - switched to 4.0 and  
haven't looked back.

But in general method A is going to work better. You can configure the
MultiThreadedHttpConnectionManager with a maximum number of connections -
e.g. you could pick a number equal to the max # of threads that you
know will be using it. If it's configured with fewer connections than the
max number of threads, then some of your connection requests will block
until a free connection becomes available - and if these exceeds a
(configurable) limit, you'll get an exception.

In extreme situations I've run with up to 1000 threads and one  
connection manager, so I don't think you'll hit any limits there.
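To make the "some requests will block" point concrete, here is a toy model (not HttpClient's code) in which 50 worker threads share one pool of 10 connections, represented by a fair Semaphore: at most 10 requests hold a connection at any moment and the other threads wait their turn rather than failing. SharedPoolDemo and its parameters are made up for this sketch, and Thread.sleep stands in for the actual HTTP exchange.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedPoolDemo {
    // Runs `threads` workers, each making `requestsPerThread` fake requests
    // through one shared pool of `maxConnections`. Returns the peak number of
    // connections that were in use at the same moment.
    static int peakInUse(int threads, int maxConnections, int requestsPerThread) {
        Semaphore pool = new Semaphore(maxConnections, true);
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            workers.execute(() -> {
                for (int r = 0; r < requestsPerThread; r++) {
                    pool.acquireUninterruptibly(); // wait for a free connection
                    int now = inUse.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(1); // pretend to do the HTTP exchange
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inUse.decrementAndGet();
                        pool.release(); // like method.releaseConnection()
                    }
                    completed.incrementAndGet();
                }
            });
        }
        workers.shutdown();
        try {
            if (!workers.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        if (completed.get() != threads * requestsPerThread) {
            throw new IllegalStateException("only " + completed.get() + " requests completed");
        }
        return peak.get();
    }

    public static void main(String[] args) {
        int peak = peakInUse(50, 10, 20);
        System.out.println("50 threads, 10 connections, peak in use: " + peak);
    }
}
```

The reported peak can never exceed 10, however the threads are scheduled, because a thread counts itself as in use only after acquiring a permit and stops counting before releasing it; every request still completes, it just waits when the pool is exhausted.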

-- Ken


--------------------------
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-210-6378

