Posted to httpclient-users@hc.apache.org by vigna <vi...@di.unimi.it> on 2013/01/05 23:38:21 UTC

Suggested parameters for highly parallel async client

Are there any suggested parameters for the asynchronous client when issuing
several thousand parallel requests (e.g., for the IOReactor)? We are
experimenting with both DefaultHttpClient and DefaultHttpAsyncClient, and
with the same configuration (e.g., 4000 threads using DefaultHttpClient or
64 threads pushing 4000 async requests into a default
DefaultHttpAsyncClient) we see completely different behaviours. The sync
client fetches more than 10000 pages/s; the async client fetches
50 p/s.

Should we increase the number of threads or the I/O interval of the
IOReactor? Or are we doing something really stupid?



--
View this message in context: http://httpcomponents.10934.n7.nabble.com/Suggested-parameters-for-highly-parallel-async-client-tp18644.html
Sent from the HttpClient-User mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org


Re: Suggested parameters for highly parallel async client

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Sat, 2013-02-09 at 08:24 -0800, vigna wrote:
> Oleg Kalnichevski wrote
> > I am working on improving HttpAsyncClient performance right now and I
> > expect it to get better, but overall with a relatively small number of
> > concurrent connections (<1000) I expect HttpClient to outperform it by
> > 25-30%.
> 
> Well, our code is conditional, so we can test HttpAsyncClient any time by
> switching a boolean. As I said, we tried even 10000 parallel connections,
> but we could not push it beyond 5000 p/s. If it becomes faster things might
> get interesting though... :) Please let us know if you want us to test it.
> 
> 
The main characteristic of NIO is the ability to handle thousands of
concurrent connections in a predictable manner, rather than data
throughput.

I'll let you know once I have something you could take for a spin.

Oleg





Re: Suggested parameters for highly parallel async client

Posted by vigna <vi...@di.unimi.it>.
Oleg Kalnichevski wrote
> I am working on improving HttpAsyncClient performance right now and I
> expect it to get better, but overall with a relatively small number of
> concurrent connections (<1000) I expect HttpClient to outperform it by
> 25-30%.

Well, our code is conditional, so we can test HttpAsyncClient any time by
switching a boolean. As I said, we tried even 10000 parallel connections,
but we could not push it beyond 5000 p/s. If it becomes faster things might
get interesting though... :) Please let us know if you want us to test it.




--
View this message in context: http://httpcomponents.10934.n7.nabble.com/Suggested-parameters-for-highly-parallel-async-client-tp18644p19269.html
Sent from the HttpClient-User mailing list archive at Nabble.com.



Re: Suggested parameters for highly parallel async client

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Fri, 2013-02-08 at 15:23 -0800, vigna wrote:
> Just to complete the discussion, I finally set up the async client properly
> and it's pulling around 5000p/s, whereas the blocking client (with 1000
> threads) has peaks of 10000-15000 p/s, and average around 8000p/s, which is
> in line with your predictions.
> 

I am working on improving HttpAsyncClient performance right now and I
expect it to get better, but overall with a relatively small number of
concurrent connections (<1000) I expect HttpClient to outperform it by
25-30%.

Cheers

Oleg





Re: Suggested parameters for highly parallel async client

Posted by vigna <vi...@di.unimi.it>.
Just to complete the discussion, I finally set up the async client properly
and it's pulling around 5000p/s, whereas the blocking client (with 1000
threads) has peaks of 10000-15000 p/s, and average around 8000p/s, which is
in line with your predictions.



--
View this message in context: http://httpcomponents.10934.n7.nabble.com/Suggested-parameters-for-highly-parallel-async-client-tp18644p19261.html
Sent from the HttpClient-User mailing list archive at Nabble.com.



Re: Suggested parameters for highly parallel async client

Posted by vigna <vi...@di.unimi.it>.
We use a WritableByteChannel (backed first by memory, then by disk) to store
the data reported in the ByteBuffer by AsyncByteConsumer.onByteReceived(),
which I guess should be good practice.
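
A minimal sketch of such a memory-then-disk sink, using only JDK classes. The class name SpillingChannel, the threshold parameter and the demo() helper are hypothetical and are not taken from the poster's code or from HttpAsyncClient; in a real consumer the write() call would presumably be made from the onByteReceived() callback mentioned above. Note that once the data has spilled, each write goes through a FileChannel, which is a blocking call made on whichever thread delivers the buffer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sink: keeps bytes in memory up to a threshold, then spills to a temp file. */
final class SpillingChannel implements WritableByteChannel {
    private final int memoryLimit;
    private final ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private FileChannel file;   // null until the first spill
    private Path path;
    private boolean open = true;
    private long total;

    SpillingChannel(int memoryLimit) { this.memoryLimit = memoryLimit; }

    @Override
    public int write(ByteBuffer src) throws IOException {
        if (!open) throw new ClosedChannelException();
        int n = src.remaining();
        if (file == null && memory.size() + n > memoryLimit) {
            // Threshold crossed: move what is buffered to disk, then continue on disk.
            path = Files.createTempFile("spill", ".bin");
            file = FileChannel.open(path, StandardOpenOption.WRITE);
            ByteBuffer buffered = ByteBuffer.wrap(memory.toByteArray());
            while (buffered.hasRemaining()) file.write(buffered);
            memory.reset();
        }
        if (file == null) {
            byte[] chunk = new byte[n];
            src.get(chunk);
            memory.write(chunk, 0, n);
        } else {
            while (src.hasRemaining()) file.write(src);
        }
        total += n;
        return n;
    }

    boolean spilled() { return file != null; }
    long size() { return total; }

    @Override public boolean isOpen() { return open; }

    @Override
    public void close() throws IOException {
        open = false;
        if (file != null) {
            file.close();
            Files.deleteIfExists(path);  // sketch only: a real store would keep the file
        }
    }

    /** Feeds `chunks` buffers of `chunkSize` bytes; returns {total bytes, 1 if spilled else 0}. */
    static long[] demo(int memoryLimit, int chunkSize, int chunks) {
        try (SpillingChannel ch = new SpillingChannel(memoryLimit)) {
            for (int i = 0; i < chunks; i++) ch.write(ByteBuffer.wrap(new byte[chunkSize]));
            return new long[] { ch.size(), ch.spilled() ? 1 : 0 };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a 1024-byte limit, five 100-byte chunks stay in memory, while twenty such chunks cross the threshold and end up on disk.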



--
View this message in context: http://httpcomponents.10934.n7.nabble.com/Suggested-parameters-for-highly-parallel-async-client-tp18644p19271.html
Sent from the HttpClient-User mailing list archive at Nabble.com.



Re: Suggested parameters for highly parallel async client

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Fri, 2013-02-08 at 18:26 -0500, Jean-Marc Spaggiari wrote:
> > From my personal experience a decent blocking HTTP client can be
> > expected to outperform a decent non-blocking HTTP client by 50 to 100%,
> > but such a massive difference does look very suspicious. My guess,
> > though, is that the way pages are being processed can be a limiting factor
> > more than the way they are being retrieved. How do you parse /
> > process the content of the pages? Is your processing code based on
> > standard java InputStream APIs?
> 
> Hi Oleg,
> 
> Is there something better to use than the standard InputStream API to
> process the response?
> 

You see, it is not a matter of 'better' or 'worse'. InputStream API is
still being used by an overwhelming majority of parsers and content
processing libraries. The trouble is that InputStream is inherently
blocking. If you use an async HTTP client to retrieve content and some
library based on InputStream API to process it, you pretty much lose all
the advantages of asynchronous data transfer.
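
To illustrate the difference, here is a minimal sketch of push-style processing using only JDK classes. With an InputStream the parser calls read() and waits until bytes arrive; here the caller hands over each chunk as it arrives, and the processor keeps its state in fields and returns immediately. The NewlineCounter class and its feed() helper are hypothetical names made up for this example, not part of HttpAsyncClient.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical push-style processor: state lives in fields, no call ever waits for more input. */
final class NewlineCounter {
    private long lines;
    private long bytes;

    /** Called once per chunk, in arrival order; consumes what is there and returns. */
    void onChunk(ByteBuffer chunk) {
        while (chunk.hasRemaining()) {
            if (chunk.get() == '\n') lines++;
            bytes++;
        }
    }

    long lines() { return lines; }
    long bytes() { return bytes; }

    /** Feeds the text in pieces of `chunkSize` bytes (must be positive), as a network might deliver them. */
    static NewlineCounter feed(String text, int chunkSize) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        NewlineCounter c = new NewlineCounter();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            c.onChunk(ByteBuffer.wrap(data, off, len));
        }
        return c;
    }
}
```

The result does not depend on where the chunk boundaries fall, which is the property a processor needs in order to be driven directly by non-blocking I/O callbacks.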

Oleg





Re: Suggested parameters for highly parallel async client

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
> From my personal experience a decent blocking HTTP client can be
> expected to outperform a decent non-blocking HTTP client by 50 to 100%,
> but such a massive difference does look very suspicious. My guess,
> though, is that the way pages are being processed can be a limiting factor
> more than the way they are being retrieved. How do you parse /
> process the content of the pages? Is your processing code based on
> standard java InputStream APIs?

Hi Oleg,

Is there something better to use than the standard InputStream API to
process the response?

JM



Re: Suggested parameters for highly parallel async client

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Sat, 2013-01-05 at 14:38 -0800, vigna wrote:
> Are there any suggested parameters for the asynchronous client when issuing
> several thousand parallel requests (e.g., for the IOReactor)? We are
> experimenting with both DefaultHttpClient and DefaultHttpAsyncClient, and
> with the same configuration (e.g., 4000 threads using DefaultHttpClient or
> 64 threads pushing 4000 async requests into a default
> DefaultHttpAsyncClient) we see completely different behaviours. The sync
> client fetches more than 10000 pages/s; the async client fetches
> 50 p/s.
> 

From my personal experience a decent blocking HTTP client can be
expected to outperform a decent non-blocking HTTP client by 50 to 100%,
but such a massive difference does look very suspicious. My guess,
though, is that the way pages are being processed can be a limiting factor
more than the way they are being retrieved. How do you parse /
process the content of the pages? Is your processing code based on
standard java InputStream APIs?


> Should we increase the number of threads 

No, you should not. There is no point having more I/O threads than the
number of physical CPU cores.
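
A minimal sketch of that sizing rule, using only the JDK. The IoThreadSizing class is a hypothetical helper; how the resulting number is handed to the I/O reactor depends on the HttpCore NIO version in use and is not shown. One assumption to flag: Runtime.availableProcessors() reports the processors visible to the JVM, which on hyper-threaded machines can be larger than the number of physical cores.

```java
/** Hypothetical helper that caps an I/O thread count at the processor count. */
final class IoThreadSizing {

    /** Returns `requested`, limited to the range [1, processors]. */
    static int ioThreads(int requested, int processors) {
        return Math.max(1, Math.min(requested, processors));
    }

    /** Same, using the processor count reported by the JVM (logical, not necessarily physical). */
    static int ioThreads(int requested) {
        return ioThreads(requested, Runtime.getRuntime().availableProcessors());
    }
}
```

For example, asking for 4000 I/O threads on a machine with 8 processors yields 8; the 4000 concurrent requests are then multiplexed over those few threads.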

> or the I/O interval of the
> IOReactor? 

No, you should not. This will have no impact on performance whatsoever.
Reducing the select interval only gives more granular socket timeouts
(which cannot be finer than 1 second with the default select interval
of 1 second).
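
The granularity point can be made concrete with a small worked calculation. This is a simplified model and not HttpCore code: it assumes the connection goes idle at time 0, that no other I/O events wake the selector, and that timeouts are only checked each time select() returns, once per select interval. TimeoutGranularity and detectedAtMs are hypothetical names.

```java
/** Simplified model of when an idle connection's socket timeout is noticed. */
final class TimeoutGranularity {

    /**
     * Time (ms) at which the timeout is detected, assuming timeouts are only
     * checked when select() returns, once every selectIntervalMs.
     */
    static long detectedAtMs(long socketTimeoutMs, long selectIntervalMs) {
        // Ceiling division: number of select rounds needed to reach the timeout.
        long ticks = (socketTimeoutMs + selectIntervalMs - 1) / selectIntervalMs;
        return Math.max(1, ticks) * selectIntervalMs;
    }
}
```

Under these assumptions a 300 ms socket timeout is only noticed after 1000 ms with a 1000 ms select interval, but after 300 ms with a 100 ms interval. The interval changes how promptly timeouts fire, not how fast data is transferred.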

Oleg


