Posted to httpclient-users@hc.apache.org by Ken Krugler <kk...@transpac.com> on 2010/01/28 00:31:54 UTC
Re: Best-Practices for Multithreaded use of HttpClient (with Cookies)?
You can create a local context and use that for all requests to the
same server. This then lets you re-use the same HttpClient, which is
how you want to handle this (versus creating new instances for each
domain).
For example, in Bixo's SimpleHttpFetcher there's this code:

    getter = new HttpGet(new URI(url));

    // Create a local instance of cookie store, and bind to local context.
    // Without this we get killed w/lots of threads, due to sync() on
    // single cookie store.
    HttpContext localContext = new BasicHttpContext();
    CookieStore cookieStore = new BasicCookieStore();
    localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
    response = _httpClient.execute(getter, localContext);

The call to execute the GET request uses the localContext, which is
what I think Jens wants.
-- Ken
On Jan 27, 2010, at 3:22pm, Sam Crawford wrote:
> I could well be mistaken, but my experience suggests that with version
> 4.0 you need a new HttpClient each time you deal with a different set
> of cookies. Creating multiple HttpContexts used across a single
> DefaultHttpClient instance did not seem to be sufficient.
>
> That said, I only tried this briefly and didn't spend a huge amount of
> time investigating it. I keep meaning to do so and to submit a bug if
> I find a genuinely reproducible issue.
>
> Thanks,
>
> Sam
>
>
> 2010/1/27 Jens Mueller <supidupi007@googlemail.com>:
>> Hello HC Experts,
>>
>> I would be very grateful for advice regarding my question. I have
>> already spent a lot of time searching the internet, but I still have
>> not found an example that answers my questions. There are a lot of
>> examples available (also for the multithreaded use cases), but they
>> only address the use case of making one(!!) request. I am completely
>> uncertain how to "best" make a series of requests (to the same
>> webserver).
>>
>> I need to develop a simple crawler that crawls some websites for
>> specific information. The basic idea is to download the individual
>> webpages of a website (for example www.a.com) sequentially, but to run
>> several of these "sequential" downloaders in threads for different
>> websites (www.b.com and www.c.com) in parallel.
>>
>> My current concept/implementation looks like this:
>>
>> 1. Instantiate a ThreadSafeClientConnManager (with a lot of default
>>    parameters). This connection manager will be used/shared by all
>>    DefaultHttpClient instances.
>> 2. For every(!!) webpage request (of a website with multiple
>>    webpages), I instantiate a new DefaultHttpClient and then call the
>>    "httpClient.execute(httpGet)" method with the instantiated
>>    HttpGet(url).
>>
>> ==> I am more and more wondering if this is the correct usage of the
>> DefaultHttpClient and the .execute() method. Am I doing something
>> wrong here by instantiating a new DefaultHttpClient for every request
>> of a webpage? Or should I rather instantiate only one(!!)
>> DefaultHttpClient and then share it for the sequential .execute()
>> calls?
>>
>> To be honest, what I also have not really understood yet is the
>> cookie management. Do I as the programmer have to instantiate the
>> CookieStore manually
>> 1. httpClient.setCookieStore(new BasicCookieStore());
>> and then after calling the .execute() method "get" the cookie store
>> 2. savedcookies = httpClient.getCookieStore()
>> and then reinject this cookie store for the next call to the same
>> webpage (to maintain state)?
>> 3. httpClient.setCookieStore(savedcookies)
>> Or is there some implicit magic that A) creates the cookie store
>> implicitly and B) somehow shares this CookieStore among the
>> HttpClients and/or HttpGets?
>>
>> Thank you very much!!
>> Jens
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
> For additional commands, e-mail: httpclient-users-help@hc.apache.org
>
--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g
Re: Best-Practices for Multithreaded use of HttpClient (with Cookies)?
Posted by Sam Crawford <sa...@gmail.com>.
Ah yes, that makes sense.
In my scenario I'm using HttpClient as the client-side of a reverse
proxy, and therefore can't use a single context per server (as we have
multiple users accessing backend servers simultaneously, so their
cookies get all mixed up).
Thanks,
Sam
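For the reverse-proxy case Sam describes, one possible approach (a
sketch, not something proposed in this thread) is to keep one context
per user session rather than one per server. The helper below is
hypothetical and uses only the JDK: PerKeyRegistry, its methods, and the
idea of keying on a session id are all assumptions made for
illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of HttpClient): hands out one state
 * object per key and creates it on first use. In a proxy, the key could
 * be a user session id and the value that user's HttpContext.
 */
final class PerKeyRegistry<T> {
    private final ConcurrentMap<String, T> entries = new ConcurrentHashMap<>();
    private final Supplier<T> factory;

    PerKeyRegistry(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the object bound to key, creating it atomically if absent. */
    T get(String key) {
        return entries.computeIfAbsent(key, k -> factory.get());
    }

    /** Drops the object for key, for example when the session ends. */
    void remove(String key) {
        entries.remove(key);
    }

    /** Number of keys that currently have an object. */
    int size() {
        return entries.size();
    }
}
```

With HttpClient 4.0 the factory would presumably build a
BasicHttpContext with its own BasicCookieStore bound under
ClientContext.COOKIE_STORE, as in Ken's snippet, and each proxied
request would pass registry.get(sessionId) as the second argument to
execute(). Entries need to be removed when a session ends, otherwise the
map grows without bound.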