Posted to httpclient-users@hc.apache.org by Ken Krugler <kk...@transpac.com> on 2010/01/28 00:31:54 UTC

Re: Best-Practices for Multithreaded use of HttpClient (with Cookies)?

You can create a local context and use that for all requests to the  
same server. That lets you re-use a single HttpClient, which is how  
you want to handle this (rather than creating a new instance per  
domain).

For example, in Bixo's SimpleHttpFetcher there's this code:

             getter = new HttpGet(new URI(url));

             // Create a local instance of the cookie store, and bind it to a
             // local context. Without this we get killed w/lots of threads,
             // due to sync() on the single cookie store.
             HttpContext localContext = new BasicHttpContext();
             CookieStore cookieStore = new BasicCookieStore();
             localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
             response = _httpClient.execute(getter, localContext);

The call to execute the GET request uses the localContext, which is  
what I think Jens wants.
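
For completeness, here's a sketch of the whole pattern: one shared  
client over a ThreadSafeClientConnManager, and one context (with its  
own cookie store) per fetching thread. The class name and URLs are  
made up for illustration, and note that the no-arg  
ThreadSafeClientConnManager constructor and EntityUtils.consume() are  
from the 4.1 API (in 4.0 you'd pass HttpParams/SchemeRegistry to the  
connection manager and call entity.consumeContent() instead):

```java
import java.net.URI;

import org.apache.http.HttpResponse;
import org.apache.http.client.CookieStore;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

public class SequentialSiteFetcher implements Runnable {

    // One client (and thus one connection manager) shared by every thread.
    private final HttpClient client;
    private final String[] urls;

    public SequentialSiteFetcher(HttpClient client, String[] urls) {
        this.client = client;
        this.urls = urls;
    }

    @Override
    public void run() {
        // One context + cookie store per thread/site, so cookie state is
        // kept per site and threads never contend on a shared store.
        HttpContext localContext = new BasicHttpContext();
        CookieStore cookieStore = new BasicCookieStore();
        localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);

        for (String url : urls) {
            try {
                HttpGet get = new HttpGet(new URI(url));
                HttpResponse response = client.execute(get, localContext);
                // Always consume the entity so the connection is released
                // back to the pool.
                EntityUtils.consume(response.getEntity());
            } catch (Exception e) {
                // A real crawler would log the failure and move on.
            }
        }
    }

    public static void main(String[] args) {
        HttpClient client =
            new DefaultHttpClient(new ThreadSafeClientConnManager());
        new Thread(new SequentialSiteFetcher(client,
            new String[] { "http://www.a.com/" })).start();
        new Thread(new SequentialSiteFetcher(client,
            new String[] { "http://www.b.com/" })).start();
    }
}
```

Each thread walks its own URL list sequentially, so cookies set by  
www.a.com stay in that thread's store.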

-- Ken


On Jan 27, 2010, at 3:22pm, Sam Crawford wrote:

> I could well be mistaken, but my experience suggests that with version
> 4.0 you need a new HttpClient each time you deal with a different set
> of cookies. Creating multiple HttpContexts used across a single
> DefaultHttpClient instance did not seem to be sufficient.
>
> That said, I only tried this briefly and didn't spend a huge amount of
> time investigating it. I keep meaning to do so and to submit a bug if
> I find a genuinely reproducible issue.
>
> Thanks,
>
> Sam
>
>
> 2010/1/27 Jens Mueller <supidupi007@googlemail.com>:
>> Hello HC Experts,
>>
>> I would be very grateful for advice regarding my question. I have
>> already spent a lot of time searching the internet, but I still have
>> not found an example that answers my questions. There are lots of
>> examples available (also for the multithreaded use cases), but they
>> only address the use case of making one(!!) request. I am completely
>> uncertain how to "best" make a series of requests (to the same
>> webserver).
>>
>> I need to develop a simple crawler that crawls some websites for
>> specific information. The basic idea is to download the individual
>> webpages of a website (for example www.a.com) sequentially, but to
>> run several of these "sequential" downloaders in threads for
>> different websites (www.b.com and www.c.com) in parallel.
>>
>> My current concept/implementation looks like this:
>>
>> 1.  Instantiate a ThreadSafeClientConnManager (with mostly default
>> parameters). This connection manager will be used/shared by all
>> DefaultHttpClients.
>> 2.  For every webpage (of a website with multiple webpages), I
>> instantiate a new DefaultHttpClient for every(!!) webpage request and
>> then call its httpClient.execute(httpGet) method with the
>> instantiated HttpGet(url).
>>
>> ==> I am more and more wondering if this is the correct usage of
>> DefaultHttpClient and the .execute() method. Am I doing something
>> wrong here by instantiating a new DefaultHttpClient for every request
>> of a webpage? Or should I rather instantiate only one(!!)
>> DefaultHttpClient and then share it across the sequential .execute()
>> calls?
>>
>> To be honest, what I also have not really understood yet is the
>> cookie management. Do I, as the programmer, have to instantiate the
>> CookieStore manually
>> 1. httpClient.setCookieStore(new BasicCookieStore());
>> and then, after calling the .execute() method, "get" the cookie store
>> 2. savedCookies = httpClient.getCookieStore();
>> and then reinject this cookie store for the next call to the same
>> webpage (to maintain state)?
>> 3. httpClient.setCookieStore(savedCookies);
>> Or is there some implicit magic that A) creates the cookie store
>> implicitly and B) somehow shares this CookieStore among the
>> HttpClients and/or HttpGets?
>>
>> Thank you very much!!
>> Jens
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
> For additional commands, e-mail: httpclient-users-help@hc.apache.org
>

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g





Re: Best-Practices for Multithreaded use of HttpClient (with Cookies)?

Posted by Sam Crawford <sa...@gmail.com>.
Ah yes, that makes sense.

In my scenario I'm using HttpClient as the client side of a reverse
proxy, and therefore can't use a single context per server (multiple
users access the backend servers simultaneously, so their cookies
would get mixed up).
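
One way that scenario could still share a single HttpClient is to keep
one context per *user* rather than per server. The sketch below is
untested and the class/method names are invented; it only shows the
idea of caching a per-user HttpContext (each with its own cookie
store) in a concurrent map:

```java
import java.util.concurrent.ConcurrentHashMap;

import org.apache.http.client.protocol.ClientContext;
import org.apache.http.impl.client.BasicCookieStore;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.protocol.HttpContext;

// Hypothetical helper: one HttpContext (and thus one cookie store) per
// user, so a single shared HttpClient can serve many users without
// mixing their cookies.
public class PerUserContexts {

    private final ConcurrentHashMap<String, HttpContext> contexts =
            new ConcurrentHashMap<String, HttpContext>();

    public HttpContext forUser(String userId) {
        HttpContext ctx = contexts.get(userId);
        if (ctx == null) {
            HttpContext fresh = new BasicHttpContext();
            fresh.setAttribute(ClientContext.COOKIE_STORE,
                               new BasicCookieStore());
            // putIfAbsent resolves the race if two requests for the same
            // user arrive at once: the first stored context wins.
            HttpContext prior = contexts.putIfAbsent(userId, fresh);
            ctx = (prior != null) ? prior : fresh;
        }
        return ctx;
    }
}

// Usage, once per proxied request:
//   HttpContext ctx = perUserContexts.forUser(sessionId);
//   HttpResponse rsp = sharedClient.execute(backendRequest, ctx);
```

One caveat: BasicHttpContext itself is not thread-safe, so if the same
user can issue concurrent requests you'd want a synchronized context
(4.1 adds SyncBasicHttpContext for this) or per-request serialization.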

Thanks,

Sam


2010/1/27 Ken Krugler <kk...@transpac.com>:
> [snip]
