Posted to httpclient-users@hc.apache.org by Ken Krugler <kk...@transpac.com> on 2009/12/03 04:15:59 UTC

Setting cookie policy with HttpClient 4.0

Below is an email from August 7th, which I'm reviving due to this  
becoming a bigger issue over in Bixo-land.

I've continued to run into this issue with my crawls, but so far I'm  
not doing anything with cookies, so it hasn't been a priority to track  
down.

However another Bixo user also runs into it, and he noticed that by  
switching back to HttpClient 4.0-beta3, the warnings went away.

I believe he just opened HTTPCLIENT-896 as a clone of HTTPCLIENT-773,  
which seemed to be this exact same bug (fixed by Oleg around 17/May/08).

I'm wondering if the bug crept back into the code sometime between  
then and the final release.

Thanks,

-- Ken

========================================================================

Hi all,

I'm seeing errors in my logs relating to parsing the expires date  
value in a cookie:

09/08/07 10:26:46 WARN protocol.ResponseProcessCookies:137 - Invalid  
cookie header: "Set-Cookie: IU=deleted; expires=Thu, 07 Aug 2008  
17:26:45 GMT; path=/; domain=.yahoo.com". Unable to parse expires  
attribute: Thu, 07 Aug 2008 17:26:45 GMT

I looked through the HttpClient source, and it seems that if I've set  
up my cookie policy correctly, the above date should be parsed properly.

Here's how I'm setting up the cookie policy:

=================================================================
HttpParams params = new BasicHttpParams();
...
CookieSpecParamBean cookieParams = new CookieSpecParamBean(params);
cookieParams.setSingleHeader(true);
...
ClientConnectionManager cm = new ThreadSafeClientConnManager(params, schemeRegistry);
DefaultHttpClient httpClient = new DefaultHttpClient(cm, params);
...
params = httpClient.getParams();
HttpClientParams.setCookiePolicy(params, CookiePolicy.BEST_MATCH);
=================================================================

But the above code was assembled from a few different snippets online,  
IIRC. So maybe this isn't correct.

For example, in the "Choosing Cookie Policy" section of the tutorial  
docs, it uses the setParameter() API to set the policy:

httpclient.getParams().setParameter(ClientPNames.COOKIE_POLICY, CookiePolicy.RFC_2965);

I assume the two are equivalent, but any input would be appreciated.

Thanks,

-- Ken

--------------------------------------------
<http://ken-blog.krugler.org>
+1 530-265-2225




--------------------------
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-210-6378


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org


Re: Setting cookie policy with HttpClient 4.0

Posted by Oleg Kalnichevski <ol...@apache.org>.
Ken Krugler wrote:
> Hi Oleg,
> 
> On Dec 3, 2009, at 2:40am, Oleg Kalnichevski wrote:
> 
>> On Wed, 2009-12-02 at 19:15 -0800, Ken Krugler wrote:
>>> Below is an email from August 7th, which I'm reviving due to this
>>> becoming a bigger issue over in Bixo-land.
>>>
>>> I've continued to run into this issue with my crawls, but so far I'm
>>> not doing anything with cookies, so it hasn't been a priority to track
>>> down.
>>>
>>> However another Bixo user also runs into it, and he noticed that by
>>> switching back to HttpClient 4.0-beta3, the warnings went away.
>>>
>>> I believe he just opened HTTPCLIENT-896 as a clone of HTTPCLIENT-773,
>>> which seemed to be this exact same bug (fixed by Oleg around 17/May/08).
>>>
>>> I'm wondering if the bug crept back into the code sometime between
>>> then and the final release.
>>>
>>> Thanks,
>>>
>>> -- Ken
>>>
>>
>> Hi Ken
>>
>> The cookie in question violates the format of 'expires' attribute
>> expected by the Netscape policy. One can configure the policy to be more
>> lenient about the format of 'expires' attribute by using a special HTTP
>> parameter. For details see HTTPCLIENT-896.
>>
>> It is not really a regression. I think the Netscape cookie policy was
>> made stricter at some point of time post 4.0-beta1
>>
>> Hope this clarifies the situation.
> 
> Thanks for the clarification, and the example code you added in a 
> comment to HTTPCLIENT-896.
> 
> Given the number of invalid cookies w/this issue that I see during a 
> crawl, would it make sense for the "best match" policy to select a more 
> lenient Netscape format?
> 
> Or maybe add a "best match-lenient" policy that does this?
> 
> I haven't had to do much in the way of cookie processing in the past, so 
> I'll confess up front that I'm ignorant about the potential issues that 
> could arise from using a more lenient policy.
> 
> Thanks again,
> 
> -- Ken
> 


Ken

I am somewhat reluctant to optimize HttpClient for just one particular 
use case, such as web crawling. Not only does the cookie in question 
violate the HTTP state management standards, it also violates the 
Netscape Draft spec. I do not think HttpClient should accept such 
cookies as valid per default. At the same time it is really easy to 
override the default behavior with just one parameter.

Cheers

Oleg




Re: Setting cookie policy with HttpClient 4.0

Posted by Ken Krugler <kk...@transpac.com>.
Hi Oleg,

On Dec 3, 2009, at 2:40am, Oleg Kalnichevski wrote:

> On Wed, 2009-12-02 at 19:15 -0800, Ken Krugler wrote:
>> Below is an email from August 7th, which I'm reviving due to this
>> becoming a bigger issue over in Bixo-land.
>>
>> I've continued to run into this issue with my crawls, but so far I'm
>> not doing anything with cookies, so it hasn't been a priority to track
>> down.
>>
>> However another Bixo user also runs into it, and he noticed that by
>> switching back to HttpClient 4.0-beta3, the warnings went away.
>>
>> I believe he just opened HTTPCLIENT-896 as a clone of HTTPCLIENT-773,
>> which seemed to be this exact same bug (fixed by Oleg around 17/May/08).
>>
>> I'm wondering if the bug crept back into the code sometime between
>> then and the final release.
>>
>> Thanks,
>>
>> -- Ken
>>
>
> Hi Ken
>
> The cookie in question violates the format of 'expires' attribute
> expected by the Netscape policy. One can configure the policy to be more
> lenient about the format of 'expires' attribute by using a special HTTP
> parameter. For details see HTTPCLIENT-896.
>
> It is not really a regression. I think the Netscape cookie policy was
> made stricter at some point of time post 4.0-beta1
>
> Hope this clarifies the situation.

Thanks for the clarification, and the example code you added in a  
comment to HTTPCLIENT-896.

Given the number of invalid cookies w/this issue that I see during a  
crawl, would it make sense for the "best match" policy to select a  
more lenient Netscape format?

Or maybe add a "best match-lenient" policy that does this?

I haven't had to do much in the way of cookie processing in the past,  
so I'll confess up front that I'm ignorant about the potential issues  
that could arise from using a more lenient policy.

Thanks again,

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g







Re: Setting cookie policy with HttpClient 4.0

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Wed, 2009-12-02 at 19:15 -0800, Ken Krugler wrote:
> Below is an email from August 7th, which I'm reviving due to this  
> becoming a bigger issue over in Bixo-land.
> 
> I've continued to run into this issue with my crawls, but so far I'm  
> not doing anything with cookies, so it hasn't been a priority to track  
> down.
> 
> However another Bixo user also runs into it, and he noticed that by  
> switching back to HttpClient 4.0-beta3, the warnings went away.
> 
> I believe he just opened HTTPCLIENT-896 as a clone of HTTPCLIENT-773,  
> which seemed to be this exact same bug (fixed by Oleg around 17/May/08).
> 
> I'm wondering if the bug crept back into the code sometime between  
> then and the final release.
> 
> Thanks,
> 
> -- Ken
> 

Hi Ken

The cookie in question violates the format of 'expires' attribute
expected by the Netscape policy. One can configure the policy to be more
lenient about the format of 'expires' attribute by using a special HTTP
parameter. For details see HTTPCLIENT-896.
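
To illustrate the format mismatch described above, here is a rough sketch
that uses only the JDK, not HttpClient itself. The class name, the two
pattern constants and the parseExpires() helper are made up for this
example, and the dashed pattern is only an assumption about what a strict
Netscape-style 'expires' check looks like. It is not HttpClient's actual
implementation. The idea is simply that a single strict pattern rejects
the value from the warning, while trying a list of patterns in order
accepts it.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

public class ExpiresParseDemo {

    // Assumption: Netscape draft layout, with dashes inside the date part.
    static final String NETSCAPE_PATTERN = "EEE, dd-MMM-yyyy HH:mm:ss z";

    // RFC 1123 layout, with spaces inside the date part.
    static final String RFC1123_PATTERN = "EEE, dd MMM yyyy HH:mm:ss z";

    /**
     * Hypothetical helper: tries each pattern in order and returns the
     * first successful parse, or null if no pattern matches the value.
     */
    static Date parseExpires(String value, List<String> patterns) {
        for (String pattern : patterns) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
            format.setTimeZone(TimeZone.getTimeZone("GMT"));
            try {
                return format.parse(value);
            } catch (ParseException e) {
                // This pattern did not match; try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The 'expires' value from the warning in the original report.
        String expires = "Thu, 07 Aug 2008 17:26:45 GMT";

        Date strict = parseExpires(expires, Arrays.asList(NETSCAPE_PATTERN));
        Date lenient = parseExpires(expires,
                Arrays.asList(NETSCAPE_PATTERN, RFC1123_PATTERN));

        System.out.println("strict pattern only: parsed = " + (strict != null));
        System.out.println("with fallback list:  parsed = " + (lenient != null));
    }
}
```

With only the dashed pattern the parse fails at the space after the day
of month, and with the fallback list the second pattern matches. The real
parameter and example code for HttpClient are the ones referenced in
HTTPCLIENT-896.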

It is not really a regression. I think the Netscape cookie policy was
made stricter at some point of time post 4.0-beta1   

Hope this clarifies the situation.

Cheers,

Oleg

