Posted to dev@tomcat.apache.org by Jeremy Boynes <jb...@apache.org> on 2013/12/26 20:23:29 UTC

Re: 8-bit text in cookie values

On Dec 26, 2013, at 2:47 AM, Mark Thomas <ma...@apache.org> wrote:

Focusing on the 8-bit issue addressed by the patch, leaving the other RFC6265 thread for broader discussion ...

>> The change only allows these characters in values if version == 0
>> where Netscape’s rather than RFC2109’s syntax applies (per the
>> Servlet spec). The Netscape spec is vague in that it does not
>> define “OPAQUE_STRING” at all and defines “VALUE” as containing
>> equally undefined “characters” although historically[1] those have
>> been taken to be OCTETs as permitted by RFC2616’s “*TEXT” variant
>> of “field-content.” The change will continue to reject these
>> characters in names and in unquoted values when version != 0
>> (RFC2109’s “word” rule)
>> 
>> [1] based on comments by Fielding et al. on http-state and what
>> I’ve seen in the wild
> 
> Can you provide references for [1]?

This is the mail in the run up to RFC6265 that triggered the discussion:
http://www.ietf.org/mail-archive/web/http-state/current/msg01232.html

The relevant bit was:
> Changing the ABNF
> to include base64 does not do that -- it is just another
> fantasy production that differs from all prior specs of
> the cookie algorithm.  Changing it to
> 
>  cookie-value      = %x21-2B / %x2D-3A / %x3C-7E / %x80-FF
> 
> or just the minimum
> 
>  cookie-value      = %x21-2B / %x2D-3A / %x3C-7E
> 
> returns the definition to the original Netscape spec (at
> least in the first case), reflects how they are implemented
> on the Internet, and eliminates this artificial distinction
> between the server and user agent requirements.

with the observation that the rule including %x80-FF was the one matching the Netscape spec. The RFC6265 editor actually chose the latter (minimal) production, which led to the following exchange:
http://www.ietf.org/mail-archive/web/http-state/current/msg01234.html
http://www.ietf.org/mail-archive/web/http-state/current/msg01236.html
asserting that the support for 8-bit characters implied by *TEXT was implicit in the original Netscape spec.

In this message:
http://www.ietf.org/mail-archive/web/http-state/current/msg01207.html
Roy asserts that the
  cookie-value      = %x21-2B / %x2D-3A / %x3C-7E / %x80-FF
production would be needed to support cookies currently in the wild, including the issue with the __utmz cookie that I’ve seen.

Further discussion resulted in the final production:
 cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
 cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                       ; US-ASCII characters excluding CTLs,
                       ; whitespace DQUOTE, comma, semicolon,
                       ; and backslash

on the basis that setting headers with the top bit set was deemed a bad idea by httpbis (I don’t have a reference for that). It was noted, though, that conformance to this was qualified by “Servers SHOULD NOT send Set-Cookie headers that fail to conform to the following grammar”, which discourages 8-bit values but still allows them to be sent. Parsers receiving a cookie value therefore need to be prepared to handle them.
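
As a rough illustration of the grammar quoted above, the RFC6265 cookie-octet rule and a more tolerant variant that also admits %x80-FF could be written as follows. This is a sketch only, not the patch under discussion: the class and method names are made up for this example, and the tolerant rule is an assumption about what "being prepared to handle" 8-bit values might look like.

```java
/** Sketch of the cookie-value octet rules quoted in this thread (names are hypothetical). */
final class CookieOctets {

    /** RFC6265 cookie-octet: %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E. */
    static boolean isRfc6265Octet(int b) {
        return b == 0x21
                || (b >= 0x23 && b <= 0x2B)
                || (b >= 0x2D && b <= 0x3A)
                || (b >= 0x3C && b <= 0x5B)
                || (b >= 0x5D && b <= 0x7E);
    }

    /** Assumed tolerant rule for parsing: the RFC6265 octets plus %x80-FF. */
    static boolean isTolerantOctet(int b) {
        return isRfc6265Octet(b) || (b >= 0x80 && b <= 0xFF);
    }

    /** Returns true if every byte of an unquoted value passes the chosen rule. */
    static boolean isAcceptable(byte[] value, boolean tolerate8Bit) {
        for (byte raw : value) {
            int b = raw & 0xFF; // Java bytes are signed; widen to 0..255
            boolean ok = tolerate8Bit ? isTolerantOctet(b) : isRfc6265Octet(b);
            if (!ok) {
                return false;
            }
        }
        return true;
    }
}
```

With these rules, the four ISO-8859-1 bytes of "café" (0x63 0x61 0x66 0xE9) fail the strict check and pass the tolerant one, while comma, semicolon, DQUOTE and backslash are rejected by both.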

Given that cookies with these values may be set by other servers in the domain and are sent by user agents, failing hard as we do now prevents the application from handling the request at all. The patch tolerates those characters and lets them through to the application. I don’t know of any security issue there, given that they are being decoded as ISO-8859-1 rather than UTF-8. I believe it’s backwards compatible: the application will now see the request with a cookie that it either expects or would be ignoring anyway (on the basis that the cookie would be present if it didn’t have an 8-bit character).

The patch does not change the generation behaviour, so any attempt to set a V0 cookie value containing one of these characters will still cause an IllegalArgumentException (IAE) from HttpServletResponse#addCookie().

Cheers
Jeremy


Re: 8-bit text in cookie values

Posted by Jeremy Boynes <jb...@apache.org>.
Adding more confusion to the pile, HTML5[1] now specifies that JavaScript can set Unicode characters through document.cookie and that they must be encoded as UTF-8 in the header. Quick testing with Chrome shows it does just that (i.e. U+00E1 is sent as 0xC3 0xA1). If client and server-side application code is going to interoperate then we would need to accept them in a Cookie header and allow them to be sent in a Set-Cookie header. However, this is ambiguous when compared to Netscape and its implicit assumption of ISO-8859-1.

[1] http://www.w3.org/html/wg/drafts/html/master/single-page.html#cookie
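
To make the ambiguity concrete, here is a small sketch (the class and method names are invented for illustration) that decodes the two bytes observed on the wire, 0xC3 0xA1, under both assumptions:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrative only: the same cookie bytes read under two charset assumptions. */
final class CookieCharsetAmbiguity {

    /** Decodes raw header bytes using the given charset. */
    static String decode(byte[] wire, Charset charset) {
        return new String(wire, charset);
    }

    public static void main(String[] args) {
        // U+00E1 as sent by the browser in the quick test described above.
        byte[] wire = {(byte) 0xC3, (byte) 0xA1};

        String asUtf8 = decode(wire, StandardCharsets.UTF_8);
        String asLatin1 = decode(wire, StandardCharsets.ISO_8859_1);

        // UTF-8 yields one character (U+00E1); ISO-8859-1 yields two (U+00C3, U+00A1).
        System.out.println(asUtf8.length() + " char(s) as UTF-8");
        System.out.println(asLatin1.length() + " char(s) as ISO-8859-1");
    }
}
```

A server that assumes ISO-8859-1 would therefore hand the application a two-character value where the script that set the cookie intended a single character.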



Re: 8-bit text in cookie values

Posted by Jeremy Boynes <jb...@apache.org>.
On Jan 1, 2014, at 8:59 AM, Mark Thomas <ma...@apache.org> wrote:

> On 26/12/2013 19:23, Jeremy Boynes wrote:
> > On Dec 26, 2013, at 2:47 AM, Mark Thomas <ma...@apache.org> wrote:
> >
> > Focusing on the 8-bit issue addressed by the patch, leaving the other
> > RFC6265 thread for broader discussion ...
> >
> >>> The change only allows these characters in values if version ==
> >>> 0 where Netscape’s rather than RFC2109’s syntax applies (per
> >>> the Servlet spec). The Netscape spec is vague in that it does
> >>> not define “OPAQUE_STRING” at all and defines “VALUE” as
> >>> containing equally undefined “characters” although
> >>> historically[1] those have been taken to be OCTETs as permitted
> >>> by RFC2616’s “*TEXT” variant of “field-content.” The change
> >>> will continue to reject these characters in names and in
> >>> unquoted values when version != 0 (RFC2109’s “word” rule)
> >>>
> >>> [1] based on comments by Fielding et al. on http-state and
> >>> what I’ve seen in the wild
> >>
> >> Can you provide references for [1]?
> >
> > This is the mail in the run up to RFC6265 that triggered the
> > discussion:
> > http://www.ietf.org/mail-archive/web/http-state/current/msg01232.html
> 
> Thanks for that reference. What a complete mess. RFC6265 really
> dropped the ball on this. The grammar for cookie-value is a disaster.
> So far the issues include:
> - no support for 0x80 to 0xFF
> - no support for \" sequences
> - no support for using whitespace, comma, semi-colon, backslash
> 
> I was beginning to think that factoring out the cookie generation /
> parsing and then providing different implementations (one for Netscape
> + RFC2109 - roughly what we have now with a few fixes, one for RFC6265
> and maybe one very relaxed) would be the way to go. Having looked at
> the first issue that plan already looks like it needs a re-think.
> 
> I'm still hoping that by documenting all the various issues in one
> place we will be able to come up with a solution that both addresses
> all the issues you have raised and is better than the handful of
> system properties we have currently.

I think they did a reasonable job given the mess that cookies in the wild are in today. They summarize this in the preamble:
> The recommendations for cookie generation provided in Section 4 represent a preferred subset of current server behavior, and even the more liberal cookie processing algorithm provided in Section 5 does not recommend all of the syntactic and semantic variations in use today.

Section 4 gives guidelines for servers generating cookies. I interpret that as being “if you follow these guidelines, you have a good chance of actually getting back the value you tried to set.” The rules above (no 8-bit, no escaping, no Netscape delimiters) reflect that principle. A server application can step outside those guidelines, but “there be dragons.”

—
Jeremy


Re: 8-bit text in cookie values

Posted by Mark Thomas <ma...@apache.org>.

On 26/12/2013 19:23, Jeremy Boynes wrote:
> On Dec 26, 2013, at 2:47 AM, Mark Thomas <ma...@apache.org> wrote:
> 
> Focusing on the 8-bit issue addressed by the patch, leaving the other
> RFC6265 thread for broader discussion ...
> 
>>> The change only allows these characters in values if version ==
>>> 0 where Netscape’s rather than RFC2109’s syntax applies (per
>>> the Servlet spec). The Netscape spec is vague in that it does
>>> not define “OPAQUE_STRING” at all and defines “VALUE” as
>>> containing equally undefined “characters” although
>>> historically[1] those have been taken to be OCTETs as permitted
>>> by RFC2616’s “*TEXT” variant of “field-content.” The change
>>> will continue to reject these characters in names and in
>>> unquoted values when version != 0 (RFC2109’s “word” rule)
>>> 
>>> [1] based on comments by Fielding et al. on http-state and
>>> what I’ve seen in the wild
>> 
>> Can you provide references for [1]?
> 
> This is the mail in the run up to RFC6265 that triggered the
> discussion: 
> http://www.ietf.org/mail-archive/web/http-state/current/msg01232.html

Thanks for that reference. What a complete mess. RFC6265 really
dropped the ball on this. The grammar for cookie-value is a disaster.
So far the issues include:
- no support for 0x80 to 0xFF
- no support for \" sequences
- no support for using whitespace, comma, semi-colon, backslash

I was beginning to think that factoring out the cookie generation /
parsing and then providing different implementations (one for Netscape
+ RFC2109 - roughly what we have now with a few fixes, one for RFC6265
and maybe one very relaxed) would be the way to go. Having looked at
the first issue that plan already looks like it needs a re-think.

I'm still hoping that by documenting all the various issues in one
place we will be able to come up with a solution that both addresses
all the issues you have raised and is better than the handful of
system properties we have currently.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org