You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by Jeremy Boynes <jb...@apache.org> on 2014/01/01 18:46:52 UTC
Re: Support RFC6265 cookie processing

I have experimented with how different desktop browsers handle cookie values, specifically with:
* Aurora-28.0a2
* Chrome-31
* Firefox-26
* Internet Explorer-11
* Safari-7.0.1
on OS X 10.9 except for IE which was on Windows 7. This mail is a dump of things I found out and I will summarize conclusions on the wiki.

I first tried setting cookie values Netscape/RFC6265-style without any version attribute:
< Set-Cookie: cookieOctet=X!#$%&'()*+-./0123456789:<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X
< Set-Cookie: quoted="!#$%&'()*+-./0123456789:<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~"
< Set-Cookie: mismatchedQuote="abc
< Set-Cookie: semi=a;c
< Set-Cookie: space= a b c
< Set-Cookie: comma=a,c
< Set-Cookie: backslash=a\c
< Set-Cookie: dquote=a"c

The cookieOctet value matches the production from RFC6265 and contains all CHARs except DQUOTE, common, semicolon and backslash. The quoted cookie is the same but wrapped in DQUOTEs. All browsers accepted and returned those values as is. In particular, they did not remove the DQUOTEs from quoted, showing them stored client-side as part of the value.

The other values contain questionable values. These were all stored as-is with a couple of exceptions:
* leading and trailing whitespace was removed
* semi was truncated to “a” as would be expected if the “;” was being treated as a delimiter
* mismatched quote was stored by Safari as "abc, semi=a;c, space= a b c, comma=a,c, backslash=a\c, dquote=a"c

The Cookie request header generated by Chrome after the response above is:
> Cookie:cookieOctet=X!#$%&'()*+-./0123456789:<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X; quoted="!#$%&'()*+-./0123456789:<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~"; mismatchedQuote="abc; semi=a; space=a b c; comma=a,c; backslash=a\c; dquote=a”c
which is the same value as returned from document.cookie in JavaScript:
> cookieOctet=X!#$%&'()*+-./0123456789:<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X; quoted="!#$%&'()*+-./0123456789:<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~"; mismatchedQuote="abc; semi=a; space=a b c; comma=a,c; backslash=a\c; dquote=a"c


When I attempt to set RFC2109 V1 cookies, Chrome’s behaviour is unchanged:
< Set-Cookie: cookieOctet=X!#$%&'()*+-./0123456789:<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X ; Version=1
< Set-Cookie: quoted="!#$%&'()*+-./0123456789:<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~" ; Version=1
< Set-Cookie: mismatchedQuote="abc ; Version=1
< Set-Cookie: semi=a;c ; Version=1
< Set-Cookie: space= a b c ; Version=1
< Set-Cookie: comma=a,c ; Version=1
< Set-Cookie: backslash=a\c ; Version=1
< Set-Cookie: dquote=a"c ; Version=1
< Set-Cookie: escaped="a\"c" ; Version=1
results in 
> Cookie:cookieOctet=X!#$%&'()*+-./0123456789:<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X; quoted="!#$%&'()*+-./0123456789:<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~"; mismatchedQuote="abc; semi=a; space=a b c; comma=a,c; backslash=a\c; dquote=a"c; escaped="a\”c”

The other browsers show the same with the exception of Safari’s handling of mismatchedQuote:
> Cookie: cookieOctet=X!#$%&'()*+-./0123456789:<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X; escaped="a\"c"; mismatchedQuote="abc ; Version=1, semi=a;c ; Version=1, space= a b c ; Version=1, comma=a,c ; Version=1, backslash=a\c ; Version=1, dquote=a"c; quoted="!#$%&'()*+-./0123456789:<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~”
The best I’ve been able to come up with for Safari is that if the value starts with a DQUOTE then it parses until it finds a matching DQUOTE, folding headers as it goes, including the commas from header folding, and stopping only when it hits a semi outside the quotes. The mismatched quotes in “mismatchQuote” and “dquote” end up getting paired and the resulting "abc ; Version=1, semi=a;c ; Version=1, space= a b c ; Version=1, comma=a,c ; Version=1, backslash=a\c ; Version=1, dquote=a”c value (note the trailing “c”) gets assigned to the “mismatchedQuote” cookie. None of the browsers treat the values as being invalid and ignore the cookie. 

A response containing
< Set-Cookie: foo="a;b" ; Version=1
which should contain the cookie value “a;b” actually results in:
> Cookie: foo=“a

except on Safari:
> Cookie: foo="a;b”
due to its quote handling above.

I have not been able to induce any of the browsers to generate a $Version attribute to indicate the Cookie header is a RFC2109 V1 header. They are also all happy to store a cookie with the name “$Version” which allows such a header to be faked:
< Set-Cookie: $Version=1
< Set-Cookie: foo=bar
results in:
> Cookie: $Version=1; foo=bar

All browsers are happy to create cookies with bad names. Given the response:
< Set-Cookie: X!#$%&'()*+-./0123456789:<>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X=test
< Set-Cookie: =X
< Set-Cookie: Y===test
< Set-Cookie: Z
Chrome, Aurora and IE generate:
> Cookie: X!#$%&'()*+-./0123456789:<>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X=test; Y===test; Z

and Safari:
> Cookie: X!#$%&'()*+-./0123456789:<>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X=test; Y===test

The long name contains every cookie-octet value except equals; note these are valid per Netscape. 

Safari is missing the cookie from the “=X” and “Z” lines, the others treat it as a cookie with an empty name. IOW, if just “=X” is set, then the Cookie header would contain “X” whereas if just “Z” is set then the header would contain “Z” but because both are set the “Z” overwrites “=X” as the value of the cookie-with-no-name. However, this is not quite the correct model as there is no ‘=‘ character added to the Cookie header for this cookie (i.e. “=X” is returned as “X” not “=X”).

On Dec 27, 2013, at 10:47 AM, Jeremy Boynes <jb...@apache.org> wrote:

> I think this is actually the principle that is the crux of the matter:
>> With respect to the
>> requirements of section 4.2, the Tomcat developers have always taken
>> the view that cookie headers should be treated as "combinations of
>> token, separators, and quoted-string" rather than "*TEXT" for the
>> purposes of compliance with RFC2616. This view is rather fundamental
>> to how cookies have been handled to date. 
> 
> 
> Whilst I agree with the principle I think there are sufficient edge-cases in the ways user-agents and other servers handle cookies that trying to handle values by using RFC2109’s rules to avoid the need for handling “*TEXT” have resulted in some of the issues seen in the past. Simply put, a Netscape format Cookie header is not a combination of "token, separators, and quoted-string."
> 
> Part of this is that I have not seen widespread adoption of RFC2109 by user-agents. Instead, in the way they handle values they appear to accept RFC2109 format Set-Cookie headers from servers, process them in a Netscape-format manner, and then generate a Cookie header conforming to Netscape’s rules. Their RFC2109 support is primarily limited to attribute handing (e.g. allowing Max-Age or DQUOTE characters in Path (except IE)). They specifically don’t handle “Version” and don’t include the “$Version” attribute required by RFC2109 in the Cookie header.
> 
> Reviewing the “fun,” much of it can be attributed to presuming RFC2109 can be applied. For example, the fun around “=“ or “/“ characters, or the use of separator characters in unquoted values caused by trying to apply “token” rules to free text. Upgrading to RFC2109 to use quoted values is also an issue as user-agents are not parsing them as such; for example, Chrome-31 does not handle a quoted-string value per RFC2109:
>  Set-Cookie:Customer="WILE_;_COYOTE"; Version="1"; Path="/"
> instead treating it, loosely, as a Netscape-style value:
>  Cookie:Customer="WILE_
> Note, that value is not valid per RFC6265’s set-cookie-header rule but is per Netscape’s.
> 
> I think we can avoid this “fun” if we only apply RFC2109 rules to cookies that explicitly use version 1 (i.e. if setVersion(1) is called when setting a cookie or if $Version=“1” is present when parsing). For all others we would use RFC6265’s rules treating them as a less ambiguous version of Netscape’s syntax.
> 
> Whether that is actually true or not will depend on the behaviour we actually see from user-agents. With that in mind, I plan to start extending the tests to capture that behaviour so we have a clearer model of what is actually happening. That would then define what we need to do to ensure symmetry for the application developer and avoid the type of problem I mentioned above with Chrome. If that can be an incremental change then that would be ideal but that may not be possible if the change needed is to handle V0 and V1 cookies separately.
> 
> Cheers
> Jeremy
> 
> On Dec 24, 2013, at 2:24 AM, Mark Thomas <ma...@apache.org> wrote:
> 
>> Signed PGP part
>> On 24/12/2013 01:21, Jeremy Boynes wrote:
>>> In comments on issue #55917, there was suggestion for refactoring
>>> cookie support along the lines described in RFC6265.
>> 
>> No, that isn't what I said. What I said was that now might be a good
>> time to refactor the cookie parsing to use the HttpParser and that if
>> we did that, that we should keep RFC6265 in mind. My intention was to
>> suggest that if there were places where RFC6265 suggested relaxing the
>> parsing rules and those rules could be relaxed without creating
>> ambiguities or security concerns then we should consider such a change.
>> 
>> I also hinted at the 'fun' we have had with cookies in the past. I
>> strongly recommend that you go and read all the various discussions
>> and bug reports relating to cookie parsing.
>> 
>> In particular, the requirements of RFC2616 for HTTP headers were often
>> overlooked.
>> 
>>> Reading this RFC, it appears to be more of an effort to standardize
>>> the actual behaviour seen on the Internet for different browser and
>>> server implementations. The observation is the RFC2109 has received
>>> limited adoption and RFC2965 virtually none at all, with most
>>> implementations falling back to the original specification released
>>> by Netscape that contains certain ambiguities.
>> 
>> While I agree that RFC2965 has received little adoption, I disagree
>> with the remainder. With the exception of internet explorer (and this
>> might have changed in the few years since I last looked) the browsers
>> all had pretty good support for RFC2109.
>> 
>>> The Servlet spec’s JavaDoc for Cookie refers to RFC2109 behaviour
>>> with caveats around interoperability. It defines version 0 as
>>> complying with Netscape’s original specification and version 1 as
>>> complying RFC2109 (with the note “Since RFC 2109 is still somewhat
>>> new, consider version 1 as experimental; do not use it yet on
>>> production sites”).
>> 
>> That text is unchanged since at least Servlet 2.3 (12 years) so I
>> wouldn't give it too much weight. The important part is that the
>> Servlet spec requires support for the Netscape spec and RFC2109.
>> 
>>> The current implementation uses a number of system properties to
>>> control how cookies are validated. In implementing RFC6265 I hope
>>> that some of these can be eliminated. If not, I would propose to
>>> add configuration options on the Connector or Host objects to allow
>>> the configuration to be set separately for different host domains.
>> 
>> Every single one of those options was created for a good reason. As a
>> minimum, the discussion that lead to the creation of the option needs
>> to be reviewed before dropping the option.
>> 
>> Per connector or per host options are unlikely to be useful. Per
>> context options are more likely to be a better choice.
>> 
>>> RFC6265 has separate sections in respect for generating and
>>> parsing cookie headers. It follows the practice that generation be
>>> strict but parsing be more tolerant of invalid input. Our current
>>> implementation generally follows that trend by suppressing invalid
>>> input data (after logging). However, for some input data, primary
>>> CTLs, it throws an IllegalArgumentException from the connector
>>> which does not allow the application to recover. In refactoring, I
>>> would propose to simply ignore that input thereby allowing the
>>> application to handle it, for example by parsing the header field
>>> manually. Cookie parsing in particular needs to be tolerant of
>>> cookies set by other sources, including different servers handling
>>> other parts of the domain and JavaScript or other client-side code
>>> setting values in the browser.
>> 
>> What do you mean by ignoring the input? It could be any of:
>> a) skipping the CTL character but otherwise processing the cookie
>> b) replacing the CTL with a space
>> c) ignoring the individual cookie
>> d) ignoring the entire cookie header
>> 
>> I'd support c) but am unlikely to support a) b) or d)
>> 
>>> In light of this, I propose separating the “Set-Cookie” generation
>>> side from the “Cookie” parsing side.
>> 
>> A complete separation may not be appropriate. They share a lot of
>> common code.
>> 
>>> Generation ========== The general principle here would be to use
>>> the version property of Cookie to determine the level of
>>> verification to perform: if 0 follow RFC6265, if 1 use RFC2109. The
>>> primary verification point would be in
>>> HttpServletRequest#addCookie() which would use the version in the
>>> Cookie instance. Characters will always be converted to octets
>>> using the ISO-8859-1 charset; unmappable values will result in an
>>> IAE.
>> 
>> That is what used to happen and will lead to invalid cookies being
>> generated. For valid cookies to be generated, the automatic switching
>> from v0 to v1 needs to be retained.
>> 
>> (As an aside, I believe the logic for this is correct but the comments
>> and the method names appear to be misleading.)
>> 
>>> The Servlet spec requires an IAE be thrown in Cookie’s constructor
>>> if the name is not valid pre RFC2109. Both RFC6265 and RFC2109
>>> define the name to be a “token” (per RFC2616 HTTP/1.1) so I would
>>> propose to always validate by those rules; this would allow
>>> US-ASCII characters except CTLs and separators. This will different
>>> from the current implementation that slash “/“ would be treated as
>>> a separator which would not be allowed in a name by default; this
>>> is consistent with the RFC’s and Glassfish’s implementation and I’m
>>> assuming that allowing it in our current implementation is a
>>> hangover from where we enabled use of “/“ in values.
>> 
>> Correct. We allowed the use of / in values by default primarily
>> because Microsoft Internet Explorer could not handle a cookie where
>> the path was a quoted string. As I recall, we simply allowed / to be
>> used in a token rather than drawing a distinction (as perhaps we
>> should) between the different places a token may appear.
>> 
>>> The spec
>> 
>> Which specification? Be precise. (I know you mean the Servlet
>> specification but given that we are talking about at least 5
>> specifications, not everyone else will.)
>> 
>>> allows vendors to provide "a configuration option that allows
>>> cookie names conforming to the original Netscape Cookie
>>> Specification to be accepted” and I propose to retain the system
>>> property “org.apache.tomcat.util.http.ServerCookie.STRICT_NAMING”
>>> for that. If explicitly set to false, it will verify names using
>>> Netscape’s rules and allow "a sequence of characters excluding
>>> semi-colon, comma and white space” but also excluding “=“ and CTLs
>>> per RFC2616; note this *would* allow 8-bit ISO-8859-1 characters
>>> in the name and relax the RFC2109 constraint that "NAMEs that begin
>>> with $ are reserved for other uses and must not be used by
>>> applications.”
>> 
>> The general requirements of RFC2616 also apply. With respect to the
>> requirements of section 4.2, the Tomcat developers have always taken
>> the view that cookie headers should be treated as "combinations of
>> token, separators, and quoted-string" rather than "*TEXT" for the
>> purposes of compliance with RFC2616. This view is rather fundamental
>> to how cookies have been handled to date. I'm not convinced that there
>> is sufficient support to change that view.
>> 
>> (Aside: The Netscape specification refers to characters. An
>> interesting question is whether or not this means CHAR as defined by
>> RFC2616 (which would not allow 0x80 to 0xFF.)
>> 
>>> The value would not be checked until addCookie() was called and
>>> the cookie version is known. This would in principle use RFC6265’s
>>> “cookie-value” rule if version == 0 or RFC2109’s “value” rules if
>>> version == 1; values that do not conform would result in an IAE
>>> from addCookie(). Unlike the current implementation, this would
>>> not automatically upgrade the version or add quotes around RFC2109
>>> “values” that did not match the “token” rule.
>> 
>> I would be very strongly against such a change.
>> 
>>> If STRICT_SERVLET_COMPLIANCE is set, the rule for version 0 values
>>> would be relaxed to allow any value conforming to the Netscape
>>> specification except CTLs; this would effectively add DQUOTE,
>>> backslash, and 0x80-0xFF. For more granular control, I propose
>>> adding the system property
>>> “org.apache.tomcat.util.http.ServerCookie.ALLOW_IN_VALUE” which
>>> would take one of the following enum values to determine what
>>> octets were allowed: * Netscape * RFC2616_token * RFC2109_value *
>>> RFC6265_cookie_octet * Netscape_restricted (limits the permitted
>>> characters as recommended in the Servlet spec) *
>>> RFC6265_ISO-8859-1 (adds 0x80-0xff to cookie_octet)
>> 
>> I would be against this change too.
>> 
>>> RFC6265 does allow value to be omitted so if value is null then a
>>> name-only cookie will be produced. This will contain the “=“
>>> character required by the “cookie-pair” rule. RFC2109 does not
>>> allow the value to be omitted so a null value will result in an IAE
>>> unless “org.apache.tomcat.util.http.ServerCookie.ALLOW_NAME_ONLY”
>>> is set to true.
>>> 
>>> Max-Age and Expires will always be sent.
>> 
>> They should not be sent unless the application explicitly set a
>> maxAge. We should also update our information on browser support for
>> Max-Age and Expires.
>> 
>>> Parsing ======= RFC6265 says the user-agent MUST send only a
>>> single Cookie header, and RFC2109 is written to imply that
>>> assumption. Netscape says “a line” is added to the request. Our
>>> current implementation processes all Cookie headers in a request
>>> independently which leads to a difference in behaviour if the
>>> headers are folded by an intermediate proxy. Specifically any
>>> $Version value specified in one header is lost when processing the
>>> next whereas if the headers were folded the version information
>>> would apply when processing the values from the second header. To
>>> avoid this inconsistency, I propose only processing the first
>>> header received.
>> 
>> Is there any evidence that proxies are folding headers in this way?
>> Such a folding would not be permitted by RFC2616 so I'm inclined not
>> to be too concerned with this case. I'm more concerned about a proxy
>> that adds its own cookie header (e.g. for authentication) that your
>> proposal would then ignore.
>> 
>>> RFC2109 requires the header start with “cookie-version” which can
>>> be used to discriminate between RFC2109 and RFC6265/Netscape
>>> formats. Specifically, if the line starts with “$Version” then it
>>> would be processed using RFC2109 rules, otherwise processing would
>>> use RFC6265 rules. The analysis behind RFC6265 indicates that most
>>> user agents simply ignore any “Version” attribute in a RFC2109
>>> Set-Cookie so we would expect most requests to simply contain a
>>> RFC6265-format header.
>> 
>> See previous comments on my views on how we should be using RFC6265.
>> 
>>> When parsing a RFC2109 Cookie header, we will assume conformance
>>> to the specification. Any invalid “cookie-value” will simply be
>>> dropped and the parser will attempt to recover at the next
>>> potential “cookie-value” based on the current parse state (i.e. in
>>> a quoted-string or not). A missing VALUE will be considered
>>> invalid unless
>>> “org.apache.tomcat.util.http.ClientCookie.ALLOW_NAME_ONLY” is set
>>> to true. The version, path and domain properties in Cookie will be
>>> set based on attributes in the cookie-value.
>> 
>> No objections to this paragraph.
>> 
>>> When parsing a RFC6265 Cookie header, we will assume it is
>>> comprised of “cookie-pair” separated by the sequence “;” SP. A
>>> “cookie-pair” must start with a valid “cookie-name” followed by an
>>> “=“ character. The name must be a valid “token” unless
>>> “org.apache.tomcat.util.http.ClientCookie.STRICT_NAMING” is
>>> explicitly set to false. The will would be considered valid if it
>>> complies with the “cookie-value” rule unless
>>> “org.apache.tomcat.util.http.ClientCookie.ALLOW_IN_VALUE” is set
>>> to one of the alternatives above. This will allow name-only cookies
>>> (in the form "name=“). It will also mean cookies with unencoded
>>> JSON values will normally be suppressed so any application
>>> expecting such a value would need to parser the Cookie header
>>> directly. Any invalid cookie-pair will not be returned to the
>>> application and will not cause an exception to be thrown by the
>>> parser; a user-data error will be logged. To recover from an
>>> invalid cookie-pair the parser will look for the next “;” SP
>>> sequence. The path and domain properties in Cookie will always be
>>> null and the version will always be 0.
>> 
>> Again, see my previous comments on RFC6265.
>> 
>>> I plan to start looking at this for trunk/TC8.0 starting with a
>>> cleanup of the current tests. It should be possible to back-port
>>> to TC7.0.x if that is desirable.
>> 
>> Cleaning up the tests to what end?
>> 
>> Given the history of changes to cookie parsing and the problems that
>> have gone with them, this proposal is going far too far, far too fast.
>> 
>> Small, incremental changes are the way to go and each change needs to:
>> a) solve a real problem
>> b) be reviewed carefully with backwards compatibility in mind
>> c) be reviewed carefully with security in mind
>> 
>> Mark
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
>> For additional commands, e-mail: dev-help@tomcat.apache.org
>> 
>