You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by Jeremy Boynes <jb...@apache.org> on 2014/01/20 22:38:13 UTC

Using HttpParser for Cookie header?

I started to look at using HttpParser for the Cookie header but there are some differences in the way it works compared to the existing parser in Cookies that I wanted to check direction before getting too far in.

The area I’m concerned about is the need to copy the bytes in order to parse the header. The Cookies parser relies heavily on MessageBytes and avoids copying to a String as far as possible. HttpParser, however, operates on a StringReader which requires converting to a String before parsing.

After digging into the usage of Cookies I think there are only two places that read them:
1) Request#getCookies(), which needs to copy to Strings anyway in order to create the Cookie instances it returns
2) CoyoteAdapter#parseSessionCookiesId(), which parses the header and compares names as MessageBytes, only allocating a String for the value if the session cookie is found

It’s this second one that has me concerned about switching to HttpParser as this gets called for every request. If we switch then there is going to be allocation and copying of the header that we currently don’t do. 

Having said that, the current parse relies heavily on the assumption that the header is US-ASCII encoded and that it is only dealing with 7-bit characters (it freely casts bytes to chars). The cookie change proposal has us supporting UTF-8 as specified by HTML5 which means a more robust decoder will be needed and the copy may not be avoidable.

My plan here is to KISS and implement a parser similar to the others in HttpParser assuming the header has already been decoded so it can just deal with the chars. Then if we notice any performance degradation we can focus on improving HttpParser which will have the benefit of working for the other header parsers as well. I’ll implement this alongside the existing code (actually, in the parser package) to make it easier to do an A-B comparison.

There would likely be some follow-on changes from such a change. Cookies and ServerCookie are recyclable objects associated with the request. By moving away from MessageBytes these could be replaced by basic String values and may not be needed e.g. Request already caches the array of Cookie values returned from getCookies() and that could be now populated directly from the parse. These classes may end up going away.

Thanks
Jeremy


Re: Using HttpParser for Cookie header?

Posted by Jeremy Boynes <jb...@apache.org>.
On Jan 20, 2014, at 1:57 PM, Mark Thomas <ma...@apache.org> wrote:

> Signed PGP part
> On 20/01/2014 21:38, Jeremy Boynes wrote:
> > I started to look at using HttpParser for the Cookie header but
> > there are some differences in the way it works compared to the
> > existing parser in Cookies that I wanted to check direction before
> > getting too far in.
> >
> > The area I’m concerned about is the need to copy the bytes in
> > order to parse the header. The Cookies parser relies heavily on
> > MessageBytes and avoids copying to a String as far as possible.
> > HttpParser, however, operates on a StringReader which requires
> > converting to a String before parsing.
> >
> > After digging into the usage of Cookies I think there are only two
> > places that read them: 1) Request#getCookies(), which needs to
> > copy to Strings anyway in order to create the Cookie instances it
> > returns 2) CoyoteAdapter#parseSessionCookiesId(), which parses the
> > header and compares names as MessageBytes, only allocating a String
> > for the value if the session cookie is found
> >
> > It’s this second one that has me concerned about switching to
> > HttpParser as this gets called for every request. If we switch
> > then there is going to be allocation and copying of the header that
> > we currently don’t do.
> 
> I share your concern. Worst case, we'll need to do a specific
> MessageBytes based parse just for the session cookie. Assuming that
> the session cookie name and value will remain US-ASCII (see no reason
> why this should not be the case), we could get away with this as long
> as we are mindful that there might be some UTF-8 we need to skip over.

I started implementing this and a patch can be found here:
  http://people.apache.org/~jboynes/patches/0013-Start-of-a-more-lenient-RFC6265-cookie-parser.patch

The best illustration of how it is parsing is probably the test cases. The main difference is that it does not apply RFC2109 rules unless $Version=1 is specified and because of that I did not end up using as much of HttpParser as I’d hoped.

I believe the implementation handles correctly formatted headers according to the rules from Netscape, RFC2109 and RFC6265. 

It is lenient with respect to malformed input, breaking the header down into pairs separated by semi-colons (allowing for RFC2109 quoting/escaping where applicable) and attempting to accept each pair in isolation. If an individual pair is invalid (e.g. contains disallowed characters), it defaults to accepting that rather than dropping the cookie. The only cookies that actually get dropped are ones that do not specify a name or ones where Cookie’s constructor throws an IAE on the name. What this means is that an application will see every cookie Cookie will allow, making it symmetrical, but it will not see cookies whose names the Servlet spec configuration (i.e. STRICT_NAMING) would disallow.

I kept the de-quoting behavior with one difference: for V0 cookies, RFC6265 quoting rules are applied which means any escapes are not undone. IOW for V0 «"a\bc"» will be returned as «a\bc» whereas for V1 it would be returned as «abc». I’m not convinced this is useful and it may be more predictable for users if we simply returned the actual value, quotes and all, by default with a config option to enable unquoting for V1 values only (perhaps the same option as would enable alternative G1a).

Feedback welcome, ideally in the form of a test case showing what should be returned for a header text input.

Cheers
Jeremy


Re: Using HttpParser for Cookie header?

Posted by Mark Thomas <ma...@apache.org>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 20/01/2014 21:38, Jeremy Boynes wrote:
> I started to look at using HttpParser for the Cookie header but
> there are some differences in the way it works compared to the
> existing parser in Cookies that I wanted to check direction before
> getting too far in.
> 
> The area I’m concerned about is the need to copy the bytes in
> order to parse the header. The Cookies parser relies heavily on 
> MessageBytes and avoids copying to a String as far as possible. 
> HttpParser, however, operates on a StringReader which requires 
> converting to a String before parsing.
> 
> After digging into the usage of Cookies I think there are only two 
> places that read them: 1) Request#getCookies(), which needs to
> copy to Strings anyway in order to create the Cookie instances it
> returns 2) CoyoteAdapter#parseSessionCookiesId(), which parses the
> header and compares names as MessageBytes, only allocating a String
> for the value if the session cookie is found
> 
> It’s this second one that has me concerned about switching to 
> HttpParser as this gets called for every request. If we switch
> then there is going to be allocation and copying of the header that
> we currently don’t do.

I share your concern. Worst case, we'll need to do a specific
MessageBytes based parse just for the session cookie. Assuming that
the session cookie name and value will remain US-ASCII (see no reason
why this should not be the case), we could get away with this as long
as we are mindful that there might be some UTF-8 we need to skip over.

No objections here to you continuing as per you plan.

Mark
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJS3Zu7AAoJEBDAHFovYFnnYS8P/ihRKZ1VgAbwGG4f1fu8w232
IecFGCXRRuW48SZuj3ADX/rUwab+yoj/1N4QQetS7LRVhnQWDmNdZuQ596d8vsA0
HXHkAmtw4Et/s9r8QH+2UGabt65BjAsIlMU6OXJWn2Ph/N3b18sh0vHyW248WUs5
ck21WC8dAj4faqaJR9VI0N99Asq9Q6SoMswP7KYn7LhoLRA4oo3MtzGv7jW4m0BL
aDAlS13JqzO4dImo4MBsm5fevvuVhXppRBglkHuqVTYDrZ6hI4HGZ4VtoNcs352x
tNsDZU82LAxzL4Ix3Lsdxa7xOHxa6Dlffwh6YitwEERUhRYG0i//xLHER7lac83d
D+x5A2JFCvGxIN/jvLR5Up56+IzxhnVfPsiFpoa9ABtjxVvoiHzqGZu3ltBFdwbt
TEwG7uUD3L0pZ3yN1vGkDnEy7e15Aju5tMhd9PzCctSW1+ytPJv6psnxYucL+u6u
VlN8+wEmXFl3OIM9NhPtmW1qgu42zlWZl48RSfkoBMGmIUeE8FVS6uDsCZEHLmF3
+R/EiKPkDk+AkooCY5l3pEvM5WCPZpRj/8qCITEZJ0ETurmQ885Mn1sevK1oZQuX
tkKIG+3ICbDewwaFqGDwvy3lKOYOMK+D6xlt5If6iPqGgHjXIVC8LrtXfoX+wh44
g/+QPXHlVkX5eOlvi5Qb
=xnh9
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org