Posted to dev@tomcat.apache.org by bu...@apache.org on 2014/01/04 22:34:01 UTC

[Bug 55951] New: HTML5 specifies UTF-8 encoding for cookie values

https://issues.apache.org/bugzilla/show_bug.cgi?id=55951

            Bug ID: 55951
           Summary: HTML5 specifies UTF-8 encoding for cookie values
           Product: Tomcat 8
           Version: trunk
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Connectors
          Assignee: dev@tomcat.apache.org
          Reporter: jboynes@apache.org

The HTML5 specification is specifying that cookie values may contain characters
that are not part of US-ASCII or ISO-8859-1 and that those codepoints should be
UTF-8 encoded for display.

http://www.w3.org/html/wg/drafts/html/master/single-page.html#cookie

This will result in cookie values containing octets with the high bit set
(0x80-0xFF), which need to be accepted when parsing and permitted when setting.

This will also require decoding those octets into the UTF-16 characters used
by the Java String that represents the value in the Cookie class.
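
To illustrate the conversion issue, here is a minimal sketch (not Tomcat code;
the class and method names are hypothetical) showing how the same header octets
produce different Java Strings depending on whether they are decoded as UTF-8
or as ISO-8859-1:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Tomcat.
final class CookieBytesDemo {
    /** Decodes raw header octets the way a UTF-8 aware parser would. */
    static String decodeUtf8(byte[] octets) {
        return new String(octets, StandardCharsets.UTF_8);
    }

    /** Decodes the same octets one byte per char (ISO-8859-1). */
    static String decodeLatin1(byte[] octets) {
        return new String(octets, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is two octets in UTF-8: 0xC3 0xA9.
        byte[] octets = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(octets.length);                 // 5
        System.out.println(decodeUtf8(octets).length());   // 4
        System.out.println(decodeLatin1(octets).length()); // 5
    }
}
```

Characters outside the Basic Multilingual Plane occupy two UTF-16 code units
in a String, so String.length() is not a count of code points either.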

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 55951] HTML5 specifies UTF-8 encoding for cookie values

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=55951

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Mark Thomas <ma...@apache.org> ---
This has now been fixed in 8.0.x for 8.0.15 onwards.


[Bug 55951] HTML5 specifies UTF-8 encoding for cookie values

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=55951

Jeremy Boynes <jb...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |55917


[Bug 55951] HTML5 specifies UTF-8 encoding for cookie values

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=55951

--- Comment #5 from Mark Thomas <ma...@apache.org> ---
Here is a patch that adds support for sending HTTP headers in character sets
other than ISO-8859-1 and then uses that support for sending Set-Cookie
headers.

Both AJP and HTTP needed changes. SPDY didn't, as it already used the approach
the patch uses.

I still have some work to do to restore the filtering of CTLs.
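
As a rough sketch of the idea only (this is not the code in the patch; the
class name, method name, and the choice to permit horizontal tab are all
assumptions), encoding a header value in a configurable charset while
rejecting CTLs could look like this:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tomcat.
final class HeaderWriterSketch {
    /** Serializes "name: value CRLF" with the value encoded in valueCharset. */
    static byte[] encodeHeader(String name, String value, Charset valueCharset) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            // CTLs are 0x00-0x1F and 0x7F. Assumption: horizontal tab is allowed.
            if ((c < 0x20 && c != '\t') || c == 0x7F) {
                throw new IllegalArgumentException("Control character at index " + i);
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] n = name.getBytes(StandardCharsets.US_ASCII);
        out.write(n, 0, n.length);
        out.write(':');
        out.write(' ');
        // The charset matters only for characters outside US-ASCII.
        byte[] v = value.getBytes(valueCharset);
        out.write(v, 0, v.length);
        out.write('\r');
        out.write('\n');
        return out.toByteArray();
    }
}
```

Rejecting CR and LF before encoding is what prevents a cookie value from
injecting an extra header line.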

http://people.apache.org/~markt/patches/2014-10-02-bug55951-tc8-v1.patch


[Bug 55951] HTML5 specifies UTF-8 encoding for cookie values

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=55951
Bug 55951 depends on bug 55917, which changed state.

Bug 55917 Summary: Cookie parsing fails hard with ISO-8859-1 values
https://issues.apache.org/bugzilla/show_bug.cgi?id=55917

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED


[Bug 55951] HTML5 specifies UTF-8 encoding for cookie values

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=55951

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #2 from Mark Thomas <ma...@apache.org> ---
Re-opening as the unit test didn't cover the end-to-end process and there are
still some issues to resolve.


[Bug 55951] HTML5 specifies UTF-8 encoding for cookie values

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=55951

--- Comment #6 from Mark Thomas <ma...@apache.org> ---
This is the completed patch:
http://people.apache.org/~markt/patches/2014-10-06-bug55951-tc8-v2.patch

I'll give folks a day or so to review and comment and then commit it.


[Bug 55951] HTML5 specifies UTF-8 encoding for cookie values

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=55951

--- Comment #3 from Konstantin Kolinko <kn...@gmail.com> ---
(In reply to Jeremy Boynes from comment #0)
> The HTML5 specification is specifying that cookie values may contain
> characters that are not part of US-ASCII or ISO-8859-1 and that those
> codepoints should be UTF-8 encoded for display.
> 
> http://www.w3.org/html/wg/drafts/html/master/single-page.html#cookie
>

What is the exact wording?

The above link is broken - there is no "cookie" anchor in the current version
of that document. All I see are references to [COOKIES] document (#refsCOOKIES
anchor) = RFC 6265.

http://tools.ietf.org/html/rfc6265


RFC 6265 does not allow non-ASCII characters in the cookie value of a
Set-Cookie header. Citing from Section 4.1.1, Set-Cookie / Syntax:

 set-cookie-header = "Set-Cookie:" SP set-cookie-string
 set-cookie-string = cookie-pair *( ";" SP cookie-av )
 cookie-pair       = cookie-name "=" cookie-value
 cookie-name       = token
 cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
 cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                       ; US-ASCII characters excluding CTLs,
                       ; whitespace DQUOTE, comma, semicolon,
                       ; and backslash

The cookie-value is limited to US-ASCII, even when quoted.
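
The grammar above can be checked mechanically. A minimal sketch follows
(hypothetical class and method names, not Tomcat's implementation) that
transcribes the cookie-octet ranges directly:

```java
// Hypothetical validator transcribed from the ABNF quoted above.
final class CookieOctets {
    /** True if c matches cookie-octet from RFC 6265 Section 4.1.1. */
    static boolean isCookieOctet(char c) {
        return c == 0x21
                || (c >= 0x23 && c <= 0x2B)
                || (c >= 0x2D && c <= 0x3A)
                || (c >= 0x3C && c <= 0x5B)
                || (c >= 0x5D && c <= 0x7E);
    }

    /** True if value is *cookie-octet, optionally wrapped in DQUOTEs. */
    static boolean isCookieValue(String value) {
        int start = 0;
        int end = value.length();
        if (end >= 2 && value.charAt(0) == '"' && value.charAt(end - 1) == '"') {
            start = 1;
            end--;
        }
        for (int i = start; i < end; i++) {
            if (!isCookieOctet(value.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

Under this check "caf\u00e9" is rejected whether or not it is quoted, which
matches the statement that quoting does not widen the allowed range.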

At the same time, attributes (cookie-av) do not have such limitation and as
such may be UTF-8:

 path-av           = "Path=" path-value
 path-value        = <any CHAR except CTLs or ";">


For reference, the place where UTF-8 is mentioned in RFC 6265 is Section 5.4,
The Cookie Header. Citing:

   NOTE: Despite its name, the cookie-string is actually a sequence of
   octets, not a sequence of characters.  To convert the cookie-string
   (or components thereof) into a sequence of characters (e.g., for
   presentation to the user), the user agent might wish to try using the
   UTF-8 character encoding [RFC3629] to decode the octet sequence.
   This decoding might fail, however, because not every sequence of
   octets is valid UTF-8.
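
The "decoding might fail" case can be made explicit in Java by using a
CharsetDecoder that reports errors instead of substituting U+FFFD. This is a
minimal sketch with hypothetical names, not what Tomcat does:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not part of Tomcat.
final class StrictUtf8 {
    /** Returns the decoded text, or empty if the octets are not valid UTF-8. */
    static Optional<String> tryDecode(byte[] octets) {
        try {
            return Optional.of(StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(octets))
                    .toString());
        } catch (CharacterCodingException e) {
            // Not every octet sequence is valid UTF-8, as the RFC note says.
            return Optional.empty();
        }
    }
}
```

A lone 0xE9 octet (e with acute accent in ISO-8859-1) is one such failure: in
UTF-8 it starts a three-octet sequence that is never completed.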


[Bug 55951] HTML5 specifies UTF-8 encoding for cookie values

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=55951

Mark Thomas <ma...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #1 from Mark Thomas <ma...@apache.org> ---
This will be available in 8.0.15 onwards via the Rfc6265CookieProcessor.


[Bug 55951] HTML5 specifies UTF-8 encoding for cookie values

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=55951

--- Comment #4 from Mark Thomas <ma...@apache.org> ---
(In reply to Konstantin Kolinko from comment #3)
> The cookie-value is limited to US-ASCII, even when quoted.
Agreed.

> At the same time, attributes (cookie-av) do not have such limitation and as
> such may be UTF-8:
> 
>  path-av           = "Path=" path-value
>  path-value        = <any CHAR except CTLs or ";">

Nope. CHAR is limited to USASCII. See the definition in section 2.2 of RFC
6265.
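
To make that concrete, here is a small sketch of the path-value rule (the
class name is hypothetical, and the CHAR range %x01-7F is taken from RFC 5234
Appendix B.1, which RFC 6265 includes by reference):

```java
// Hypothetical validator for path-value = <any CHAR except CTLs or ";">.
final class PathValue {
    static boolean isPathValue(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean isChar = c >= 0x01 && c <= 0x7F; // CHAR: US-ASCII, excluding NUL
            boolean isCtl = c <= 0x1F || c == 0x7F;  // CTL: control characters
            if (!isChar || isCtl || c == ';') {
                return false;
            }
        }
        return true;
    }
}
```

With this reading a space is permitted in a path-value, but any character
above 0x7F is not.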
