Posted to dev@tomcat.apache.org by bu...@apache.org on 2002/11/18 19:32:22 UTC

DO NOT REPLY [Bug 14119] - MimeHeaders are always encoded in UTF8

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14119

MimeHeaders are always encoded in UTF8

Brian.Ewins@btinternet.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID



------- Additional Comments From Brian.Ewins@btinternet.com  2002-11-18 18:32 -------
Your belief is incorrect. RFC 2068 ( http://www.ietf.org/rfc/rfc2068.txt ),
section 4.2, defines what can go in a header:
          field-content  = <the OCTETs making up the field-value
                           and consisting of either *TEXT or combinations
                           of token, tspecials, and quoted-string>

*TEXT is defined in section 2.2, which adds:
   "Words of *TEXT may contain characters from character sets other than ISO
   8859-1 [22] only when encoded according to the rules of RFC 1522."

An example of such an encoded piece of text is:
=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?=

(There are no special characters there; the question marks are intended.) Note
that it is self-describing with respect to its character encoding. It works this
way to allow stream handling of headers: you cannot rely on the character
encoding of the response body, since the header declaring it might not have been
read yet.
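
To see that the encoded word really does carry its own charset, here is a small
sketch using Python's standard email.header module. That module is my choice
for illustration only; it is not something Tomcat uses, and the variable names
are made up.

```python
from email.header import decode_header

# The example encoded word quoted above.
encoded = "=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?="

# decode_header returns a list of (bytes, charset) pairs, one per chunk.
# This value is a single encoded word, so exactly one pair comes back.
[(raw_bytes, charset)] = decode_header(encoded)

# The charset is read from the header value itself, not from the body.
text = raw_bytes.decode(charset)

print(charset)         # charset named inside the encoded word
print(len(raw_bytes))  # number of octets after base64 decoding
print(text)            # the decoded Hebrew text
```

Nothing outside the header value is consulted at any point, which is what makes
stream handling possible.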

The other possible expansions of field-content are quoted strings (which are
quoted TEXT in the sense above) and token:
token          = 1*<any CHAR except CTLs or tspecials>
CHAR           = <any US-ASCII character (octets 0 - 127)>

In other words, *TEXT is the only expansion of field-content that allows
non-US-ASCII characters, and those characters are ISO-8859-1 (not UTF-8, and
not the body content charset).
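
The two rules can be sketched as a pair of checks. This is a rough illustration
in Python, not a full parser for the RFC 2068 grammar; the names is_token and
fits_latin1 are hypothetical, and the tspecials set is copied from section 2.2
of the RFC.

```python
# tspecials from RFC 2068 section 2.2, including SP and HT.
TSPECIALS = set('()<>@,;:\\"/[]?={} \t')


def is_token(value: str) -> bool:
    """True if value is 1*<any CHAR except CTLs or tspecials>."""
    return bool(value) and all(
        # CHAR is octets 0-127; CTLs are octets 0-31 and DEL (127).
        31 < ord(ch) < 127 and ch not in TSPECIALS
        for ch in value
    )


def fits_latin1(value: str) -> bool:
    """True if every character can be sent unencoded as ISO-8859-1 *TEXT."""
    try:
        value.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(is_token("gzip"))          # plain US-ASCII token
print(is_token("caf\u00e9"))     # e-acute is outside US-ASCII
print(fits_latin1("caf\u00e9"))  # but it is within ISO-8859-1
print(fits_latin1("\u05e9"))     # Hebrew shin is not, so it needs encoding
```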

If you wish to use international characters in headers, you need to supply code
to perform RFC 1522 encoding. Note also that you can't use such encoded values
in most headers.
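
As a rough idea of what such encoding code does, here is a sketch in Python
rather than Java, again using the standard email.header module. That module
follows RFC 2047, the successor to RFC 1522, which uses the same encoded-word
syntax. The helper name encode_header_value and the sample string are
assumptions for illustration; a servlet would need an equivalent routine of its
own.

```python
from email.header import Header, decode_header


def encode_header_value(text: str, charset: str = "utf-8") -> str:
    """Return text as encoded word(s), made up of US-ASCII characters only."""
    return Header(text, charset).encode()


# Hypothetical value containing a character outside US-ASCII.
original = "caf\u00e9"
encoded = encode_header_value(original, "iso-8859-1")
print(encoded)  # an encoded word of the form =?charset?encoding?data?=

# Round trip: a reader recovers the text using only the header value.
[(raw_bytes, charset)] = decode_header(encoded)
print(raw_bytes.decode(charset) == original)
```

The result would then be passed to whatever sets the header, and only for
headers whose grammar permits encoded words.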
