Posted to users@tomcat.apache.org by André Warnier <aw...@ice-sa.com> on 2014/02/04 11:59:35 UTC

Re: [OT] cookie issue with Tomcat 7 - does not accept the character "é"

Mark Thomas wrote:
> Cookie handling is fundamentally a complete mess. Specifications exist
> but are not fully implemented, are not consistent with related
> specifications, etc.
> 
> Having tried to sort this out the last time around and having read
> Jeremy's great work on documenting where we stand at the present moment,
> it often feels like it wouldn't be too hard to make a case that just
> about any cookie name or value that isn't a token (as per RFC2616) is
> either valid or invalid depending on which specification(s) you choose
> to read.
> 
> I'd strongly encourage anyone thinking about commenting further on this
> thread to take the time to read the wiki page [1] where the Tomcat
> committers (and Jeremy in particular) are currently trying to figure out
> exactly how Tomcat should handle cookies in the future.
> 
> Mark
> 
> 
> [1] http://wiki.apache.org/tomcat/Cookies
> 

I agree with everything you say above.

About the Wiki, what seems to be missing is additional lines in the tables showing some 
examples of cookie values containing what English-speaking people often call "additional" 
or "accented" characters (and what other people just call "characters").  For example, 
what happens when the cookie value is a string like "ÄÖÜäöüéèîôâ" (that's about the extent 
of what I can enter easily on this current German keyboard).

And let's also reflect on the fact that no matter what else we have been discussing here, 
we have still not provided the OP of this thread with any useful and practical 
recommendation to resolve his problem, which seems to originate in a variation between how 
Tomcat 6 and Tomcat 7 handle cookies with "accented characters" in their value.
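
As one possible practical workaround (an assumption on my part, not something the wiki page prescribes), the application could percent-encode the cookie value as UTF-8 before setting it and decode it again when reading it, so that only US-ASCII characters ever travel in the header.  A minimal sketch follows; the class and method names are hypothetical, and only java.net.URLEncoder and java.net.URLDecoder are real JDK classes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper; not part of Tomcat or the Servlet API.
final class CookieValueCodec {

    // Percent-encode the value as UTF-8 so the cookie carries only US-ASCII.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Reverse of encode(): turn the percent-encoded cookie value back into text.
    static String decode(String cookieValue) {
        try {
            return URLDecoder.decode(cookieValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "\u00e9" is the character "é", written as an escape so that the
        // source file encoding does not matter.
        String encoded = encode("\u00e9");
        System.out.println(encoded); // prints "%C3%A9"
        System.out.println(decode(encoded).equals("\u00e9")); // prints "true"
    }
}
```

The encoded string would then be passed to the Cookie constructor, and decode() applied to whatever getValue() returns.  Both sides of the application have to agree on this convention, and any client-side script reading the cookie would need to decode it as well.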


Otherwise, to generalise the debate, it is not just cookies, but just about anything which 
has to do with non-US-ASCII characters under HTTP and HTML which is a mess, and has been a 
mess for several years if not decades.  The current jumble of RFCs that deal with this 
issue is in the end more confusing than helpful.  And all the current "solutions" in terms 
of implementation (browser-side as well as server-side) resemble patches over patches over 
wooden legs.

I am not saying that resolving the issue is simple, nor that one can simply ignore the 
past and/or backward-compatibility issues.  But, despite the immense respect I have for 
people like Roy Fielding and their achievements, I cannot but slowly get the impression 
that the Internet RFC mechanism is, in that respect, slowly getting "fossilised", and that 
nobody seems to have the energy and drive anymore to think radically, and tackle the issue 
from the top down.

Nobody nowadays disputes that Unicode and UTF-8 provide a form of "universal" 
solution to most of the issues in terms of alphabets, character sets and encodings 
suitable for 99% of the human users of computers and of the Internet.  And nobody 
disputes that 99% of currently in-use hardware and software can handle arbitrary 
sequences of bytes and bits perfectly fine.

Yet in terms of programming "for the Internet", we still have to live with - and work 
around every day - a set of standards and recommendations based on a myriad of alphabets 
and encodings which can each properly represent only a tiny fraction of the languages that 
people worldwide speak and read.
And the issues related to encoding/decoding/transliterating between these different 
alphabets and encodings, are costing thousands of productive hours lost every day, 
independently of the confusions and aggravations that they generate.
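
To make the kind of mangling concrete, here is a small sketch (the class and method names are hypothetical) of what happens when text is encoded with one charset and decoded with another, which is the situation that arises when sender and receiver disagree about ISO-8859-1 versus UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class.
final class MojibakeDemo {

    // Encode the text with one charset, then decode the bytes with another.
    static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String name = "Andr\u00e9"; // "André", escaped to avoid source-encoding issues

        // UTF-8 bytes read as ISO-8859-1: the two bytes C3 A9 become two characters.
        String wrong1 = transcode(name, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong1.equals("Andr\u00c3\u00a9")); // prints "true"

        // ISO-8859-1 bytes read as UTF-8: the lone byte E9 is malformed UTF-8
        // and is replaced by U+FFFD, so the original character is lost.
        String wrong2 = transcode(name, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(wrong2.contains("\ufffd")); // prints "true"
    }
}
```

The first mismatch is reversible if one knows which two charsets were involved; the second is not, because the original byte has already been discarded.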

Why is it exactly that we can come up with things like websockets and HTML-5 and SOAP and 
Java annotations, but not with a new HTTP/HTML version which would make Unicode/UTF-8 the 
*default*, and everything else into exceptions?

That for the sake of interoperability and mutual comprehension, things like HTTP header 
*names* would be restricted to sequences of printable characters in a limited range that 
is available on all human interface devices and universally readable is one thing; but why 
would HTTP header *values* or URI path or query-string components (which often have to 
carry real-world multilingual textual information) be similarly limited, and confusing and 
inconsistent ?  Why does it still have to be so difficult, in 2014, to create a web 
user-interface application which ensures that people from different countries can enter 
their name and place of residence as they know it, and not have the server-side or 
client-side application mangle them ?

If someone were to take the text of RFC 2616 and replace any direct or indirect mention of 
US-ASCII and ISO-8859-1 in it with Unicode/UTF-8, and present this as an RFC for HTTP 2.0, 
would the Internet instantly crumble?
How does one go about doing this?



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: [OT] cookie issue with Tomcat 7 - does not accept the character "é"

Posted by Christopher Schultz <ch...@christopherschultz.net>.

André,

On 2/4/14, 5:59 AM, André Warnier wrote:
> Why is it exactly that we can come up with things like websockets
> and HTML-5 and SOAP and java annotations, but not with a new
> HTTP/HTML version which would make Unicode/UTF-8 the *default*, and
> everything else into exceptions ?

Because the standards were written without the benefit of hindsight.

Talk to us in 10 years when Websocket looks like a rats-nest of
false-starts, failed standards, and divergent implementations.

-chris