Posted to users@tomcat.apache.org by Shanti Suresh <sh...@umich.edu> on 2013/07/02 16:04:46 UTC
Re: [slightly OT] FORM based authentication and utf-8 encoding of credentials
Greetings,
On Wed, Jun 26, 2013 at 4:08 PM, Christopher Schultz <
chris@christopherschultz.net> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> André,
>
>
>
> > But, even when sending UTF-8 encoded data according to this
> > principle, they are *not* indicating that it is UTF-8 data, which
> > is basically wrong, because the standard HTTP/HTML character set is
> > iso-8859-1, and they *should* indicate it when that is not what
> > they are sending. But that is the reality.
>
> No, as much as it pains me to do so, I agree with the Mozilla folks
> on this one: adding a charset attribute to an
> application/x-www-form-urlencoded Content-Type violates the spec. There is
> no good solution.
> ...
>
> > We really need an RFC for HTTP 2.0, with UTF-8 as the default
> > charset/encoding.
>
> +1
>
> Maybe they can clear up Tomcat logging configuration while they are at
> it :)
>
>
Thank you! This discussion was quite informative.
-Shanti
Re: [slightly OT] FORM based authentication and utf-8 encoding of credentials
Posted by André Warnier <aw...@ice-sa.com>.
Shanti Suresh wrote:
> Greetings,
>
>
> On Wed, Jun 26, 2013 at 4:08 PM, Christopher Schultz <
> chris@christopherschultz.net> wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> André,
>>
>>
>>
>>> But, even when sending UTF-8 encoded data according to this
>>> principle, they are *not* indicating that it is UTF-8 data, which
>>> is basically wrong, because the standard HTTP/HTML character set is
>>> iso-8859-1, and they *should* indicate it when that is not what
>>> they are sending. But that is the reality.
>> No, as much as it pains me to do so, I agree with the Mozilla folks
>> on this one: adding a charset attribute to an
>> application/x-www-form-urlencoded Content-Type violates the spec. There is
>> no good solution.
>> ...
>>
>
>
>>> We really need an RFC for HTTP 2.0, with UTF-8 as the default
>>> charset/encoding.
>> +1
>>
>> Maybe they can clear up Tomcat logging configuration while they are at
>> it :)
>>
>>
> Thank you! This discussion was quite informative.
>
You are welcome.
Further, and still relatively [OT]: in some other (non-Tomcat, non-Java) applications, we
handle the general issue as follows, taking into account the deficiencies of the RFCs, the
servers, the browsers, and the users:
- when programmers create the html documents containing the forms, they must make sure
that they use a tool which really saves the html document in the charset/encoding that
they intend
- the html page should contain a declaration like:
<meta http-equiv="Content-Type" content="text/html; charset=xxxx" />
(where xxxx is the correct charset/encoding, like "UTF-8")
- each form that is sent to the browser is sent by the server with an explicit HTTP header:
Content-Type: text/html; charset=xxxx
(that normally happens automatically, but you should nevertheless check that it matches
the declaration in the page)
- the <form> tag of the form should contain the "accept-charset" attribute with the
expected character set as its value, like
<form accept-charset="UTF-8" ...>
- the form itself contains a hidden parameter like:
<input type="hidden" name="charset-test" value="yyyyy">
(where yyyyy is a character sequence whose length in bytes differs depending on the
character set actually used. E.g., for Europe, "ÖöÜüÄä")
- the application which receives the form submit data must first check whether the string
received for the "charset-test" parameter matches its expectations.
In other words, if the application expects UTF-8, then it should check that the received
string has a byte length of 12 and a character length of 6, and matches the Unicode string
"ÖöÜüÄä".
If it doesn't, then it should take appropriate action (abort the action, or try
another charset).
(If the form sent by the server contains additional data coming from a back-end database
system, then one should of course also check that the charset of that data matches that of
the form.)
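As an illustration only, the server-side check on the "charset-test" parameter could look
roughly like the Java sketch below. The applications described above are not written in
Java, so this is not their code; the class name CharsetProbe and the methods matches() and
detect() are hypothetical names invented for this example. It assumes the application can
get at the raw, already percent-decoded bytes of the parameter; how to obtain them depends
on the container and is left out here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Tomcat or the Servlet API.
final class CharsetProbe {

    // Same value as the hidden "charset-test" field ("ÖöÜüÄä"), written with
    // Unicode escapes so the encoding of this source file cannot alter it.
    static final String PROBE = "\u00D6\u00F6\u00DC\u00FC\u00C4\u00E4";

    // True if rawBytes are exactly PROBE encoded in the expected charset.
    static boolean matches(byte[] rawBytes, Charset expected) {
        return Arrays.equals(rawBytes, PROBE.getBytes(expected));
    }

    // Returns the first candidate charset that reproduces rawBytes,
    // or null if none does (the caller should then abort the request).
    static Charset detect(byte[] rawBytes, Charset... candidates) {
        for (Charset candidate : candidates) {
            if (matches(rawBytes, candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] asUtf8 = PROBE.getBytes(StandardCharsets.UTF_8);
        byte[] asLatin1 = PROBE.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(PROBE.length());   // 6 characters
        System.out.println(asUtf8.length);    // 12 bytes in UTF-8
        System.out.println(asLatin1.length);  // 6 bytes in ISO-8859-1

        // A browser that ignored accept-charset and sent ISO-8859-1 is caught here.
        System.out.println(matches(asLatin1, StandardCharsets.UTF_8)); // false
        System.out.println(detect(asLatin1,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

Comparing the whole byte sequence covers both the length check and the content check in one
step; detect() corresponds to the "try another charset" fallback mentioned above.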
This may look a bit like overkill, but it is the result of long and painful real-world
experience with multi-lingual applications used with multiple browsers and multiple types
of users in multiple countries doing cut-and-paste of all kinds of stuff into forms.