Posted to dev@tomcat.apache.org by cm...@yahoo.com on 2001/02/14 17:01:59 UTC

RE: charset used for parameters decoding on HTTP request Tomcat3.x,4

> 
> The problem is that browsers do not send the charset used to encode the
> form's parameters; they send the request with the Content-Type header
> application/x-www-form-urlencoded. The charset should follow the encoding
> type, e.g. "application/x-www-form-urlencoded; charset=UTF8", but in most
> cases it does not.

I know. But that's the standard, and we have to follow it first.
If that fails (and it will, in most browsers that ignore the standards),
then we can try workarounds.
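
As an illustration of the "follow the standard first" step, here is a minimal
sketch of pulling the charset parameter out of a Content-Type value. The class
and method names (ContentTypeCharset, charsetOf) are hypothetical and are not
taken from Tomcat; the parsing is deliberately simple and assumes no ";"
inside quoted parameter values.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not Tomcat code.
class ContentTypeCharset {

    /** Returns the charset parameter of a Content-Type value, or null if absent. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; the remaining parts are name=value parameters.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }
}
```

A null result is the common case described above (the browser sent no
charset), and it is the point where the workarounds would take over.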


> From my point of view, instead of implementing a routine in charge of
> analysing the request header to extract the data's encoding charset (few
> chances for it to really work), it would be better to adopt the following
> policy:

There is no "instead" here: in addition to the ";charset=" parameter we can
do many things.


>  * we suppose that the request's parameters are encoded with the charset
> used for the response to this request. If the servlet processing
> generates a result page encoded with Shift_JIS charset, it is reasonable to
> suppose that the incoming form data used for the page generation is encoded
> with the Shift_JIS charset.
>...
> (javax.servlet.http.HttpServletResponse.setCharacterEncoding(String)).
>...

That's a good idea - thanks Adalbert. 

There are a few other tricks we can try (in addition to this one), and in
time we can hope that browsers will follow the standards.

BTW, another small improvement would be to specify an encoding per
application (instead of defaulting to the platform encoding or UTF).
One may also guess the charset from Accept-Language (in some cases).
A very common mechanism seems to be a "charset" parameter in the request
(it seems possible to use a JavaScript trick in the page to add a hidden
parameter carrying the current browser encoding).
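
To make the combination of these tricks concrete, here is a rough sketch of a
fallback chain. The order (Content-Type charset, then a "charset" request
parameter, then a per-application default) is an assumption made for this
example, not something decided in this thread, and the names
RequestCharsetResolver, resolve and isUsable are hypothetical. The final
fallback of ISO-8859-1 is also just a placeholder choice.

```java
import java.nio.charset.Charset;

// Hypothetical sketch; the precedence order below is an assumption.
class RequestCharsetResolver {

    /**
     * Picks the first candidate that names a charset this JVM supports.
     * Any argument may be null when that source gave no information.
     */
    static String resolve(String contentTypeCharset, String charsetParam, String appDefault) {
        String[] candidates = { contentTypeCharset, charsetParam, appDefault };
        for (String candidate : candidates) {
            if (candidate != null && isUsable(candidate)) {
                return candidate;
            }
        }
        return "ISO-8859-1"; // placeholder last resort for this sketch
    }

    /** True if the name is a legal charset name that this JVM supports. */
    static boolean isUsable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for syntactically illegal charset names.
            return false;
        }
    }
}
```

The value of a "charset" request parameter would normally be plain ASCII, so
it can be read before the real encoding of the other parameters is known.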

I'll start working on that in 1-2 weeks, and any suggestion (like this
one) will help.

Costin


Re: charset used for parameters decoding on HTTP request Tomcat3.x,4

Posted by Kazuhiro Kazama <ka...@ingrid.org>.
From: Hans Bergsten <ha...@gefionsoftware.com>
Subject: Re: charset used for parameters decoding on HTTP request Tomcat3.x,4
Date: Wed, 14 Feb 2001 11:47:17 -0800
Message-ID: <3A...@gefionsoftware.com>
> I'm afraid I have to -1 this proposal. Sure, it may be a nice feature but it's
> not defined by Servlet 2.2. And, for better or for worse, TC 3.x is the
> Reference Implementation for Servlet 2.2. If we add this behavior to TC 3.x,
> a servlet that takes advantage of it will not be portable to other
> spec-compliant 2.2 containers.

Agreed.

Some vendors have surely already introduced their own encoding detection
methods of the kind Costin mentioned. But the details of those detection
methods are not public, and this has caused breakage in complicated
environments.

Servlet 2.3 will introduce a setCharacterEncoding() method. It is
simple, but I think it is a good solution.
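
To show what that choice of encoding actually changes, here is a small
servlet-free sketch that decodes the same percent-encoded form value with two
different charsets using java.net.URLDecoder. The class and method names
(FormValueDecoding, decode) are hypothetical. In a Servlet 2.3 container the
equivalent step would be calling request.setCharacterEncoding(...) before the
first parameter is read.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical illustration; a real container does this inside getParameter().
class FormValueDecoding {

    /** Decodes one x-www-form-urlencoded value using the given charset name. */
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String raw = "%E3%81%82"; // HIRAGANA LETTER A (U+3042) encoded as UTF-8
        // Right charset: one Japanese character.
        System.out.println(decode(raw, "UTF-8").length());
        // Wrong charset: three unrelated Latin-1 characters.
        System.out.println(decode(raw, "ISO-8859-1").length());
    }
}
```

The bytes on the wire are identical in both calls; only the charset assumed
by the decoder differs, which is why it has to be known before decoding.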

Although some i18n problems are solved in Servlet 2.3 and JSP 1.2, it
is inappropriate to introduce a new spec. I (and perhaps all Japanese
users) hope to move to Servlet 2.3 and JSP 1.2. It would be better to
use the Servlet 2.3 spec in Tomcat 3.3 ... Does that exceed the scope of
Tomcat 3.3?

From: Adalbert Wysocki <wa...@imediation.com>
Subject: RE: charset used for parameters decoding on HTTP request Tomcat3.x,4
Date: Wed, 14 Feb 2001 14:26:19 -0000
Message-ID: <9B...@PARSV011>
>  * we suppose that the request's parameters are encoded with the charset
> used for the response to this request. If the servlet processing
> generates a result page encoded with Shift_JIS charset, it is reasonable to
> suppose that the incoming form data used for the page generation is encoded
> with the Shift_JIS charset.

There is an exception. In Japan, some systems sometimes accept another
charset, because the JIS character set can be encoded as ISO-2022-JP,
EUC-JP, or Shift_JIS, and user-defined HTML forms may be encoded in
another charset. In this case, they use a "JISAutoDetect" converter,
which automatically recognizes the JIS variant character
encodings.

From: Adalbert Wysocki <wa...@imediation.com>
Subject: charset used for parameters decoding on HTTP request Tomcat3.x,4
Date: Mon, 12 Feb 2001 18:00:14 -0000
Message-ID: <9B...@PARSV011>
> NB: A solution would be to overwrite the system property "file.encoding" on
> the command line. But on exotic platforms (such as Korean), overwriting the

In Japan, another solution is used:

    s = new String(s.getBytes("iso-8859-1"), "Shift_JIS");

This method is dirty. But it doesn't change the Java default character
encoding. And it can work on a Servlet 2.3 based container, because
Servlet 2.3 defines the default value as "iso-8859-1".
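
The one-liner above can be wrapped in a small helper, sketched here with
hypothetical names (Redecode, redecode). It relies on the container having
decoded the parameter as ISO-8859-1, which maps every byte value to exactly
one character, so the original bytes can be recovered without loss. If the
container used some other charset, this round trip is not safe.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper wrapping the re-decoding trick shown above.
class Redecode {

    /**
     * Recovers the original bytes of a string that was decoded as ISO-8859-1
     * and decodes them again with the charset the browser actually used.
     */
    static String redecode(String misdecoded, Charset actual) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, actual);
    }
}
```

For the Shift_JIS case in this thread the call would be
redecode(s, Charset.forName("Shift_JIS")), assuming the JVM ships that
charset.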

Kazuhiro Kazama (kazama@ingrid.org)		NTT Network Innovation Laboratories