Posted to users@tomcat.apache.org by André Warnier <aw...@ice-sa.com> on 2011/12/04 22:57:13 UTC

Character set issue

Hi.

I need help with a problem on a Tomcat system. The system is difficult to reach, and I
cannot access it directly right now (it is Sunday night in Europe).
I know that the system runs Tomcat 6.x under Oracle/Sun Java 1.6, on a current version of
Red Hat Enterprise Linux (RHEL), and that is all I can say right now.

The problem is that, after the update of a webapp (for which I do not have the code),
non-US-English characters with diacritics, posted to the webapp from a web <form>, now
appear to be "corrupted". I would like to better understand the mechanism Tomcat uses to
read HTTP request form parameters, so that I can start to figure out what is going wrong.

The webapp consists of a single servlet, wrapped by two filters.
The application's web.xml defines the order as
filter1
filter2
servlet
with both filters processing all requests to the servlet.

"filter1" is a commercial product used on many Tomcat sites.
"filter2" is my own filter (and it is the only part for which I have the source code).
"servlet" is also a commercial product for which I do not have the code, and it is the
component which has just been updated.

What I would like to know is: with a setup such as the above, how does Tomcat determine
the /character set/ in which the body of the POST will be read?

For example:
Suppose that we have two HTML forms, form1 and form2. Both forms are functionally
identical, and contain a text input box named "name1".
form1 has an HTML declaration which specifies its charset as "iso-8859-1".
form2 has an HTML declaration which specifies its charset as "UTF-8".

In the input box "name1" of each form, the user types the string "TÜV" (second character
= uppercase U with umlaut) and then posts the form to the webapp.
The user's browser is the same in all cases.
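
To make the example concrete, here is a minimal sketch in plain Java (no Tomcat or
servlet API involved) of the kind of corruption I mean. It assumes that the browser
percent-encodes "TÜV" in the charset of the form, and that the receiving side then decodes
the posted value with either the same or the other charset. The class name CharsetDemo and
its helper methods are made up for this illustration; only URLEncoder and URLDecoder are
real JDK classes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class CharsetDemo {

    // Percent-encode a form value as a browser is assumed to do for a form
    // declared with the given charset (hypothetical helper).
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decode a percent-encoded value using the charset that the receiving
    // side believes was used (hypothetical helper).
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Show the code points of a string, so that the result does not depend
    // on the encoding of the console.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String typed = "T\u00DCV"; // "TÜV"

        String fromForm1 = encode(typed, "ISO-8859-1"); // T%DCV
        String fromForm2 = encode(typed, "UTF-8");      // T%C3%9CV
        System.out.println("form1 posts name1=" + fromForm1);
        System.out.println("form2 posts name1=" + fromForm2);

        // Matching charsets give back the original string.
        System.out.println(codePoints(decode(fromForm1, "ISO-8859-1")));
        System.out.println(codePoints(decode(fromForm2, "UTF-8")));

        // Mismatched charsets give a "corrupted" value.
        System.out.println(codePoints(decode(fromForm2, "ISO-8859-1")));
        System.out.println(codePoints(decode(fromForm1, "UTF-8")));
    }
}
```

In this sketch, the UTF-8 bytes decoded as ISO-8859-1 turn the single "Ü" into two
characters, while the ISO-8859-1 byte decoded as UTF-8 becomes the replacement character
U+FFFD. Whether the same thing happens inside Tomcat in my case is exactly what I am
trying to find out.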

If the servlet calls request.getParameter("name1"), which factors determine the value
that it receives for this parameter?

Or maybe my question should be: /can/ the servlet (or one of the filters) do anything
that would cause the value of "name1" to /not/ be a correct Java "TÜV" string in the servlet?

Additional information:
Only the servlet was updated. Prior to that update, the application worked correctly. So
I strongly suspect that it is the updated servlet which creates the problem. But I'd like
to understand /how/ it can create such a problem, and whether, for example, something in
filter1 or filter2 could contribute to it.
Filter1 is an authentication servlet filter, and as far as I know it only checks HTTP
headers, and does not concern itself with the body of the request. But I suppose that
even the request body "passes through" this filter, and that it could presumably corrupt
this body (although I would consider this unlikely right now).
Filter2 is my own filter (and I am not a Java expert). This filter works at a number of
installations (and it also worked here, before this servlet update). It subclasses the
HTTP request, because it needs to add an HTTP header to the request on the fly. But the
subclass only overrides the methods which have to do with the HTTP headers, and does not
handle the body directly.

Any information or ideas welcome.
