Posted to users@tomcat.apache.org by Thierry Templier <te...@yahoo.fr> on 2011/04/29 14:33:05 UTC

mojk and utf8 charset problem

Hello,

I developed an application that uses UTF-8 encoding, since it needs to display Arabic characters. When I access the application directly through Tomcat, everything works fine. When I access it through the Apache web server and mod_jk, these characters are not displayed correctly. UTF-8 is correctly configured in the Apache web server, since static pages display these characters properly. So the problem seems to come from mod_jk.

Is there a way to configure mod_jk to use UTF-8 encoding for HTTP requests and responses?

Thanks very much for your answers.
Thierry



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: mojk and utf8 charset problem

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Thierry,

On 5/2/2011 4:31 AM, Thierry Templier wrote:
> <meta http-equiv="CONTENT-TYPE" content="text/html; charset=UTF-8"></meta>

Just to be sure, I highly recommend coding your pages like this:

<meta http-equiv="CONTENT-TYPE" content="text/html; charset=<%=
response.getCharacterEncoding() %>"></meta>

This will ensure that you aren't sending ISO-8859-1 but claiming that
it's UTF-8.

> The Content-Type header is the same and specifies UTF-8 as the encoding... However, it appears that when going through Apache / mod_jk / Tomcat, the response content is compressed using gzip. This is not the case when accessing Tomcat directly. I don't know whether that could be the cause of the problem...

gzip encoding is unlikely to be causing the problem.

Can you post the configuration you have for your <Connector> elements in
Tomcat's conf/server.xml? Remember to remove any sensitive information
(IP addresses, JK secrets, etc.).

-chris



Re: mojk and utf8 charset problem

Posted by Thierry Templier <te...@yahoo.fr>.
Hello André,

I tested with two browsers:

- Firefox 3.6.16 (Linux)
- Chrome 11.0.696.57 (Linux)

and I see the same behavior in both.

Thierry

> Additional question: did you try it
> with different browsers?



Re: mojk and utf8 charset problem

Posted by André Warnier <aw...@ice-sa.com>.
Additional question: did you try it with different browsers?



Re: mojk and utf8 charset problem

Posted by Thierry Templier <te...@yahoo.fr>.
Hello,

Sorry for my very late answer!

It took me some time to solve the problem, based on what you suggested. In fact, there were two separate problems:

- I use Tiles, and I had not specified the page directive (<%@page language="java" contentType="text/html; charset=UTF-8"%>) in all of the JSP fragments that make up the final page. After adding it to each of them, UTF-8 characters display correctly.

- I also had a configuration problem at the MySQL JDBC driver level. Although the database is configured for UTF-8, I also needed to specify some parameters in the JDBC URL (see http://confluence.atlassian.com/display/DOC/Configuring+Database+Character+Encoding).
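
A minimal sketch of the kind of JDBC URL parameters meant here, assuming MySQL Connector/J and its useUnicode and characterEncoding connection properties. The thread does not say which parameters were actually used; the host, port, and database name below are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JdbcUrlSketch {
    // Appends connection properties to a MySQL JDBC URL as a query string.
    static String buildUrl(String host, int port, String database, Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:mysql://" + host + ":" + port + "/" + database);
        char separator = '?';
        for (Map.Entry<String, String> entry : props.entrySet()) {
            url.append(separator).append(entry.getKey()).append('=').append(entry.getValue());
            separator = '&';
        }
        return url.toString();
    }

    // Assumption: these two Connector/J properties ask the driver to talk UTF-8 to the server.
    static String utf8Url(String host, int port, String database) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("useUnicode", "true");
        props.put("characterEncoding", "UTF-8");
        return buildUrl(host, port, database, props);
    }

    public static void main(String[] args) {
        // Hypothetical host and database name.
        System.out.println(utf8Url("localhost", 3306, "mydb"));
    }
}
```

If such a URL is placed in an XML file (for example a Tomcat context.xml), the & separator has to be written as &amp;.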

Thanks very much for your help!
Thierry




Re: mojk and utf8 charset problem

Posted by André Warnier <aw...@ice-sa.com>.
Thierry Templier wrote:
> Hello André,
> 
> After disabling compression at the Apache level, things changed a bit: content from the database, rendered with JSTL (<c:out value="(...)" escapeXml="false"/>), is now displayed correctly, but text written directly in the JSP pages still is not. However, I do have this at the beginning of the JSP pages: <%@page language="java" contentType="text/html; charset=UTF-8"%>.
> 
Logic would have it that, independently of what the server does,
- if you have the same browser at the client side
- if the HTTP response headers are the same in both cases
- if the response content is the same in both cases
then the browser should display the same thing.

And if it doesn't, then one of the above premises is wrong.

To my knowledge, there is no purpose-built mechanism in either the AJP Connector, or 
mod_jk, to change the response content after it has been produced by the application.

There could be a bug somewhere however, in particular when talking about characters which 
may need more than 2 bytes for a proper UTF-8 representation (and chunked encoding? that 
may be a little-investigated area).
But if the received content is the same, then this also makes no sense.

Another test: what about using "wget" to retrieve one of your pages directly from Tomcat 
and then through Apache/mod_jk, saving the results as 2 files, and then comparing these 
files with "diff"?
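
The same comparison can be sketched in Java, assuming the two responses were already saved to files (for example with wget). The class name and the file arguments are hypothetical; the helper only reports the offset of the first differing byte, which is usually enough to see where the two responses start to diverge.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class ResponseDiff {
    // Returns the offset of the first differing byte, or -1 if both arrays are identical.
    // If one array is a strict prefix of the other, returns the length of the shorter one.
    static int firstDifference(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : common;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ResponseDiff.java <direct-file> <proxied-file>");
            return;
        }
        byte[] direct = Files.readAllBytes(Paths.get(args[0]));
        byte[] proxied = Files.readAllBytes(Paths.get(args[1]));
        int offset = firstDifference(direct, proxied);
        System.out.println(offset < 0 ? "identical" : "first difference at byte " + offset);
    }
}
```

Comparing raw bytes rather than decoded text avoids introducing yet another charset conversion into the test.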




Re: mojk and utf8 charset problem

Posted by Thierry Templier <te...@yahoo.fr>.
Hello André,

After disabling compression at the Apache level, things changed a bit: content from the database, rendered with JSTL (<c:out value="(...)" escapeXml="false"/>), is now displayed correctly, but text written directly in the JSP pages still is not. However, I do have this at the beginning of the JSP pages: <%@page language="java" contentType="text/html; charset=UTF-8"%>.

Thierry




Re: mojk and utf8 charset problem

Posted by André Warnier <aw...@ice-sa.com>.
Thierry Templier wrote:
> Hi André,
> 
> Thanks very much for your help!
> 
> I compared the two ways of accessing the application:
> 
> - Through Apache / mod_jk / Tomcat, where non-Latin-1 characters are not displayed correctly
> - Directly through Tomcat, which works fine
> 
> Apart from the characters that are not displayed correctly, the content is the same in both cases, in particular the meta tag at the beginning:
> 
> <meta http-equiv="CONTENT-TYPE" content="text/html; charset=UTF-8"></meta>
> 
> As suggested, I also had a look at the request and response content, and there seem to be some differences, as described below.
> 
> - Response headers when using Apache / mod_jk / Tomcat:
> 
> Date	Mon, 02 May 2011 08:21:16 GMT
> Server	Apache/2.2.14 (Ubuntu)
> Pragma	no-cache
> Expires	Thu, 01 Jan 1970 00:00:00 GMT
> Cache-Control	no-cache, no-store
> Content-Language	en-UK
> Vary	Accept-Encoding
> Content-Encoding	gzip
> Content-Length	2494
> Keep-Alive	timeout=15, max=93
> Connection	Keep-Alive
> Content-Type	text/html;charset=UTF-8
> 
> - Response headers when directly using Tomcat:
> 
> Server	Apache-Coyote/1.1
> Pragma	no-cache
> Expires	Thu, 01 Jan 1970 00:00:00 GMT
> Cache-Control	no-cache, no-store
> Content-Type	text/html;charset=UTF-8
> Content-Language	en-UK
> Transfer-Encoding	chunked
> Date	Mon, 02 May 2011 08:19:39 GMT
> 
> The Content-Type header is the same and specifies UTF-8 as the encoding... However, it appears that when going through Apache / mod_jk / Tomcat, the response content is compressed using gzip. This is not the case when accessing Tomcat directly. I don't know whether that could be the cause of the problem...
> 
It seems unlikely that it would be the compression that causes the problem.
Content encoding is only supposed to be used during the transport from the server to the 
browser.  So it is applied last at the server (Apache) side, and removed first at the 
browser side, before interpreting the content.
But just in case, it should be easy to disable, even if just for a test.
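
As a small illustration of why compression should not affect the characters, the following sketch gzips a UTF-8 body and unzips it again using java.util.zip. It does not reproduce what mod_deflate does internally; it only shows that a gzip round trip returns the original bytes unchanged. The class name and sample text are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTrip {
    // Compresses the given bytes with gzip.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decompresses gzip data back to the original bytes.
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Arabic sample text written with Unicode escapes, encoded as UTF-8.
        byte[] body = "<p>\u0645\u0631\u062d\u0628\u0627</p>".getBytes(StandardCharsets.UTF_8);
        byte[] restored = gunzip(gzip(body));
        System.out.println(Arrays.equals(body, restored)); // prints "true"
    }
}
```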

Under Ubuntu, you may try the command "a2dismod deflate" to disable the filter.
Or if that does not work, have a look here to modify your configuration:
http://httpd.apache.org/docs/2.2/mod/mod_deflate.html

I believe Ubuntu is similar to Debian.  If so, then the setup of the mod_deflate filter 
may be in a file like /etc/apache2/mods-available/deflate.conf





Re: mojk and utf8 charset problem

Posted by Thierry Templier <te...@yahoo.fr>.
Hi André,

Thanks very much for your help!

I compared the two ways of accessing the application:

- Through Apache / mod_jk / Tomcat, where non-Latin-1 characters are not displayed correctly
- Directly through Tomcat, which works fine

Apart from the characters that are not displayed correctly, the content is the same in both cases, in particular the meta tag at the beginning:

<meta http-equiv="CONTENT-TYPE" content="text/html; charset=UTF-8"></meta>

As suggested, I also had a look at the request and response content, and there seem to be some differences, as described below.

- Response headers when using Apache / mod_jk / Tomcat:

Date	Mon, 02 May 2011 08:21:16 GMT
Server	Apache/2.2.14 (Ubuntu)
Pragma	no-cache
Expires	Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control	no-cache, no-store
Content-Language	en-UK
Vary	Accept-Encoding
Content-Encoding	gzip
Content-Length	2494
Keep-Alive	timeout=15, max=93
Connection	Keep-Alive
Content-Type	text/html;charset=UTF-8

- Response headers when directly using Tomcat:

Server	Apache-Coyote/1.1
Pragma	no-cache
Expires	Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control	no-cache, no-store
Content-Type	text/html;charset=UTF-8
Content-Language	en-UK
Transfer-Encoding	chunked
Date	Mon, 02 May 2011 08:19:39 GMT

The Content-Type header is the same and specifies UTF-8 as the encoding... However, it appears that when going through Apache / mod_jk / Tomcat, the response content is compressed using gzip. This is not the case when accessing Tomcat directly. I don't know whether that could be the cause of the problem...

Thierry




Re: mojk and utf8 charset problem

Posted by André Warnier <aw...@ice-sa.com>.
Thierry Templier wrote:
> Hello,
> 
> I developed an application that uses UTF-8 encoding, since it needs to display Arabic characters. When I access the application directly through Tomcat, everything works fine. When I access it through the Apache web server and mod_jk, these characters are not displayed correctly. UTF-8 is correctly configured in the Apache web server, since static pages display these characters properly. So the problem seems to come from mod_jk.
> 
> Is there a way to configure mod_jk to use UTF-8 encoding for HTTP requests and responses?
> 
Hi.

I suggest getting one of the browser add-ons that let you display the complete HTTP 
response from the web server to the browser (in other words, the HTTP headers as well as 
the content). For Firefox, you can use for example HttpFox; for IE, there is Fiddler2. 
A quick search on Google will lead you to the download page.

Install one of those, re-do your server request, and carefully compare what you get back
a) from Tomcat directly
b) from Apache + mod_jk + Tomcat

The way that a browser will display a page (in terms of charset) depends on 3 elements:

1) when the server sends a response, it includes a "Content-Type" HTTP header, which in 
this case should be something like:
Content-Type: text/html; charset=UTF-8

2) any <meta> tags included inside the <head> portion of the HTML page.
For example, a tag such as:
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />

3) the way in which the browser (each specific browser, and sometimes even version) 
interprets the above.

According to the HTTP RFCs, the browser SHOULD NOT "second-guess" what the server says in 
terms of content-type. In other words, if the server says
Content-type: something; charset=somecharset
then the browser should blindly follow this, and not make its own determination.
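
To illustrate why the declared charset matters, here is a minimal sketch that decodes the same response bytes according to the charset named in a Content-Type value. The class and method names are hypothetical, the parsing is deliberately simplistic, and the ISO-8859-1 fallback is an assumption based on the historical HTTP/1.1 default for text types; real browsers apply more elaborate rules.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class CharsetDemo {
    // Extracts the charset parameter from a Content-Type value.
    // Assumption: fall back to ISO-8859-1 when no charset parameter is present.
    static Charset charsetOf(String contentType) {
        for (String part : contentType.split(";")) {
            String trimmed = part.trim();
            if (trimmed.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = trimmed.substring("charset=".length()).trim().replace("\"", "");
                return Charset.forName(name);
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    // Decodes the body bytes the way the Content-Type value says they should be read.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType));
    }

    public static void main(String[] args) {
        String original = "\u0645\u0631\u062d\u0628\u0627"; // Arabic sample text, written with escapes
        byte[] body = original.getBytes(StandardCharsets.UTF_8);

        // Same bytes, two different declarations: only the matching one restores the text.
        System.out.println(decode(body, "text/html; charset=UTF-8").equals(original));      // true
        System.out.println(decode(body, "text/html; charset=ISO-8859-1").equals(original)); // false
    }
}
```

When the bytes are UTF-8 but are read as ISO-8859-1, each Arabic letter turns into two unrelated characters, which is the typical symptom of a mismatch between the declared charset and the actual content.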

However, IE for one is notorious for not following this aspect of the RFCs, and constantly 
trying to determine by itself what it is receiving, often in contradiction to what the 
server says. And worse, the determination it makes depends on the version of IE, and 
sometimes even on the patches applied to it or to Windows.

Also,
3a) ultimately, it is the user who is in control.  In the browser settings, there is a way 
to override the above, and force the browser to always display the page in a specific 
character set.  It does not sound like this is the issue in your case, but better to check anyway.

But first, make sure that what you are receiving in one case or the other is really the 
same, headers and content.
And maybe also try it with different browsers, to see if the result is always the same.

Once you know the answer to that, then you can start looking for the issue in a more 
focused way.




RE: mojk and utf8 charset problem

Posted by Thierry Templier <te...@yahoo.fr>.
Hello Matteo,

Thanks very much for your answer, but I didn't receive the end of it...

As suggested, I tried both addresses and the result isn't the same. When using Tomcat directly, everything works fine, and when going through mod_jk, I have problems with non-Latin-1 characters... So I think it's a mod_jk / connector configuration problem.

Thierry




RE: mojk and utf8 charset problem

Posted by Matteo Turra <m....@kion.it>.
From my experience, mod_jk doesn't have any charset configuration. Only on the Connector in server.xml can you change charset settings (URIEncoding, useBodyEncodingForURI), and these only affect how the URI and the request parameters are parsed, not the output.
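
To illustrate what a setting such as URIEncoding influences, the sketch below decodes the same percent-encoded parameter value with two different charsets, using java.net.URLDecoder as a stand-in. It does not call any Tomcat code, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UriDecodingDemo {
    // Decodes a percent-encoded value, interpreting the resulting bytes in the given charset.
    static String decodeParam(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // The UTF-8 bytes of a five-letter Arabic word, percent-encoded as a browser would send them.
        String raw = "%D9%85%D8%B1%D8%AD%D8%A8%D8%A7";

        System.out.println(decodeParam(raw, "UTF-8").length());      // 5 characters
        System.out.println(decodeParam(raw, "ISO-8859-1").length()); // 10 characters
    }
}
```

The request side and the response side are handled separately, so a wrong decoding here would affect submitted parameters, not the text a JSP page writes out.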

Did you try, with the same Tomcat, to get pages both from the Tomcat HTTP connector (port 8080 by default) and from Apache (port 80 by default)?

For example, if you have a dynamic page testPage.html built by Tomcat, try these in your browser:
http://localhost:8080/testPage.html
http://localhost:80/testPage.html

If the result is the same, the connector and Apache are not the cause.

If you 






