Posted to users@tomcat.apache.org by Thierry Templier <te...@yahoo.fr> on 2011/04/29 14:33:05 UTC
mojk and utf8 charset problem
Hello,
I developed an application that uses UTF-8 encoding since it needs to display Arabic characters. When I access the application directly from Tomcat, everything works fine. When I access it through Apache web server and mod_jk, such characters are not displayed correctly. UTF-8 is correctly configured within Apache web server, since I can display them from static pages. So it seems the problem comes from mod_jk.
Is there a way to configure mod_jk to use UTF-8 encoding for HTTP requests and responses?
Thanks very much for your answers.
Thierry
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org
Re: mojk and utf8 charset problem
Posted by Christopher Schultz <ch...@christopherschultz.net>.
Thierry,
On 5/2/2011 4:31 AM, Thierry Templier wrote:
> <meta http-equiv="CONTENT-TYPE" content="text/html; charset=UTF-8"></meta>
Just to be sure, I highly recommend coding your pages like this:
<meta http-equiv="CONTENT-TYPE" content="text/html; charset=<%=
response.getCharacterEncoding() %>"></meta>
This will ensure that you aren't sending ISO-8859-1 but claiming that
it's UTF-8.
> The content type header is the same and specifies UTF-8 as encoding... However, it appears that when using Apache / mod_jk / Tomcat, the response content is compressed using gzip. That is not the case when accessing Tomcat directly. I don't know whether that could be the cause of the problem...
gzip encoding is unlikely to be causing the problem.
Can you post the configuration you have for your <Connector> elements in
Tomcat's conf/server.xml? Remember to remove any sensitive information
(IP addresses, JK secrets, etc.).
-chris
---------------------------------------------------------------------
Re: mojk and utf8 charset problem
Posted by Thierry Templier <te...@yahoo.fr>.
Hello André,
I tested with two browsers:
- Firefox 3.6.16 (linux)
- Chrome 11.0.696.57 (linux)
and I see the same behavior in both.
Thierry
> Additional question : did you try it
> with different browsers ?
---------------------------------------------------------------------
Re: mojk and utf8 charset problem
Posted by André Warnier <aw...@ice-sa.com>.
Additional question : did you try it with different browsers ?
---------------------------------------------------------------------
Re: mojk and utf8 charset problem
Posted by Thierry Templier <te...@yahoo.fr>.
Hello,
Sorry for my very late answer!
It took me some time to solve the problem based on what you suggested. In fact, there were two separate problems:
- I use Tiles, and I had not specified the page directive (<%@page language="java" contentType="text/html; charset=UTF-8"%>) in every element that makes up the final page. Once I added it to all of them, UTF-8 characters displayed correctly.
- I also had a configuration problem at the MySQL JDBC driver level. Although the database is configured for utf8, I also needed to specify some parameters in the JDBC URL (see http://confluence.atlassian.com/display/DOC/Configuring+Database+Character+Encoding).
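The first point can be illustrated outside the application. This is only a sketch in Python, not code from the application: it assumes the usual failure mode, where text stored as UTF-8 bytes is read back as ISO-8859-1 (the default page encoding for a standard-syntax JSP that declares none). The function names and the sample string are made up for the illustration.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as ISO-8859-1.

    This mimics a component that is handed UTF-8 bytes but assumes
    ISO-8859-1: every byte becomes one (wrong) character.
    """
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Undo misread_as_latin1 by reversing the two steps."""
    return garbled.encode("iso-8859-1").decode("utf-8")


# Hypothetical sample: five Arabic letters, two UTF-8 bytes each.
sample = "\u0645\u0631\u062d\u0628\u0627"
garbled = misread_as_latin1(sample)

print(len(sample), len(garbled))
print(repair(garbled) == sample)
```

The garbled string is twice as long as the original because each two-byte Arabic letter is shown as two Latin-1 characters, which is the typical look of this kind of display problem.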
Thanks very much for your help!
Thierry
---------------------------------------------------------------------
Re: mojk and utf8 charset problem
Posted by André Warnier <aw...@ice-sa.com>.
Thierry Templier wrote:
> Hello André,
>
> After disabling compression at the Apache level, things changed a bit: content from the database, rendered with JSTL (<c:out value="(...)" escapeXml="false"/>), is now displayed correctly, but content written directly in the JSP pages still is not. I do, however, have this at the beginning of the JSP pages: <%@page language="java" contentType="text/html; charset=UTF-8"%>.
>
Logic would have it that, independently of what the server does,
- if you have the same browser at the client side
- if the HTTP response headers are the same in both cases
- if the response content is the same in both cases
then the browser should display the same thing.
And if it doesn't, then one of the above premises is wrong.
To my knowledge, there is no purpose-built mechanism in either the AJP Connector, or
mod_jk, to change the response content after it has been produced by the application.
There could be a bug somewhere however, in particular when talking about characters which
may need more than 2 bytes for a proper UTF-8 representation (and chunked encoding? that
may be a little-investigated area).
But if the received content is the same, then this also makes no sense.
Another test : what about using "wget" to retrieve one of your pages directly from tomcat
and then through Apache/mod_jk, saving the result as 2 files, and then comparing these
files with "diff" ?
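As a rough sketch of that comparison (in Python rather than wget and diff, purely for illustration), the two saved responses can be compared byte by byte. The function names and file paths are hypothetical, and the files are assumed to have been saved beforehand, one from Tomcat directly and one through Apache/mod_jk.

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if equal."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One payload is a prefix of the other.
        return min(len(a), len(b))
    return None


def compare_files(path_a: str, path_b: str) -> Optional[int]:
    """Compare two saved responses in binary mode (no charset decoding)."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return first_difference(fa.read(), fb.read())
```

Reading in binary mode matters here: decoding the files as text first could hide exactly the kind of encoding difference being looked for.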
---------------------------------------------------------------------
Re: mojk and utf8 charset problem
Posted by Thierry Templier <te...@yahoo.fr>.
Hello André,
After disabling compression at the Apache level, things changed a bit: content from the database, rendered with JSTL (<c:out value="(...)" escapeXml="false"/>), is now displayed correctly, but content written directly in the JSP pages still is not. I do, however, have this at the beginning of the JSP pages: <%@page language="java" contentType="text/html; charset=UTF-8"%>.
Thierry
---------------------------------------------------------------------
Re: mojk and utf8 charset problem
Posted by André Warnier <aw...@ice-sa.com>.
Thierry Templier wrote:
> The content type header is the same and specifies UTF-8 as encoding... However, it appears that when using Apache / mod_jk / Tomcat, the response content is compressed using gzip. That is not the case when accessing Tomcat directly. I don't know whether that could be the cause of the problem...
>
It seems unlikely that it would be the compression that causes the problem.
Content encoding is only supposed to be used during the transport from the server to the
browser. So it is applied last at the server (Apache) side, and removed first at the
browser side, before interpreting the content.
But just in case, it should be easy to disable, even if just for a test.
Under Ubuntu, you may try the command "a2dismod deflate" to disable the filter.
Or if that does not work, have a look here to modify your configuration :
http://httpd.apache.org/docs/2.2/mod/mod_deflate.html
I believe Ubuntu is similar to Debian. If so, then the setup of the mod_deflate filter
may be in a file like /etc/apache2/mods-available/deflate.conf
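The point about content encoding can be checked with a small sketch. It uses Python's standard gzip module, not Apache's mod_deflate, so it only illustrates the principle that compressing and then decompressing returns the original bytes unchanged, whatever charset they are in. The function name and the sample page are hypothetical.

```python
import gzip


def roundtrip(body: bytes) -> bytes:
    """Compress a response body as the server would, then decompress it
    as the browser would, before any charset interpretation happens."""
    compressed = gzip.compress(body)
    return gzip.decompress(compressed)


# Hypothetical page fragment containing Arabic text, encoded as UTF-8.
page = "<p>\u0645\u0631\u062d\u0628\u0627</p>".encode("utf-8")

print(roundtrip(page) == page)
```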
---------------------------------------------------------------------
Re: mojk and utf8 charset problem
Posted by Thierry Templier <te...@yahoo.fr>.
Hi André,
Thanks very much for your help!
I checked the differences between the two ways of accessing the application:
- Through Apache / mod_jk / Tomcat, which can't display non-Latin-1 characters correctly
- Directly through Tomcat, which works fine
Apart from the characters that don't display correctly, the content is the same, in particular the meta tag at the beginning:
<meta http-equiv="CONTENT-TYPE" content="text/html; charset=UTF-8"></meta>
As suggested, I also had a look at the request / response content, and there seem to be some differences, as described below.
- Response headers when using Apache / mod_jk / Tomcat:
Date: Mon, 02 May 2011 08:21:16 GMT
Server: Apache/2.2.14 (Ubuntu)
Pragma: no-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: no-cache, no-store
Content-Language: en-UK
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 2494
Keep-Alive: timeout=15, max=93
Connection: Keep-Alive
Content-Type: text/html;charset=UTF-8
- Response headers when directly using Tomcat:
Server: Apache-Coyote/1.1
Pragma: no-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: no-cache, no-store
Content-Type: text/html;charset=UTF-8
Content-Language: en-UK
Transfer-Encoding: chunked
Date: Mon, 02 May 2011 08:19:39 GMT
The content type header is the same and specifies UTF-8 as encoding... However, it appears that when using Apache / mod_jk / Tomcat, the response content is compressed using gzip. That is not the case when accessing Tomcat directly. I don't know whether that could be the cause of the problem...
Thierry
---------------------------------------------------------------------
Re: mojk and utf8 charset problem
Posted by André Warnier <aw...@ice-sa.com>.
Thierry Templier wrote:
> Hello,
>
> I developed an application that uses UTF-8 encoding since it needs to display Arabic characters. When I access the application directly from Tomcat, everything works fine. When I access it through Apache web server and mod_jk, such characters are not displayed correctly. UTF-8 is correctly configured within Apache web server, since I can display them from static pages. So it seems the problem comes from mod_jk.
>
> Is there a way to configure mod_jk to use UTF-8 encoding for HTTP requests and responses?
>
Hi.
I suggest getting one of the browser add-ons that display the complete HTTP
response from the web server to the browser (in other words, the HTTP headers as well as the content).
For Firefox, you can use for example HttpFox; for IE, there is Fiddler2. A quick search in
Google will lead you to the download page.
Install one of those, re-do your server request, and carefully compare what you get back
a) from Tomcat directly
b) from Apache + mod_jk + tomcat
The way that a browser will display a page (in terms of charset) depends on 3 elements :
1) when the server sends a response, it includes a "Content-type" HTTP header, which in
this case should be something like :
Content-type: text/html; charset=UTF-8
2) any <meta> tags included inside the <head> portion of the html page.
For example, a tag such as :
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
3) the way in which the browser (each specific browser, and sometimes even version)
interprets the above.
According to the HTTP RFCs, the browser SHOULD NOT "second-guess" what the server says in
terms of content-type. In other words, if the server says
Content-type: something; charset=somecharset
then the browser should blindly follow this, and not make its own determination.
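A minimal sketch of that rule, in Python and only for illustration: take the charset from the Content-Type header and decode the body with it, without guessing. The function names are hypothetical, and the fallback to ISO-8859-1 when no charset is declared is an assumption based on the HTTP/1.1 default for text types; real browsers do considerably more than this.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value.

    Returns the charset in lower case, or None if none is declared.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def decode_body(body: bytes, header_value: str) -> str:
    """Decode the body strictly as the server declared it.

    Assumption: fall back to ISO-8859-1 when no charset is given.
    """
    charset = charset_from_content_type(header_value) or "iso-8859-1"
    return body.decode(charset)
```

For example, charset_from_content_type("text/html;charset=UTF-8") gives "utf-8", which matches the header value seen in both of the responses being compared.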
However, IE for one is notorious for not following this aspect of the RFCs, and constantly
trying to determine by itself what it is receiving, often in contradiction to what the
server says. And worse, the determination it makes depends on the version of IE, and
sometimes even on the patches applied to it or to Windows.
Also,
3a) ultimately, it is the user who is in control. In the browser settings, there is a way
to override the above, and force the browser to always display the page in a specific
character set. It does not sound like this is an issue in your case, but better check anyway.
But first, make sure that what you are receiving in one case or the other is really the
same, headers and content.
And maybe also try it with different browsers, to see if the result is always the same.
Once you know the answer to that, then you can start looking for the issue in a more
focused way.
---------------------------------------------------------------------
RE: mojk and utf8 charset problem
Posted by Thierry Templier <te...@yahoo.fr>.
Hello Matteo,
Thanks very much for your answer but I didn't receive the end...
As suggested, I tried both addresses and the result isn't the same. When using Tomcat directly, everything works fine, but when accessing it through mod_jk, I have problems with non-Latin-1 characters... So I think that it's a mod_jk / connector configuration problem.
Thierry
---------------------------------------------------------------------
RE: mojk and utf8 charset problem
Posted by Matteo Turra <m....@kion.it>.
In my experience mod_jk has no charset configuration. Only on the connector in server.xml can you change charset settings (URIEncoding, useBodyEncodingForURI), and those apply only to parsing the URI and parameters of the request, not to the output.
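To show what a URI decoding setting affects, here is a small sketch in Python (not Tomcat code). It decodes the same percent-encoded request parameter under two charsets, the way a connector configured for UTF-8 or for ISO-8859-1 (the default in older Tomcat versions) would. The function name and the sample value are hypothetical.

```python
from urllib.parse import unquote


def decode_uri_component(raw: str, uri_encoding: str) -> str:
    """Percent-decode a URI component using the given charset."""
    return unquote(raw, encoding=uri_encoding)


# Hypothetical parameter value: two Arabic letters, percent-encoded as UTF-8.
raw = "%D9%85%D8%B1"

as_utf8 = decode_uri_component(raw, "utf-8")
as_latin1 = decode_uri_component(raw, "iso-8859-1")

print(len(as_utf8), len(as_latin1))
```

The same four bytes give two characters under UTF-8 and four under ISO-8859-1. This concerns only how incoming requests are read, so it would not explain garbled characters in a page the application sends back.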
Did you try, with the same Tomcat, getting pages both from the Tomcat HTTP connector (port 8080 by default) and from Apache (port 80 by default)?
For example, if you have a dynamic page testPage.html built by Tomcat, try these in your browser:
http://localhost:8080/testPage.html
http://localhost:80/testPage.html
If the result is the same, the connector and Apache are not the cause.
If you