Posted to users@tomcat.apache.org by Joe Russo <ru...@cadmus.com> on 2007/07/25 17:48:44 UTC

Tomcat 5.0.28 character encoding problem

I am getting the following problem in the display of the JSP. To give a
little history: at the time, the developers of this application I am
supporting thought they needed to encode the characters to UTF-8 themselves
before storing them in our Oracle DB. They were unaware that they could have
let the DB driver convert them for us. As a result, we double-encode going
into and coming out of the database. Really stupid in hindsight. Cleaning up
the database is another project we face.
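The thread never shows the legacy code, so the following is only a guess at what "double encoding" looks like. It is a minimal sketch assuming the application converted each String to UTF-8 bytes and then stuffed those bytes back into a String one byte per character (ISO-8859-1) before the driver encoded it again; DoubleEncodingDemo and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model of an "encode it ourselves before the driver does" step.
class DoubleEncodingDemo {
    // Encodes text as UTF-8 bytes, then reads those bytes back as ISO-8859-1
    // characters (one char per byte), so multi-byte sequences become mojibake.
    static String encodeOnce(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses encodeOnce: ISO-8859-1 maps each char in U+0000..U+00FF back to
    // exactly one byte, and those bytes are then decoded as UTF-8.
    static String decodeOnce(String text) {
        byte[] raw = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

Each encodeOnce layer needs its own decodeOnce; if the read path undoes fewer layers than the write path applied, the page shows garbage like the sample below.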

I am in the process of converting from JRun to Tomcat, and I have run into
a problem where these funky symbols are displayed. I cannot find any stack
traces that would explain it or hint at a solution.

My questions are:
Does Tomcat have problems with any types of encoding?
What type of characters are being displayed below? Any advice on
troubleshooting or solving this would be greatly appreciated.


comments to our revised manuscript entitled “Interleukin-4 Cytotoxin
Therapy Synergizes with Gemcitabine in a Mouse Model of Pancreatic
Ductal Adenocarcinoma”. We agree with th

Joe

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Tomcat 5.0.28 character encoding problem

Posted by Christopher Schultz <ch...@christopherschultz.net>.

Joe,

Joe Russo wrote:
> I am in the process of converting from using JRUN to Tomcat

Good for you! Welcome to the community.

> I have
> run into the problem where these funky symbols are displaying.  I
> cannot find any stack traces that would explain or possibly clue into a
> solution.

Right. These things (encoding problems) hardly ever generate errors;
they just exhibit unexpected behavior.
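A minimal sketch of why nothing shows up in the logs: Java's String constructor silently replaces bytes it cannot decode with U+FFFD, while a CharsetDecoder set to CodingErrorAction.REPORT throws instead. (Decoding as ISO-8859-1 can never fail at all, since every byte maps to some character, which is why a wrong guess of that charset goes unnoticed.) The class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class SilentDecodingDemo {
    // Never throws: malformed UTF-8 sequences become U+FFFD, so the page
    // simply renders with odd characters.
    static String lenientDecode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // A decoder configured to REPORT surfaces the same problem as an exception.
    static String strictDecode(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }
}
```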

> My questions are:  
> Does Tomcat have problems with any types of encoding?      

Yes and no. Tomcat behaves exactly as the HTTP specification mandates.
That is, it interprets all incoming data using the ISO-8859-1 character
encoding unless the request states otherwise (in the Content-Type
header). Some browsers don't send the encoding along with the
Content-Type, so the behavior gets confused.
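The default described above can be sketched as a tiny resolver. This is a hypothetical helper, not Tomcat's actual code, and it assumes the only input is the raw Content-Type header value. In the Servlet API the equivalent signal is ServletRequest.getCharacterEncoding() returning null, after which the container falls back to ISO-8859-1.

```java
import java.util.Locale;

// Hypothetical helper, not Tomcat's code: mirrors the rule described above.
class RequestCharset {
    // The fallback used when the client does not name a charset.
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Returns the charset parameter of a Content-Type header value, or the
    // ISO-8859-1 default when the header is absent or carries no charset.
    static String resolve(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Allow the quoted form: charset="UTF-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                if (!value.isEmpty()) {
                    return value;
                }
            }
        }
        return DEFAULT_CHARSET;
    }
}
```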

Some browsers only send an encoding when there is POST data, since the
Content-Type only really makes sense when there is request content (the
POST data). Unfortunately, the browser usually uses (what would have
been) the Content-Type of a request to encode the URL in the request.
So, if a browser uses UTF-8 to encode the URL (which is typical these
days), but doesn't send a Content-Type header (or leaves out the
encoding), then Tomcat interprets it incorrectly as ISO-8859-1, and you
get funny characters.
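To make the URL case concrete, here is a sketch of a UTF-8 percent-encoded query value decoded two ways with java.net.URLDecoder. The value "caf%C3%A9" is an assumed example. Decoding it as ISO-8859-1 is roughly what happens when no encoding is known, and setting URIEncoding="UTF-8" on the Connector corresponds to the UTF-8 case.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UrlDecodingDemo {
    // How a browser that uses UTF-8 percent-encodes "café" in a query string.
    static final String ENCODED = "caf%C3%A9";

    // Decodes the percent-encoded value with the given charset, as a
    // container does once it has picked (or defaulted) the URI encoding.
    static String decodeAs(String charsetName) {
        try {
            return URLDecoder.decode(ENCODED, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }
}
```

Decoding with "UTF-8" yields "café", while "ISO-8859-1" yields "cafÃ©", which is exactly the kind of funny characters described above.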

It's not Tomcat's fault. It's actually not the browser's fault, either.
It's actually the HTTP spec's fault, since the character encoding used
in URLs isn't explicitly laid out. :(

> What type of characters are being displayed below and any advice in
> troubleshooting or solving this would be gratefully appreciated.

The presence of the 'â' character looks to me like a UTF-8 URL being
interpreted as an ISO-8859-1 URL. Try searching Google for
CharacterEncodingFilter and take a look at that. It tries to recover
from requests that don't include a character encoding. You should also
look at the "URIEncoding" attribute of the <Connector> element. You can
set the encoding to something other than the default (ISO-8859-1).
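The sample text in Joe's message contains curly quotes (U+201C and U+201D). This sketch shows where a stray 'â' would come from: U+201C is the three bytes E2 80 9C in UTF-8, and the first of those bytes is 'â' in ISO-8859-1. The windows-1252 case is included because browsers commonly substitute it for ISO-8859-1, which turns the sequence into the familiar "â€œ". The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CurlyQuoteDemo {
    // U+201C LEFT DOUBLE QUOTATION MARK, as in the sample text.
    static final String LEFT_QUOTE = "\u201C";

    // Returns the UTF-8 bytes of s as space-separated hex, e.g. "E2 80 9C".
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes the UTF-8 bytes of s with the wrong charset, reproducing the bug.
    static String misread(String s, Charset wrong) {
        return new String(s.getBytes(StandardCharsets.UTF_8), wrong);
    }
}
```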

For more information, see:

http://tomcat.apache.org/faq/misc.html#tomcat5CharEncoding
http://tomcat.apache.org/faq/connectors.html#utf8
http://tomcat.apache.org/tomcat-5.0-doc/config/ajp.html (if you use JK)
http://tomcat.apache.org/tomcat-5.0-doc/config/http.html (if you don't)

-chris


RE: Tomcat 5.0.28 character encoding problem

Posted by Aleksey Dayen <ad...@urbitran.com>.
Thanks, Chris.
Our client doesn't want to upgrade to Tomcat 5.5; that's our problem.

Aleksey




Re: Tomcat 5.0.28 character encoding problem

Posted by Christopher Schultz <ch...@christopherschultz.net>.

Aleksey,

Aleksey Dayen wrote:
> Right now we're using Tomcat 4.1 with JDK 1.1.

Wow, really? I would think that JDK 1.2 would be the bare minimum for
Tomcat 4.1.

> Would our Tomcat version work if we update to J2SDK 5.0 Update 6?

Maybe. IIRC, Tomcat 4.1 has issues with a 1.5 (5.0) JDK due to the XML
parsers. You might want to search the archives for Tomcat 4.1 and Java 1.5.

I would recommend that you upgrade to at least Tomcat 5.5 if you have
the time to test (which you should, since you will be changing Java
versions).

-chris



RE: Tomcat 5.0.28 character encoding problem

Posted by Aleksey Dayen <ad...@urbitran.com>.
Right now we're using Tomcat 4.1 with JDK 1.1.

Would our Tomcat version work if we update to J2SDK 5.0 Update 6?

Please help!!!

Many thanks




Re: Tomcat4.1

Posted by Hassan Schroeder <ha...@gmail.com>.
On 7/25/07, Aleksey Dayen <ad...@urbitran.com> wrote:
> Right now we're using Tomcat 4.1 with JDK 1.1.
>
> Would our Tomcat version work if we update to J2SDK 5.0 Update 6?

1. Hijacking threads, especially without even removing the previous
    content, is rude.

2. Why ask about "Update 6" when that release is on Update 12? And
    a whole release behind, to boot?

3. How freakin' tough is it to *just try it*?? No matter what anyone on
    this list says, you're going to have to test.

Personally I'd be a lot more concerned about apps breaking than
Tomcat, but whatever -- /you/ have to do the testing.

FWIW,
-- 
Hassan Schroeder ------------------------ hassan.schroeder@gmail.com





Re: Tomcat 5.0.28 character encoding problem

Posted by Nathan Hook <ho...@hotmail.com>.
On further reading, I think my statement that "most browsers will ignore this
value" was a tad excessive. What I was hoping to convey was: don't rely only
on the meta tag.

From my understanding, there are browsers that can set their own content
type no matter what comes down in the response, even if the content type is
set in the header or with a meta tag. I'm not saying this is a good thing;
I'm just saying that I read this somewhere in passing when we were trying to
figure out character encoding. Some browsers just won't do what you ask them
to.

However, setting the content type in both the header and the meta tag
isn't a bad thing. It can only really help you. It's much like when you're
asking the client and all the intermediary servers not to cache any of your
pages.

Setting a few of the Cache-Control values should work, but it's best to send
all the values you can to make sure that you're communicating with everyone
you can.

httpResponse.addHeader("Cache-Control",
    "no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, "
    + "no-transform, pre-check=0, post-check=0, private");

Even in the case above, if an intermediary server wants to cache your
data... it's going to cache your data.

I guess what I'm really trying to say is: if there are many ways of telling a
browser how to handle something, implement every one of those ways, because
you aren't guaranteed that any specific way is going to work on all
browsers.
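One way to follow that advice is to collect every anti-caching header in one place. This is a sketch under assumptions: the Pragma and Expires headers are additions not in Nathan's message (they target HTTP/1.0 caches and older clients), and the class name is hypothetical. In a servlet you could copy each entry onto the response with HttpServletResponse.setHeader(name, value).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NoCacheHeaders {
    // Builds one anti-caching header per mechanism, in insertion order.
    static Map<String, String> build() {
        Map<String, String> headers = new LinkedHashMap<>();
        // HTTP/1.1 browsers and caches (same value as the example above).
        headers.put("Cache-Control",
                "no-cache, no-store, must-revalidate, max-age=0, proxy-revalidate, "
                + "no-transform, pre-check=0, post-check=0, private");
        // HTTP/1.0 caches that ignore Cache-Control.
        headers.put("Pragma", "no-cache");
        // An invalid date such as "0" must be treated as already expired.
        headers.put("Expires", "0");
        return headers;
    }
}
```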

Warm Regards.





Re: Tomcat 5.0.28 character encoding problem

Posted by Christopher Schultz <ch...@christopherschultz.net>.

Nathan,

Nathan Hook wrote:
> - Set the meta type in each and every jsp to be utf-8.  Now, most
> browsers will ignore this value from my understanding, but it shouldn't
> hurt to add it.

Really? The HTTP header should override any META tag, but the META tag
should be used if, for some reason, there is no Content-Type header.
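The precedence described here (header first, META tag only as a fallback) can be modeled as a small resolver. This is a sketch, not what any particular browser implements: the class name, the regular expression, and the fallback parameter are assumptions, and a user-chosen encoding override in the browser can still trump both.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EffectiveCharset {
    // Matches charset=... inside a <meta> tag, e.g.
    // <meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset=[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset declared in a META tag, or null if there is none.
    static String metaCharset(String html) {
        Matcher m = META_CHARSET.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // headerCharset is the charset parameter of the Content-Type header, or
    // null if the header had none; it always wins over the META tag.
    static String choose(String headerCharset, String html, String fallback) {
        if (headerCharset != null) {
            return headerCharset;
        }
        String meta = metaCharset(html);
        return meta != null ? meta : fallback;
    }
}
```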

-chris


Re: Tomcat 5.0.28 character encoding problem

Posted by Nathan Hook <ho...@hotmail.com>.
Both Chris and Tim are giving great advice.  We're actually just trying to 
internationalize our application for our next major release.

Here are the things we've learned.

- You have to change the URIEncoding on your Tomcat Connector in your 
server.xml (as Tim pointed out).

We are using mod_jk and had to change our entry in the server.xml to the 
following:

<Connector port="8009"
           enableLookups="false" redirectPort="8443" protocol="AJP/1.3"
           URIEncoding="UTF-8" />


- On every request that comes into your Tomcat server, you have to check the
character encoding of your request and your response BEFORE any work is
actually done.

So, as Chris mentioned, you want to look up a character encoding filter. I
would recommend placing it as the very first filter that gets called in
your application. Making this filter first in the filter chain is simple:
when adding your filter to your application's web.xml file, make sure its
mapping is the first one listed in the filter mappings section.

Here is the Filter we are currently using for testing.

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class ContentTypeFilter implements Filter {
  public void init(FilterConfig config) {}
  public void destroy() {}
  public void doFilter(ServletRequest request, ServletResponse response,
                       FilterChain filterChain) throws IOException, ServletException
  {
     // I've seen some other classes that check to see if the character
     // encoding is null and only then set it to UTF-8.  I'm not sure which
     // is best at this time.  My guess is doing the null check, because from
     // my understanding the client can change the page encoding on each and
     // every request even though the server sets the page up to be UTF-8.
     // This must run before anything reads the request parameters.
     request.setCharacterEncoding("UTF-8");

     // Make sure to set the character encoding on the response early,
     // because once something is sent back to the client (like a JSP),
     // the character encoding is already set to the server's default.
     response.setCharacterEncoding("UTF-8");
     // Set the content type in the header of the response.
     response.setContentType("text/html;charset=UTF-8");

     filterChain.doFilter(request, response);
  }
}


- Set the meta tag in each and every JSP to UTF-8. Now, from my
understanding, most browsers will ignore this value, but it shouldn't hurt to
add it.

<head>
  <title>test title</title>
  <meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
</head>


- Finally, for database storage: again, from my understanding, you will need
to set all your tables to UTF-8 and then tell your JDBC driver that you want
to pass everything back and forth as UTF-8.

In MySQL, you add the following parameters to your JDBC URL connection string:
useUnicode=true&characterEncoding=UTF-8
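Those two MySQL Connector/J properties are query parameters on the JDBC URL. A minimal sketch, assuming a hypothetical host and database name (localhost:3306/appdb); note they are MySQL-specific and do not apply to the Oracle setup in the original question.

```java
class MySqlUrl {
    // Appends the two properties to a JDBC URL, using '?' or '&' depending
    // on whether the URL already has a query part.
    static String withUtf8(String baseUrl) {
        String separator = baseUrl.indexOf('?') >= 0 ? "&" : "?";
        return baseUrl + separator + "useUnicode=true&characterEncoding=UTF-8";
    }
}
```

If the URL lives in an XML file, such as a Tomcat <Resource> definition, each & must be written as &amp;.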


I hope all that information helps.
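The open question in the first comment of the ContentTypeFilter above (always force UTF-8, or only fill in a missing encoding?) comes down to a single decision. A hypothetical helper capturing both strategies: in a filter you would pass it request.getCharacterEncoding(), which returns null when the client did not declare an encoding, and hand the result to request.setCharacterEncoding(...).

```java
class EncodingDecision {
    // overrideAlways = true mirrors the ContentTypeFilter (always force the
    // default); false only fills in an encoding the client did not send.
    static String decide(String clientEncoding, String defaultEncoding, boolean overrideAlways) {
        if (overrideAlways || clientEncoding == null) {
            return defaultEncoding;
        }
        return clientEncoding;
    }
}
```

Forcing UTF-8 is simpler when every page is served as UTF-8; respecting a declared encoding is safer if some clients genuinely submit in another charset.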




Re: Tomcat 5.0.28 character encoding problem

Posted by Tim Funk <fu...@joedog.org>.
http://tomcat.apache.org/faq/misc.html#utf8

And you should first start with this in server.xml:
     <Connector ... URIEncoding="UTF-8" .../>

-Tim
