Posted to users@tomcat.apache.org by Joseph S <jk...@selectacast.net> on 2007/08/15 04:24:28 UTC

utf-8 encoding problem

My problem is this:

One of my pages with an apostrophe was not displaying properly, so I 
added to my jsp:

<%@ page contentType="text/html; charset=UTF-8"%>

When I did that my content displayed correctly, but on form submission 
it got corrupted.

You can view the problem here:

http://b.tupari.net/

One page displays correctly, but on submit the value gets mangled.  The 
other page doesn't display correctly, but if you cut and paste into the 
form from the first page the apostrophe does come out correctly on submit.

This happens in both firefox and konqueror.  So who is to blame here? 
The web browsers?  Tomcat?  Apache?

---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: utf-8 encoding problem

Posted by Ronald Klop <ro...@base.nl>.
Most browsers will encode the request the same as the page it came from. This is true for POST variables. I'm not sure about GET query variables.
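
A minimal sketch of why the submitted value gets mangled, assuming the browser sends the form as UTF-8 (the encoding of the page it came from) and the server decodes the bytes as ISO-8859-1 because nothing told it otherwise. The class and method names are made up for illustration; this is plain Java, not Tomcat code.

```java
import java.nio.charset.StandardCharsets;

final class FormEncodingMismatch {

    /** Decodes the UTF-8 bytes of {@code text} as ISO-8859-1, as a server with the wrong request encoding would. */
    static String misdecode(String text) {
        byte[] sent = text.getBytes(StandardCharsets.UTF_8);
        return new String(sent, StandardCharsets.ISO_8859_1);
    }

    /** Reverses misdecode(): ISO-8859-1 maps every byte to one char, so the original bytes can be recovered. */
    static String repair(String mangled) {
        byte[] original = mangled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "it\u2019s";          // "it's" with U+2019, the curly apostrophe
        String seen = misdecode(typed);      // the apostrophe becomes three separate chars
        System.out.println(typed.length() + " chars typed, " + seen.length() + " chars seen");
        System.out.println("repaired equals typed: " + repair(seen).equals(typed));
    }
}
```

The one apostrophe turns into three characters because U+2019 is three bytes in UTF-8 and ISO-8859-1 treats each byte as its own character.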

In the past I found some websites explaining this hidden feature, but don't have the time to search again.

Ronald.

On Thu Aug 16 20:25:18 CEST 2007 Tomcat Users List <us...@tomcat.apache.org> wrote:
> Mark Thomas wrote:
> 
> > request.setCharacterEncoding("UTF-8");
> 
> Is this always safe? For responses I can (and do) check the 
> accept-charset request parameter, but I can't figure out how to tell 
> what the request encoding should be.

Re: utf-8 encoding problem

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Joe,

Joseph S wrote:
> Christopher Schultz wrote:
> 
>> Setting the encoding of the response is sometimes necessary when the
>> browser (stupidly, IMO) elects not to send the charset being used to the
>> server.
>>
> It isn't the browser's fault, it's the spec's fault. See
> https://bugzilla.mozilla.org/show_bug.cgi?id=289060#c8

Certainly, the specification doesn't help in this regard. I'm
disappointed that things like this never get fixed in specifications.
This question comes up all the time, and the solution is almost always
to simply pick a charset and use it all the time, without question. But
that's messy, and doesn't allow the client to make any choices about
character encoding, etc. :(

-chris


Re: utf-8 encoding problem

Posted by Joseph S <jk...@selectacast.net>.

Christopher Schultz wrote:

> Setting the encoding of the response is sometimes necessary when the
> browser (stupidly, IMO) elects not to send the charset being used to the
> server.
> 
It isn't the browser's fault, it's the spec's fault. See 
https://bugzilla.mozilla.org/show_bug.cgi?id=289060#c8



Re: utf-8 encoding problem

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Mark and Joe,

Mark Thomas wrote:
> Joseph Shraibman wrote:
>> Mark Thomas wrote:
>>
>>>        request.setCharacterEncoding("UTF-8");
>>
>> Is this always safe?  For responses I can (and do) check the
>> accept-charset request [header], but I can't figure out how to tell
>> what the request encoding should be.

Don't forget that Accept-Charset has nothing to do with the request:
it's all about the list of charsets that are acceptable for the
/response/ to the current request.

Setting the encoding of the response is sometimes necessary when the
browser (stupidly, IMO) elects not to send the charset being used to the
server.

-chris


Re: utf-8 encoding problem

Posted by Mark Thomas <ma...@apache.org>.
Joseph Shraibman wrote:
> Mark Thomas wrote:
> 
>>        request.setCharacterEncoding("UTF-8");
> 
> Is this always safe?  For responses I can (and do) check the
> accept-charset request parameter, but I can't figure out how to tell
> what the request encoding should be.

It should be reasonable unless the user goes out of their way to do
something different. In that case they deserve whatever they get.

Mark




Re: utf-8 encoding problem

Posted by Mark Thomas <ma...@apache.org>.
Joseph Shraibman wrote:
> This is an old problem.  See
> https://bugzilla.mozilla.org/show_bug.cgi?id=18643
> https://bugzilla.mozilla.org/show_bug.cgi?id=7533
> 
> Firefox and MSIE use a magic _charset_ parameter, but I can't use it
> because if I call request.getParameter("_charset_") I can't set the
> encoding after that!
> 
> Anyway it seems firefox (and I assume IE) submit the form in whatever
> the page encoding was, so for all forms I serve up myself I'll just set
> the encoding to UTF-8 and I'll assume it will come back as UTF-8.
> 
> Does Tomcat know anything about _charset_ ?
No.

Mark




Re: utf-8 encoding problem

Posted by Joseph Shraibman <jo...@xtenit.com>.
This is an old problem.  See 
https://bugzilla.mozilla.org/show_bug.cgi?id=18643
https://bugzilla.mozilla.org/show_bug.cgi?id=7533

Firefox and MSIE use a magic _charset_ parameter, but I can't use it 
because if I call request.getParameter("_charset_") I can't set the 
encoding after that!
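
One possible way around that chicken-and-egg problem, sketched under assumptions: the _charset_ value is plain ASCII, so it can be picked out of the raw urlencoded string (for a GET, what request.getQueryString() returns) without decoding anything first. The CharsetHint class and its methods are hypothetical helpers, not part of Tomcat or the servlet API, and field names are assumed to be ASCII. For a POST the body would have to be read by hand, which this sketch does not attempt.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class CharsetHint {

    private static final String KEY = "_charset_=";

    /** Returns the _charset_ value found in a raw urlencoded string, or {@code fallback} if absent or empty. */
    static String charsetOf(String rawForm, String fallback) {
        for (String pair : rawForm.split("&")) {
            if (pair.startsWith(KEY) && pair.length() > KEY.length()) {
                return pair.substring(KEY.length());
            }
        }
        return fallback;
    }

    /** Decodes one named field using the charset announced by _charset_; returns null if the field is missing. */
    static String decodeField(String rawForm, String name, String fallback) {
        String charset = charsetOf(rawForm, fallback);
        for (String pair : rawForm.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                try {
                    return URLDecoder.decode(pair.substring(eq + 1), charset);
                } catch (UnsupportedEncodingException e) {
                    // The client supplied a charset name this JVM does not know.
                    throw new IllegalArgumentException("Unsupported charset: " + charset, e);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String raw = "mydata=it%E2%80%99s&_charset_=UTF-8";
        System.out.println(charsetOf(raw, "ISO-8859-1"));
        System.out.println("it\u2019s".equals(decodeField(raw, "mydata", "ISO-8859-1")));
    }
}
```

Since the charset name comes from the client, real code would probably want to check it against a short list of accepted names before using it.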

Anyway it seems firefox (and I assume IE) submit the form in whatever 
the page encoding was, so for all forms I serve up myself I'll just set 
the encoding to UTF-8 and I'll assume it will come back as UTF-8.

Does Tomcat know anything about _charset_ ?

Joseph Shraibman wrote:
> Mark Thomas wrote:
> 
>>        request.setCharacterEncoding("UTF-8");
> 
> Is this always safe?  For responses I can (and do) check the 
> accept-charset request parameter, but I can't figure out how to tell 
> what the request encoding should be.


Re: utf-8 encoding problem

Posted by Joseph Shraibman <jo...@xtenit.com>.
Mark Thomas wrote:

>        request.setCharacterEncoding("UTF-8");

Is this always safe?  For responses I can (and do) check the 
accept-charset request parameter, but I can't figure out how to tell 
what the request encoding should be.



Re: utf-8 encoding problem

Posted by Mark Thomas <ma...@apache.org>.
Try this then - this is my standard character encoding index.jsp test.

<%@ page contentType="text/html; charset=UTF-8" %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
   <head>
     <title>Character encoding test page</title>
   </head>
   <body>
     <p>Data posted to this form was:
     <%
       request.setCharacterEncoding("UTF-8");
       out.print(request.getParameter("mydata"));
     %>

     </p>
     <form method="post" action="index.jsp">
       <input type="text" name="mydata">
       <input type="submit" value="Submit" />
       <input type="reset" value="Reset" />
     </form>
   </body>
</html>

To get the above working with GET, you'll need to make sure
URIEncoding="UTF-8" has been set on the connector as Nathan pointed
out earlier.

Mark

Joseph S wrote:
> POST
> 
> Mark Thomas wrote:
>> Joseph S wrote:
>>> When I did that my content displayed correctly, but on form submission
>>> it got corrupted.
>>
>> POST or GET?
>>
>> Mark


Re: utf-8 encoding problem

Posted by Joseph S <jk...@selectacast.net>.
POST

Mark Thomas wrote:
> Joseph S wrote:
>> When I did that my content displayed correctly, but on form submission
>> it got corrupted.
> 
> POST or GET?
> 
> Mark


Re: utf-8 encoding problem

Posted by Mark Thomas <ma...@apache.org>.
Joseph S wrote:
> When I did that my content displayed correctly, but on form submission
> it got corrupted.

POST or GET?

Mark




Re: utf-8 encoding problem

Posted by Joseph Shraibman <jk...@selectacast.net>.

Nathan Hook wrote:
> A few things...
> 
> First, what type of apostrophe are you using?  Are you using a typical 
> ascii apostrophe (') or are you using the Microsoft slanted apostrophe 
> that comes out of word documents (&#8242;)?
> 
It's &#8217;

> Here are two links that describe the problem:
> 
> http://www.cs.tut.fi/~jkorpela/www/windows-chars.html
> http://www.cs.tut.fi/~jkorpela/chars.html#win

That basically says that some windows chars don't display properly. 
That isn't my problem.  It displays properly when I set the char 
encoding to utf-8.  My question is why doesn't it submit properly if the 
original page was sent as utf-8 but does submit properly if the original 
page was ISO-8859-1?
> 
> If you're using mod_jk make sure that the ajp connector is set up to 
> encode using utf-8 like so:
> 
> <Connector port="8009" enableLookups="false" redirectPort="8443" 
> protocol="AJP/1.3" URIEncoding="UTF-8" />
> 
> 
> Next, make sure that the request AND response have been set to use utf 
> encoding. 

Aren't all requests submitted as application/x-www-form-urlencoded which 
is an encoded form of unicode?
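
A short illustration of why the answer to that question is "not quite": application/x-www-form-urlencoded percent-escapes bytes, and which bytes a character turns into depends on the charset chosen before escaping. The sketch below uses java.net.URLEncoder from the JDK to show that; the class name UrlEncodedBytes is made up, and how a given browser handles characters the charset cannot represent may differ from what Java does here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncodedBytes {

    /** Percent-encodes {@code value} after converting it to bytes in the named charset. */
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String apostrophe = "\u2019"; // &#8217;, the right single quotation mark

        // Three bytes in UTF-8.
        System.out.println(encode(apostrophe, "UTF-8"));        // %E2%80%99
        // One byte in windows-1252.
        System.out.println(encode(apostrophe, "windows-1252")); // %92
        // Not representable in ISO-8859-1; Java substitutes '?'.
        System.out.println(encode(apostrophe, "ISO-8859-1"));   // %3F
    }
}
```

The request body alone does not say which of these charsets was used, which is why the server has to be told, for example with request.setCharacterEncoding().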




RE: utf-8 encoding problem

Posted by Nathan Hook <ho...@hotmail.com>.
A few things...

First, what type of apostrophe are you using?  Are you using a typical ascii 
apostrophe (') or are you using the Microsoft slanted apostrophe that comes 
out of word documents (&#8242;)?

Here are two links that describe the problem:

http://www.cs.tut.fi/~jkorpela/www/windows-chars.html
http://www.cs.tut.fi/~jkorpela/chars.html#win

If you're still having issues after reading those, then here is what needs 
to be done to get utf-8 encoding to work.

If you're using mod_jk make sure that the ajp connector is set up to encode 
using utf-8 like so:

<Connector port="8009" enableLookups="false" redirectPort="8443" 
protocol="AJP/1.3" URIEncoding="UTF-8" />


Next, make sure that the request AND response have been set to use UTF-8 
encoding.  The request MUST have its character encoding set BEFORE any 
request parameters are read; otherwise Tomcat decodes the body with its 
default of ISO-8859-1 rather than the encoding the browser used.

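import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class ContentTypeFilter implements Filter
{
  public void init(FilterConfig config)
  {
  }

  public void destroy()
  {
  }

  public void doFilter(ServletRequest request, ServletResponse response,
      FilterChain filterChain) throws IOException, ServletException
  {
     // Must run before anything reads a request parameter; once the
     // parameters have been parsed the encoding can no longer be changed.
     request.setCharacterEncoding("UTF-8");

     // Default the response to UTF-8 HTML; a servlet or JSP further down
     // the chain can still set its own content type.
     response.setCharacterEncoding("UTF-8");
     response.setContentType("text/html;charset=UTF-8");

     filterChain.doFilter(request, response);
  }
}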

Finally, I would also set the meta header on the jsp page to be utf-8 just 
to be complete...

<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >

Regards...
