Posted to users@tomcat.apache.org by Rick <to...@vidyah.com> on 2004/09/01 04:44:09 UTC

UTF-8 Encoding Issue Since 5.0.27 ( gun in my mouth )

Since 5.0.27, pretty much all of my UTF-8 i18n code seems to be messed up. 

The problem seems to have been caused by whatever fix was created for issue
--------------------------
ServletResponse.setContentType sets response encoding after getWriter was
called (Bugtraq 5062838) (luehe) 
--------------------------

Now it seems almost impossible to properly set the encoding type of some of
my JSPs and all of my Servlets that return UTF-8 XML data.

As an example, my login page allows the user to switch to Japanese text.
Text data is read with a ResourceBundle, which reads from a UTF-8 encoded
.properties file.
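A note on the ResourceBundle step: before Java 9, ResourceBundle.getBundle reads .properties files as ISO-8859-1, so Japanese text stored as raw UTF-8 in a .properties file traditionally had to be converted to \uXXXX escapes (for example with native2ascii). A minimal sketch of the other route, assuming a Java 7+ JVM (newer than what Tomcat 5.0 usually ran on), decodes the file explicitly through PropertyResourceBundle(Reader). The class name Utf8BundleLoader and the key login.title are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class Utf8BundleLoader {

    // Decodes the .properties stream explicitly as UTF-8 instead of relying on
    // the default loader, which reads .properties files as ISO-8859-1 before Java 9.
    static ResourceBundle loadUtf8(InputStream in) {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            return new PropertyResourceBundle(reader); // Reader constructor exists since Java 6
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a hypothetical login_ja.properties file saved as UTF-8.
        byte[] file = "login.title=\u30ed\u30b0\u30a4\u30f3\n".getBytes(StandardCharsets.UTF_8);
        ResourceBundle bundle = loadUtf8(new ByteArrayInputStream(file));
        System.out.println(bundle.getString("login.title").equals("\u30ed\u30b0\u30a4\u30f3")); // prints "true"
    }
}
```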

If the .jsp page itself is saved as ASCII, I can't get the characters to
show up at all anymore.
I have to save the .jsp page as UTF-8.
I also added "set JAVA_OPTS=-Dfile.encoding=UTF-8" to my catalina.bat file.
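-Dfile.encoding changes the JVM's default charset, which is used by APIs that take no charset argument (such as new String(byte[])). It does not set a servlet response's character encoding. A minimal sketch, with a hypothetical class name, contrasts decoding with an explicitly named charset against decoding with the default one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {

    // Decodes with an explicitly named charset; the result does not depend on
    // -Dfile.encoding or on the platform default.
    static String decodeExplicit(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decodes with the JVM default charset, which -Dfile.encoding (or the OS
    // locale) controls, so the result can differ between machines.
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes, Charset.defaultCharset());
    }

    public static void main(String[] args) {
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + Charset.defaultCharset());
        String japanese = "\u65e5\u672c\u8a9e"; // the word "Japanese" written in Japanese
        byte[] utf8 = japanese.getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeExplicit(utf8).equals(japanese));    // always true
        System.out.println(decodeWithDefault(utf8).equals(japanese)); // typically true only if the default is UTF-8
    }
}
```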

Then, if I try to set a character set in my page header, it breaks.

This works in some cases...
<%@ page language="java" import="java.util.*" contentType="text/html" %>
response.getCharacterEncoding() = "ISO-8859-1"
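With contentType="text/html" and no charset, the response encoding is ISO-8859-1, as the getCharacterEncoding() output shows, and ISO-8859-1 has no Japanese characters. A minimal sketch using java.lang.String (not Tomcat's own writer, which may handle unmappable characters differently, though it typically also emits '?') shows what that charset does to such text; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Loss {

    // Encodes text as ISO-8859-1. String.getBytes replaces every character
    // outside Latin-1 with the charset's replacement byte, '?' (0x3F).
    static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] out = encodeLatin1("\u65e5\u672c\u8a9e"); // three Japanese characters
        System.out.println(new String(out, StandardCharsets.ISO_8859_1)); // prints "???"
    }
}
```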

The really scary part is that with no meta tag or charset actually set, the
browser (IE) correctly switches to UTF-8 and displays the content fine. But
if I change the actual file encoding of the .jsp page from UTF-8 back to
ASCII, IE does not switch to UTF-8 and the page is messed up again.
Why does the encoding of the .jsp file itself dictate the response
sent to the client?
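One plausible explanation, not confirmed in this thread: under JSP 2.0 (which Tomcat 5.0 implements), a standard-syntax page with no pageEncoding and no charset in its contentType is read as ISO-8859-1, and the response also defaults to ISO-8859-1. ISO-8859-1 maps every byte to one character and back, so UTF-8 bytes in the page's template text pass through unchanged, and IE may then auto-detect them as UTF-8. A minimal simulation of that round trip, with a hypothetical class name:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {

    // Simulates a JSP saved as UTF-8 but read by the container as ISO-8859-1,
    // then written to a response that is also ISO-8859-1.
    // Returns the bytes that would reach the browser.
    static byte[] passThrough(byte[] sourceFileBytes) {
        // Container decodes the page source as ISO-8859-1 (garbled text in memory).
        String templateText = new String(sourceFileBytes, StandardCharsets.ISO_8859_1);
        // Response writer encodes the same text as ISO-8859-1 again.
        return templateText.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] utf8Source = "\u65e5\u672c\u8a9e".getBytes(StandardCharsets.UTF_8);
        byte[] sent = passThrough(utf8Source);
        // The original UTF-8 bytes come out untouched, so a browser that
        // guesses UTF-8 renders them correctly.
        System.out.println(Arrays.equals(utf8Source, sent)); // prints "true"
    }
}
```

If the file is saved as ASCII instead, any non-ASCII text can no longer be stored in it as UTF-8 bytes, so there is nothing for the browser to detect.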

It appears that the encoding of the source file somehow gets passed
along, and then I'm unable to alter the character encoding; if I try, it
just causes everything to go to hell.


This used to work before 5.0.27, but now doesn't, even though all data and
pages are encoded in UTF-8.
<%@ page language="java" import="java.util.*" contentType="text/html; charset=UTF-8" %>
response.getCharacterEncoding() = "UTF-8"
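With charset=UTF-8 the response really is UTF-8, so any string that was decoded wrongly earlier gets encoded a second time. One possible source, assuming the ResourceBundle is loaded the default way on a pre-Java-9 JVM, is the UTF-8 .properties file being read as ISO-8859-1. Under an ISO-8859-1 response those garbled strings happen to map back to the original bytes; under a UTF-8 response they are double-encoded. A minimal simulation, with a hypothetical class name:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DoubleEncoding {

    // Simulates how a pre-Java-9 ResourceBundle loads a value from a UTF-8
    // .properties file: the file bytes are decoded as ISO-8859-1.
    static String misreadProperty(String intended) {
        byte[] fileBytes = intended.getBytes(StandardCharsets.UTF_8);
        return new String(fileBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String intended = "\u65e5\u672c\u8a9e";
        byte[] correct = intended.getBytes(StandardCharsets.UTF_8);
        String misread = misreadProperty(intended);

        // ISO-8859-1 response: the garbled text maps back to the original bytes.
        byte[] latin1Response = misread.getBytes(StandardCharsets.ISO_8859_1);
        // UTF-8 response: each garbled character is encoded again (double encoding).
        byte[] utf8Response = misread.getBytes(StandardCharsets.UTF_8);

        System.out.println(Arrays.equals(correct, latin1Response)); // prints "true"
        System.out.println(Arrays.equals(correct, utf8Response));   // prints "false"
        System.out.println(correct.length + " vs " + utf8Response.length); // prints "9 vs 18"
    }
}
```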


Before 5.0.27, all I had to do to get my output in UTF-8 was ...
 contentType="text/html; charset=UTF-8"

Now I have to mess with the actual .jsp file encodings and still can't
get most pages to work properly, and none of my servlets return correct
UTF-8 data.
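For servlets, the usual pattern is to call response.setContentType("text/xml; charset=UTF-8") before response.getWriter(). The sketch below avoids the servlet API so it runs standalone: a hypothetical writeXml helper writes XML through a Writer explicitly bound to UTF-8, with an XML declaration naming the same encoding, so the declared and actual encodings agree.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8XmlWriter {

    // Hypothetical helper: writes a tiny XML document to the given stream.
    // The Writer is bound to UTF-8 and the declaration says UTF-8, so they match.
    // The stream is flushed but not closed; the caller owns it.
    static void writeXml(OutputStream out, String message) {
        try {
            Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            w.write("<message>" + message + "</message>\n"); // no XML escaping in this sketch
            w.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeXml(buf, "\u65e5\u672c\u8a9e");
        System.out.println(new String(buf.toByteArray(), StandardCharsets.UTF_8));
    }
}
```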

I have tried setting "pageEncoding" in the page tag as well with no luck.


Thanks for anyone's insight or help on this. It's never fun to find out that
something that had been working quite solidly suddenly blows up for no good
reason.

My current dev machine is on Windows XP, by the way, with a vanilla install
of Tomcat 5.0.28.
I will be setting this up on a Linux box for more testing shortly.


---------------------------------------------------------------------
To unsubscribe, e-mail: tomcat-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: tomcat-user-help@jakarta.apache.org


RE: UTF-8 Encoding Issue Since 5.0.27 ( gun in my mouth )

Posted by Mark Thomas <ma...@apache.org>.
The change (which is required by the spec) is that if the character set has not
been set before a call to getWriter() then it will default to ISO-8859-1. There
was some discussion on the tomcat-dev list about this (see
http://marc.theaimsgroup.com/?l=tomcat-dev&m=109104739719572&w=2)
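The rule described above can be pictured with a toy model. This is a hypothetical ToyResponse class, not Tomcat's code, and it ignores setCharacterEncoding() and setLocale(): the charset comes only from setContentType(), getWriter() fixes it (falling back to ISO-8859-1), and a charset supplied after that is ignored.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.util.Locale;

public class ToyResponse {

    private final ByteArrayOutputStream body = new ByteArrayOutputStream();
    private String characterEncoding;   // null until a charset is set explicitly
    private boolean writerObtained;

    // Picks up a "charset=" parameter from the content type, if present.
    // Once getWriter() has been called, the charset is no longer changed.
    void setContentType(String contentType) {
        if (writerObtained) {
            return;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                characterEncoding = p.substring("charset=".length()).trim();
            }
        }
    }

    // Fixes the encoding at this point, defaulting to ISO-8859-1 if none was set.
    PrintWriter getWriter() {
        if (characterEncoding == null) {
            characterEncoding = "ISO-8859-1";
        }
        writerObtained = true;
        return new PrintWriter(new OutputStreamWriter(body, Charset.forName(characterEncoding)));
    }

    String getCharacterEncoding() {
        return characterEncoding == null ? "ISO-8859-1" : characterEncoding;
    }

    public static void main(String[] args) {
        ToyResponse early = new ToyResponse();
        early.setContentType("text/xml; charset=UTF-8"); // charset set first
        early.getWriter();
        System.out.println(early.getCharacterEncoding()); // prints "UTF-8"

        ToyResponse late = new ToyResponse();
        late.getWriter();                                // encoding fixed here
        late.setContentType("text/xml; charset=UTF-8");  // charset ignored
        System.out.println(late.getCharacterEncoding()); // prints "ISO-8859-1"
    }
}
```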

I'll try to put together a very simple JSP test case and get back to you.

Mark

> -----Original Message-----
> From: Rick [mailto:tomcatdev@vidyah.com] 
> Sent: Wednesday, September 01, 2004 3:44 AM
> To: 'Tomcat Users List'; tomcat-dev@jakarta.apache.org
> Subject: UTF-8 Encoding Issue Since 5.0.27 ( gun in my mouth )