Posted to users@tomcat.apache.org by David Wall <d....@computer.org> on 2011/09/01 04:41:20 UTC

Tomcat 7.0.19 character encoding issue with JSP

I'm trying to track down a character encoding issue that I've been 
having, but don't really understand. Hopefully one of you will know what 
the answer is.

I am using CKEditor to generate some user-specified HTML. CKEditor 
offers an "insert special character" function that often creates named 
HTML entities like "&yen;", but it also inserts a few, like the solid 
black right arrow, as literal UTF-8 characters rather than entities. I 
then generate a JSP file that includes the HTML produced by CKEditor.

Initially, I was using the Java 6 FileWriter without specifying a 
character encoding, and I'd end up with a generated JSP where the HTML 
entities were fine, but the other special characters appeared as just 
'?' in the file. I changed to use FileOutputStream/OutputStreamWriter 
and specified "UTF-8" and the JSP looked good:

<%@ page contentType="text/html; charset=utf-8" session="true" 
isELIgnored="true" %>
...
<p>These have issues: ► Ŵ but these don&#39;t: &trade; &hArr; &diams; 
&aacute; &para; &yen;</p>
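
For readers following along, the switch from FileWriter to an explicit encoder might look roughly like the sketch below. The class name Utf8JspWriter and the helper writeUtf8 are made up for illustration, and StandardCharsets plus try-with-resources assume Java 7 or later; on Java 6 you would pass the charset name "UTF-8" to OutputStreamWriter and close the writer in a finally block.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class Utf8JspWriter {
    // Hypothetical helper: writes text with an explicit UTF-8 encoder
    // instead of relying on the platform default (Cp1252 on many Windows PCs).
    static void writeUtf8(File target, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(target), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        File jsp = File.createTempFile("generated", ".jsp");
        jsp.deleteOnExit();
        // \u25BA is the black right arrow, \u0174 is W with circumflex.
        writeUtf8(jsp, "<p>\u25BA \u0174 &yen;</p>\n");
        byte[] bytes = Files.readAllBytes(jsp.toPath());
        System.out.println(bytes.length + " bytes written to " + jsp);
    }
}
```

The point of the sketch is only that the encoding is chosen in code, so the resulting file does not depend on -Dfile.encoding or the operating system locale.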

With the UTF8 encoding on writing the JSP, the right arrow and latin-W 
appeared in the JSP file instead of two question marks. I thought maybe 
I had won, but when I look at the .java class file that is generated by 
Tomcat, I see this instead:

out.write("<p>These have issues: â–º Å´ but these don&#39;t: &trade; 
&hArr; &diams; &aacute; &para; &yen;</p>\n");

And when I view that in a web browser, I'm back to question marks again. 
View source in the browser shows:

<p>These have issues: ? ? but these don&#39;t: &trade; &hArr; &diams; &aacute; &para; &yen;</p>

So I figured it was the default character encoding of the JVM causing me 
some grief. I checked and the default on my Windows PC is Cp1252. But 
when I change this with the JVM argument -Dfile.encoding=UTF8, I am no 
better off. The JSP looks okay, but the .java generated looks like 
above. I did note that I could revert back to writing the JSP using 
FileWriter and it produced the correct JSP file, but the 
Tomcat-generated .java file still was wrong.

What do I need to do to ensure that Tomcat reads my JSP with the 
correct encoding and writes the generated .java file with the correct 
encoding, so that these special characters display properly? Tomcat 
itself is not really the issue, since CKEditor is running in Vaadin, 
which is running in Tomcat, and it looks fine there; but as soon as I 
run the generated JSP, the characters get lost and I end up with 
question marks instead.

Thanks for any ideas,
David

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Tomcat 7.0.19 character encoding issue with JSP

Posted by Christopher Schultz <ch...@christopherschultz.net>.
David,

On 9/1/2011 3:00 PM, David Wall wrote:
> Thanks for all the tips and ideas!

If you had already read this:

http://wiki.apache.org/tomcat/FAQ/CharacterEncoding

...and it didn't help, we welcome any suggestions. Feel free to make
any edits you feel would be helpful to others.

-chris



Re: Tomcat 7.0.19 character encoding issue with JSP

Posted by David Wall <d....@computer.org>.
You are right about the encoding of the .java file in Eclipse.  I tried 
in 'vi' and sure enough the codes are in there correctly.  Interesting 
that Eclipse opened the .jsp file and showed it nicely, but not the 
.java file.  I couldn't check the Properties dialog, though, since these 
files are not part of my project, but I was able to drag them into Eclipse.

Anyway, I was still having the problem, but noted that my URL actually 
runs a servlet that does a RequestDispatcher.include() of my generated 
JSP page, so even though the JSP says everything was UTF-8 in the @page 
directive, apparently the response was already set to the default 
charset in the servlet itself.  So I added 
response.setCharacterEncoding("UTF-8"); to the top of my servlet's 
doGet/doPost and that seems to have resolved it.
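
The question marks are consistent with a response writer that encodes to ISO-8859-1, which is the servlet default when no charset has been set on the response. A minimal sketch of that effect, using only the JDK (the class name Latin1Fallback and the helper roundTripLatin1 are made up for illustration, and this is not Tomcat's own code):

```java
import java.nio.charset.StandardCharsets;

class Latin1Fallback {
    // Encodes text to ISO-8859-1 and decodes it again. Characters that
    // ISO-8859-1 cannot represent are replaced with '?' during encoding.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // \u25BA (right arrow) and \u0174 (W circumflex) are outside
        // ISO-8859-1; \u00A5 (yen sign) is inside it and survives.
        String result = roundTripLatin1("\u25BA \u0174 \u00A5");
        for (char c : result.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

Named entities such as &yen; are plain ASCII in the page source, which is why they were never affected.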

Thanks for all the tips and ideas!

David





Re: Tomcat 7.0.19 character encoding issue with JSP

Posted by Konstantin Kolinko <kn...@gmail.com>.
2011/9/1 David Wall <d....@computer.org>:
> Thanks for the ideas, Mark, but it's still the same undesirable result.
>
> On 9/1/2011 6:58 AM, Mark Thomas wrote:
>>
>> I suspect you need:
>> <%@ page pageEncoding="UTF-8" %>
>> at the start of your JSP.
>>
>> .java files are written using UTF-8 by default so if what you see there
>> is wrong then the original .jsp file was read with the wrong encoding.
>
> My JSP file that I write shows it correctly since changing my PrintWriter to
> set the stream to UTF-8.
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <%@ page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"
> session="true" isELIgnored="true" %>
> <%@ taglib uri="http://open.esignforms.com/libdocsgen/taglib" prefix="esf"
> %>
> ...
> <p>These have issues: ► Ŵ but these don&#39;t: &trade; &hArr; &diams;
> &aacute; &para; &yen;</p>
> ...
>
> But the generated .java file still shows:
>
> out.write("<p>These have issues: â–º Å´ but these don&#39;t: &trade; &hArr;

Is your editor (or whatever you use to view the java file) treating the
*.java file as UTF-8 or not? It may be a display issue in the editor.

If you are opening it in Eclipse:
press Alt + Enter -> Properties dialog opens -> see "Resource" page ->
Text file encoding
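
As a quick check of the display theory, the garbled text in the quoted out.write() line is what you get when the UTF-8 bytes of the two characters are decoded as windows-1252 (Cp1252). A small sketch, with the class name MojibakeDemo and the helper misread made up for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes text as UTF-8, then decodes those bytes with another charset,
    // which is what a viewer does when it assumes the wrong file encoding.
    static String misread(String text, Charset wrongCharset) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrongCharset);
    }

    public static void main(String[] args) {
        // \u25BA is the black right arrow, \u0174 is W with circumflex.
        String original = "\u25BA \u0174";
        String seen = misread(original, Charset.forName("windows-1252"));
        // Print code points so the result does not depend on the console encoding.
        for (char c : seen.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

The arrow's three UTF-8 bytes (E2 96 BA) come out as a-circumflex, en dash, and masculine ordinal indicator, and the W's two bytes (C5 B4) as A-ring and acute accent, so the bytes in the file can be correct even though the viewer shows garbage.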

> &diams; &aacute; &para; &yen;</p>\n");
>
> I checked and the file mod timestamps are updated so I know these are newly
> created files when I run the generated JSP.  I noted that the compiler was
> looking for "UTF-8" (and not allowing "UTF8" or "utf-8" or other variants).
>
> My only guess now is to know what's the file encoding used when the Tomcat
> compiler (Jasper?) READS my JSP file. Is it possible that it is messed up
> reading my UTF-8 encoded JSP file, even though it then writes the .java file
> with UTF-8 also?
>
> I am running Windows 7, Tomcat 7.0.19, latest Java 6, and running this in
> Eclipse Helios Service Release 2.
>
> Any other thoughts I can try?
>

Best regards,
Konstantin Kolinko



Re: Tomcat 7.0.19 character encoding issue with JSP

Posted by David Wall <d....@computer.org>.
Thanks for the ideas, Mark, but it's still the same undesirable result.

On 9/1/2011 6:58 AM, Mark Thomas wrote:
> I suspect you need:
> <%@ page pageEncoding="UTF-8" %>
> at the start of your JSP.
>
> .java files are written using UTF-8 by default so if what you see there
> is wrong then the original .jsp file was read with the wrong encoding.

My JSP file that I write shows it correctly since changing my 
PrintWriter to set the stream to UTF-8.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<%@ page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8" 
session="true" isELIgnored="true" %>
<%@ taglib uri="http://open.esignforms.com/libdocsgen/taglib" 
prefix="esf" %>
...
<p>These have issues: ► Ŵ but these don&#39;t: &trade; &hArr; &diams; 
&aacute; &para; &yen;</p>
...

But the generated .java file still shows:

out.write("<p>These have issues: â–º Å´ but these don&#39;t: &trade; 
&hArr; &diams; &aacute; &para; &yen;</p>\n");

I checked and the file mod timestamps are updated so I know these are 
newly created files when I run the generated JSP.  I noted that the 
compiler was looking for "UTF-8" (and not allowing "UTF8" or "utf-8" or 
other variants).

My only remaining question is what file encoding is used when the 
Tomcat compiler (Jasper?) READS my JSP file. Is it possible that it is 
misreading my UTF-8 encoded JSP file, even though it then writes 
the .java file with UTF-8 also?

I am running Windows 7, Tomcat 7.0.19, latest Java 6, and running this 
in Eclipse Helios Service Release 2.

Any other thoughts I can try?

Thanks,
David




Re: Tomcat 7.0.19 character encoding issue with JSP

Posted by Mark Thomas <ma...@apache.org>.
On 01/09/2011 03:41, David Wall wrote:
> I'm trying to track down a character encoding issue that I've been
> having, but don't really understand. Hopefully one of you will know what
> the answer is.

I suspect you need:
<%@ page pageEncoding="UTF-8" %>
at the start of your JSP.

.java files are written using UTF-8 by default so if what you see there
is wrong then the original .jsp file was read with the wrong encoding.

Tomcat determines the encoding based on a number of factors. For the
details see line 326 onwards of
http://svn.apache.org/viewvc/tomcat/trunk/java/org/apache/jasper/compiler/ParserController.java?view=annotate

The short version is if it is XML, it uses the encoding defined in the
XML doc, if it isn't XML it uses the value of pageEncoding. The
fall-back is ISO-8859-1.
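
To make the short version concrete, here is a deliberately simplified sketch of that decision order. It is not Jasper's code: the class JspEncodingSketch, the method resolveEncoding, and its parameters are hypothetical, and the real ParserController logic weighs more factors than this.

```java
class JspEncodingSketch {
    // Simplified reading of the "short version" above:
    //   XML syntax      -> encoding declared in the XML document
    //   standard syntax -> value of pageEncoding
    //   otherwise       -> ISO-8859-1
    static String resolveEncoding(boolean isXmlSyntax,
                                  String xmlDeclEncoding,
                                  String pageEncoding) {
        if (isXmlSyntax && xmlDeclEncoding != null) {
            return xmlDeclEncoding;
        }
        if (!isXmlSyntax && pageEncoding != null) {
            return pageEncoding;
        }
        return "ISO-8859-1";
    }

    public static void main(String[] args) {
        // A standard-syntax JSP with pageEncoding="UTF-8".
        System.out.println(resolveEncoding(false, null, "UTF-8"));
        // A standard-syntax JSP with no encoding information at all.
        System.out.println(resolveEncoding(false, null, null));
    }
}
```

Under these assumptions, a page with no encoding information at all would be read as ISO-8859-1, in which case its multi-byte UTF-8 characters would be split into several single-byte characters.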

> <%@ page contentType="text/html; charset=utf-8" session="true"
> isELIgnored="true" %>

You'll still need this to tell the browser to use UTF-8 to display the data.

> So I figured it was the default character encoding of the JVM causing me
> some grief. I checked and the default on my Windows PC is Cp1252. But
> when I change this with the JVM argument -Dfile.encoding=UTF8, I am no
> better off.

That is expected. file.encoding is not always read/write and should
never be relied upon to fix any encoding problems.

HTH,

Mark
