Posted to users@tomcat.apache.org by David Wall <d....@computer.org> on 2011/09/01 04:41:20 UTC
Tomcat 7.0.19 character encoding issue with JSP
I'm trying to track down a character encoding issue that I've been
having, but don't really understand. Hopefully one of you will know what
the answer is.
I am using CKEditor to generate some user-specified HTML. CKEditor
offers an "insert special character" function that often creates named
HTML entities like "&yen;", but it also inserts a few, like the solid black
right arrow, as literal UTF-8 characters rather than entities. I then
generate a JSP file that includes the HTML produced by CKEditor.
Initially, because I was using the Java 6 FileWriter without specifying
a character encoding, I'd end up with a generated JSP where the HTML
entities were fine, but the other special characters appeared as just
'?' in the file. I changed to use FileOutputStream/OutputStreamWriter
and specified "UTF-8" and the JSP looked good:
<%@ page contentType="text/html; charset=utf-8" session="true"
isELIgnored="true" %>
...
<p>These have issues: ► Ŵ but these don't: ™ ⇔ ♦
á ¶ ¥</p>
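For readers following along, here is a minimal sketch of the FileOutputStream/OutputStreamWriter approach described above. The class and method names are hypothetical and not taken from the poster's code; the sketch writes to any OutputStream so that it can be tried without touching the file system. In real use you would pass a FileOutputStream for the generated .jsp file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not from the original post.
class Utf8JspWriter {
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    // Writes text as UTF-8 regardless of the platform default charset
    // (a plain FileWriter would use the default, e.g. Cp1252 on Windows).
    static void write(OutputStream out, String text) throws IOException {
        Writer writer = new OutputStreamWriter(out, UTF_8);
        try {
            writer.write(text);
        } finally {
            writer.close();
        }
    }

    // Returns the bytes that write() would place in the file.
    static byte[] encode(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            write(buffer, text);
        } catch (IOException e) {
            throw new IllegalStateException(e); // in-memory stream should not fail
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // U+25BA is the solid right arrow, U+0174 is W with circumflex.
        StringBuilder hex = new StringBuilder();
        for (byte b : encode("\u25BA \u0174")) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // prints "E2 96 BA 20 C5 B4"
    }
}
```

Passing the charset explicitly is the point of the sketch: the output no longer depends on the JVM default or on -Dfile.encoding.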
With the UTF8 encoding on writing the JSP, the right arrow and latin-W
appeared in the JSP file instead of two question marks. I thought maybe
I had won, but when I look at the .java class file that is generated by
Tomcat, I see this instead:
out.write("<p>These have issues: â–º Å´ but these don't: ™
⇔ ♦ á ¶ ¥</p>\n");
And when I view that in a web browser, I'm back to question marks again.
View source in the browser shows:
<p>These have issues: ? ? but these don't:™ ⇔ ♦ á ¶ ¥</p>
So I figured it was the default character encoding of the JVM causing me
some grief. I checked and the default on my Windows PC is Cp1252. But
when I change this with the JVM argument -Dfile.encoding=UTF8, I am no
better off. The JSP looks okay, but the .java generated looks like
above. I did note that I could revert back to writing the JSP using
FileWriter and it produced the correct JSP file, but the
Tomcat-generated .java file still was wrong.
What might I need to do to ensure that Tomcat both reads my JSP with the
correct encoding and writes the generated .java file correctly encoded,
so that these special characters appear correctly? It's not really
Tomcat as a whole that is the issue, since CKEditor is running in Vaadin,
which is running in Tomcat, and it looks fine there, but as soon as I run
the generated JSP, the characters get lost and I end up with question
marks instead.
Thanks for any ideas,
David
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org
Re: Tomcat 7.0.19 character encoding issue with JSP
Posted by Christopher Schultz <ch...@christopherschultz.net>.
David,
On 9/1/2011 3:00 PM, David Wall wrote:
> Thanks for all the tips and ideas!
If you had already read this:
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding
...and it didn't help, we welcome any suggestions. Feel free to make
any edits you feel would be helpful to others.
-chris
Re: Tomcat 7.0.19 character encoding issue with JSP
Posted by David Wall <d....@computer.org>.
You are right about the encoding of the .java file in Eclipse. I tried
in 'vi' and sure enough the codes are in there correctly. Interesting
that Eclipse opened the .jsp file and showed it nicely, but the .java
file was not. I couldn't do the properties, though, since these files
are not part of my project, but I was able to drag them into Eclipse.
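The garbled text seen in Eclipse matches what you get when UTF-8 bytes are decoded as Cp1252. A small sketch (class and method names are hypothetical) that reproduces the effect, assuming the JVM provides the windows-1252 charset:

```java
import java.nio.charset.Charset;

// Hypothetical demo class; reproduces the display problem, not Tomcat's behaviour.
class MojibakeDemo {
    // Encodes text as UTF-8, then decodes the bytes with another charset,
    // the way an editor configured for that charset would display the file.
    static String viewAs(String text, String viewerCharset) {
        byte[] utf8 = text.getBytes(Charset.forName("UTF-8"));
        return new String(utf8, Charset.forName(viewerCharset));
    }

    public static void main(String[] args) {
        // U+25BA (solid right arrow) and U+0174 (W with circumflex)
        String original = "\u25BA \u0174";
        // Shows the same five-character garble quoted from the .java file,
        // provided the console itself can display those characters.
        System.out.println(viewAs(original, "windows-1252"));
        // Decoding with the right charset returns the original text.
        System.out.println(viewAs(original, "UTF-8").equals(original));
    }
}
```

The bytes in the file are correct in both cases; only the charset used to view them differs, which is why 'vi' and Eclipse disagreed.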
Anyway, I was still having the problem, but noted that my URL actually
runs a servlet that does a RequestDispatcher.include() of my generated
JSP page, so even though the JSP says everything was UTF-8 in the @page
directive, apparently the response was already set to the default
charset in the servlet itself. So I added
response.setCharacterEncoding("UTF-8"); to the top of my servlet's
doGet/doPost and that seems to have resolved it.
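The question marks are consistent with the response being encoded as ISO-8859-1, which is the servlet default when no encoding is set, and which has no mapping for characters such as the arrow. The sketch below uses only the JDK (no servlet API) to show the substitution; the class and method names are hypothetical, and it models the response writer's encoding step rather than reproducing Tomcat's code.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; models only the encoding step of a response writer.
class ResponseCharsetDemo {
    // Encodes text with the given charset and decodes it again with the same
    // charset, which is what a browser honouring the declared charset would see.
    static String roundTrip(String text, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        // String.getBytes(Charset) replaces unmappable characters with the
        // charset's replacement byte, which is '?' for ISO-8859-1.
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String html = "\u25BA \u0174 &yen;";
        // Literal characters are lost; the ASCII entity survives.
        System.out.println(roundTrip(html, "ISO-8859-1")); // prints "? ? &yen;"
        // With UTF-8 every character survives the round trip.
        System.out.println(roundTrip(html, "UTF-8").equals(html)); // prints "true"
    }
}
```

This also explains why the named entities were never affected: they are plain ASCII, so any of these encodings carries them through unchanged. In the servlet itself, the fix reported above is the call response.setCharacterEncoding("UTF-8") made before the response writer is obtained.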
Thanks for all the tips and ideas!
David
Re: Tomcat 7.0.19 character encoding issue with JSP
Posted by Konstantin Kolinko <kn...@gmail.com>.
2011/9/1 David Wall <d....@computer.org>:
> Thanks for the ideas, Mark, but it's still the same undesirable result.
>
> On 9/1/2011 6:58 AM, Mark Thomas wrote:
>>
>> I suspect you need:
>> <%@ page pageEncoding="UTF-8" %>
>> at the start of your JSP.
>>
>> .java files are written using UTF-8 by default so if what you see there
>> is wrong then the original .jsp file was read with the wrong encoding.
>
> My JSP file that I write shows it correctly since changing my PrintWriter to
> set the stream to UTF-8.
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <%@ page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"
> session="true" isELIgnored="true" %>
> <%@ taglib uri="http://open.esignforms.com/libdocsgen/taglib" prefix="esf"
> %>
> ...
> <p>These have issues: ► Ŵ but these don't: ™ ⇔ ♦
> á ¶ ¥</p>
> ...
>
> But the generated .java file still shows:
>
> out.write("<p>These have issues: â–º Å´ but these don't: ™ ⇔
> ♦ á ¶ ¥</p>\n");
Is your editor (or whatever you use to view the .java file) treating the
*.java file as UTF-8 or not? It may be a display issue in the editor.
If you are opening it in Eclipse:
press Alt + Enter -> Properties dialog opens -> see "Resource" page ->
Text file encoding
>
> I checked and the file mod timestamps are updated so I know these are newly
> created files when I run the generated JSP. I noted that the compiler was
> looking for "UTF-8" (and not allowing "UTF8" or "utf-8" or other variants).
>
> My only guess now is to know what's the file encoding used when the Tomcat
> compiler (Jasper?) READS my JSP file. Is it possible that it is messed up
> reading my UTF-8 encoded JSP file, even though it then writes the .java file
> with UTF-8 also?
>
> I am running Windows 7, Tomcat 7.0.19, latest Java 6, and running this in
> Eclipse Helios Service Release 2.
>
> Any other thoughts I can try?
>
Best regards,
Konstantin Kolinko
Re: Tomcat 7.0.19 character encoding issue with JSP
Posted by David Wall <d....@computer.org>.
Thanks for the ideas, Mark, but it's still the same undesirable result.
On 9/1/2011 6:58 AM, Mark Thomas wrote:
> I suspect you need:
> <%@ page pageEncoding="UTF-8" %>
> at the start of your JSP.
>
> .java files are written using UTF-8 by default so if what you see there
> is wrong then the original .jsp file was read with the wrong encoding.
My JSP file that I write shows it correctly since changing my
PrintWriter to set the stream to UTF-8.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<%@ page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"
session="true" isELIgnored="true" %>
<%@ taglib uri="http://open.esignforms.com/libdocsgen/taglib"
prefix="esf" %>
...
<p>These have issues: ► Ŵ but these don't: ™ ⇔ ♦
á ¶ ¥</p>
...
But the generated .java file still shows:
out.write("<p>These have issues: â–º Å´ but these don't: ™
⇔ ♦ á ¶ ¥</p>\n");
I checked and the file mod timestamps are updated so I know these are
newly created files when I run the generated JSP. I noted that the
compiler was looking for "UTF-8" (and not allowing "UTF8" or "utf-8" or
other variants).
My only guess now is to know what's the file encoding used when the
Tomcat compiler (Jasper?) READS my JSP file. Is it possible that it is
messed up reading my UTF-8 encoded JSP file, even though it then writes
the .java file with UTF-8 also?
I am running Windows 7, Tomcat 7.0.19, latest Java 6, and running this
in Eclipse Helios Service Release 2.
Any other thoughts I can try?
Thanks,
David
Re: Tomcat 7.0.19 character encoding issue with JSP
Posted by Mark Thomas <ma...@apache.org>.
On 01/09/2011 03:41, David Wall wrote:
> I'm trying to track down a character encoding issue that I've been
> having, but don't really understand. Hopefully one of you will know what
> the answer is.
I suspect you need:
<%@ page pageEncoding="UTF-8" %>
at the start of your JSP.
.java files are written using UTF-8 by default so if what you see there
is wrong then the original .jsp file was read with the wrong encoding.
Tomcat determines the encoding based on a number of factors. For the
details see line 326 onwards of
http://svn.apache.org/viewvc/tomcat/trunk/java/org/apache/jasper/compiler/ParserController.java?view=annotate
The short version is if it is XML, it uses the encoding defined in the
XML doc, if it isn't XML it uses the value of pageEncoding. The
fall-back is ISO-8859-1.
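As a rough illustration only, the short version above can be written as a small decision function. This is a hypothetical sketch of the three rules just stated, not Jasper's actual logic; the real ParserController consults more inputs than this, and all names here are invented for the example.

```java
// Hypothetical sketch of the summary above; NOT the code in ParserController.
class PageEncodingSketch {
    static final String FALLBACK = "ISO-8859-1";

    // isXmlSyntax:         whether the JSP is an XML document
    // xmlDeclaredEncoding: encoding named in the XML document, or null
    // pageEncoding:        value of the pageEncoding attribute, or null
    static String resolve(boolean isXmlSyntax, String xmlDeclaredEncoding, String pageEncoding) {
        if (isXmlSyntax && xmlDeclaredEncoding != null) {
            return xmlDeclaredEncoding;   // XML: use the encoding the document defines
        }
        if (!isXmlSyntax && pageEncoding != null) {
            return pageEncoding;          // standard syntax: use pageEncoding
        }
        return FALLBACK;                  // nothing declared (simplified fall-back)
    }

    public static void main(String[] args) {
        System.out.println(resolve(false, null, "UTF-8")); // prints "UTF-8"
        System.out.println(resolve(false, null, null));    // prints "ISO-8859-1"
    }
}
```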
> <%@ page contentType="text/html; charset=utf-8" session="true"
> isELIgnored="true" %>
You'll still need this to tell the browser to use UTF-8 to display the data.
> So I figured it was the default character encoding of the JVM causing me
> some grief. I checked and the default on my Windows PC is Cp1252. But
> when I change this with the JVM argument -Dfile.encoding=UTF8, I am no
> better off.
That is expected. file.encoding is not always read/write and should
never be relied upon to fix any encoding problems.
HTH,
Mark