Posted to users@tomcat.apache.org by Tad Woods <ta...@gmail.com> on 2007/12/31 15:40:05 UTC

Tomcat upgrade introduces character set problems.

Upgrading from Tomcat 5.5.14 to 5.5.23 has introduced character set encoding
problems. Note the JVM also changed from 1.5.0_06 to 1.5.0_11 in the
upgrade. In the earlier release I was able to post, persist, and reload
special characters just fine. Now with 5.5.23 some special characters are
being converted to "?" or other non-displayable characters.

 

After research and testing, I was able to solve the problem with regular
form posts by doing two things: (1) ensuring that all of my pages specify
content type = "text/html; charset=UTF-8" and (2) setting up a servlet Filter
for all url-patterns that calls request.setCharacterEncoding("UTF-8"). 
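A minimal sketch of why the request encoding matters for regular form posts.
It does not use the servlet API; it only imitates the percent-decoding step
with java.net.URLDecoder. The class name FormDecodingSketch and the sample
value are made up for illustration. The assumption is that a page served as
UTF-8 submits its form data as UTF-8 bytes, and that a container falls back
to ISO-8859-1 when no request encoding has been set.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodingSketch {
    // Decodes a percent-encoded form value using the named charset.
    // Hypothetical helper; a servlet container does the equivalent internally.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, as a UTF-8 page would submit it:
        // the accented letter travels as the two bytes C3 A9.
        String posted = "caf%C3%A9";

        // Decoding as UTF-8 recovers the single character U+00E9.
        System.out.println(decode(posted, "UTF-8").length());      // 4 characters

        // Decoding as ISO-8859-1 turns each byte into its own character.
        System.out.println(decode(posted, "ISO-8859-1").length()); // 5 characters
    }
}
```

Calling request.setCharacterEncoding("UTF-8") before any parameter is read
is what selects the first of these two behaviours for the request body.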

 

The outstanding encoding problem is with multipart/form-data posts. For
example: I upload a text file, process it with ServletFileUpload, save it to
disk, then read that file back from disk and special characters get
converted to "?". I have tried to specify different character sets at
various places in that process flow with no success.

 

This is where I am stuck with testing: The Linux host's JVM default
character set is US-ASCII. I have tried the content type of the HTML
multipart form as UTF-8, ISO-8859-1, and US-ASCII. In the servlet Filter for
the multipart post I have tried setCharacterEncoding() to the various
character sets. If I call DiskFileItem.getCharSet() on the uploaded file it
returns null, and the default character set for DiskFileItem is ISO-8859-1.
If I download the uploaded file via FTP (not via HTTP through Tomcat) back
to my Windows client the content looks fine (i.e. the special characters are
there). However when I read the file inside my servlet and re-display the
content via an HTTP response, the special characters turn to "?" or other
non-displayable characters. In the servlet I have tried reading the file
several ways, including FileReader and a FileInputStream wrapped by an
InputStreamReader specifying the various character sets.
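To illustrate how a "?" can appear on read-back, here is a small sketch. It
assumes the uploaded file contains a single-byte encoded accented character
(byte E9); the class name CharsetReadSketch and the sample bytes are invented
for the example, and an in-memory stream stands in for the file on disk.
FileReader always decodes with the JVM default character set, so on a host
whose default is US-ASCII it behaves like the first call below.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class CharsetReadSketch {
    // Reads every character from the bytes using the named charset.
    static String read(byte[] data, String charsetName) {
        try {
            Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charsetName);
            StringBuilder out = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                out.append((char) c);
            }
            return out.toString();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encodes text the way a response writer with the named charset would.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] fileBytes = { 'c', 'a', 'f', (byte) 0xE9 };

        // US-ASCII cannot decode byte E9, so it becomes U+FFFD.
        String asAscii = read(fileBytes, "US-ASCII");
        System.out.println((int) asAscii.charAt(3));   // 65533 (U+FFFD)

        // ISO-8859-1 maps byte E9 to U+00E9.
        String asLatin1 = read(fileBytes, "ISO-8859-1");
        System.out.println((int) asLatin1.charAt(3));  // 233 (U+00E9)

        // Writing U+FFFD out in a charset that lacks it yields '?'.
        byte[] written = encode(asAscii, "ISO-8859-1");
        System.out.println((char) written[3]);         // ?
    }
}
```

The bytes on disk are never altered in this sketch, which is consistent with
the file looking fine when fetched over FTP.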

 

To make this even more interesting (or frustrating), if I run the same tests
solely in my Windows client, the multipart post works fine (as did the
earlier Tomcat on the host)! The Windows client's default character set is
WINDOWS-1252 (apparently a superset of ISO-8859-1). Note that the host's
default character set remained US-ASCII for both versions of Tomcat, so I
don't know whether that is a factor or not.
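For comparing the two machines, a short sketch that reports what the JVM
considers its default character set. The class name DefaultCharsetReport is
made up; Charset.defaultCharset() and the file.encoding system property are
standard. Classes that take no explicit charset, such as FileReader, use this
default.

```java
import java.nio.charset.Charset;

public class DefaultCharsetReport {
    // Builds a one-line summary of the JVM's default character set.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run this on both the Linux host and the Windows client to compare.
        System.out.println(describe());
    }
}
```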

 

Tad


RE: Tomcat upgrade introduces character set problems.

Posted by Tad Woods <ta...@gmail.com>.
I solved the problem by specifying WINDOWS-1252 as the Charset where I read
the imported file on the Linux host (excerpt follows). I still don't
understand why this wasn't necessary on the older Tomcat/JVM, unless the
decoding of ISO-8859-1 just became stricter. Apparently WINDOWS-1252 is a
superset of ISO-8859-1.

File importFile;
...
Reader importFileReader = new InputStreamReader(
        new FileInputStream(importFile), "WINDOWS-1252");
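A sketch of where the two character sets differ, which may be why the choice
matters for a file created on Windows. WINDOWS-1252 and ISO-8859-1 agree on
bytes A0 through FF, but WINDOWS-1252 assigns printable characters (curly
quotes, the euro sign, and others) to most of bytes 80 through 9F, where
ISO-8859-1 has control characters. The class name Cp1252VersusLatin1 and the
sample bytes are invented for the example; whether the imported file actually
contained bytes in that range is an assumption.

```java
import java.nio.charset.Charset;

public class Cp1252VersusLatin1 {
    // Decodes the bytes using the named charset.
    static String decode(byte[] data, String charsetName) {
        return new String(data, Charset.forName(charsetName));
    }

    // Renders non-ASCII and control characters as <U+XXXX> so the
    // output does not depend on the console's own encoding.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("<U+%04X>", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Bytes 93 and 94 are curly double quotes in WINDOWS-1252.
        byte[] sample = { (byte) 0x93, 'h', 'i', (byte) 0x94 };

        // Prints <U+201C>hi<U+201D>: printable quotation marks.
        System.out.println(escape(decode(sample, "windows-1252")));

        // Prints <U+0093>hi<U+0094>: non-displayable control characters.
        System.out.println(escape(decode(sample, "ISO-8859-1")));
    }
}
```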






