Posted to dev@tomcat.apache.org by Rick Beaubien <rb...@library.berkeley.edu> on 2000/10/27 18:47:11 UTC

Tomcat 3.1 problems: UTF-8 encoded HTML pages mishandled; Classpath issues

We recently tried upgrading our version of Tomcat 3.1. We had actually been
running 3.1 for a while, but discovered that changes had been made to some
of the 3.1 .jar files since we first installed it (the modified date for
the class files in the servlet.jar file for the earlier issue of 3.1 is
3/08/00; the modified date for the more recent issue is 4/18/00).
Unfortunately, the newer issue has introduced two big problems for us, and
we have had to roll back to the earlier issue of 3.1.

The more serious of the two problems involves the handling of multibyte
UTF-8 encoded Unicode values embedded in HTML pages. My servlet produces
UTF-8 encoded HTML pages that include Unicode characters in the CJK range;
in these cases I call HttpServletResponse.setContentType() with
"text/html;charset=utf-8".  Under the earlier issue of Tomcat 3.1, the
UTF-8 encoded CJK characters were sent to the browser properly; IE5
with the proper fonts installed was able to display them just fine.  Under
the newest issue of Tomcat 3.1, however, the CJK characters in the HTML
pages are replaced with "?"s before they are sent to the browser!

To see how the older version of Tomcat 3.1 treats the UTF-8 encodings of
characters in the CJK range, specify the following location in IE5: 

http://sunsite.berkeley.edu/xdlib/servlet/archobj?DOCCHOICE=misc/sprintatest2.xml

With the proper fonts installed, Chinese characters will appear in the two
left-hand frames.  If you view the source, you can see that these are being
transmitted as the 3-byte UTF-8 encodings of the corresponding Unicode
values.  But under the current release of Tomcat 3.1, "????" appears in place
of the UTF-8 encodings for the Chinese characters. (This problem pertains,
I am sure, to all multibyte UTF-8 encodings, not just those in the CJK
range.  For another report of the same problem, see Stefan van den Oord's
memo of May 4, 2000 at:
http://www.metronet.com/~wjm/tomcat/FromFeb11/msg01988.html )
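
To illustrate the symptom outside of Tomcat, here is a minimal standalone
sketch. It assumes (this is only a guess, not something confirmed from the
Tomcat source) that somewhere on the output path the characters are being
encoded with a single-byte charset such as ISO-8859-1 instead of UTF-8. The
class name EncodingDemo and the sample string are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Tomcat or of my servlet.
final class EncodingDemo {

    /** Encodes the text with the given charset and decodes it again. */
    static String roundTrip(String text, Charset charset) {
        // getBytes() substitutes the charset's replacement byte ('?' for
        // ISO-8859-1) for every character the charset cannot represent.
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    /** Number of bytes the text occupies once encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Two characters from the CJK range (U+4E2D, U+6587).
        String cjk = "\u4e2d\u6587";

        // Each character takes 3 bytes in UTF-8.
        System.out.println(utf8Length(cjk)); // prints 6

        // Encoding with a single-byte charset loses the characters.
        System.out.println(roundTrip(cjk, StandardCharsets.ISO_8859_1)); // prints ??

        // Encoding with UTF-8 preserves them.
        System.out.println(roundTrip(cjk, StandardCharsets.UTF_8).equals(cjk)); // prints true
    }
}
```

If this guess is right, the "?" characters are produced when the text is
converted to bytes on the server, which would match what we see in the page
source: the browser receives literal "?" bytes rather than the 3-byte
sequences.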

The second problem introduced by the newer issue of 3.1 is easier for us to
work around. Under the earlier issue of Tomcat 3.1, I had been able to run
parallel versions of a servlet off the same running copy of the Tomcat
servlet engine: a production version and a development version. The two
servlet versions use classes with identical package and class names but
residing, of course, in different CLASSPATH locations. (The production
version is accessed via http://sunsite.berkeley.edu/xdlib/servlet/... and
the development version via
http://sunsite.berkeley.edu/xdlibdev/servlet/...) The previous release of
Tomcat 3.1 had no problem keeping these two versions of my servlet sorted
out; it activated the proper classes from the proper classpath for the
version of the servlet which the URL indicated. But under the newer issue
of Tomcat 3.1, this has changed. If a user invokes the xdlibdev servlet (my
development servlet), Tomcat will now use classes from the xdlib/servlet
classpath (the production servlet classpath) if these are already loaded!
In other words, it now seems only to pay attention to the package and class
names, not to the classpath that is associated with a servlet when loading
classes for use. 
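
The difference in behavior can be modeled with a small toy sketch. It is not
Tomcat's code and makes no claim about how Tomcat actually loads classes; it
only contrasts a lookup keyed by class name alone with a lookup kept
separately for each servlet context. All names here (ClassCacheModel, the
/prod/classes and /dev/classes paths, example.Helper) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: "loading" a class just records which classpath it came from.
final class ClassCacheModel {

    /** Hypothetical stand-in for a loaded class. */
    static final class LoadedClass {
        final String name;
        final String classpath;

        LoadedClass(String name, String classpath) {
            this.name = name;
            this.classpath = classpath;
        }
    }

    /** Models the behavior we see now: one shared cache keyed by class name. */
    static final class NameOnlyCache {
        private final Map<String, LoadedClass> cache = new HashMap<>();

        LoadedClass load(String context, String classpath, String className) {
            // The context is ignored; whichever context loads first wins.
            return cache.computeIfAbsent(className, n -> new LoadedClass(n, classpath));
        }
    }

    /** Models the behavior we had before: a separate cache per context. */
    static final class PerContextCache {
        private final Map<String, Map<String, LoadedClass>> caches = new HashMap<>();

        LoadedClass load(String context, String classpath, String className) {
            return caches.computeIfAbsent(context, c -> new HashMap<>())
                         .computeIfAbsent(className, n -> new LoadedClass(n, classpath));
        }
    }

    public static void main(String[] args) {
        NameOnlyCache shared = new NameOnlyCache();
        shared.load("/xdlib", "/prod/classes", "example.Helper");
        // The development context gets the production class.
        System.out.println(
            shared.load("/xdlibdev", "/dev/classes", "example.Helper").classpath);
        // prints /prod/classes

        PerContextCache isolated = new PerContextCache();
        isolated.load("/xdlib", "/prod/classes", "example.Helper");
        // The development context gets its own class.
        System.out.println(
            isolated.load("/xdlibdev", "/dev/classes", "example.Helper").classpath);
        // prints /dev/classes
    }
}
```

The first case matches what we observe: once the production servlet has
loaded a class, the development servlet receives that same class.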

We are running Tomcat under Solaris 2.6 and jdk1.2.2.

I have reported the second problem above as Bug report 170.  However, all of
my attempts to report the first problem as a bug have timed out; it no
longer seems to be possible to submit a bug report! 

Thanks in advance for any insights anyone might have into either of these
matters. 

Rick Beaubien

-----------------------------------------------------
Rick Beaubien 

Software Engineer: Research and Development
Library Systems Office
Rm 386 Doe Library
University of California
Berkeley, CA 94720-6000
510-643-9776