Posted to dev@tomcat.apache.org by BugRat Mail System <to...@cortexity.com> on 2000/10/31 20:32:38 UTC

BugRat Report #323 has been filed.

Bug report #323 has just been filed.

You can view the report at the following URL:

   <http://znutar.cortexity.com:8888/BugRatViewer/ShowReport/323>

REPORT #323 Details.

Project: Tomcat
Category: Bug Report
SubCategory: New Bug Report
Class: swbug
State: received
Priority: high
Severity: critical
Confidence: public
Environment: 
   Release: 3.1
   JVM Release: jdk1.2.2
   Operating System: Solaris
   OS Release: 2.6
   Platform: Unix

Synopsis: 
Tomcat 3.1 mishandles UTF-8 encoded text above the ASCII range

Description:
My servlet produces UTF-8 encoded HTML pages that 
include Unicode characters in the CJK range; 
in these cases I set the content type of the HttpServletResponse to
"text/html;charset=utf-8".  Under an early release of 
Tomcat 3.1, the UTF-8 encoded CJK characters were sent 
to the browser properly; IE5 with the proper fonts 
installed was able to display these characters just fine.  
Under the newest release of Tomcat 3.1, however, the CJK 
characters in the HTML pages are replaced with "?"s before 
they are sent to the browser! 

For another report of the same problem, see 
Stefan van den Oord's memo to the developers list on
May 4, 2000:

http://www.metronet.com/~wjm/tomcat/FromFeb11/msg01988.html
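
The report does not identify the cause, but "?" in place of CJK text is the
typical symptom of characters being pushed through a writer bound to a
charset that cannot represent them. The sketch below is only an illustration
of that effect, not Tomcat's code: the class CharsetWriterDemo and its helper
methods are hypothetical names, and it uses a modern JDK rather than
jdk1.2.2. It writes the same two CJK characters through a UTF-8 writer and
through an ISO-8859-1 writer and prints the bytes that would reach the wire.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Tomcat or the servlet API.
public class CharsetWriterDemo {

    // Push text through a Writer bound to the given charset, roughly the way
    // a response writer would, and return the raw bytes that come out.
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(wire, charset)) {
            out.write(text);
        }
        return wire.toByteArray();
    }

    // Render bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        String cjk = "\u4E2D\u6587"; // two characters from the CJK range

        // UTF-8 writer: each character becomes a 3-byte sequence.
        System.out.println(hex(encode(cjk, StandardCharsets.UTF_8)));
        // prints "E4 B8 AD E6 96 87"

        // ISO-8859-1 writer: unmappable characters become '?' (0x3F).
        System.out.println(hex(encode(cjk, StandardCharsets.ISO_8859_1)));
        // prints "3F 3F"
    }
}
```

If the newest release behaves like the second case, the writer is presumably
being created with a charset other than the one named in the content type;
that is an inference from the symptom, not something confirmed here.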


Re: BugRat Report #323 has been filed.

Posted by Rick Beaubien <rb...@library.berkeley.edu>.
Please note that the problem I am reporting does not pertain to UTF-8
encoding in URLs, but to UTF-8 encoded values above the ASCII range in the
CONTENT of the HTML pages prepared by my servlet.  The content of my pages
includes UTF-8 encoded Unicode values in the CJK range.  For example, see
the following test page, delivered by my servlet under the early, correctly
functioning version of Tomcat 3.1.  I recommend using IE to view this page;
if you have Chinese font support installed you will be able to see Chinese
characters in the left-hand frames.  Even if the Chinese characters show
up as boxes for lack of the proper font, you can tell by looking at the
page source that the 3-byte UTF-8 encodings for the Chinese characters are
being delivered to the browser:

http://sunsite.berkeley.edu/xdlib/servlet/archobj?DOCCHOICE=misc/sprintatest2.xml

Under the newest release of Tomcat 3.1, the Chinese characters in question
all get translated to question marks by Tomcat before being delivered to
the browser.
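
For anyone who wants to check a captured response body rather than eyeball
the page source, a rough byte-level test along these lines may help. It is a
sketch under stated assumptions: Utf8BodyCheck and its methods are
hypothetical names, the body is assumed to be already available as a byte
array, and the sequence check only looks at the shape of the bytes (one lead
byte in 0xE0-0xEF followed by two continuation bytes), not at full UTF-8
validity.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of Tomcat or the servlet API.
public class Utf8BodyCheck {

    // True if the byte has the 10xxxxxx form of a UTF-8 continuation byte.
    static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    // Count byte triples shaped like a 3-byte UTF-8 sequence, which is the
    // form used for most CJK characters.
    static int countThreeByteSequences(byte[] body) {
        int count = 0;
        int i = 0;
        while (i < body.length) {
            int lead = body[i] & 0xFF;
            if (lead >= 0xE0 && lead <= 0xEF && i + 2 < body.length
                    && isContinuation(body[i + 1])
                    && isContinuation(body[i + 2])) {
                count++;
                i += 3; // skip the whole sequence
            } else {
                i++;
            }
        }
        return count;
    }

    // Count literal '?' bytes (0x3F), the usual replacement for characters
    // the output charset could not represent.
    static int countQuestionMarks(byte[] body) {
        int count = 0;
        for (byte b : body) {
            if (b == 0x3F) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String page = "<p>\u4E2D\u6587</p>";
        byte[] good = page.getBytes(StandardCharsets.UTF_8);
        byte[] bad = page.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println(countThreeByteSequences(good) + " sequences, "
                + countQuestionMarks(good) + " question marks");
        System.out.println(countThreeByteSequences(bad) + " sequences, "
                + countQuestionMarks(bad) + " question marks");
    }
}
```

A body with the expected 3-byte sequences and no stray "?" bytes matches the
behavior of the early release; a body with no such sequences and one "?" per
expected character matches what the newest release delivers.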

The patches that I see cited at the address provided by Kim below all seem
to pertain to UTF-8 encoding in URLs, not in page content.  Stefan van den
Oord and I are reporting problems with the handling of UTF-8 encoding in
the <body> of the HTML pages prepared by our servlets.

Thanks,

Rick Beaubien


At 12:31 PM 11/01/2000 +0900, you wrote:
>Please check
>
>    http://www.javaclue.org/tomcat/
>
>Kim
>
>

-----------------------------------------------------
Rick Beaubien 

Software Engineer: Research and Development
Library Systems Office
Rm 386 Doe Library
University of California
Berkeley, CA 94720-6000
510-643-9776

Character encoding problems [was BugRat Report #323 has been filed.]

Posted by Nick Bauman <ni...@cortexity.com>.
3.2 and 3.3 devs: Why haven't Pilho's patches been incorporated into the
main tree?

-Nick

On Wed, 1 Nov 2000, Pilho Kim wrote:

> Please check
> 
>     http://www.javaclue.org/tomcat/
> 
> Kim
> 
> 

-- 
Nicolaus Bauman
Software Engineer
Simplexity Systems



Re: BugRat Report #323 has been filed.

Posted by Pilho Kim <ph...@math.soongsil.ac.kr>.
Please check

    http://www.javaclue.org/tomcat/

Kim

