Posted to dev@tomcat.apache.org by Christian Mallwitz <c....@intershop.de> on 2000/11/13 18:23:15 UTC

tomcat 4.0 m4: bug while submitting UTF-8 data to JSP page

Hi,

I have a JSP file (see attachment) that lets you submit UTF-8 text to
the same JSP file. For this to work, the JSP file contains code that
converts the submitted text from Unicode to UTF-8.
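
The attachment is not reproduced here, so the following is only a minimal sketch of the kind of conversion described above. It assumes the container hands back the parameter with each submitted byte turned into one ISO-8859-1 character. The class name Utf8Fix and the helper latin1ToUtf8 are hypothetical, and StandardCharsets is a newer Java API than anything available to Tomcat 3.x/4.0.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Fix {
    // Hypothetical helper: assumes 'raw' holds one char per submitted byte
    // (ISO-8859-1 decoding) and reinterprets those bytes as UTF-8.
    static String latin1ToUtf8(String raw) {
        byte[] bytes = raw.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What request.getParameter("text") would return under the assumption above
        // for ?text=%E2%82%AC
        String raw = "\u00e2\u0082\u00ac";
        String fixed = latin1ToUtf8(raw);
        // Prints the code point of the single resulting character: 20ac
        System.out.println(Integer.toHexString(fixed.charAt(0)));
    }
}
```

The round trip only works if every character of the raw string is below 0x100; any other character cannot be mapped back to its original byte.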

I ran some tests submitting the Euro symbol. In Unicode this is code point
0x20ac, and in UTF-8 it is 0xE2 0x82 0xAC (3 bytes). This works in all servlet
engines I know of, including Tomcat up to 3.2 beta 6, but not in Tomcat 4.0m4.
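
The byte values quoted above can be checked with a few lines of standalone Java. This is just an illustrative sketch; the class name EuroBytes and the helper utf8Hex are made up for this example and are not part of the attached JSP.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EuroBytes {
    // Hypothetical helper: formats the UTF-8 bytes of a string as hex.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("0x%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String euro = "\u20ac"; // Euro sign, code point 0x20ac
        System.out.println(utf8Hex(euro));                    // 0xE2 0x82 0xAC
        System.out.println(URLEncoder.encode(euro, "UTF-8")); // %E2%82%AC
    }
}
```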

If you request a URL like http://host/post.jsp?text=%E2%82%AC, I expect the
following output:

text [as text]   = â'¬
text [as hex]    = 0xe2 0x82 0xac 
text [corrected] = EUR

but I get

text [as text]   = â'¬
text [as hex]    = 0xe2 0x201a 0xac 
text [corrected] = 

Note the second hex code. Interestingly, 0x201a is the Unicode code point of
a comma-like character (‚), but I'm clueless how Tomcat got there ...
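
One possible explanation, not confirmed anywhere in this post: in the windows-1252 code page, byte 0x82 maps to U+201A, whereas ISO-8859-1 maps it to U+0082. The sketch below only demonstrates that mapping in standalone Java. Whether Tomcat 4.0m4 actually decodes parameters with a platform default such as windows-1252 is an assumption, and the class name Cp1252Check and helper hex are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1252Check {
    // Hypothetical helper: prints each char of a string as a hex code.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append("0x").append(Integer.toHexString(c)).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] euro = { (byte) 0xE2, (byte) 0x82, (byte) 0xAC };
        // Decoded as ISO-8859-1: 0xe2 0x82 0xac (the expected hex line)
        System.out.println(hex(new String(euro, StandardCharsets.ISO_8859_1)));
        // Decoded as windows-1252: 0xe2 0x201a 0xac (the observed hex line)
        System.out.println(hex(new String(euro, Charset.forName("windows-1252"))));
    }
}
```

If that is what happens, U+201A has no ISO-8859-1 byte, so a conversion that goes back through ISO-8859-1 could no longer recover the original 0x82.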

Bye
Christian
PS: I have attached a JSP file with more multibyte samples ...
-- 
Christian Mallwitz INTERSHOP Communications Germany
Senior Software Engineer    phone: +49 3641 894 334