Posted to users@tomcat.apache.org by Asher Tarnopolski <as...@huji.013.net.il> on 2004/07/02 23:20:24 UTC

UTF-8 with Tomcat 5

hey folks,

To show you what it's all about, I wrote a small app that shows the
HTML character codes of an entered string. This is the JSP code:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<form action="/tests/utf.jsp" method="post">
  <input type="text" name="source">
  <input type="submit">
</form>
<p>
<%
if (request.getParameter("source") != null) {
    // Note: getParameter() has already been called in the condition above,
    // so per the Servlet spec this setCharacterEncoding() call has no effect.
    request.setCharacterEncoding("UTF-8");
    String source = request.getParameter("source");
    out.println(source.length() + "<p>");   // length of the parameter
    out.println(source);                    // the parameter as received
    // Escape '&' so that character codes like &#1488; are shown as text.
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < source.length(); i++) {
        if (source.charAt(i) == '&')
            sb.append("&amp;");
        else
            sb.append(source.charAt(i));
    }
    out.println("<p>" + sb.toString());
}
%>
</body>
</html>

Well, as you see, this code gets a UTF-8 encoded parameter from the
request and outputs its length, the parameter itself, and the parameter
again with every '&' escaped, so that any HTML character codes in it
(like &#1488;) show up as plain text.
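The escaping step of the JSP can be tried outside the container. This is a minimal sketch: the class and method names (`AmpEscape`, `escapeAmpersands`) are made up for illustration, and the loop is the same one the page runs. It turns every `&` into `&amp;`, so a reference like `&#1488;` is displayed as text instead of being rendered by the browser as alef.

```java
public class AmpEscape {

    // Same loop as in the JSP: replace each '&' with "&amp;" so the browser
    // shows character references such as &#1488; literally.
    public static String escapeAmpersands(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '&') {
                sb.append("&amp;");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints &amp;#1488;
        System.out.println(escapeAmpersands("&#1488;"));
    }
}
```

Only `&` needs escaping for this test; a general-purpose HTML escaper would also handle `<`, `>` and quotes.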
To test it, I send the Hebrew letter ALEF. On Tomcat 4.x everything
works perfectly and I get the following response:

7
א
&#1488;

(In case you don't see it here: it's 7, then alef rendered by the
browser from its character code, then alef's character code itself,
escaped so that it is visible in the browser.)
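The 7 matches the length of the seven-character string `&#1488;`, which suggests (an inference, not something stated above) that under Tomcat 4 the browser submitted alef as a decimal numeric character reference rather than as raw bytes. 1488 is alef's Unicode code point, U+05D0, in decimal; it is not its UTF-8 encoding. The sketch below (class and method names hypothetical) checks both numbers.

```java
public class AlefReference {

    // Builds an HTML decimal numeric character reference, e.g. "&#1488;",
    // from the Unicode code point of the first character of s.
    public static String numericReference(String s) {
        return "&#" + s.codePointAt(0) + ";";
    }

    public static void main(String[] args) {
        String alef = "\u05D0";                  // HEBREW LETTER ALEF
        String ref = numericReference(alef);
        System.out.println(ref);                 // &#1488;
        System.out.println(ref.length());        // 7
    }
}
```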

Cool. Then I run the same code on Tomcat 5.0.16 and KABOOM. This is
what I get:

2
א
א

(In case you don't see it here: it's 2, and then alef twice, as it would
look if it were passed in windows-1255 or ISO... Where the hell has UTF-8 gone?)
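The Tomcat 5 output is consistent with a different path, sketched here under assumptions the post does not confirm: the browser sent alef as its two UTF-8 bytes 0xD7 0x90, the server decoded them as ISO-8859-1 (the Servlet default when the request names no charset), and the page was written back as ISO-8859-1 and read by the browser as UTF-8. ISO-8859-1 maps each byte to exactly one char, which gives the length 2, and encoding those chars back reproduces the original bytes, so alef reappears. Since the string contains no `&`, the escaped copy is identical, hence alef twice. Class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {

    // Assumed server side: the UTF-8 bytes of the submitted text are
    // decoded as ISO-8859-1, which maps every byte to exactly one char.
    public static String misread(String submitted) {
        byte[] sent = submitted.getBytes(StandardCharsets.UTF_8);
        return new String(sent, StandardCharsets.ISO_8859_1);
    }

    // Assumed response side: the misread chars are written out as
    // ISO-8859-1 and the browser decodes the resulting bytes as UTF-8.
    public static String echoedAsUtf8(String misread) {
        byte[] written = misread.getBytes(StandardCharsets.ISO_8859_1);
        return new String(written, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String alef = "\u05D0";                                   // HEBREW LETTER ALEF
        String received = misread(alef);
        System.out.println(received.length());                    // 2
        System.out.println(Integer.toHexString(received.charAt(0))); // d7
        System.out.println(Integer.toHexString(received.charAt(1))); // 90
        System.out.println(echoedAsUtf8(received).equals(alef));  // true
    }
}
```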

All this makes me think that Tomcat 5 has some bug affecting its
UTF-8 support. How come the parameter length of a single character is 2?!
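On the length question: a POSTed form body is URL-encoded, so if the browser encodes the form as UTF-8 (an assumption here), alef travels as `%D7%90`, and the resulting length depends only on which charset is used to decode those bytes. The sketch below uses `java.net.URLDecoder` as a rough stand-in for the container's parameter parsing; it is not Tomcat's actual code, and the class and method names are hypothetical. Related fact from the Servlet API: `ServletRequest.setCharacterEncoding` only has an effect if it is called before request parameters are first read, and in the JSP above it runs after `getParameter("source")` has already been called in the `if` condition.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodeDemo {

    // Decodes one URL-encoded form value using the named charset, standing in
    // (roughly) for the container's parsing of a POSTed form body.
    public static String decodeValue(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Assumed wire form of the alef value when the browser encodes as UTF-8.
        String value = "%D7%90";
        System.out.println(decodeValue(value, "UTF-8").length());      // 1
        System.out.println(decodeValue(value, "ISO-8859-1").length()); // 2
    }
}
```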

Thanks in advance.