Posted to users@tomcat.apache.org by Asher Tarnopolski <as...@huji.013.net.il> on 2004/07/02 23:20:24 UTC
utf-8 with tomcat 5
Hey folks,
to show you what it's all about, I wrote a small app that shows the
HTML character codes of an entered string. This is the JSP code:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<form action="/tests/utf.jsp" method=post>
  <input type=text name=source>
  <input type=submit>
</form>
<p>
<%
if (request.getParameter("source") != null) {
    request.setCharacterEncoding("UTF-8");
    // length of the submitted parameter
    out.println(request.getParameter("source").length() + "<p>");
    // the parameter itself
    out.println(request.getParameter("source"));
    // the parameter with '&' escaped, so character codes show up as text
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < request.getParameter("source").length(); i++) {
        if (request.getParameter("source").charAt(i) == '&')
            sb.append("&amp;");
        else
            sb.append(request.getParameter("source").charAt(i));
    }
    out.println("<p>" + sb.toString());
}
%>
</body>
</html>
Well, as you can see, this code reads a UTF-8 encoded parameter from
the request and outputs its length, the parameter itself, and its HTML
character codes.
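The escaping step can be tried outside a servlet container. Below is a minimal Java sketch, assuming the loop's intent is to turn every '&' into "&amp;" so that a character code such as "&#1488;" is displayed as text instead of being rendered as a letter. The class and method names are hypothetical.

```java
public class EscapeAmpersands {

    // Restates the JSP's escaping loop as a plain method (assumed intent):
    // every '&' becomes "&amp;", all other characters are copied unchanged.
    static String escapeAmpersands(String source) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '&') {
                sb.append("&amp;");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAmpersands("&#1488;")); // prints &amp;#1488;
    }
}
```

A browser shows "&amp;#1488;" as the literal text "&#1488;", which is why the third output line can reveal the code behind the letter.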
To test it, I send the Hebrew letter ALEF. On Tomcat 4.xx everything
works perfectly and I get the following response:
7
א
&#1488;
(In case it doesn't display correctly here: that's 7, alef's HTML character
code, and the same code escaped so that it is visible in the browser.)
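One reading of the 7, assuming the browser submitted alef as a decimal numeric character reference rather than as raw bytes: alef is U+05D0, which is 1488 in decimal, and the reference "&#1488;" is exactly seven characters long. A small Java sketch (hypothetical names) that checks this arithmetic:

```java
public class AlefReference {

    // Builds a decimal numeric character reference ("&#NNNN;") for a single
    // BMP character. Browsers may submit this form for characters the page's
    // charset cannot represent; that this happened here is an assumption.
    static String numericReference(char c) {
        return "&#" + (int) c + ";";
    }

    public static void main(String[] args) {
        char alef = '\u05D0'; // HEBREW LETTER ALEF, code point U+05D0 = 1488 decimal
        String ref = numericReference(alef);
        System.out.println(ref);          // &#1488;
        System.out.println(ref.length()); // 7
    }
}
```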
Cool. Then I run the same code on Tomcat 5.0.16 and KABOOM. This is
what I get:
2
א
א
(In case it doesn't display correctly here: that's 2, and alef twice, as if
it had been passed in windows-1255 or ISO... Where the hell did UTF-8 go?)
All this makes me think that Tomcat 5 has some bug affecting its UTF-8
support. How come the parameter length of a single character is 2?!
Thanks in advance.
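The arithmetic behind a length of 2 can be sketched in Java, assuming the container decoded the submitted UTF-8 bytes as ISO-8859-1 (the usual Servlet default when the request declares no charset) instead of as UTF-8. The class and method names are hypothetical; only standard java.lang and java.nio.charset calls are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {

    // Encodes text with one charset, decodes the resulting bytes with
    // another, and returns the length of the decoded string.
    static int roundTripLength(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith).length();
    }

    public static void main(String[] args) {
        String alef = "\u05D0"; // HEBREW LETTER ALEF

        // UTF-8 encodes alef as two bytes, 0xD7 0x90.
        System.out.println(alef.getBytes(StandardCharsets.UTF_8).length); // 2

        // Decoded as UTF-8, the two bytes become one character again.
        System.out.println(roundTripLength(alef, StandardCharsets.UTF_8, StandardCharsets.UTF_8)); // 1

        // ISO-8859-1 maps each byte to its own character, so the same
        // two bytes decode to two characters.
        System.out.println(roundTripLength(alef, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)); // 2
    }
}
```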