Posted to dev@struts.apache.org by bu...@apache.org on 2004/01/24 22:18:41 UTC
DO NOT REPLY [Bug 26403] New: -
double UTF-8 encoding of HTTP request parameters
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=26403>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=26403
double UTF-8 encoding of HTTP request parameters
Summary: double UTF-8 encoding of HTTP request parameters
Product: Struts
Version: Nightly Build
Platform: Other
OS/Version: Other
Status: NEW
Severity: Normal
Priority: Other
Component: Digester
AssignedTo: struts-dev@jakarta.apache.org
ReportedBy: darkeye@tyrell.hu
I'm having a problem with properly processing UTF-8 encoded request parameters
through Struts. The effect is that international characters (non-ASCII ones,
which are therefore multi-byte in UTF-8) are encoded into UTF-8 twice.
As an example, let's see the examples webapp included in the jakarta-struts
source tree. It has the registration sample, reachable through
http://localhost:8080/struts-examples/validator/registration.do
if installed on localhost:8080. let's suppose I wish to type:
small letter a with acute: á
unicode value hex: 00e1
unicode value binary: 11100001
UTF-8 binary: 11000011 10100001
UTF-8 in hex: c3a1
into the firstName field of the form. this can be simulated by:
http://localhost:8080/struts-examples/validator/registration-submit.do?firstName=%C3%A1
(typing the letter manually and submitting via POST has the same effect)
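As a cross-check of the byte values listed above, here is a minimal Java sketch. The class and method names are made up for illustration and are not part of Struts or of the examples webapp. It computes the UTF-8 bytes of the letter and the percent-encoded form used in the query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class Utf8EncodingSketch {

    /** Returns the UTF-8 bytes of s as lowercase hex, e.g. "c3a1". */
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Percent-encodes s the way a browser would for a UTF-8 form. */
    static String percentEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not occur.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String aAcute = "\u00e1"; // small letter a with acute
        System.out.println(utf8Hex(aAcute));       // c3a1
        System.out.println(percentEncode(aAcute)); // %C3%A1
    }
}
```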
the resulting page shows a lot of form validation errors, as I didn't fill out
most of the fields, which is OK. but more importantly, it also shows the entered
letter in the firstName input field. what is weird is that a different letter is
shown (actually two letters). running xxd on the received page, here's the
relevant part:
00003a0: 6e67 7468 3d22 3330 2220 7369 7a65 3d22  ngth="30" size="
00003b0: 3330 2220 7661 6c75 653d 22c3 83c2 a122  30" value="...."
00003c0: 3e0a 2020 2020 3c2f 7464 3e0a 2020 3c2f  >.    </td>.  </
with the important part at value="....", which is:
00003b0: 3330 2220 7661 6c75 653d 22c3 83c2 a122  30" value="...."
                                    ^^^^^^^^^^
the letters presented are:
UTF-8 hex sequence: c383c2a1
UTF-8 binary: 11000011 10000011 11000010 10100001
which is actually two UTF-8 letters by now. what is funny is that if I 'decode'
each of them from UTF-8, I get the bytes of the original UTF-8 sequence:
first part, as received: 11000011 10000011
de-coded: 11000011
second part, as received: 11000010 10100001
de-coded: 10100001
and voila, the parts make up the original UTF-8 sequence:
11000011 10100001
which actually is the UTF-8 sequence for the letter sent.
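The byte pattern above can be reproduced outside the container. The following Java sketch rests on an assumption that the report does not confirm: that somewhere in the request handling the UTF-8 bytes are decoded as ISO-8859-1 (the servlet default when no request encoding is set) and the resulting characters are then written out as UTF-8. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class DoubleEncodingSketch {

    /** Formats bytes as lowercase hex without separators. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /**
     * Assumed failure mode: the UTF-8 bytes are misread as ISO-8859-1,
     * then the resulting characters are encoded to UTF-8 again.
     */
    static byte[] encodeAgain(byte[] utf8) {
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    /** Undoes one round of encodeAgain (the manual 'decode' step). */
    static byte[] decodeOnce(byte[] doubled) {
        String chars = new String(doubled, StandardCharsets.UTF_8);
        return chars.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] sent = "\u00e1".getBytes(StandardCharsets.UTF_8);
        byte[] received = encodeAgain(sent);
        System.out.println(hex(sent));                 // c3a1
        System.out.println(hex(received));             // c383c2a1
        System.out.println(hex(decodeOnce(received))); // c3a1
    }
}
```

Under that assumption the output matches the xxd dump: c3a1 goes in and c383c2a1 comes out, and decoding once gives back c3a1.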
if I resend this page (with what are by now two UTF-8 letters), I get four
letters, then eight, etc. it seems that the engine doesn't recognize that the
input consists of UTF-8 sequences to begin with, and encodes them 'again'.
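The doubling on every resubmission can be simulated as well. This Java sketch again assumes, without confirmation from the report, that each round trip misreads the browser's UTF-8 bytes as ISO-8859-1. The names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class ResubmitGrowthSketch {

    /** One assumed round trip: browser sends UTF-8, server reads ISO-8859-1. */
    static String misreadOnce(String value) {
        byte[] sentByBrowser = value.getBytes(StandardCharsets.UTF_8);
        return new String(sentByBrowser, StandardCharsets.ISO_8859_1);
    }

    /** Number of letters in the field initially and after each resubmission. */
    static int[] letterCounts(String start, int rounds) {
        int[] counts = new int[rounds + 1];
        String value = start;
        counts[0] = value.length();
        for (int i = 1; i <= rounds; i++) {
            value = misreadOnce(value);
            counts[i] = value.length();
        }
        return counts;
    }

    public static void main(String[] args) {
        // Starting from a single a-acute, three submissions give 1, 2, 4, 8 letters.
        for (int n : letterCounts("\u00e1", 3)) {
            System.out.println(n);
        }
    }
}
```

Every misread character lands in the range U+0080 to U+00FF, which needs two bytes in UTF-8, so the count doubles on each pass.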
I'm using Mozilla as a browser and Tomcat 5.0.16. the encoding of the pages is UTF-8.
---------------------------------------------------------------------
To unsubscribe, e-mail: struts-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: struts-dev-help@jakarta.apache.org