Posted to taglibs-dev@jakarta.apache.org by Tod Matola <ma...@oclc.org> on 2000/12/07 14:04:26 UTC
Re: Unicode decoding problem
Hello,
Michel Jacobson wrote:
> Hi,
>
> I have run into a problem using Tomcat (on Win95 with Apache) with Unicode chars in
> URLs. Tomcat gives me the error
> "Decode error" from the unUrlDecode(String) function in the package
> org.apache.tomcat.util.
>
> Looking at this function, I see that the algorithm only handles chars in %xx
> format, but for some Unicode chars the value can be in %x%xx format or in
> %x%x (for example, the char 014b sent by the web client is %1%4b, the char
> 0303 is %3%3, etc.)
>
> Since I need this function to work properly, I changed it myself, and here is
> my new version.
>
> Michel Jacobson CNRS.LACITO
> jacobson@idf.ext.jussieu.fr
> ------------------------------------------------------
>
> public static String unUrlDecode(String data) {
>     StringBuffer buf = new StringBuffer();
>     for (int i = 0; i < data.length(); i++) {
>         char c = data.charAt(i);
>         switch (c) {
>         case '+':
>             buf.append(' ');
>             break;
>         case '%':
>             try {
>                 if(data.charAt(i+2) == '%') {
>                     String hexachars ="0123456789abcdefABCDEF";
>                     if(hexachars.indexOf(data.charAt(i+4)) == -1) {
>                         buf.append((char)
>                             Integer.parseInt(data.substring(i+1,i+2).concat("0").concat(data.substring(i+3,i+4)), 16));
>                         i += 3;
>                     } else {
>                         buf.append((char)
>                             Integer.parseInt(data.substring(i+1,i+2).concat(data.substring(i+3,i+5)), 16));
>                         i += 4;
>                     }
>                 } else {
>                     buf.append((char) Integer.parseInt(data.substring(i+1, i+3), 16));
>                     i += 2;
>                 }
>             } catch (NumberFormatException e) {
>                 String msg = "Decode error ";
>                 throw new IllegalArgumentException(msg);
>             } catch (StringIndexOutOfBoundsException e) {
>                 String rest = data.substring(i);
>                 buf.append(rest);
>                 if (rest.length()==2)
>                     i++;
>             }
>
>             break;
>         default:
>             buf.append(c);
>             break;
>         }
>     }
>     return buf.toString();
> }
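To make the quoted format concrete, here is a small sketch of how such escapes could arise. It assumes the client writes the high byte and the low byte of the character separately, each in hex without leading zeros. That is only my reading of the two examples in the message, not something confirmed about any particular browser, and the class and method names are made up for illustration. It only covers chars above 0xFF.

```java
public class SplitByteEscapeDemo {

    // Hypothetical helper: writes a char as %<high byte>%<low byte>,
    // each byte in hex with no leading zero (an assumption based on
    // the two examples quoted above).
    static String escape(char c) {
        int high = (c >> 8) & 0xFF;
        int low = c & 0xFF;
        return "%" + Integer.toHexString(high) + "%" + Integer.toHexString(low);
    }

    public static void main(String[] args) {
        System.out.println(escape((char) 0x014b)); // prints %1%4b
        System.out.println(escape((char) 0x0303)); // prints %3%3
    }
}
```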
I know this is being picky, but I noticed a performance issue with your algorithm.
If I understand it right, it will be used quite heavily when decoding data, so it should be tuned.
The string hexachars is declared inside the loop, so the declaration is repeated on every pass
through that block even though the value of the string never changes, and there is no need for that.
I would adjust it to the following (no functionality changes; hexachars and c are simply declared
once, outside the loop).
public static String unUrlDecode(String data) {
    StringBuffer buf = new StringBuffer();
    String hexachars ="0123456789abcdefABCDEF";
    char c;
    for (int i = 0; i < data.length(); i++) {
        c = data.charAt(i);
        switch (c) {
        case '+':
            buf.append(' ');
            break;
        case '%':
            try {
                if(data.charAt(i+2) == '%') {
                    if(hexachars.indexOf(data.charAt(i+4)) == -1) {
                        buf.append((char)
                            Integer.parseInt(data.substring(i+1,i+2).concat("0").concat(data.substring(i+3,i+4)), 16));
                        i += 3;
                    } else {
                        buf.append((char)
                            Integer.parseInt(data.substring(i+1,i+2).concat(data.substring(i+3,i+5)), 16));
                        i += 4;
                    }
                } else {
                    buf.append((char) Integer.parseInt(data.substring(i+1, i+3), 16));
                    i += 2;
                }
            } catch (NumberFormatException e) {
                String msg = "Decode error ";
                throw new IllegalArgumentException(msg);
            } catch (StringIndexOutOfBoundsException e) {
                String rest = data.substring(i);
                buf.append(rest);
                if (rest.length()==2)
                    i++;
            }
            break;
        default:
            buf.append(c);
            break;
        }
    }
    return buf.toString();
}
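
Taking the same idea one step further, the digit string could live in a static final field so it is named once for the whole class. The sketch below is a self-contained version for trying that out. The class name UrlDecodeSketch and the field name HEX_DIGITS are made up for illustration and are not Tomcat's; the decoding logic is meant to be the same as in the method above. As far as I am aware, a string literal in Java refers to one shared String from the constant pool, so the gain is probably more about clarity than speed.

```java
public class UrlDecodeSketch {

    // Hypothetical constant: the hex digit set, defined once for the class.
    private static final String HEX_DIGITS = "0123456789abcdefABCDEF";

    public static String unUrlDecode(String data) {
        StringBuffer buf = new StringBuffer();
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            switch (c) {
            case '+':
                buf.append(' ');
                break;
            case '%':
                try {
                    if (data.charAt(i + 2) == '%') {
                        if (HEX_DIGITS.indexOf(data.charAt(i + 4)) == -1) {
                            // %x%x form: one hex digit per byte, so pad the low byte with 0
                            buf.append((char) Integer.parseInt(
                                data.substring(i + 1, i + 2) + "0" + data.substring(i + 3, i + 4), 16));
                            i += 3;
                        } else {
                            // %x%xx form: one digit for the high byte, two for the low byte
                            buf.append((char) Integer.parseInt(
                                data.substring(i + 1, i + 2) + data.substring(i + 3, i + 5), 16));
                            i += 4;
                        }
                    } else {
                        // plain %xx form
                        buf.append((char) Integer.parseInt(data.substring(i + 1, i + 3), 16));
                        i += 2;
                    }
                } catch (NumberFormatException e) {
                    throw new IllegalArgumentException("Decode error ");
                } catch (StringIndexOutOfBoundsException e) {
                    // truncated escape near the end of the input: copy the rest as-is
                    String rest = data.substring(i);
                    buf.append(rest);
                    if (rest.length() == 2)
                        i++;
                }
                break;
            default:
                buf.append(c);
                break;
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // Print code points in hex to avoid console encoding surprises.
        System.out.println(Integer.toHexString(unUrlDecode("%1%4b").charAt(0)));  // prints 14b
        System.out.println(Integer.toHexString(unUrlDecode("%3%3z").charAt(0)));  // prints 303
        System.out.println(unUrlDecode("a+b%41"));                                // prints a bA
    }
}
```

One thing I noticed while tracing it: the short %x%x form is only recognized when at least one more character follows it, because the check at i+4 otherwise runs past the end of the string and lands in the catch block.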
Cheers Tod....