Posted to taglibs-dev@jakarta.apache.org by Tod Matola <ma...@oclc.org> on 2000/12/07 14:04:26 UTC
Re: Unicode decoding problem

Hello,

Michel Jacobson wrote:

> Hi,
>
> I have run into trouble using Tomcat (on Win95 with Apache) with Unicode
> chars in URLs. Tomcat sends me the error
> "Decode error" from the unUrlDecode(String) function in the package
> org.apache.tomcat.util.
>
> Looking at this function, I see that the algorithm only handles chars in %xx
> format, but for some Unicode chars the value can be in %x%xx format or in
> %x%x (for example, the char 014b sent by the web client is %1%4b, the char
> 0303 is %3%3, etc.)
>
> Since I need this function to work properly, I changed it myself, and here is
> my new version.
>
> Michel Jacobson CNRS.LACITO
> jacobson@idf.ext.jussieu.fr
> ------------------------------------------------------
>
>     public static String unUrlDecode(String data) {
>         StringBuffer buf = new StringBuffer();
>         for (int i = 0; i < data.length(); i++) {
>             char c = data.charAt(i);
>             switch (c) {
>             case '+':
>                 buf.append(' ');
>                 break;
>             case '%':
>                 try {
>                     if (data.charAt(i+2) == '%') {
>                         String hexachars = "0123456789abcdefABCDEF";
>                         if (hexachars.indexOf(data.charAt(i+4)) == -1) {
>                             buf.append((char) Integer.parseInt(
>                                 data.substring(i+1,i+2).concat("0").concat(data.substring(i+3,i+4)), 16));
>                             i += 3;
>                         } else {
>                             buf.append((char) Integer.parseInt(
>                                 data.substring(i+1,i+2).concat(data.substring(i+3,i+5)), 16));
>                             i += 4;
>                         }
>                     } else {
>                         buf.append((char) Integer.parseInt(data.substring(i+1, i+3), 16));
>                         i += 2;
>                     }
>                 } catch (NumberFormatException e) {
>                     String msg = "Decode error ";
>                     throw new IllegalArgumentException(msg);
>                 } catch (StringIndexOutOfBoundsException e) {
>                     String rest  = data.substring(i);
>                     buf.append(rest);
>                     if (rest.length()==2)
>                         i++;
>                 }
>
>                 break;
>             default:
>                 buf.append(c);
>                 break;
>             }
>         }
>         return buf.toString();
>     }

I know this is being picky, but I noticed some performance issues with your algorithm.
If I understand it right, it will be used quite heavily when decoding data, so it should be tuned.

As far as I know, if you define an object within a loop (String hexachars), that definition is
evaluated again on every iteration of that block of code, even though the value of the string
never changes, so there is no need for it. I would adjust it to the following (no functional
changes, but it avoids the repeated work inside the loop).


    public static String unUrlDecode(String data) {
        StringBuffer buf = new StringBuffer();
        String hexachars ="0123456789abcdefABCDEF";
        char c;

        for (int i = 0; i < data.length(); i++) {
            c = data.charAt(i);
            switch (c) {
            case '+':
                buf.append(' ');
                break;
            case '%':
                try {
                    if (data.charAt(i+2) == '%') {
                        if (hexachars.indexOf(data.charAt(i+4)) == -1) {
                            buf.append((char) Integer.parseInt(
                                data.substring(i+1,i+2).concat("0").concat(data.substring(i+3,i+4)), 16));
                            i += 3;
                        } else {
                            buf.append((char) Integer.parseInt(
                                data.substring(i+1,i+2).concat(data.substring(i+3,i+5)), 16));
                            i += 4;
                        }
                    } else {
                        buf.append((char) Integer.parseInt(data.substring(i+1, i+3), 16));
                        i += 2;
                    }
                } catch (NumberFormatException e) {
                    String msg = "Decode error ";
                    throw new IllegalArgumentException(msg);
                } catch (StringIndexOutOfBoundsException e) {
                    String rest  = data.substring(i);
                    buf.append(rest);
                    if (rest.length()==2)
                        i++;
                }

                break;
            default:
                buf.append(c);
                break;
            }
        }
        return buf.toString();
    }
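
If you want to try this outside Tomcat, a small standalone sketch along these lines might help. It
keeps the same decoding logic but moves the hex string into a class-level constant, which is one
more way to keep it out of the loop. The class name UnUrlDecodeSketch, the constant name HEX_CHARS
and the sample inputs are my own assumptions for illustration; none of them come from the Tomcat
sources.

```java
// Hypothetical standalone harness; not part of org.apache.tomcat.util.
class UnUrlDecodeSketch {
    // Defined once for the whole class instead of inside the loop (name is an assumption).
    private static final String HEX_CHARS = "0123456789abcdefABCDEF";

    static String unUrlDecode(String data) {
        StringBuffer buf = new StringBuffer();
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            switch (c) {
            case '+':
                buf.append(' ');
                break;
            case '%':
                try {
                    if (data.charAt(i + 2) == '%') {
                        String high = data.substring(i + 1, i + 2);
                        if (HEX_CHARS.indexOf(data.charAt(i + 4)) == -1) {
                            // %x%x form: pad the middle digit, e.g. %3%3 -> 0x303
                            buf.append((char) Integer.parseInt(
                                high + "0" + data.substring(i + 3, i + 4), 16));
                            i += 3;
                        } else {
                            // %x%xx form, e.g. %1%4b -> 0x14b
                            buf.append((char) Integer.parseInt(
                                high + data.substring(i + 3, i + 5), 16));
                            i += 4;
                        }
                    } else {
                        // plain %xx form
                        buf.append((char) Integer.parseInt(data.substring(i + 1, i + 3), 16));
                        i += 2;
                    }
                } catch (NumberFormatException e) {
                    throw new IllegalArgumentException("Decode error ");
                } catch (StringIndexOutOfBoundsException e) {
                    // Same fallback as in the posted code: copy the rest through undecoded.
                    String rest = data.substring(i);
                    buf.append(rest);
                    if (rest.length() == 2)
                        i++;
                }
                break;
            default:
                buf.append(c);
                break;
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // Print code points in hex so the result does not depend on console encoding.
        System.out.println(unUrlDecode("a+b"));
        System.out.println(Integer.toHexString(unUrlDecode("%41").charAt(0)));
        System.out.println(Integer.toHexString(unUrlDecode("%1%4b").charAt(0)));
        System.out.println(Integer.toHexString(unUrlDecode("%3%3x").charAt(0)));
    }
}
```

One thing I noticed while trying it: the %x%x branch looks at the character at i+4, so a %x%x
pair at the very end of the string (for example "%3%3" with nothing after it) ends up in the
StringIndexOutOfBoundsException branch and is not decoded. That may be worth a separate look.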


Cheers Tod....