Posted to slide-dev@jakarta.apache.org by Claes Mullern-Aspegren <as...@lysator.liu.se> on 2002/08/07 16:29:07 UTC

Wrong encoding sent to org.apache.util.URLUtil.java

I had a problem with the URLUtil class in
org.apache.util. It appears when I try
to upload a folder whose name contains Swedish characters
(the "A" with a ring above it or with two dots above it).
Using MS Web Folders, the input to one of the calls to URLUtil.URLDecode
is encoded in ISO-8859-1, but the encoding parameter passed to the
method is UTF-8.

So this type of url:
/files/file%20%E5%20%E5moretext
was decoded to
/files/file

Since the String constructor was given the wrong encoding,
the byte representation of %E5 was misinterpreted
and the string was truncated.
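
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not code from Slide; the class and method names are made up for illustration. It decodes the same bytes once as ISO-8859-1 and once as UTF-8. Note that current JDKs substitute the replacement character U+FFFD for the malformed 0xE5 bytes instead of truncating, so the truncation reported here is assumed to be specific to the JDK in use at the time.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // The bytes that remain after percent-decoding
    // "/files/file%20%E5%20%E5moretext" (0xE5 is a-ring in ISO-8859-1).
    static byte[] decodedBytes() {
        return "/files/file \u00E5 \u00E5moretext"
                .getBytes(StandardCharsets.ISO_8859_1);
    }

    // Interpreting the bytes with the encoding the client actually used.
    static String asLatin1() {
        return new String(decodedBytes(), StandardCharsets.ISO_8859_1);
    }

    // Interpreting the same bytes as UTF-8: a lone 0xE5 is a lead byte
    // with no continuation bytes, so it is malformed input.
    static String asUtf8() {
        return new String(decodedBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(asLatin1());
        System.out.println(asUtf8());
    }
}
```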


My solution was to set the encoding to ISO-8859-1
if the string contains a "%". I don't think this
is the right way to do it. Does anyone have a better idea
for how to solve the problem, or know which problems I can expect
with this solution?
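
One problem to expect: a client that sends UTF-8 escapes (for example %C3%A5 for the same letter) would be decoded as two ISO-8859-1 characters once the encoding is forced whenever a "%" appears. A possible alternative, sketched below under the assumption that clients send either UTF-8 or ISO-8859-1, is to try a strict UTF-8 decode of the already percent-decoded bytes and fall back to ISO-8859-1 only when the bytes are not valid UTF-8. The class and method names are hypothetical and this is not how Slide does it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class FallbackDecoder {
    /**
     * Decodes bytes strictly as UTF-8; if they are not well-formed UTF-8,
     * falls back to ISO-8859-1, which accepts every byte value.
     */
    static String decode(byte[] bytes, int offset, int length) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes, offset, length))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(bytes, offset, length, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {'a', (byte) 0xE5};             // from "a%E5"
        byte[] utf8 = {'a', (byte) 0xC3, (byte) 0xA5};  // from "a%C3%A5"
        // Both inputs should come out as 'a' followed by a-ring.
        System.out.println(decode(latin1, 0, latin1.length).equals("a\u00E5"));
        System.out.println(decode(utf8, 0, utf8.length).equals("a\u00E5"));
    }
}
```

This is a heuristic: an ISO-8859-1 byte sequence that happens to be valid UTF-8 will be read as UTF-8, so an explicit, configurable encoding is still the more predictable choice.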

Regards
/Klas Aspegren

The code.

    /**
     * Decode and return the specified URL-encoded byte array.
     *
     * @param bytes The url-encoded byte array
     * @param enc The encoding to use; if null, the default
     *  encoding is used
     * @exception IllegalArgumentException if a '%' character is not
     *  followed by a valid 2-digit hexadecimal number
     */
    public static String URLDecode(byte[] bytes, String enc) {
        if (bytes == null)
            return (null);

        int len = bytes.length;
        int ix = 0;
        int ox = 0;
        while (ix < len) {
            byte b = bytes[ix++];     // Get byte to test
            if (b == '+') {
                b = (byte)' ';
            } else if (b == '%') {
                enc = "ISO-8859-1"; // MY ADDED LINE
                b = (byte) ((convertHexDigit(bytes[ix++]) << 4)
                            + convertHexDigit(bytes[ix++]));
            }
            bytes[ox++] = b;          // Decoded bytes overwrite the input array
        }

        if (enc != null) {
            try {
                return new String(bytes, 0, ox, enc);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return new String(bytes, 0, ox);
    }





--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Wrong encoding sent to org.apache.util.URLUtil.java

Posted by Sung-Gu <jericho at apache.org>.
----- Original Message ----- 
From: "Claes Mullern-Aspegren" <as...@lysator.liu.se>

> trying to upload a folder with Swedish characters

The current encoding approach (using the URIUtil class) has trouble
supporting non-ASCII characters in a few cases.
It uses a fixed set of safe characters, which might cause you another problem.

> (The "A" with a dot over or two dots over).
> Using MS webfolders one of the calls to URLUtil.URLDecode
> is encoded in ISO-8859-1 but the encoder parameter to the
> method is UTF-8.
> 
> So this type of url:
> /files/file%20%E5%20%E5moretext
> was decoded to
> /files/file
> 
> Since the String constructor got the wrong encoding
> the byte representation of %E5 was wrong
> and the string was truncated.

Yes, that is what you are seeing.
The encoding exception should be propagated to the caller instead of being swallowed.
The encoding value should be something application users can set when they need to.
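
As a rough illustration of those two points, here is a minimal sketch of a decoder that takes the encoding from the caller and lets an unsupported encoding surface as an exception. The class name StrictUrlDecoder and its helper are hypothetical and are not part of Slide; the sketch also leaves the input array untouched and checks for truncated escapes.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;

public class StrictUrlDecoder {
    // Converts one ASCII hex digit to its value, rejecting anything else.
    static int hex(byte b) {
        if (b >= '0' && b <= '9') return b - '0';
        if (b >= 'a' && b <= 'f') return b - 'a' + 10;
        if (b >= 'A' && b <= 'F') return b - 'A' + 10;
        throw new IllegalArgumentException("invalid hex digit in escape");
    }

    /** Percent-decodes bytes, then interprets them in the caller's encoding. */
    static String decode(byte[] bytes, String enc)
            throws UnsupportedEncodingException {
        if (bytes == null) return null;
        ByteArrayOutputStream out = new ByteArrayOutputStream(bytes.length);
        for (int i = 0; i < bytes.length; i++) {
            byte b = bytes[i];
            if (b == '+') {
                out.write(' ');
            } else if (b == '%') {
                if (i + 2 >= bytes.length) {
                    throw new IllegalArgumentException("truncated escape");
                }
                out.write((hex(bytes[i + 1]) << 4) | hex(bytes[i + 2]));
                i += 2;  // skip the two hex digits just consumed
            } else {
                out.write(b);
            }
        }
        // An unknown encoding name is reported to the caller, not printed.
        return out.toString(enc);
    }

    public static void main(String[] args) throws Exception {
        byte[] url = "/files/file%20%E5%20%E5moretext".getBytes("US-ASCII");
        System.out.println(decode(url, "ISO-8859-1"));
    }
}
```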


> My solution was to set the encoding to ISO-8859-1
> if the string contains a "%". I'm thinking this
> is not the right way to do it. Does someone have a better idea
> how to solve the problem, or which problems I can expect with this
> solution.

I hope it can be solved with a consistent way of encoding and decoding,
a flexible setting for the encoding, and some careful coding.

If you have some time, could you please check it out?
Do you think it will be OK? (See the refs below.)

Sung-Gu

Refs:
http://www.mail-archive.com/slide-user@jakarta.apache.org/msg02507.html
http://www.mail-archive.com/slide-dev@jakarta.apache.org/msg03473.html

