You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by kevin seguin <se...@motive.com> on 2001/05/14 05:02:03 UTC

[tomcat 4] RequestUtil.parseParameters() method

i just happened to be looking through the RequestUtil.parseParameters()
method, and something struck me as odd.  since i don't know the history
here, i figured i'd ask someone who does...

anyway, the method looks something like this:

    public static void parseParameters(Map map, String data, String
encoding) 
        throws UnsupportedEncodingException {
        
        if ((data != null) && (data.length() > 0)) {
            int len = data.length();
            byte[] bytes = new byte[len];
            data.getBytes(0, len, bytes, 0);
            parseParameters(map, bytes, encoding);
        }
        
    }

what strikes me as odd is an encoding is being passed into the method,
but rather than using this encoding to get the bytes out of the string
passed in, a deprecated getBytes method is being used.  also, to
determine the number of bytes to get, String.length() is being used. 
this is a potential problem because String.length() is the number of
unicode characters, which is not necessarily the same number of bytes in
the string (think multibyte character sets).

i believe a safer version of this method is:

    public static void parseParameters(Map map, String data, String
encoding) 
        throws UnsupportedEncodingException {
        
        if ((data != null) && (data.length() > 0)) {
            byte[] bytes = data.getBytes(encoding);
            parseParameters(map, bytes, encoding);
        }
        
    }

RequestUtil.URLDecode(String str, String enc) has a similar problem.

i can commit changes to fix these problems if that's ok (i'm a new
committer, so i figure before i go stepping on anybody's toes, i'd run
this by the list :) ...

-kevin.

Re: [tomcat 4] RequestUtil.parseParameters() method

Posted by tttye <tt...@ticnet.com>.
I remember this code.  It is correct given previous development history.
The bytes in the input data string are NOT encoded.

The look back earlier to where the original byte array from the browser is
converted to a string.  You will notice that no encoding is applied to the
bytes as they are stored in the string ( I.E. the input loop is:
                      while () {
                             int x = read();  //which reads a 8 bit byte
from the stream
                             string[i] = x;     //which assumes that each
byte is a character
                      }
Therefore, the deprecated getBytes method undoes this damage converting the
bytes back into a byte array so an encoding can be applied to the bytes and
a correctly encode character string produced.

The input data to this method should have always been a byte array but for
some reason previous developers converted the byte array into a string which
only holds 8 bit ""characters"".
Tim
----- Original Message -----
From: kevin seguin <se...@motive.com>
To: <to...@jakarta.apache.org>
Sent: Sunday, May 13, 2001 10:02 PM
Subject: [tomcat 4] RequestUtil.parseParameters() method


> i just happened to be looking through the RequestUtil.parseParameters()
> method, and something struck me as odd.  since i don't know the history
> here, i figured i'd ask someone who does...
>
> anyway, the method looks something like this:
>
>     public static void parseParameters(Map map, String data, String
> encoding)
>         throws UnsupportedEncodingException {
>
>         if ((data != null) && (data.length() > 0)) {
>             int len = data.length();
>             byte[] bytes = new byte[len];
>             data.getBytes(0, len, bytes, 0);
>             parseParameters(map, bytes, encoding);
>         }
>
>     }
>
> what strikes me as odd is an encoding is being passed into the method,
> but rather than using this encoding to get the bytes out of the string
> passed in, a deprecated getBytes method is being used.  also, to
> determine the number of bytes to get, String.length() is being used.
> this is a potential problem because String.length() is the number of
> unicode characters, which is not necessarily the same number of bytes in
> the string (think multibyte character sets).
>
> i believe a safer version of this method is:
>
>     public static void parseParameters(Map map, String data, String
> encoding)
>         throws UnsupportedEncodingException {
>
>         if ((data != null) && (data.length() > 0)) {
>             byte[] bytes = data.getBytes(encoding);
>             parseParameters(map, bytes, encoding);
>         }
>
>     }
>
> RequestUtil.URLDecode(String str, String enc) has a similar problem.
>
> i can commit changes to fix these problems if that's ok (i'm a new
> committer, so i figure before i go stepping on anybody's toes, i'd run
> this by the list :) ...
>
> -kevin.
>


Re: [tomcat 4] RequestUtil.parseParameters() method

Posted by kevin seguin <se...@motive.com>.
> 
> > what might be an even more relevant question is why convert the string
> > to bytes anyways?  these bytes just end up in new strings, so why not
> > perform the parsing logic on chars rather than bytes?  seems like
> > unecessary conversions to/from bytes/strings....
> 
> Well, there are a number of operations which occur before going back to
> String.
> Also, the calls with Strings as parameters will go away enventually, for
> memory efficiency.
> 

it looked like the only things that occured were walking the bytes and
parsing out name/value pairs, and decoding +'s and encoded characters. 
just seemed like this could be done on the chars... perhaps i missed
something :)

-kevin.

Re: [tomcat 4] RequestUtil.parseParameters() method

Posted by Remy Maucherat <re...@apache.org>.
> what might be an even more relevant question is why convert the string
> to bytes anyways?  these bytes just end up in new strings, so why not
> perform the parsing logic on chars rather than bytes?  seems like
> unecessary conversions to/from bytes/strings....

Well, there are a number of operations which occur before going back to
String.
Also, the calls with Strings as parameters will go away enventually, for
memory efficiency.

Remy


Re: [tomcat 4] RequestUtil.parseParameters() method

Posted by kevin seguin <se...@motive.com>.
what might be an even more relevant question is why convert the string
to bytes anyways?  these bytes just end up in new strings, so why not
perform the parsing logic on chars rather than bytes?  seems like
unecessary conversions to/from bytes/strings....

-kevin.

kevin seguin wrote:
> 
> i just happened to be looking through the RequestUtil.parseParameters()
> method, and something struck me as odd.  since i don't know the history
> here, i figured i'd ask someone who does...
> 
> anyway, the method looks something like this:
> 
>     public static void parseParameters(Map map, String data, String
> encoding)
>         throws UnsupportedEncodingException {
> 
>         if ((data != null) && (data.length() > 0)) {
>             int len = data.length();
>             byte[] bytes = new byte[len];
>             data.getBytes(0, len, bytes, 0);
>             parseParameters(map, bytes, encoding);
>         }
> 
>     }
> 
> what strikes me as odd is an encoding is being passed into the method,
> but rather than using this encoding to get the bytes out of the string
> passed in, a deprecated getBytes method is being used.  also, to
> determine the number of bytes to get, String.length() is being used.
> this is a potential problem because String.length() is the number of
> unicode characters, which is not necessarily the same number of bytes in
> the string (think multibyte character sets).
> 
> i believe a safer version of this method is:
> 
>     public static void parseParameters(Map map, String data, String
> encoding)
>         throws UnsupportedEncodingException {
> 
>         if ((data != null) && (data.length() > 0)) {
>             byte[] bytes = data.getBytes(encoding);
>             parseParameters(map, bytes, encoding);
>         }
> 
>     }
> 
> RequestUtil.URLDecode(String str, String enc) has a similar problem.
> 
> i can commit changes to fix these problems if that's ok (i'm a new
> committer, so i figure before i go stepping on anybody's toes, i'd run
> this by the list :) ...
> 
> -kevin.

Re: [tomcat 4] RequestUtil.parseParameters() method

Posted by kevin seguin <se...@motive.com>.
> >
> > it's also used to parse form data, right?  can form data have non
> > US-ASCII characters?
> 
> No. Form data uses parseParameters(Map, byte[], String encoding) for that
> reason.
> 

ahh... that's what i missed.  thanks.

> If the thing was memory efficient, the url should still be in byte form at
> this point, and we could directly use parseParameters(Map, byte[], String
> encoding) for URL params too.
> 

sure.  'nuff said :)

-kevin.

Re: [tomcat 4] RequestUtil.parseParameters() method

Posted by Remy Maucherat <re...@apache.org>.
> > We use simple byte conversion there because the String which is passed
here
> > (the query portion of the URL) is supposed to only have US-ASCII
characters.
> >
>
> it's also used to parse form data, right?  can form data have non
> US-ASCII characters?

No. Form data uses parseParameters(Map, byte[], String encoding) for that
reason.

If the thing was memory efficient, the url should still be in byte form at
this point, and we could directly use parseParameters(Map, byte[], String
encoding) for URL params too.

Remy


Re: [tomcat 4] RequestUtil.parseParameters() method

Posted by kevin seguin <se...@motive.com>.
> We use simple byte conversion there because the String which is passed here
> (the query portion of the URL) is supposed to only have US-ASCII characters.
> 

it's also used to parse form data, right?  can form data have non
US-ASCII characters?

Re: [tomcat 4] RequestUtil.parseParameters() method

Posted by Remy Maucherat <re...@apache.org>.
> i just happened to be looking through the RequestUtil.parseParameters()
> method, and something struck me as odd.  since i don't know the history
> here, i figured i'd ask someone who does...
>
> anyway, the method looks something like this:
>
>     public static void parseParameters(Map map, String data, String
> encoding)
>         throws UnsupportedEncodingException {
>
>         if ((data != null) && (data.length() > 0)) {
>             int len = data.length();
>             byte[] bytes = new byte[len];
>             data.getBytes(0, len, bytes, 0);
>             parseParameters(map, bytes, encoding);
>         }
>
>     }
>
> what strikes me as odd is an encoding is being passed into the method,
> but rather than using this encoding to get the bytes out of the string
> passed in, a deprecated getBytes method is being used.  also, to
> determine the number of bytes to get, String.length() is being used.
> this is a potential problem because String.length() is the number of
> unicode characters, which is not necessarily the same number of bytes in
> the string (think multibyte character sets).
>
> i believe a safer version of this method is:
>
>     public static void parseParameters(Map map, String data, String
> encoding)
>         throws UnsupportedEncodingException {
>
>         if ((data != null) && (data.length() > 0)) {
>             byte[] bytes = data.getBytes(encoding);
>             parseParameters(map, bytes, encoding);
>         }
>
>     }
>
> RequestUtil.URLDecode(String str, String enc) has a similar problem.
>
> i can commit changes to fix these problems if that's ok (i'm a new
> committer, so i figure before i go stepping on anybody's toes, i'd run
> this by the list :) ...

We use simple byte conversion there because the String which is passed here
(the query portion of the URL) is supposed to only have US-ASCII characters.

Remy