Posted to dev@tomcat.apache.org by Eugen Kuleshov <an...@hco.kollegienet.dk> on 2000/04/12 16:32:47 UTC

RequestUtil sources

Hello!

  I was just looking at some sources related to request parameter
parsing. Let's look at
jakarta-tomcat\src\share\org\apache\tomcat\util\RequestUtil.java

----------------
[skipped]
/**
 * Usefull methods for request processing. Used to be in ServerRequest or Request,
 * but most are usefull in other adapters.
 *
 * @author James Duncan Davidson [duncan@eng.sun.com]
 * @author James Todd [gonzo@eng.sun.com]
 * @author Jason Hunter [jch@eng.sun.com]
 * @author Harish Prabandham
 * @author costin@eng.sun.com
 */
public class RequestUtil {

    public static Hashtable readFormData( Request request ) {

        String contentType=request.getContentType();
        if (contentType != null) {
            if (contentType.indexOf(";")>0)
                contentType=contentType.substring(0,contentType.indexOf(";")-1);
            contentType = contentType.toLowerCase().trim();
        }

        int contentLength=request.getContentLength();

        if (contentType != null &&
            contentType.startsWith("application/x-www-form-urlencoded")) {
            try {
                ServletInputStream is=request.getInputStream();
                Hashtable postParameters =
                    HttpUtils.parsePostData(contentLength, is);
                return postParameters;
            }
            catch (IOException e) {
                // nothing
                // XXX at least warn ?
            }
        }
        return null;
    }

  etc...........

----------

  Could someone tell me why contentType needs to be cut if it is not
used below?

  Eugen Kuleshov.
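
For illustration, here is a minimal sketch of what the trimming
expression in the quoted readFormData() does to a typical header value.
The class and method names (ContentTypeTrimDemo, trimAsQuoted,
trimAtSemicolon) and the sample header are hypothetical and not part of
Tomcat; only the substring expression is copied from the listing above.
As far as this sketch can tell, the "- 1" also drops the character just
before the ';', so the later startsWith() check would not match for such
a header.

```java
import java.util.Locale;

class ContentTypeTrimDemo {

    // Same expression as in the quoted readFormData(): note the "- 1".
    static String trimAsQuoted(String contentType) {
        if (contentType.indexOf(";") > 0)
            contentType = contentType.substring(0, contentType.indexOf(";") - 1);
        return contentType.toLowerCase().trim();
    }

    // Hypothetical variant that cuts exactly at the ';'.
    static String trimAtSemicolon(String contentType) {
        int semi = contentType.indexOf(';');
        if (semi >= 0)
            contentType = contentType.substring(0, semi);
        return contentType.toLowerCase(Locale.ROOT).trim();
    }

    public static void main(String[] args) {
        // Sample header value, assumed for this demo only.
        String header = "application/x-www-form-urlencoded; charset=koi8-r";
        System.out.println(trimAsQuoted(header));    // last 'd' is lost
        System.out.println(trimAtSemicolon(header)); // full media type
    }
}
```

In both variants the charset parameter after the ';' is simply thrown
away, which is the point of the question above.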

Re: RequestUtil sources

Posted by Eugen Kuleshov <an...@hco.kollegienet.dk>.
Costin Manolache wrote:
 
> We are now in open development season for 3.2; any patch is welcome!

  ok
 
> And please keep reading the code !

  Then let me say what we want.
  The Response interface has get/setCharacterEncoding() methods, but
Request does not, and I don't quite understand why. If Request had
them, we could detect the character encoding of the user's request, or
just set a default encoding in the web application or from the
file.encoding system property. I know it is outside the JSDK
specification, but it is really necessary for national languages and
local character encodings.
  The character encoding of the request must be used in the request
parameter parser so that characters are converted correctly. For example:

    public static String unUrlDecode(String data)

  This method takes a string with %xx-encoded characters. But if the
servlet engine and the web browser work with different character
encodings, you will get the wrong String after decoding the %xx data.
For example, in the Russian koi8-r (ibm-878) and Russian Cp1251
(windows-1251) code pages, the code %c0 stands for two different
characters.
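
To make the %c0 example concrete, here is a small sketch that decodes the
same escaped byte under both code pages. It uses the standard
java.net.URLDecoder.decode(String, String) call rather than Tomcat's
unUrlDecode; the class name PercentDecodeDemo and its decode() helper are
hypothetical, and the sketch assumes the JDK in use ships the KOI8-R and
windows-1251 charsets.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class PercentDecodeDemo {

    // Decodes %xx escapes, interpreting the resulting bytes in the
    // given charset. Unknown charset names are reported as
    // IllegalArgumentException to keep the demo free of checked exceptions.
    static String decode(String data, String charsetName) {
        try {
            return URLDecoder.decode(data, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e.getMessage());
        }
    }

    public static void main(String[] args) {
        // Byte 0xC0 is U+044E (small yu) in KOI8-R ...
        System.out.println(decode("%c0", "KOI8-R"));
        // ... and U+0410 (capital A) in windows-1251.
        System.out.println(decode("%c0", "windows-1251"));
    }
}
```

The escaped data is identical in both calls; only the charset chosen by
the decoder changes which character comes out.
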
  The same problem exists with the POST method, where this code is used:

  ServletInputStream is=request.getInputStream();
  Hashtable postParameters = HttpUtils.parsePostData(contentLength, is);

  Then let's look at parsePostData

----------
    static public Hashtable parsePostData(int len,
                                          ServletInputStream in)
    {
        // XXX
        // should a length of 0 be an IllegalArgumentException

        if (len <=0)
            return new Hashtable(); // cheap hack to return an empty hash

        if (in == null) {
            throw new IllegalArgumentException();
        }

        //
        // Make sure we read the entire POSTed body.
        //
        byte[] postedBytes = new byte [len];
        try {
            int offset = 0;

            do {
                int inputLen = in.read (postedBytes, offset, len - offset);
                if (inputLen <= 0) {
                    String msg = lStrings.getString("err.io.short_read");
                    throw new IllegalArgumentException (msg);
                }
                offset += inputLen;
            } while ((len - offset) > 0);

        } catch (IOException e) {
            throw new IllegalArgumentException(e.getMessage());
        }

        // XXX we shouldn't assume that the only kind of POST body
        // is FORM data encoded using ASCII or ISO Latin/1 ... or
        // that the body should always be treated as FORM data.
        //

        try {
            String postedBody = new String(postedBytes, 0, len, "8859_1");
            return parseQueryString(postedBody);
        } catch (java.io.UnsupportedEncodingException e) {
            // XXX function should accept an encoding parameter & throw this
            // exception.  Otherwise throw something expected.
            throw new IllegalArgumentException(e.getMessage());
        }
    }

----------

  So why is 8859_1 always used here? In the real world it is not the
only encoding.
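
As a rough sketch of the alternative, a form-body parser could take the
encoding as a parameter instead of hard-coding it. Everything here is an
assumption for illustration: the class FormBodyDemo and method
parseFormBody are hypothetical, not the HttpUtils API; the sketch keeps
only the last value per name (the real parsePostData returns a Hashtable
of values); and it relies on the urlencoded body itself being ASCII, so
the charset matters only when the %xx escapes are decoded.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBodyDemo {

    // Parses an application/x-www-form-urlencoded body, decoding the
    // %xx escapes with the caller-supplied encoding.
    static Map<String, String> parseFormBody(byte[] body, String encoding) {
        Map<String, String> params = new LinkedHashMap<>();
        try {
            // ISO-8859-1 maps every byte to one char, so nothing is lost
            // before the escapes are decoded.
            String raw = new String(body, "ISO-8859-1");
            for (String pair : raw.split("&")) {
                if (pair.isEmpty())
                    continue;
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                params.put(URLDecoder.decode(name, encoding),
                           URLDecoder.decode(value, encoding));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e.getMessage());
        }
        return params;
    }

    public static void main(String[] args) throws Exception {
        byte[] body = "q=%c0&note=two+words".getBytes("ISO-8859-1");
        System.out.println(parseFormBody(body, "KOI8-R"));
        System.out.println(parseFormBody(body, "windows-1251"));
    }
}
```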

  I think (and still hope) that this situation must be changed before
the Tomcat release.
  I would like to see additional methods on Tomcat's Request (and in the
JSDK specification) for setting and getting the character encoding of
the request, and this information must be used in the request parameter
parser.

  Thank you.

  Eugen Kuleshov.

Re: RequestUtil sources

Posted by Costin Manolache <co...@eng.sun.com>.
We are now in open development season for 3.2; any patch is welcome!

And please keep reading the code !

Costin



Re: RequestUtil sources

Posted by Eugen Kuleshov <an...@hco.kollegienet.dk>.
Costin Manolache wrote:
 
> >   Could someone tell me why contentType needs to be cut if it is not
> > used below?
> Would you prefer a for(i=1; i<500; i++ ) {}  :-)???

  Some time ago I heard that Tomcat would have good performance.
  It's a good idea to add some code like this, so that removing it later
increases performance. :)
 
> That's why open source is good - people can find this kind of code...
> I'll try to remove this ( and many other examples ). I'm more concerned with
> the atrocities in SimpleMapper and the header parsing and the date parsing.

  It's ok. But for a long time I've tried to fix some charset-related
things in the parameter parser, still without success. Would you like me
to offer a fix (or another implementation) of this parser for GET, POST
and multipart requests?
 
> I guess the original idea was to extract the encoding ( what's after ";" )
> and use it when reading the form data.

  good idea, but unfortunately still not implemented. :(

  Eugen Kuleshov.

Re: RequestUtil sources

Posted by Costin Manolache <co...@eng.sun.com>.
>   Could someone tell me why contentType needs to be cut if it is not
> used below?

Would you prefer a for(i=1; i<500; i++ ) {}  :-)???

That's why open source is good - people can find this kind of code...
I'll try to remove this ( and many other examples ). I'm more concerned with
the atrocities in SimpleMapper and the header parsing and the date parsing.

I guess the original idea was to extract the encoding ( what's after ";" )
and use it when reading the form data.

Costin
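
The idea mentioned above, extracting the encoding from what follows the
";" in the Content-Type header, could be sketched roughly as follows.
This is only an illustration under assumptions: the class
CharsetParamDemo, the method charsetOf and the fallback argument are
hypothetical names, not Tomcat code, and the parsing is deliberately
simple (it does not handle every quoting form a header may use).

```java
import java.util.Locale;

class CharsetParamDemo {

    // Returns the value of the charset parameter of a Content-Type
    // header, or defaultCharset when the header has none.
    static String charsetOf(String contentType, String defaultCharset) {
        if (contentType == null)
            return defaultCharset;
        String[] parts = contentType.split(";");
        // parts[0] is the media type; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip one pair of surrounding double quotes, if present.
                if (value.length() >= 2
                        && value.startsWith("\"") && value.endsWith("\""))
                    value = value.substring(1, value.length() - 1);
                return value.isEmpty() ? defaultCharset : value;
            }
        }
        return defaultCharset;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf(
            "application/x-www-form-urlencoded; charset=koi8-r", "ISO-8859-1"));
        System.out.println(charsetOf(
            "application/x-www-form-urlencoded", "ISO-8859-1"));
    }
}
```

The returned name could then be passed to whatever decodes the request
parameters, with the default used when the client sends no charset.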