Posted to dev@tomcat.apache.org by Jun Inamori <j-...@osa.att.ne.jp> on 2000/05/01 05:18:25 UTC

Proposal: RequestImpl

Hello,

'HttpServletRequest.getParameter(key)' can't return the correct
parameter string when the original character sequence contains 2-byte
characters, such as Japanese characters.
As you know, 2-byte characters are URL-encoded like:
   "%82%B1%82%F1%82%C9%82%BF%82%ED"
The first Japanese character consists of the bytes '82' and 'B1', and the
second of '82' and 'F1'.
To get the original string back from this sequence of byte values, we
have to collect the bytes in a ByteArrayOutputStream and decode them
together. But RequestUtil and HttpUtils convert each integer to a 'char'
and append that 'char' to a StringBuffer.
This doesn't work. Before I describe my proposal for Tomcat, I'll explain
what the problem is. Please look at the following code:

import java.io.*;

public class CharPrint{

    static final private String[] hel=
	{"82","B1","82","F1","82","C9","82","BF","82","ED"};

    public CharPrint(){
    }

    private String conv_1()
	throws IOException{
	ByteArrayOutputStream baos=new ByteArrayOutputStream();
	for(int i=0; i<hel.length; i++){ 
	    int i_v=Integer.parseInt(hel[i],16);
	    baos.write(i_v);
	}
	String str=baos.toString();
	return str;
    }

    private String conv_2()
	throws IOException{
	ByteArrayOutputStream baos=new ByteArrayOutputStream();
	for(int i=0; i<hel.length; i++){ 
	    int i_v=Integer.parseInt(hel[i],16);
	    baos.write(i_v);
	}
	String str=baos.toString("SJIS");
	return str;
    }

    private String conv_3(){
	StringBuffer sb=new StringBuffer();
	for(int i=0; i<hel.length; i++){ 
	    int i_v=Integer.parseInt(hel[i],16);
	    sb.append((char)i_v);
	}
	String str=sb.toString();
	return str;
    }

    private String conv_4()
	throws IOException{
	String str=conv_1();
	byte[] str_b=str.getBytes();
	try{
	    str=new String(str_b,"SJIS");
	}
	catch(Exception ex){
	    str=ex.toString();
	}
	return str;
    }

    public void print_def()
	throws IOException{
	String res_1=conv_1();
	String res_2=conv_2();
	String res_3=conv_3();
	String res_4=conv_4();
	System.out.println("ByteArrayDef"+res_1);
	System.out.println("ByteArrayEnc"+res_2);
	System.out.println("StringBuffer"+res_3);
	System.out.println("ByteArrayDefEnc"+res_4);
    }

    public void print_enc(String enc)
	throws IOException{
	String res_1=conv_1();
	String res_2=conv_2();
	String res_3=conv_3();
	String res_4=conv_4();
	OutputStreamWriter osw=new OutputStreamWriter(System.out,enc);
	PrintWriter pw=new PrintWriter(new BufferedWriter(osw));
	pw.println("ByteArrayDef"+res_1);
	pw.println("ByteArrayEnc"+res_2);
	pw.println("StringBuffer"+res_3);
	pw.println("ByteArrayDefEnc"+res_4);
	pw.close();
    }

    static public void main(String[] args){
	try{
	    CharPrint cp=new CharPrint();
	    if(args.length==0){
		cp.print_def();
	    }
	    else{
		cp.print_enc(args[0]);
	    }
	}
	catch(Exception ex){
	    System.out.println(ex.toString());
	    System.exit(1);
	}
	System.exit(0);
    }

}

The above program tries to rebuild the Japanese string from the integer
sequence, where each integer represents one byte of the encoded text. We
can see the resulting strings on our console. The program takes zero or
one argument: the Java character encoding for OutputStreamWriter. When
this argument is specified, it is passed to OutputStreamWriter, so we can
choose an encoding other than the default one. It may seem meaningless to
specify 'SJIS' in an English-only environment, but we can see the
result in our WWW browser by invoking this program as a CGI. (We may run
Tomcat in an English-only environment, but the parameters sent to our
servlets may still be Japanese.)

In an English-only environment, ONLY

    private String conv_2()
	throws IOException{
	ByteArrayOutputStream baos=new ByteArrayOutputStream();
	for(int i=0; i<hel.length; i++){ 
	    int i_v=Integer.parseInt(hel[i],16);
	    baos.write(i_v);
	}
	String str=baos.toString("SJIS");
	return str;
    }

can return the correct string. This means we have to use
ByteArrayOutputStream and also specify the proper Java encoding in
the 'toString()' method. In a Japanese-enabled environment,

    private String conv_1()
	throws IOException{
	ByteArrayOutputStream baos=new ByteArrayOutputStream();
	for(int i=0; i<hel.length; i++){ 
	    int i_v=Integer.parseInt(hel[i],16);
	    baos.write(i_v);
	}
	String str=baos.toString();
	return str;
    }

is also OK.

Now let's get back to Tomcat. When 'HttpServletRequest.getParameter(key)'
is invoked, the following methods are called:

RequestImpl.getParameterValues()-->
-->RequestImpl.handleParameters()-->
1)-->RequestUtil.processFormData()-->RequestUtil.unUrlDecode()
or
2)-->RequestUtil.readFormData()-->HttpUtils.parsePostData()-->
HttpUtils.parseQueryString()-->HttpUtils.parseName()

So, converting the integer sequence back to a string is done by
RequestUtil.unUrlDecode() or HttpUtils.parseName(), and both of them use
a StringBuffer for this task.
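
To make the byte-oriented alternative concrete, here is a minimal sketch
of a decoder that collects the raw bytes first and converts them to
characters only once, with an explicit encoding. The class and method
names (PercentDecodeSketch, decode) are hypothetical and this is not the
modified RequestUtil.unUrlDecode() itself. It also assumes that every
unescaped character in the input is plain ASCII, as it should be in
properly URL-encoded data.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper for illustration only; not part of Tomcat.
class PercentDecodeSketch {

    // Collects the decoded bytes, then builds the String in one step
    // using the given Java encoding name (e.g. "SJIS" or "UTF-8").
    static String decode(String s, String enc)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '+') {
                baos.write(' ');               // '+' stands for a space
            } else if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("truncated escape");
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad hex digits");
                }
                baos.write((hi << 4) | lo);    // one raw byte, not a char
                i += 2;
            } else {
                baos.write(c);                 // assumption: ASCII only
            }
        }
        return baos.toString(enc);
    }

    public static void main(String[] args) throws Exception {
        // Print code points in hex so the console encoding doesn't matter.
        String out = decode("%E3%81%93%E3%82%93", "UTF-8");
        for (int i = 0; i < out.length(); i++) {
            System.out.println(Integer.toHexString(out.charAt(i)));
        }
    }
}
```

The important point is that a 2-byte character is never turned into two
separate 'char' values; both bytes stay in the byte buffer until the
encoding is applied.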

Modifying these methods to use ByteArrayOutputStream instead of
StringBuffer is easy. But it is preferable (or even required of a server?)
that Tomcat running in an English-only environment can return the correct
2-byte string. To make that possible, we have to pass the correct Java
encoding to the modified methods. This raises the following
difficulties:
1) How do we know the appropriate Java encoding?
2) HttpUtils resides in the 'javax.*' package, which means we can't change
the parameter signature of 'parsePostData()'.

For 1), the Locale can be retrieved by RequestImpl.getFacade().getLocale(),
so we have a chance to guess the Java encoding based on this Locale.
This approach may not solve every case, but it will work in most cases. To
get the Java encoding from the retrieved Locale, I've created a new
class:
    org.apache.tomcat.util.LocaleToJavaEncMap
This is almost the same as LocaleToCharsetMap, but 'getEncoding(Locale
loc)' returns the Java encoding appropriate for the specified Locale.
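
The lookup order (full locale name first, then language only) can be
sketched as follows. This is a trimmed-down, hypothetical stand-in
(LocaleLookupSketch) with only four entries taken from the full table in
the source below; it is meant to show the fallback behaviour, not to
replace the real class.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, shortened stand-in for LocaleToJavaEncMap.
class LocaleLookupSketch {

    private static final Map<String, String> MAP =
        new HashMap<String, String>();

    static {
        MAP.put("en", "8859_1");
        MAP.put("ja", "SJIS");
        MAP.put("zh", "GB2312");
        MAP.put("zh_TW", "Big5");
    }

    // Tries the full locale name (e.g. "zh_TW"), then the language alone.
    // Returns null if neither is known.
    static String getEncoding(Locale loc) {
        String enc = MAP.get(loc.toString());
        if (enc != null) return enc;
        return MAP.get(loc.getLanguage());
    }

    public static void main(String[] args) {
        System.out.println(getEncoding(Locale.JAPAN));   // "ja_JP" falls back to "ja"
        System.out.println(getEncoding(Locale.TAIWAN));  // "zh_TW" matches directly
        System.out.println(getEncoding(Locale.CHINA));   // "zh_CN" falls back to "zh"
        System.out.println(getEncoding(Locale.FRANCE));  // not in this short table
    }
}
```

Note that the result can be null for an unknown locale, so a caller has
to decide what to do in that case (for example, fall back to the old
behaviour).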

For 2), the same work can be done within 'RequestUtil', so I copied
    HttpUtils.parsePostData()
    HttpUtils.parseQueryString()
to RequestUtil with slight modifications.
The original HttpUtils.parseQueryString() calls HttpUtils.parseName(),
but HttpUtils.parseName() is almost the same as
RequestUtil.unUrlDecode(), so my parseQueryString() calls the latter
instead.

Now the new process to get the decoded parameters looks like this:
RequestImpl.getParameterValues()-->
-->RequestImpl.handleParameters()-->
-->LocaleToJavaEncMap.getEncoding(RequestImpl.getFacade().getLocale())-->
1)-->RequestUtil.processFormData()-->RequestUtil.unUrlDecode()
or
2)-->RequestUtil.readFormData()-->RequestUtil.parsePostData()-->
RequestUtil.parseQueryString()-->RequestUtil.unUrlDecode()

With this approach, 2-byte strings can be retrieved even when Tomcat is
running in an English-only environment.
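
For readers who want to try the overall idea without patching Tomcat,
here is a rough, self-contained sketch of encoding-aware query string
parsing. It is not the RequestUtil code from this proposal: it swaps in
the standard java.net.URLDecoder.decode(String, String) for the decoding
step, and the class name QueryParseSketch is made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; uses java.net.URLDecoder instead of the
// modified RequestUtil.unUrlDecode().
class QueryParseSketch {

    // Splits "a=1&b=2" into pairs and decodes keys and values with the
    // given encoding. Repeated keys accumulate into one String[].
    static Map<String, String[]> parse(String query, String enc)
            throws UnsupportedEncodingException {
        Map<String, String[]> params = new LinkedHashMap<String, String[]>();
        if (query == null) return params;
        String[] pairs = query.split("&");
        for (int p = 0; p < pairs.length; p++) {
            String pair = pairs[p];
            int pos = pair.indexOf('=');
            if (pos == -1) continue;           // not a valid pair, ignore
            String key = URLDecoder.decode(pair.substring(0, pos), enc);
            String value = URLDecoder.decode(pair.substring(pos + 1), enc);
            String[] old = params.get(key);
            String[] values;
            if (old == null) {
                values = new String[] { value };
            } else {
                values = new String[old.length + 1];
                System.arraycopy(old, 0, values, 0, old.length);
                values[old.length] = value;
            }
            params.put(key, values);
        }
        return params;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String[]> m = parse("name=%E3%81%93&x=1&x=2", "UTF-8");
        System.out.println(m.get("x").length);
        System.out.println(Integer.toHexString(m.get("name")[0].charAt(0)));
    }
}
```

In the proposal itself, the encoding argument would come from
LocaleToJavaEncMap.getEncoding() rather than being hard-coded.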

Happy Java/RMI programming!

Jun Inamori
E-mail: j-office@osa.att.ne.jp
URL:    http://www.oop-reserch.com

---Source---
The following is my new source code.
The lines between
    //Start:Jun Inamori modified
    //End:Jun Inamori modified
are the ones I modified.

*** Source for ***
*** org.apache.tomcat.util.LocaleToJavaEncMap ***

package org.apache.tomcat.util;

import java.util.*;

public class LocaleToJavaEncMap {

  private static Hashtable map;

  static {
    map = new Hashtable();

    map.put("ar","8859_6");
    map.put("be","8859_5");
    map.put("bg","8859_5");
    map.put("ca","8859_1");
    map.put("cs","8859_2");
    map.put("da","8859_1");
    map.put("de","8859_1");
    map.put("el","8859_7");
    map.put("en","8859_1");
    map.put("es","8859_1");
    map.put("et","8859_1");
    map.put("fi","8859_1");
    map.put("fr","8859_1");
    map.put("hr","8859_2");
    map.put("hu","8859_2");
    map.put("is","8859_1");
    map.put("it","8859_1");
    map.put("iw","8859_8");
    map.put("ja","SJIS");
    map.put("ko","KSC5601");
    map.put("lt","8859_2");
    map.put("lv","8859_2");
    map.put("mk","8859_5");
    map.put("nl","8859_1");
    map.put("no","8859_1");
    map.put("pl","8859_2");
    map.put("pt","8859_1");
    map.put("ro","8859_2");
    map.put("ru","8859_5");
    map.put("sh","8859_5");
    map.put("sk","8859_2");
    map.put("sl","8859_2");
    map.put("sq","8859_2");
    map.put("sr","8859_5");
    map.put("sv","8859_1");
    map.put("tr","8859_9");
    map.put("uk","8859_5");
    map.put("zh","GB2312");
    map.put("zh_TW","Big5");
  }

  /**
   * Gets the preferred Java encoding for the given locale, or null if
   * the locale is not recognized.
   *
   * @param loc the locale
   * @return the preferred Java encoding
   */
  public static String getEncoding(Locale loc) {
    String encoding;

    // Try for a full name match (may include country)
    encoding = (String) map.get(loc.toString());
    if (encoding != null) return encoding;

    // If a full name didn't match, try just the language
    encoding = (String) map.get(loc.getLanguage());
    return encoding;  // may be null
  }
}

*** Source for ***
*** org.apache.tomcat.core.RequestImpl ***

package org.apache.tomcat.core;

import org.apache.tomcat.util.*;
import java.io.*;
import java.net.*;
import java.security.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;


/**
 *
 * @author James Duncan Davidson [duncan@eng.sun.com]
 * @author James Todd [gonzo@eng.sun.com]
 * @author Jason Hunter [jch@eng.sun.com]
 * @author Harish Prabandham
 * @author Alex Cruikshank [alex@epitonic.com]
 */
public class RequestImpl  implements Request {

    // GS, used by the load balancing layer in the Web Servers
    // jvmRoute == the name of the JVM inside the plugin.
    protected String jvmRoute;

    // XXX used by forward to override, need a better
    // mechanism
    protected String requestURI;
    protected String queryString;

   //  RequestAdapterImpl Hints
    protected String serverName;
    protected Vector cookies = new Vector();

    protected String contextPath;
    protected String lookupPath; // everything after contextPath before ?
    protected String servletPath;
    protected String pathInfo;
    protected String pathTranslated;
    // Need to distinguish between null pathTranslated and
    // lazy-computed pathTranslated
    protected boolean pathTranslatedIsSet=false;
    
    protected Hashtable parameters = new Hashtable();
    protected int contentLength = -1;
    protected String contentType = null;
    protected String charEncoding = null;
    protected String authType;
    protected String remoteUser;

    // Request
    protected Response response;
    protected HttpServletRequestFacade requestFacade;
    protected Context context;
    protected ContextManager contextM;
    protected Hashtable attributes = new Hashtable();

    protected boolean didReadFormData;
    protected boolean didParameters;
    protected boolean didCookies;
    // end "Request" variables

    // Session
    // set by interceptors - the session id
    protected String reqSessionId;
    protected boolean sessionIdFromCookie=false;
    protected boolean sessionIdFromURL=false;
    // cache- avoid calling SessionManager for each getSession()
    protected HttpSession serverSession;


    // LookupResult - used by sub-requests and
    // set by interceptors
    protected String servletName;
    protected ServletWrapper handler = null;
    Container container;

    protected String mappedPath = null;

    protected String scheme;
    protected String method;
    protected String protocol;
    protected MimeHeaders headers;
    protected ServletInputStream in;

    protected int serverPort;
    protected String remoteAddr;
    protected String remoteHost;


    protected static StringManager sm =
        StringManager.getManager("org.apache.tomcat.core");

    public RequestImpl() {
 	headers = new MimeHeaders();
 	recycle(); // XXX need better placement-super()
    }

    // GS - return the jvm load balance route
    public String getJvmRoute() {
	    return jvmRoute;
    }

    public String getScheme() {
        return scheme;
    }

    public String getMethod() {
        return method;
    }

    public String getRequestURI() {
        if( requestURI!=null) return requestURI;
	return requestURI;
    }

    // XXX used by forward
    public String getQueryString() {
	if( queryString != null ) return queryString;
        return queryString;
    }

    public String getProtocol() {
        return protocol;
    }

    // XXX server IP and/or Host:
    public String getServerName() {
	if(serverName!=null) return serverName;

	// XXX Move to interceptor!!!
	String hostHeader = this.getHeader("host");
	if (hostHeader != null) {
	    int i = hostHeader.indexOf(':');
	    if (i > -1) {
		hostHeader = hostHeader.substring(0,i);
	    }
	    serverName=hostHeader;
	    return serverName;
	}
	// default to localhost - and warn
	System.out.println("No server name, defaulting to localhost");
	serverName="localhost";
	return serverName;
    }

    public String getLookupPath() {
	return lookupPath;
    }

    public void setLookupPath( String l ) {
	lookupPath=l;
    }

    public String[] getParameterValues(String name) {
	handleParameters();
        return (String[])parameters.get(name);
    }

    public Enumeration getParameterNames() {
	handleParameters();
        return parameters.keys();
    }

    public String getAuthType() {
    	return authType;
    }

    public String getCharacterEncoding() {
        if(charEncoding!=null) return charEncoding;
        charEncoding =
            RequestUtil.getCharsetFromContentType(getContentType());
	return charEncoding;
    }

    public int getContentLength() {
        if( contentLength > -1 ) return contentLength;
	contentLength = getFacade().getIntHeader("content-length");
	return contentLength;
    }

    public String getContentType() {
	if(contentType != null) return contentType;
	contentType = getHeader("content-type");
	if(contentType != null) return contentType;
	// can be null!! -
	return contentType;
    }

    /** All adapters that know the PT needs to call this method,
	in order to set pathTranslatedIsSet, otherwise tomcat
	will try to compute it again
    */
    public void setPathTranslated(String s ) {
	pathTranslated=s;
	pathTranslatedIsSet=true;
    }

    public String getPathTranslated() {
	if( pathTranslatedIsSet ) return pathTranslated;

	// not set yet - we'll compute it
	pathTranslatedIsSet=true;
	String path=getPathInfo();
	// In CGI spec, PATH_TRANSLATED shouldn't be set if no path info
	// is present
	pathTranslated=null;
	if(path==null || "".equals( path ) ) return null;
	pathTranslated=context.getRealPath( path );
	return pathTranslated;
    }


    // XXX XXX Servlet API conflicts with the CGI specs -
    // PathInfo should be "" if no path info is requested
    // ( as it is in CGI ).
    // We are following the spec, but IMHO it's a bug ( in the spec )
    public String getPathInfo() {
        return pathInfo;
    }
    
    public void setRemoteUser(String s) {
	remoteUser=s;
    }

    public String getRemoteUser() {
	return remoteUser;
    }

    public boolean isSecure() {
	if( context.getRequestSecurityProvider() == null )
	    return false;
	return context.getRequestSecurityProvider().isSecure(context,
							     getFacade());
    }

    public Principal getUserPrincipal() {
	if( context.getRequestSecurityProvider() == null )
	    return null;
	return context.getRequestSecurityProvider().getUserPrincipal(context,
							     getFacade());
    }

    public boolean isUserInRole(String role) {
	if( context.getRequestSecurityProvider() == null )
	    return false;
	return context.getRequestSecurityProvider().isUserInRole(context,
							     getFacade(), role);
    }


    public String getRequestedSessionId() {
        return reqSessionId;
    }

    public void setRequestedSessionId(String reqSessionId) {
	this.reqSessionId = reqSessionId;
    }

    public String getServletPath() {
        return servletPath;
    }

    // End hints

    // -------------------- Request methods ( high level )
    public HttpServletRequestFacade getFacade() {
	// some requests are internal, and will never need a
	// facade - no need to create a new object unless needed.
        if( requestFacade==null )
	    requestFacade = new HttpServletRequestFacade(this);
	return requestFacade;
    }

    public Context getContext() {
	return context;
    }

    public void setResponse(Response response) {
	this.response = response;
    }

    public Response getResponse() {
	return response;
    }

    public boolean isRequestedSessionIdFromCookie() {
	return sessionIdFromCookie;
    }

    public boolean isRequestedSessionIdFromURL() {
	return sessionIdFromURL;
    }

    public void setRequestedSessionIdFromCookie(boolean newState){
	// use the argument rather than always setting true
	sessionIdFromCookie=newState;
    }
 
    public void setRequestedSessionIdFromURL(boolean newState) {
	sessionIdFromURL=newState;
    }

    public void setContext(Context context) {
	this.context = context;
    }

    public void setContextManager( ContextManager cm ) {
	contextM=cm;
    }

    public ContextManager getContextManager() {
	return contextM;
    }

    public Cookie[] getCookies() {
	// XXX need to use Cookie[], Vector is not needed
	if( ! didCookies ) {
	    // XXX need a better test
	    // XXX need to use adapter for hints
	    didCookies=true;
	    RequestUtil.processCookies( this, cookies );
	}

	Cookie[] cookieArray = new Cookie[cookies.size()];

	for (int i = 0; i < cookies.size(); i ++) {
	    cookieArray[i] = (Cookie)cookies.elementAt(i);
	}

	return cookieArray;
	//        return cookies;
    }

    public HttpSession getSession(boolean create) {

	// use the cached value, unless it is invalid
	if( serverSession!=null ) {
	    // Detect "invalidity" by trying to access a property
	    try {
		serverSession.getCreationTime();
		return (serverSession);
	    } catch (IllegalStateException e) {
		// It's invalid, so pretend we never saw it
		serverSession = null;
		reqSessionId = null;
	    }
	}
	
	SessionManager sM=context.getSessionManager();

	// if the interceptors found a request id, use it
	if( reqSessionId != null ) {
	    // we have a session !
	    serverSession=sM.findSession( context, reqSessionId );
	    if( serverSession!=null) return serverSession;
	}

	if( ! create )
	    return null;

	// no session exists, create flag
	serverSession =sM.createSession( context );
	reqSessionId = serverSession.getId();

	// XXX XXX will be changed - post-request Interceptors
	// ( to be defined) will set the session id in response,
	// SessionManager is just a repository and doesn't deal with
	// request internals.
	// hardcoded - will change!
	response.setSessionId( reqSessionId );

	return serverSession;
    }

    public boolean isRequestedSessionIdValid() {
	// so here we just assume that if we have a session it's,
	// all good, else not.
	HttpSession session = (HttpSession)getSession(false);

	if (session != null) {
	    return true;
	} else {
	    return false;
	}
    }

    // -------------------- LookupResult
    public ServletWrapper getWrapper() {
	return handler;
    }

    public void setWrapper(ServletWrapper handler) {
	this.handler=handler;
    }

    public Container getContainer() {
	return container;
    }

    public void setContainer(Container container) {
	this.container=container;
    }

    /** The file - result of mapping the request ( using aliases and
     *  other mapping rules ). Useful only for static resources.
     */
    public String getMappedPath() {
	return mappedPath;
    }

    public void setMappedPath( String m ) {
	mappedPath=m;
    }

    public void setRequestURI( String r ) {
 	this.requestURI=r;
    }

    public void setParameters( Hashtable h ) {
	if(h!=null)
	    this.parameters=h;
	// XXX Should we override query parameters ??
    }

    public Hashtable getParameters() {
	return parameters;
    }

    public void setContentLength( int  len ) {
	this.contentLength=len;
    }

    public void setContentType( String type ) {
	this.contentType=type;
    }

    public void setCharEncoding( String enc ) {
	this.charEncoding=enc;
    }

    public void setAuthType(String authType) {
        this.authType = authType;
    }


    public void setPathInfo(String pathInfo) {
        this.pathInfo = pathInfo;
    }

    /** Set query string - will be called by forward
     */
    public void setQueryString(String queryString) {
	// the query will be processed when getParameter() is called.
	// Or - if you already have it parsed, call setParameters()
	this.queryString = queryString;
    }

    public void setSession(HttpSession serverSession) {
	this.serverSession = serverSession;
    }

    public void setServletPath(String servletPath) {
	this.servletPath = servletPath;
    }


    // XXX
    // the server name should be pulled from a server object of some
    // sort, not just set and got.

    /** Virtual host */
    public void setServerName(String serverName) {
	this.serverName = serverName;
    }

    // -------------------- Attributes
    public Object getAttribute(String name) {
        return attributes.get(name);
    }

    public void setAttribute(String name, Object value) {
	if(name!=null && value!=null)
	    attributes.put(name, value);
    }

    public void removeAttribute(String name) {
	attributes.remove(name);
    }

    public Enumeration getAttributeNames() {
        return attributes.keys();
    }
    // End Attributes

    // -------------------- Facade for MimeHeaders
    public Enumeration getHeaders(String name) {
	//	Vector v = reqA.getMimeHeaders().getHeadersVector(name);
	Vector v = getMimeHeaders().getHeadersVector(name);
	return v.elements();
    }

    // -------------------- Utils - facade for RequestUtil
    public BufferedReader getReader()
	throws IOException
    {
	return RequestUtil.getReader( this );
    }

    private void handleParameters() {
	//Start:Jun Inamori modified
	HttpServletRequest hsq=getFacade();
	Locale req_l=hsq.getLocale();
	String enc=LocaleToJavaEncMap.getEncoding(req_l);
	//End:Jun Inamori modified
   	if(!didParameters) {
	    String qString=getQueryString();
	    if(qString!=null) {
		didParameters=true;
		//Start:Jun Inamori modified
		RequestUtil.processFormData( qString, parameters,enc);
		//End:Jun Inamori modified
	    }
	}
	if (!didReadFormData) {
	    didReadFormData = true;
	    //Start:Jun Inamori modified
	    Hashtable postParameters=RequestUtil.readFormData( this ,enc);
	    //End:Jun Inamori modified
	    if(postParameters!=null)
		parameters = RequestUtil.mergeParameters(parameters, postParameters);
	}
    }

    // -------------------- End utils
    public void recycle() {
	response = null;
	context = null;
        attributes.clear();
        parameters.clear();
        cookies.removeAllElements();
	//        requestURI = null;
	//        queryString = null;
        contentLength = -1;
        contentType = null;
        charEncoding = null;
        authType = null;
        remoteUser = null;
        reqSessionId = null;
	serverSession = null;
	didParameters = false;
	didReadFormData = false;
	didCookies = false;
	container=null;
	handler=null;
	jvmRoute = null;
	scheme = "http";// no need to use Constants
	method = "GET";
	requestURI="/";
	queryString=null;
	protocol="HTTP/1.0";
	headers.clear(); // XXX use recycle pattern
	serverName="localhost";
	serverPort=8080;
	pathTranslated=null;
	pathInfo=null;
	pathTranslatedIsSet=false;
	    
	// XXX a request need to override those if it cares
	// about security
	remoteAddr="127.0.0.1";
	remoteHost="localhost";

    }

    public MimeHeaders getMimeHeaders() {
	return headers;
    }

    public String getHeader(String name) {
        return headers.getHeader(name);
    }

    public Enumeration getHeaderNames() {
        return headers.names();
    }

    public ServletInputStream getInputStream() throws IOException {
    	return in;
    }

    public int getServerPort() {
        return serverPort;
    }

    public String getRemoteAddr() {
        return remoteAddr;
    }

    public String getRemoteHost() {
	return remoteHost;
    }

    /** Fill in the buffer. This method is probably easier to implement
	than previous.
	This method should only be called from ServletInputStream
	implementations.
	No need to implement it if your adapter implements ServletInputStream.
     */
    // you need to override this method if you want non-empty InputStream
    public  int doRead( byte b[], int off, int len ) throws IOException {
	return -1; // not implemented - implement getInputStream
    }


    // XXX I hate this - but the only way to remove this method from the
    // interface is to implement it on top of doRead(b[]).
    // Don't use this method if you can ( it is bad for performance !!)
    // you need to override this method if you want non-empty InputStream
    public int doRead() throws IOException {
	return -1;
    }

    // -------------------- "cooked" info --------------------
    // Hints = return null if you don't know,
    // and Tom will find the value. You can also use the static
    // methods in RequestImpl

    // What's between context path and servlet name ( /servlet )
    // A smart server may use arbitrary prefixes and rewriting
    public String getServletPrefix() {
	return null;
    }

    public void setScheme( String scheme ) {
	this.scheme=scheme;
    }

    public void setMethod( String method ) {
	this.method=method;
    }

    public void setProtocol( String protocol ) {
	this.protocol=protocol;
    }

    public void setMimeHeaders( MimeHeaders headers ) {
	this.headers=headers;
    }

    public void setBody( StringBuffer body ) {
	// ???
    }

    public void setServerPort(int serverPort ) {
	this.serverPort=serverPort;
    }

    public void setRemoteAddr( String remoteAddr ) {
	this.remoteAddr=remoteAddr;
    }

    public void setRemoteHost(String remoteHost) {
	this.remoteHost=remoteHost;
    }

    public String toString() {
	StringBuffer sb=new StringBuffer();
	sb.append( "R( ");
	if( context!=null) {
	    sb.append( context.getPath() );
	    if( getServletPath() != null )
		sb.append( " + " + getServletPath() + " + " + getPathInfo());
	    else
		sb.append( " + " + getLookupPath());
	} else {
	    sb.append(getRequestURI());
	}
	sb.append(")");
	return sb.toString();
    }

    public String toStringDebug() {
	StringBuffer sb=new StringBuffer();
	sb.append( "Request( " + context ).append("\n");
	sb.append( "    URI:" + getRequestURI()  ).append("\n");
	sb.append( "    SP:" + getServletPath() );
	sb.append( ",PI:" + getPathInfo() );
	sb.append( ",LP:" + getLookupPath() );
	sb.append( ",MP:" + getMappedPath() );
	sb.append( "," + getWrapper() +") ");
	return sb.toString();
    }
}

*** Source for ***
*** org.apache.tomcat.util.RequestUtil ***

package org.apache.tomcat.util;

import org.apache.tomcat.core.*;
import org.apache.tomcat.core.Constants;
import java.io.*;
import java.net.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;
import java.text.*;

/**
 * Useful methods for request processing. Used to be in ServerRequest
 * or Request, but most are useful in other adapters.
 *
 * @author James Duncan Davidson [duncan@eng.sun.com]
 * @author James Todd [gonzo@eng.sun.com]
 * @author Jason Hunter [jch@eng.sun.com]
 * @author Harish Prabandham
 * @author costin@eng.sun.com
 */
public class RequestUtil {

    //Start:Jun Inamori modified
    public static Hashtable readFormData( Request request,String enc ) {
    //End:Jun Inamori modified

        String contentType=request.getContentType();
	if (contentType != null) {
            // keep everything before the ';' ( substring's end index is
            // exclusive, so no "-1" is needed )
            if (contentType.indexOf(";")>0)
                contentType=contentType.substring(0,contentType.indexOf(";"));
            contentType = contentType.toLowerCase().trim();
        }

	int contentLength=request.getContentLength();

	if (contentType != null &&
            contentType.startsWith("application/x-www-form-urlencoded")) {
	    try {
		ServletInputStream is=request.getInputStream();
		//Start:Jun Inamori modified
		Hashtable postParameters =
		    parsePostData(contentLength, is, enc);
		//End:Jun Inamori modified
		return postParameters;
	    }
	    catch (IOException e) {
		// nothing
		// XXX at least warn ?
	    }
        }
	return null;
    }

    public static Hashtable mergeParameters(Hashtable one,
					     Hashtable two) {
	// Try some shortcuts
	if (one.size() == 0) {
	    return two;
	}

	if (two.size() == 0) {
	    return one;
	}

	Hashtable combined = (Hashtable) one.clone();

        Enumeration e = two.keys();

	while (e.hasMoreElements()) {
	    String name = (String) e.nextElement();
	    String[] oneValue = (String[]) one.get(name);
	    String[] twoValue = (String[]) two.get(name);
	    String[] combinedValue;

	    if (oneValue == null) {
		combinedValue = twoValue;
	    }

	    else {
		combinedValue = new String[oneValue.length + twoValue.length];

	        System.arraycopy(oneValue, 0, combinedValue, 0,
                    oneValue.length);
	        System.arraycopy(twoValue, 0, combinedValue,
                    oneValue.length, twoValue.length);
	    }

	    combined.put(name, combinedValue);
	}

	return combined;
    }

    public static BufferedReader getReader(Request request)
	throws IOException {
        // XXX
	// this won't work in keep alive scenarios. We need to provide
	// a buffered reader that won't try to read in the stream
	// past the content length -- if we don't, the buffered reader
	// will probably try to read into the next request... bad!
        String encoding = request.getCharacterEncoding();
        if (encoding == null) {
            encoding = Constants.DEFAULT_CHAR_ENCODING;
        }
	InputStreamReader r =
            new InputStreamReader(request.getInputStream(), encoding);
	return new BufferedReader(r);
    }

    public static void processCookies( Request request, Vector cookies ) {
	// XXX bug in original RequestImpl - might not work if multiple
	// cookie headers.
	//
	// XXX need to use the cookies hint in RequestAdapter
    	String cookieString = request.getHeader("cookie");
	
	if (cookieString != null) {
            StringTokenizer tok = new StringTokenizer(cookieString,
                                                      ";", false);
            while (tok.hasMoreTokens()) {
                String token = tok.nextToken();
                int i = token.indexOf("=");
                if (i > -1) {

                    // XXX
                    // the trims here are a *hack* -- this should
                    // be more properly fixed to be spec compliant
                    
                    String name = token.substring(0, i).trim();
                    String value =
                        token.substring(i+1, token.length()).trim();
		    // RFC 2109 and bug 
		    value=stripQuote( value );
                    Cookie cookie = new Cookie(name, value);
                    cookies.addElement(cookie);
                } else {
                    // we have a bad cookie.... just let it go
                }
            }
        }
    }

    
    /**
     *
     * Strips quotes from the start and end of the cookie string
     * This conforms to RFC 2109
     * 
     * @param value            a <code>String</code> specifying the
     *                         cookie value (possibly quoted).
     *
     * @see #setValue
     *
     */
    private static String stripQuote( String value )
    {
	//	System.out.println("Strip quote from " + value );
	if (((value.startsWith("\"")) && (value.endsWith("\""))) ||
	    ((value.startsWith("'") && (value.endsWith("'"))))) {
	    try {
		return value.substring(1,value.length()-1);
	    } catch (Exception ex) { 
	    }
	}
	return value;
    }  
    
    //Start:Jun Inamori modified
    public static void processFormData(String data,
				       Hashtable parameters, String enc) {
    //End:Jun Inamori modified
        // XXX
        // there's got to be a faster way of doing this.
	if( data==null ) return; // no parameters
        StringTokenizer tok = new StringTokenizer(data, "&", false);
        while (tok.hasMoreTokens()) {
            String pair = tok.nextToken();
	    int pos = pair.indexOf('=');
	    if (pos != -1) {
		//Start:Jun Inamori modified
		String key = unUrlDecode(pair.substring(0, pos),enc);
		String value = unUrlDecode(pair.substring(pos+1,
							  pair.length()),enc);
		//End:Jun Inamori modified
		String values[];
		if (parameters.containsKey(key)) {
		    String oldValues[] = (String[])parameters.get(key);
		    values = new String[oldValues.length + 1];
		    for (int i = 0; i < oldValues.length; i++) {
			values[i] = oldValues[i];
		    }
		    values[oldValues.length] = value;
		} else {
		    values = new String[1];
		    values[0] = value;
		}
		parameters.put(key, values);
	    } else {
		// we don't have a valid chunk of form data, ignore
	    }
        }
    }

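    public static int readData(InputStream in, byte buf[], int length) {
        int read = 0;
        try {
            while (read < length) {
                int n = in.read(buf, read, length - read);
                if (n == -1) {
                    // Stream ended before 'length' bytes arrived. Adding -1
                    // to the running count would corrupt it, so stop here.
                    return -1;
                }
                read += n;
            }
        } catch (IOException e) {
            // XXX swallowed: the caller only sees a short count
        }
	return read;
    }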

    /**
     * This method decodes the given urlencoded string.
     *
     * @param  str the url-encoded string
     * @return the decoded string
     * @exception IllegalArgumentException If a '%' is not
     * followed by a valid 2-digit hex number.
     *
     * @author: cut & paste from JServ, much faster than previous tomcat impl
     */
    public final static String URLDecode(String str)
	throws NumberFormatException, StringIndexOutOfBoundsException
    {
        if (str == null)  return  null;

        StringBuffer dec = new StringBuffer();    // decoded string output
        int strPos = 0;
        int strLen = str.length();

        dec.ensureCapacity(str.length());
        while (strPos < strLen) {
            int laPos;        // lookahead position

            // look ahead to next URLencoded metacharacter, if any
            for (laPos = strPos; laPos < strLen; laPos++) {
                char laChar = str.charAt(laPos);
                if ((laChar == '+') || (laChar == '%')) {
                    break;
                }
            }

            // if there were non-metacharacters, copy them all as a block
            if (laPos > strPos) {
                dec.append(str.substring(strPos,laPos));
                strPos = laPos;
            }

            // shortcut out of here if we're at the end of the string
            if (strPos >= strLen) {
                break;
            }

            // process next metacharacter
            char metaChar = str.charAt(strPos);
            if (metaChar == '+') {
                dec.append(' ');
                strPos++;
                continue;
            } else if (metaChar == '%') {
		// We throw the original exception - the super will deal with it
		//                try {
		dec.append((char) Integer.parseInt(
						   str.substring(strPos + 1, strPos + 3), 16));
		//                } catch (NumberFormatException e) {
		//                    throw new IllegalArgumentException("invalid hexadecimal "
		//                    + str.substring(strPos + 1, strPos + 3)
		//                    + " in URLencoded string (illegal unescaped '%'?)" );
		//                } catch (StringIndexOutOfBoundsException e) {
		//                    throw new IllegalArgumentException("illegal unescaped '%' "
		//                    + " in URLencoded string" );
		//                }
                strPos += 3;
            }
        }

        return dec.toString();
    }

    //Start:Jun Inamori modified
    /**
     *
     * Parses data from an HTML form that the client sends to 
     * the server using the HTTP POST method and the 
     * <i>application/x-www-form-urlencoded</i> MIME type.
     *
     * <p>The data sent by the POST method contains key-value
     * pairs. A key can appear more than once in the POST data
     * with different values. However, the key appears only once in 
     * the hashtable, with its value being
     * an array of strings containing the multiple values sent
     * by the POST method.
     *
     * <p>The keys and values in the hashtable are stored in their
     * decoded form, so
     * any + characters are converted to spaces, and characters
     * sent in hexadecimal notation (like <i>%xx</i>) are
     * converted back to bytes and decoded using <code>enc</code>.
     *
     *
     *
     * @param len	an integer specifying the length,
     *			in characters, of the 
     *			<code>ServletInputStream</code>
     *			object that is also passed to this
     *			method
     *
     * @param in	the <code>ServletInputStream</code>
     *			object that contains the data sent
     *			from the client
     * 
     * @param enc	the <code>String</code>
     *			which represents the character
     *			encoding
     * 
     * @return		a <code>Hashtable</code> object built
     *			from the parsed key-value pairs
     *
     *
     * @exception IllegalArgumentException	if the data
     *			sent by the POST method is invalid
     *
     */
    static public Hashtable parsePostData(int len, 
					  ServletInputStream in,String enc)
    {
	// XXX
	// should a length of 0 be an IllegalArgumentException
	
	if (len <=0)
	    return new Hashtable(); // cheap hack to return an empty hash

	if (in == null) {
	    throw new IllegalArgumentException();
	}
	
	//
	// Make sure we read the entire POSTed body.
	//
        byte[] postedBytes = new byte [len];
        try {
            int offset = 0;
       
	    do {
		int inputLen = in.read (postedBytes, offset, len - offset);
		if (inputLen <= 0) {
		    throw new IllegalArgumentException ("Error reading parameters.");
		}
		offset += inputLen;
	    } while ((len - offset) > 0);

	} catch (IOException e) {
	    throw new IllegalArgumentException(e.getMessage());
	}

        // XXX we shouldn't assume that the only kind of POST body
        // is FORM data encoded using ASCII or ISO Latin/1 ... or
        // that the body should always be treated as FORM data.
        //

        try {
            String postedBody = new String(postedBytes, 0, len, "8859_1");
            return parseQueryString(postedBody,enc);

        } catch (java.io.UnsupportedEncodingException e) {
            // XXX function should accept an encoding parameter & throw this
            // exception.  Otherwise throw something expected.
            throw new IllegalArgumentException(e.getMessage());
        }
    }
    //End:Jun Inamori modified

    //Start:Jun Inamori modified
    static public Hashtable parseQueryString(String s,String enc) {

	String valArray[] = null;
	
	if (s == null) {
	    throw new IllegalArgumentException();
	}
	Hashtable ht = new Hashtable();
	StringTokenizer st = new StringTokenizer(s, "&");
	while (st.hasMoreTokens()) {
	    String pair = (String)st.nextToken();
	    int pos = pair.indexOf('=');
	    if (pos == -1) {
		// XXX
		// should give more detail about the illegal argument
		throw new IllegalArgumentException();
	    }
	    String key = unUrlDecode(pair.substring(0, pos),enc);
	    String val = unUrlDecode(pair.substring(pos+1, pair.length()),enc);
	    if (ht.containsKey(key)) {
		String oldVals[] = (String []) ht.get(key);
		valArray = new String[oldVals.length + 1];
		for (int i = 0; i < oldVals.length; i++) 
		    valArray[i] = oldVals[i];
		valArray[oldVals.length] = val;
	    } else {
		valArray = new String[1];
		valArray[0] = val;
	    }
	    ht.put(key, valArray);
	}
	return ht;
    }
    //End:Jun Inamori modified

    //Start:Jun Inamori modified
    public static String unUrlDecode(String data, String enc) {
	ByteArrayOutputStream baos=new ByteArrayOutputStream();
        for(int i=0; i<data.length(); i++) {
            char c = data.charAt(i);
            switch (c) {
                case '+':
		    baos.write((int)' ');
                    break;
                case '%':
                    try {
			int i_v=Integer.parseInt(data.substring(i+1,i+3),16);
			baos.write(i_v);
			i+=2;
                    }
                    catch (NumberFormatException e) {
                        throw new IllegalArgumentException();
                    }
		    catch (StringIndexOutOfBoundsException e) {
			String rest  = data.substring(i);
			for(int j=0;j<rest.length();j++){
			    char rc=rest.charAt(j);
			    baos.write((int)rc);
			}
			if (rest.length()==2)
			    i++;
		    }
                    break;
                default:
		    baos.write((int)c);
                    break;
            }
        }
	String str=null;
	try{
	    str=baos.toString(enc);
	}
	catch(Exception ex){
	    str=baos.toString();
	}
	return str;
    }           
    //End:Jun Inamori modified

    // Basically return everything after ";charset="
    // If no charset is specified, return null and let the caller apply
    // the HTTP default character set.
    public static String getCharsetFromContentType(String type) {
        if (type == null) {
            return null;
        }
        int semi = type.indexOf(";");
        if (semi == -1) {
            return null;
        }
        String afterSemi = type.substring(semi + 1);
        int charsetLocation = afterSemi.indexOf("charset=");
        if (charsetLocation == -1) {
            return null;
        }
        String afterCharset = afterSemi.substring(charsetLocation + 8);
        String encoding = afterCharset.trim();
        return encoding;
    }

    public static Enumeration getLocales(HttpServletRequest req) {
	String acceptLanguage = req.getHeader("Accept-Language");
	// Short circuit with an empty enumeration if null header
        if (acceptLanguage == null) {
            Vector def = new Vector();
            def.addElement(Locale.getDefault());
            return def.elements();
        }

        Hashtable languages = new Hashtable();

        StringTokenizer languageTokenizer =
            new StringTokenizer(acceptLanguage, ",");

        while (languageTokenizer.hasMoreTokens()) {
            String language = languageTokenizer.nextToken().trim();
            int qValueIndex = language.indexOf(';');
            int qIndex = language.indexOf('q');
            int equalIndex = language.indexOf('=');
            Double qValue = new Double(1);

            if (qValueIndex > -1 &&
                qValueIndex < qIndex &&
                qIndex < equalIndex) {
	        String qValueStr = language.substring(qValueIndex + 1);

                language = language.substring(0, qValueIndex);
                qValueStr = qValueStr.trim().toLowerCase();
                qValueIndex = qValueStr.indexOf('=');
                qValue = new Double(0);

                if (qValueStr.startsWith("q") &&
                    qValueIndex > -1) {
                    qValueStr = qValueStr.substring(qValueIndex + 1);

                    try {
                        qValue = new Double(qValueStr.trim());
                    } catch (NumberFormatException nfe) {
                    }
                }
            }

	    // XXX
	    // may need to handle "*" at some point in time

	    if (! language.equals("*")) {
	        String key = qValue.toString();
		Vector v = (Vector)((languages.containsKey(key)) ?
		    languages.get(key) : new Vector());

		v.addElement(language);
		languages.put(key, v);
	    }
        }

        if (languages.size() == 0) {
            Vector v = new Vector();

           
            v.addElement(org.apache.tomcat.core.Constants.LOCALE_DEFAULT);
            languages.put("1.0", v);
        }

        Vector l = new Vector();
        Enumeration e = languages.keys();

        while (e.hasMoreElements()) {
            String key = (String)e.nextElement();
            Vector v = (Vector)languages.get(key);
            Enumeration le = v.elements();

            while (le.hasMoreElements()) {
	        String language = (String)le.nextElement();
		String country = "";
		int countryIndex = language.indexOf("-");

		if (countryIndex > -1) {
		    country = language.substring(countryIndex + 1).trim();
		    language = language.substring(0, countryIndex).trim();
		}

                l.addElement(new Locale(language, country));
            }
        }

        return l.elements();
    }



    /* -------------------- From HttpDate -------------------- */
    // Parse date - XXX This code is _very_ slow ( 3 parsers,
    // GregorianCalendar, etc ). It was moved out to avoid creating
    // 1 Calendar instance ( and an associated parsing ) per header
    // ( the Calendar was created in HttpDate, which was created for
    // each HeaderField ).
    // This also avoids passing HttpHeaders - which was required to access
    // HttpHeaderField to access HttpDate to access the parsing code.
    
    // we force our locale here as all http dates are in english
    private final static Locale loc = Locale.US;

    // all http dates are expressed as time at GMT
    private final static TimeZone zone = TimeZone.getTimeZone("GMT");

    // format for RFC 1123 date string -- "Sun, 06 Nov 1994 08:49:37 GMT"
    private final static String rfc1123Pattern =
	"EEE, dd MMM yyyyy HH:mm:ss z";

    // format for RFC 1036 date string -- "Sunday, 06-Nov-94 08:49:37 GMT"
    private final static String rfc1036Pattern =
	"EEEEEEEEE, dd-MMM-yy HH:mm:ss z";

    // format for C asctime() date string -- "Sun Nov  6 08:49:37 1994"
    private final static String asctimePattern =
	"EEE MMM d HH:mm:ss yyyyy";
    
    private final static SimpleDateFormat rfc1123Format =
	new SimpleDateFormat(rfc1123Pattern, loc);
    
    private final static SimpleDateFormat rfc1036Format =
	new SimpleDateFormat(rfc1036Pattern, loc);
    
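    private final static SimpleDateFormat asctimeFormat =
	new SimpleDateFormat(asctimePattern, loc);

    // asctime() strings carry no zone, so without this they would be
    // parsed in the JVM's default time zone instead of GMT.
    static {
	rfc1123Format.setTimeZone(zone);
	rfc1036Format.setTimeZone(zone);
	asctimeFormat.setTimeZone(zone);
    }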

    public static long toDate( String dateString ) {
	// XXX
	Date date=null;
	try {
            date = rfc1123Format.parse(dateString);
	} catch (ParseException e) { }
	
        if( date==null)
	    try {
		date = rfc1036Format.parse(dateString);
	    } catch (ParseException e) { }
	
        if( date==null)
	    try {
		date = asctimeFormat.parse(dateString);
	    } catch (ParseException pe) {
	    }

	if(date==null) {
	    return -1;
	}

	// Original code was: 
	//	Calendar calendar = new GregorianCalendar(zone, loc);
	//calendar.setTime(date);
	// calendar.getTime().getTime();
	return date.getTime();
    }
    
}
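
To make the difference between the two decoding styles concrete, here is a
minimal standalone sketch. It is not part of the patch above: the class and
method names (PercentDecodeSketch, decodeByCharCast, decodeByBytes) are
hypothetical, and it assumes that every unescaped character in the input is
plain ASCII.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical demo class; none of these names exist in Tomcat.
class PercentDecodeSketch {

    // Returns the byte value of the two hex digits following the '%' at index i.
    static int hexByte(String s, int i) {
        if (i + 2 >= s.length()) {
            throw new IllegalArgumentException("truncated escape at index " + i);
        }
        int hi = Character.digit(s.charAt(i + 1), 16);
        int lo = Character.digit(s.charAt(i + 2), 16);
        if (hi < 0 || lo < 0) {
            throw new IllegalArgumentException("bad hex digits at index " + i);
        }
        return (hi << 4) | lo;
    }

    // Char-cast style: every %xx byte becomes one char, so a two-byte
    // character ends up as two unrelated chars.
    static String decodeByCharCast(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '+') {
                sb.append(' ');
            } else if (c == '%') {
                sb.append((char) hexByte(s, i));
                i += 2;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Byte style: collect the raw bytes first, then decode them in one step
    // with the encoding supplied by the caller.
    static String decodeByBytes(String s, String enc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '+') {
                out.write(' ');
            } else if (c == '%') {
                out.write(hexByte(s, i));
                i += 2;
            } else {
                out.write(c); // assumption: unescaped chars are ASCII
            }
        }
        return new String(out.toByteArray(), Charset.forName(enc));
    }

    public static void main(String[] args) {
        String escaped = "%82%B1%82%F1%82%C9%82%BF%82%ED";
        System.out.println(decodeByCharCast(escaped).length());
        // Shift_JIS is not one of the charsets every JVM must provide.
        if (Charset.isSupported("Shift_JIS")) {
            System.out.println(decodeByBytes(escaped, "Shift_JIS").length());
        }
    }
}
```

The char-cast version yields one char per escaped byte, while the byte version
lets the charset decoder combine each byte pair into a single character.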

Re: Proposal: RequestImpl

Posted by "Dmitry I. Platonoff" <dp...@descartes.com>.
On Mon, 1 May 2000 15:47:43 -0400, Costin Manolache wrote:


 > Thanks for this very good proposal, we do have a lot of problems
 > with character to/from byte conversions and encoding.

I must admit that the proposal is not exactly good, although I understand
the author and share his concerns.

BTW, did anyone actually read the discussion about the same problem on the
servlet-interest mailing list?

 > I agree that the right way to get the encoding is from Accept-Language:
 > header _and_  Content-Type charset if available ( this is not part of
 > your proposal but I think it have to be used if present !).
 > If Content-Type is not present, I think we need an optimized
 > version of the code to get the JavaEnc, eventually without going
 > through Locale ( i.e. parse only the first component of the header
 > with simple code, and use it directly.)

There's also an "Accept-Charset" header, which can also contain a valid
charset. Moreover, this header is used much more often in real life,
but it can also be incorrect, and that happens very often (at least, it was
true for fairly recent versions of Netscape Communicator).

 > ( Accept-Language is important for the output too, but I agree it's a
 > reasonable guess for input if charset is not specified in Content-Type


It is NOT a reasonable guess for the input charset. For example:

- there might be several charsets used by the people who have a particular
locale set in their browser, and none of them is a clear favorite, so you
can never tell the charset by knowing just a locale.

- another situation: I have a Russian locale in my browser as a default,
and I've decided to visit a French site and fill in a form in French. Then
imagine: my text is in French (suppose it's a Latin1 charset), but there's
no charset in the Content-Type header, and the Accept-Language says
"Russian".

 > - I know this is a very important issue - and we need to find a good
 > solution, but it's important to do it in a clean way. I can understand
 > what happens if I look at the code, but it's not easy ( I'm talking
 > about tomcat code, not your code ). If we can factor out the
 > encoding/decoding probably everything will be much simpler.

What about the solution everybody agreed on in servlet-interest? We should
not guess the charset at all, leaving it to the servlet developer or custom
software to determine the encoding. We can never actually tell it with 100%
accuracy. But there should be a .setCharacterEncoding() method in place for
the ServletRequest, and the parsers should honor the encoding set
with this method. As for the implementation, I agree with you: the
stream approach is not the most efficient solution, but it could easily be
done using a single byte[] buffer.
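
A rough sketch of how those two ideas could fit together, under stated
assumptions: DeferredParamsSketch is a hypothetical wrapper (not the Servlet
API and not Tomcat code), ISO-8859-1 is an assumed default, and parsing is
deferred until the first lookup so that the application can set the encoding
first. A single scratch byte[] is shared by all keys and values.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical request wrapper used only for illustration.
class DeferredParamsSketch {
    private final String rawQuery;
    private Charset encoding = Charset.forName("ISO-8859-1"); // assumed default
    private Map<String, List<String>> params; // filled on first lookup

    DeferredParamsSketch(String rawQuery) {
        this.rawQuery = rawQuery;
    }

    // The application, not the container, picks the encoding. It has to do
    // so before the first lookup triggers parsing.
    void setCharacterEncoding(String enc) {
        if (params != null) {
            throw new IllegalStateException("parameters already parsed");
        }
        encoding = Charset.forName(enc);
    }

    String getParameter(String name) {
        if (params == null) {
            params = parse();
        }
        List<String> values = params.get(name);
        return values == null ? null : values.get(0);
    }

    private Map<String, List<String>> parse() {
        Map<String, List<String>> m = new LinkedHashMap<String, List<String>>();
        // One scratch buffer for the whole query: decoded output is never
        // longer than its escaped input.
        byte[] buf = new byte[rawQuery.length()];
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // not a key=value chunk, ignore it
            }
            String key = decode(pair, 0, eq, buf);
            String value = decode(pair, eq + 1, pair.length(), buf);
            List<String> values = m.get(key);
            if (values == null) {
                values = new ArrayList<String>();
                m.put(key, values);
            }
            values.add(value);
        }
        return m;
    }

    // Decodes s[from, to) into buf, then converts the bytes in one step.
    private String decode(String s, int from, int to, byte[] buf) {
        int n = 0;
        for (int i = from; i < to; i++) {
            char c = s.charAt(i);
            if (c == '+') {
                buf[n++] = ' ';
            } else if (c == '%' && i + 2 < to) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad escape in " + s);
                }
                buf[n++] = (byte) ((hi << 4) | lo);
                i += 2;
            } else {
                buf[n++] = (byte) c; // assumes ASCII; a truncated '%' is copied as is
            }
        }
        return new String(buf, 0, n, encoding);
    }
}
```

A servlet would call setCharacterEncoding() once it knows which charset its
own form page was served in, and only then read parameters.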


Sincerely,
Dmitry.



Re: Proposal: RequestImpl

Posted by Costin Manolache <co...@costin.dnt.ro>.
Jun Inamori wrote:

> Hello,
>
> 'HttpServletRequest.getParameter(key)' can't return the correct
> parameter string,  when the original character sequence contains the 2
> bytes characters, such as Japanese character.
> As you know, 2 bytes characters are encoded like:
>    "%82%B1%82%F1%82%C9%82%BF%82%ED"
> The first Japanese character consists of '82' and 'B1' and the second of
> '82' and 'F1'.

Hi,

Thanks for this very good proposal, we do have a lot of problems with
character to/from byte conversions and encoding.

I have a few comments/questions:

- getLocale()
Locales are constructed from the Accept-Language: header, and if you look
at RequestUtil you'll notice the code is very "expensive" - a lot of objects
are created, the parsing is very complex, a new Locale object is allocated
( and that creates a few other objects and has a slow init time), etc.
I don't think it's a good idea to use it at the engine level - it can be
used by servlets, but I would like something a bit faster if it'll be part
of the critical loop.

I agree that the right way to get the encoding is from the Accept-Language:
header _and_ the Content-Type charset if available ( this is not part of your
proposal but I think it has to be used if present !). If Content-Type
is not present, I think we need an optimized version of the code to get
the JavaEnc, possibly without going through Locale ( i.e. parse only
the first component of the header with simple code, and use it directly.)
( Accept-Language is important for the output too, but I agree it's a
reasonable guess for input if charset is not specified in Content-Type ).


- Decoding using the ByteArrayOS is very expensive in terms of Garbage
Collection (GC). GC is right now the main performance problem
in tomcat. We will also need to decode if the user calls getReader().
I think we need to find a way to reuse the objects and avoid excessive
usage of Strings. ByteArrayOS also creates byte[] buffers -> more GC.

One good way to deal with that, which is not covered in your proposal, is to
use Readers/Writers.  I'm still looking for a way to reuse instances of
Readers/Writers ( they allocate byte[] buffers too, plus Encoders, Decoders ).

Probably a pool of Reader/Writers acting as encoders/decoders might do
the trick, or reimplementing the encode/decode in a reusable way.
( XML projects - xalan, crimson - use optimized byte/char converters
for common encodings - with little GC and fast execution time).

- I know this is a very important issue - and we need to find a good solution,
but it's important to do it in a clean way. I can understand what happens
if I look at the code, but it's not easy ( I'm talking about tomcat code,
not your code ). If we can factor out the encoding/decoding probably
everything will be much simpler.


- Can you send a DIFF? It's much easier to read and patch.
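
On the Content-Type point above, a minimal sketch of pulling out the charset
parameter with simple code, assuming the header value is passed in as a plain
String. CharsetParamSketch, charsetOf and chooseEncoding are hypothetical
names, and the fallback is left to the caller rather than guessed here.

```java
import java.util.Locale;

// Hypothetical helper; not existing Tomcat code.
class CharsetParamSketch {

    // Returns the charset parameter of a Content-Type value, or null if
    // there is none. Simplification: splitting on ';' would mis-handle a
    // quoted parameter value that itself contains ';'.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the media type
            String p = parts[i].trim();
            int eq = p.indexOf('=');
            if (eq < 0) {
                continue;
            }
            // Parameter names are compared case-insensitively.
            String name = p.substring(0, eq).trim().toLowerCase(Locale.ENGLISH);
            if (!name.equals("charset")) {
                continue;
            }
            String value = p.substring(eq + 1).trim();
            if (value.length() >= 2
                    && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            return value.length() == 0 ? null : value;
        }
        return null;
    }

    // Uses the declared charset when present, else the caller's fallback.
    static String chooseEncoding(String contentType, String fallback) {
        String declared = charsetOf(contentType);
        return declared != null ? declared : fallback;
    }
}
```

Unlike taking everything after "charset=", this stops at the next parameter
and accepts "Charset=" in any letter case.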


Costin