Posted to dev@tomcat.apache.org by Arion <ar...@talentinfo.com.hk> on 2000/05/22 11:51:17 UTC

Re: Multi byte language problem in request.getParameter()

Hi!

Actually, you shouldn't use non-ASCII characters in the URL. Who knows whether
the low byte of a character happens to be a space? (FYI, a space
should be replaced by '+' in a URL.)

As for Tomcat, request.getParameter does not return the value
in the correct encoding. You can use String.getBytes and new String to
convert the characters (possibly with the POST method).
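
The getBytes / new String conversion mentioned above could look roughly like
the sketch below. It assumes (this is not stated in the thread) that the
container decoded the raw parameter bytes as ISO-8859-1 and that the browser
actually sent EUC-KR; ParamRecoder and recode are made-up names used only for
illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ParamRecoder {
    // Assumption: the container turned each raw byte into one char using
    // ISO-8859-1, so getBytes(ISO_8859_1) gives the original bytes back.
    static String recode(String misdecoded, Charset actual) {
        if (misdecoded == null) {
            return null;
        }
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        Charset eucKr = Charset.forName("EUC-KR");
        // Simulate what a Latin-1 decoding container would hand back
        // for the two Hangul syllables U+D55C U+AE00.
        String misdecoded =
                new String("\uD55C\uAE00".getBytes(eucKr), StandardCharsets.ISO_8859_1);
        System.out.println(recode(misdecoded, eucKr));
    }
}
```

In a JSP this would be applied to the result of request.getParameter("name").
If the container used some other default decoding, the charset passed to
getBytes has to match it, otherwise the round trip will not recover the text.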

Arion

"DonS, Choi" wrote:

> Hi,
>
> I'm using Apache 1.3.13 + Tomcat 3.1 on a Solaris SPARC machine.
>
> In my JSP file (call it a.jsp)
>
> --------------------------------------------
> ....
> out.print(request.getParameter("name"));
> ....
> --------------------------------------------
>
> works when
> http://a.com/a.jsp?name=test
>
> But when
> http://a.com/a.jsp?name=한글
>
> a.jsp doesn't work correctly.
>
> I'm Korean, and my country uses a 2-byte language.
>
> Any help will be greatly appreciated.
> Thanks in advance.


Re: Multi byte language problem in request.getParameter()

Posted by "DonS, Choi" <ds...@dreami.co.kr>.
I wrote a simple JSP (call it paramtest.jsp):

<%  String a = request.getParameter("name");
        out.print(a);
%>

Case 1) When I connect to http://a.com/paramtest.jsp?name=ㄹㄹ,
the Apache log file (/usr/local/apache/logs/access_log) shows
...
[23/May/2000:21:56:39 +0900] "GET /startupclass/paramtest.jsp?name=ㄹㄹ HTTP/1.1" 200 154
....

But the Tomcat log file ($TOMCAT_HOME/logs/jasper.log) shows
....
<JASPER_LOG> Tue May 23 21:58:01 GMT+09:00 2000               RequestURI: /startupclass/paramtest.jsp</JASPER_LOG>
<JASPER_LOG> Tue May 23 21:58:01 GMT+09:00 2000              QueryString: name=</JASPER_LOG>
<JASPER_LOG> Tue May 23 21:58:01 GMT+09:00 2000           Request Params: </JASPER_LOG>
<JASPER_LOG> Tue May 23 21:58:01 GMT+09:00 2000                  name = </JASPER_LOG>
<JASPER_LOG> Tue May 23 21:58:01 GMT+09:00 2000 Classpath according to the Servlet Engine is: /usr/local/apache/htdocs/WEB-INF/classes</JASPER_LOG>
....

As you can see, the QueryString value is empty in the Tomcat log file.

Case 2) When I connect to http://a.com:8080/paramtest.jsp?name=ㄹㄹ,
the Tomcat log file shows
...
<JASPER_LOG> Tue May 23 22:01:36 GMT+09:00 2000               RequestURI: /startupclass/paramtest.jsp</JASPER_LOG>
<JASPER_LOG> Tue May 23 22:01:36 GMT+09:00 2000              QueryString: name=¤?¤?</JASPER_LOG>
<JASPER_LOG> Tue May 23 22:01:36 GMT+09:00 2000           Request Params: </JASPER_LOG>
<JASPER_LOG> Tue May 23 22:01:36 GMT+09:00 2000                  name = ¤?¤?</JASPER_LOG>
<JASPER_LOG> Tue May 23 22:01:36 GMT+09:00 2000 Classpath according to the Servlet Engine is: /usr/local/apache/htdocs/startupclass/WEB-INF/classes</JASPER_LOG>
.....

As you can see, although the QueryString is garbled, Tomcat does receive it.

Why can't Tomcat accept non-ASCII parameters?


----- Original Message ----- 
From: <ed...@apache.org>
To: <to...@jakarta.apache.org>
Sent: Tuesday, May 23, 2000 1:19 PM
Subject: Re: Multi byte language problem in request.getParameter()


> On Mon, 27 Mar 2000, DonS, Choi wrote:
> 
> > I have the following code:
> > 
> > -----------------------------------------
> > ..
> > <script>
> > function search_winner(name)
> > {
> >  location.href= 'e_winner.jsp?sid=' + sid + '&name=' + name;
> > };
> > </script>
> > .....
> > -----------------------------------------------------------------------
> > In the sample code above, "name" is in a 2-byte language (non-ASCII),
> > so Tomcat should handle the URL http://e_winner.jsp?sid=0000&name=나
> > 
> > With WebLogic (our company uses WebLogic), there is no problem.
> > WebLogic can handle a 2-byte language in the URL.
> > 
> > Is there any solution?
> 
> Questions like this belong on tomcat-users.
> 
> The code you've shown is quite broken -- you need to encode name when you
> use it in a URL like this.  Use the output of
> java.net.URLEncoder.encode(name) -- instead of name -- and you should be
> fine.
> 
> I don't know enough about the internals of WebLogic to speculate as to why
> it may work in that environment -- but this code should be fixed in any
> case.  FWIW -- you might also want to apply response.encodeURL to the
> whole URL, so as to allow sessions to work when users turn off cookies.
> 
> Ed

Re: Multi byte language problem in request.getParameter()

Posted by ed...@apache.org.
On Mon, 27 Mar 2000, DonS, Choi wrote:

> I have the following code:
> 
> -----------------------------------------
> ..
> <script>
> function search_winner(name)
> {
>  location.href= 'e_winner.jsp?sid=' + sid + '&name=' + name;
> };
> </script>
> .....
> -----------------------------------------------------------------------
> In the sample code above, "name" is in a 2-byte language (non-ASCII),
> so Tomcat should handle the URL http://e_winner.jsp?sid=0000&name=나
> 
> With WebLogic (our company uses WebLogic), there is no problem.
> WebLogic can handle a 2-byte language in the URL.
> 
> Is there any solution?

Questions like this belong on tomcat-users.

The code you've shown is quite broken -- you need to encode name when you
use it in a URL like this.  Use the output of
java.net.URLEncoder.encode(name) -- instead of name -- and you should be
fine.
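
As a rough illustration of this advice, the sketch below builds the link on
the server side with java.net.URLEncoder. It uses the two-argument encode
overload from later JDKs so that the charset is explicit; EUC-KR is only an
assumption about what the page and the container agree on, and
WinnerLink.build is a hypothetical helper, not something from the original
code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class WinnerLink {
    // Hypothetical helper: builds the e_winner.jsp link with both values
    // percent-encoded in the given charset.
    static String build(String sid, String name, String charset) {
        try {
            return "e_winner.jsp?sid=" + URLEncoder.encode(sid, charset)
                    + "&name=" + URLEncoder.encode(name, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // U+B098 is the Hangul syllable from the example URL; EUC-KR is assumed.
        System.out.println(build("0000", "\uB098", "EUC-KR"));
    }
}
```

The resulting string contains only ASCII characters, and a space inside name
comes out as '+'.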

I don't know enough about the internals of WebLogic to speculate as to why
it may work in that environment -- but this code should be fixed in any
case.  FWIW -- you might also want to apply response.encodeURL to the
whole URL, so as to allow sessions to work when users turn off cookies.

Ed






Re: Multi byte language problem in request.getParameter()

Posted by "DonS, Choi" <ds...@dreami.co.kr>.
I have the following code:

-----------------------------------------
..
<script>
function search_winner(name)
{
 location.href= 'e_winner.jsp?sid=' + sid + '&name=' + name;
};
</script>
.....
-----------------------------------------------------------------------
In the sample code above, "name" is in a 2-byte language (non-ASCII),
so Tomcat should handle the URL http://e_winner.jsp?sid=0000&name=나

With WebLogic (our company uses WebLogic), there is no problem.
WebLogic can handle a 2-byte language in the URL.

Is there any solution?



----- Original Message ----- 
From: <ed...@apache.org>
To: <to...@jakarta.apache.org>
Sent: Tuesday, May 23, 2000 6:36 AM
Subject: Re: Multi byte language problem in request.getParameter()


> On Mon, 22 May 2000, Arion wrote:
> 
> > Hi!
> > 
> > Actually, you shouldn't use non-ASCII characters in the URL. Who knows whether
> > the low byte of a character happens to be a space? (FYI, a space
> > should be replaced by '+' in a URL.)
> > 
> > As for Tomcat, request.getParameter does not return the value
> > in the correct encoding. You can use String.getBytes and new String to
> > convert the characters (possibly with the POST method).
> 
> > "DonS, Choi" wrote:
> [snip]
> > > works when
> > > http://a.com/a.jsp?name=test
> > >
> > > But when
> > > http://a.com/a.jsp?name=한글
> > >
> > > a.jsp doesn't work correctly.
> > >
> > > I'm Korean, and my country uses a 2-byte language.
> 
> I'm curious -- exactly what error do you see with this?  Arion is
> generally correct that you shouldn't enter multi-byte characters directly
> into a URL -- they need to be URL-encoded as pairs of hexadecimal digits
> like: "%C7%D1%B1%DB" (this is the same as the two multi-byte characters
> which you included, assuming my display hasn't munged them).
> 
> When I tried this on my own system (which is set up w/ Japanese multi-byte
> i18n) -- using lynx -- I couldn't generate any incorrect behavior, either
> w/ Japanese or w/ the same characters which you included.  I tried sending
> them both from the command line (unencoded) and from a form entry box.
> 
> I'm not at all sure about the issues with Korean -- it's entirely possible
> that you'll need to do something like what Arion describes: getBytes,
> followed by creating a string with the appropriate encoding, possibly
> after converting the bytes from a different encoding.  I wouldn't be
> surprised if encoding issues (mismatch on input/output?) could cause your
> problem.
> 
> Anyway, I'd like to know how and why it's failing ... my experience with
> i18n has been surprisingly painless, but clearly that's not always the way
> it works out.
> 
> thanks --
> 
> Ed

Re: Multi byte language problem in request.getParameter()

Posted by ed...@apache.org.
On Mon, 22 May 2000, Arion wrote:

> Hi!
> 
> Actually, you shouldn't use non-ASCII characters in the URL. Who knows whether
> the low byte of a character happens to be a space? (FYI, a space
> should be replaced by '+' in a URL.)
> 
> As for Tomcat, request.getParameter does not return the value
> in the correct encoding. You can use String.getBytes and new String to
> convert the characters (possibly with the POST method).

> "DonS, Choi" wrote:
[snip]
> > works when
> > http://a.com/a.jsp?name=test
> >
> > But when
> > http://a.com/a.jsp?name=한글
> >
> > a.jsp doesn't work correctly.
> >
> > I'm Korean, and my country uses a 2-byte language.

I'm curious -- exactly what error do you see with this?  Arion is
generally correct that you shouldn't enter multi-byte characters directly
into a URL -- they need to be URL-encoded as pairs of hexadecimal digits
like: "%C7%D1%B1%DB" (this is the same as the two multi-byte characters
which you included, assuming my display hasn't munged them).
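
To make the hexadecimal form concrete, the following sketch decodes
"%C7%D1%B1%DB" with java.net.URLDecoder under two different charsets. That
the bytes are EUC-KR is an assumption here (the thread does not name the
encoding), and PercentDecodeDemo is a name invented for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class PercentDecodeDemo {
    // Decodes a percent-encoded value, interpreting the bytes in the given charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String encoded = "%C7%D1%B1%DB";
        // Assumed sender encoding: four bytes become two Hangul characters.
        String asKorean = decode(encoded, "EUC-KR");
        // Mismatched decoding: each byte becomes its own Latin-1 character.
        String asLatin1 = decode(encoded, "ISO-8859-1");
        System.out.println(asKorean + " (" + asKorean.length() + " chars)");
        System.out.println(asLatin1 + " (" + asLatin1.length() + " chars)");
    }
}
```

Under EUC-KR the four bytes come back as two Hangul characters, while under
ISO-8859-1 they come back as four unrelated Latin-1 characters, which is the
kind of input/output mismatch that can garble a parameter.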

When I tried this on my own system (which is set up w/ Japanese multi-byte
i18n) -- using lynx -- I couldn't generate any incorrect behavior, either
w/ Japanese or w/ the same characters which you included.  I tried sending
them both from the command line (unencoded) and from a form entry box.

I'm not at all sure about the issues with Korean -- it's entirely possible
that you'll need to do something like what Arion describes: getBytes,
followed by creating a string with the appropriate encoding, possibly
after converting the bytes from a different encoding.  I wouldn't be
surprised if encoding issues (mismatch on input/output?) could cause your
problem.

Anyway, I'd like to know how and why it's failing ... my experience with
i18n has been surprisingly painless, but clearly that's not always the way
it works out.

thanks --

Ed