Posted to users@tomcat.apache.org by mubariz kharbe <mu...@infosys.com> on 2002/06/12 14:03:57 UTC

Problem with passing japanese values to a servlet

Hi,

I am developing an internationalized web-based application using Tomcat 3.1 on Windows 2000, and I am facing the following problems.

When I pass Japanese values to the servlet and retrieve the value using
	myData = httpservletrequest.getParameter("foo");
I get ?? in myData.
So I used
	myDataNew = new String(myData.getBytes("ISO-8859-1"),"UTF-8");
This is the solution given on most of the forums I looked at.
I still get the value of myDataNew as ??.
I get the correct value in myDataNew only when I boot my server with Japanese as the default locale. But I cannot do that, since my application is web-based and needs to support all languages. So the server must run on an English OS. This is also a business requirement.

Question 1. What should be done so that, with the server running on an English OS, I get the correct value in myData for all languages, especially Japanese?

The new Tomcat 4.0.3 implements Servlet 2.3, which provides a way to set the character encoding for the httpservletrequest. I upgraded my Tomcat server to 4.0.3.
Then I tried using
	httpservletrequest.setCharacterEncoding("UTF-8");
	myData = httpservletrequest.getParameter("foo");
And I am still getting the value of myData as ?? for Japanese values.

Question 2. Is there a problem in the way I am using the httpservletrequest.setCharacterEncoding method? What else needs to be done?

Any advice will be greatly appreciated.

Thanking in anticipation
Regards
Mubariz
	
--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Problem with passing japanese values to a servlet

Posted by Drew Sudell <as...@acm.org>.
mubariz kharbe writes:
 > Hi,
 > 
 > I am developing an internationalized web based application using Tomcat 3.1 on Windows 2000. I am facing the following problems
 > 
 > When I pass Japanese values to the servlet and I retrieve the value using 
 > 	myData = httpservletrequest.getParameter("foo");
 > I get ?? in myData.
 > So I used
 > 	myDataNew = new String(myData.getBytes("ISO-8859-1"),"UTF-8");
 > This is the solution that is found at most forums I looked for.

That is the way to manually transcode data that came in as UTF-8 and
is the way most people have done it in Servlet 2.2.
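
To make the round trip concrete, here is a minimal sketch of what that
one-liner relies on. It assumes the container decoded the raw request bytes
as ISO-8859-1 and that the browser really sent UTF-8; the class and method
names are made up for illustration, and containerDecode only simulates the
container.

```java
import java.nio.charset.StandardCharsets;

class ManualTranscode {
    // Simulates a container that decoded the raw request bytes as ISO-8859-1
    // (an assumption about the container, not something read from Tomcat).
    static String containerDecode(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // The manual repair quoted above: recover the original bytes,
    // then decode them with the encoding the browser actually used.
    static String repair(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u65E5\u672C\u8A9E";              // three kanji
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        String seen = containerDecode(wire);                  // mojibake
        System.out.println(repair(seen).equals(original));    // prints true
    }
}
```

The repair only works because ISO-8859-1 maps every byte to exactly one
character, so no information is lost in the first (wrong) decoding. If the
first decoding used some other charset, or the bytes were not UTF-8 to begin
with, the same line cannot recover the text.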

 > I still get the value of myDataNew as ??.
 > I am able to get the correct value in myDataNew only after I boot
 > my server with default locale as Japanese. But I cannot do that
 > since my application is web based and needs to have support for all
 > the languages. So the server should necessarily be on English
 > OS. This is also the business requirement.
 > Question 1. What should be done so that running the server on English OS I will be able to get the correct value in myData for all the languages, specially Japanese?

This isn't simple.  But I'll try to point you in the right direction
below.

 > 
 > The new tomcat 4.0.3 uses the Servlet Engine 2.3 in which there is a facility to set the character encoding for the httpservletrequest. I upgraded my tomcat server to 4.0.3.
 > Now I tried using 
 > 	httpservletrequest.setCharacterEncoding("UTF-8");
 > 	myData = httpservletrequest.getParameter("foo");

This is a better way that is new to Servlet 2.3.  I'd suggest it in
preference to manual transcoding as in the above example.  It saves a 
bit of overhead by only doing one correct transcoding instead of
fouling it up, undoing it and then getting it right (3 transcodings).
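
As a rough illustration of the difference in work, the sketch below uses
java.net.URLDecoder to stand in for the container's parameter parsing. This
is not how Tomcat is implemented; it only contrasts one correct decoding
with the decode/undo/redo sequence. The class and method names are
hypothetical. Note also that the Servlet 2.3 javadoc says
setCharacterEncoding must be called before any parameter is read, otherwise
it has no effect.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class SingleDecode {
    // One pass: percent-decode the value straight into the right charset.
    static String decodeOnce(String encoded, String charset)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, charset);
    }

    // Three passes: decode with the wrong charset, turn the result back
    // into bytes, then decode those bytes with the right charset.
    static String decodeThenRepair(String encoded, String charset)
            throws UnsupportedEncodingException {
        String wrong = URLDecoder.decode(encoded, "ISO-8859-1");
        return new String(wrong.getBytes("ISO-8859-1"), charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // UTF-8 bytes of U+65E5 U+672C U+8A9E, percent-encoded as a form would send them
        String posted = "%E6%97%A5%E6%9C%AC%E8%AA%9E";
        String a = decodeOnce(posted, "UTF-8");
        String b = decodeThenRepair(posted, "UTF-8");
        System.out.println(a.equals(b)); // prints true: same text, less work in decodeOnce
    }
}
```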

 > And I am still getting the value of myData as ?? for Japanese values.
 > 
 > Question 2. Is there a problem in the way I am using httpservletrequest.setCharacterEncoding method? What else is needed to be done?
 >

Not particularly.
 
 > Any advice will be greatly appreciated.
 > 

My first question is why do you believe the data being posted is UTF-8 
to begin with?  This is really less a question about servlets and Java
than one about HTML forms and browsers.

The game is to get the browser to post the data in an encoding that
you can predict and to set the request encoding to that.

Above you mention that things work out when you set the default locale
to Japanese.  That makes me think you're getting the data posted in a
native Japanese encoding such as Shift-JIS or EUC, depending on the
platform.  If so transcoding it as UTF-8 won't work.
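
A small sketch of that failure mode, assuming the JRE ships the Shift_JIS
charset (it is an extended charset, so a stripped-down runtime may lack it).
The class name is made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatch {
    // Decode raw bytes with whichever charset the caller believes was used.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS"); // throws if the JRE lacks it
        String original = "\u65E5\u672C\u8A9E";
        byte[] wire = original.getBytes(sjis);        // what a Shift-JIS form post would carry

        // Decoding with the encoding the browser used recovers the text.
        System.out.println(decode(wire, sjis).equals(original));                   // prints true
        // Treating the same bytes as UTF-8 does not.
        System.out.println(decode(wire, StandardCharsets.UTF_8).equals(original)); // prints false
    }
}
```

So if the posts really arrive as Shift-JIS, neither the manual transcoding
nor setCharacterEncoding("UTF-8") can give the right string.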

There are a couple of strategies that one can take.  A lot depends on
the languages, browsers, and client platforms you need to support as
well as how the application is structured and how the users use it.

The easiest thing is if you can do everything in a single encoding.
For example, if you only support English and Japanese, since English
is encodable in the Japanese native encodings, you could just use
those. If you have to support a wide range of languages, UTF-8 might
be a good answer, so long as you can be sure the browsers you support
will post the data back as UTF-8.  [I've never had to make that trick
work, but suspect that sending the pages as UTF-8 and/or setting the
acceptable content types on the form SHOULD do the trick.]

The other game you can play is to know what encoding you expect back
on a per form basis.  Basically this ends up being multiple sub-sites, 
one per encoding, for the application.  This can be done dynamically
or statically.  Statically (copy pages and alter them) is "easier" but
harder to maintain as the number of encodings grows.  In this scenario 
you have to guess somehow what encoding each post is coming in as, or 
embed the information somewhere (in the URL, in a hidden parameter, in 
the session, etc.)
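
The hidden-parameter variant could look roughly like the sketch below. It
parses a form body by hand rather than through the servlet API, and the
field name "form_enc" is purely hypothetical: each form would carry it as a
hidden input naming the encoding that page was served in.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class PerFormEncoding {
    // Hypothetical hidden field that names the encoding of the posting page.
    static final String ENC_PARAM = "form_enc";

    static Map<String, String> parse(String body, String fallback)
            throws UnsupportedEncodingException {
        String charset = fallback;
        String[] pairs = body.split("&");

        // First pass: find the hidden field. Its name and value are plain
        // ASCII, so they can be read before the real charset is known.
        for (String pair : pairs) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(ENC_PARAM)) {
                charset = URLDecoder.decode(pair.substring(eq + 1), "US-ASCII");
            }
        }

        // Second pass: decode every pair with the charset just found.
        // An unknown charset name makes URLDecoder throw; a real application
        // should also check the name against a list of encodings it serves.
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : pairs) {
            int eq = pair.indexOf('=');
            if (eq < 0) continue;
            params.put(URLDecoder.decode(pair.substring(0, eq), charset),
                       URLDecoder.decode(pair.substring(eq + 1), charset));
        }
        return params;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String body = "form_enc=UTF-8&foo=%E6%97%A5%E6%9C%AC%E8%AA%9E";
        System.out.println(parse(body, "ISO-8859-1").get("foo").equals("\u65E5\u672C\u8A9E"));
    }
}
```

The same idea works with the encoding kept in the URL or the session; the
point is only that the server learns the encoding from something it
controls instead of guessing.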

There are a few good ideas at the end of this presentation:
http://www.w3.org/Talks/1999/0830-tutorial-unicode-mjd/

Bottom line, there's no "right answer" to handling forms in a
completely internationalized site.  It would be nice if browsers
actually set the encoding on the content type of the posted data.
But I've yet to see one that did.  That forces the use of heuristics,
guesswork and silly kludges.
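
For what it's worth, the "use it if the browser sent it, otherwise fall
back" logic is short. The sketch below works on the raw Content-Type header
string; the helper name and the fallback argument are made up for
illustration. In a servlet, request.getCharacterEncoding() plays the same
role and returns null when the request does not name an encoding.

```java
import java.util.Locale;

class RequestCharset {
    // Returns the charset parameter of a Content-Type header value,
    // or the fallback when the header is absent or names no charset.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) return fallback;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // The value may be quoted, e.g. charset="UTF-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Typical form post: no charset, so the fallback is used.
        System.out.println(charsetOf("application/x-www-form-urlencoded", "UTF-8"));
        // A client that does declare one.
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=Shift_JIS", "UTF-8"));
    }
}
```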

I've got a few other links that I've pulled together over time saved
off my home page at http://www.op.net/~asudell/info/i18n/index.html.
Some of those might help you too.

-- 
        Drew Sudell     asudell@acm.org      http://www.op.net/~asudell



Re: Problem with passing japanese values to a servlet

Posted by "Ushakov, Sergey N" <us...@int.com.ru>.
Internationalization was a common issue with Tomcat 3 (or rather with servlet
spec 2.2).

If you can drop Tomcat 3, do not hesitate. If you can't, try the
"com.oreilly.servlet" package available at http://www.servlets.com. It
worked perfectly for me in the past (not with a Japanese locale but with a
Russian one, which is not a big difference :)

Now regarding your Tomcat 4 issues.

1) The OS need not be Japanese and need not have Japanese as its default
locale. But what definitely needs to be i18n-enabled is your JRE. Are you sure
your JRE/JDK is not English-only?

2) If the JRE is OK, it may be a good idea to check whether your OS has Japanese
encoding support (even though it is an English edition). Windows 2000 can
provide it, but in the English edition it is not installed by default and must
be selected explicitly during setup. You can always add it afterwards using
Control Panel. But frankly speaking, I have only a vague idea of how much the
JRE depends on the OS for encoding translations.

3) Are you sure your browser sends UTF-8? Maybe try other Japanese
encodings?
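
One way to check the JRE side of this is to ask it which encodings it
knows. This is a minimal sketch that assumes a JRE with java.nio.charset
(Java 1.4 or later); the class name and the list of encodings to probe are
just examples.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

class CharsetCheck {
    // Reports, for each name, whether this JRE can encode/decode it.
    static Map<String, Boolean> support(String... names) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : names) {
            result.put(name, Charset.isSupported(name));
        }
        return result;
    }

    public static void main(String[] args) {
        // The output depends on the JRE that runs this.
        System.out.println(support("UTF-8", "Shift_JIS", "EUC-JP", "ISO-2022-JP"));
        // The platform default, which follows the OS locale settings.
        System.out.println("default: " + Charset.defaultCharset());
    }
}
```

If the Japanese encodings show up as unsupported, the JRE is the problem
rather than the OS or the servlet code.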

HTH.
Regards,
Sergey

P.S. Keep trying... It should work :)
