Posted to users@tomcat.apache.org by Robert Priest <Ro...@bentley.com> on 2003/09/04 23:15:32 UTC

Character Encoding problem (umlauts, etc).

> I have a servlet that catches a request for a file.
> 
> But if that file has characters such as an umlaut in it (for example: ä),
> the path info is all wrong.
> 
> For example:  I am requesting file : 
> 
> "/38CF278C0186B466222FC48571080B83/51/dms00051/äää.txt"
> 
> but what is coming across in the request is:
> 
> "/38CF278C0186B466222FC48571080B83/51/dms00051/???.txt"
> 
> 
> I have tried:
> String requestPathInfo5 = new String(request.getPathInfo().getBytes("ISO-8859-1"));
> String requestPathInfo5 = new String(request.getPathInfo().getBytes("Unicode"));
> String requestPathInfo5 = new String(request.getPathInfo().getBytes("UTF8"));
> String requestPathInfo5 = new String(request.getPathInfo().getBytes("UnicodeLittle"));
> 
> 
> But none of them are returning correctly.
> 
> Does anyone know what the correct Unicode
> encoding is that I should use?
> 
> Any other suggestions?
> 
> I know this problem has been solved before, so if you could point me in the
> direction of the solution on the web, that is fine.
> 
> Thanks in advance.
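
One detail in the attempts above: new String(bytes) without a charset
argument decodes with the platform default, so changing only the
getBytes(...) argument cannot repair the text. Below is a minimal sketch
of the usual round trip, assuming (the thread does not confirm this) that
the container decoded UTF-8 bytes as ISO-8859-1. PathInfoRedecode and
redecode are hypothetical names, and the shortened path is illustrative.
If the characters have already been replaced by literal question marks,
the original bytes are gone and this cannot recover them.

```java
import java.nio.charset.StandardCharsets;

public class PathInfoRedecode {
    // Hypothetical helper: undo an ISO-8859-1 decode of bytes that were really UTF-8.
    static String redecode(String misdecoded) {
        // ISO-8859-1 maps every char 0..255 back to the same byte value,
        // so this recovers the raw bytes the container saw.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Name the charset explicitly instead of relying on the platform default.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00e4 is the a-umlaut; the escape avoids source-file encoding issues.
        String original = "/dms00051/\u00e4\u00e4\u00e4.txt";
        // Simulate the assumed failure: UTF-8 bytes decoded as ISO-8859-1.
        String misdecoded = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(redecode(misdecoded).equals(original));
    }
}
```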

Re: Character Encoding problem (umlauts, etc).

Posted by Anton Tagunov <at...@mail.cnt.ru>.
Hello Robert!

Robert Priest <Ro...@bentley.com> wrote:
RP> I am requesting file :
RP> "/38CF278C0186B466222FC48571080B83/51/dms00051/äää.txt"
RP> but what is coming across in the request is:
RP> "/38CF278C0186B466222FC48571080B83/51/dms00051/???.txt"

Perhaps your browser is sending it that way?
In any case, I think it is a bad idea to type non-ASCII
characters into the browser's address bar.

You could spy on the traffic between the browser and the
server. I described how to do that in one of the sections
of my ancient http://tagunov.tripod.com; try to find it there,
then you'll know for sure what bytes the browser sends.

I think it is generally a bad idea to have non-ASCII
characters in the URL at all. The closest you could get would be
to percent-encode them all (%AD and so on). But then you need to be
sure which encoding that is (UTF-8 or something else).
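
To illustrate why the encoding has to be agreed on, here is a minimal
sketch that percent-encodes one path segment under two different
charsets. PathSegmentEncoder and encodeSegment are hypothetical names.
java.net.URLEncoder targets form encoding, so the sketch rewrites "+"
to "%20" to make the result usable in a path; that adjustment is an
assumption about the intended use, not something from this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PathSegmentEncoder {
    // Hypothetical helper: percent-encode a single path segment (no slashes).
    static String encodeSegment(String segment, String charset) {
        try {
            // URLEncoder emits "+" for spaces; a literal "+" becomes "%2B",
            // so replacing "+" afterwards only affects former spaces.
            return URLEncoder.encode(segment, charset).replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // \u00e4 is the a-umlaut from the file name in the question.
        String name = "\u00e4\u00e4\u00e4.txt";
        System.out.println(encodeSegment(name, "UTF-8"));      // %C3%A4%C3%A4%C3%A4.txt
        System.out.println(encodeSegment(name, "ISO-8859-1")); // %E4%E4%E4.txt
    }
}
```

The same file name yields different byte sequences, so the server has
to decode with the charset that was used to build the link.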

So, if these are links from your HTML page, why not
percent-encode everything in the URL on the server side and
emit <A
href="context/38CF278C0186B466222FC48571080B83/51/dms00051/%88%AA.txt">?

But then, why not get rid of the umlauts altogether?

Why not use only plain Latin letters or, since you already rely heavily
on numeric ids, use only numeric ids?

Anton


Re: Character Encoding problem (umlauts, etc).

Posted by Thomas Kellerer <sp...@gmx.net>.
Robert Priest wrote:

>>I have a servlet that catches a request for a file.
>>

How is the request sent?

If it is sent via an HTML form, you need to include the
accept-charset="UTF-8" attribute in your <form> tag.

Thomas