Posted to dev@tomcat.apache.org by Martin van den Bemt <mv...@mvdb.com> on 2001/08/16 22:41:10 UTC

FW: i18n UTF to ISO charsets conversions..

Hi guys,

I think my message got lost on tomcat-user and will never be replied to
;((..
Since tomcat does a lot of translating, I hope one of you guys (or girls?)
can give me a couple of hints how to get this done (if it can be done that
is..)..

TIA,
Mvgr,
Martin

-----Original Message-----
From: Martin van den Bemt [mailto:mvdb@mvdb.com]
Sent: Thursday, August 16, 2001 1:42 PM
To: jakarta-tomcat-user
Subject: i18n UTF to ISO charsets conversions..


Hi,

Please keep in mind that this is a big system that was designed without REAL
i18n in mind; I know there are better ways to store the data.

We have a two-part system. System 1 allows multiple encodings to be used at
once (e.g. an English web user interface in which you can edit data in a
localized manner). Because of that we have to use UTF-8 for this system.
System 2 shows the result of system 1 localised (so not mixed, as in
system 1).
Everything is stored as text databases on the file system using the Java
default encoding of ISO-8859-1 (even Greek text), which works great for
system 2 (just set the appropriate charset in the request header and
everything works well).

System 1 doesn't have problems with this either, until we want to save
data.
The data comes in encoded as UTF-8 and needs to be written to disk as
ISO-8859-1 (thousands of files are set up like this, and converting them
would have too much impact right now).

What I need is to convert the parameter from UTF-8 to ISO-8859-whatever..

As an example, 2 headers (extracted with getReader and read()):

This is Greek text entered in IE.

the ISO-8859-1 encoding
isotext=%EB%E5%F0%F4%EF%EC%DD%F1%E5%E9%E5%F2
the UTF-8 encoding
utf8text=%CE%BB%CE%B5%CF%80%CF%84%CE%BF%CE%BC%CE%AD%CF%81%CE%B5%CE%B9%CE%B5%CF%82

Going from ISO to UTF is not a problem, but going from UTF-8 to
ISO-8859-<whatever> is not working.

So I want to convert utf8text so that it has the same value as isotext.
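
To make the target concrete, here is a minimal sketch of that conversion done
on the percent-encoded values themselves. It assumes the isotext bytes are in
fact ISO-8859-7 (the Greek part of ISO 8859): the sample decodes to the same
Greek word as utf8text under that charset, while ISO-8859-1 has no Greek
letters at all. The class name ParamRecode and the method recode are
hypothetical, made up for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class ParamRecode {

    // Hypothetical helper: takes a percent-encoded UTF-8 value and returns
    // the same text percent-encoded in the target charset.
    static String recode(String utf8Encoded, String targetCharset)
            throws UnsupportedEncodingException {
        // Percent-decoding as UTF-8 yields an ordinary (Unicode) Java String.
        String text = URLDecoder.decode(utf8Encoded, "UTF-8");
        // Percent-encoding again turns the String into bytes of the target charset.
        return URLEncoder.encode(text, targetCharset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String utf8text =
            "%CE%BB%CE%B5%CF%80%CF%84%CE%BF%CE%BC%CE%AD%CF%81%CE%B5%CE%B9%CE%B5%CF%82";
        // Assumption: the target is ISO-8859-7, not ISO-8859-1.
        System.out.println(recode(utf8text, "ISO-8859-7"));
        // prints %EB%E5%F0%F4%EF%EC%DD%F1%E5%E9%E5%F2
    }
}
```

With "ISO-8859-1" as the target, the Greek characters cannot be represented
and the result no longer matches isotext, which may be why the conversion
appears not to work.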

Does anyone know how to handle this conversion or know where to find a class
/ source that can do this conversion?

I hope what I'm asking makes sense.


TIA

Mvgr,
Martin


Re: FW: i18n UTF to ISO charsets conversions..

Posted by "Craig R. McClanahan" <cr...@apache.org>.

On Thu, 16 Aug 2001, Martin van den Bemt wrote:

> Hi guys,
> 
> I think my message got lost on tomcat-user and will never be replied to
> ;((..

Probably 'cause it's not really a Tomcat question.  :-)

> Since tomcat does a lot of translating, I hope one of you guys (or girls?)
> can give me a couple of hints how to get this done (if it can be done that
> is..)..
> 

One approach would be to remember that Java uses Unicode internally, so
you could do it in two stages:

  UTF8 --> Reader --> Unicode (in a String) --> Writer --> ISO-8859-1

Just configure the reader to read UTF8, and the writer to write ISO-8859-1
and you should be fine.
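
The two-stage pipeline above could be sketched roughly as follows, using
in-memory streams so the example is self-contained. The class name Transcode
and the method utf8To are hypothetical. The target charset is a parameter
because Greek text needs ISO-8859-7; when writing ISO-8859-1, characters that
charset cannot represent are replaced with '?'.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

final class Transcode {

    // Hypothetical helper: UTF-8 bytes --> Reader --> chars --> Writer --> target bytes.
    static byte[] utf8To(String targetCharset, byte[] utf8Bytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(utf8Bytes), "UTF-8");
             Writer writer = new OutputStreamWriter(out, targetCharset)) {
            char[] buffer = new char[1024];
            int n;
            // Copy decoded characters; the writer encodes them in the target charset.
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } // closing the writer flushes any buffered bytes into 'out'
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // First five letters of the sample word, written as Unicode escapes.
        byte[] utf8 = "\u03bb\u03b5\u03c0\u03c4\u03bf".getBytes("UTF-8");
        for (byte b : utf8To("ISO-8859-7", utf8)) {
            System.out.printf("%%%02X", b & 0xFF);
        }
        System.out.println();
        // prints %EB%E5%F0%F4%EF
    }
}
```

For real files, FileInputStream and FileOutputStream would take the place of
the byte-array streams; the Reader and Writer configuration stays the same.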

> TIA,
> Mvgr,
> Martin
> 
Craig

