Posted to dev@tomcat.apache.org by Martin van den Bemt <mv...@mvdb.com> on 2001/08/16 22:41:10 UTC
FW: i18n UTF to ISO charsets conversions..
Hi guys,
I think my message got lost on tomcat-user and will never be replied to
;((..
Since tomcat does a lot of translating, I hope one of you guys (or girls?)
can give me a couple of hints how to get this done (if it can be done that
is..)..
TIA,
Mvgr,
Martin
-----Original Message-----
From: Martin van den Bemt [mailto:mvdb@mvdb.com]
Sent: Thursday, August 16, 2001 1:42 PM
To: jakarta-tomcat-user
Subject: i18n UTF to ISO charsets conversions..
Hi,
Keep in mind that this is a big system that was designed without REAL i18n
in mind, so I know there are better ways of storing the data.
We have a two-part system: one part allows multiple encodings to be used
(e.g. an English web user interface, where you can edit data in a localized
manner). Because of that we have to use UTF-8 for this system.
The other system shows the result of system 1, localised (so not mixed
as in system 1).
Everything is stored as text databases on the file system using the Java
default encoding of ISO-8859-1 (even Greek text), which works great for
system 2 (just set the appropriate charset in the request header and
everything works well.)
System 1 doesn't have problems with this either, until we want to save
data..
It comes in encoded as UTF-8 and needs to be written to disk as ISO-8859-1
(1000's of files are set up like this and converting those is too much
impact right now)
What I need is to convert the parameter from UTF-8 to ISO-8859-whatever..
As an example 2 headers : (extracted with getReader and read()..)
this is greek text entered in ie..
the ISO-8859-1 encoding
isotext=%EB%E5%F0%F4%EF%EC%DD%F1%E5%E9%E5%F2
the UTF-8 encoding
utf8text=%CE%BB%CE%B5%CF%80%CF%84%CE%BF%CE%BC%CE%AD%CF%81%CE%B5%CE%B9%CE%B5%CF%82
Going from ISO to UTF is not a problem, but going from UTF-8 to
ISO-8859-<whatever> is not working..
So I want to convert utf8text so that it has the same value as isotext..
Does anyone know how to handle this conversion or know where to find a class
/ source that can do this conversion?
I hope what I'm asking makes sense.
TIA
Mvgr,
Martin
Re: FW: i18n UTF to ISO charsets conversions..
Posted by "Craig R. McClanahan" <cr...@apache.org>.
On Thu, 16 Aug 2001, Martin van den Bemt wrote:
> Hi guys,
>
> I think my message got lost on tomcat-user and will never be replied to
> ;((..
Probably 'cause it's not really a Tomcat question. :-)
> Since tomcat does a lot of translating, I hope one of you guys (or girls?)
> can give me a couple of hints how to get this done (if it can be done that
> is..)..
>
One approach would be to remember that Java uses Unicode internally, so
you could do it in two stages:
UTF8 --> Reader --> Unicode (in a String) --> Writer --> ISO-8859-1
Just configure the reader to read UTF8, and the writer to write ISO-8859-1
and you should be fine.
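
A minimal sketch of that pipeline, in case it helps. The class name
Utf8ToIso and the helper transcode() are made up for illustration, and the
target charset is an assumption: the isotext bytes in the example decode to
the same Greek word as utf8text only when read as ISO-8859-7, so that is
what the sketch writes. Substitute whichever ISO-8859-x part the files on
disk really use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

class Utf8ToIso {

    // Hypothetical helper: decode UTF-8 bytes into Java chars (Unicode),
    // then encode those chars with the requested target charset.
    static void transcode(InputStream in, OutputStream out, String targetCharset)
            throws IOException {
        Reader reader = new InputStreamReader(in, "UTF-8");
        Writer writer = new OutputStreamWriter(out, targetCharset);
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush(); // push any buffered encoder output to 'out'
    }

    public static void main(String[] args) throws IOException {
        // The Greek word from the example, written with Unicode escapes so
        // the source file's own encoding does not matter.
        String word = "\u03BB\u03B5\u03C0\u03C4\u03BF\u03BC\u03AD\u03C1\u03B5\u03B9\u03B5\u03C2";
        byte[] utf8 = word.getBytes("UTF-8");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(utf8), out, "ISO-8859-7"); // assumed target

        // Print the result percent-encoded, like the request parameters above.
        for (byte b : out.toByteArray()) {
            System.out.printf("%%%02X", b & 0xFF);
        }
        System.out.println(); // prints %EB%E5%F0%F4%EF%EC%DD%F1%E5%E9%E5%F2
    }
}
```

One caveat: a character that the target charset cannot represent is
written as '?' by OutputStreamWriter, so Greek text pushed through a real
ISO-8859-1 writer would come out as question marks.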
> TIA,
> Mvgr,
> Martin
>
Craig