Posted to user@lenya.apache.org by Jorden Woods <jo...@paradigmsgroup.com> on 2004/10/01 10:59:46 UTC

Re: Forms and Japanese

Andreas Hartmann <andreas <at> apache.org> writes:

> 
> Jean Pierre LeJacq wrote:
> 
> [...]
> 
> > In cocoon, you need to set the encodings in the WEB-INF/web.xml file
> > for both the container-encoding and the form-encoding.
> > 
> > Lastly, you may need to tweak the servlet container and possibly the
> > HTTP server you are using.
> 
> Up to now, I managed to solve all encoding problems using these
> mechanisms. It seems to be possible to use UTF-8 encoding throughout
> *all* files, at least if you're working with modern browsers.
> 
> IMO this is a far better approach than having a mixture of ISO-8859-1
> and UTF-8, which can be seen only as a workaround.
> 

Thank you for the input. The primary concern here is that most Asian operating 
systems default to national language encodings (NLE) for input and output. 

One of the main reasons is that most Asian operating systems do not provide 
Unicode fonts for Asian character sets.

Rather than using Unicode, the typical approach for forcing the browser to load
a particular encoding is to set two things: the lang attribute on the <html>
element and a <meta http-equiv> tag, such as:

<html lang="zh-hans">

<head>
    <meta http-equiv="Content-type" content="text/html; charset=GB2312">
</head>

Here is the main problem with Lenya's forms: 

Setting the form encoding to iso-8859-1 destroys any Asian encodings.
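
To illustrate the failure, here is a minimal Java sketch of what happens when
bytes typed in an NLE are decoded as iso-8859-1. It assumes the browser submits
the form as Shift_JIS; the class and method names are hypothetical and are not
part of Lenya or Cocoon.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Lenya or Cocoon.
class FormEncodingDemo {

    // Decode raw form bytes with the given charset, roughly as a servlet
    // container does when it reads request parameters.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // The three characters of "Nihongo", written as Unicode escapes.
        String typed = "\u65e5\u672c\u8a9e";

        // Assumption: the browser submits the form encoded as Shift_JIS.
        Charset nle = Charset.forName("Shift_JIS");
        byte[] raw = typed.getBytes(nle);

        // Decoding with the wrong charset yields one Latin-1 character per
        // byte, so the original text is no longer recognizable.
        String wrong = decode(raw, StandardCharsets.ISO_8859_1);
        // Decoding with the charset the browser actually used restores it.
        String right = decode(raw, nle);

        System.out.println("iso-8859-1 matches input: " + typed.equals(wrong));
        System.out.println("Shift_JIS matches input:  " + typed.equals(right));
    }
}
```

The same mismatch applies to GB2312 or any other multi-byte NLE: the form
encoding has to match what the browser actually sends.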

With XML and Java, input in an NLE is typically converted to Unicode
automatically.

However, when the form is then read by an Asian browser that lacks a Unicode
font, the browser will not be able to display the content.

There are two approaches:

1. Store everything as Unicode and provide a utility for converting to NLEs, or

2. Store everything as NLEs and manage the interaction on a use-case basis (so
a Japanese user gets a Japanese NLE, a Chinese user gets a Chinese NLE, etc.).

Unicode is preferable. However, it is important to have the output conversion
to NLEs so that users can decode the output stream.
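
A rough Java sketch of the first approach might look like the following. The
content is held as Unicode strings and encoded into an NLE only on output. The
class name, the method names, and the language-to-charset table are all
assumptions made for illustration; a real setup would take them from
configuration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lenya or Cocoon.
class NleOutput {

    // Assumed mapping from language tag to national language encoding.
    static final Map<String, String> NLE_BY_LANG = new HashMap<>();
    static {
        NLE_BY_LANG.put("ja", "Shift_JIS");
        NLE_BY_LANG.put("zh-hans", "GB2312");
    }

    // Pick the output charset for a language, falling back to UTF-8
    // when no NLE is registered for it.
    static Charset charsetFor(String lang) {
        String name = NLE_BY_LANG.get(lang);
        return name == null ? StandardCharsets.UTF_8 : Charset.forName(name);
    }

    // Convert Unicode content to the bytes sent to the user's browser.
    static byte[] encodeFor(String lang, String content) {
        return content.getBytes(charsetFor(lang));
    }

    public static void main(String[] args) {
        String stored = "\u4e2d\u6587"; // two Chinese characters, kept as Unicode
        byte[] out = encodeFor("zh-hans", stored);
        System.out.println(charsetFor("zh-hans") + ": " + out.length + " bytes");
    }
}
```

Characters that the target NLE cannot represent are replaced with a substitute
byte by String.getBytes, so this conversion is lossy for mixed-language
content.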

Regardless of the approach, can someone please send me the form of the code 
snippet for the serializers for src/webapp/sitemap.xmap and 
src/webapp/lenya/usecase.xmap?

Cheers,

Jorden.


