Posted to user@lenya.apache.org by Jorden Woods <jo...@paradigmsgroup.com> on 2004/09/30 05:44:04 UTC

Forms and Japanese

Hi:

I have just noticed that the forms editors do not handle encodings other than 
Latin-1 (ISO-8859-1). 

I believe this is built into the usecase.xmap file via:

<map:serialize type="xhtml-iso-8859-1"/>

The site that I am putting together must support Japanese and Chinese, so this 
is a problem since both become 'garbage' when input into the forms editor.

I am not sure how to do this, but there needs to be a way to let the editor 
know that it needs to change its language encoding. Something like:

if (japanese)
 <map:serialize type="xhtml-shift-JIS"/>
else if (chinese-simplified)
 <map:serialize type="xhtml-gb2312"/>
else
 <map:serialize type="xhtml-iso-8859-1"/>

Clearly I could also use Unicode (UTF-8) for everything, but most people don't 
enter text with Unicode; they use national language encodings, as above.

I am not sure how to do this (what is the syntax) or if the usecase.xmap file  
is the right place to make the change.

Can someone help me to modify this file so that I can get the editors working 
with Chinese and Japanese?

Thanks,

Jorden.


---------------------------------------------------------------------
To unsubscribe, e-mail: lenya-user-unsubscribe@cocoon.apache.org
For additional commands, e-mail: lenya-user-help@cocoon.apache.org


Re: Forms and Japanese

Posted by "Gregor J. Rothfuss" <gr...@apache.org>.
Andreas Hartmann wrote:

> Up to now, I managed to solve all encoding problems using these
> mechanisms. It seems to be possible to use UTF-8 encoding throughout
> *all* files, at least if you're working with modern browsers.
> 
> IMO this is a far better approach than having a mixture of ISO-8859-1
> and UTF-8, which can be seen only as a workaround.

+1000

-- 
Gregor J. Rothfuss
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://wyona.com                          http://lenya.apache.org
gregor.rothfuss@wyona.com                       gregor@apache.org



Re: Forms and Japanese

Posted by Jorden Woods <jo...@paradigmsgroup.com>.
Andreas Hartmann <andreas <at> apache.org> writes:

> 
> Jean Pierre LeJacq wrote:
> 
> [...]
> 
> > In cocoon, you need to set the encodings in the WEB-INF/web.xml file
> > for both the container-encoding and the form-encoding.
> > 
> > Lastly, you may need to tweak the servlet container and possibly the
> > HTTP server you are using.
> 
> Up to now, I managed to solve all encoding problems using these
> mechanisms. It seems to be possible to use UTF-8 encoding throughout
> *all* files, at least if you're working with modern browsers.
> 
> IMO this is a far better approach than having a mixture of ISO-8859-1
> and UTF-8, which can be seen only as a workaround.
> 

Thank you for the input. The primary concern here is that most Asian operating 
systems default to national language encodings (NLE) for input and output. 

One of the main reasons is that most Asian operating systems do not provide 
Unicode fonts for Asian character sets.

Rather than using Unicode, the typical approach for forcing the browser to load 
a particular encoding is to set the lang attribute on the <html> tag and add a 
<meta http-equiv> tag, such as:

<html lang="zh-hans">

<head>
    <meta http-equiv="Content-type" content="text/html; charset=GB2312">
</head>

Here is the main problem with Lenya's forms: 

Setting the form encoding to iso-8859-1 destroys any Asian encodings.

Using Unicode with XML and Java will typically create an automatic conversion 
upon input from the NLE to Unicode. 

However, when the form is then read by the Asian browser, lacking a Unicode 
font it will not be able to display the content.

There are two approaches:

1. Store everything as Unicode and provide a utility for converting to NLEs, or

2. Store everything as NLEs and manage the interaction on a use-case basis (so 
a Japanese user gets a Japanese NLE, a Chinese user gets a Chinese NLE, etc.).

Unicode is preferable; however, it is important to have the output conversion 
to NLEs so that users can decode the output stream.

Regardless of the approach, can someone please send me the form of the code 
snippet for the serializers for src/webapp/sitemap.xmap and 
src/webapp/lenya/usecase.xmap?

Cheers,

Jorden.





Re: Forms and Japanese

Posted by Andreas Hartmann <an...@apache.org>.
Jean Pierre LeJacq wrote:

[...]

> In cocoon, you need to set the encodings in the WEB-INF/web.xml file
> for both the container-encoding and the form-encoding.
> 
> Lastly, you may need to tweak the servlet container and possibly the
> HTTP server you are using.

Up to now, I managed to solve all encoding problems using these
mechanisms. It seems to be possible to use UTF-8 encoding throughout
*all* files, at least if you're working with modern browsers.

IMO this is a far better approach than having a mixture of ISO-8859-1
and UTF-8, which can be seen only as a workaround.
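
For reference, a UTF-8 serializer declaration in a Cocoon 2.1 sitemap looks
roughly like the sketch below. The name "xhtml-utf-8" is made up for this
example, and the class and element names follow the stock Cocoon sitemap, so
please check them against the sitemap.xmap shipped with your Lenya version.

```xml
<!-- Sketch only: goes inside the existing <map:serializers> element of
     the sitemap. The name "xhtml-utf-8" is a hypothetical choice. -->
<map:serializer name="xhtml-utf-8"
                src="org.apache.cocoon.serialization.XMLSerializer"
                mime-type="text/html; charset=utf-8">
  <!-- Character encoding of the bytes written to the response -->
  <encoding>UTF-8</encoding>
</map:serializer>
```

A pipeline would then end with <map:serialize type="xhtml-utf-8"/> instead of
the ISO-8859-1 variant.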

-- Andreas




Re: Forms and Japanese

Posted by Jean Pierre LeJacq <jp...@quoininc.com>.
On Thu, 30 Sep 2004, Michael Wechner wrote:

> Gregor J. Rothfuss wrote:
> >
> > we need to switch to utf-8 everywhere. the browser uses the encoding
> > specified by the page.
>
> I don't think this is actually the case. As far as I can see
> concerning data input (through forms) it's using the browser
> specific setting.

I think this is right. xhtml (and html-4.01) define both a content
type for the document and then specific content type declarations
for forms.

For the overall document, check this URI which gives best practices
for character encoding:

  http://www.w3.org/TR/2004/WD-i18n-html-tech-char-20040509/

The html-4.01 standard specifies the accept and accept-charset
attributes for forms; accept-charset defines the character encodings
the server accepts for submitted form data:

  http://www.w3.org/TR/html401/interact/forms.html#h-17.3
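
A form that asks the browser to submit its data as UTF-8 might be marked up
as in this sketch. The action URL and the field name are placeholders, and
browser support for accept-charset varies, so treat it as a hint to the
browser rather than a guarantee.

```xml
<!-- Sketch only: "save" and "content" are placeholder names -->
<form method="post" action="save" accept-charset="UTF-8">
  <textarea name="content" rows="10" cols="60"></textarea>
  <input type="submit" value="Save"/>
</form>
```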

In cocoon, you need to set the encodings in the WEB-INF/web.xml file
for both the container-encoding and the form-encoding.
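
In a stock Cocoon 2.1 web.xml these two settings are init-params of the
Cocoon servlet. A sketch, assuming the container decodes request parameters
as ISO-8859-1 (the usual servlet default) and the forms are submitted as
UTF-8; the values are assumptions, not a recommendation for every setup:

```xml
<!-- Sketch only: these go inside the <servlet> element for Cocoon in
     WEB-INF/web.xml. The values are assumptions for a UTF-8 setup. -->
<init-param>
  <!-- Encoding the servlet container used to decode the request -->
  <param-name>container-encoding</param-name>
  <param-value>ISO-8859-1</param-value>
</init-param>
<init-param>
  <!-- Encoding the browser actually used when submitting the form -->
  <param-name>form-encoding</param-name>
  <param-value>UTF-8</param-value>
</init-param>
```

As far as I understand it, Cocoon uses the pair to re-decode request
parameters, so form-encoding should match the charset of the page that
contains the form.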

Lastly, you may need to tweak the servlet container and possibly the
HTTP server you are using.

Whew!

-- 
JP





Re: Forms and Japanese

Posted by Michael Wechner <mi...@wyona.com>.
Gregor J. Rothfuss wrote:

>
>>  
>> the user could configure it within his/her profile.
>
>
> we need to switch to utf-8 everywhere. the browser uses the encoding 
> specified by the page.


I don't think this is actually the case. As far as I can see, for data 
input (through forms) the browser uses its own browser-specific setting.

Michi


-- 
Michael Wechner
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://www.wyona.com              http://cocoon.apache.org/lenya/
michael.wechner@wyona.com                        michi@apache.org




Re: Forms and Japanese

Posted by "Gregor J. Rothfuss" <gr...@apache.org>.
Michael Wechner wrote:

> as you are saying, people normally don't use UTF-8 within the browser 
> setting which means Lenya needs to deliver the encoding the browser is 
> set to, otherwise things get messed up. (or maybe I totally 
> misunderstand it).
> 
> If I am wrong on this, then it would mean either we would have to detect 
> the encoding of the browser (is that actually possible?) or
> the user could configure it within his/her profile.

we need to switch to utf-8 everywhere. the browser uses the encoding 
specified by the page.

-- 
Gregor J. Rothfuss
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://wyona.com                          http://lenya.apache.org
gregor.rothfuss@wyona.com                       gregor@apache.org



Re: Forms and Japanese

Posted by Michael Wechner <mi...@wyona.com>.
Jorden Woods wrote:

>Hi:
>
>I have just noticed that the forms editors do not handle encodings other than 
>Latin-1 (ISO-8859-1). 
>
>I believe this is built into the usecase.xmap file via:
>
><map:serialize type="xhtml-iso-8859-1"/>
>
>The site that I am putting together must support Japanese and Chinese, so this 
>is a problem since both become 'garbage' when input into the forms editor.
>  
>

yes, this is a problem indeed

>I am not sure how to do this, but there needs to be a way to let the editor 
>know that it needs to change its language encoding. Something like:
>
>if (japanese)
> <map:serialize type="xhtml-shift-JIS"/>
>else if (chinese-simplified)
> <map:serialize type="xhtml-gb2312"/>
>else
> <map:serialize type="xhtml-iso-8859-1"/>
>
>Clearly I could also use Unicode (UTF-8) for everything, but most people don't 
>enter text with Unicode, they use national language encodings, as above.
>  
>

as you are saying, people normally don't use UTF-8 within the browser 
setting which means Lenya needs to deliver the encoding the browser is 
set to, otherwise things get messed up. (or maybe I totally 
misunderstand it).

If I am wrong on this, then it would mean either we would have to detect 
the encoding of the browser (is that actually possible?) or
the user could configure it within his/her profile.

Does that make sense?

>I am not sure how to do this (what is the syntax) or if the usecase.xmap file  
>is the right place to make the change.
>  
>

you need to define serializers with japanese and chinese encoding, e.g. 
within

src/webapp/sitemap.xmap

and then use these serializers within

src/webapp/lenya/usecase.xmap
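
A rough sketch of those two steps, assuming Cocoon 2.1 sitemap syntax. The
serializer names are made up, the "parameter" selector is assumed to be
declared in the sitemap (Cocoon ships
org.apache.cocoon.selection.ParameterSelector), and
{page-envelope:document-language} is assumed to resolve to the language of
the document being edited in your Lenya version. Please verify all three
before relying on this.

```xml
<!-- Step 1, in src/webapp/sitemap.xmap, inside <map:serializers>.
     Serializer names are hypothetical. -->
<map:serializer name="xhtml-shift-jis"
                src="org.apache.cocoon.serialization.XMLSerializer"
                mime-type="text/html; charset=Shift_JIS">
  <encoding>Shift_JIS</encoding>
</map:serializer>
<map:serializer name="xhtml-gb2312"
                src="org.apache.cocoon.serialization.XMLSerializer"
                mime-type="text/html; charset=GB2312">
  <encoding>GB2312</encoding>
</map:serializer>

<!-- Step 2, in src/webapp/lenya/usecase.xmap, in place of the single
     <map:serialize type="xhtml-iso-8859-1"/> -->
<map:select type="parameter">
  <!-- Assumed input module expression for the document language -->
  <map:parameter name="parameter-selector-test"
                 value="{page-envelope:document-language}"/>
  <map:when test="ja">
    <map:serialize type="xhtml-shift-jis"/>
  </map:when>
  <map:when test="zh">
    <map:serialize type="xhtml-gb2312"/>
  </map:when>
  <map:otherwise>
    <map:serialize type="xhtml-iso-8859-1"/>
  </map:otherwise>
</map:select>
```

Note that this only changes the encoding of the page sent to the browser.
The submitted form data still has to be decoded with a matching encoding on
the way back in.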

Hope that helps. It would be nice to have Lenya used by the Asian world ;-)

Michi

>Can someone help me to modify this file so that I can get the editors working 
>with Chinese and Japanese?
>
>Thanks,
>
>Jorden.


-- 
Michael Wechner
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://www.wyona.com              http://cocoon.apache.org/lenya/
michael.wechner@wyona.com                        michi@apache.org

