Posted to user@lenya.apache.org by Jorden Woods <jo...@paradigmsgroup.com> on 2004/09/30 05:44:04 UTC
Forms and Japanese
Hi:
I have just noticed that the forms editors do not handle encodings other than
Latin-1 (ISO-8859-1).
I believe this is built into the usecase.xmap file via:
<map:serialize type="xhtml-iso-8859-1"/>
The site that I am putting together must support Japanese and Chinese, so this
is a problem since both become 'garbage' when input into the forms editor.
I am not sure how to do this, but there needs to be a way to let the editor
know that it needs to change its language encoding. Something like:
if (japanese)
  <map:serialize type="xhtml-shift-JIS"/>
else if (chinese-simplified)
  <map:serialize type="xhtml-gb2312"/>
else
  <map:serialize type="xhtml-iso-8859-1"/>
Clearly I could also use Unicode (UTF-8) for everything, but most people don't
enter text with Unicode, they use national language encodings, as above.
I am not sure how to do this (what is the syntax) or if the usecase.xmap file
is the right place to make the change.
Can someone help me to modify this file so that I can get the editors working
with Chinese and Japanese?
Thanks,
Jorden.
---------------------------------------------------------------------
To unsubscribe, e-mail: lenya-user-unsubscribe@cocoon.apache.org
For additional commands, e-mail: lenya-user-help@cocoon.apache.org
Re: Forms and Japanese
Posted by "Gregor J. Rothfuss" <gr...@apache.org>.
Andreas Hartmann wrote:
> Up to now, I managed to solve all encoding problems using these
> mechanisms. It seems to be possible to use UTF-8 encoding throughout
> *all* files, at least if you're working with modern browsers.
>
> IMO this is a far better approach than having a mixture of ISO-8859-1
> and UTF-8, which can be seen only as a workaround.
+1000
--
Gregor J. Rothfuss
Wyona Inc. - Open Source Content Management - Apache Lenya
http://wyona.com http://lenya.apache.org
gregor.rothfuss@wyona.com gregor@apache.org
Re: Forms and Japanese
Posted by Jorden Woods <jo...@paradigmsgroup.com>.
Andreas Hartmann <andreas <at> apache.org> writes:
>
> Jean Pierre LeJacq wrote:
>
> [...]
>
> > In cocoon, you need to set the encodings in the WEB-INF/web.xml file
> > for both the container-encoding and the form-encoding.
> >
> > Lastly, you may need to tweak the servlet container and possibly the
> > HTTP server you are using.
>
> Up to now, I managed to solve all encoding problems using these
> mechanisms. It seems to be possible to use UTF-8 encoding throughout
> *all* files, at least if you're working with modern browsers.
>
> IMO this is a far better approach than having a mixture of ISO-8859-1
> and UTF-8, which can be seen only as a workaround.
>
Thank you for the input. The primary concern here is that most Asian operating
systems default to national language encodings (NLE) for input and output.
One of the main reasons is that most Asian operating systems do not provide
Unicode fonts for Asian character sets.
Rather than using Unicode, the typical approach for forcing the browser to load
a particular encoding is to combine the lang attribute of the <html> tag with a
<meta http-equiv> tag, such as:
<html lang="zh-hans">
<head>
<meta http-equiv="Content-type" content="text/html; charset=GB2312">
</head>
Here is the main problem with Lenya's forms:
Setting the form encoding to iso-8859-1 destroys any Asian encodings.
Using Unicode with XML and Java will typically create an automatic conversion
upon input from the NLE to Unicode.
However, when the form is then read by the Asian browser, lacking a Unicode
font it will not be able to display the content.
There are two approaches:
1. Store everything as Unicode and provide a utility for converting to NLEs or
2. Store everything as NLEs and manage the interaction on a use-case basis (so
a Japanese user gets Japanese NLEs, a Chinese user gets Chinese NLEs, etc.).
Unicode is preferable; however, it is important to have the output conversion
to NLEs so that users can decode the output stream.
Regardless of the approach, can someone please send me the form of the code
snippet for the serializers for src/webapp/sitemap.xmap and
src/webapp/lenya/usecase.xmap?
Cheers,
Jorden.
Re: Forms and Japanese
Posted by Andreas Hartmann <an...@apache.org>.
Jean Pierre LeJacq wrote:
[...]
> In cocoon, you need to set the encodings in the WEB-INF/web.xml file
> for both the container-encoding and the form-encoding.
>
> Lastly, you may need to tweak the servlet container and possibly the
> HTTP server you are using.
Up to now, I managed to solve all encoding problems using these
mechanisms. It seems to be possible to use UTF-8 encoding throughout
*all* files, at least if you're working with modern browsers.
IMO this is a far better approach than having a mixture of ISO-8859-1
and UTF-8, which can be seen only as a workaround.
-- Andreas
Re: Forms and Japanese
Posted by Jean Pierre LeJacq <jp...@quoininc.com>.
On Thu, 30 Sep 2004, Michael Wechner wrote:
> Gregor J. Rothfuss wrote:
> >
> > we need to switch to utf-8 everywhere. the browser uses the encoding
> > specified by the page.
>
> I don't think this is actually the case. As far as I can see
> concerning data input (through forms) it's using the browser
> specific setting.
I think this is right. xhtml (and html-4.01) define both a content
type for the document and then specific content type declarations
for forms.
For the overall document, check this URI which gives best practices
for character encoding:
http://www.w3.org/TR/2004/WD-i18n-html-tech-char-20040509/
The html-4.01 standard specifies the accept and accept-charset
attributes for forms; accept-charset lists the character encodings
the server accepts for submitted form data:
http://www.w3.org/TR/html401/interact/forms.html#h-17.3
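As a rough illustration of the form-level declaration (a sketch only;
the action URL and the field name are hypothetical placeholders), an
XHTML form can state the encoding it expects submissions in via
accept-charset:

```xml
<!-- Hypothetical form: the action URL and the field name are placeholders. -->
<form action="/hypothetical/save" method="post" accept-charset="UTF-8">
  <p>
    <!-- accept-charset asks the browser to submit the field data as UTF-8 -->
    <textarea name="content" rows="10" cols="60"></textarea>
    <input type="submit" value="Save"/>
  </p>
</form>
```

Browsers have not always honoured this attribute consistently, so it is
safest to keep it in agreement with the encoding declared for the page
itself.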
In cocoon, you need to set the encodings in the WEB-INF/web.xml file
for both the container-encoding and the form-encoding.
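These are init-param entries of the Cocoon servlet. A minimal sketch,
assuming the container decodes request parameters as ISO-8859-1 (a
common default) and the forms are submitted as UTF-8; the values are
illustrative and have to match your actual setup:

```xml
<!-- Fragment of WEB-INF/web.xml, to be placed inside the Cocoon <servlet> element. -->
<init-param>
  <!-- encoding the servlet container itself uses to decode request parameters -->
  <param-name>container-encoding</param-name>
  <param-value>ISO-8859-1</param-value>
</init-param>
<init-param>
  <!-- encoding the browser actually used when it submitted the form -->
  <param-name>form-encoding</param-name>
  <param-value>UTF-8</param-value>
</init-param>
```

With both values set, Cocoon can re-decode submitted parameters from the
container's encoding into the encoding the form was really sent in.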
Lastly, you may need to tweak the servlet container and possibly the
HTTP server you are using.
Whew!
--
JP
Re: Forms and Japanese
Posted by Michael Wechner <mi...@wyona.com>.
Gregor J. Rothfuss wrote:
>
>>
>> the user could configure it within his/her profile.
>
>
> we need to switch to utf-8 everywhere. the browser uses the encoding
> specified by the page.
I don't think this is actually the case. As far as I can see concerning
data input (through forms) it's using the browser specific setting.
Michi
--
Michael Wechner
Wyona Inc. - Open Source Content Management - Apache Lenya
http://www.wyona.com http://cocoon.apache.org/lenya/
michael.wechner@wyona.com michi@apache.org
Re: Forms and Japanese
Posted by "Gregor J. Rothfuss" <gr...@apache.org>.
Michael Wechner wrote:
> as you are saying, people normally don't use UTF-8 within the browser
> setting which means Lenya needs to deliver the encoding the browser is
> set to, otherwise things get messed up. (or maybe I totally
> misunderstand it).
>
> If I am wrong on this, then it would mean either we would have to detect
> the encoding of the browser (is that actually possible?) or
> the user could configure it within his/her profile.
we need to switch to utf-8 everywhere. the browser uses the encoding
specified by the page.
--
Gregor J. Rothfuss
Wyona Inc. - Open Source Content Management - Apache Lenya
http://wyona.com http://lenya.apache.org
gregor.rothfuss@wyona.com gregor@apache.org
Re: Forms and Japanese
Posted by Michael Wechner <mi...@wyona.com>.
Jorden Woods wrote:
>Hi:
>
>I have just noticed that the forms editors do not handle encodings other than
>Latin-1 (ISO-8859-1).
>
>I believe this is built into the usecase.xmap file via:
>
><map:serialize type="xhtml-iso-8859-1"/>
>
>The site that I am putting together must support Japanese and Chinese, so this
>is a problem since both become 'garbage' when input into the forms editor.
>
>
yes, this is a problem indeed
>I am not sure how to do this, but there needs to be a way to let the editor
>know that it needs to change its language encoding. Something like:
>
>if (japanese)
> <map:serialize type="xhtml-shift-JIS"/>
>else if (chinese-simplified)
> <map:serialize type="xhtml-gb2312"/>
>else
> <map:serialize type="xhtml-iso-8859-1"/>
>
>Clearly I could also use Unicode (UTF-8) for everything, but most people don't
>enter text with Unicode, they use national language encodings, as above.
>
>
as you are saying, people normally don't use UTF-8 within the browser
setting which means Lenya needs to deliver the encoding the browser is
set to, otherwise things get messed up. (or maybe I totally
misunderstand it).
If I am wrong on this, then it would mean either we would have to detect
the encoding of the browser (is that actually possible?) or
the user could configure it within his/her profile.
Does that make sense?
>I am not sure how to do this (what is the syntax) or if the usecase.xmap file
>is the right place to make the change.
>
>
you need to define serializers with japanese and chinese encoding, e.g.
within
src/webapp/sitemap.xmap
and then use these serializers within
src/webapp/lenya/usecase.xmap
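A rough sketch of what this could look like. The serializer names are
taken from the example above. The XMLSerializer class, the mime-type
values, the "parameter" selector, and the language test via
{page-envelope:document-language} are assumptions that need to be
checked against the sitemaps of your Lenya version:

```xml
<!-- Fragment 1: src/webapp/sitemap.xmap.
     Add these entries to the existing <map:serializers> element
     inside <map:components>. -->
<map:serializers>
  <!-- Japanese: serialize the output as Shift_JIS -->
  <map:serializer name="xhtml-shift-JIS"
                  mime-type="text/html; charset=Shift_JIS"
                  src="org.apache.cocoon.serialization.XMLSerializer">
    <encoding>Shift_JIS</encoding>
  </map:serializer>
  <!-- Simplified Chinese: serialize the output as GB2312 -->
  <map:serializer name="xhtml-gb2312"
                  mime-type="text/html; charset=GB2312"
                  src="org.apache.cocoon.serialization.XMLSerializer">
    <encoding>GB2312</encoding>
  </map:serializer>
</map:serializers>

<!-- Fragment 2: src/webapp/lenya/usecase.xmap.
     Replaces the single <map:serialize type="xhtml-iso-8859-1"/> step.
     Assumes a selector named "parameter" (Cocoon's ParameterSelector)
     is declared and that the document language is available through
     the page-envelope input module. -->
<map:select type="parameter">
  <map:parameter name="parameter-selector-test"
                 value="{page-envelope:document-language}"/>
  <map:when test="ja">
    <map:serialize type="xhtml-shift-JIS"/>
  </map:when>
  <map:when test="zh">
    <map:serialize type="xhtml-gb2312"/>
  </map:when>
  <map:otherwise>
    <map:serialize type="xhtml-iso-8859-1"/>
  </map:otherwise>
</map:select>
```

If you go for UTF-8 everywhere instead, a single serializer with
<encoding>UTF-8</encoding> would do and the select is not needed.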
Hope that helps. It would be nice to have Lenya used by the Asian world ;-)
Michi
>Can someone help me to modify this file so that I can get the editors working
>with Chinese and Japanese?
>
>Thanks,
>
>Jorden.
>
>
--
Michael Wechner
Wyona Inc. - Open Source Content Management - Apache Lenya
http://www.wyona.com http://cocoon.apache.org/lenya/
michael.wechner@wyona.com michi@apache.org