Posted to users@cocoon.apache.org by Bruno Dumon <br...@outerthought.org> on 2004/05/29 12:02:14 UTC

Re: Short Introduction to using Cocoon with non-roman languages - was: Has anyone used Cocoon for chinese language application ?

On Fri, 2004-05-28 at 22:18, Jasper Michalczik wrote:
> Dear Reinhard, dear Cocoon-users,
> 
> I was asked to give a short explanation on how to use Cocoon for
> non-roman languages - especially Arabic - which should be of use for
> Chinese as well.
> 
> I'm not very experienced with Cocoon, so please feel free to correct or
> extend this.
> 
> 
> All files have to be saved as UTF-8, so make sure to add/change the
> first line of your XML/XSL files:
> 
> 	<?xml version="1.0" encoding="UTF-8"?>

This isn't a requirement; it can be any encoding you like, as long as it
supports the characters you need. It can be a different encoding than
the one being used to send the page to the browser. UTF-8 is a good
choice, though.
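
To illustrate that the encoding a file is stored in and the encoding a
page is sent in are independent, here is a minimal Java sketch. The class
name EncodingIndependenceSketch and the helper transcode are made up for
this example and are not part of Cocoon; UTF-16 simply stands in for "any
encoding that supports the characters you need".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingIndependenceSketch {
    // Hypothetical helper: decode bytes using the storage encoding,
    // then encode the same characters using the output encoding.
    public static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        String text = "\u0633\u0644\u0627\u0645"; // four Arabic letters

        // Pretend the source file was saved as UTF-16 ...
        byte[] stored = text.getBytes(StandardCharsets.UTF_16);

        // ... while the page is sent to the browser as UTF-8.
        byte[] served = transcode(stored, StandardCharsets.UTF_16, StandardCharsets.UTF_8);

        // The characters survive unchanged; only the byte representation differs.
        System.out.println(new String(served, StandardCharsets.UTF_8).equals(text)); // prints true
    }
}
```

The only hard constraint is that both encodings can represent every
character in the document.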

> In sitemap.xmap I added the following to each serializer:
> 
> 	<map:serializer logger=...>
> 		<encoding>UTF-8</encoding>
> 	</map:serializer>
> 
> This adds the following META-Tag to the serialized document:
> 
> 	<META http-equiv="Content-Type" content="text/html; charset=UTF-8">

Yep, but it only does so if your page already has an html/head element in it.

> 
> Then I set the following parameters in web.xml...
> 
> 	<init-param>
> 		<param-name>container-encoding</param-name>
> 		<param-value>ISO-8859-1</param-value>
> 	</init-param>
> 	<init-param>
> 		<param-name>form-encoding</param-name>
> 		<param-value>UTF-8</param-value>
> 	</init-param>
> 
> ... to make sure the forms are processed correctly.
> 
> On the client side, Windows 2000 or later (I don't know about Linux or
> Mac) must have its keyboard settings set up to allow Arabic/Chinese
> typing. If you only need to display non-roman characters, any system
> with a browser that supports Unicode display will do. IE5+, for
> example, downloads the necessary fonts automatically when needed.
> 
> I remember having some trouble with Tomcat 4.1.29, but 4.1.18 works
> fine.

This is because of the following issue:
http://issues.apache.org/bugzilla/show_bug.cgi?id=26997

>  I don't have any experience with any other version or
> servlet container.
> 
> 
> The only thing I can't explain is why the container-encoding in web.xml
> has to be set to ISO-8859-1. If anybody knows about this, please add it
> to this text. Any other setting I tried to use didn't work out.

It has to be ISO-8859-1, always. This is because the servlet
specification requires that request parameters are by default decoded as
ISO-8859-1 (regardless of the default platform encoding). The only
reason I can imagine this is configurable at all is to work around buggy
servlet containers.
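
A minimal Java sketch of why this works, under the assumption that the
browser submits the form as UTF-8 bytes, the container decodes them as
ISO-8859-1, and the value is then re-decoded by turning it back into bytes
with the container-encoding and reading those bytes with the
form-encoding. The class and method names (RequestRecodeSketch,
containerDecode, recode) are invented for illustration and are not
Cocoon's actual code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RequestRecodeSketch {
    // What a spec-following servlet container does by default:
    // every byte becomes the character with the same numeric value.
    public static String containerDecode(byte[] rawParameter) {
        return new String(rawParameter, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical recoding step: undo the container's decoding, then
    // decode the recovered bytes with the encoding the form really used.
    public static String recode(String value, Charset containerEncoding, Charset formEncoding) {
        return new String(value.getBytes(containerEncoding), formEncoding);
    }

    public static void main(String[] args) {
        String typed = "\u0633\u0644\u0627\u0645"; // what the user typed (Arabic)
        byte[] onTheWire = typed.getBytes(StandardCharsets.UTF_8);

        String garbled = containerDecode(onTheWire);
        System.out.println(garbled.equals(typed)); // prints false

        // container-encoding = ISO-8859-1, form-encoding = UTF-8
        String repaired = recode(garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(repaired.equals(typed)); // prints true
    }
}
```

The round trip is lossless because ISO-8859-1 maps each of the 256 byte
values to exactly one character, so getBytes recovers the original bytes
exactly.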

More background on all this is also available at:

http://wiki.cocoondev.org/Wiki.jsp?page=RequestParameterEncoding

> 
> 
> I hope I could make a small contribution to the growing
> cocoon-community...

sure!

> 
> 
> Jasper Michalczik
> 

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org


Re: Short Introduction to using Cocoon with non-roman languages -was: Has anyone used Cocoon for chinese language application ?

Posted by Antonio Gallardo <ag...@agssa.net>.
Bruno Dumon said:
> On Sat, 2004-05-29 at 13:30, Antonio Gallardo wrote:
>> Hi Bruno:
>>
>> Thanks for the answer.
>>
>> Currently, I have no time to test it.
>
> I understand that.
>
>> I know this is a very frequent issue now that people realize the right
>> encoding is UTF-8. Here is a link from Tomcat:
>>
>> http://jakarta.apache.org/tomcat/faq/misc.html#utf8
>
> Yep, but from a quick glance that information is very
> Tomcat/JSP/servlet-specific, i.e. the -Dfile.encoding setting isn't needed.

Cocoon is a servlet....

You got me! ;-)

I am writing an RT for the dev list now to solve the problem... please
answer there. :-)

Best Regards,

Antonio Gallardo




Re: Short Introduction to using Cocoon with non-roman languages -was: Has anyone used Cocoon for chinese language application ?

Posted by Bruno Dumon <br...@outerthought.org>.
On Sat, 2004-05-29 at 13:30, Antonio Gallardo wrote:
> Hi Bruno:
> 
> Thanks for the answer.
> 
> Currently, I have no time to test it.

I understand that.

> I know this is a very frequent issue now that people realize the right
> encoding is UTF-8. Here is a link from Tomcat:
> 
> http://jakarta.apache.org/tomcat/faq/misc.html#utf8

Yep, but from a quick glance that information is very
Tomcat/JSP/servlet-specific, i.e. the -Dfile.encoding setting isn't needed.

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org




Re: Short Introduction to using Cocoon with non-roman languages -was: Has anyone used Cocoon for chinese language application ?

Posted by Antonio Gallardo <ag...@agssa.net>.
Hi Bruno:

Thanks for the answer.

Currently, I have no time to test it. I know this is a very frequent issue
now that people realize the right encoding is UTF-8. Here is a link from
Tomcat:

http://jakarta.apache.org/tomcat/faq/misc.html#utf8

Best Regards,

Antonio Gallardo

Bruno Dumon said:
> On Sat, 2004-05-29 at 12:26, Antonio Gallardo wrote:
>> Bruno Dumon said:
>> >> The only thing I can't explain is why the container-encoding in
>> >> web.xml has to be set to ISO-8859-1. If anybody knows about this,
>> >> please add it to this text. Any other setting I tried to use
>> >> didn't work out.
>> >
>> > It has to be ISO-8859-1, always. This is because the servlet
>> > specification requires that request parameters are by default
>> > decoded as ISO-8859-1 (regardless of the default platform
>> > encoding). The only reason I can imagine this is configurable at
>> > all is to work around buggy servlet containers.
>> >
>> > More background on all this is also available at:
>> >
>> > http://wiki.cocoondev.org/Wiki.jsp?page=RequestParameterEncoding
>>
>> I never saw the above-linked page before.
>
> It has been there since 13/3/2003 and its URL has been posted on this
> list multiple times since then.
>
> I'd like to move (a subset of) that info into the standard Cocoon docs,
> but first I'd like to see the Tomcat issue resolved.
>
>> But for more than a year I have had
>> this set in web.xml:
>>
>>     <init-param>
>>       <param-name>container-encoding</param-name>
>>       <param-value>utf-8</param-value>
>>     </init-param>
>>
>>     <init-param>
>>       <param-name>form-encoding</param-name>
>>       <param-value>utf-8</param-value>
>>     </init-param>
>>
>> In the site map we are using this HTML 4.01 serializer component:
>>
>> <map:serializer name="html" ....>
>>   <doctype-public>-//W3C//DTD HTML 4.01 Transitional//EN</doctype-public>
>>   <doctype-system>http://www.w3.org/TR/html4/loose.dtd</doctype-system>
>>   <encoding>ISO-8859-1</encoding>
>>   <buffer-size>1024</buffer-size>
>>   <omit-xml-declaration>true</omit-xml-declaration>
>> </map:serializer>
>>
>> With this configuration we are able to connect to a UTF-8 encoded
>> PostgreSQL database.
>>
>> Hope this helps.
>
> Oops! That's quite a wrong configuration you have there. If you thought
> you were using UTF-8 for the communication with your browser, then I'll
> have to disappoint you. You're using ISO-8859-1. Specifying UTF-8 twice
> in the web.xml is the same as specifying nothing, because the two
> settings cancel each other out. The servlet container decodes the
> request parameters as ISO-8859-1, and then Cocoon does this:
>
> new String(value.getBytes("UTF-8"), "UTF-8");
>
> which has no effect (but does burn a lot of CPU cycles; you're better
> off removing those parameters from the web.xml if you're just using
> ISO-8859-1).
>
> Note that the encoding used to connect to your database (and how your
> database stores the data internally) is a completely separate issue from
> what encoding is used to communicate between webserver and browser (if
> and how this needs to be configured depends on the database product).
>




Re: Short Introduction to using Cocoon with non-roman languages -was: Has anyone used Cocoon for chinese language application ?

Posted by Bruno Dumon <br...@outerthought.org>.
On Sat, 2004-05-29 at 12:26, Antonio Gallardo wrote:
> Bruno Dumon said:
> >> The only thing I can't explain is why the container-encoding in web.xml
> >> has to be set to ISO-8859-1. If anybody knows about this, please add it
> >> to this text. Any other setting I tried to use didn't work out.
> >
> > It has to be ISO-8859-1, always. This is because the servlet
> > specification requires that request parameters are by default decoded as
> > ISO-8859-1 (regardless of the default platform encoding). The only
> > reason I can imagine this is configurable at all is to work around buggy
> > servlet containers.
> >
> > More background on all this is also available at:
> >
> > http://wiki.cocoondev.org/Wiki.jsp?page=RequestParameterEncoding
> 
> I never saw the above-linked page before.

It has been there since 13/3/2003 and its URL has been posted on this list
multiple times since then.

I'd like to move (a subset of) that info into the standard Cocoon docs,
but first I'd like to see the Tomcat issue resolved.

> But for more than a year I have had
> this set in web.xml:
> 
>     <init-param>
>       <param-name>container-encoding</param-name>
>       <param-value>utf-8</param-value>
>     </init-param>
> 
>     <init-param>
>       <param-name>form-encoding</param-name>
>       <param-value>utf-8</param-value>
>     </init-param>
> 
> In the site map we are using this HTML 4.01 serializer component:
> 
> <map:serializer name="html" ....>
>   <doctype-public>-//W3C//DTD HTML 4.01 Transitional//EN</doctype-public>
>   <doctype-system>http://www.w3.org/TR/html4/loose.dtd</doctype-system>
>   <encoding>ISO-8859-1</encoding>
>   <buffer-size>1024</buffer-size>
>   <omit-xml-declaration>true</omit-xml-declaration>
> </map:serializer>
> 
> With this configuration we are able to connect to a UTF-8 encoded
> PostgreSQL database.
> 
> Hope this helps.

Oops! That's quite a wrong configuration you have there. If you thought
you were using UTF-8 for the communication with your browser, then I'll
have to disappoint you. You're using ISO-8859-1. Specifying UTF-8 twice
in the web.xml is the same as specifying nothing, because the two
settings cancel each other out. The servlet container decodes the request
parameters as ISO-8859-1, and then Cocoon does this:

new String(value.getBytes("UTF-8"), "UTF-8");

which has no effect (but does burn a lot of CPU cycles; you're better
off removing those parameters from the web.xml if you're just using
ISO-8859-1).
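
The no-op can be checked with a small Java sketch. It assumes the browser
really did send UTF-8 bytes and the container decoded them as ISO-8859-1;
the class name NoOpRecodeSketch and the helper recode are hypothetical and
only mirror the expression quoted above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NoOpRecodeSketch {
    // Same shape as the expression quoted above, with both encodings
    // passed in as parameters (illustrative helper, not Cocoon code).
    public static String recode(String value, Charset containerEncoding, Charset formEncoding) {
        return new String(value.getBytes(containerEncoding), formEncoding);
    }

    public static void main(String[] args) {
        String typed = "\u4e2d\u6587"; // two Chinese characters

        // Browser sends UTF-8 bytes; the container decodes them as ISO-8859-1.
        String garbled = new String(typed.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);

        // container-encoding = form-encoding = UTF-8: encode and decode with
        // the same charset, so the string comes back exactly as it went in.
        String result = recode(garbled, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println(result.equals(garbled)); // prints true
        System.out.println(result.equals(typed));   // prints false
    }
}
```

With both parameters set to the same value the garbled string is never
repaired, so any application that appears to work with this setup is in
effect running on ISO-8859-1.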

Note that the encoding used to connect to your database (and how your
database stores the data internally) is a completely separate issue from
what encoding is used to communicate between webserver and browser (if
and how this needs to be configured depends on the database product).

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org




Re: Short Introduction to using Cocoon with non-roman languages -was: Has anyone used Cocoon for chinese language application ?

Posted by Antonio Gallardo <ag...@agssa.net>.
Bruno Dumon said:
>> The only thing I can't explain is why the container-encoding in web.xml
>> has to be set to ISO-8859-1. If anybody knows about this, please add it
>> to this text. Any other setting I tried to use didn't work out.
>
> It has to be ISO-8859-1, always. This is because the servlet
> specification requires that request parameters are by default decoded as
> ISO-8859-1 (regardless of the default platform encoding). The only
> reason I can imagine this is configurable at all is to work around buggy
> servlet containers.
>
> More background on all this is also available at:
>
> http://wiki.cocoondev.org/Wiki.jsp?page=RequestParameterEncoding

I never saw the above-linked page before. But for more than a year I have
had this set in web.xml:

    <init-param>
      <param-name>container-encoding</param-name>
      <param-value>utf-8</param-value>
    </init-param>

    <init-param>
      <param-name>form-encoding</param-name>
      <param-value>utf-8</param-value>
    </init-param>

In the site map we are using this HTML 4.01 serializer component:

<map:serializer name="html" ....>
  <doctype-public>-//W3C//DTD HTML 4.01 Transitional//EN</doctype-public>
  <doctype-system>http://www.w3.org/TR/html4/loose.dtd</doctype-system>
  <encoding>ISO-8859-1</encoding>
  <buffer-size>1024</buffer-size>
  <omit-xml-declaration>true</omit-xml-declaration>
</map:serializer>

With this configuration we are able to connect to a UTF-8 encoded
PostgreSQL database.

Hope this helps.

Best Regards,

Antonio Gallardo
