Posted to users@cocoon.apache.org by Bruno Dumon <br...@outerthought.org> on 2004/05/29 12:51:10 UTC

Re: Short Introduction to using Cocoon with non-roman languages -was: Has anyone used Cocoon for chinese language application ?

On Sat, 2004-05-29 at 12:26, Antonio Gallardo wrote:
> Bruno Dumon said:
> >> The only thing I can't explain is why the container-encoding in web.xml
> >> has to be set to ISO-8859-1. If anybody knows why, please add it to this
> >> text. No other setting I tried worked.
> >
> > It has to be ISO-8859-1, always. This is because the servlet
> > specification requires that request parameters are by default decoded as
> > ISO-8859-1 (regardless of the default platform encoding). The only
> > reason I can imagine this is configurable at all is to work around buggy
> > servlet containers.
> >
> > More background on all this is also available at:
> >
> > http://wiki.cocoondev.org/Wiki.jsp?page=RequestParameterEncoding
> 
> I never saw the above-linked page before.

It has been there since 13/3/2003 and its URL has been posted to this
list multiple times since then.

I'd like to move (a subset of) that info into the standard Cocoon docs,
but first I'd like to see the Tomcat issue resolved.

> But for more than a year I have had
> this set in web.xml:
> 
>     <init-param>
>       <param-name>container-encoding</param-name>
>       <param-value>utf-8</param-value>
>     </init-param>
> 
>     <init-param>
>       <param-name>form-encoding</param-name>
>       <param-value>utf-8</param-value>
>     </init-param>
> 
> In the sitemap we are using this HTML 4.01 serializer component:
> 
> <map:serializer name="html" ....>
>   <doctype-public>-//W3C//DTD HTML 4.01 Transitional//EN</doctype-public>
>   <doctype-system>http://www.w3.org/TR/html4/loose.dtd</doctype-system>
>   <encoding>ISO-8859-1</encoding>
>   <buffer-size>1024</buffer-size>
>   <omit-xml-declaration>true</omit-xml-declaration>
> </map:serializer>
> 
> With this configuration we are able to connect to a UTF-8 encoded
> PostgreSQL database.
> 
> Hope this helps.

Oops! That's quite a wrong configuration you have there. If you thought
you were using UTF-8 for the communication with your browser, then I'll
have to disappoint you. You're using ISO-8859-1. Specifying UTF-8 twice
in the web.xml is the same as specifying nothing, because the two
settings cancel each other out. The servlet container decodes the request
parameters as ISO-8859-1, and then Cocoon does this:

new String(value.getBytes("UTF-8"), "UTF-8");

which is a no-op (but it does burn a lot of CPU cycles; you're better
off disabling those parameters in the web.xml if you're just using
ISO-8859-1).
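
To make the cancelling-out concrete, here is a minimal sketch of the
recoding step. It assumes the general pattern is "re-encode with
container-encoding, decode with form-encoding", as the one-liner above
suggests. The class name and the recode() helper are made up for
illustration and are not Cocoon's actual code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RequestParameterRecoding {

    // Hypothetical helper (an assumption, not Cocoon's real implementation):
    // turn the container's string back into bytes using containerEncoding,
    // then decode those bytes using formEncoding.
    static String recode(String value, Charset containerEncoding, Charset formEncoding) {
        return new String(value.getBytes(containerEncoding), formEncoding);
    }

    public static void main(String[] args) {
        // Two Chinese characters typed into a form field.
        String typed = "\u4e2d\u6587";

        // Assume the browser submits the form as UTF-8 bytes...
        byte[] wire = typed.getBytes(StandardCharsets.UTF_8);
        // ...and the servlet container decodes them as ISO-8859-1.
        String fromContainer = new String(wire, StandardCharsets.ISO_8859_1);

        // container-encoding=utf-8, form-encoding=utf-8:
        // the string comes back unchanged, so it is still garbled.
        String both = recode(fromContainer, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(both.equals(fromContainer)); // prints true

        // container-encoding=ISO-8859-1, form-encoding=utf-8:
        // the original bytes are recovered and decoded correctly.
        String fixed = recode(fromContainer, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(fixed.equals(typed)); // prints true
    }
}
```

The second case only works because ISO-8859-1 maps every byte value to
exactly one character, so encoding the container's string back to
ISO-8859-1 reproduces the bytes the browser sent.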

Note that the encoding used to connect to your database and the way your
database stores the data internally are completely separate issues from
the encoding used to communicate between web server and browser (whether
and how the database side needs to be configured depends on the database
product).

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org


Re: Short Introduction to using Cocoon with non-roman languages -was: Has anyone used Cocoon for chinese language application ?

Posted by Antonio Gallardo <ag...@agssa.net>.
Bruno Dumon said:
> On Sat, 2004-05-29 at 13:30, Antonio Gallardo wrote:
>> Hi Bruno:
>>
>> Thanks for the answer.
>>
>> Currently, I have no time to test it.
>
> I understand that.
>
>> I know this is a very frequent issue
>> now that people realize the right encoding is UTF-8. Here is a link
>> from Tomcat:
>>
>> http://jakarta.apache.org/tomcat/faq/misc.html#utf8
>
> Yep, but at a quick glance that information is very Tomcat/JSP/servlet
> specific, i.e. the -Dfile.encoding setting isn't needed.

Cocoon is a servlet....

You got me! ;-)

I am writing an RT for the dev list now to solve the problem... please
answer there. :-)

Best Regards,

Antonio Gallardo




Re: Short Introduction to using Cocoon with non-roman languages -was: Has anyone used Cocoon for chinese language application ?

Posted by Bruno Dumon <br...@outerthought.org>.
On Sat, 2004-05-29 at 13:30, Antonio Gallardo wrote:
> Hi Bruno:
> 
> Thanks for the answer.
> 
> Currently, I have no time to test it.

I understand that.

> I know this is a very frequent issue
> now that people realize the right encoding is UTF-8. Here is a link from
> Tomcat:
> 
> http://jakarta.apache.org/tomcat/faq/misc.html#utf8

Yep, but at a quick glance that information is very Tomcat/JSP/servlet
specific, i.e. the -Dfile.encoding setting isn't needed.

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org




Re: Short Introduction to using Cocoon with non-roman languages -was: Has anyone used Cocoon for chinese language application ?

Posted by Antonio Gallardo <ag...@agssa.net>.
Hi Bruno:

Thanks for the answer.

Currently, I have no time to test it. I know this is a very frequent issue
now that people realize the right encoding is UTF-8. Here is a link from
Tomcat:

http://jakarta.apache.org/tomcat/faq/misc.html#utf8

Best Regards,

Antonio Gallardo

