Posted to users@cocoon.apache.org by Thomas Steinborn <th...@exceloncorp.com> on 2000/04/05 14:52:52 UTC

encoding/locale of page response: bug!?

Hi,

I'm using Cocoon 1.7.

I'm running into trouble right now, because the response sent for a
request is not written in the character encoding/locale of the formatter
that was used.

An example might make clear what I mean.

I'm serving an XML document in UTF-8 encoding. It contains some
non-US-ASCII characters, e.g. German umlauts (üöä).

Within the producer/processor/formatter cycle there is no problem,
since Java Strings are Unicode-based.

But later, when we write the response in org.apache.cocoon.Engine#handle,
Cocoon simply does:

            // get the output writer
            PrintWriter out = response.getWriter();

            // send the page
            out.println(page.getContent());

This causes a problem, because it simply writes the content in the "...
platform's default character encoding..." (quote from the JDK docs).
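To make the problem concrete, here is a small Java sketch (plain JDK, not Cocoon code; the class name EncodingDemo is made up for illustration) showing that one and the same Unicode String turns into different bytes depending on the charset used to encode it, and that reading the bytes back with the wrong charset garbles the umlauts.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: the same Java String (Unicode internally) becomes
// different bytes depending on the charset used when writing it out.
class EncodingDemo {

    // UTF-8: each umlaut is encoded as two bytes.
    static byte[] asUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // ISO-8859-1: each umlaut is encoded as one byte.
    static byte[] asLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String umlauts = "\u00fc\u00f6\u00e4"; // "üöä"
        System.out.println("UTF-8 bytes:      " + asUtf8(umlauts).length);   // 6
        System.out.println("ISO-8859-1 bytes: " + asLatin1(umlauts).length); // 3

        // Decoding UTF-8 bytes as ISO-8859-1 does not give back "üöä".
        String misread = new String(asUtf8(umlauts), StandardCharsets.ISO_8859_1);
        System.out.println("Misread as ISO-8859-1: " + misread);
    }
}
```

If the writer's default charset differs from the one the client assumes, the client sees exactly this kind of mismatch.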

Instead it should write it in the very same encoding as the formatter
used, e.g. in UTF-8 if the XML formatter is used or ISO-8859-1 if the
HTML formatter is used.
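A minimal sketch of that idea, assuming the engine knows the formatter's encoding as a Java charset name: wrap the raw output stream in an OutputStreamWriter created with that charset instead of relying on a writer that uses the platform default. PageWriter and writePage are hypothetical names, not Cocoon's API. In a servlet the stream would be response.getOutputStream(); alternatively, calling response.setContentType("text/xml; charset=UTF-8") before response.getWriter() should make a Servlet 2.2 (or later) container use that charset for the writer.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical sketch (not Cocoon code): send the page in an explicitly
// chosen encoding instead of the platform default.
class PageWriter {

    static void writePage(OutputStream out, String content, String encoding) {
        // Charset.forName throws an unchecked exception for unknown names.
        Charset charset = Charset.forName(encoding);
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(out, charset));
        writer.print(content);
        // Flush but do not close: in a servlet the container owns the stream.
        writer.flush();
    }

    public static void main(String[] args) {
        String page = "<p>\u00fc\u00f6\u00e4</p>";

        ByteArrayOutputStream utf8 = new ByteArrayOutputStream();
        writePage(utf8, page, "UTF-8");

        ByteArrayOutputStream latin1 = new ByteArrayOutputStream();
        writePage(latin1, page, "ISO-8859-1");

        System.out.println("UTF-8:      " + utf8.size() + " bytes");   // 13
        System.out.println("ISO-8859-1: " + latin1.size() + " bytes"); // 10
    }
}
```

The only change from the quoted Engine#handle code is that the charset is chosen explicitly rather than inherited from the JVM.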

This is probably more important for an XML result, since the encoding
information in the <?xml ...?> header takes precedence over the encoding
given via the HTTP header.
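For the XML case, a rough sketch of keeping things consistent: write the encoding name into the <?xml ...?> declaration and encode the whole document with exactly that charset, so the declaration and the actual bytes agree (ideally the charset parameter of the HTTP Content-Type header would name the same encoding too). XmlOutput.serialize is a hypothetical helper, not part of Cocoon.

```java
import java.nio.charset.Charset;

// Hypothetical helper (not Cocoon API): keeps the XML declaration and the
// actual byte encoding of the document in sync.
class XmlOutput {

    // Prepend a declaration naming the encoding, then encode the whole
    // document (declaration included) with that same encoding.
    static byte[] serialize(String body, String encoding) {
        String doc = "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>\n" + body;
        return doc.getBytes(Charset.forName(encoding));
    }

    public static void main(String[] args) {
        byte[] bytes = serialize("<name>M\u00fcller</name>", "UTF-8");
        // Decoding with the declared encoding recovers the original text.
        System.out.println(new String(bytes, Charset.forName("UTF-8")));
    }
}
```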

In my opinion this is a bug. What do you think?

Any idea how to fix that?

Thanks
Thomas

Re: encoding/locale of page response: bug!?

Posted by Stefano Mazzocchi <st...@apache.org>.
Thomas Steinborn wrote:
> 
> Hi,
> 
> I'm using Cocoon 1.7.
> 
> I'm running into trouble right now, because the response sent for a
> request is not written in the character encoding/locale of the formatter
> that was used.
> 
> An example might make clear what I mean.
> 
> I'm serving an XML document in UTF-8 encoding. It contains some
> non-US-ASCII characters, e.g. German umlauts (üöä).
> 
> Within the producer/processor/formatter cycle there is no problem,
> since Java Strings are Unicode-based.
> 
> But later, when we write the response in org.apache.cocoon.Engine#handle,
> Cocoon simply does:
> 
>             // get the output writer
>             PrintWriter out = response.getWriter();
> 
>             // send the page
>             out.println(page.getContent());
> 
> This causes a problem, because it simply writes the content in the "...
> platform's default character encoding..." (quote from the JDK docs).

You are right, this is (or was) a bug. I'm not sure I fixed all the
dependencies here, but a couple of days ago I fixed a bunch of encoding
problems in Cocoon 1.7.3-dev, which is currently in the CVS module.
 
> Instead it should write it in the very same encoding as the formatter
> used, e.g. in UTF-8 if the XML formatter is used or ISO-8859-1 if the
> HTML formatter is used.

Now you can specify what encoding your formatter should use.
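To illustrate what a per-formatter encoding buys you, here is a hypothetical sketch (ConfiguredFormatter is an invented class, not Cocoon's actual formatter interface or configuration syntax): the formatter carries one configured encoding, and the engine uses that single value both for the Content-Type header and for the bytes it writes.

```java
import java.nio.charset.Charset;

// Hypothetical formatter (not Cocoon's real interface) that carries its
// configured output encoding, so header and body cannot disagree.
class ConfiguredFormatter {
    private final String mimeType;
    private final Charset encoding;

    ConfiguredFormatter(String mimeType, String encoding) {
        this.mimeType = mimeType;
        this.encoding = Charset.forName(encoding);
    }

    // Header value advertising the same encoding used for the body.
    String getContentType() {
        return mimeType + "; charset=" + encoding.name();
    }

    // Encode the formatted page with the configured encoding.
    byte[] encode(String page) {
        return page.getBytes(encoding);
    }

    public static void main(String[] args) {
        ConfiguredFormatter xml = new ConfiguredFormatter("text/xml", "UTF-8");
        ConfiguredFormatter html = new ConfiguredFormatter("text/html", "ISO-8859-1");
        String page = "\u00fc\u00f6\u00e4";
        System.out.println(xml.getContentType() + " -> " + xml.encode(page).length + " bytes");
        System.out.println(html.getContentType() + " -> " + html.encode(page).length + " bytes");
    }
}
```

Running it should print "text/xml; charset=UTF-8 -> 6 bytes" and "text/html; charset=ISO-8859-1 -> 3 bytes".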
 
> This is probably more important for an XML result, since the encoding
> information in the <?xml ...?> header takes precedence over the encoding
> given via the HTTP header.
> 
> In my opinion this is a bug. What do you think?

Yes, it is, but it's going to be fixed as you read this.
 
> Any idea how to fix that?

Yeah, but it's easier to fix it than to explain how :)

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------