Posted to users@cocoon.apache.org by FREDERIC MOSER <fr...@eturs.u-strasbg.fr> on 2004/08/02 17:09:05 UTC

Encoding problems

Hi,

I've got some encoding problems with the xhtml serializer that I don't
really understand.
(I use Mozilla on Win XP and my editor is set to use UTF-8)


Part 1:
-------

I've got the following stylesheet:
--------------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="no"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>plop</title>
                <link rel="stylesheet" type="text/css" href="simple.css"/>
                <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
            </head>
            <body>
            <h1>Monographie ééé</h1>
                <div>
                    <form method="post" action="ModifierMonographie">
                    ...
--------------------------------------------------------------------------------------

If I use <map:serialize type="html"/>, everything works, but the browser
detects my encoding as ISO-8859-1 (because HTML 4.01 can't be encoded as
UTF-8, I guess??)

Since I changed it to <map:serialize type="xhtml"/>, my form submits crappy
characters instead of "é", "è", "à", etc.: "é" becomes "Ã©" and so on...
And now the browser detects the page as UTF-8.

So here is my conclusion:
HTML 4.01  --> ISO-8859-1 --> It works
XHTML --> UTF-8 --> It does not work
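A minimal sketch of what the Part 1 symptom looks like at the byte level, assuming the browser encodes form input in the page's charset (UTF-8 here) and the server side decodes the raw parameter bytes as ISO-8859-1, which is the Servlet default when no request encoding is set. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class FormMojibake {
    // Browser side: form input is encoded as UTF-8 because the page was UTF-8.
    // Server side (assumed): the raw bytes are decoded as ISO-8859-1.
    static String decodeAsLatin1(String typedByUser) {
        byte[] sentByBrowser = typedByUser.getBytes(StandardCharsets.UTF_8);
        return new String(sentByBrowser, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // e with acute accent, UTF-8 bytes 0xC3 0xA9
        String received = decodeAsLatin1(eAcute);
        // Each UTF-8 byte becomes its own Latin-1 character: U+00C3 then U+00A9.
        System.out.println(received.length());               // 2
        System.out.println(received.equals("\u00c3\u00a9")); // true
    }
}
```

Plain ASCII passes through unchanged, which is why only the accented characters come back garbled.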



Part 2:
-------

So let's see: I keep the xhtml serializer and change the output encoding from
UTF-8 to ISO-8859-15:

--------------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="iso-8859-15" indent="no"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>plop</title>
                <link rel="stylesheet" type="text/css" href="simple.css"/>
                <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15" />
            </head>
            <body>
            <h1>Monographie ééé</h1>
                <div>
                    <form method="post" action="ModifierMonographie">
                     ...
--------------------------------------------------------------------------------------

OK, now the browser detects the web page as ISO-8859-15, but if I check the
source of the page I see <?xml version="1.0" encoding="UTF-8"?>, the
display is crappy ("é" becomes "Ã©"), and the form submission works
perfectly...
If I force UTF-8 in the browser, the page displays correctly, but the
form submission does not work anymore...

(I tried with IE, but it's even worse, of course.)
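A sketch of the Part 2 behavior, assuming the serializer kept writing UTF-8 bytes (as the UTF-8 XML declaration in the page source suggests) while the browser read the page as ISO-8859-15, and that the server decodes form data as ISO-8859-1. The helper name `transcode` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PageVersusForm {
    static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    // Encode with the charset the bytes were written in, decode with the one the reader assumed.
    static String transcode(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";
        // Page written as UTF-8, shown as ISO-8859-15: two bytes, two wrong characters.
        System.out.println(transcode(eAcute, StandardCharsets.UTF_8, LATIN9).equals("\u00c3\u00a9")); // true
        // Form sent as ISO-8859-15, decoded as ISO-8859-1: the single byte 0xE9
        // means e-acute in both charsets, so the submission survives.
        System.out.println(transcode(eAcute, LATIN9, StandardCharsets.ISO_8859_1).equals(eAcute)); // true
        // Browser forced to UTF-8: the display is fixed, but the form now sends 0xC3 0xA9,
        // which an ISO-8859-1 decoder turns into two characters again.
        System.out.println(transcode(eAcute, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals(eAcute)); // false
    }
}
```

The page bytes and the charset the browser believes in have to match, and the charset the browser submits in has to match what the server decodes with; Part 2 satisfies only the second condition.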

So, if anyone has an idea,

Thank you in advance ;-)


Fred




---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org


Re: Encoding problems

Posted by "Volkm@r" <pl...@arcor.de>.
Christian Hoofe wrote:
> [...]
> Put this into your sitemap.xmap to produce output IE understands:
> 
> <map:serializer  ... >
>     <omit-xml-declaration>yes</omit-xml-declaration>
> </map:serializer>   

That's exactly what I already suggested in the second posting of this 
thread (did you read it?).
But omitting the XML declaration alone is no substitute for the charset
information in the HTTP header, which needs to be given using

    mime-type="text/html; charset=utf-8"

in the serializer's configuration.
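To illustrate the difference between the two `mime-type` values discussed in this thread, here is a simplified, hypothetical parser for the `charset` parameter of a Content-Type header value. It is a sketch only: it does not implement the full MIME parameter grammar (for example, quoted values containing semicolons).

```java
import java.util.Locale;

class ContentTypeCharset {
    // Returns the charset parameter, lower-cased, or null when the header has none.
    static String charsetOf(String contentType) {
        for (String param : contentType.split(";")) {
            String p = param.trim();
            int eq = p.indexOf('=');
            if (eq > 0 && p.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = p.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.toLowerCase(Locale.ROOT);
            }
        }
        return null; // e.g. plain "text/html": the browser has to guess
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=utf-8")); // utf-8
        System.out.println(charsetOf("text/html"));                // null
    }
}
```

With `mime-type="text/html"` the header carries no charset at all, so a browser that ignores the XML declaration falls back to its own default.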
-- 
Volkmar W. Pogatzki




Re: Encoding problems

Posted by Christian Hoofe <ko...@hoofe.de>.
> 
> Don't know exactly about Opera, but M$IE definitely doesn't evaluate
> 
>            <?xml version="1.0" encoding="..."?>.
> 
> Instead it wants to get the correct HTTP header's charset information.

Internet Explorer has an unfortunate bug involving the XML prolog declaration.
If there is anything on the first line before the doctype, IE6 will switch into
"quirks" mode. This means that using an XML prolog declaration (<?xml
version="1.0" encoding="UTF-8"?>) will trigger this behavior. An HTML comment,
even an empty one, will do the same.

-> http://www.positioniseverything.net/articles/doctypes.html

Put this into your sitemap.xmap to produce output IE understands:

<map:serializer  ... >
    <omit-xml-declaration>yes</omit-xml-declaration>
</map:serializer>   
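A small, hypothetical lint for the IE6 issue described above: it checks whether serialized output starts with the DOCTYPE, which is what `omit-xml-declaration` is meant to achieve. As an assumption, any leading character at all (whitespace included) fails the check, which may be stricter than IE actually is.

```java
class DoctypeFirst {
    // True only if the page begins with "<!DOCTYPE" (case-insensitive),
    // with no XML declaration, comment or other text in front of it.
    static boolean startsWithDoctype(String serializedPage) {
        String marker = "<!DOCTYPE";
        return serializedPage.regionMatches(true, 0, marker, 0, marker.length());
    }

    public static void main(String[] args) {
        String doctype = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">";
        String withDecl = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + doctype;
        System.out.println(startsWithDoctype(withDecl)); // false: prolog comes first
        System.out.println(startsWithDoctype(doctype));  // true
    }
}
```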






Re: Encoding problems

Posted by "Volkm@r" <pl...@arcor.de>.
Gerald Aichholzer wrote:
> On Tue, 03 Aug 2004 11:02:43 +0200, Volkm@r <pl...@arcor.de> wrote:
> 
>> Jan Hoskens wrote:
>>
>>> You may want to take a look at the wiki page:  
>>> http://wiki.apache.org/cocoon/RequestParameterEncoding
>>> I had this problem a while ago:  
>>> http://marc.theaimsgroup.com/?l=xml-cocoon-users&m=109100902605917&w=2
>>> I'm wondering what the best solution is and which encoding is 
>>> preferred  for handling special characters or when UTF-8 / ISO 8859-1 
>>> should/could  be used.
>>
>>
>> I can't see any reason why not to use Unicode charset with UTF-8  
>> encoding only. It is supported by all common browsers and allows you 
>> to  use *all* Unicode 
>> <http://www.ltg.ed.ac.uk/~richard/unicode-sample.html>  characters by 
>> just typing them into the source code.
>> Make sure that the charset information sent with HTTP header is  
>> compliant with the document's encoding.
>> Most browsers will use it for encoding input in forms.
> 
> 
> I have a similar experience (and still no solution for it). All my
> source XML-files are UTF-8 encoded. Cocoon generates UTF-8 encoded
> XHTML.
> 
> Mozilla and Firebird display the national characters correctly. But
> Internet Explorer and Opera show only garbage instead of the special
> characters.

I don't know exactly about Opera, but M$IE definitely doesn't evaluate

           <?xml version="1.0" encoding="..."?>.

Instead, it relies on the charset information in the HTTP header.
-- 
Volkmar W. Pogatzki




Re: Encoding problems

Posted by Gerald Aichholzer <ga...@iicm.tu-graz.ac.at>.
On Tue, 03 Aug 2004 11:02:43 +0200, Volkm@r <pl...@arcor.de> wrote:

> Jan Hoskens wrote:
>> You may want to take a look at the wiki page:  
>> http://wiki.apache.org/cocoon/RequestParameterEncoding
>> I had this problem a while ago:  
>> http://marc.theaimsgroup.com/?l=xml-cocoon-users&m=109100902605917&w=2
>> I'm wondering what the best solution is and which encoding is preferred  
>> for handling special characters or when UTF-8 / ISO 8859-1 should/could  
>> be used.
>
> I can't see any reason why not to use Unicode charset with UTF-8  
> encoding only. It is supported by all common browsers and allows you to  
> use *all* Unicode <http://www.ltg.ed.ac.uk/~richard/unicode-sample.html>  
> characters by just typing them into the source code.
> Make sure that the charset information sent with HTTP header is  
> compliant with the document's encoding.
> Most browsers will use it for encoding input in forms.

I have a similar experience (and still no solution for it). All my
source XML-files are UTF-8 encoded. Cocoon generates UTF-8 encoded
XHTML.

Mozilla and Firebird display the national characters correctly. But
Internet Explorer and Opera show only garbage instead of the special
characters.

My XHTML-serializer is configured as follows:

     <map:serializer mime-type="text/html" name="xhtml" ...>
       <doctype-public>-//W3C//DTD XHTML 1.0 Strict//EN</doctype-public>
       <doctype-system>http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</doctype-system>
       <encoding>UTF-8</encoding>
     </map:serializer>

All my XML-files contain the following directive:

     <?xml version="1.0" encoding="UTF-8"?>

Aside from this I haven't implemented anything special regarding
character encoding.

Any help is appreciated,
thanx in advance,
Gerald


Re: Encoding problems

Posted by "Volkm@r" <pl...@arcor.de>.
Jan Hoskens wrote:
> You may want to take a look at the wiki page: 
> http://wiki.apache.org/cocoon/RequestParameterEncoding
> I had this problem a while ago: 
> http://marc.theaimsgroup.com/?l=xml-cocoon-users&m=109100902605917&w=2
> I'm wondering what the best solution is and which encoding is preferred 
> for handling special characters or when UTF-8 / ISO 8859-1 should/could 
> be used.

I can't see any reason why not to use Unicode charset with UTF-8 
encoding only. It is supported by all common browsers and allows you to 
use *all* Unicode <http://www.ltg.ed.ac.uk/~richard/unicode-sample.html> 
characters by just typing them into the source code.
Make sure that the charset information sent with HTTP header is 
compliant with the document's encoding.
Most browsers will use it for encoding input in forms.
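A quick way to see why UTF-8 is the safe default: a sketch (hypothetical class name) that asks the JDK's encoders whether a given text can be represented in ISO-8859-1, ISO-8859-15 and UTF-8 at all. The euro sign and Greek letters are just example inputs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetCoverage {
    // True if every character in text has a representation in cs.
    static boolean fits(String text, Charset cs) {
        return cs.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset latin9 = Charset.forName("ISO-8859-15");
        Charset utf8 = StandardCharsets.UTF_8;

        String french = "Monographie \u00e9\u00e8\u00e0"; // e-acute, e-grave, a-grave
        String euro = "\u20ac";                            // euro sign
        String greek = "\u03b1\u03b2\u03b3";               // alpha, beta, gamma

        for (Charset cs : new Charset[] {latin1, latin9, utf8}) {
            System.out.printf("%-11s french=%b euro=%b greek=%b%n",
                    cs.name(), fits(french, cs), fits(euro, cs), fits(greek, cs));
        }
    }
}
```

All three handle the French accents; only ISO-8859-15 and UTF-8 have the euro sign, and only UTF-8 covers the Greek letters.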
-- 
Volkmar W. Pogatzki




Re: Encoding problems

Posted by Jan Hoskens <jh...@schaubroeck.be>.
You may want to take a look at the wiki page: 
http://wiki.apache.org/cocoon/RequestParameterEncoding
I had this problem a while ago: 
http://marc.theaimsgroup.com/?l=xml-cocoon-users&m=109100902605917&w=2
I'm wondering what the best solution is and which encoding is preferred 
for handling special characters or when UTF-8 / ISO 8859-1 should/could 
be used.
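A minimal sketch of one common workaround for request-parameter mojibake, assumed here rather than taken from the wiki page above: if the container decoded UTF-8 form bytes as ISO-8859-1, the original bytes can be recovered losslessly and decoded again as UTF-8. The helper name is hypothetical; where the container supports it, calling `ServletRequest.setCharacterEncoding("UTF-8")` before any parameter is read avoids the double decoding altogether.

```java
import java.nio.charset.StandardCharsets;

class RedecodeParameter {
    // ISO-8859-1 maps every byte 0x00-0xFF to exactly one char (U+0000-U+00FF),
    // so getBytes(ISO_8859_1) returns exactly the bytes the browser sent.
    static String redecode(String decodedAsLatin1) {
        byte[] original = decodedAsLatin1.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = "Monographie \u00c3\u00a9"; // what the form handler received
        System.out.println(redecode(garbled).equals("Monographie \u00e9")); // true
    }
}
```

This only works when the page really was served as UTF-8; applied to input that was already decoded correctly, it would corrupt it.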

Kind Regards,
Jan

Volkm@r wrote:

> FREDERIC MOSER wrote:
>
>> Hi,
>>
>> I've got some encoding problems using the xhtml serializer, I don't
>> really understand .
>> (I use Mozilla on Win XP and my editor is set to use UTF-8)
>>
>>
>> Part 1:
>> -------
>>
>> I've got the following stylesheet:
>> -------------------------------------------------------------------------------------- 
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <xsl:stylesheet version="1.0"
>> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>>     <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="no"/>
>>     <xsl:template match="/">
>>         <html>
>>             <head>
>>                 <title>plop</title>
>>                 <link rel="stylesheet" type="text/css" href="simple.css"/>
>>                 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
>>             </head>
>>             <body>
>>             <h1>Monographie ééé</h1>
>>                 <div>
>>                     <form method="post" action="ModifierMonographie">
>>                     ...
>> -------------------------------------------------------------------------------------- 
>>
>>
>> If I use : <map:serialize type="html"/>, everything work but the browser
>> detect my encoding as ISO-8859-1 (because we can't encode HTML 4.01 as
>> UTF-8 I guess??)
>
>
> Did you check in the components section of your sitemap.xmap how the 
> serializers are configured?
>
> To supply HTML using UTF-8 you could use
> -------------------------------------------------------------------------------------- 
>
> <map:serializer name="html" mime-type="text/html; charset=utf-8"
>                 logger="sitemap.serializer.html" pool-grow="2"
>                 pool-max="64" pool-min="2"
>                 src="org.apache.cocoon.serialization.XMLSerializer">
>    <doctype-public>-//W3C//DTD XHTML 1.0 Strict//EN</doctype-public>
>    <doctype-system>http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</doctype-system>
>    <omit-xml-declaration>yes</omit-xml-declaration>
>    <omit-namespaces>yes</omit-namespaces>
>    <encoding>UTF-8</encoding>
>    <indent>yes</indent>
> </map:serializer>
> -------------------------------------------------------------------------------------- 
>
>
>




Re: Encoding problems

Posted by "Volkm@r" <pl...@arcor.de>.
FREDERIC MOSER wrote:
> Hi,
> 
> I've got some encoding problems using the xhtml serializer, I don't
> really understand .
> (I use Mozilla on Win XP and my editor is set to use UTF-8)
> 
> 
> Part 1:
> -------
> 
> I've got the following stylesheet:
> --------------------------------------------------------------------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet version="1.0"
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>     <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="no"/>
>     <xsl:template match="/">
>         <html>
>             <head>
>                 <title>plop</title>
>                 <link rel="stylesheet" type="text/css" href="simple.css"/>
>                 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
>             </head>
>             <body>
>             <h1>Monographie ééé</h1>
>                 <div>
>                     <form method="post" action="ModifierMonographie">
>                     ...
> --------------------------------------------------------------------------------------
> 
> If I use : <map:serialize type="html"/>, everything work but the browser
> detect my encoding as ISO-8859-1 (because we can't encode HTML 4.01 as
> UTF-8 I guess??)

Did you check in the components section of your sitemap.xmap how the 
serializers are configured?

To supply HTML using UTF-8 you could use
--------------------------------------------------------------------------------------
<map:serializer name="html" mime-type="text/html; charset=utf-8"
                 logger="sitemap.serializer.html" pool-grow="2"
                 pool-max="64" pool-min="2"
                 src="org.apache.cocoon.serialization.XMLSerializer">
    <doctype-public>-//W3C//DTD XHTML 1.0 Strict//EN</doctype-public>
    <doctype-system>http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</doctype-system>
    <omit-xml-declaration>yes</omit-xml-declaration>
    <omit-namespaces>yes</omit-namespaces>
    <encoding>UTF-8</encoding>
    <indent>yes</indent>
</map:serializer>
--------------------------------------------------------------------------------------
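The configuration above keeps the header charset (`mime-type="text/html; charset=utf-8"`) and the serializer's `<encoding>` in agreement. A hypothetical sanity check for that, sketched with the JDK's charset registry, which matches names case-insensitively and resolves aliases such as "UTF8":

```java
import java.nio.charset.Charset;

class EncodingAgreement {
    // True if both names denote the same charset.
    // Charset.forName throws IllegalCharsetNameException / UnsupportedCharsetException
    // for unknown names, which is arguably what a config check should do.
    static boolean agree(String mimeTypeCharset, String serializerEncoding) {
        return Charset.forName(mimeTypeCharset).equals(Charset.forName(serializerEncoding));
    }

    public static void main(String[] args) {
        // Values from the configuration above.
        System.out.println(agree("utf-8", "UTF-8"));       // true
        // A mismatch like the one in Part 2 of the original question.
        System.out.println(agree("iso-8859-15", "UTF-8")); // false
    }
}
```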


-- 
Volkmar W. Pogatzki

