Posted to users@cocoon.apache.org by "g[R]eK" <gR...@warsztat.pac.pl> on 2003/04/01 17:24:46 UTC

Re[2]: Problem with indent

Hello Joerg,

March 13, 2003, 08:22:55 AM, you wrote:

JH> g[R]eK wrote:
>> Hi J.Pietschmann,
>> 
>> JP> Try
>> JP>    <indent>yes</indent>
>> 
>> It isn't working; the output is the same :-(

JH> Of course it changes nothing about the encoding; you wrote "Problem with indent" 
JH> in the mail subject.

Let me remind you that I have two problems:
1. With encoding
2. With indentation in the output (the output has no indentation)

>>>>                                <encoding>UTF-8</encoding>
>> 
>> JP> This should be the default.
>> 
>> 
>>>> And in output I have some chars encoded as entities. How can
>>>> I force Cocoon to encode my language (Polish) chars
>>>> correctly?
>> 
>> 
>> JP> What is "correctly"? If you mean you don't want to have
>> JP> HTML entities: you can't (unless you are willing to pull
>> JP> some tricks which take some time to explain)
>> 
>> Look:
>> 
>> <table class="footer">
>> <tbody>
>> <tr>
>> <td>
>>         Some footer... bla bla
>> Zaż&oacute;łcić gęsią jaźń</td>
>> </tr>
>> </tbody>
>> </table>
>> 
>> This is what I get from the Cocoon HTML serializer, but I want this:
>> <table class="footer">
>> <tbody>
>> <tr>
>> <td>
>>         Some footer... bla bla
>> Zażółcić gęsią jaźń</td>
>> </tr>
>> </tbody>
>> </table>
>> 
>> The difference is the entity '&oacute;'. Interestingly, Cocoon encodes some of my language's letters (like 'ę',
>> 'ż') correctly, that is, as characters rather than as entities.

JH> I know that I often mix the identifiers/correct names, but I will try to 
JH> explain:
JH> &oacute; is only another representation of ó. &oacute; is the character 
JH> entity, while ó is the character. But they represent the same character and 
JH> a browser correctly parsing the HTML should show both in the same way. But 
JH> there is no problem if Cocoon delivers the HTML in the above way.

JH> Now remains the question, why the browser isn't doing this. Do you have a 
JH> <meta> tag specifying the encoding 'UTF-8' in your HTML code? If yes, is the 
JH> browser not UTF-8 aware? Or does it prefer the encoding specified in the 
JH> response header and this is different or/and wrong?

This is my meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

I think it is correct.
But the problem is different from what you think.
The browser displays my page correctly; that is not the problem.
The problem is the entities like "&oacute;": each one takes 8 bytes,
while the character ó takes 1 or 2 bytes, depending on the encoding. That is a big
difference when the ó character is repeated many times.
I hope you see what I mean.
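
The size difference can be checked with a small sketch. It is only an
illustration in Python, not part of Cocoon; the helper name size_in_bytes
and the choice of ISO-8859-2 as the single-byte comparison encoding are
assumptions made for this example.

```python
# Compare the byte cost of the different ways to write the letter "ó"
# in serialized HTML. size_in_bytes is a hypothetical helper for this sketch.

def size_in_bytes(text, encoding="utf-8"):
    """Return how many bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))

forms = {
    "named entity": "&oacute;",
    "decimal character reference": "&#243;",
    "literal character": "ó",
}

for label, text in forms.items():
    print(f"{label:30} {text:10} {size_in_bytes(text)} bytes in UTF-8")

# In a single-byte Central European encoding the literal is one byte.
print("literal in ISO-8859-2:", size_in_bytes("ó", "iso-8859-2"), "byte")
```

So the named entity costs 8 bytes, a decimal character reference 6, and the
literal character 2 bytes in UTF-8 (1 byte in ISO-8859-2).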

JH> I think Mozilla is a very good browser to test this. You can look at 
JH> the properties of a page via 'view page info' in the context menu, 
JH> view/page info in the main menu, or ctrl + i on the keyboard. It shows 
JH> which encoding the page was recognized as. Furthermore, you can force Mozilla 
JH> to show the page in another encoding to see what happens when 
JH> the encoding is recognized correctly (or not).

Encoding is recognized correctly, in IE and Mozilla.

>> Little question... When I start cocoon, I have this text:
>> 'server.properties not found, using command line or default properites'
>> Is it important?

JH> I don't think it is important. server.properties could override the 
JH> default properties mentioned.

JH> Regards,

JH> Joerg




-- 
Best regards,
 g[R]eK                            mailto:gReK@warsztat.pac.pl


---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-users-unsubscribe@xml.apache.org
For additional commands, e-mail: cocoon-users-help@xml.apache.org


Re: Problem with indent

Posted by "J.Pietschmann" <j3...@yahoo.de>.
g[R]eK wrote:
> The problem is the entities like "&oacute;": each one takes 8 bytes,
> while the character ó takes 1 or 2 bytes, depending on the encoding. That is a big
> difference when the ó character is repeated many times.
> I hope you see what I mean.
An "encoding problem" usually refers to mismatches regarding the mapping of
Unicode characters to bytes in the output.
Your problem, that the serializer maps characters to predefined HTML entities,
is somewhat trickier, and there is no standardized way to deal with it.

Cocoon uses an identity XML transformation for serialization, usually performed
by Xalan (default setting). You can have a look into the Xalan docs and search
for extensions to the xsl:output element which might solve your problem, or ask
on the Xalan list. There is also a properties file for the HTML entities; you
can provide a modified version, which may cause Xalan to output UTF-8 encoded
bytes or at least character references (which are a bit shorter).
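
If neither route works, one fallback is to post-process the serialized HTML
outside Cocoon. The following is a rough Python sketch of that idea, not a
Xalan or Cocoon feature: the function name expand_named_entities and the
KEEP set are assumptions made for this example, and the regex is naive (it
does not treat script or style content specially).

```python
import re
from html.entities import name2codepoint

# Entities left untouched: the first four keep the markup well-formed,
# and nbsp stays visible in the source instead of becoming an invisible char.
KEEP = {"lt", "gt", "amp", "quot", "nbsp"}

# Matches named references such as &oacute; (not numeric ones like &#243;).
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def expand_named_entities(html_text):
    """Replace known named entities with the characters they stand for."""
    def repl(match):
        name = match.group(1)
        if name in KEEP or name not in name2codepoint:
            return match.group(0)  # leave protected or unknown names as-is
        return chr(name2codepoint[name])
    return ENTITY.sub(repl, html_text)

serialized = "<td>Zaż&oacute;łcić gęsią jaźń &amp; more</td>"
expanded = expand_named_entities(serialized)
print(expanded)
print(len(serialized.encode("utf-8")), "->", len(expanded.encode("utf-8")), "bytes")
```

The result must then be written out as UTF-8, matching the charset declared
in the meta tag and the response header, or the literal characters will be
misread.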

J.Pietschmann

