Posted to users@cocoon.apache.org by "g[R]eK" <gR...@warsztat.pac.pl> on 2003/03/13 18:51:51 UTC

Re[2]: Problem with indent

Hi J.Pietschmann,

JP> Try
JP>    <indent>yes</indent>

It isn't working; the output is the same :-(

>>                                 <encoding>UTF-8</encoding>
JP> This should be the default.

>> And in output I have some chars encoded as entities. How can
>> I force Cocoon to encode my language (Polish) chars
>> correctly?

JP> What is "correctly"? If you mean you don't want to have
JP> HTML entities: you can't (unless you are willing to pull
JP> some tricks which take some time to explain)

Look:

<table class="footer">
<tbody>
<tr>
<td>
        Some footer... bla bla
Zaż&oacute;łcić gęsią jaźń</td>
</tr>
</tbody>
</table>

This is what I get from the Cocoon HTML serializer, but I want to have this:
<table class="footer">
<tbody>
<tr>
<td>
        Some footer... bla bla
Zażółcić gęsią jaźń</td>
</tr>
</tbody>
</table>

The difference is the entity '&oacute;'. Interestingly, Cocoon encodes some letters of my language (like 'ę',
'ż') correctly. That is to say, they are output as characters, not as entities.

A small question: when I start Cocoon, I get this message:
'server.properties not found, using command line or default properites'
Is it important?


-- 
Best regards,
 g[R]eK                            mailto:gReK@warsztat.pac.pl


---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-users-unsubscribe@xml.apache.org
For additional commands, e-mail: cocoon-users-help@xml.apache.org


Re: Problem with indent

Posted by "J.Pietschmann" <j3...@yahoo.de>.
g[R]eK wrote:
> The problem is caused by entities like "&oacute;", because each one takes 8 bytes,
> while the character ó takes 1 or 2 bytes. That is a big difference when the ó character
> is repeated many times.
> I hope you see what I mean.

An "encoding problem" usually refers to mismatches regarding the mapping of
Unicode characters to bytes in the output.
Your problem, that the serializer maps characters to predefined HTML entities,
is somewhat trickier, and there is no standardized way to deal with it.
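
A rough illustration of that mapping, as a sketch only: it uses Python's standard HTML 4 entity table as a stand-in for the serializer's entity list (an assumption; this is not Xalan's code, and the helper names are made up for the example). A character is replaced only when a predefined named entity exists for it, which would explain why ó comes out as an entity while ę and ż stay as characters.

```python
from html.entities import codepoint2name  # HTML 4 named entities, keyed by code point


def as_html4_entity(ch: str) -> str:
    """Return the named HTML 4 entity for ch, or ch itself if none is defined."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else ch


def serialize(text: str) -> str:
    """Replace every character that has a predefined named entity."""
    return "".join(as_html4_entity(ch) for ch in text)


if __name__ == "__main__":
    # Only ó has a named HTML 4 entity; the other Polish letters pass through.
    print(serialize("Zażółcić gęsią jaźń"))
```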

Cocoon uses an identity XML transformation for serialization, usually performed
by Xalan (default setting). You can have a look into the Xalan docs and search
for extensions to the xsl:output element which might solve your problem, or ask
on the Xalan list. There is also a properties file for the HTML entities; you
can provide a modified version, which may cause Xalan to output UTF-8 encoded
bytes or at least numeric character references (which are a bit shorter).
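
To put numbers on the size difference, here is a small sketch that counts the bytes of the three ways to write ó in a UTF-8 document. The helper name utf8_size is made up for the example; the numeric reference &#243; is the decimal code point of ó (U+00F3).

```python
def utf8_size(s: str) -> int:
    """Number of bytes s occupies once encoded as UTF-8."""
    return len(s.encode("utf-8"))


# Three representations of the same character in HTML output.
forms = {
    "named entity": "&oacute;",
    "numeric reference": "&#243;",
    "literal character": "ó",
}

sizes = {label: utf8_size(text) for label, text in forms.items()}

if __name__ == "__main__":
    for label, size in sizes.items():
        print(f"{label}: {size} bytes")
```

In a single-byte encoding such as ISO-8859-2 the literal character takes one byte, which matches the "1 or 2 bytes" mentioned above.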

J.Pietschmann




Re[2]: Problem with indent

Posted by "g[R]eK" <gR...@warsztat.pac.pl>.
Hello Joerg,

March 13, 2003, 08:22:55 AM, you wrote:

JH> g[R]eK wrote:
>> Hi J.Pietschmann,
>> 
>> JP> Try
>> JP>    <indent>yes</indent>
>> 
>> It isn't working, the output is same :-(

JH> Of course it changes nothing with encoding, you wrote "Problem with indent" 
JH> in the mail subject.

Let me remind you that I have two problems:
1. With the encoding
2. With indentation in the output (the output has no indents)

>>>>                                <encoding>UTF-8</encoding>
>> 
>> JP> This should be the default.
>> 
>> 
>>>> And in output I have some chars encoded as entities. How can
>>>> I force Cocoon to encode my language (Polish) chars
>>>> correctly?
>> 
>> JP> What is "correctly"? If you mean you don't want to have
>> JP> HTML entities: you can't (unless you are willing to pull
>> JP> some tricks which take some time to explain)
>> 
>> Look:
>> 
>> <table class="footer">
>> <tbody>
>> <tr>
>> <td>
>>         Some footer... bla bla
>> Zaż&oacute;łcić gęsią jaźń</td>
>> </tr>
>> </tbody>
>> </table>
>> 
>> This is what I get from cocoon HTML Serializer, but I want to have this:
>> <table class="footer">
>> <tbody>
>> <tr>
>> <td>
>>         Some footer... bla bla
>> Zażółcić gęsią jaźń</td>
>> </tr>
>> </tbody>
>> </table>
>> 
>> The difference is the entity '&oacute;'. Interestingly, Cocoon encodes some letters of my language (like 'ę',
>> 'ż') correctly. That is to say, they are output as characters, not as entities.

JH> I know that I often mix up the correct names, but I will try to 
JH> explain:
JH> &oacute; is only another representation of ó. &oacute; is the character 
JH> entity, while ó is the character. But they represent the same character and 
JH> a browser correctly parsing the HTML should show both in the same way. So 
JH> there is no problem if Cocoon delivers the HTML in the above way.

JH> The remaining question is why the browser isn't doing this. Do you have a 
JH> <meta> tag specifying the encoding 'UTF-8' in your HTML code? If yes, is the 
JH> browser not UTF-8 aware? Or does it prefer the encoding specified in the 
JH> response header, and is that one different and/or wrong?

This is my meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

I think it is correct.
But the problem is different from what you think.
The browser displays my page correctly; that is not the problem.
The problem is caused by entities like "&oacute;", because each one takes 8 bytes,
while the character ó takes 1 or 2 bytes. That is a big difference when the ó character
is repeated many times.
I hope you see what I mean.

JH> I think Mozilla is a very good browser to test this. You can look at the 
JH> properties of a page via 'view page info' in the context menu, 
JH> view/page info in the main menu, or ctrl + i on the keyboard. It shows 
JH> which encoding the page was recognized as. Furthermore, you can force Mozilla 
JH> to show the page in another encoding to see what happens when 
JH> it's recognized correctly (or not).

The encoding is recognized correctly in both IE and Mozilla.

>> Little question... When I start cocoon, I have this text:
>> 'server.properties not found, using command line or default properites'
>> Is it important?

JH> I don't think that it is important. server.properties could override the 
JH> mentioned default properties.

JH> Regards,

JH> Joerg




-- 
Best regards,
 g[R]eK                            mailto:gReK@warsztat.pac.pl




Re: Problem with indent

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Joerg Heinicke wrote:
> The remaining question is why the browser isn't doing this. Do you have 
> a <meta> tag specifying the encoding 'UTF-8' in your HTML code? If yes, 
> is the browser not UTF-8 aware? Or does it prefer the encoding specified 
> in the response header, and is that one different and/or wrong?

I vaguely remember this issue was discussed only a short time ago:
there are three places where a character encoding is specified:
1. The content type HTTP header
2. (optional) the XML declaration (for XHTML)
3. The META tag in the HTML header
Unfortunately, the character encoding may be omitted from the
content type header, and it defaults to ISO-8859-1. This is
usually the authoritative information for all browsers and
overrides the encoding declaration in the META tag, except for
IEx, which usually prefers to second-guess everything.
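
The precedence described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not how any particular browser is implemented: the function names are invented for the example, the XML declaration is left out, and the META value is assumed to be consulted only when the HTTP header carries no charset parameter.

```python
import re
from typing import Optional

# Default assumed when no charset is declared anywhere (HTTP/1.1 text types).
HTTP_DEFAULT_CHARSET = "ISO-8859-1"


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if present."""
    if not value:
        return None
    match = re.search(r'charset\s*=\s*"?([\w.:-]+)"?', value, re.IGNORECASE)
    return match.group(1) if match else None


def effective_charset(http_content_type: Optional[str],
                      meta_content: Optional[str]) -> str:
    """Pick the charset: HTTP header first, then META, then the default."""
    return (charset_from_content_type(http_content_type)
            or charset_from_content_type(meta_content)
            or HTTP_DEFAULT_CHARSET)


if __name__ == "__main__":
    meta = "text/html; charset=UTF-8"
    print(effective_charset("text/html", meta))
    print(effective_charset("text/html; charset=ISO-8859-2", meta))
    print(effective_charset("text/html", None))
```

The second call shows the case to watch for: a charset in the response header wins over a correct META tag.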

Ok, back to the original question: The interesting point is that
the letter ł is not in ISO-8859-1 but apparently in ISO-8859-2
(the encoding of this mail message). If this was really delivered
to the browser and not just caused by a clever cut&paste, I'm
really interested how this got out of Cocoon without obvious
configuration changes...
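
The claim about ł is easy to check with a short sketch. The helper name encodable is made up for the example; the codec names are the standard Python ones.

```python
def encodable(ch: str, encoding: str) -> bool:
    """True if ch can be represented in the given character encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for ch in "óęłż":
        print(ch,
              "ISO-8859-1:", encodable(ch, "iso-8859-1"),
              "ISO-8859-2:", encodable(ch, "iso-8859-2"))
```

Of the four letters, only ó exists in ISO-8859-1; all four exist in ISO-8859-2.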

J.Pietschmann




Re: Problem with indent

Posted by Joerg Heinicke <jo...@gmx.de>.
g[R]eK wrote:
> Hi J.Pietschmann,
> 
> JP> Try
> JP>    <indent>yes</indent>
> 
> It isn't working, the output is same :-(

Of course it changes nothing with encoding, you wrote "Problem with indent" 
in the mail subject.

>>>                                <encoding>UTF-8</encoding>
> 
> JP> This should be the default.
> 
> 
>>> And in output I have some chars encoded as entities. How can
>>> I force Cocoon to encode my language (Polish) chars
>>> correctly?
> 
> JP> What is "correctly"? If you mean you don't want to have
> JP> HTML entities: you can't (unless you are willing to pull
> JP> some tricks which take some time to explain)
> 
> Look:
> 
> <table class="footer">
> <tbody>
> <tr>
> <td>
>         Some footer... bla bla
> Zaż&oacute;łcić gęsią jaźń</td>
> </tr>
> </tbody>
> </table>
> 
> This is what I get from cocoon HTML Serializer, but I want to have this:
> <table class="footer">
> <tbody>
> <tr>
> <td>
>         Some footer... bla bla
> Zażółcić gęsią jaźń</td>
> </tr>
> </tbody>
> </table>
> 
> The difference is the entity '&oacute;'. Interestingly, Cocoon encodes some letters of my language (like 'ę',
> 'ż') correctly. That is to say, they are output as characters, not as entities.

I know that I often mix up the correct names, but I will try to 
explain:
&oacute; is only another representation of ó. &oacute; is the character 
entity, while ó is the character. But they represent the same character and 
a browser correctly parsing the HTML should show both in the same way. So 
there is no problem if Cocoon delivers the HTML in the above way.

The remaining question is why the browser isn't doing this. Do you have a 
<meta> tag specifying the encoding 'UTF-8' in your HTML code? If yes, is the 
browser not UTF-8 aware? Or does it prefer the encoding specified in the 
response header, and is that one different and/or wrong?

I think Mozilla is a very good browser to test this. You can look at the 
properties of a page via 'view page info' in the context menu, 
view/page info in the main menu, or ctrl + i on the keyboard. It shows 
which encoding the page was recognized as. Furthermore, you can force Mozilla 
to show the page in another encoding to see what happens when 
it's recognized correctly (or not).

> Little question... When I start cocoon, I have this text:
> 'server.properties not found, using command line or default properites'
> Is it important?

I don't think that it is important. server.properties could override the 
mentioned default properties.

Regards,

Joerg

-- 

System Development
VIRBUS AG
Fon  +49(0)341-979-7419
Fax  +49(0)341-979-7409
joerg.heinicke@virbus.de
www.virbus.de


