Posted to dev@xalan.apache.org by Marco Stipek <st...@triplex.de> on 2000/11/30 13:57:02 UTC

Re[4]: Character Encoding in HTML Output is wrong!

Hi David,


Thank you for your fast reaction! I think you did not do exactly the same
as I did. We use the -HTML switch (as mentioned in my first mail).

So I call:

testXSLT -IN infile.xml -XSL test.xsl -OUT something.htm -HTML
                                                         ^^^^^^
                                                         This is the
                                                         difference, I
                                                         think

This uses FormatterToHTML, which produces the wrong output. If I use
FormatterToXML, all things are fine (but then I can't produce HTML output
:-(  )

regards,
Marco
                                                         

Wednesday, November 29, 2000, 9:57:01 PM, you wrote:


Dlc> Hi Marco,

Dlc> I ran this with Xalan-C 1.0 on Windows 2000 and got the following output:

Dlc> <HTML>
Dlc>     <HEAD>
Dlc>     </HEAD>
Dlc>     <BODY>
Dlc>        &#339;
Dlc> </BODY>
Dlc> </HTML>

Dlc> So I'm not sure what's going on.  What OS are you using?  If you're using
Dlc> Windows, are you using the German version, or the US English version?

Dlc> If you're able to debug, you should watch for the initialization of
Dlc> FormatterToXML::m_maxCharacter and see what the value is set to.  We
Dlc> default to 127 as the max character if we don't know anything about the
Dlc> encoding, so I don't know what's happening here.

Dlc> Dave



                                                                                                                   
Dlc> From:    Marco Stipek <stipek@triplex.de>
Dlc> Date:    11/29/2000 02:40 PM
Dlc> To:      "David_N_Bertoni@lotus.com" <xa...@xml.apache.org>
Dlc> cc:      (bcc: David N Bertoni/CAM/Lotus)
Dlc> Subject: Re[2]: Character Encoding in HTML Output is wrong!
Dlc> Please respond to xalan-dev
                                                                                                                   
                                                                                                                   



Dlc> Hello David,

Dlc> thanks for your fast response!

Dlc> We have ISO-8859-1 encoding in the input file and transform it with the
Dlc> -HTML parameter to an output file whose <xsl:output> encoding is
Dlc> ISO-8859-1.

Dlc> -- Infile
Dlc> <INXML>
Dlc>        &#339;
Dlc> </INXML>

Dlc> -- XSL
Dlc> <?xml version="1.0"?>
Dlc> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
Dlc> <xsl:output method="html" indent="yes" encoding="ISO-8859-1" />

Dlc> <xsl:template match="INXML">
Dlc>     <HTML>
Dlc>          <HEAD></HEAD>
Dlc>          <BODY><xsl:value-of select="."/></BODY>
Dlc>      </HTML>
Dlc> </xsl:template>
Dlc> </xsl:stylesheet>
Dlc> ----

Dlc> In the output file there is a strange (I think multibyte) character.

Dlc> There should be a "&#339;".

Dlc> W3C has defined &oelig; (the small oe ligature), but this
Dlc> doesn't work with the current Netscape versions (I don't know about
Dlc> version 6, but it doesn't work in any 4.x version!)





Dlc> Wednesday, November 29, 2000, 7:55:51 PM, you wrote:


Dlc>> What output encoding attribute are you using on your stylesheet?  If you
Dlc>> don't supply one, Xalan-C defaults to UTF-8, which does support that
Dlc>> character without using an entity.  Perhaps you need to specify the
Dlc>> appropriate encoding?

Dlc>> If not, please post a _small_ xml file and stylesheet which reproduces the
Dlc>> problem and we'll take a look at it.

Dlc>> Dave




Dlc>> From:    Marco Stipek <stipek@triplex.de>
Dlc>> Date:    11/29/2000 01:12 PM
Dlc>> To:      xalan-dev@xml.apache.org
Dlc>> cc:      (bcc: David N Bertoni/CAM/Lotus)
Dlc>> Subject: Character Encoding in HTML Output is wrong!
Dlc>> Please respond to xalan-dev






Dlc>> We have many problems with, e.g., the &oelig; character reference,
Dlc>> which is not directly defined in the ISO-8859-1 charset but is widely
Dlc>> used on French websites.

Dlc>> As the W3C defined the numeric reference &#339; for that character (it
Dlc>> is in Latin Extended-A), we use this value to get a result.
Dlc>> &oelig; is for some reason not supported by Netscape 4.x.

Dlc>> But Xalan-C (tested on 1.0) does something strange.
Dlc>> I think it's writing the binary value of its internal representation
Dlc>> (maybe UTF-X) into the HTML ASCII file, even if we use the notation
Dlc>> &#339;. But the output must be "&#339;".

Dlc>> The possible point of failure we have found is in the
Dlc>> FormatterToHTML.cpp file, which calls accum(ch) instead of calling
Dlc>> writeNumberedEntityReference(ch).

Dlc>> Could it simply be changed, or what exactly would the result be?

Dlc>> --------------------------------------------------------------------
Dlc>> Extract of FormatterToHTML.cpp:
Dlc>> FormatterToHTML::characters(
Dlc>> ...
Dlc>>         else if(ch >= 0x007Fu && ch <= m_maxCharacter)
Dlc>>         {
Dlc>>              // Hope this is right...
Dlc>>              accum(ch);
Dlc>>         }
Dlc>>         else
Dlc>>         {
Dlc>>             writeNumberedEntityReference(ch);
Dlc>>         }
Dlc>> ...
Dlc>> --------------------------------------------------------------------
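
The branch in the extract above can be illustrated with a small standalone
sketch. It is not Xalan-C code: serializeCodeUnit is a hypothetical helper,
and the "[raw N]" marker only stands in for whatever bytes a transcoder
would write. It shows why the result depends entirely on the upper bound:
with a bound of 0x00FF the character 339 is written as a numeric reference,
while a larger bound sends it down the accum(ch) path.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper for illustration only; it is not part of Xalan-C.
// Returns the text a serializer following the quoted branch would emit for
// one 16-bit code unit, given the highest code unit that the output
// encoding is assumed to represent directly.
std::string serializeCodeUnit(std::uint16_t ch, std::uint16_t maxCharacter)
{
    if (ch < 0x007Fu)
    {
        // Plain ASCII is written unchanged (markup escaping is omitted here).
        return std::string(1, static_cast<char>(ch));
    }
    if (ch <= maxCharacter)
    {
        // Corresponds to accum(ch): the character is passed through raw.
        // A placeholder is returned because the real bytes depend on the
        // transcoder in use.
        return "[raw " + std::to_string(ch) + "]";
    }
    // Corresponds to writeNumberedEntityReference(ch).
    return "&#" + std::to_string(ch) + ";";
}
```

Under these assumptions, serializeCodeUnit(0x0153, 0x00FF) yields "&#339;",
whereas serializeCodeUnit(0x0153, 0xFFFF) yields the raw placeholder.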

Dlc>> best regards,
Dlc>> Marco Stipek








Dlc> Regards,
Dlc>  Marco Stipek
Dlc> -----------------------------------------------------------------------
Dlc> Marco Stipek

Dlc> triplex - agentur fuer neue medien GmbH
Dlc> Herzog-Heinrich-Strasse 11-13
Dlc> 80336 Muenchen


Dlc> Tel: +49 89 209138-23
Dlc> Fax: +49 89 209138-10
Dlc> mailto:stipek@triplex.de
Dlc> http://www.triplex.de
Dlc> -----------------------------------------------------------------------











Re[5]: Character Encoding in HTML Output is wrong!

Posted by Marco Stipek <st...@triplex.de>.
Hi David,


I've debugged FormatterToHTML.cpp and found that the value of
FormatterToXML::m_maxCharacter is 65535, which makes no sense to me
when I have an output rule that declares ISO-8859-1 as the charset to
use.

But I don't understand yet how the value gets set. Where is it
overridden from the initial setting in FormatterToXML, and what
logic is used?


regards,
Marco
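
For reference, a minimal sketch of the kind of lookup one might expect
behind such a value. This is an assumption for illustration, not the actual
Xalan-C logic: maxCharacterForEncoding is a hypothetical function. It maps
an encoding name to the highest 16-bit code unit that can be written
without a numeric reference, and falls back to 127 for unknown encodings,
as described earlier in the thread. By this reasoning ISO-8859-1 would give
255, and 65535 would be expected only for a Unicode encoding.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical lookup for illustration only; it is not the Xalan-C code.
// Takes the name by value so it can be upper-cased for comparison.
std::uint16_t maxCharacterForEncoding(std::string name)
{
    // Encoding names are compared case-insensitively.
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });

    if (name == "ISO-8859-1")
    {
        return 0x00FFu;  // Latin-1 covers U+0000 through U+00FF
    }
    if (name == "UTF-8" || name == "UTF-16")
    {
        return 0xFFFFu;  // every 16-bit code unit can be encoded
    }
    // US-ASCII and anything unrecognized: assume 7-bit output only.
    return 0x007Fu;
}
```

With such a table, the character 339 exceeds the ISO-8859-1 limit of 255
and would be written as "&#339;".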
