Posted to dev@xalan.apache.org by Marco Stipek <st...@triplex.de> on 2000/11/30 14:13:55 UTC

Re[5]: Character Encoding in HTML Output is wrong!

Hi David,


I've debugged FormatterToHTML.cpp and found that the value of
FormatterToXML::m_maxCharacter is 65535, which makes no sense to me
when I have an output rule that declares ISO-8859-1 as the charset to
use.

But I don't yet understand how the value is set. Where is the initial
setting from FormatterToXML overridden, and what logic is used?


regards,
Marco

Thursday, November 30, 2000, 1:57:02 PM, I wrote:

MS> Hi David,


MS> thank you for your fast reaction! I don't think you did exactly the same
MS> thing I did. We use the -HTML switch (as mentioned in my first mail).

MS> So I call:

MS> testXSLT -IN infile.xml -XSL test.xsl -OUT something.htm -HTML
MS>                                                          ^^^^^^
MS>                                                          This is the
MS>                                                          difference I
MS>                                                          think

MS> This uses FormatterToHTML, which produces the wrong output. If I use
MS> FormatterToXML, everything is fine (but then I can't produce HTML output
MS> :-(  )

MS> regards,
MS> Marco
                                                         

MS> Wednesday, November 29, 2000, 9:57:01 PM, you wrote:


Dlc>> Hi Marco,

Dlc>> I ran this with Xalan-C 1.0 on Windows 2000 and got the following output:

Dlc>> <HTML>
Dlc>>     <HEAD>
Dlc>>     </HEAD>
Dlc>>     <BODY>
Dlc>>        &#339;
Dlc>> </BODY>
Dlc>> </HTML>

Dlc>> So I'm not sure what's going on.  What OS are you using?  If you're using
Dlc>> Windows, are you using the German version, or the US English version?

Dlc>> If you're able to debug, you should watch for the initialization of
Dlc>> FormatterToXML::m_maxCharacter and see what the value is set to.  We
Dlc>> default to 127 as the max character if we don't know anything about the
Dlc>> encoding, so I don't know what's happening here.

Dlc>> Dave
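To make the role of m_maxCharacter concrete, here is a minimal C++ sketch of how a serializer could choose the largest directly-writable character from the output encoding. The function maxCharacterForEncoding and its table are hypothetical and are not the Xalan-C code. Only the fallback of 127 for an unknown encoding comes from Dave's message; ISO-8859-1 ending at 0xFF is a property of that charset. Mapping UTF-8 and UTF-16 to 0xFFFF is an assumption that matches the 16-bit value of 65535 Marco saw in the debugger.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, NOT part of Xalan-C: the largest character that can be
// written directly (without a numeric character reference) in `encoding`.
unsigned int maxCharacterForEncoding(std::string encoding)
{
    // Encoding names are case-insensitive, so compare them in upper case.
    std::transform(encoding.begin(), encoding.end(), encoding.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });

    if (encoding == "ISO-8859-1")
        return 0xFFu;    // Latin-1 covers U+0000..U+00FF only
    if (encoding == "UTF-8" || encoding == "UTF-16")
        return 0xFFFFu;  // assumption: 16-bit characters, hence 65535
    return 0x7Fu;        // unknown encoding: fall back to 127 (plain ASCII)
}
```

With such a mapping, an ISO-8859-1 output rule yields 255, so U+0153 (339) lies above the limit and has to be escaped; a limit of 65535 lets it through unescaped.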



Dlc>> From:    Marco Stipek <stipek@triplex.de>
Dlc>> To:      "David_N_Bertoni@lotus.com" <xa...@xml.apache.org>
Dlc>> cc:      (bcc: David N Bertoni/CAM/Lotus)
Dlc>> Subject: Re[2]: Character Encoding in HTML Output is wrong!
Dlc>> Date:    11/29/2000 02:40 PM
Dlc>> Please respond to xalan-dev



Dlc>> Hello David,

Dlc>> thanks for your fast response!

Dlc>> We have ISO-8859-1 encoding in the input file, and we transform it
Dlc>> with the -HTML parameter to an output file with the <xsl:output>
Dlc>> encoding ISO-8859-1.

Dlc>> -- Infile
Dlc>> <INXML>
Dlc>>        &#339;
Dlc>> </INXML>

Dlc>> -- XSL
Dlc>> <?xml version="1.0"?>
Dlc>> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
Dlc>> <xsl:output method="html" indent="yes" encoding="ISO-8859-1" />

Dlc>> <xsl:template match="INXML">
Dlc>>     <HTML>
Dlc>>         <HEAD></HEAD>
Dlc>>         <BODY><xsl:value-of select="."/></BODY>
Dlc>>     </HTML>
Dlc>> </xsl:template>
Dlc>> </xsl:stylesheet>
Dlc>> ----

Dlc>> In the output file there is a strange (I think multibyte) character.

Dlc>> There should be a "&#339;".

Dlc>> The W3C has defined &oelig; (the small oe ligature), but this
Dlc>> doesn't work with current Netscape versions (I don't know about
Dlc>> version 6, but it doesn't work in any 4.x version!)





Dlc>> Wednesday, November 29, 2000, 7:55:51 PM, you wrote:


Dlc>>> What output encoding attribute are you using on your stylesheet?  If you
Dlc>>> don't supply one, Xalan-C defaults to UTF-8, which does support that
Dlc>>> character without using an entity.  Perhaps you need to specify the
Dlc>>> appropriate encoding?

Dlc>>> If not, please post a _small_ XML file and stylesheet which reproduce the
Dlc>>> problem and we'll take a look at it.

Dlc>>> Dave




Dlc>>> From:    Marco Stipek <stipek@triplex.de>
Dlc>>> To:      xalan-dev@xml.apache.org
Dlc>>> cc:      (bcc: David N Bertoni/CAM/Lotus)
Dlc>>> Subject: Character Encoding in HTML Output is wrong!
Dlc>>> Date:    11/29/2000 01:12 PM
Dlc>>> Please respond to xalan-dev






Dlc>>> We have many problems with, e.g., the &oelig; entity reference, whose
Dlc>>> character is not in the ISO-8859-1 charset but is widely used on
Dlc>>> French websites.

Dlc>>> Since the W3C defines &#339; as the numeric reference for that character
Dlc>>> (it's in Latin Extended-A), we use this value to get a result.
Dlc>>> &oelig; is, for some reason, not supported by Netscape 4.x.

Dlc>>> But Xalan-C (tested on 1.0) does something strange.
Dlc>>> I think it's writing the binary value of the internal representation
Dlc>>> (maybe UTF-x) into the HTML ASCII file, even though we use the notation
Dlc>>> &#339;. But the output must be "&#339;".
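Marco's guess can be checked against what UTF-8 would produce for this character. The sketch below is a plain UTF-8 encoder in C++; encodeUtf8 is a name chosen for illustration, not a Xalan-C function. U+0153 encodes to the two bytes 0xC5 0x93. A browser reading the file as ISO-8859-1 shows 0xC5 as "Å" and treats 0x93 as a control character (Windows-1252 maps it to a curly quote), which would fit the "strange (I think multibyte) character" reported above. Whether Xalan-C 1.0 actually wrote UTF-8 bytes here is not established in the thread; this only shows what that hypothesis predicts.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode scalar value as a sequence of UTF-8 bytes.
std::string encodeUtf8(char32_t cp)
{
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF) && "not a scalar value");
    std::string out;
    if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes: 11110xxx then three continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```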

Dlc>>> We have found a possible point of failure in FormatterToHTML.cpp,
Dlc>>> which calls accum(ch) instead of writeNumberedEntityReference(ch).

Dlc>>> Could it simply be changed, or what exactly would the result be?

Dlc>>> --------------------------------------------------------------------
Dlc>>> extract of FormatterToHTML.cpp:
Dlc>>> FormatterToHTML::characters(
Dlc>>> ...
Dlc>>>         else if(ch >= 0x007Fu && ch <= m_maxCharacter)
Dlc>>>         {
Dlc>>>              // Hope this is right...
Dlc>>>              accum(ch);

Dlc>>>         }
Dlc>>>         else
Dlc>>>         {
Dlc>>>             writeNumberedEntityReference(ch);
Dlc>>>         }
Dlc>>> ...
Dlc>>> --------------------------------------------------------------------
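The decision made by that branch can be sketched on its own. In this minimal C++ version, escapeAboveMax is a hypothetical stand-in for the quoted code: characters up to maxCharacter are copied through, as accum(ch) does, and anything above it becomes a decimal reference such as &#339;, which is the output the thread expects from the writeNumberedEntityReference(ch) path. Markup escaping of '<' and '&' (handled in the elided part of characters()) is left out, and each 16-bit unit is treated as one character, so surrogate pairs are not handled. The sketch suggests the branch itself is sound once maxCharacter is 0xFF for ISO-8859-1; with 65535, U+0153 falls into the accum(ch) path.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of the branch in FormatterToHTML::characters():
// characters up to maxCharacter are copied through (like accum(ch));
// anything above is written as a decimal numeric character reference
// (the "&#339;" form expected from writeNumberedEntityReference(ch)).
std::u16string escapeAboveMax(const std::u16string& text, char16_t maxCharacter)
{
    std::u16string out;
    for (char16_t ch : text) {
        if (ch <= maxCharacter) {
            out += ch;                                  // representable: write as-is
        } else {
            const std::string digits = std::to_string(static_cast<unsigned int>(ch));
            out += u"&#";
            out.append(digits.begin(), digits.end());  // ASCII digits widen safely
            out += u';';
        }
    }
    return out;
}
```

For example, escapeAboveMax(u"\u0153", 0xFF) yields u"&#339;", while escapeAboveMax(u"\u0153", 0xFFFF) returns the character unchanged.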

Dlc>>> best regards,
Dlc>>> Marco Stipek