Posted to dev@xalan.apache.org by Marco Stipek <st...@triplex.de> on 2000/11/30 14:13:55 UTC
Re[5]: Character Encoding in HTML Output is wrong!
Hi David,
I've debugged FormatterToHTML.cpp and found that the value of
FormatterToXML::m_maxCharacter is 65535, which makes no sense to me
when I have an output rule that declares ISO-8859-1 as the charset to
use.
But I don't yet understand how the value gets set. Where is it
overridden from the initial setting in FormatterToXML, and what
logic is used?
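To make the question concrete, here is a sketch of one way such a limit could be chosen from the output encoding name. It is an illustration only and not the Xalan-C implementation: the function name maxCharacterFor and the list of encodings are assumptions, and 0xFFFF for UTF-8/UTF-16 assumes a 16-bit character type. The fallback of 127 for an unknown encoding is the default Dave mentions in the quoted message below.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, NOT Xalan-C code: returns the largest code point
// that can be written directly (without a numeric reference) in the
// named output encoding.
unsigned int maxCharacterFor(std::string encoding)
{
    // Encoding names are case-insensitive, so normalise to upper case.
    std::transform(encoding.begin(), encoding.end(), encoding.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });

    if (encoding == "ISO-8859-1")
    {
        return 0x00FFu;   // Latin-1: one byte per character
    }
    if (encoding == "UTF-8" || encoding == "UTF-16")
    {
        return 0xFFFFu;   // assumption: 16-bit character type
    }
    return 0x007Fu;       // unknown encoding: fall back to 127
}
```

With a lookup like this, an ISO-8859-1 output rule would give a limit of 255 rather than 65535, so U+0153 (decimal 339) would lie above the limit.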
regards,
Marco
Thursday, November 30, 2000, 1:57:02 PM, I wrote:
MS> Hi David,
MS> thank you for your fast reaction! I think you did not do exactly the
MS> same as I did. We use the -HTML switch (as mentioned in my first mail).
MS> So I call:
MS>     testXSLT -IN infile.xml -XSL test.xsl -OUT something.htm -HTML
MS> The -HTML switch at the end is the difference, I think.
MS> This uses FormatterToHTML, which produces the wrong output. If I use
MS> FormatterToXML, all is fine (but then I can't produce HTML output
MS> :-( )
MS> regards,
MS> Marco
MS> Wednesday, November 29, 2000, 9:57:01 PM, you wrote:
Dlc>> Hi Marco,
Dlc>> I ran this with Xalan-C 1.0 on Windows 2000 and got the following output:
Dlc>> <HTML>
Dlc>> <HEAD>
Dlc>> </HEAD>
Dlc>> <BODY>
Dlc>> &#339;
Dlc>> </BODY>
Dlc>> </HTML>
Dlc>> So I'm not sure what's going on. What OS are you using? If you're using
Dlc>> Windows, are you using the German version, or the US English version?
Dlc>> If you're able to debug, you should watch for the initialization of
Dlc>> FormatterToXML::m_maxCharacter and see what the value is set to. We
Dlc>> default to 127 as the max character if we don't know anything about the
Dlc>> encoding, so I don't know what's happening here.
Dlc>> Dave
Dlc>> From: Marco Stipek <stipek@triplex.de>
Dlc>> To: "David_N_Bertoni@lotus.com" <xa...@xml.apache.org>
Dlc>> cc: (bcc: David N Bertoni/CAM/Lotus)
Dlc>> Date: 11/29/2000 02:40 PM
Dlc>> Subject: Re[2]: Character Encoding in HTML Output is wrong!
Dlc>> Please respond to xalan-dev
Dlc>> Hello David,
Dlc>> thanks for your fast response!
Dlc>> The input file is encoded in ISO-8859-1, and we transform it with the
Dlc>> -HTML parameter into an output file whose <xsl:output> encoding is
Dlc>> ISO-8859-1.
Dlc>> -- Infile
Dlc>> <INXML>
Dlc>> &#339;
Dlc>> </INXML>
Dlc>> -- XSL
Dlc>> <?xml version="1.0"?>
Dlc>> <xsl:stylesheet version="1.0"
Dlc>>     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
Dlc>>   <xsl:output method="html" indent="yes" encoding="ISO-8859-1" />
Dlc>>   <xsl:template match="INXML">
Dlc>>     <HTML>
Dlc>>       <HEAD></HEAD>
Dlc>>       <BODY><xsl:value-of select="."/></BODY>
Dlc>>     </HTML>
Dlc>>   </xsl:template>
Dlc>> </xsl:stylesheet>
Dlc>> ----
Dlc>> In the output file there is a strange (I think multibyte) character.
Dlc>> There should be a "&#339;".
Dlc>> W3C has defined an &oelig; entity (the small oe ligature), but it
Dlc>> doesn't work with the current Netscape versions (I don't know about
Dlc>> version 6, but it doesn't work in any 4.x version!)
Dlc>> Wednesday, November 29, 2000, 7:55:51 PM, you wrote:
Dlc>>> What output encoding attribute are you using on your stylesheet? If you
Dlc>>> don't supply one, Xalan-C defaults to UTF-8, which does support that
Dlc>>> character without using an entity. Perhaps you need to specify the
Dlc>>> appropriate encoding?
Dlc>>> If not, please post a _small_ xml file and stylesheet which reproduces the
Dlc>>> problem and we'll take a look at it.
Dlc>>> Dave
Dlc>>> From: Marco Stipek <stipek@triplex.de>
Dlc>>> To: xalan-dev@xml.apache.org
Dlc>>> cc: (bcc: David N Bertoni/CAM/Lotus)
Dlc>>> Date: 11/29/2000 01:12 PM
Dlc>>> Subject: Character Encoding in HTML Output is wrong!
Dlc>>> Please respond to xalan-dev
Dlc>>> We have many problems with e.g. the character reference for œ,
Dlc>>> a character which is not defined in the ISO-8859-1 charset but is widely
Dlc>>> used on French websites.
Dlc>>> As the W3C defined an entity for œ (it's in Latin Extended-A),
Dlc>>> we use this value to get a result.
Dlc>>> &oelig; is for some reason not supported by Netscape 4.x.
Dlc>>> But Xalan-C (tested on 1.0) does something strange.
Dlc>>> I think it's writing the binary value of the internal representation
Dlc>>> (maybe UTF-X) into the HTML ASCII file, even if we use the entity
Dlc>>> notation for œ. But the output must be "&#339;".
Dlc>>> The possible point of failure we have found is in the
Dlc>>> FormatterToHTML.cpp file, which calls accum(ch) instead of
Dlc>>> writeNumberedEntityReference(ch).
Dlc>>> Could it simply be changed, or what exactly would the result be?
Dlc>>> --------------------------------------------------------------------
Dlc>>> extract of FormatterToHTML.cpp:
Dlc>>> FormatterToHTML::characters(
Dlc>>> ...
Dlc>>>     else if(ch >= 0x007Fu && ch <= m_maxCharacter)
Dlc>>>     {
Dlc>>>         // Hope this is right...
Dlc>>>         accum(ch);
Dlc>>>     }
Dlc>>>     else
Dlc>>>     {
Dlc>>>         writeNumberedEntityReference(ch);
Dlc>>>     }
Dlc>>> ...
Dlc>>> --------------------------------------------------------------------
Dlc>>> best regards,
Dlc>>> Marco Stipek
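The branch in the extract above can be restated as a small standalone sketch, to show why the value of m_maxCharacter decides between a raw character and a numeric reference. The names needsNumericReference and numericReference are hypothetical and are not part of Xalan-C. The sketch only covers characters at or above 0x7F, since the earlier branches of characters() are elided in the extract.

```cpp
#include <string>

// Hypothetical restatement of the quoted branch, NOT the Xalan-C source.
// For a character at or above 0x7F, returns true when it lies above the
// largest code point the output encoding can hold directly, i.e. when the
// "else" branch (numeric reference) would be taken.
bool needsNumericReference(unsigned int ch, unsigned int maxCharacter)
{
    return ch >= 0x007Fu && ch > maxCharacter;
}

// Builds a decimal numeric character reference such as "&#339;".
std::string numericReference(unsigned int ch)
{
    return "&#" + std::to_string(ch) + ";";
}
```

For œ (U+0153, decimal 339), needsNumericReference(0x0153, 0x00FF) is true and numericReference(0x0153) gives "&#339;". With a limit of 0xFFFF the check is false, which matches the raw character seen in the output.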