Posted to dev@xalan.apache.org by Marco Stipek <st...@triplex.de> on 2000/11/30 13:57:02 UTC
Re[4]: Character Encoding in HTML Output is wrong!
Hi David,
thank you for your fast reaction! I think you did not do exactly the same
as I did. We use the -HTML switch (as mentioned in my first mail).
So I call:
testXSLT -IN infile.xml -XSL test.xsl -OUT something.htm -HTML
                                                         ^^^^^
This is the difference, I think.
This uses the FormatterToHTML, which produces the wrong output. If I use
the FormatterToXML, everything is fine (but then I can't produce HTML
output :-( ).
regards,
Marco
Wednesday, November 29, 2000, 9:57:01 PM, you wrote:
Dlc> Hi Marco,
Dlc> I ran this with Xalan-C 1.0 on Windows 2000 and got the following output:
Dlc> <HTML>
Dlc> <HEAD>
Dlc> </HEAD>
Dlc> <BODY>
Dlc> &#339;
Dlc> </BODY>
Dlc> </HTML>
Dlc> So I'm not sure what's going on. What OS are you using? If you're using
Dlc> Windows, are you using the German version, or the US English version?
Dlc> If you're able to debug, you should watch for the initialization of
Dlc> FormatterToXML::m_maxCharacter and see what the value is set to. We
Dlc> default to 127 as the max character if we don't know anything about the
Dlc> encoding, so I don't know what's happening here.
Dlc> Dave
Dlc> Marco Stipek <stipek@triplex.de>, 11/29/2000 02:40 PM
Dlc> To: "David_N_Bertoni@lotus.com" <xa...@xml.apache.org>
Dlc> cc: (bcc: David N Bertoni/CAM/Lotus)
Dlc> Subject: Re[2]: Character Encoding in HTML Output is wrong!
Dlc> Please respond to xalan-dev
Dlc> Hello David,
Dlc> thanks for your fast response!
Dlc> We have ISO-8859-1 encoding in the input file, and we transform with
Dlc> the -HTML parameter to an output file with the <xsl:output> encoding
Dlc> ISO-8859-1.
Dlc> -- Infile
Dlc> <INXML>
Dlc> &#339;
Dlc> </INXML>
Dlc> -- XSL
Dlc> <?xml version="1.0"?>
Dlc> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
Dlc>   <xsl:output method="html" indent="yes" encoding="ISO-8859-1" />
Dlc>   <xsl:template match="INXML">
Dlc>     <HTML>
Dlc>       <HEAD></HEAD>
Dlc>       <BODY><xsl:value-of select="."/></BODY>
Dlc>     </HTML>
Dlc>   </xsl:template>
Dlc> </xsl:stylesheet>
Dlc> ----
Dlc> In the output file there is a strange (I think multibyte) character.
Dlc> There should be a "&#339;".
Dlc> The W3C has defined &oelig; (the small oe ligature), but this
Dlc> doesn't work with the current Netscape versions (I don't know about
Dlc> version 6, but it doesn't work in any 4.x version!)
Dlc> Wednesday, November 29, 2000, 7:55:51 PM, you wrote:
Dlc>> What output encoding attribute are you using on your stylesheet? If you
Dlc>> don't supply one, Xalan-C defaults to UTF-8, which does support that
Dlc>> character without using an entity. Perhaps you need to specify the
Dlc>> appropriate encoding?
Dlc>> If not, please post a _small_ XML file and stylesheet which reproduce
Dlc>> the problem and we'll take a look at it.
Dlc>> Dave
Dlc>> Marco Stipek <stipek@triplex.de>, 11/29/2000 01:12 PM
Dlc>> To: xalan-dev@xml.apache.org
Dlc>> cc: (bcc: David N Bertoni/CAM/Lotus)
Dlc>> Subject: Character Encoding in HTML Output is wrong!
Dlc>> Please respond to xalan-dev
Dlc>> We have many problems with, e.g., the &#339; character reference,
Dlc>> which is not defined in the ISO-8859-1 charset but is widely
Dlc>> used on French websites.
Dlc>> As the W3C defined an entity &oelig; for that character (it's in
Dlc>> Latin Extended-A), we use this value to get a result.
Dlc>> &oelig; is for some reason not supported by Netscape 4.x.
Dlc>> But Xalan-C (tested on 1.0) does something strange.
Dlc>> I think it's writing the binary value of the internal representation
Dlc>> (maybe UTF-x) into the HTML ASCII file, even if we use the notation
Dlc>> &#339;. But the output must be "&#339;".
Dlc>> The possible point of failure we have found is in FormatterToHTML.cpp,
Dlc>> which calls accum(ch) instead of writeNumberedEntityReference(ch).
Dlc>> Could it simply be changed, or what exactly would the result be?
Dlc>> --------------------------------------------------------------------
Dlc>> extract of FormatterToHTML.cpp:
Dlc>> FormatterToHTML::characters(
Dlc>> ...
Dlc>>     else if(ch >= 0x007Fu && ch <= m_maxCharacter)
Dlc>>     {
Dlc>>         // Hope this is right...
Dlc>>         accum(ch);
Dlc>>     }
Dlc>>     else
Dlc>>     {
Dlc>>         writeNumberedEntityReference(ch);
Dlc>>     }
Dlc>> ...
Dlc>> --------------------------------------------------------------------
Dlc>> best regards,
Dlc>> Marco Stipek
Dlc> Regards,
Dlc> Marco Stipek
Dlc> -----------------------------------------------------------------------
Dlc> Marco Stipek
Dlc> triplex - agentur fuer neue medien GmbH
Dlc> Herzog-Heinrich-Strasse 11-13
Dlc> 80336 Muenchen
Dlc> Tel: +49 89 209138-23
Dlc> Fax: +49 89 209138-10
Dlc> mailto:stipek@triplex.de
Dlc> http://www.triplex.de
Dlc> -----------------------------------------------------------------------
Re[5]: Character Encoding in HTML Output is wrong!
Posted by Marco Stipek <st...@triplex.de>.
Hi David,
I've debugged FormatterToHTML.cpp and found that the value of
FormatterToXML::m_maxCharacter is 65535, which makes no sense to me
when I have an output rule that declares ISO-8859-1 as the charset to
use.
But I don't understand how the value is set yet. Where is it
overridden after the initial setting in FormatterToXML, and what
logic is used?
regards,
Marco