Posted to dev@xalan.apache.org by Marco Stipek <st...@triplex.de> on 2000/11/29 19:12:51 UTC

Character Encoding in HTML Output is wrong!

We have a lot of problems with, e.g., the &oelig; character reference,
which is not directly defined in the ISO-8859-1 charset but is widely
used on French websites.

Since the W3C defines the numeric reference &#339; for that character
(it is in Latin Extended-A), we use this value to get a result.
For some reason, &oelig; is not supported by Netscape 4.x.

But Xalan-C (tested on 1.0) does something strange.
I think it writes the binary value of its internal representation
(maybe UTF-X) into the HTML ASCII file, even when we use the notation
&#339;. But the output must be "&#339;".
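
To make the difference concrete, here is a minimal C++ sketch that is
independent of Xalan-C. The helper names utf8Encode and numericReference
are hypothetical, and the assumption that the raw bytes are UTF-8 is only
a guess. For code point 339 (U+0153) the first helper yields the two
bytes 0xC5 0x93, while the second yields the seven ASCII characters we
expect in the file.

```cpp
#include <string>

// Hypothetical helper: encode a code point up to U+FFFF as UTF-8 bytes.
// Code points above U+FFFF are not handled and yield an empty string.
std::string utf8Encode(unsigned int cp)
{
    std::string out;
    if (cp < 0x80u) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800u) {
        out += static_cast<char>(0xC0u | (cp >> 6));
        out += static_cast<char>(0x80u | (cp & 0x3Fu));
    } else if (cp <= 0xFFFFu) {
        out += static_cast<char>(0xE0u | (cp >> 12));
        out += static_cast<char>(0x80u | ((cp >> 6) & 0x3Fu));
        out += static_cast<char>(0x80u | (cp & 0x3Fu));
    }
    return out;
}

// Hypothetical helper: format a code point as a decimal numeric
// character reference, e.g. 339 becomes "&#339;".
std::string numericReference(unsigned int cp)
{
    return "&#" + std::to_string(cp) + ";";
}
```

A browser that reads the file as ISO-8859-1 would show the two raw bytes
as two unrelated characters, whereas the numeric reference survives in
any ASCII-compatible encoding.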

We have traced the possible point of failure to the FormatterToHTML.cpp
file, which calls accum(ch) instead of calling
writeNumberedEntityReference(ch).

Could this simply be changed, or what exactly would the result be?

--------------------------------------------------------------------
Extract from FormatterToHTML.cpp:
FormatterToHTML::characters(
...
        else if(ch >= 0x007Fu && ch <= m_maxCharacter)
        {
             // Hope this is right...
             accum(ch);

        }
        else
        {
            writeNumberedEntityReference(ch);
        }
...
--------------------------------------------------------------------
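
In the extract, whether character 339 reaches the else branch depends on
the value of m_maxCharacter. The following standalone C++ sketch shows
the behaviour we would expect for an ISO-8859-1 file. It is not Xalan-C
code: the function escapeAbove and its parameter are hypothetical, it
assumes a single-byte output encoding (maxCharacter of at most 0xFF), and
it leaves out the escaping of markup characters such as < and &.

```cpp
#include <string>

// Hypothetical stand-in for the branch in the extract; not Xalan-C code.
// Assumes maxCharacter <= 0xFF, so each representable character maps to
// exactly one output byte (true for ISO-8859-1).
std::string escapeAbove(const std::u32string& text, char32_t maxCharacter)
{
    std::string out;
    for (char32_t ch : text) {
        if (ch < 0x80u) {
            // Plain ASCII is copied through unchanged.
            out += static_cast<char>(ch);
        } else if (ch <= maxCharacter) {
            // Representable in the output encoding: write the byte itself.
            out += static_cast<char>(ch);
        } else {
            // Not representable: write a decimal numeric reference.
            out += "&#" + std::to_string(static_cast<unsigned long>(ch)) + ";";
        }
    }
    return out;
}
```

With maxCharacter set to 0xFF, the string "c", U+0153, "ur" comes out as
"c&#339;ur", while U+00E9 is still written as the single byte 0xE9.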

Best regards,
Marco Stipek