Posted to dev@xalan.apache.org by Marco Stipek <st...@triplex.de> on 2000/11/29 19:12:51 UTC
Character Encoding in HTML Output is wrong!
We have many problems with, for example, the œ character reference (&#339;),
which is not defined in the ISO-8859-1 charset but is widely
used on French websites.
As the W3C defined an entity &#339; for that character (it is in Latin
Extended-A), we use this value to get a result.
&oelig; is for some reason not supported by Netscape 4.X.
But Xalan-C (tested on 1.0) does something strange.
I think it is writing the binary value of the internal representation
(maybe UTF-X) into the ASCII HTML file, even if we use the notation
&#339;. But the output must be "&#339;".
The possible point of failure we have found is in the FormatterToHTML.cpp
file, which calls accum(ch) instead of calling
writeNumberedEntityReference(ch).
Could it simply be changed, or what exactly would the result be?
--------------------------------------------------------------------
Extract of FormatterToHTML.cpp:
FormatterToHTML::characters(
    ...
    else if(ch >= 0x007Fu && ch <= m_maxCharacter)
    {
        // Hope this is right...
        accum(ch);
    }
    else
    {
        writeNumberedEntityReference(ch);
    }
    ...
--------------------------------------------------------------------
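To make the expected behaviour concrete, here is a minimal standalone sketch. It is not Xalan-C code: the function name escapeForHtml and its maxCharacter parameter are hypothetical, and it only assumes that every character above the highest one the output encoding can represent should be written as a decimal numeric character reference. Escaping of markup characters such as < and &, and surrogate pairs, are left out.

```cpp
#include <string>

// Hypothetical helper, not part of Xalan-C.
// Writes UTF-16 code units to a single-byte HTML output whose highest
// representable character is maxCharacter (assumed: 0x7F for US-ASCII,
// 0xFF for ISO-8859-1).
std::string escapeForHtml(const std::u16string& text, char16_t maxCharacter)
{
    std::string out;
    for (char16_t ch : text)
    {
        if (ch <= maxCharacter && ch <= 0xFF)
        {
            // Representable in the output encoding: write the byte as is.
            out += static_cast<char>(ch);
        }
        else
        {
            // Not representable: write a decimal numeric character reference.
            out += "&#" + std::to_string(static_cast<unsigned>(ch)) + ";";
        }
    }
    return out;
}
```

Under these assumptions, escapeForHtml(u"c\u0153ur", 0xFF) gives c&#339;ur, which is the output we expect. As far as the extract above shows, œ (0x0153) can only reach accum(ch) if m_maxCharacter is at least 0x0153, so the value of m_maxCharacter for the chosen output encoding may be worth checking.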
Best regards,
Marco Stipek