Posted to dev@xalan.apache.org by bu...@apache.org on 2001/05/07 09:24:45 UTC
[Bug 1639] New - Xalan escaping characters for ISO encodings other than ISO-8859-1
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=1639
*** shadow/1639 Mon May 7 00:24:45 2001
--- shadow/1639.tmp.15531 Mon May 7 00:24:45 2001
***************
*** 0 ****
--- 1,58 ----
+ +============================================================================+
+ | Xalan escaping characters for ISO encodings other than ISO-8859-1 |
+ +----------------------------------------------------------------------------+
+ | Bug #: 1639 Product: XalanJ2 |
+ | Status: NEW Version: 2.0.x |
+ | Resolution: Platform: PC |
+ | Severity: Normal OS/Version: |
+ | Priority: Component: org.apache.xalan.serial |
+ +----------------------------------------------------------------------------+
+ | Assigned To: xalan-dev@xml.apache.org |
+ | Reported By: tgeor@yahoo.com |
+ | CC list: Cc: |
+ +----------------------------------------------------------------------------+
+ | URL: |
+ +============================================================================+
+ | DESCRIPTION |
+ I found that the Xalan serializer escapes characters when you use an encoding
+ for another language.
+
+ Example
+
+ ------------ foo.xml ------------
+ <?xml version="1.0" encoding="ISO-8859-7"?>
+ <doc>ΑΒΓ (ABC in Greek)</doc>
+
+ ------------ foo.xsl ------------
+ <?xml version="1.0"?>
+ <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
+ <xsl:output method="xml" encoding="ISO-8859-7"/>
+ <xsl:template match="doc">
+ <out><xsl:value-of select="."/></out>
+ </xsl:template>
+ </xsl:stylesheet>
+
+ ------------ foo.out ------------
+ <?xml version="1.0" encoding="ISO-8859-7"?>
+ <out>&#913;&#914;&#915; (ABC in Greek)</out>
+
+ The expected output is
+
+ <?xml version="1.0" encoding="ISO-8859-7"?>
+ <out>ΑΒΓ (ABC in Greek)</out>
+
+ The same happens to attribute values when they contain non-English characters.
+
+ The problem is in the code of org.apache.xalan.serialize.SerializerToXML and
+ org.apache.xalan.serialize.SerializerToHTML, in the check ch < m_maxCharacter.
+ When Java reads data from an input stream, it converts each character to
+ Unicode, so, for example, the Greek letter A, which has a value of 0xC1 (193) in
+ ISO-8859-7, becomes the Unicode character 0x0391 (913). The maximum printable
+ character in ISO formats is 0xff (255), so the comparison ch < m_maxCharacter
+ will always be false for these letters.
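
The mismatch described above can be reproduced outside Xalan. The following is
only a minimal sketch: MAX_CHARACTER is a hypothetical stand-in for the
serializer's m_maxCharacter field, and wouldEscape() mirrors the check as this
report describes it, not the actual serializer source.

```java
import java.nio.charset.Charset;

public class GreekDecodeDemo {
    // Hypothetical stand-in for m_maxCharacter (assumed to be 0xFF here).
    static final int MAX_CHARACTER = 0xFF;

    // Decode one ISO-8859-7 byte into the Unicode char Java works with.
    static char decode(int byteValue) {
        Charset greek = Charset.forName("ISO-8859-7");
        return new String(new byte[] {(byte) byteValue}, greek).charAt(0);
    }

    // Assumed form of the check: a char is written as-is only when it is
    // below the limit, otherwise it is escaped as a character reference.
    static boolean wouldEscape(char ch) {
        return !(ch < MAX_CHARACTER);
    }

    public static void main(String[] args) {
        char alpha = decode(0xC1); // Greek capital alpha in ISO-8859-7
        // Prints: U+0391 escaped=true
        System.out.printf("U+%04X escaped=%b%n", (int) alpha, wouldEscape(alpha));
    }
}
```

The byte 0xC1 is below the limit, but the decoded value 0x0391 is not, so the
letter is escaped even though the output encoding can represent it.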
+
+ You should compare the output character values with m_maxCharacter, not the
+ Unicode character values. One solution would be to translate characters to the
+ output encoding using the getBytes(m_encoding) method of the String class and
+ compare the byte values with m_maxCharacter, but there will be a performance
+ overhead.
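
An encoding-aware check could look roughly like the sketch below. Note that it
swaps in a different technique from the getBytes(m_encoding) idea above: it asks
a java.nio.charset.CharsetEncoder whether each character is representable, which
assumes a JVM that provides java.nio.charset. The class and method names
(EncodingAwareEscaper, escape) are hypothetical and are not part of Xalan.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodingAwareEscaper {
    private final CharsetEncoder encoder;

    public EncodingAwareEscaper(String encodingName) {
        this.encoder = Charset.forName(encodingName).newEncoder();
    }

    // Escape markup characters, and write a numeric character reference
    // only for code points the output encoding cannot represent.
    public String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp == '&') {
                out.append("&amp;");
            } else if (encoder.canEncode(text.subSequence(i, i + len))) {
                out.append(text, i, i + len); // representable: keep as-is
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String greek = "\u0391\u0392\u0393"; // Alpha, Beta, Gamma
        // ISO-8859-1 cannot represent Greek, so this prints &#913;&#914;&#915;
        System.out.println(new EncodingAwareEscaper("ISO-8859-1").escape(greek));
        // ISO-8859-7 can, so the letters pass through unescaped.
        System.out.println(new EncodingAwareEscaper("ISO-8859-7").escape(greek).equals(greek));
    }
}
```

Unlike comparing bytes returned by getBytes(), this does not depend on how
unmappable characters are substituted during conversion, though the per-character
encoder call still has a cost, so the performance concern raised above applies.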