Posted to dev@xalan.apache.org by bu...@apache.org on 2001/05/15 22:07:48 UTC
[Bug 1757] New - Wrong encoding used when adding attributes in XSLT

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=1757

*** shadow/1757	Tue May 15 13:07:48 2001
--- shadow/1757.tmp.220	Tue May 15 13:07:48 2001
***************
*** 0 ****
--- 1,70 ----
+ +============================================================================+
+ | Wrong encoding used when adding attributes in XSLT                         |
+ +----------------------------------------------------------------------------+
+ |        Bug #: 1757                        Product: XalanJ2                 |
+ |       Status: NEW                         Version: 2.0.1                   |
+ |   Resolution:                            Platform: PC                      |
+ |     Severity: Major                    OS/Version: Windows NT/2K           |
+ |     Priority:                           Component: org.apache.xalan.transf |
+ +----------------------------------------------------------------------------+
+ |  Assigned To: xalan-dev@xml.apache.org                                     |
+ |  Reported By: ewindes@opentv.com                                           |
+ |      CC list: Cc:                                                          |
+ +----------------------------------------------------------------------------+
+ |          URL:                                                              |
+ +============================================================================+
+ |                              DESCRIPTION                                   |
+ I'm not sure exactly which component is to blame here, so I'll describe the 
+ situation...
+ 
+ org.apache.xalan.templates.ElemAttribute.constructNode() calls 
+ org.apache.xalan.transformer.TransformerImpl.transformToString() to turn an 
+ attribute node into a string.
+ 
+ The code called by transformToString() ends up using the default character 
+ encoding returned from org.apache.xalan.serialize.Encodings.getMimeEncoding().  
+ 
+ On US NT, getMimeEncoding() returns "UTF-8".
+ On Japanese NT, getMimeEncoding() returns "MS932".
+ 
+ So, on Japanese NT, characters outside the MS932 charset get escaped by 
+ transformToString() (e.g. as "&#27531;") and then escaped again when the whole 
+ document is serialized (e.g. to "&amp;#27531;").
+  
+ Example:  ("残高照会" == 4 Shift-JIS characters)
+ 
+ XML:
+ <?xml version="1.0" ?>
+ <blah>
+ </blah>
+ 
+ XSL:
+ <?xml version="1.0" encoding="Shift_JIS" ?>
+ <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
+ <xsl:output method="html" encoding="Shift_JIS"/>
+  <xsl:template match="/">
+   <html>
+    <head>
+     <title>Attribute Test</title>
+    </head>
+    <body>
+     <p>
+        <xsl:attribute name="test">残高照会</xsl:attribute>
+ 	残高照会
+     </p>
+    </body>
+   </html>
+  </xsl:template>
+ </xsl:stylesheet>
+ 
+ 
+ On US NT, this produces:
+     <p test="残高照会">
+ 	残高照会
+     </p>
+ 
+ 
+ On Japanese NT, this produces:
+     <p test="&amp;#27531;&amp;#39640;&amp;#29031;&amp;#20250;">
+ 	残高照会
+     </p>
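
The Japanese NT output above is consistent with the attribute value being escaped twice. Below is a minimal Java sketch of that mechanism, not Xalan's actual code. The escape() helper and the DoubleEscapeDemo class are hypothetical. US-ASCII stands in for an intermediate encoding that cannot represent the characters, and UTF-8 stands in for the final output encoding. Both were chosen only because every JVM is guaranteed to provide them.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class DoubleEscapeDemo {

    // Hypothetical stand-in for a serializer's attribute escaping:
    // '&' becomes "&amp;", and any code point the target charset cannot
    // encode becomes a decimal numeric character reference.
    public static String escape(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '&') {
                out.append("&amp;");
            } else if (encoder.canEncode(new String(Character.toChars(cp)))) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The four characters from the report: U+6B8B U+9AD8 U+7167 U+4F1A.
        String value = "\u6B8B\u9AD8\u7167\u4F1A";

        // First pass: the attribute value is turned into a string using an
        // encoding that cannot represent the characters (assumed: US-ASCII).
        String once = escape(value, StandardCharsets.US_ASCII);

        // Second pass: the whole document is serialized, so the '&' of each
        // reference produced by the first pass is escaped again.
        String twice = escape(once, StandardCharsets.UTF_8);

        System.out.println(once);
        System.out.println(twice);
    }
}
```

The second line printed has the same form as the test attribute in the Japanese NT output. If the first pass used an encoding able to represent the characters, they would pass through unchanged and the second pass would have no references to re-escape.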