
Posted to dev@xalan.apache.org by bu...@apache.org on 2003/07/21 17:55:41 UTC

DO NOT REPLY [Bug 12105] - UTF Encoding is not preserved

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12105>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12105

UTF Encoding is not preserved

------- Additional Comments From aseidl@myst-technology.com  2003-07-21 15:55 -------
I believe this must be classified as a bug (and not an implementation
decision), and a serious one at that, because it can produce invalid XML
output. For example, I have an XML document that uses UTF-8 encoding and
contains a copyright character (U+00A9, which UTF-8 encodes as the two bytes
0xC2 0xA9). When transforming this into another XML document that also uses
UTF-8 encoding, the resulting document contains only a single 0xA9 byte, which
is illegal in a UTF-8 encoded document.
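A small sketch (not Xalan code, and not part of the original report) may help show why the single byte is illegal. It builds two tiny XML documents that both declare UTF-8: one holding the correct two-byte sequence for U+00A9 and one holding a lone 0xA9. The helper names `is_valid_utf8` and `parses_as_xml` are hypothetical. The idea that the lone byte is the ISO-8859-1 (Latin-1) encoding of the same character is an assumption offered as one possible explanation; the report does not establish the cause.

```python
import xml.etree.ElementTree as ET


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 under strict error handling."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


def parses_as_xml(data: bytes) -> bool:
    """Return True if the XML parser accepts `data` as well-formed."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


# U+00A9 COPYRIGHT SIGN takes two bytes in UTF-8.
good_bytes = "\u00a9".encode("utf-8")   # b'\xc2\xa9'
# A lone 0xA9 is what ISO-8859-1 produces for the same character
# (assumption: one possible cause of the symptom in the report).
bad_bytes = "\u00a9".encode("latin-1")  # b'\xa9'

header = b'<?xml version="1.0" encoding="UTF-8"?>'
good_doc = header + b"<p>" + good_bytes + b" 2003</p>"
bad_doc = header + b"<p>" + bad_bytes + b" 2003</p>"

print(is_valid_utf8(good_doc), parses_as_xml(good_doc))  # True True
print(is_valid_utf8(bad_doc), parses_as_xml(bad_doc))    # False False
```

In UTF-8, 0xA9 is a continuation byte (bit pattern 10xxxxxx), so it is only legal after a lead byte such as 0xC2. Both the strict decoder and the XML parser therefore reject the second document.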

Note: When transforming the same source XML document into an HTML document
(also using UTF-8), the UTF-8 sequence (0xC2 0xA9) is correctly recognized and
written to the output.