Posted to dev@xalan.apache.org by bu...@apache.org on 2003/07/21 17:55:41 UTC
DO NOT REPLY [Bug 12105] -
UTF Encoding is not preserved
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12105>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
------- Additional Comments From aseidl@myst-technology.com 2003-07-21 15:55 -------
I believe this must be classified as a bug (and not an implementation
decision)--and a serious one at that--because it can produce invalid XML
output. For example, I have an XML document that uses UTF-8 encoding and
contains a copyright character (U+00A9, which UTF-8 encodes as 0xC2 0xA9). When
transforming this into another XML document that also uses UTF-8 encoding, the
resulting document contains only a single 0xA9 byte, which is illegal on its
own in a UTF-8 encoded document.
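
A minimal sketch of the byte-level point above, in Java. The class and method
names (Utf8Check, isValidUtf8, toHex) are hypothetical and are not part of
Xalan; the sketch uses only the standard java.nio.charset API. It shows that
U+00A9 encodes to the two bytes 0xC2 0xA9 in UTF-8, and that a strict UTF-8
decoder rejects a lone 0xA9 byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true only if the bytes form well-formed UTF-8.
    // REPORT makes the decoder throw instead of silently substituting U+FFFD.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Formats bytes as space-separated upper-case hex, e.g. "C2 A9".
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] encoded = "\u00A9".getBytes(StandardCharsets.UTF_8);
        System.out.println("U+00A9 as UTF-8: " + toHex(encoded));            // C2 A9
        System.out.println("C2 A9 valid: " + isValidUtf8(encoded));          // true
        System.out.println("lone A9 valid: "
                + isValidUtf8(new byte[] {(byte) 0xA9}));                    // false
    }
}
```

The lone 0xA9 is rejected because it is a continuation byte with no preceding
lead byte, which is the situation described in the transformed output.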
Note: When transforming this same source XML document into an HTML document
(also using UTF-8), the UTF-8 sequence (0xC2 0xA9) is correctly recognized and
transformed.
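
One way to compare the two output methods is a small harness along these
lines. This is only a sketch under stated assumptions: it uses the standard
JAXP API with an identity transform rather than the stylesheet from this
report, the class and method names are hypothetical, and the result depends on
whichever TransformerFactory implementation is on the classpath. It serializes
the same source with the "xml" and "html" output methods and checks whether
the resulting bytes are well-formed UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OutputMethodCheck {
    // Runs an identity transform (assumption: no stylesheet) and returns the raw output bytes.
    public static byte[] transform(String xml, String method) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, method);
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Writing to a byte stream lets the serializer do the encoding itself.
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    // Strict UTF-8 check: malformed input raises an exception instead of being replaced.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String xml = "<doc>\u00A9</doc>"; // source containing the copyright character
        for (String method : new String[] {"xml", "html"}) {
            byte[] out = transform(xml, method);
            System.out.println(method + ": valid UTF-8 = " + isValidUtf8(out));
        }
    }
}
```

A serializer affected by the problem described here would be expected to
report false for the "xml" method while still reporting true for "html".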