Posted to dev@xalan.apache.org by Christopher Hull <no...@remarque.org> on 2001/10/12 20:47:01 UTC

Problem Transforming Chinese (UTF-8)

I'm transforming a DOM that exists only in memory.  The source document
contains some Chinese text, and the data from the original XML seems to
be getting corrupted in the output.  I suspect that is because the text
nodes themselves are bad.

The Chinese text in the stylesheet comes out OK, so it's not an XSL
encoding issue.  The Strings that are used to create the XML text nodes
to begin with appear to be valid.  When I pull the bytes out, they match
the original Chinese data.  Therefore I suspect the source DOM text
nodes are being munged.
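
The byte check described above can be sketched roughly as follows.  This
is only an illustration, not the actual code from my application: the
class name Utf8ByteCheck, the method matchesUtf8, and the sample string
are all hypothetical, and it assumes a JDK that provides
java.nio.charset.StandardCharsets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: compares a String's UTF-8 bytes with known-good bytes.
class Utf8ByteCheck {

    // Returns true if encoding s as UTF-8 yields exactly the expected bytes.
    public static boolean matchesUtf8(String s, byte[] expected) {
        // Name the charset explicitly so the platform default is never used.
        return Arrays.equals(s.getBytes(StandardCharsets.UTF_8), expected);
    }

    public static void main(String[] args) {
        // Sample data (an assumption): U+4E2D U+6587, written as escapes so
        // the source file's own encoding cannot affect the result.
        String chineseString = "\u4e2d\u6587";
        byte[] expected = {
            (byte) 0xE4, (byte) 0xB8, (byte) 0xAD,   // U+4E2D in UTF-8
            (byte) 0xE6, (byte) 0x96, (byte) 0x87    // U+6587 in UTF-8
        };
        System.out.println(matchesUtf8(chineseString, expected));
    }
}
```

Passing the charset explicitly matters here: a bare getBytes() call uses
the platform default encoding, which would make the comparison depend on
the machine it runs on.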

When I call doc.createTextNode(chineseString), the String may be
getting expressed in my platform's default encoding rather than UTF-8.
This seems to be the most likely point of failure.
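
One way to isolate that step is a small round-trip test, sketched below
under some assumptions: it uses only the standard JAXP interfaces
(DocumentBuilderFactory, TransformerFactory) rather than anything
Xerces- or Xalan-specific, and the names InMemoryDomCheck, buildDoc,
serializeAsUtf8 and the element name "greeting" are hypothetical.  It
reads the text node straight back from the DOM, then serializes the
document with the output encoding set explicitly to UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical test harness for an in-memory DOM holding Chinese text.
class InMemoryDomCheck {

    // Builds <greeting>text</greeting> entirely in memory.
    public static Document buildDoc(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("greeting");
        root.appendChild(doc.createTextNode(text));
        doc.appendChild(root);
        return doc;
    }

    // Serializes with an identity transform, requesting UTF-8 output,
    // and decodes the resulting bytes as UTF-8.
    public static String serializeAsUtf8(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String chineseString = "\u4e2d\u6587"; // sample data (an assumption)
        Document doc = buildDoc(chineseString);
        // Step 1: is the text node intact inside the DOM?
        String nodeValue = doc.getDocumentElement().getFirstChild().getNodeValue();
        System.out.println(chineseString.equals(nodeValue));
        // Step 2: does it survive serialization to UTF-8 bytes?
        System.out.println(serializeAsUtf8(doc).contains(chineseString));
    }
}
```

If step 1 succeeds and the output is still wrong, the corruption is
more likely happening where the result is written or read back (a
Writer or String conversion using the platform default encoding) than
in createTextNode itself.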

How can I "force" Xerces-J to build a DOM where the text nodes are truly
UTF-8?
Perhaps I should change the default encoding of my platform, but I'm not
sure how to do this.  The only system property I found was
file.encoding, and changing it had no effect on how Strings were
expressed.

Any ideas?

Thanks,
-Chris