Posted to general@xml.apache.org by Lyle Coder <x_...@hotmail.com> on 2001/05/20 20:32:42 UTC
UTF-8 vs. markup
Hi,
This is probably a bit of a general XML question, but I'm using Xalan and
wanted to know the pros and cons of the following in Xalan too.
I'm parsing HTML and constructing a DOM from it. My HTML parser produces
UTF-8 data. My question is: when I parse text such as "&copy;" or
"&nbsp;", these entities have their own UTF-8 (and hence UTF-16) equivalents
(for example, &copy; is U+00A9, the 2-byte sequence C2 A9 in UTF-8). When I'm
constructing my DOM, should I use entity references, or should I just use the
UTF-8 multibyte or UTF-16 2-byte sequences?
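To make the byte sequences concrete, here is a minimal sketch that prints the code point, UTF-8 bytes and UTF-16 bytes for the characters behind &copy; and &nbsp;. Python is used only for illustration (nothing here is Xalan-specific), and the CHARS table and encodings() helper are hypothetical names. UTF-16 is shown big-endian without a byte-order mark.

```python
# Hypothetical helper: show how the characters that the HTML entities
# &copy; and &nbsp; stand for are encoded in UTF-8 and UTF-16.
CHARS = {
    "copy": "\u00a9",  # COPYRIGHT SIGN
    "nbsp": "\u00a0",  # NO-BREAK SPACE
}


def encodings(ch: str) -> dict:
    """Return the code point plus UTF-8 and UTF-16BE bytes (hex) for one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8": ch.encode("utf-8").hex(" ").upper(),      # requires Python 3.8+
        "utf16": ch.encode("utf-16-be").hex(" ").upper(),  # big-endian, no BOM
    }


if __name__ == "__main__":
    for name, ch in CHARS.items():
        info = encodings(ch)
        print(f"&{name};  {info['code_point']}  "
              f"UTF-8: {info['utf8']}  UTF-16: {info['utf16']}")
```

Both characters take two bytes in UTF-8 (C2 A9 and C2 A0) and one 16-bit code unit in UTF-16 (00 A9 and 00 A0).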
Please advise
Thanks
Lyle