Posted to general@xml.apache.org by Lyle Coder <x_...@hotmail.com> on 2001/05/20 20:32:42 UTC

UTF-8 vs. markup

Hi,
This is probably a bit of a general XML question, but I'm using Xalan and
would like to know the pros and cons as they apply to Xalan too.

I'm parsing HTML and constructing a DOM from it.  My HTML parser produces
UTF-8 data.  My question is about text such as "&copy;" or "&nbsp;".  These
entities have character equivalents (for example, &nbsp; is U+00A0, which is
a 2-byte sequence in UTF-8 and a single 2-byte code unit in UTF-16).  When
I'm constructing my DOM, should I keep entity references such as &nbsp; in
the DOM, or should I just store the characters themselves (the UTF-8
multibyte or UTF-16 2-byte sequences)?
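
To make the two options concrete, here is a minimal Java sketch of what the
"raw character" choice looks like at the byte level.  It is only an
illustration: the class and method names (EntityEncodingDemo, utf8Hex,
utf16Hex) are made up for this example and are not part of Xalan or of my
HTML parser.  It prints the bytes that the characters behind &nbsp; and
&copy; occupy in UTF-8 and in big-endian UTF-16.

```java
import java.nio.charset.StandardCharsets;

public class EntityEncodingDemo {
    // Characters that the HTML entities &nbsp; and &copy; stand for.
    static final String NBSP = "\u00A0"; // U+00A0 NO-BREAK SPACE
    static final String COPY = "\u00A9"; // U+00A9 COPYRIGHT SIGN

    // Render a byte array as space-separated, upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hex dump of the string encoded as UTF-8.
    static String utf8Hex(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    // Hex dump of the string encoded as big-endian UTF-16 (no byte order mark).
    static String utf16Hex(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_16BE));
    }

    public static void main(String[] args) {
        System.out.println("&nbsp;  UTF-8: " + utf8Hex(NBSP) + "  UTF-16: " + utf16Hex(NBSP));
        System.out.println("&copy;  UTF-8: " + utf8Hex(COPY) + "  UTF-16: " + utf16Hex(COPY));
    }
}
```

Both characters take two bytes in either encoding, so the choice between
entity references and raw characters is about how the DOM represents the
text, not about storage size.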

Please advise.

Thanks
Lyle
