Posted to j-users@xalan.apache.org by Artur Tomusiak <ar...@hannonhill.com> on 2009/04/14 00:20:17 UTC
How to preserve numeric entities when converting xml String to a
org.w3c.dom.Document ?
Hello,
I am trying to convert a String containing XML into an
org.w3c.dom.Document object, make some modifications, and then convert
it back to a String. However, even if I make no modifications to
the object, I get back a different String from the one I
provided as input. The problem is with the numeric XML entities. For
example, if my input String is:
<?xml version="1.0" encoding="UTF-8"?>
<xml>
&#169;
&#38;
</xml>
When I convert this to an org.w3c.dom.Document object and then back to
a String, I get this result:
<?xml version="1.0" encoding="UTF-8"?>
<xml>
©
&amp;
</xml>
After looking more closely, I realized that the org.w3c.dom.Document
object already contains the converted text, which means the problem lies
in conversion from the String to Document, and not when converting back
from Document to String.
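The observation above can be reproduced with a minimal JAXP sketch. The class name ParseDemo and the helper parse are illustrative and not from the original post; the sketch assumes the references in the input are &#169; and &#38;. It only shows that the parser resolves the references before the DOM is built, so the text node holds the characters themselves.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseDemo {
    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        // Assumed input: two numeric character references separated by a space.
        String xml = "<xml>&#169; &#38;</xml>";
        String text = parse(xml).getDocumentElement().getTextContent();

        // The references are already gone: the node holds U+00A9, a space, and '&'.
        System.out.println(text.length());          // 3
        System.out.println((int) text.charAt(0));   // 169
        System.out.println(text.charAt(2) == '&');  // true
    }
}
```

Nothing in the resulting Document records that the characters were originally written as references, which is why a later serialization step cannot restore them on its own.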
Please let me know how I can do the described conversions while
preserving the numeric entities in the XML. Example code would be
much appreciated.
Thanks,
Artur
--
Artur Tomusiak
(678) 904-6900 ext 140
Hannon Hill - CMS Experience You Can Trust
http://www.hannonhill.com
Re: How to preserve numeric entities when converting xml String to a org.w3c.dom.Document ?
Posted by ke...@us.ibm.com.
The simple answer is "Sorry, but those two forms are absolutely identical
in meaning as far as XML is concerned. If you're going through XML-based
processing, either output is correct. Standard tools aren't going to
maintain this distinction."
The longer answer is that you could postprocess the XML-syntax text, or
write your own serializer, to force the output into this form.
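One possible shape for the postprocessing option is sketched below. The class name NcrPostprocessor and the method escapeNonAscii are hypothetical, and the approach is an assumption about what "postprocess" could mean here: every character outside the ASCII range in the serialized text is rewritten as a decimal numeric character reference. It cannot know which characters were references in the original input, and it is only safe where references are legal (character data and attribute values, not element names, comments, or CDATA sections).

```java
public class NcrPostprocessor {
    // Hypothetical helper: rewrites every non-ASCII code point in already
    // serialized XML text as a decimal numeric character reference.
    static String escapeNonAscii(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        xml.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                // Appending the int yields its decimal form, e.g. 169.
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String serialized = "<xml>\u00A9 &amp;</xml>";
        System.out.println(escapeNonAscii(serialized));
        // prints <xml>&#169; &amp;</xml>
    }
}
```

Iterating over code points rather than chars keeps characters outside the Basic Multilingual Plane as a single reference instead of two surrogate halves.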
______________________________________
"... Three things see no end: A loop with exit code done wrong,
A semaphore untested, And the change that comes along. ..."
-- "Threes" Rev 1.1 - Duane Elms / Leslie Fish (
http://www.ovff.org/pegasus/songs/threes-rev-11.html)
Re: How to preserve numeric entities when converting xml String to
a org.w3c.dom.Document ?
Posted by Michael Ludwig <ml...@as-guides.com>.
Artur Tomusiak wrote:
>
> I am trying to convert a String containing XML into an
> org.w3c.dom.Document object, make some modifications, and then
> convert it back to a String. However, even if I make no
> modifications to the object, I get back a different String
> from the one I provided as input. The problem is with
> the numeric XML entities. For example, if my input String is:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xml>
> &#169;
> &#38;
> </xml>
Hi Artur,
in fact, and to be pedantic, these are neither entities nor entity
references; they're numerical character references; they just happen
to use the same syntax as general entity references. (See XML spec
if interested.)
As keshlam said, these are 100 % identical as far as XML is concerned.
It's not clear to me whether you use XSLT at all or only the DOM.
I'm assuming you're using XSLT.
When transforming to a DOM target, a serialization instruction such as
<xsl:output encoding="US-ASCII"/> is disregarded, because no
serialization takes place.
If all you want is a string, there is no point in transforming to the
DOM. In that case, simply specify <xsl:output encoding="US-ASCII"/> in
your stylesheet. That would force numerical character references for
non-ASCII characters.
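As a rough illustration of the same idea from Java, the sketch below runs a JAXP identity transform and sets the output encoding through OutputKeys.ENCODING, which plays the role of the encoding attribute on <xsl:output>. The class name AsciiSerializeDemo and the method roundTrip are made up for this example, and the behaviour described assumes the serializer bundled with the JDK or Xalan.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AsciiSerializeDemo {
    // Hypothetical helper: copies the input through an identity transform
    // and serializes the result with a US-ASCII output encoding.
    static String roundTrip(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        // U+00A9 cannot be represented in US-ASCII, so the serializer has
        // to write it as a numeric character reference.
        System.out.println(roundTrip("<xml>\u00A9 &#38;</xml>"));
    }
}
```

With this setup the copyright sign should come out as &#169;, while the ampersand is still written as &amp; because it is representable in ASCII and only needs the usual markup escaping.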
But the ampersand in your example is an ASCII character, and I do not
know of a way to have it serialized as a numerical character reference
in XSLT 1.0. Use Perl or AWK or some other general text processing tool
to postprocess your output.
Michael Ludwig