Posted to c-dev@xerces.apache.org by bu...@apache.org on 2003/01/27 22:16:20 UTC
DO NOT REPLY [Bug 14947] -
DOMWriter don't interpret Entity Reference correctly
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14947>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14947
DOMWriter don't interpret Entity Reference correctly
tng@ca.ibm.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID
------- Additional Comments From tng@ca.ibm.com 2003-01-27 21:16 -------
To explain, the code involves two steps:
1. XercesDOMParser parses the XML string.
2. DOMWriter writes the string out.
According to the XML 1.0 spec, section 4.4 "XML Processor Treatment of Entities
and References", the parser needs to expand every entity reference in the XML
document.
Thus the DOMDocument created by XercesDOMParser in step 1 is in fact something
like this:
attribute scenario: <root><text attr=""AAAAA" &<BBBBBB> & 'C'"/></root>
Then in step 2 the DOMWriter writes the string out. The DOMWriter has no way
of knowing what the original string was. It may have been
<root><text attr=""AAAAA" &<BBBBBB> &
'C'"/></root>
or
<root><text attr=""AAAAA" &<BBBBBB>
& 'C'"/></root>
It does NOT know. All it sees is
<root><text attr=""AAAAA" &<BBBBBB> & 'C'"/></root>
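The loss of information can be illustrated with a small standalone sketch.
It is not Xerces-C code: expandPredefined is a hypothetical helper that
expands only the five predefined entity references, which is a simplification
of what a real parser does. Two differently escaped spellings of the attribute
value (both made up for illustration) come out as the same in-memory string,
so nothing downstream can tell which one was in the source.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper, not part of Xerces-C: expands only the five
// predefined XML entity references found in an attribute value.
std::string expandPredefined(const std::string& in) {
    static const std::pair<const char*, char> table[] = {
        {"&quot;", '"'}, {"&amp;", '&'}, {"&lt;", '<'},
        {"&gt;", '>'},   {"&apos;", '\''}
    };
    std::string out;
    for (std::size_t i = 0; i < in.size(); ) {
        bool matched = false;
        if (in[i] == '&') {
            for (const auto& entry : table) {
                const std::string ref(entry.first);
                if (in.compare(i, ref.size(), ref) == 0) {
                    out += entry.second;   // emit the literal character
                    i += ref.size();       // skip past the whole reference
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) {
            out += in[i++];                // ordinary character, copy as is
        }
    }
    return out;
}
```

Calling expandPredefined on a fully escaped value and on a partially escaped
one (for example with a literal > and literal ' characters) returns the same
std::string in both cases.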
But since the DOMWriter is supposed to generate something that is parsable if
sent back to the parser, it cannot print such a string as is. Thus the
DOMWriter does some "touch up", just enough to make the string parsable.
So for the attribute scenario, since a literal " , & or < in an attribute
value leads to a well-formedness error, the DOMWriter changes them to
&quot; , &amp; and &lt; respectively, while > and ' in an attribute value are
acceptable to the parser, so the DOMWriter leaves them alone.
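The attribute rule described above can be sketched in a few lines of
standalone C++. This is only an illustration of the rule as stated in this
comment, not the actual DOMWriter implementation; escapeAttrValue is a
hypothetical name.

```cpp
#include <string>

// Hypothetical sketch, not the Xerces-C implementation: escapes just the
// characters named above so that the value can be written inside a
// double-quoted attribute and parsed back.
std::string escapeAttrValue(const std::string& value) {
    std::string out;
    out.reserve(value.size());
    for (char c : value) {
        switch (c) {
            case '"': out += "&quot;"; break;
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            default:  out += c;        break;  // '>' and '\'' pass through
        }
    }
    return out;
}
```

Applied to the expanded attribute value from the example, this escapes the
double quotes, both ampersands and the < character, and leaves > and the
single quotes untouched.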
Similarly, the DOMWriter fixes some of the characters in the text scenario,
by the same reasoning.
---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org