Posted to c-dev@xerces.apache.org by bu...@apache.org on 2003/01/27 22:16:20 UTC

DO NOT REPLY [Bug 14947] - DOMWriter don't interpret Entity Reference correctly

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14947

DOMWriter don't interpret Entity Reference correctly

tng@ca.ibm.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID



------- Additional Comments From tng@ca.ibm.com  2003-01-27 21:16 -------
To explain, the code involves two steps:
1. XercesDOMParser parses the XML string.
2. DOMWriter writes the string back out.

According to the XML 1.0 spec, section 4.4 (XML Processor Treatment of Entities 
and References), the parser needs to expand every entity reference in the XML 
document.

Thus the DOMDocument created by XercesDOMParser in step 1 is in fact something 
like this:
attribute scenario:	<root><text attr=""AAAAA"  &<BBBBBB> & 'C'"/></root>

Then in step 2 the DOMWriter writes the string out. From the DOMWriter's 
perspective, there is no way to know what the original string was. It may 
have been
	<root><text attr="&quot;AAAAA&quot;  &amp;&lt;BBBBBB&gt; &amp; &apos;C&apos;"/></root>
or
	<root><text attr="&quot;AAAAA&quot;  &amp;&lt;BBBBBB> &amp; 'C'"/></root>

It does NOT know.   All it sees is
	<root><text attr=""AAAAA"  &<BBBBBB> & 'C'"/></root>
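
To illustrate why the original spelling cannot be recovered, here is a minimal 
sketch in C++. The helper expandPredefined is hypothetical and is not part of 
Xerces-C; it only mimics, for the five predefined entities, the expansion that 
the parser performs in step 1. Both candidate source strings collapse to the 
same value.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not a Xerces-C API): expands only the five predefined
// XML entities, roughly what a parser does with these references.
std::string expandPredefined(const std::string& in) {
    static const std::pair<const char*, char> table[] = {
        {"&quot;", '"'}, {"&amp;", '&'}, {"&lt;", '<'},
        {"&gt;", '>'},   {"&apos;", '\''}};
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        bool matched = false;
        if (in[i] == '&') {
            for (const auto& entry : table) {
                const std::string name(entry.first);
                // Does the input continue with this entity reference?
                if (in.compare(i, name.size(), name) == 0) {
                    out += entry.second;
                    i += name.size();
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) {
            out += in[i++];  // ordinary character, copied unchanged
        }
    }
    return out;
}
```

Feeding either of the two attribute values shown above through 
expandPredefined yields the same expanded string, so nothing in the DOM 
records which form the author typed.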

But since the DOMWriter is supposed to generate something that is parsable if 
sent back to the parser, it cannot print such a string as is. Thus the 
DOMWriter does some "touch up", just enough to make the string parsable.

So for the attribute scenario, since a literal " , & or < in a double-quoted 
attribute value makes the XML not well-formed, the DOMWriter rewrites them as 
&quot; , &amp; and &lt; respectively. The > and ' characters in an attribute 
value are acceptable to the parser, so the DOMWriter leaves them alone.
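
The attribute rule just described can be sketched as follows. This is only an 
illustration under the assumption that the value is written between double 
quotes; escapeAttrMinimal is a hypothetical name, not the actual Xerces-C 
implementation.

```cpp
#include <string>

// Hypothetical helper (not the Xerces-C code): escapes only the characters
// that would break a double-quoted attribute value.
std::string escapeAttrMinimal(const std::string& value) {
    std::string out;
    out.reserve(value.size());
    for (char c : value) {
        switch (c) {
            case '"': out += "&quot;"; break;  // would end the attribute early
            case '&': out += "&amp;";  break;  // would start a reference
            case '<': out += "&lt;";   break;  // not allowed in attribute values
            default:  out += c;        break;  // '>' and '\'' pass through
        }
    }
    return out;
}
```

Applied to the expanded value "AAAAA"  &<BBBBBB> & 'C', this produces 
&quot;AAAAA&quot;  &amp;&lt;BBBBBB> &amp; 'C', which is the second of the two 
forms listed earlier.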

Similarly, the DOMWriter fixes some of the characters in the text scenario, 
by the same reasoning.
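
For the text scenario, a comparable sketch is given below. The message does 
not say exactly which characters the DOMWriter rewrites in text content, so 
the set used here is an assumption: & and < because they are never allowed 
literally in character data, and > as a conservative choice so that the 
sequence ]]> can never appear. The name escapeTextSketch is hypothetical.

```cpp
#include <string>

// Hypothetical helper (the escaped set is an assumption, not taken from the
// Xerces-C source): makes character data safe to write as element content.
std::string escapeTextSketch(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;  // would start a reference
            case '<': out += "&lt;";  break;  // would start markup
            case '>': out += "&gt;";  break;  // conservative: avoids "]]>"
            default:  out += c;       break;  // quotes are harmless in text
        }
    }
    return out;
}
```

Note that quotation marks need no escaping here, since nothing delimits text 
content the way quotes delimit an attribute value.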
