Posted to j-dev@xerces.apache.org by bu...@apache.org on 2003/05/14 13:38:23 UTC

DO NOT REPLY [Bug 19266] - String with 'CDATA' text serialized/deserialized with error

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=19266>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

------- Additional Comments From gthb@dimon.is  2003-05-14 11:38 -------
The bug here is that the serializer outputs ]]> unchanged when it occurs in
ordinary character data (not as the terminator of a CDATA section). That is
explicitly forbidden by section 2.4 of http://www.w3.org/TR/REC-xml :

    The right angle bracket (>) may be represented using the string "&gt;",
    and must, for compatibility, be escaped using "&gt;" or a character
    reference when it appears in the string "]]>" in content, when that
    string is not marking the end of a CDATA section.

The parser is simply enforcing this rule when it reads the serialized
output.
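
A minimal sketch of how to observe that check, assuming the default JAXP
parser bundled with the JDK (which is derived from Xerces). The class and
method names are hypothetical and exist only for this illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of Xerces.
final class CdataEndCheck {
    /** Returns true if the default JAXP parser accepts the text as well-formed XML. */
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Well-formedness violations surface as SAXException (SAXParseException).
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not what this sketch tests.
            throw new IllegalStateException(e);
        }
    }
}
```

With such a parser, CdataEndCheck.parses("<a>]]></a>") should report a
well-formedness error, while the escaped form "<a>]]&gt;</a>" should parse.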

The place to fix this is probably org/apache/xml/serialize/XMLSerializer.java
where a new case should be added for ch == '>'. The simple way would be to
always escape it as &gt; (which should be harmless); a fancier fix would
detect whether it is preceded by ]] and escape it only in that case.
That would be a bit more trouble, since the characters are currently processed
one by one in separate calls.
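
The "fancier" variant could be sketched roughly as follows. This is not the
actual XMLSerializer code: CharDataEscaper and its methods are hypothetical
names, and the only point is to show how a small counter can carry the
"preceded by ]]" state across separate per-character calls:

```java
// Hypothetical stand-alone escaper; it does not mirror Xerces internals.
final class CharDataEscaper {
    private final StringBuilder out = new StringBuilder();
    // Number of consecutive ']' characters seen immediately before the current one.
    private int closingBrackets = 0;

    /** Escapes one character of ordinary character data. */
    void write(char ch) {
        switch (ch) {
            case '<':
                out.append("&lt;");
                break;
            case '&':
                out.append("&amp;");
                break;
            case '>':
                // Escape only when '>' would complete the forbidden "]]>".
                out.append(closingBrackets >= 2 ? "&gt;" : ">");
                break;
            default:
                out.append(ch);
        }
        // Update the state after the character has been handled.
        closingBrackets = (ch == ']') ? closingBrackets + 1 : 0;
    }

    /** Convenience wrapper that feeds a whole string one character at a time. */
    void write(String text) {
        for (int i = 0; i < text.length(); i++) {
            write(text.charAt(i));
        }
    }

    String result() {
        return out.toString();
    }
}
```

The simple variant would drop the counter and append "&gt;" for every '>',
which costs a few extra bytes of output but needs no state at all.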

I *think* this is the way to fix it, but I haven't tried it. I'm successfully
working around the problem by writing my serialized output into a wrapper
OutputStream that hunts down each ]]> and escapes the >. Of course, this is
safe only if you know the serialized output won't contain *real* CDATA
sections (their terminators would be clobbered by this postprocessing). I'm
only feeding normal non-lexical events into the serializer, so it should never
emit real CDATA sections, and the workaround is safe in my case.
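
A wrapper of that kind might look like the sketch below. It is a
reconstruction for illustration, not the code actually used: the class name
is hypothetical, and it assumes an ASCII-compatible output encoding such as
UTF-8 or ISO-8859-1, where the bytes for ']' and '>' cannot occur inside
another character. It would not be safe for UTF-16 output.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical wrapper; rewrites every "]]>" in the byte stream as "]]&gt;".
final class CdataEndEscapingOutputStream extends FilterOutputStream {
    // Number of consecutive ']' bytes written immediately before the current one.
    private int closingBrackets = 0;

    CdataEndEscapingOutputStream(OutputStream out) {
        super(out);
    }

    // FilterOutputStream routes its array-based write methods through this one,
    // so overriding the single-byte method covers all writes.
    @Override
    public void write(int b) throws IOException {
        int ch = b & 0xFF;
        if (ch == '>' && closingBrackets >= 2) {
            out.write('&');
            out.write('g');
            out.write('t');
            out.write(';');
        } else {
            out.write(b);
        }
        closingBrackets = (ch == ']') ? closingBrackets + 1 : 0;
    }
}
```

The serializer would then be pointed at
new CdataEndEscapingOutputStream(realOutput) instead of realOutput. As noted
above, any genuine CDATA section in the output would have its terminator
rewritten too, so this only suits output known to contain none.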
