Posted to j-dev@xerces.apache.org by Sripathy Subramania <SS...@viquity.com> on 2001/01/09 01:33:09 UTC
Bug in BaseMarkupSerializer
Hi,
(xerces-1_1_3) org.apache.xml.serialize.BaseMarkupSerializer.characters(char[], int, int) inserts the escape sequence (']]><![CDATA[') for an embedded CDATA end marker (']]>') at the wrong location.
This bug results in incorrect XML output for messages with embedded CDATA sections. I have proposed a fix below and would appreciate it if someone familiar with the code could verify it and baseline the changes.
Xerces version : 1.1.3
JDK version : 1.3
I needed to replay SAX events (imitating a SAX parser) on an XMLSerializer instance to generate XML output. The generated SAX events correspond to an XML file conforming to the following DTD:
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT Sample (Id, Messages+)>
<!ELEMENT Id (#PCDATA)>
<!ELEMENT Messages (MsgId, MsgDesc?, Msg)>
<!ELEMENT MsgId (#PCDATA)>
<!ELEMENT MsgDesc (#PCDATA)>
<!ELEMENT Msg (#PCDATA)>
In the DTD above, the value of the 'Msg' element may contain nested CDATA sections, for example:
'ABC...XYZ...<![CDATA[....<![CDATA[......]]>......<![CDATA[.....]]>.......]]>....'
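A CDATA section ends at the first ']]>', so a serializer that wraps a value like this in CDATA has to split the section at every embedded ']]>': it writes ']]', closes the section with ']]>', opens a new one with '<![CDATA[', and continues with '>'. The sketch below checks that this split form reads back as the original text. It uses the JDK's bundled DOM parser rather than Xerces 1.1.3, and the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical demo class: parses a document and returns the root element's text.
public class CDataSplitDemo {

    public static String parseMsgText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // getTextContent() joins the text of adjacent CDATA sections.
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // "a]]>b" stored as two CDATA sections: "a]]" and ">b".
        String xml = "<Msg><![CDATA[a]]]]><![CDATA[>b]]></Msg>";
        System.out.println(parseMsgText(xml)); // prints a]]>b
    }
}
```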
The sequence of SAX events generated for the 'Msg' element is:
1. startCDATA()
2. startElement()
3. characters()
4. endElement()
5. endCDATA()
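There seems to be a bug in BaseMarkupSerializer.java: the logic for escaping the ']]>' pattern in an embedded CDATA section is wrong. The loop tests chars[ index ] rather than chars[ start + index ], so it ignores the start offset passed to characters(). After a match, start is advanced but the tests still index the buffer from 0, and setting index to 0 just before ++index also skips a position.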
Original source from Xerces-1_1_3\src\org\apache\xml\serialize\BaseMarkupSerializer (lines 457-491):
*********************************************************
public void characters( char[] chars, int start, int length )
{
    ElementState state;
    state = content();
    // Check if text should be print as CDATA section or unescaped
    // based on elements listed in the output format (the element
    // state) or whether we are inside a CDATA section or entity.
    if ( state.inCData || state.doCData ) {
        int saveIndent;
        // Print a CDATA section. The text is not escaped, but ']]>'
        // appearing in the code must be identified and dealt with.
        // The contents of a text node is considered space preserving.
        if ( ! state.inCData ) {
            _printer.printText( "<![CDATA[" );
            state.inCData = true;
        }
        saveIndent = _printer.getNextIndent();
        _printer.setNextIndent( 0 );
        for ( int index = 0 ; index < length ; ++index ) {
            if ( index + 2 < length && chars[ index ] == ']' &&
                 chars[ index + 1 ] == ']' && chars[ index + 2 ] == '>' ) {
                printText( chars, start, index + 2, true, true );
                _printer.printText( "]]><![CDATA[" );
                start += index + 2;
                length -= index + 2;
                index = 0;
            }
        }
        if ( length > 0 )
            printText( chars, start, length, true, true );
        _printer.setNextIndent( saveIndent );
        // ... (rest of the method not shown)
*************************************************************
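To observe the problem without the rest of the serializer, the loop above can be transcribed into a standalone class. This is a hypothetical harness, not Xerces code: printText() and _printer.printText() are replaced by appends to a StringBuilder, and the names OriginalCDataLoop and escape() are made up. Two inputs expose the bug: text passed at a non-zero start offset, and text containing ']]>' twice.

```java
// Hypothetical harness (not Xerces code): the loop from the 1.1.3
// characters() method, with printText() and _printer.printText()
// replaced by appends to a StringBuilder.
public class OriginalCDataLoop {

    public static String escape(char[] chars, int start, int length) {
        StringBuilder out = new StringBuilder();
        for (int index = 0; index < length; ++index) {
            // Bug: chars[index] ignores 'start'; it should be chars[start + index].
            if (index + 2 < length && chars[index] == ']'
                    && chars[index + 1] == ']' && chars[index + 2] == '>') {
                out.append(chars, start, index + 2);   // text up to "]]"
                out.append("]]><![CDATA[");            // close and reopen the section
                start += index + 2;
                length -= index + 2;
                index = 0;                             // ++index then skips a position
            }
        }
        if (length > 0)
            out.append(chars, start, length);
        return out.toString();
    }

    public static void main(String[] args) {
        // "a]]>b" passed at offset 3, as a SAX parser may do with a shared buffer.
        System.out.println(escape("xyza]]>b".toCharArray(), 3, 5));
        // prints a]]>b (not escaped)
        // Two occurrences of "]]>" starting at offset 0.
        System.out.println(escape("]]>x]]>".toCharArray(), 0, 7));
        // prints ]]]]><![CDATA[>x]]> (second occurrence not escaped)
    }
}
```

With the offset input, the loop inspects the start of the buffer instead of the text, so 'a]]>b' is emitted as-is and would end the CDATA section early; with the second input, only the first ']]>' is escaped.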
Proposed changes to the above block:
public void characters( char[] chars, int start, int length )
{
    ElementState state;
    state = content();
    // Check if text should be print as CDATA section or unescaped
    // based on elements listed in the output format (the element
    // state) or whether we are inside a CDATA section or entity.
    if ( state.inCData || state.doCData ) {
        int saveIndent;
        int index = 0;
        int endIndex = 0;
        // Print a CDATA section. The text is not escaped, but ']]>'
        // appearing in the code must be identified and dealt with.
        // The contents of a text node is considered space preserving.
        if ( ! state.inCData ) {
            _printer.printText( "<![CDATA[" );
            state.inCData = true;
        }
        saveIndent = _printer.getNextIndent();
        _printer.setNextIndent( 0 );
        endIndex = start + length;
        for ( index = start ; index < endIndex ; ++index ) {
            if ( index + 2 < endIndex && chars[ index ] == ']' &&
                 chars[ index + 1 ] == ']' && chars[ index + 2 ] == '>' ) {
                printText( chars, start, index + 2 - start, true, true );
                _printer.printText( "]]><![CDATA[" );
                start = index + 2;
                index = start;
            }
        }
        if ( index > start )
            printText( chars, start, index - start, true, true );
        _printer.setNextIndent( saveIndent );
        // ... (rest of the method not shown)
********************************************************************
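As a quick check, the proposed loop can be transcribed into a standalone class, with the serializer's print calls replaced by StringBuilder appends. The names ProposedCDataLoop and escape() are hypothetical, and this is not Xerces code. Both the offset input and the input with two ']]>' occurrences come out escaped:

```java
// Hypothetical harness (not Xerces code): the proposed loop, with the
// serializer's print calls replaced by appends to a StringBuilder.
public class ProposedCDataLoop {

    public static String escape(char[] chars, int start, int length) {
        StringBuilder out = new StringBuilder();
        int index = 0;
        int endIndex = start + length;                        // absolute end of the text
        for (index = start; index < endIndex; ++index) {
            if (index + 2 < endIndex && chars[index] == ']'
                    && chars[index + 1] == ']' && chars[index + 2] == '>') {
                out.append(chars, start, index + 2 - start);  // text up to "]]"
                out.append("]]><![CDATA[");                   // close and reopen the section
                start = index + 2;                            // next run begins at '>'
                index = start;                                // ++index resumes after '>'
            }
        }
        if (index > start)                                    // remainder after the last match
            out.append(chars, start, index - start);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("xyza]]>b".toCharArray(), 3, 5));
        // prints a]]]]><![CDATA[>b
        System.out.println(escape("]]>x]]>".toCharArray(), 0, 7));
        // prints ]]]]><![CDATA[>x]]]]><![CDATA[>
    }
}
```

Because start is set to the position of '>' and index to start, the following ++index resumes the search just after the '>', which cannot begin another ']]>'.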
However, this fix does not handle the case where the CDATA end marker (']]>') is split across two characters() calls, i.e. when it does not fall entirely within one buffer. Handling that case would require further changes.
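One way to cover the boundary case, sketched here under the assumption that the serializer can keep a small amount of state between characters() calls, is to remember how many ']' characters were written last. When a '>' arrives after at least two of them, even in a later call, the section is closed and reopened before the '>'. CDataEscaper, characters(), endSection() and result() are hypothetical names, and output goes to a StringBuilder instead of the serializer's printer.

```java
// Hypothetical helper (not part of Xerces): escapes "]]>" inside a CDATA
// section even when the sequence is split across characters() calls.
public class CDataEscaper {

    private final StringBuilder out = new StringBuilder();
    // Consecutive ']' at the end of the output so far, capped at 2
    // because only "at least two" matters.
    private int trailingBrackets = 0;

    // Called once per characters() event while inside a CDATA section.
    public void characters(char[] chars, int start, int length) {
        for (int i = start; i < start + length; ++i) {
            char c = chars[i];
            if (c == '>' && trailingBrackets >= 2) {
                // "]]" is already written (possibly by an earlier call):
                // close and reopen the section so '>' lands in the new one.
                out.append("]]><![CDATA[");
            }
            out.append(c);
            trailingBrackets = (c == ']') ? Math.min(trailingBrackets + 1, 2) : 0;
        }
    }

    // Hypothetical hook for endCDATA(): brackets never carry into a new section.
    public void endSection() {
        trailingBrackets = 0;
    }

    public String result() {
        return out.toString();
    }

    public static void main(String[] args) {
        CDataEscaper e = new CDataEscaper();
        e.characters("a]".toCharArray(), 0, 2);   // first buffer ends with one ']'
        e.characters("]>b".toCharArray(), 0, 3);  // second buffer completes "]]>"
        System.out.println(e.result());           // prints a]]]]><![CDATA[>b
    }
}
```

Since the split is inserted between ']]' and '>', nothing has to be held back across calls; the only state is the bracket counter, which must be reset whenever the CDATA section is actually closed.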
I checked the source for Xerces-1_2_3 and found that the logic is unchanged. I couldn't find any messages discussing this problem or a fix on the 'xerces-j-dev' mailing list, so I don't know whether this bug has already been fixed.
Please feel free to contact me if you need more information.
Thanks,
-sripathy