Posted to j-users@xerces.apache.org by Sripathy Subramania <SS...@viquity.com> on 2001/04/09 17:21:32 UTC
Serialization problem
> Hi,
>
> Could any of the developers be kind enough to read this email and
> look into it? I was only trying to help the development team create a
> good product.
> If any of you feel that the report is incorrect, please say so.
>
> Regards,
> -sripathy
>
>
> -----Original Message-----
> From: Sripathy Subramania
> Sent: Wednesday, April 04, 2001 2:03 PM
> To: 'xerces-j-dev@xml.apache.org'
> Cc: 'xerces-j-user@xml.apache.org'
> Subject: BaseMarkupSerializer bug
>
> Hi,
>
> In xerces-1_1_3, BaseMarkupSerializer.characters(char[], int, int)
> inserts the escape sequence "]]><![CDATA[" for the embedded string
> pattern "]]>" at the wrong location.
> This results in incorrect XML data serialization from the DOM.
>
> I have proposed a fix in this mail.
>
> Xerces version : 1.1.3
> JDK version : 1.3
>
> I needed to serialize a DOM conforming to the
> following DTD.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!ELEMENT Sample (Id, Messages+)>
> <!ELEMENT Id (#PCDATA)>
> <!ELEMENT Messages (MsgId, MsgDesc?, Msg)>
> <!ELEMENT MsgId (#PCDATA)>
> <!ELEMENT MsgDesc (#PCDATA)>
> <!ELEMENT Msg (#PCDATA)>
>
> An XML file conforming to this DTD might be:
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE Sample SYSTEM "Sample.dtd">
> <Sample>
> <Id>Doc 1</Id>
> <Messages>
> <MsgId>Msg 1</MsgId>
> <MsgDesc>Testing document</MsgDesc>
> <Msg><![CDATA[This is a test message having patterns ]]>. This message
> may contain multiple occurrences of patterns ]]>. The End]]></Msg>
> </Messages>
> </Sample>
>
> In the above DTD, the 'Msg' element value will be a
> CDATA section. This value may contain the string "]]>"
> embedded in it (as shown in the sample XML document above).
> BaseMarkupSerializer identifies this pattern and escapes it by
> splitting it across two CDATA sections: it inserts the string
> "]]><![CDATA[" between "]]" and ">". But the code logic for
> escaping seems to have a bug.
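For reference, here is a minimal sketch of the output the escaping is meant to produce. It is not Xerces code: the class name CdataEscapeSketch and the method wrapInCData are made up for illustration, and it works on a whole String rather than a char[] range. String.replace(CharSequence, CharSequence) is assumed to be available, which needs a newer JDK than the 1.3 mentioned above.

```java
final class CdataEscapeSketch {
    // Hypothetical helper: wraps text in CDATA, splitting every "]]>" so that
    // "]]" ends one section and ">" starts the next.
    static String wrapInCData(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        // prints <![CDATA[a ]]]]><![CDATA[> b]]>
        System.out.println(wrapInCData("a ]]> b"));
    }
}
```

A parser reading the result back should see the original text "a ]]> b" as two adjacent CDATA sections.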
>
> Original source from
> Xerces-1_1_3\src\org\apache\xml\serialize\BaseMarkupSerializer
> (Lines 457~491)
> *********************************************************
> public void characters( char[] chars, int start, int length )
> {
>     ElementState state;
>
>     state = content();
>     // Check if text should be print as CDATA section or unescaped
>     // based on elements listed in the output format (the element
>     // state) or whether we are inside a CDATA section or entity.
>
>     if ( state.inCData || state.doCData ) {
>         int saveIndent;
>
>         // Print a CDATA section. The text is not escaped, but ']]>'
>         // appearing in the code must be identified and dealt with.
>         // The contents of a text node is considered space
>         // preserving.
>         if ( ! state.inCData ) {
>             _printer.printText( "<![CDATA[" );
>             state.inCData = true;
>         }
>         saveIndent = _printer.getNextIndent();
>         _printer.setNextIndent( 0 );
>         for ( int index = 0 ; index < length ; ++index ) {
>             if ( index + 2 < length && chars[ index ] == ']' &&
>                  chars[ index + 1 ] == ']' &&
>                  chars[ index + 2 ] == '>') {
>
>                 printText( chars, start, index + 2, true, true );
>                 _printer.printText( "]]><![CDATA[" );
>                 start += index + 2;
>                 length -= index + 2;
>                 index = 0;
>             }
>         }
>         if ( length > 0 )
>             printText( chars, start, length, true, true );
>         _printer.setNextIndent( saveIndent );
> *************************************************************
> Proposed changes for the above block
>
> public void characters( char[] chars, int start, int length )
> {
>     ElementState state;
>
>     state = content();
>     // Check if text should be print as CDATA section or unescaped
>     // based on elements listed in the output format (the element
>     // state) or whether we are inside a CDATA section or entity.
>
>     if ( state.inCData || state.doCData ) {
>         int saveIndent;
>         int index = 0;
>         int endIndex = 0;
>
>         // Print a CDATA section. The text is not escaped, but ']]>'
>         // appearing in the code must be identified and dealt with.
>         // The contents of a text node is considered space
>         // preserving.
>         if ( ! state.inCData ) {
>             _printer.printText( "<![CDATA[" );
>             state.inCData = true;
>         }
>         saveIndent = _printer.getNextIndent();
>         _printer.setNextIndent( 0 );
>         endIndex = start + length;
>         for ( index = start ; index < endIndex ; ++index ) {
>             if ( index + 2 < endIndex && chars[ index ] == ']' &&
>                  chars[ index + 1 ] == ']' &&
>                  chars[ index + 2 ] == '>') {
>
>                 printText( chars, start, index + 2 - start, true, true );
>                 _printer.printText( "]]><![CDATA[" );
>                 start = index + 2;
>                 index = start;
>             }
>         }
>         if ( index > start )
>             printText( chars, start, index-start, true, true );
>         _printer.setNextIndent( saveIndent );
> ********************************************************************
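To try the offset handling outside Xerces, the scanning loop can be lifted into a small standalone method. This is only a sketch: CdataRangeSketch and appendCData are hypothetical names, and a StringBuilder stands in for the serializer's printer. The point it illustrates is that every index into chars is absolute (start-based), so a non-zero start still finds "]]>" in the right place.

```java
final class CdataRangeSketch {
    // Appends chars[start, start + length) to out, splitting each "]]>"
    // by inserting "]]><![CDATA[" between "]]" and ">".
    static void appendCData(StringBuilder out, char[] chars, int start, int length) {
        int end = start + length;
        int from = start; // first character not yet written
        for (int i = start; i + 2 < end; ++i) {
            if (chars[i] == ']' && chars[i + 1] == ']' && chars[i + 2] == '>') {
                out.append(chars, from, i + 2 - from); // up to and including "]]"
                out.append("]]><![CDATA[");
                from = i + 2; // the '>' opens the next section
                i = from;     // the loop increment then steps past '>'
            }
        }
        out.append(chars, from, end - from); // whatever is left in the range
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        // Only the range "]]>yy]]>" (start 2, length 8) is processed.
        appendCData(out, "xx]]>yy]]>zz".toCharArray(), 2, 8);
        // prints ]]]]><![CDATA[>yy]]]]><![CDATA[>
        System.out.println(out);
    }
}
```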
>
> NOTE: However, this fix does not handle the case where the string
> pattern "]]>" is split across a buffer boundary, i.e. it starts in
> one call to characters() and ends in the next.
> This might require more changes.
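One possible way to cover the buffer-boundary case is to remember, between calls, how many ']' characters the output currently ends with. The sketch below is an assumption about how that could look, not a patch against Xerces: StreamingCDataSketch, trailingBrackets and result() are invented names, and the opening "<![CDATA[" and closing "]]>" are left out. A real serializer would also have to reset the counter whenever the CDATA section ends.

```java
final class StreamingCDataSketch {
    private final StringBuilder out = new StringBuilder();
    // Number of consecutive ']' at the end of the output so far, capped at 2.
    private int trailingBrackets = 0;

    // May be called repeatedly; the counter survives between calls, so a
    // "]]>" split across two buffers is still detected.
    void characters(char[] chars, int start, int length) {
        for (int i = start; i < start + length; ++i) {
            char c = chars[i];
            if (c == '>' && trailingBrackets == 2) {
                // "]]" is already written; close the section before '>'.
                out.append("]]><![CDATA[");
            }
            out.append(c);
            trailingBrackets = (c == ']') ? Math.min(trailingBrackets + 1, 2) : 0;
        }
    }

    String result() {
        return out.toString();
    }

    public static void main(String[] args) {
        StreamingCDataSketch s = new StreamingCDataSketch();
        s.characters("ab]".toCharArray(), 0, 3);  // pattern starts here
        s.characters("]>cd".toCharArray(), 0, 4); // and ends here
        // prints ab]]]]><![CDATA[>cd
        System.out.println(s.result());
    }
}
```

The trade-off is a per-character check instead of bulk printText calls, so this is probably only worth it if split patterns turn up in practice.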
>
> I checked the source for Xerces-1_2_3 and observed that this bug is
> not fixed yet. Moreover, I couldn't find any mails discussing this
> problem or fix on the 'xerces-j-dev' or 'xerces-j-user' mailing lists.
> I don't know whether the development team has already identified
> this bug.
>
> I would appreciate it if someone familiar with the code could verify
> the bug and incorporate the changes. I would be glad to provide more
> information in this regard.
>
> Thanks,
> -sripathy
>