Posted to dev@xmlbeans.apache.org by "Martin Hamel (JIRA)" <xm...@xml.apache.org> on 2005/03/31 17:09:33 UTC

[jira] Created: (XMLBEANS-135) bad handling of embeded CDATA

bad handling of embeded CDATA
-----------------------------

         Key: XMLBEANS-135
         URL: http://issues.apache.org/jira/browse/XMLBEANS-135
     Project: XMLBeans
        Type: Bug
    Versions: Version 1.0.3, Version 1.0.4, Version 2 Beta 1    
 Environment: Encountered on Windows with JDK 1.4.2.
    Reporter: Martin Hamel


I have a case of bad XML. It is an envelope document that includes another 
document. The parser expects the enclosed document to be in CDATA. The problem 
is that the second document now includes a third document, which is also 
expected to be in CDATA. 


I create document A with an XMLBean. I put it as a text element of document B 
after transforming document A to a string with xmlText(). I then do the same 
with document B by putting it in document C. Everything works well and 
automatically, and it creates CDATA every time it needs to.

        // fragment
        XmlOptions options = new XmlOptions();
        options.setSavePrettyPrint();
        Field field = getAssessmentFields().addNewField();
        field.setFieldName("AssessmentContent");
        field.setFieldValue(answersDocument.xmlText(options));
        ..
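
To make the nesting concrete, here is a minimal sketch of the same 
A-in-B-in-C pattern using only JDK APIs rather than XMLBeans. The class 
EnvelopeDemo and its wrap/unwrap helpers are hypothetical names made up for 
this illustration. It relies on entity escaping at each level, which is one 
way to make the round trip survive any depth of nesting.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical helper (not part of XMLBeans): nests serialized XML as element text. */
final class EnvelopeDemo {

    /** Builds an element called name whose text is payload, then serializes it. */
    static String wrap(String name, String payload) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            Element root = doc.createElement(name);
            root.setTextContent(payload);   // stored as plain text, escaped on output
            doc.appendChild(root);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses xml and returns the text content of its root element. */
    static String unwrap(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String a = wrap("A", "answers & <notes>");
        String b = wrap("B", a);   // document A becomes the text of B
        String c = wrap("C", b);   // document B becomes the text of C
        System.out.println(c);
        // Unwrapping twice should give back document A exactly.
        System.out.println(unwrap(unwrap(c)).equals(a));
    }
}
```

Each level of wrapping escapes the markup of the level below once more, so 
the receiver has to unwrap the same number of times.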


The problem is that on the second escaping, the ">" of the inner CDATA end 
("]]>") is escaped to "&gt;". The SAX parser that reads all this (Xalan) just 
can't handle it. Also, the specification says that a CDATA section must not 
contain another CDATA section.
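
For reference, CDATA sections cannot nest in XML 1.0: the first "]]>" always 
ends the section. Entity references are also not expanded inside a CDATA 
section, so a "&gt;" written there reaches the reader as literal text. A 
common workaround, which is not necessarily what XMLBeans does, is to split 
the payload at every "]]>" into adjacent CDATA sections. The sketch below 
shows that technique with the JDK parser; CdataDemo, toCdata and parseText 
are hypothetical names for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper (not part of XMLBeans): CDATA wrapping that tolerates "]]>". */
final class CdataDemo {

    /** Wraps text in CDATA, splitting each "]]>" across two adjacent sections. */
    static String toCdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    /** Returns the root element's text, or null if xml is not well-formed. */
    static String parseText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler());   // throw instead of printing to stderr
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return null;   // not well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Document B already carries document A inside a CDATA section.
        String inner = "<B>" + toCdata("<A>1 < 2</A>") + "</B>";

        // Naive nesting: the inner "]]>" closes the outer section too early.
        String naive = "<C><![CDATA[" + inner + "]]></C>";
        System.out.println("naive parses: " + (parseText(naive) != null));

        // Splitting keeps the document well-formed and the payload intact.
        String safe = "<C>" + toCdata(inner) + "</C>";
        System.out.println("safe round-trips: " + inner.equals(parseText(safe)));
    }
}
```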

Here is the modification I made for embedded CDATA. Do you think it would be 
worth including?

Here is the entitizeContent method in Saver.java:

        // Needs: import java.nio.CharBuffer; import java.util.regex.Pattern;
        Pattern cdataPattern = Pattern.compile("CDATA");


        private void entitizeContent ( )
        {
            if (_lastEmitCch == 0)
                return;

            int i = _lastEmitIn;
            final int n = _buf.length;

            boolean hasOutOfRange = false;
            
            int count = 0;
            for ( int cch = _lastEmitCch ; cch > 0 ; cch-- )
            {                
                char ch = _buf[ i ];

                if (ch == '<' || ch == '&')
                    count++;
                else if (isBadChar( ch ))
                    hasOutOfRange = true;

                if (++i == n)
                    i = 0;
            }

            if (count == 0 && !hasOutOfRange)
                return;

            i = _lastEmitIn;

            //
            // Heuristic for knowing when to save out stuff as a CDATA.
            //
            
            // We'll check whether the buffer already contains a CDATA.
            // If it does, we won't nest another one.
            // Note: CharBuffer.wrap(_buf) covers the whole array, not only
            // the last emitted span.
            CharBuffer charBuffer = CharBuffer.wrap(_buf);
            boolean hasCDATA = cdataPattern.matcher(charBuffer).find();

            if (_lastEmitCch > 32 && count > 5 &&
                    count * 100 / _lastEmitCch > 1 && !hasCDATA)
            {
                boolean lastWasBracket = _buf[ i ] == ']';

                i = replace( i, "<![CDATA[" + _buf[ i ] );

                boolean secondToLastWasBracket = lastWasBracket;

                lastWasBracket = _buf[ i ] == ']';

                if (++i == _buf.length)
                    i = 0;

                for ( int cch = _lastEmitCch ; cch > 0 ; cch-- )
                {
                    char ch = _buf[ i ];

                    if (ch == '>' && secondToLastWasBracket && lastWasBracket)
                        i = replace( i, "&gt;" );
                    else if (isBadChar( ch ))
                        i = replace( i, "?" );
                    else
                        i++;

                    secondToLastWasBracket = lastWasBracket;
                    lastWasBracket = ch == ']';

                    if (i == _buf.length)
                        i = 0;
                }

                emit( "]]>" );
            }
            else
            {
                for ( int cch = _lastEmitCch ; cch > 0 ; cch-- )
                {
                    char ch = _buf[ i ];

                    if (ch == '<')
                        i = replace( i, "&lt;" );
                    else if (hasCDATA && ch == '>')
                        i = replace(i, "&gt;");
                    else if (ch == '&')
                        i = replace( i, "&amp;" );
                    else if (isBadChar( ch ))
                        i = replace( i, "?" );
                    else
                        i++;

                    if (i == _buf.length)
                        i = 0;
                }
            }
        }
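
One possible refinement of the check above: since the loops wrap the index 
back to 0 at the end of _buf, the buffer appears to be circular, and 
CharBuffer.wrap(_buf) would then also match stale characters outside the last 
emitted span and miss a marker that straddles the wrap point. The sketch 
below searches only a given span of a circular char array. RingSearch and 
contains are hypothetical names; wiring it into Saver.java with _lastEmitIn 
and _lastEmitCch is an assumption, not tested against XMLBeans.

```java
/** Hypothetical helper (not part of XMLBeans): substring search in a circular char buffer. */
final class RingSearch {

    /**
     * Returns true if needle occurs within the cch characters of buf that
     * start at index start, wrapping around at buf.length.
     */
    static boolean contains(char[] buf, int start, int cch, String needle) {
        int m = needle.length();
        if (m == 0)
            return true;
        for (int off = 0; off + m <= cch; off++) {
            int k = 0;
            // Compare needle against the span, wrapping the index as needed.
            while (k < m && buf[(start + off + k) % buf.length] == needle.charAt(k))
                k++;
            if (k == m)
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Live span of 9 chars starts at index 12 and wraps around to index 4.
        char[] wrapped = "DATA[-------<![C".toCharArray();
        System.out.println(contains(wrapped, 12, 9, "<![CDATA["));

        // Marker is present only in stale data before the live span "stale".
        char[] stale = "<![CDATA[stale..".toCharArray();
        System.out.println(contains(stale, 9, 5, "<![CDATA["));
    }
}
```

Searching for the full "<![CDATA[" marker rather than the bare word "CDATA" 
would also avoid treating ordinary text that mentions CDATA as a section.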

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@xmlbeans.apache.org
For additional commands, e-mail: dev-help@xmlbeans.apache.org


[jira] Resolved: (XMLBEANS-135) bad handling of embeded CDATA

Posted by "Radu Preotiuc-Pietro (JIRA)" <xm...@xml.apache.org>.
     [ http://issues.apache.org/jira/browse/XMLBEANS-135?page=all ]
     
Radu Preotiuc-Pietro resolved XMLBEANS-135:
-------------------------------------------

     Resolution: Fixed
    Fix Version: Version 2 Beta 2
                 Version 2
                     (was: TBD)

Implemented the simple fix I was describing on the dev@xmlbeans.apache.org mailing list. It's definitely better than what we had and I actually think it covers the issue.



[jira] Updated: (XMLBEANS-135) bad handling of embeded CDATA

Posted by "Jacob Danner (JIRA)" <xm...@xml.apache.org>.
     [ http://issues.apache.org/jira/browse/XMLBEANS-135?page=history ]

Jacob Danner updated XMLBEANS-135:
----------------------------------

    Fix Version: TBD

Probably not in the v2 release.
