Posted to c-dev@xerces.apache.org by "Vasantha Crabb (JIRA)" <xe...@xml.apache.org> on 2008/08/13 09:10:44 UTC

[jira] Created: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Parser reports error when comment tag is broken up by BinInputStream
--------------------------------------------------------------------

                 Key: XERCESC-1827
                 URL: https://issues.apache.org/jira/browse/XERCESC-1827
             Project: Xerces-C++
          Issue Type: Bug
          Components: Non-Validating Parser
    Affects Versions: 2.8.0
         Environment: Solaris (SunOS 5.10) i386 SunPRO
cygwin i386 GCC
            Reporter: Vasantha Crabb


If the BinInputStream delivers data to the parser in such a way that the '<', '!' and '--' at the start of an XML comment are delivered separately, the parser will report the error 'Expected comment or CDATA'.  The problem is reproducible under Xerces 2.8 on both Solaris i386 with the SunPRO C++ compiler and cygwin i386 with the GNU C++ compiler.
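
A minimal sketch of the delivery pattern described above. This is not the attached test case and does not use the Xerces-C++ API: `ChunkedSource` and `drain` are hypothetical names, and `readBytes` only mimics the general shape of a stream read call that may return fewer bytes than requested. With a chunk size of 1, every character of the comment opener arrives in its own read, which is the most extreme form of the split.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an input stream that delivers data in small pieces.
class ChunkedSource {
public:
    ChunkedSource(std::string data, std::size_t chunk)
        : data_(std::move(data)), chunk_(chunk), pos_(0) {}

    // Copies at most min(maxToRead, chunk_) bytes; returns 0 at end of input.
    std::size_t readBytes(char* toFill, std::size_t maxToRead) {
        const std::size_t n = std::min({maxToRead, chunk_, data_.size() - pos_});
        std::memcpy(toFill, data_.data() + pos_, n);
        pos_ += n;
        return n;
    }

private:
    std::string data_;
    std::size_t chunk_;
    std::size_t pos_;
};

// Drains the source and records each piece a consumer would see.
std::vector<std::string> drain(ChunkedSource& src) {
    std::vector<std::string> pieces;
    char buf[64];
    for (std::size_t n; (n = src.readBytes(buf, sizeof buf)) > 0; )
        pieces.emplace_back(buf, n);
    return pieces;
}
```

A consumer that assumes the whole "<!--" sequence is available after a single read will misbehave on input delivered this way.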

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org


[jira] Commented: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "cargilld (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631024#action_12631024 ] 

cargilld commented on XERCESC-1827:
-----------------------------------

Hi,
I don't have any objections to David B's fix.  I will attach the xml document with long names I used for testing to this jira bug.

David

> Parser reports error when comment tag is broken up by BinInputStream
> --------------------------------------------------------------------
>
>                 Key: XERCESC-1827
>                 URL: https://issues.apache.org/jira/browse/XERCESC-1827
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: Non-Validating Parser
>    Affects Versions: 2.8.0
>         Environment: Solaris (SunOS 5.10) i386 SunPRO
> cygwin i386 GCC
>            Reporter: Vasantha Crabb
>             Fix For: 3.0.0, 2.9.0
>
>         Attachments: long.xml, new-test.cc, test.cc
>
>
> If the BinInputStream delivers data to the parser in such a way that the '<', '!' and '--' at the start of an XML comment are delivered separately, the parser will report the error 'Expected comment or CDATA'.  The problem is reproducible under Xerces 2.8 on both Solaris i386 with the SunPRO C++ compiler and cygwin i386 with the GNU C++ compiler.


[jira] Updated: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "cargilld (JIRA)" <xe...@xml.apache.org>.
     [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

cargilld updated XERCESC-1827:
------------------------------

    Attachment: long.xml

An XML document with a really long element name.


[jira] Commented: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "David Bertoni (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631863#action_12631863 ] 

David Bertoni commented on XERCESC-1827:
----------------------------------------

This illustrates that we are suffering from communication problems.  If you and Alberto are discussing issues, you should be doing it publicly on the developer list.  Also, if you feel a bug really needs a better fix, then post a message that states that.

You posted the following to the mailing list:

"Unless we uncover a serious issue and another beta cycle is required, please limit your changes to important bug fixes only. In particular, do not add new features, modify interfaces, or make any major changes. This excludes documentation which requires a major cleanup."

Clearly, we need to have a better understanding of what "important" means, because I assume that means critical fixes, including serious regressions.


[jira] Commented: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "Boris Kolpackov (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631814#action_12631814 ] 

Boris Kolpackov commented on XERCESC-1827:
------------------------------------------

Sorry, I didn't know you had the "proper" fix ready and I am on a tight schedule to release 3.0.0.


[jira] Commented: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "Boris Kolpackov (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631074#action_12631074 ] 

Boris Kolpackov commented on XERCESC-1827:
------------------------------------------

Thanks, David. David B, will you be able to commit your fix along with the test in the next day or two?


[jira] Commented: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "Boris Kolpackov (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631710#action_12631710 ] 

Boris Kolpackov commented on XERCESC-1827:
------------------------------------------

Alberto and I did some more thinking on David B's committed patch and we came to the conclusion that there were still a number of problems in this function. I therefore split this function into two: one for short strings (less than kCharBufSize), in which case fCharIndex is preserved, and one for long strings, in which case fCharIndex is not preserved.

I am keeping this bug open since it would be a good idea to add the test case (new-test.cc) to the test suite after the release.


[jira] Commented: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "David Bertoni (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622636#action_12622636 ] 

David Bertoni commented on XERCESC-1827:
----------------------------------------

Hi Dave,

Great minds think alike!  I came up with a similar solution, but yours has a bug in it (just like mine did).  As a sort of "stress test," I modified the test program to slice the entire document into 1-byte reads.

Because you're not updating fCharIndex in the loop, charsLeftInBuffer() will return the count of all of the characters in the buffer since you entered the loop.  More than one iteration of the loop triggers the bug.

I made the following modification:

        if (srcLen < kCharBufSize/4) {
            XMLSize_t saveCharsLeft = charsLeft;
            //fCharIndex += charsLeft;
    
            XMLSize_t offset = charsLeft;
            XMLSize_t remainingLen = srcLen - charsLeft;

            while (remainingLen > 0) {
                refreshCharBuffer();
                // Subtract the offset to get the number of new characters in the buffer.
                charsLeft = charsLeftInBuffer() - offset;
                if (charsLeft == 0)
                  return false; // error situation
                if (charsLeft > remainingLen)
                    charsLeft = remainingLen;
                if (XMLString::compareNString(&fCharBuf[fCharIndex+saveCharsLeft], toSkip+offset, charsLeft))
                    return false;
                offset += charsLeft;
                remainingLen -= charsLeft;
                saveCharsLeft += charsLeft;
            }
            fCharIndex += saveCharsLeft;

Note that I'm subtracting "offset" from the return value of charsLeftInBuffer().
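
The offset bookkeeping can be illustrated with a toy model. This is a sketch under stated assumptions, not the Xerces-C++ implementation: `ToyReader` and its members are hypothetical, `index` stands in for fCharIndex, and the refresh is assumed to keep unconsumed characters at the front of the buffer and append new ones behind them. Because `index` is not advanced while matching, the characters already compared are still counted by `charsLeftInBuffer()`, so they have to be subtracted to find how many characters are actually new.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Toy model of a reader buffer; a simplification for illustration only.
struct ToyReader {
    std::string input;      // everything the stream will eventually deliver
    std::size_t fed = 0;    // how much of input has been delivered so far
    std::size_t chunk = 1;  // characters delivered per refresh
    std::string buf;        // stands in for fCharBuf
    std::size_t index = 0;  // stands in for fCharIndex

    std::size_t charsLeftInBuffer() const { return buf.size() - index; }

    // Keeps unconsumed characters, moves them to the front, appends one chunk.
    void refreshCharBuffer() {
        buf.erase(0, index);
        index = 0;
        const std::size_t n = std::min(chunk, input.size() - fed);
        buf.append(input, fed, n);
        fed += n;
    }

    // Lookahead match: index advances only if all of toSkip matches.
    bool skippedString(const std::string& toSkip) {
        std::size_t matched = 0;  // plays the role of "offset" above
        while (matched < toSkip.size()) {
            // New characters = everything left minus what was already compared.
            std::size_t fresh = charsLeftInBuffer() - matched;
            if (fresh == 0) {
                refreshCharBuffer();
                fresh = charsLeftInBuffer() - matched;
                if (fresh == 0) return false;  // end of input
            }
            fresh = std::min(fresh, toSkip.size() - matched);
            if (buf.compare(index + matched, fresh, toSkip, matched, fresh) != 0)
                return false;                  // index is left untouched
            matched += fresh;
        }
        index += matched;
        return true;
    }
};
```

Dropping the `- matched` term makes the second and later iterations re-compare characters that were already matched, which is the failure mode seen with 1-byte reads.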

However, I still think we're hacking up what should be a simple routine to accommodate two different usages.

I agree with you about the performance advantage of having an overload that takes a pre-calculated length.  I'll add that to the list of things to do when cleaning this up.

I'll attach my updated version of the test program.  It would be great if you could test to verify that my fix also works with your tests. I think this is the safest way to move forward for the 3.0 release.  I can work on updating the code to split the two functions up for the 3.1 release.  Could you make sure your document with the long element names is in the test buckets?  I will work with the reporter to make sure we can include his code in our tests as well.

Finally, does everyone agree we should proceed with this fix for 3.0?


[jira] Commented: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "Boris Kolpackov (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632811#action_12632811 ] 

Boris Kolpackov commented on XERCESC-1827:
------------------------------------------

Let me give a bit of background on how this fix came about. After you committed your fix I asked Alberto to review it (and was about to do the same myself) since we were going to release this fix in the final version without giving the community much chance to test it. Once I started studying the function I had a really hard time understanding what was going on there. I also saw a number of potential problems. So I consulted Alberto via IM to see if he had a better understanding of what was going on. I quickly came to the conclusion that I was not comfortable releasing that code in this state, so I decided to reimplement it as two separate functions. The whole thing took probably 30-40 minutes.

I agree that using private IM like this is not ideal. However, discussing small things like this on mailing lists and/or Jira takes too much time (what took half an hour over IM would probably have taken several days, considering that a lot of people live in different time zones). Perhaps we should set up an IRC channel for Xerces-C++ development with public archives. I wonder if there is an Apache-recommended way to do this?

As for the limits to changes, as a release manager I have some leeway in deciding which changes go in and which not. After all, if I make a bad call I will be the one releasing 3.0.1/3.1.0 to fix it.


[jira] Updated: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "Vasantha Crabb (JIRA)" <xe...@xml.apache.org>.
     [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vasantha Crabb updated XERCESC-1827:
------------------------------------

    Attachment: test.cc

This test case reproduces the issue.  It works with Xerces 2.7 and Xerces 2.5 but fails with Xerces 2.8.


[jira] Resolved: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "Alberto Massari (JIRA)" <xe...@xml.apache.org>.
     [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alberto Massari resolved XERCESC-1827.
--------------------------------------

    Resolution: Fixed

The DOMTest now parses a sample XML document with an input source that uses different chunk sizes.


[jira] Commented: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "cargilld (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622559#action_12622559 ] 

cargilld commented on XERCESC-1827:
-----------------------------------

How about changing skipString as follows:
    else {
        if (charsLeft == 0) {
            refreshCharBuffer();
            charsLeft = charsLeftInBuffer();
            if (charsLeft == 0)
                return false; // error situation
        }
        if (XMLString::compareNString(&fCharBuf[fCharIndex], toSkip, charsLeft))
            return false;

        // The remaining characters of toSkip could still fail to match, so we
        // don't want to advance fCharIndex unless we have to.
        // Most calls to this routine pass constant strings, defined mainly in
        // XMLUni.cpp, that are all shorter than 10 characters. The comparison
        // above can pass while a later one fails, which is why we don't update
        // fCharIndex in that case. The other possibility is that the routine is
        // called for the matching end tag, where the string can be really long,
        // even longer than the buffer, which forces us to advance fCharIndex.
        if (srcLen < kCharBufSize/4) {            
            unsigned int saveCharsLeft = charsLeft;
            //fCharIndex += charsLeft;
    
            unsigned int offset = charsLeft;
            unsigned int remainingLen = srcLen - charsLeft;

            while (remainingLen > 0) {
                refreshCharBuffer();
                charsLeft = charsLeftInBuffer();
                if (charsLeft == 0)
                  return false; // error situation
                if (charsLeft > remainingLen)
                    charsLeft = remainingLen;
                if (XMLString::compareNString(&fCharBuf[fCharIndex+saveCharsLeft], toSkip+offset, charsLeft))
                    return false;
                offset += charsLeft;
                remainingLen -= charsLeft;
                saveCharsLeft += charsLeft;
            }
            fCharIndex += saveCharsLeft;

        }
        else {
            // a really long name            
            fCharIndex += charsLeft;
    
            unsigned int offset = charsLeft;
            unsigned int remainingLen = srcLen - charsLeft;

            while (remainingLen > 0) {
                refreshCharBuffer();
                charsLeft = charsLeftInBuffer();
                if (charsLeft == 0)
                  return false; // error situation
                if (charsLeft > remainingLen)
                    charsLeft = remainingLen;
                if (XMLString::compareNString(&fCharBuf[fCharIndex], toSkip+offset, charsLeft))
                    return false;
                offset += charsLeft;
                remainingLen -= charsLeft;
                fCharIndex += charsLeft;
            }            
        }
    }

    // Add the source length to the current column to get it back right
    fCurCol += srcLen;   

    return true;
}

I tested it with the program attached as well as the testcase with the really really long name and both work.  The kCharBufSize/4 was just a guess for a value.  Having a skippedEndString that advances the buffer might be cleaner.  I also noticed that the first line of this function is 
const unsigned int srcLen = XMLString::stringLen(toSkip);

Since the majority of the places calling this routine are passing in XMLUni::fgxxx strings we could pass in the length of the string to improve performance.
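
A rough sketch of the pre-calculated-length idea, assuming nothing about the real signatures: `matchesPrefix` is a hypothetical helper, and `char16_t` stands in for XMLCh only to keep the example self-contained. The second overload takes the length from the array type at compile time, so constant strings need no run-time length scan.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: compares the first len code units of buf with toSkip.
inline bool matchesPrefix(const char16_t* buf, std::size_t avail,
                          const char16_t* toSkip, std::size_t len) {
    if (len > avail) return false;
    for (std::size_t i = 0; i < len; ++i)
        if (buf[i] != toSkip[i]) return false;
    return true;
}

// Overload for null-terminated constant arrays: N is deduced from the array
// type, so the caller never computes the string length at run time.
template <std::size_t N>
inline bool matchesPrefix(const char16_t* buf, std::size_t avail,
                          const char16_t (&toSkip)[N]) {
    return matchesPrefix(buf, avail, toSkip, N - 1);  // N counts the terminator
}
```

Callers that only have a pointer would still pass an explicit length, which keeps the length calculation out of the hot path for the common constant-string case.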


[jira] Commented: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "David Bertoni (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631790#action_12631790 ] 

David Bertoni commented on XERCESC-1827:
----------------------------------------

Yes, you can see from the comments that Dave C. and I made above, that we came to the conclusion the function needed to be split, and I had already done that work.

Can we please coordinate our efforts better in the future?  It's not efficient to have multiple people doing the same work; there's plenty of work to do as it is.


[jira] Commented: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "David Bertoni (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622409#action_12622409 ] 

David Bertoni commented on XERCESC-1827:
----------------------------------------

OK, I've discovered that this regression was introduced in revision 344347, when Dave C. checked in a fix for handling long element names that exceed the size of the XMLReader buffer.  Unfortunately, this fix introduced a bug that consumes characters in the buffer in this case.

As a result, a subsequent call to XMLReader::skipString() will not work, because some of the characters in the previous call will have been consumed.  This breaks the code in XMLScanner::senseNextToken():

            static const XMLCh gCDATAStr[] =
            {
                    chBang, chOpenSquare, chLatin_C, chLatin_D, chLatin_A
                ,   chLatin_T, chLatin_A, chNull
            };

            static const XMLCh gCommentString[] =
            {
                chBang, chDash, chDash, chNull
            };

            if (fReaderMgr.skippedString(gCDATAStr))
                return Token_CData;

            if (fReaderMgr.skippedString(gCommentString))
                return Token_Comment;

My suggestion is that we stop overloading skippedString() to do both skipping and consuming.  In the case of "skipping" an element tag name, we really want to consume the string, since we must find the element tag name at that point in the buffer if the entity is well-formed.  This may affect our ability to treat this as a recoverable error, but it's no worse than the current situation, where we've consumed the characters and can't "unconsume" them.  This will also require that we place a limit on the length of a string that we can skip, since we can't skip a string that's larger than XMLReader::kCharBufSize, which is currently 16384.
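
One way to picture the split between skipping and consuming is the sketch below.  It is an illustration under stated assumptions rather than the actual fix: LookaheadReader, peekString(), consumeExpected() and the tiny kMaxLookahead limit (standing in for the buffer-size bound) are all hypothetical.  The bounded lookahead keeps unread characters across refills and only advances on a full match; the unbounded consume advances as it goes and leaves recovery to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical sketch; names and limits are invented for illustration.
class LookaheadReader {
public:
    // Stand-in for a buffer-size bound on how much may be looked ahead.
    static constexpr std::size_t kMaxLookahead = 16;

    LookaheadReader(std::string input, std::size_t chunk)
        : input_(std::move(input)), chunk_(chunk) { assert(chunk_ > 0); }

    // Pure lookahead: true if the unread input starts with s. Never consumes.
    bool peekString(const std::string& s) {
        assert(s.size() <= kMaxLookahead);
        while (buf_.size() - pos_ < s.size() && fill()) {}
        return buf_.compare(pos_, s.size(), s) == 0;
    }

    // Bounded skip: advances only when the whole string matched.
    bool skippedString(const std::string& s) {
        if (!peekString(s)) return false;
        pos_ += s.size();
        return true;
    }

    // Unbounded consume: for strings that MUST appear here (e.g. a tag name).
    // Characters matched before a mismatch stay consumed.
    bool consumeExpected(const std::string& s) {
        for (char c : s) {
            if (pos_ == buf_.size() && !fill()) return false;
            if (buf_[pos_] != c) return false;
            ++pos_;
        }
        return true;
    }

private:
    // Drops already-consumed characters, keeps unread ones, appends a chunk.
    bool fill() {
        if (next_ >= input_.size()) return false;
        buf_.erase(0, pos_);
        pos_ = 0;
        const std::size_t n = std::min(chunk_, input_.size() - next_);
        buf_.append(input_, next_, n);
        next_ += n;
        return true;
    }

    std::string input_;
    std::string buf_;
    std::size_t chunk_;
    std::size_t next_ = 0;
    std::size_t pos_ = 0;
};
```

With this shape, a failed probe for "![CDATA" leaves the input untouched even when it arrives one character at a time, so a following probe for "!--" still sees the '!'.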

The other possibility is that we make it possible for an XMLReader to buffer a potentially unlimited number of characters, so we can always "unconsume" everything.  I don't think we want to go down that path, because it will affect performance and complicate the code.

Note that this bug doesn't just affect XMLScanner::senseNextToken(), as there is other code that relies on the "lookahead" behavior of skippedString().

> Parser reports error when comment tag is broken up by BinInputStream
> --------------------------------------------------------------------
>
>                 Key: XERCESC-1827
>                 URL: https://issues.apache.org/jira/browse/XERCESC-1827
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: Non-Validating Parser
>    Affects Versions: 2.8.0
>         Environment: Solaris (SunOS 5.10) i386 SunPRO
> cygwin i386 GCC
>            Reporter: Vasantha Crabb
>             Fix For: 3.0.0, 2.9.0
>
>         Attachments: test.cc
>
>
> If the BinInputStream delivers data to the parser in such a way that the '<', '!' and '--' at the start of an XML comment are delivered separately, the parser will report the error 'Expected comment or CDATA'.  The problem is reproducible under Xerces 2.8 on both Solaris i386 with the SunPRO C++ compiler and cygwin i386 with the GNU C++ compiler.


[jira] Commented: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "David Bertoni (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622319#action_12622319 ] 

David Bertoni commented on XERCESC-1827:
----------------------------------------

I can take a look at this, if no one else is already.


[jira] Updated: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "Boris Kolpackov (JIRA)" <xe...@xml.apache.org>.
     [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Kolpackov updated XERCESC-1827:
-------------------------------------

    Fix Version/s: 2.9.0
                   3.0.0

I think this is serious enough that we should try to fix it for 3.0.0.


[jira] Commented: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "Boris Kolpackov (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630954#action_12630954 ] 

Boris Kolpackov commented on XERCESC-1827:
------------------------------------------

I would like to release 3.0.0 by the end of this week and we definitely need to fix this for 3.0.0. David C, do you have any comments/objections to the fix proposed by David B?



> Parser reports error when comment tag is broken up by BinInputStream
> --------------------------------------------------------------------
>
>                 Key: XERCESC-1827
>                 URL: https://issues.apache.org/jira/browse/XERCESC-1827
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: Non-Validating Parser
>    Affects Versions: 2.8.0
>         Environment: Solaris (SunOS 5.10) i386 SunPRO
> cygwin i386 GCC
>            Reporter: Vasantha Crabb
>             Fix For: 3.0.0, 2.9.0
>
>         Attachments: new-test.cc, test.cc
>
>
> If the BinInputStream delivers data to the parser in such a way that the '<', '!' and '--' at the start of an XML comment are delivered separately, the parser will report the error 'Expected comment or CDATA'.  The problem is reproducible under Xerces 2.8 on both Solaris i386 with the SunPRO C++ compiler and cygwin i386 with the GNU C++ compiler.


[jira] Commented: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "David Bertoni (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631159#action_12631159 ] 

David Bertoni commented on XERCESC-1827:
----------------------------------------

I will try to get to this sometime tonight or tomorrow.

> Parser reports error when comment tag is broken up by BinInputStream
> --------------------------------------------------------------------
>
>                 Key: XERCESC-1827
>                 URL: https://issues.apache.org/jira/browse/XERCESC-1827
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: Non-Validating Parser
>    Affects Versions: 2.8.0
>         Environment: Solaris (SunOS 5.10) i386 SunPRO
> cygwin i386 GCC
>            Reporter: Vasantha Crabb
>             Fix For: 3.0.0, 2.9.0
>
>         Attachments: long.xml, new-test.cc, test.cc
>
>
> If the BinInputStream delivers data to the parser in such a way that the '<', '!' and '--' at the start of an XML comment are delivered separately, the parser will report the error 'Expected comment or CDATA'.  The problem is reproducible under Xerces 2.8 on both Solaris i386 with the SunPRO C++ compiler and cygwin i386 with the GNU C++ compiler.


[jira] Updated: (XERCESC-1827) Parser reports error when comment tag is broken up by BinInputStream

Posted by "David Bertoni (JIRA)" <xe...@xml.apache.org>.
     [ https://issues.apache.org/jira/browse/XERCESC-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Bertoni updated XERCESC-1827:
-----------------------------------

    Attachment: new-test.cc

Updated test that slices the document into 1-byte reads.
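
For readers without the attachment, the idea of a stream that hands out one byte per read can be sketched as follows.  This is not the attached new-test.cc: OneByteStream and drain() are hypothetical, and the class only mimics the general shape of a readBytes()/curPos() input stream rather than deriving from the real Xerces BinInputStream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a custom input stream; it does not use Xerces.
class OneByteStream {
public:
    explicit OneByteStream(std::string data) : data_(std::move(data)) {}

    std::size_t curPos() const { return pos_; }

    // Returns at most one byte per call, however large the caller's buffer,
    // so every multi-character token is split across reads.
    std::size_t readBytes(unsigned char* toFill, std::size_t maxToRead) {
        if (maxToRead == 0 || pos_ >= data_.size()) return 0;
        toFill[0] = static_cast<unsigned char>(data_[pos_++]);
        return 1;
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Reads the stream to exhaustion with a roomy buffer and counts the reads.
std::string drain(OneByteStream& in, std::size_t& reads) {
    std::string out;
    unsigned char buf[64];
    reads = 0;
    for (;;) {
        const std::size_t n = in.readBytes(buf, sizeof buf);
        if (n == 0) break;
        out.append(reinterpret_cast<const char*>(buf), n);
        ++reads;
    }
    return out;
}
```

Draining a document through such a stream yields the same bytes as the original, but takes one read per byte, which is what forces '<', '!' and '--' to reach the parser separately.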
