Posted to c-dev@xerces.apache.org by ji...@apache.org on 2004/06/10 18:44:10 UTC

[jira] Created: (XERCESC-1226) Parser reports bogus content when parsing

Message:

  A new issue has been created in JIRA.

---------------------------------------------------------------------
View the issue:
  http://issues.apache.org/jira/browse/XERCESC-1226

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: XERCESC-1226
    Summary: Parser reports bogus content when parsing
       Type: Bug

     Status: Unassigned
   Priority: Major

    Project: Xerces-C++
 Components: 
             SAX/SAX2
   Versions:
             Nightly build (please specify the date)

   Assignee: 
   Reporter: David Bertoni

    Created: Thu, 10 Jun 2004 9:42 AM
    Updated: Thu, 10 Jun 2004 9:42 AM
Environment: All platforms

Description:
When parsing the following document, the parser reports garbage characters.

<?xml version="1.0"?> 
<subject>Research [&#x1D538;]rticle</subject>
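
For context, the character reference &#x1D538; names a code point outside the Basic Multilingual Plane. Xerces-C++ stores text as 16-bit XMLCh code units, so this character has to be stored as a UTF-16 surrogate pair: a high surrogate followed by a low surrogate. The following is a minimal sketch of the standard UTF-16 encoding arithmetic, not Xerces code; the function name toSurrogatePair is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of Xerces-C++): split a supplementary
// code point (U+10000..U+10FFFF) into a UTF-16 high/low surrogate pair.
std::pair<char16_t, char16_t> toSurrogatePair(std::uint32_t codePoint)
{
    assert(codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    const std::uint32_t offset = codePoint - 0x10000;                       // 20-bit value
    const char16_t high = static_cast<char16_t>(0xD800 + (offset >> 10));   // top 10 bits
    const char16_t low  = static_cast<char16_t>(0xDC00 + (offset & 0x3FF)); // bottom 10 bits
    return {high, low};
}
```

Applied to U+1D538, this gives the pair 0xD835 0xDD38, so the document above contains one character that occupies two code units.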

I traced this down to this function in XMLReader, starting on line 612:

inline bool XMLReader::isPlainContentChar(const XMLCh toCheck)
{
    return ((fgCharCharsTable[toCheck] & gPlainContentCharMask) != 0);
}

Apparently, for the character "]" (U+005D RIGHT SQUARE BRACKET), the flags in fgCharCharsTable indicate that it is not plain content. This causes the parser to misbehave badly and deliver broken character data, including unpaired low surrogates.

When I used the debugger to make this function return true instead of false, the parser delivered the correct character data.
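
The check in isPlainContentChar() is a table lookup: each 16-bit code unit indexes an array of flag bytes, and a mask selects one property bit. The sketch below reproduces that shape with hypothetical names (charFlags(), kPlainContentMask). The real contents of fgCharCharsTable are not reproduced; the choice of excluded characters here ('<', '&' and ']') is a simplified assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified analogue of a per-character flag table:
// one flag byte per 16-bit code unit, one bit per character property.
constexpr std::uint8_t kPlainContentMask = 0x01;

std::array<std::uint8_t, 0x10000> makeCharFlags()
{
    std::array<std::uint8_t, 0x10000> flags{};
    flags.fill(kPlainContentMask);   // assume plain content by default
    // Characters that need special handling in element content.
    flags[u'<'] = 0;   // starts markup
    flags[u'&'] = 0;   // starts an entity or character reference
    flags[u']'] = 0;   // may begin the "]]>" sequence
    return flags;
}

const std::array<std::uint8_t, 0x10000>& charFlags()
{
    static const std::array<std::uint8_t, 0x10000> table = makeCharFlags();
    return table;
}

// Same shape as XMLReader::isPlainContentChar: index the table, test one bit.
bool isPlainContentChar(char16_t toCheck)
{
    return (charFlags()[toCheck] & kPlainContentMask) != 0;
}
```

In this sketch, isPlainContentChar(u']') returns false by design, which sends "]" down the slower, special-case scanning path instead of the plain-content fast path.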


---------------------------------------------------------------------
JIRA INFORMATION:
This message is automatically generated by JIRA.

If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa

If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


[jira] Updated: (XERCESC-1226) Parser reports bogus content when parsing

Posted by ji...@apache.org.
The following issue has been updated:

    Updater: David Bertoni (mailto:david_n_bertoni@us.ibm.com)
       Date: Thu, 10 Jun 2004 10:25 AM
    Changes:
             Attachment changed to diff.txt
    ---------------------------------------------------------------------
For a full history of the issue, see:

  http://issues.apache.org/jira/browse/XERCESC-1226?page=history



[jira] Resolved: (XERCESC-1226) Parser reports bogus content when parsing

Posted by xe...@xml.apache.org.
Message:

   The following issue has been resolved as FIXED.

   Resolver: Alberto Massari
       Date: Tue, 6 Jul 2004 8:55 AM

A fix is in CVS. Please verify.

Alberto

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: XERCESC-1226
    Summary: Parser reports bogus content when parsing
       Type: Bug

     Status: Resolved
   Priority: Major
 Resolution: FIXED

    Project: Xerces-C++
 Components: 
             SAX/SAX2
   Versions:
             Nightly build (please specify the date)

   Assignee: 
   Reporter: David Bertoni

    Created: Thu, 10 Jun 2004 9:42 AM
    Updated: Tue, 6 Jul 2004 8:55 AM
Environment: All platforms



[jira] Commented: (XERCESC-1226) Parser reports bogus content when parsing

Posted by ji...@apache.org.
The following comment has been added to this issue:

     Author: David Bertoni
    Created: Thu, 10 Jun 2004 10:25 AM
       Body:
Looking further into the code, I don't believe the problem is related to the table: "]" is deliberately marked as a special character because it might begin the "]]>" sequence that closes a CDATA section.

Instead, I believe the bug is here in IGXMLScanner2.cpp, line 2791:

if (secondCh)
    toUse.append(secondCh);

The variable secondCh needs to be reset to 0 after the call to toUse.append(). Otherwise, it is appended on every loop iteration until something else resets it. There is similar code in basicAttrValueScan() and scanAttValue().
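
To illustrate the failure mode, here is a hypothetical, self-contained sketch of the pattern described above; it is not the IGXMLScanner code, and scanChars() and its resetSecondCh parameter are invented for illustration. It assumes that a supplementary character sets both firstCh and secondCh, while an ordinary character sets only firstCh and leaves secondCh untouched.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch (not Xerces code). Each input item is one Unicode
// code point; supplementary code points produce a high surrogate (firstCh)
// plus a low surrogate (secondCh).
std::u16string scanChars(const std::vector<std::uint32_t>& codePoints, bool resetSecondCh)
{
    std::u16string toUse;
    char16_t secondCh = 0;
    for (std::uint32_t cp : codePoints)
    {
        assert(cp <= 0x10FFFF);  // valid Unicode range only
        char16_t firstCh;
        if (cp >= 0x10000)
        {
            const std::uint32_t offset = cp - 0x10000;
            firstCh  = static_cast<char16_t>(0xD800 + (offset >> 10));
            secondCh = static_cast<char16_t>(0xDC00 + (offset & 0x3FF));
        }
        else
        {
            // secondCh is deliberately NOT cleared here, mirroring code that
            // only assigns it when a surrogate pair is produced.
            firstCh = static_cast<char16_t>(cp);
        }

        toUse.push_back(firstCh);
        if (secondCh)
        {
            toUse.push_back(secondCh);
            if (resetSecondCh)
                secondCh = 0;  // the proposed fix: forget the low surrogate once used
        }
    }
    return toUse;
}
```

Under these assumptions, scanning "[", U+1D538, "]", "r" without the reset yields 0xDD38 again after "]" and after "r", which matches the unpaired low surrogates in the report. With the reset, the output is the expected five code units.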

I'll attach a proposed patch.
---------------------------------------------------------------------
View this comment:
  http://issues.apache.org/jira/browse/XERCESC-1226?page=comments#action_36019



[jira] Updated: (XERCESC-1226) Parser reports bogus content when parsing

Posted by ji...@apache.org.
The following issue has been updated:

    Updater: David Bertoni (mailto:david_n_bertoni@us.ibm.com)
       Date: Thu, 10 Jun 2004 9:44 AM
    Comment:
XML document to reproduce the problem.
    Changes:
             Attachment changed to test1.xml