Posted to c-dev@xerces.apache.org by ji...@apache.org on 2004/06/10 19:27:10 UTC

[jira] Commented: (XERCESC-1226) Parser reports bogus content when parsing

The following comment has been added to this issue:

     Author: David Bertoni
    Created: Thu, 10 Jun 2004 10:25 AM
       Body:
Looking further into the code, I don't believe the problem is related to the table, since "]" is marked as a special character because it might close a CDATA section.

Instead, I believe the bug is here in IGXMLScanner2.cpp, line 2791:

if (secondCh)
    toUse.append(secondCh);

The variable secondCh needs to be reset to 0 after the call to toUse.append().  Otherwise, it is appended again on every loop iteration until something else resets it.  There is similar code in basicAttrValueScan() and scanAttValue().

I'll attach a proposed patch.
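
For illustration only, the failure mode described above can be reproduced in a small standalone model. This is a sketch, not the Xerces-C++ scanner: Item, expand() and scanSketch() are hypothetical names, std::u16string stands in for the parser's buffer, and the loop assumes that only a surrogate pair ever assigns secondCh.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one scanned character: 'second' is 0 unless the
// character expanded to a UTF-16 surrogate pair.
struct Item {
    char16_t first;
    char16_t second;
};

// Split a code point into UTF-16 code units.
Item expand(char32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000)
        return { static_cast<char16_t>(cp), 0 };
    cp -= 0x10000;
    return { static_cast<char16_t>(0xD800 + (cp >> 10)),
             static_cast<char16_t>(0xDC00 + (cp & 0x3FF)) };
}

// Simplified model of the scanning loop (an assumption, not the real code).
// resetSecond == false models the reported behaviour; true models the fix.
std::u16string scanSketch(const std::vector<Item>& items, bool resetSecond)
{
    std::u16string toUse;
    char16_t secondCh = 0;
    for (const Item& item : items) {
        // Only a surrogate pair assigns secondCh; other characters leave the
        // stale value in place.
        if (item.second)
            secondCh = item.second;

        toUse.push_back(item.first);
        if (secondCh) {
            toUse.push_back(secondCh);
            if (resetSecond)
                secondCh = 0;  // the proposed reset
        }
    }
    return toUse;
}
```

Feeding the model "[", U+1D538, "]", "r" without the reset yields seven code units, with the low surrogate 0xDD38 repeated after "]" and "r". With the reset it yields the expected five.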
---------------------------------------------------------------------
View this comment:
  http://issues.apache.org/jira/browse/XERCESC-1226?page=comments#action_36019

---------------------------------------------------------------------
View the issue:
  http://issues.apache.org/jira/browse/XERCESC-1226

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: XERCESC-1226
    Summary: Parser reports bogus content when parsing
       Type: Bug

     Status: Unassigned
   Priority: Major

    Project: Xerces-C++
 Components: 
             SAX/SAX2
   Versions:
             Nightly build (please specify the date)

   Assignee: 
   Reporter: David Bertoni

    Created: Thu, 10 Jun 2004 9:42 AM
    Updated: Thu, 10 Jun 2004 10:25 AM
Environment: All platforms

Description:
When parsing the following document, the parser reports garbage characters.

<?xml version="1.0"?> 
<subject>Research [&#x1D538;]rticle</subject>

I traced this down to this function in XMLReader, starting on line 612:

inline bool XMLReader::isPlainContentChar(const XMLCh toCheck)
{
    return ((fgCharCharsTable[toCheck] & gPlainContentCharMask) != 0);
}

Apparently, for the character "]" (U+005D RIGHT SQUARE BRACKET), the flags in fgCharCharsTable indicate it's not plain content.  This causes the parser to misbehave badly, and deliver broken character data, including unpaired low surrogates.

When I used the debugger, and returned "true" from this function, rather than false, the parser delivered the correct character data.
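
As a rough aid for reproducing the symptom, a check along the following lines could be run over the UTF-16 character data the parser delivers. It is a sketch under assumptions: hasUnpairedSurrogate() is a hypothetical helper, not part of Xerces-C++, and std::u16string stands in for a buffer of XMLCh.

```cpp
#include <cstddef>
#include <string>

// Returns true if the UTF-16 text contains a surrogate code unit that is not
// part of a well-formed high/low pair.
bool hasUnpairedSurrogate(const std::u16string& text)
{
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t ch = text[i];
        if (ch >= 0xD800 && ch <= 0xDBFF) {
            // A high surrogate must be followed immediately by a low one.
            if (i + 1 >= text.size()
                || text[i + 1] < 0xDC00 || text[i + 1] > 0xDFFF)
                return true;
            ++i;  // skip the low half of the valid pair
        } else if (ch >= 0xDC00 && ch <= 0xDFFF) {
            // A low surrogate with no preceding high surrogate.
            return true;
        }
    }
    return false;
}
```

For the document above, correct output contains U+1D538 as the pair 0xD835 0xDD38 and passes the check. Output with a stray 0xDD38 after "]" fails it.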

