Posted to c-dev@xerces.apache.org by ji...@apache.org on 2004/06/10 18:45:18 UTC

[jira] Updated: (XERCESC-1226) Parser reports bogus content when parsing

The following issue has been updated:

    Updater: David Bertoni (mailto:david_n_bertoni@us.ibm.com)
       Date: Thu, 10 Jun 2004 9:44 AM
    Comment:
XML document to reproduce the problem.
    Changes:
             Attachment changed to test1.xml
    ---------------------------------------------------------------------
For a full history of the issue, see:

  http://issues.apache.org/jira/browse/XERCESC-1226?page=history

---------------------------------------------------------------------
View the issue:
  http://issues.apache.org/jira/browse/XERCESC-1226

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: XERCESC-1226
    Summary: Parser reports bogus content when parsing
       Type: Bug

     Status: Unassigned
   Priority: Major

    Project: Xerces-C++
 Components: 
             SAX/SAX2
   Versions:
             Nightly build (please specify the date)

   Assignee: 
   Reporter: David Bertoni

    Created: Thu, 10 Jun 2004 9:42 AM
    Updated: Thu, 10 Jun 2004 9:44 AM
Environment: All platforms

Description:
When parsing the following document, the parser reports garbage characters.

<?xml version="1.0"?> 
<subject>Research [&#x1D538;]rticle</subject>

I traced this to the following function in XMLReader, starting on line 612:

inline bool XMLReader::isPlainContentChar(const XMLCh toCheck)
{
    return ((fgCharCharsTable[toCheck] & gPlainContentCharMask) != 0);
}

Apparently, for the character "]" (U+005D RIGHT SQUARE BRACKET), the flags in fgCharCharsTable indicate that it is not plain content.  As a result, the parser misbehaves and delivers broken character data, including unpaired low surrogates.

When I used the debugger to force this function to return "true" rather than "false" for that character, the parser delivered the correct character data.
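
For anyone checking the delivered character data, here is a minimal sketch of how the symptom could be detected. It is not Xerces code. It assumes character data arrives as 16-bit UTF-16 code units (CodeUnit below is a stand-in for XMLCh), and the helper names encodeSupplementary and findUnpairedSurrogate are hypothetical. The character reference &#x1D538; lies outside the Basic Multilingual Plane, so in UTF-16 it should appear as the surrogate pair 0xD835 0xDD38; a low surrogate without a preceding high surrogate is the kind of broken data described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in for Xerces' XMLCh, assumed here to be a 16-bit UTF-16 code unit.
using CodeUnit = char16_t;

inline bool isHighSurrogate(CodeUnit c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLowSurrogate(CodeUnit c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Encode a supplementary code point (U+10000..U+10FFFF) as a UTF-16
// surrogate pair. Hypothetical helper; the caller must pass a value
// in that range.
std::u16string encodeSupplementary(std::uint32_t codePoint)
{
    const std::uint32_t v = codePoint - 0x10000;
    std::u16string out;
    out.push_back(static_cast<CodeUnit>(0xD800 + (v >> 10)));   // high surrogate
    out.push_back(static_cast<CodeUnit>(0xDC00 + (v & 0x3FF))); // low surrogate
    return out;
}

// Return the index of the first unpaired surrogate in s, or
// std::u16string::npos if every surrogate is part of a valid pair.
std::size_t findUnpairedSurrogate(const std::u16string& s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (isHighSurrogate(s[i])) {
            if (i + 1 < s.size() && isLowSurrogate(s[i + 1])) {
                ++i;        // skip the low half of a valid pair
                continue;
            }
            return i;       // high surrogate with no low surrogate after it
        }
        if (isLowSurrogate(s[i])) {
            return i;       // low surrogate with no high surrogate before it
        }
    }
    return std::u16string::npos;
}
```

A SAX content handler could append each characters() chunk to a std::u16string and run findUnpairedSurrogate over the accumulated text once the element ends. Checking individual chunks is not reliable, because a parser may legitimately split a chunk between the two halves of a pair.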


---------------------------------------------------------------------
JIRA INFORMATION:
This message is automatically generated by JIRA.

If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa

If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira

