Posted to c-dev@xerces.apache.org by ji...@apache.org on 2004/06/10 19:27:10 UTC
[jira] Commented: (XERCESC-1226) Parser reports bogus content when parsing
The following comment has been added to this issue:
Author: David Bertoni
Created: Thu, 10 Jun 2004 10:25 AM
Body:
Looking further into the code, I don't believe the problem is related to the table, since "]" is marked as a special character because it might close a CDATA section.
Instead, I believe the bug is here in IGXMLScanner2.cpp, line 2791:
    if (secondCh)
        toUse.append(secondCh);
The variable secondCh needs to be reset to 0 after the call to toUse.append(). Otherwise, its stale value is appended again on every loop iteration until something else resets it. There is similar code in basicAttrValueScan() and scanAttValue().
I'll attach a proposed patch.
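To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the Xerces-C++ scanner code: the function scanSketch(), its resetSecondCh parameter, and the simplified loop are hypothetical stand-ins, and only the names secondCh and toUse are taken from the comment. The sketch assumes the input is a sequence of UTF-16 code units in which a high surrogate is followed by its low surrogate.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical simplification of a content-scanning loop. When a high
// surrogate is seen, the following low surrogate is kept in secondCh and
// appended after the first code unit.
std::u16string scanSketch(const std::u16string& in, bool resetSecondCh)
{
    std::u16string toUse;
    char16_t secondCh = 0;

    for (std::size_t i = 0; i < in.size(); ++i)
    {
        const char16_t nextCh = in[i];

        // High surrogate: consume the low surrogate that follows it.
        if (nextCh >= 0xD800 && nextCh <= 0xDBFF && i + 1 < in.size())
            secondCh = in[++i];

        toUse.push_back(nextCh);

        if (secondCh)
        {
            toUse.push_back(secondCh);
            // The proposed fix: forget the low surrogate once it is used.
            if (resetSecondCh)
                secondCh = 0;
        }
    }
    return toUse;
}

// Small self-check using the code units of "[" U+1D538 "]r".
void checkScanSketch()
{
    const std::u16string in = {u'[', char16_t(0xD835), char16_t(0xDD38),
                               u']', u'r'};
    assert(scanSketch(in, true) == in);         // with the reset: unchanged
    assert(scanSketch(in, false).size() == 7);  // without: stale units added
}
```

Without the reset, the low surrogate 0xDD38 is appended again after "]" and after "r", which would match the unpaired low surrogates mentioned in the issue description.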
---------------------------------------------------------------------
View this comment:
http://issues.apache.org/jira/browse/XERCESC-1226?page=comments#action_36019
---------------------------------------------------------------------
View the issue:
http://issues.apache.org/jira/browse/XERCESC-1226
Here is an overview of the issue:
---------------------------------------------------------------------
Key: XERCESC-1226
Summary: Parser reports bogus content when parsing
Type: Bug
Status: Unassigned
Priority: Major
Project: Xerces-C++
Components: SAX/SAX2
Versions: Nightly build (please specify the date)
Assignee:
Reporter: David Bertoni
Created: Thu, 10 Jun 2004 9:42 AM
Updated: Thu, 10 Jun 2004 10:25 AM
Environment: All platforms
Description:
When parsing the following document, the parser reports garbage characters.
<?xml version="1.0"?>
<subject>Research [𝔸]rticle</subject>
I traced this down to this function in XMLReader, starting on line 612:
    inline bool XMLReader::isPlainContentChar(const XMLCh toCheck)
    {
        return ((fgCharCharsTable[toCheck] & gPlainContentCharMask) != 0);
    }
Apparently, for the character "]" (U+005D RIGHT SQUARE BRACKET), the flags in fgCharCharsTable indicate it's not plain content. This causes the parser to misbehave badly, and deliver broken character data, including unpaired low surrogates.
When I used the debugger, and returned "true" from this function, rather than false, the parser delivered the correct character data.
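For readers unfamiliar with this kind of lookup, the following is a rough sketch of a flag table indexed by a 16-bit code unit. It is not the actual Xerces-C++ table: CharTableSketch, kPlainContentMask, gTableSketch, and isPlainContentCharSketch() are hypothetical names, and the set of characters cleared here ("<", "&", and "]") is an assumption chosen for illustration. The real table may flag other characters as well.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical flag bit meaning "this code unit is plain content".
const unsigned char kPlainContentMask = 0x01;

// Hypothetical table with one flag byte per 16-bit code unit.
struct CharTableSketch
{
    unsigned char flags[0x10000];

    CharTableSketch()
    {
        // Start by treating every code unit as plain content...
        for (std::size_t i = 0; i < 0x10000; ++i)
            flags[i] = kPlainContentMask;

        // ...then clear the bit for characters that need special handling.
        flags[static_cast<unsigned char>('<')] = 0;  // may start markup
        flags[static_cast<unsigned char>('&')] = 0;  // may start a reference
        flags[static_cast<unsigned char>(']')] = 0;  // may start "]]>"
    }
};

static const CharTableSketch gTableSketch;

// Same shape as the function quoted above: one lookup and one mask test.
inline bool isPlainContentCharSketch(char16_t toCheck)
{
    return (gTableSketch.flags[toCheck] & kPlainContentMask) != 0;
}
```

In this sketch isPlainContentCharSketch(u']') returns false, so a scanner built on it would send "]" down its slower, special-case path rather than treating it as ordinary text.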