Posted to c-dev@xerces.apache.org by ji...@apache.org on 2004/06/10 18:44:10 UTC
[jira] Created: (XERCESC-1226) Parser reports bogus content when parsing
Message:
A new issue has been created in JIRA.
---------------------------------------------------------------------
View the issue:
http://issues.apache.org/jira/browse/XERCESC-1226
Here is an overview of the issue:
---------------------------------------------------------------------
Key: XERCESC-1226
Summary: Parser reports bogus content when parsing
Type: Bug
Status: Unassigned
Priority: Major
Project: Xerces-C++
Components:
SAX/SAX2
Versions:
Nightly build (please specify the date)
Assignee:
Reporter: David Bertoni
Created: Thu, 10 Jun 2004 9:42 AM
Updated: Thu, 10 Jun 2004 9:42 AM
Environment: All platforms
Description:
When parsing the following document, the parser reports garbage characters.
<?xml version="1.0"?>
<subject>Research [𝔸]rticle</subject>
I traced this down to this function in XMLReader, starting on line 612:
inline bool XMLReader::isPlainContentChar(const XMLCh toCheck)
{
    return ((fgCharCharsTable[toCheck] & gPlainContentCharMask) != 0);
}
Apparently, for the character "]" (U+005D RIGHT SQUARE BRACKET), the flags in fgCharCharsTable indicate it's not plain content. This causes the parser to misbehave badly, and deliver broken character data, including unpaired low surrogates.
When I used the debugger, and returned "true" from this function, rather than false, the parser delivered the correct character data.
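
For context on the "unpaired low surrogates" mentioned above: a character outside the Basic Multilingual Plane is stored as two 16-bit UTF-16 code units, a high surrogate followed by a low surrogate. The sketch below is not Xerces-C++ code; toSurrogatePair and isLowSurrogate are hypothetical helper names, and it assumes the character between the brackets in the test document is U+1D538.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Split a supplementary code point (U+10000..U+10FFFF) into its UTF-16
// high (leading) and low (trailing) surrogate code units.
std::pair<char16_t, char16_t> toSurrogatePair(std::uint32_t codePoint)
{
    assert(codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    const std::uint32_t offset = codePoint - 0x10000;
    const char16_t high = static_cast<char16_t>(0xD800 + (offset >> 10));
    const char16_t low  = static_cast<char16_t>(0xDC00 + (offset & 0x3FF));
    return {high, low};
}

// True when a code unit is a low surrogate. Such a unit is only valid
// immediately after a high surrogate; anywhere else it is "unpaired".
bool isLowSurrogate(char16_t unit)
{
    return unit >= 0xDC00 && unit <= 0xDFFF;
}
```

Under that assumption, toSurrogatePair(0x1D538) yields the pair 0xD835, 0xDD38, so a stray 0xDD38 appearing later in the reported character data would be an unpaired low surrogate.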
---------------------------------------------------------------------
JIRA INFORMATION:
This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
If you want more information on JIRA, or have a bug to report see:
http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org
[jira] Updated: (XERCESC-1226) Parser reports bogus content when parsing
Posted by ji...@apache.org.
The following issue has been updated:
Updater: David Bertoni (mailto:david_n_bertoni@us.ibm.com)
Date: Thu, 10 Jun 2004 10:25 AM
Changes:
Attachment changed to diff.txt
---------------------------------------------------------------------
For a full history of the issue, see:
http://issues.apache.org/jira/browse/XERCESC-1226?page=history
---------------------------------------------------------------------
[jira] Resolved: (XERCESC-1226) Parser reports bogus content when parsing
Posted by xe...@xml.apache.org.
Message:
The following issue has been resolved as FIXED.
Resolver: Alberto Massari
Date: Tue, 6 Jul 2004 8:55 AM
A fix is in CVS. Please verify.
Alberto
---------------------------------------------------------------------
[jira] Commented: (XERCESC-1226) Parser reports bogus content when parsing
Posted by ji...@apache.org.
The following comment has been added to this issue:
Author: David Bertoni
Created: Thu, 10 Jun 2004 10:25 AM
Body:
Looking further into the code, I don't believe the problem is related to the table, since "]" is marked as a special character because it might close a CDATA section.
Instead, I believe the bug is here in IGXMLScanner2.cpp, line 2791:
if (secondCh)
    toUse.append(secondCh);
The variable secondCh needs to be reset to 0 after the call to toUse.append(). Otherwise, it is appended on every loop iteration until something else resets it. There is similar code in basicAttrValueScan() and scanAttValue().
I'll attach a proposed patch.
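
The effect described in this comment can be reproduced with a much simplified model of a scanning loop. This is a sketch only, not the code in IGXMLScanner2.cpp and not the attached patch: scanContent, resetAfterAppend, and checkModel are hypothetical names, std::u16string stands in for the parser's buffer, and in this model nothing other than the proposed reset ever clears secondCh.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified model of a character-data scanning loop. This is NOT the
// Xerces-C++ source; scanContent and resetAfterAppend are hypothetical.
std::u16string scanContent(const std::u16string& input, bool resetAfterAppend)
{
    std::u16string toUse;
    char16_t secondCh = 0;   // low surrogate of the current pair, if any

    for (std::size_t i = 0; i < input.size(); )
    {
        const char16_t nextCh = input[i++];

        // A high surrogate is followed by a low surrogate; read both units.
        if (nextCh >= 0xD800 && nextCh <= 0xDBFF && i < input.size())
            secondCh = input[i++];

        toUse.push_back(nextCh);
        if (secondCh)
        {
            toUse.push_back(secondCh);
            if (resetAfterAppend)
                secondCh = 0;   // the reset proposed in the comment
        }
    }
    return toUse;
}

// Exercise both variants on the content of the test document, assuming
// the character between the brackets is U+1D538.
void checkModel()
{
    const std::u16string content = u"Research [\U0001D538]rticle";
    // With the reset, the content passes through unchanged.
    assert(scanContent(content, true) == content);
    // Without it, the stale low surrogate is appended after every later unit.
    assert(scanContent(content, false) != content);
}
```

Without the reset, every code unit after the surrogate pair is followed by a copy of the stale low surrogate, which matches the unpaired low surrogates described in the report.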
---------------------------------------------------------------------
View this comment:
http://issues.apache.org/jira/browse/XERCESC-1226?page=comments#action_36019
---------------------------------------------------------------------
[jira] Updated: (XERCESC-1226) Parser reports bogus content when parsing
Posted by ji...@apache.org.
The following issue has been updated:
Updater: David Bertoni (mailto:david_n_bertoni@us.ibm.com)
Date: Thu, 10 Jun 2004 9:44 AM
Comment:
XML document to reproduce the problem.
Changes:
Attachment changed to test1.xml
---------------------------------------------------------------------
For a full history of the issue, see:
http://issues.apache.org/jira/browse/XERCESC-1226?page=history
---------------------------------------------------------------------