Posted to c-dev@xerces.apache.org by "Jesse Pelton (JIRA)" <xe...@xml.apache.org> on 2007/08/08 18:02:59 UTC

[jira] Commented: (XERCESC-1385) UTF8 parse failure when there's a bom in the utf8 header

    [ https://issues.apache.org/jira/browse/XERCESC-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518485 ] 

Jesse Pelton commented on XERCESC-1385:
---------------------------------------

This was fixed after 2.7 was released.

The relevant commit appears to be 381685; see http://svn.apache.org/viewvc?view=rev&revision=381685

> UTF8 parse failure when there's a bom in the utf8 header
> --------------------------------------------------------
>
>                 Key: XERCESC-1385
>                 URL: https://issues.apache.org/jira/browse/XERCESC-1385
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: SAX/SAX2
>    Affects Versions: 2.6.0
>         Environment: OSX, CodeWarrior 9.4
>            Reporter: Miklos Fazekas
>         Attachments: test.cpp
>
>
> This issue is probably related to 1284 (or is a duplicate of it).
> The attached sample code fails with Xerces-C++ 2.6.
> The problem seems to be that the UTF-8 BOM is checked twice. Below is a patch to XMLReader.cpp that resolves this issue. [ The bug is that the UTF-8 BOM has already been detected and fRawBufIndex advanced past it, but the second check does not take that into account. ]
> src/xercesc/internal/XMLReader.cpp
> @@ -544,7 +544,7 @@
>              }
>              // If there's a utf-8 BOM  (0xEF 0xBB 0xBF), skip past it.
>              else {
> -                const char* asChars = (const char*)fRawByteBuf;
> +                const char* asChars = (const char*)(fRawByteBuf + fRawBufIndex);
>                  if ((fRawBytesAvail > XMLRecognizer::fgUTF8BOMLen )&&
>                      (XMLString::compareNString(  asChars
>                      , XMLRecognizer::fgUTF8BOM
> We should possibly also check whether a UTF-8 BOM has already been detected, since the following code would probably accept a doubled UTF-8 BOM.
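
A minimal C++ sketch of the failure and the proposed fix follows. It assumes a simplified model of the reader. RawReader, bomAt, firstCheck, secondCheckBuggy, secondCheckFixed, indexAfterChecks and the bomSkipped flag are hypothetical names for illustration, not Xerces-C++ API. std::memcmp with a >= bounds test stands in for the XMLString::compareNString call and the > test in the real code. The sketch shows why comparing at the start of fRawByteBuf skips a BOM again after fRawBufIndex has already moved past it, and how a flag could keep a doubled BOM from being consumed.

```cpp
#include <cstddef>
#include <cstring>

// The UTF-8 byte order mark; XMLRecognizer::fgUTF8BOM holds the same bytes.
const unsigned char kUTF8BOM[] = {0xEF, 0xBB, 0xBF};
const std::size_t kUTF8BOMLen = sizeof(kUTF8BOM);

// Hypothetical stand-in for the XMLReader members touched by the patch.
struct RawReader {
    const unsigned char* rawByteBuf; // plays the role of fRawByteBuf
    std::size_t rawBytesAvail;       // plays the role of fRawBytesAvail
    std::size_t rawBufIndex;         // plays the role of fRawBufIndex
    bool bomSkipped;                 // hypothetical flag, not a Xerces member
};

// True if a complete UTF-8 BOM starts at byte offset 'at'.
// The bounds test runs first, so memcmp never reads past the buffer.
bool bomAt(const RawReader& r, std::size_t at) {
    return r.rawBytesAvail >= at + kUTF8BOMLen &&
           std::memcmp(r.rawByteBuf + at, kUTF8BOM, kUTF8BOMLen) == 0;
}

// First check (encoding detection): consume a leading BOM.
void firstCheck(RawReader& r) {
    if (bomAt(r, r.rawBufIndex)) {
        r.rawBufIndex += kUTF8BOMLen;
        r.bomSkipped = true;
    }
}

// Second check as reported against 2.6: it inspects offset 0, so a BOM
// the first check already consumed is skipped again, eating real content.
void secondCheckBuggy(RawReader& r) {
    if (bomAt(r, 0))
        r.rawBufIndex += kUTF8BOMLen;
}

// Second check with the patch applied (inspect rawBufIndex, not offset 0)
// plus the suggested guard so a second, doubled BOM is not consumed too.
void secondCheckFixed(RawReader& r) {
    if (!r.bomSkipped && bomAt(r, r.rawBufIndex)) {
        r.rawBufIndex += kUTF8BOMLen;
        r.bomSkipped = true;
    }
}

// Runs both checks and returns the index where reading would start.
std::size_t indexAfterChecks(const unsigned char* buf, std::size_t len, bool useFix) {
    RawReader r{buf, len, 0, false};
    firstCheck(r);
    if (useFix)
        secondCheckFixed(r);
    else
        secondCheckBuggy(r);
    return r.rawBufIndex;
}
```

Consider the bytes EF BB BF followed by "<?xml". Here indexAfterChecks(..., false) returns 6, so reading would start at "ml" and "<?x" would be lost. indexAfterChecks(..., true) returns 3, the '<'. For EF BB BF EF BB BF followed by "<a/>", the guarded version also returns 3. It leaves the second BOM in the data rather than silently skipping it.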
