Posted to c-dev@xerces.apache.org by Dean Roddey <dr...@portal.com> on 2001/06/01 00:10:59 UTC

RE: Bug in scanner

No. All those just fall through, on the assumption that the error has
already been reported via the error handler.
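To illustrate the fall-through described above, here is a minimal sketch, not the actual Xerces-C code. ErrCode, ErrorHandler and ToyScanner are hypothetical stand-ins, and it assumes emitError() reports to the handler first and then throws the error code to unwind the scan. The catch block only resets, because by then the handler has already seen the error.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for XMLErrs::Codes.
enum class ErrCode { UnterminatedPI };

// Hypothetical error handler that simply records what was reported.
struct ErrorHandler {
    std::vector<ErrCode> reported;
    void fatalError(ErrCode code) { reported.push_back(code); }
};

// Toy scanner, for illustration only.
class ToyScanner {
public:
    explicit ToyScanner(ErrorHandler& handler)
        : handler_(handler), resetCount_(0) {}

    // Never rethrows: a fatal error reaches the caller only via the handler.
    void scanDocument(const std::string& text) {
        try {
            // Toy rule: a PI with no closing "?>" is a fatal error.
            if (text.find("?>") == std::string::npos)
                emitError(ErrCode::UnterminatedPI);
        } catch (const ErrCode&) {
            // 'First fatal error' exit: already reported, so reset and fall through.
            ++resetCount_;
        }
    }

    int resetCount() const { return resetCount_; }

private:
    // Report through the handler first, then throw the code to unwind.
    void emitError(ErrCode code) {
        handler_.fatalError(code);
        throw code;
    }

    ErrorHandler& handler_;
    int resetCount_;
};
```

With this shape a caller of scanDocument() never sees an exception; it learns about the failure from the handler, which is why a second throw after the reset would be redundant under these assumptions.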

--------------
Dean Roddey
Software Geek Extraordinaire
Portal, Inc
droddey@portal.com



-----Original Message-----
From: peiyongz@ca.ibm.com [mailto:peiyongz@ca.ibm.com]
Sent: Thursday, May 31, 2001 11:32 AM
To: xerces-c-dev@xml.apache.org
Subject: RE: Bug in scanner


Dean,

     I noticed that if the parser sees "<?xmlfoo ...", XMLScanner::scanPI() will
    emitError(XMLErrs::UnterminatedPI), which is caught by
    XMLScanner::scanDocument()'s

    catch(const XMLErrs::Codes)
    {
        // This is a 'first fatal error' type exit, so reset and fall through
        fReaderMgr.reset();
    }

   Shall we have a throw after the reset()?

Regards,

Peiyong Zhang
____________________________________________
XML Parsers Development
IBM Toronto Laboratory email: peiyongz@ca.ibm.com
Phone: (416)448-4088; Fax: (416)448-4414; T/L: 778-4088



Dean Roddey <dr...@portal.com> on 05/30/2001 06:54:01 PM

Please respond to xerces-c-dev@xml.apache.org

To:   "'xerces-c-dev@xml.apache.org'" <xe...@xml.apache.org>
cc:
Subject:  RE: Bug in scanner


That is probably true, and it would be my mistake. It probably works in
99.9% of all cases, which is why it's not been caught. But yes, you could
have <?xml[tab]version="1.0"?> and that would indeed fail. So it should skip
just the <?xml and then check explicitly for whitespace. I assume that this
would be a trivial fix because it already knows that more whitespace can
follow that initial trailing one, so the code is already there to eat any
subsequent whitespace. The primary reason for doing that check was to make
sure that it's not something like "<?xmlfoo", which is a completely different
thing. So it still needs to do a check for at least one whitespace char and
eat any others.
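A rough sketch of that fix, written against a plain std::string rather than the real reader manager: skipXmlDeclStart() and isXmlWhitespace() are hypothetical helpers, not Xerces-C functions. It matches "<?xml" alone, requires at least one whitespace character after it, then eats the rest of the whitespace run.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The four whitespace characters allowed by XML 1.0: space, tab, LF, CR.
inline bool isXmlWhitespace(char c) {
    return c == 0x20 || c == 0x09 || c == 0x0A || c == 0x0D;
}

// Hypothetical helper, not part of Xerces-C. If `input` begins an XML
// declaration, returns the offset of the first character after "<?xml"
// and the whitespace that follows it. Otherwise returns std::string::npos.
std::size_t skipXmlDeclStart(const std::string& input) {
    const std::string prefix = "<?xml";

    // Step 1: match just "<?xml", with no trailing space in the literal.
    if (input.compare(0, prefix.size(), prefix) != 0)
        return std::string::npos;

    // Step 2: require at least one whitespace char, so "<?xmlfoo" is rejected.
    std::size_t pos = prefix.size();
    if (pos >= input.size() || !isXmlWhitespace(input[pos]))
        return std::string::npos;

    // Step 3: eat any further whitespace.
    while (pos < input.size() && isXmlWhitespace(input[pos]))
        ++pos;
    return pos;
}
```

Under these assumptions a tab or newline after "<?xml" is accepted, while "<?xmlfoo" is still treated as something else (an ordinary PI target).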

--------------
Dean Roddey
Software Geek Extraordinaire
Portal, Inc
droddey@portal.com



-----Original Message-----
From: Chris Hill [mailto:chill@wolfram.com]
Sent: Wednesday, May 30, 2001 2:24 PM
To: xerces-c-dev@xml.apache.org
Subject: Bug in scanner


(Bugzilla appears to be down)

In XMLScanner::scanProlog(), to detect an xml decl, the following code is
used:

fReaderMgr.skippedString(XMLUni::fgXMLDeclStringSpace)

I believe this is incorrect.  If there is a whitespace character other than the
space character 0x20 following "<?xml", it will not be recognized as the
beginning of an xml decl. (The other valid white space characters are 0x9,
0xA, and 0xD.)

A test case would be (with no trailing space on the first line)
<?xml
version='1.0'?><a/>
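To make the failure concrete, here is a small sketch of a literal-prefix check. It assumes that XMLUni::fgXMLDeclStringSpace is the string "<?xml " (with the trailing space) and that skippedString() behaves like an exact prefix match; startsWithLiteral() is a hypothetical stand-in, not the Xerces-C call.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a skippedString()-style check:
// true only when `input` begins with exactly `literal`.
bool startsWithLiteral(const std::string& input, const std::string& literal) {
    return input.compare(0, literal.size(), literal) == 0;
}
```

With the literal "<?xml ", the usual single-space declaration matches, but the test case above (a line feed right after "<?xml") does not, so under these assumptions it would not be taken as an xml decl.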

XMLUni::fgXMLDeclStringSpace seems like a dangerous thing to have around. I
don't know if it is used anywhere else, but it probably shouldn't be.

Chris


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org
