Posted to c-dev@xerces.apache.org by Dan Rosen <Da...@efi.com> on 2004/05/18 20:11:28 UTC

RE: XMLScanner::scanCharData fills XMLBuffer until out of memory

Hi,

Last week I uploaded a patch that prevents the scanner from running out of
memory, but I'm new to Xerces and I don't believe I have commit access to
CVS. Could somebody familiar with the relevant code review my patch and
commit it as appropriate?

Regards,
dr

-----Original Message-----
From: jira@apache.org [mailto:jira@apache.org] 
Sent: Friday, May 14, 2004 12:32 PM
To: xerces-c-dev@xml.apache.org
Subject: [jira] Updated: (XERCESC-1207) XMLScanner::scanCharData fills
XMLBuffer until out of memory


The following issue has been updated:

    Updater: Dan Rosen (mailto:danr@efi.com)
       Date: Fri, 14 May 2004 12:31 PM
    Comment:
Oops, mangled the previous patch. Hopefully this one should apply properly.
    Changes:
             Attachment changed to inputbuffersize
    ---------------------------------------------------------------------
For a full history of the issue, see:

  http://issues.apache.org/jira/browse/XERCESC-1207?page=history

---------------------------------------------------------------------
View the issue:
  http://issues.apache.org/jira/browse/XERCESC-1207

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: XERCESC-1207
    Summary: XMLScanner::scanCharData fills XMLBuffer until out of memory
       Type: Bug

     Status: Unassigned
   Priority: Critical

    Project: Xerces-C++
 Components: 
             Non-Validating Parser
   Versions:
             2.5.0

   Assignee: 
   Reporter: Dan Rosen

    Created: Mon, 10 May 2004 10:51 AM
    Updated: Fri, 14 May 2004 12:31 PM

Description:
When parsing an XML file consisting primarily of very large (hundreds of
megabytes) blocks of contiguous character data, XMLScanner::scanCharData()
happily attempts to build a single XMLBuffer containing all the data.
Eventually the buffer becomes so large that the reallocation within
XMLBuffer::insureCapacity() fails, either throwing std::bad_alloc or crashing
in memcpy, depending on the compiler. The fundamental problem seems to be
that there is no upper bound imposed on buffer length.

In the SAX model, it is acceptable to issue multiple
ContentHandler::characters() callbacks for a single contiguous block of data.
The only restriction on how this should be implemented is that all characters
in any single event must come from the same external entity; no further
behavior is specified. So it would be perfectly conformant to the SAX model
to set an upper bound on the size of a single characters() event.
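
To illustrate the client side of this, here is a minimal standalone sketch of
a handler that gives the same result no matter how a run of character data is
split across events. It does not use the Xerces headers; WordCountingHandler
and countWordsInChunks are hypothetical names, and char16_t stands in for the
parser's character type. The point is only that state is carried across calls.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a SAX content handler. It does not derive from
// any Xerces class so that the sketch compiles on its own.
class WordCountingHandler {
public:
    // Same shape as a characters() callback: a pointer plus a length.
    // May be called any number of times for one contiguous block of text.
    void characters(const char16_t* chars, std::size_t length) {
        for (std::size_t i = 0; i < length; ++i) {
            const char16_t c = chars[i];
            const bool space = (c == u' ' || c == u'\t' ||
                                c == u'\n' || c == u'\r');
            // A word starts only on a transition from whitespace to text,
            // so a word split across two events is counted once.
            if (!space && !inWord_) {
                ++words_;
            }
            inWord_ = !space;
        }
    }

    std::size_t words() const { return words_; }

private:
    std::size_t words_ = 0;
    bool inWord_ = false;  // state carried across events
};

// Simulates a parser delivering `text` in events of at most `chunkSize`
// characters (hypothetical helper, for illustration only).
std::size_t countWordsInChunks(const std::u16string& text,
                               std::size_t chunkSize) {
    if (chunkSize == 0) {
        chunkSize = 1;  // avoid an endless loop on a zero chunk size
    }
    WordCountingHandler handler;
    for (std::size_t pos = 0; pos < text.size(); pos += chunkSize) {
        const std::size_t n = std::min(chunkSize, text.size() - pos);
        handler.characters(text.data() + pos, n);
    }
    return handler.words();
}
```

A handler written this way is unaffected by whether the parser delivers the
text in one event or in many.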

(As far as I understand, allowing an upper bound in
XMLScanner::scanCharData() would not affect the DOM.)

I'd propose adding an upper bound on character buffer size as an optional
parameter (with some reasonable default value), either in the constructor of
the parser or in useScanner(). XMLScanner::scanCharData() would then use this
parameter to decide when to force a call to sendCharData() to dump the buffer
to its client.
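
The idea can be sketched outside of Xerces as follows. This is not the
attached patch and not the actual XMLScanner code: BoundedCharBuffer, its
maxChars parameter, the Sink callback and chunkSizes are all hypothetical
names chosen for illustration. The sketch also assumes UTF-16 code units and,
as an extra precaution of its own, delays a flush by one unit rather than
splitting a surrogate pair across two events.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch, not Xerces code: accumulates characters and hands
// them to a sink whenever the configured upper bound is reached.
class BoundedCharBuffer {
public:
    using Sink = std::function<void(const char16_t*, std::size_t)>;

    BoundedCharBuffer(std::size_t maxChars, Sink sink)
        : maxChars_(maxChars == 0 ? 1 : maxChars), sink_(std::move(sink)) {}

    // Plays the role of the scanner adding one character to its buffer.
    void append(char16_t ch) {
        buffer_.push_back(ch);
        // Flush once the bound is reached, unless that would separate a
        // high surrogate from the low surrogate that follows it.
        if (buffer_.size() >= maxChars_ && !isHighSurrogate(ch)) {
            flush();
        }
    }

    // Plays the role of a sendCharData()-style call: deliver and reset.
    void flush() {
        if (buffer_.empty()) {
            return;
        }
        sink_(buffer_.data(), buffer_.size());
        buffer_.clear();
    }

private:
    static bool isHighSurrogate(char16_t ch) {
        return ch >= 0xD800 && ch <= 0xDBFF;
    }

    std::size_t maxChars_;
    Sink sink_;
    std::u16string buffer_;
};

// Feeds `text` through the buffer and records the size of each delivered
// event (hypothetical helper, for illustration only).
std::vector<std::size_t> chunkSizes(const std::u16string& text,
                                    std::size_t maxChars) {
    std::vector<std::size_t> sizes;
    BoundedCharBuffer buffer(maxChars,
        [&sizes](const char16_t*, std::size_t n) { sizes.push_back(n); });
    for (char16_t ch : text) {
        buffer.append(ch);
    }
    buffer.flush();  // deliver whatever remains at the end of the data
    return sizes;
}
```

With a bound of 4, ten characters of input would be delivered as events of
4, 4 and 2 characters, and memory use stays proportional to the bound rather
than to the size of the input.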


---------------------------------------------------------------------
JIRA INFORMATION:
This message is automatically generated by JIRA.

If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa

If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org

