Posted to j-dev@xerces.apache.org by Kurt Werle <kw...@webwarecorp.com> on 2000/08/24 02:09:51 UTC

BUG: SAX parser abusive of memory

The SAX parser appears to use more memory for every byte it reads from a
stream.  This seems like a serious problem, and (as far as I can tell)
counter to SAX's streaming design.  When reading an XML stream that
contains only one element with very large content, the SAX parser exhausts
the JVM's heap (under JDK 1.1, 1.2, and 1.3 on various platforms).

java.lang.OutOfMemoryError
        at org.apache.xerces.readers.UTF8Reader.fillCurrentChunk(UTF8Reader.java:2725)
        at org.apache.xerces.readers.UTF8Reader.slowLoadNextByte(UTF8Reader.java:152)
        at org.apache.xerces.readers.UTF8Reader.copyAsciiCharData(UTF8Reader.java:2578)
        at org.apache.xerces.readers.UTF8Reader.scanContent(UTF8Reader.java:2341)
        at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1145)
        at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:380)
        at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:861)
        at MemHog.main(MemHog.java:36)
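For comparison, here is a minimal sketch of the memory profile a SAX consumer expects. The handler counts the text delivered through characters() and keeps nothing else, so its own footprint stays constant however large the element is. It uses the JDK's built-in JAXP parser (javax.xml.parsers), not the Xerces 1.x org.apache.xerces.parsers.SAXParser from the trace, so it does not reproduce the bug; the class name CharCounter is hypothetical. The trace above suggests the memory is consumed inside the Xerces reader (UTF8Reader.fillCurrentChunk) before any handler sees the data.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical handler that tallies character data without retaining it.
 * Uses the JDK's JAXP parser, not Xerces 1.x.
 */
class CharCounter extends DefaultHandler {
    long totalChars = 0; // characters delivered so far
    long callbacks = 0;  // number of characters() invocations

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only running totals are kept; the parser's buffer is not copied.
        totalChars += length;
        callbacks++;
    }

    /** Parses the whole stream and returns the handler holding the totals. */
    static CharCounter count(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CharCounter handler = new CharCounter();
        parser.parse(in, handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        CharCounter c = count(System.in);
        System.out.println(c.totalChars + " chars in " + c.callbacks + " callbacks");
    }
}
```

Whether the text arrives in one characters() call or many is up to the parser; either way the handler's memory use does not grow with the input.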

Is this a known bug?

To reproduce, download http://frsvnsvn.san-mateo.ca.us/kurt/MemHog.tar.gz and run:
tar xfvz MemHog.tar.gz
cd MemHog
# assumes your CLASSPATH includes xerces.jar and .
make
# generate a 100 MB file
perl ./GenerateData.pl 100 > 100MegFile.xml
java MemHog < 100MegFile.xml
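GenerateData.pl itself is not included in this post, so its exact output is unknown. As a hypothetical stand-in matching the description above (a single element with very large content), the Java sketch below streams one `<data>` element filled with 'x' characters in 64 KB chunks, so the generator never holds the whole document in memory. The element name and filler character are assumptions.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical stand-in for GenerateData.pl (not included in the post):
 * writes one element whose text content is contentChars 'x' characters.
 */
class GenerateData {
    static void generate(long contentChars, Writer out) throws IOException {
        char[] chunk = new char[64 * 1024];
        Arrays.fill(chunk, 'x');
        out.write("<?xml version=\"1.0\"?>\n<data>");
        long remaining = contentChars;
        while (remaining > 0) {
            // Write at most one chunk at a time to keep memory use flat.
            int n = (int) Math.min(chunk.length, remaining);
            out.write(chunk, 0, n);
            remaining -= n;
        }
        out.write("</data>\n");
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        long megabytes = Long.parseLong(args[0]);
        Writer out = new BufferedWriter(
                new OutputStreamWriter(System.out, StandardCharsets.US_ASCII));
        generate(megabytes * 1024 * 1024, out);
    }
}
```

Used in place of the Perl script: `java GenerateData 100 > 100MegFile.xml`.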

The following is the MemHog program.  Its DocumentHandler,
SimpleDocumentHandler (not included here), just prints some debugging notes.
---
import java.io.*;
import org.xml.sax.*;
import org.apache.xerces.parsers.*;

/**
 * MemHog.java
 *
 * Parses an XML document read from standard input with the Xerces SAX
 * parser, to demonstrate the memory growth described above.
 *
 * Created: Tue Aug 22 16:25:59 2000
 *
 * @author Kurt Werle
 */

public class MemHog
{
    public static void main(String[] args)
    {
        InputStream myInputStream = System.in;
        // SimpleDocumentHandler (not shown) just prints debugging notes.
        SimpleDocumentHandler mySimpleDocumentHandler = new SimpleDocumentHandler();
        SAXParser mySAXParser = new SAXParser();
        InputSource myInputSource = new InputSource(myInputStream);

        mySAXParser.setDocumentHandler(mySimpleDocumentHandler);

        try
        {
            mySAXParser.parse(myInputSource);
        }
        catch (Exception e)
        {
            // OutOfMemoryError is an Error, not an Exception, so it is
            // not caught here; it propagates as in the trace above.
            System.out.println("Caught Exception:  " + e.toString());
        }
    }
}// MemHog
---
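SimpleDocumentHandler is not part of the listing, so MemHog will not compile as posted. A hypothetical minimal version, consistent with "just prints some debugging notes", could extend the SAX 1 class org.xml.sax.HandlerBase, which implements the DocumentHandler interface that setDocumentHandler expects. This sketch prints only at element and document boundaries and tallies character data instead of storing it, so any memory growth must come from the parser rather than the handler.

```java
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;

/**
 * Hypothetical SimpleDocumentHandler (the original was not posted).
 * SAX 1 handler: prints notes at element boundaries, counts characters.
 */
@SuppressWarnings("deprecation")
class SimpleDocumentHandler extends HandlerBase {
    long charCount = 0; // total characters seen; the text itself is discarded

    @Override
    public void startElement(String name, AttributeList atts) {
        System.out.println("start <" + name + ">");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        charCount += length; // tally only; nothing is retained
    }

    @Override
    public void endElement(String name) {
        System.out.println("end </" + name + ">, " + charCount + " chars so far");
    }

    @Override
    public void endDocument() {
        System.out.println("document done, " + charCount + " chars total");
    }
}
```

Saved as SimpleDocumentHandler.java next to MemHog.java, it lets the program compile and run against xerces.jar.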