Posted to j-dev@xerces.apache.org by bu...@apache.org on 2003/11/10 05:01:12 UTC
DO NOT REPLY [Bug 897] - Memory leak reading large XML-files with SAX parser
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=897>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
------- Additional Comments From anewman@pisoftware.com 2003-11-10 04:01 -------
This is still a problem in the latest version of Xerces (2.5). The number of
java.io.StringReader instances increases until the JVM runs out of memory; they
are never able to be garbage collected.
Here's some sample RDF/XML:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rdf:RDF [
<!ENTITY math "http://kowari.org/math#">
<!ENTITY owl "http://www.w3.org/2002/07/owl#">
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
]>
<rdf:RDF xmlns:math ="&math;"
xmlns:owl ="&owl;"
xmlns:rdf ="&rdf;"
xmlns:rdfs ="&rdfs;">
<rdf:Description>
<owl:sameIndividualAs rdf:datatype="&xsd;integer">14</owl:sameIndividualAs>
<rdfs:label xml:lang="en">fourteen</rdfs:label>
<math:roman>XIV</math:roman>
<math:square rdf:datatype="&xsd;integer">196</math:square>
<math:primeFactorization>
<rdf:Bag>
<rdf:li rdf:datatype="&xsd;integer">2</rdf:li>
<rdf:li rdf:datatype="&xsd;integer">7</rdf:li>
</rdf:Bag>
</math:primeFactorization>
</rdf:Description>
<rdf:Description>
<owl:sameIndividualAs rdf:datatype="&xsd;integer">15</owl:sameIndividualAs>
<rdfs:label xml:lang="en">fifteen</rdfs:label>
<math:roman>XV</math:roman>
<math:square rdf:datatype="&xsd;integer">225</math:square>
<math:primeFactorization>
<rdf:Bag>
<rdf:li rdf:datatype="&xsd;integer">3</rdf:li>
<rdf:li rdf:datatype="&xsd;integer">5</rdf:li>
</rdf:Bag>
</math:primeFactorization>
<rdf:type rdf:resource="&math;TriangularNumber"/>
</rdf:Description>
</rdf:RDF>
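For anyone wanting to drive a parser with a large input of this shape, a
minimal sketch of a harness follows. It is not code from the original report:
the class and method names are hypothetical, it uses the standard JAXP
SAXParserFactory (which selects whichever SAX parser is on the classpath, so
Xerces must be present to exercise Xerces itself), and it only generates and
parses the input. Observing the StringReader count still requires a heap
profiler attached to the running JVM.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness; names are illustrative and not taken from the bug report.
public class EntityAttributeRepro {

    private static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    private static final String XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";

    // Builds an RDF/XML document with `count` descriptions, each of which
    // uses the &xsd; entity reference inside an attribute value.
    public static String buildDocument(int count) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");
        sb.append("<!DOCTYPE rdf:RDF [\n");
        sb.append("<!ENTITY math \"http://kowari.org/math#\">\n");
        sb.append("<!ENTITY rdf \"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n");
        sb.append("<!ENTITY xsd \"http://www.w3.org/2001/XMLSchema#\">\n");
        sb.append("]>\n");
        sb.append("<rdf:RDF xmlns:math=\"&math;\" xmlns:rdf=\"&rdf;\">\n");
        for (int i = 0; i < count; i++) {
            sb.append("<rdf:Description>")
              .append("<math:square rdf:datatype=\"&xsd;integer\">")
              .append(i)
              .append("</math:square>")
              .append("</rdf:Description>\n");
        }
        sb.append("</rdf:RDF>\n");
        return sb.toString();
    }

    // Parses the document with SAX and counts the rdf:datatype attributes whose
    // value resolved to the full XML Schema integer URI.
    public static int countIntegerDatatypes(String xml) {
        final int[] seen = {0};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (XSD_INTEGER.equals(atts.getValue(RDF_NS, "datatype"))) {
                        seen[0]++;
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        // The JDK's bundled parser caps entity expansions, so very large counts
        // may be rejected there unless that limit is raised.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        int seen = countIntegerDatatypes(buildDocument(count));
        System.out.println("rdf:datatype attributes resolved: " + seen);
    }
}
```

Passing a larger count on the command line scales the input; each generated
description contributes one entity reference in an attribute value, which is
the code path shown in the profiler report.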
When all of the entity references are inlined, only 4 java.io.StringReader
objects are ever allocated. For example:
<rdf:Description>
<owl:sameIndividualAs
rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">14</owl:sameIndividualAs>
<rdfs:label xml:lang="en">fourteen</rdfs:label>
<math:roman>XIV</math:roman>
<math:square
rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">196</math:square>
<math:primeFactorization>
<rdf:Bag>
<rdf:li rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">2</rdf:li>
<rdf:li rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">7</rdf:li>
</rdf:Bag>
</math:primeFactorization>
</rdf:Description>
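The inlining shown above could be done mechanically before the document
reaches the parser. The following is a crude text-level sketch under stated
assumptions, not a general XML entity processor and not something from the
original report: it assumes only simple internal entities declared with
double-quoted values, with no nested references and no characters that would
need escaping. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; handles only simple <!ENTITY name "value"> declarations.
public class EntityInliner {

    private static final Pattern ENTITY_DECL =
        Pattern.compile("<!ENTITY\\s+([A-Za-z_][\\w.-]*)\\s+\"([^\"]*)\"\\s*>");

    // Collects the entity names and replacement values declared in the document.
    public static Map<String, String> readEntities(String xml) {
        Map<String, String> entities = new LinkedHashMap<>();
        Matcher m = ENTITY_DECL.matcher(xml);
        while (m.find()) {
            entities.put(m.group(1), m.group(2));
        }
        return entities;
    }

    // Replaces every &name; after the internal subset with its declared value.
    // The DOCTYPE is left in place; its declarations simply become unused.
    public static String inline(String xml) {
        Map<String, String> entities = readEntities(xml);
        int subsetEnd = xml.indexOf("]>");
        if (subsetEnd < 0 || entities.isEmpty()) {
            return xml;
        }
        int bodyStart = subsetEnd + 2;
        String body = xml.substring(bodyStart);
        for (Map.Entry<String, String> e : entities.entrySet()) {
            // String.replace performs literal (non-regex) replacement.
            body = body.replace("&" + e.getKey() + ";", e.getValue());
        }
        return xml.substring(0, bodyStart) + body;
    }
}
```

The output can then be handed to the SAX parser as usual; since no entity
references remain in attribute values, the startEntity() path is not taken
for them.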
Here's a report from Optimize It after parsing a large amount of this XML:
2509 instances of java.io.StringReader allocated.
100.0% org.apache.xerces.impl.XMLEntityManager.startEntity()
100.0% org.apache.xerces.impl.XMLScanner.scanAttributeValue()
100.0% org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanAttribute()
100.0% org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement()
99.84% org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch()
99.84% org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument()
99.84% org.apache.xerces.parsers.DTDConfiguration.parse()