Posted to j-dev@xerces.apache.org by "Jan Tošovský (JIRA)" <xe...@xml.apache.org> on 2017/05/06 21:45:04 UTC

[jira] [Commented] (XERCESJ-1653) Memory leak with validating SAX Parser

    [ https://issues.apache.org/jira/browse/XERCESJ-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999592#comment-15999592 ] 

Jan Tošovský commented on XERCESJ-1653:
---------------------------------------

This seems to be a design issue rather than a leak; see the logic in the handleStartElement() method of the org.apache.xerces.impl.dtd.XMLDTDValidator class (lines 1973-1997).
The fElementChildren array stores all children of a given element as QName objects, so if the number of children is huge, the array takes a large amount of memory. In your particular case it would be handy to keep the QNames in a helper Map and store only an Integer key (the QName hashCode?) in the fElementChildren array. However, this would require the reverse lookup when the fElementChildren array is passed to the validation step; see the handleEndElement() method in the same class (line 2027).
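
To illustrate the idea, here is a minimal sketch. It is not Xerces code. The QNameInterner class and its methods are hypothetical, and a plain String (the raw element name) stands in for the Xerces QName object, on the assumption that the raw name is enough to identify a child during DTD validation. The sketch hands out sequential Integer keys instead of hashCode values, because two different names can share a hash code and a collision would make the reverse lookup ambiguous.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Xerces): maps each distinct name to a small int key. */
final class QNameInterner {
    // name -> key, used when a child element is recorded
    private final Map<String, Integer> idsByName = new HashMap<>();
    // key -> name, used for the reverse lookup before validation
    private final List<String> namesById = new ArrayList<>();

    /** Returns the key for rawName, assigning the next free key on first sight. */
    int intern(String rawName) {
        Integer id = idsByName.get(rawName);
        if (id == null) {
            id = namesById.size();
            idsByName.put(rawName, id);
            namesById.add(rawName);
        }
        return id;
    }

    /** Reverse lookup for a single key. */
    String resolve(int id) {
        return namesById.get(id);
    }

    /** Reverse lookup for the first count keys of a children array. */
    String[] resolveAll(int[] ids, int count) {
        String[] names = new String[count];
        for (int i = 0; i < count; i++) {
            names[i] = namesById.get(ids[i]);
        }
        return names;
    }

    /** Number of distinct names seen so far. */
    int distinctNames() {
        return namesById.size();
    }
}
```

With this layout the children array would be an int[] costing four bytes per child, and only one name object would be kept per distinct element name, however many children the parent has. The reverse step still has to rebuild the names before the content model is checked, so the saving applies while the children are being collected, not during the validation call itself.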

> Memory leak with validating SAX Parser
> --------------------------------------
>
>                 Key: XERCESJ-1653
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1653
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: SAX
>    Affects Versions: 2.11.0
>         Environment: Windows 7 Enterprise, JDK 1.8.0_25
>            Reporter: Sebastian Millies
>            Priority: Critical
>         Attachments: elemlist.dtd, SAXMemoryUsage.java, SAXMemoryUsage.log, _subelem.dtd
>
>
> I'm parsing a very large XML file with org.apache.xerces.parsers.SAXParser and validation turned on. The file contains 25 million elements of the form specified in the attached DTDs; in total it is about 7 GB.
> Heap monitoring with jvisualvm shows millions of QName instances being cached and not being garbage collected.
> Turning off validation makes the problem disappear. 
> I have tested a number of other parsers (Crimson, Aelfred2, Resin, Woodstox). With Woodstox, for example, I can process my 7 GB file (including validation) with just 64 MB of heap. With Xerces, 1024 MB of heap does not suffice.
> I'll attach a small diagnosis program (SAXMemoryUsage.java) that shows that Xerces heap consumption increases inordinately.


