Posted to fop-dev@xmlgraphics.apache.org by Andreas L Delmelle <a_...@pandora.be> on 2006/07/28 00:44:00 UTC

Memory leak? (was: 'Possible' memory leak on fop-users)

On Jul 27, 2006, at 23:56, Andreas L Delmelle wrote:

<snip />
> I wouldn't be surprised to see a lot of these trees occur in the  
> course of the process, but if I estimate correctly, in a literal  
> snapshot, there should be only one. There is only one handler which  
> has one reference to the current block. That reference is never  
> explicitly cleared, but strictly speaking, it never needs to be  
> since it is re-used. Taking a snapshot right after FOP has  
> finished, would reveal the last one, provided that the reference  
> tree Root->AreaTreeHandler->XMLWhiteSpaceHandler has not yet been  
> completely cleared/released.

Was re-thinking this particular phrasing, and had a closer look...  
Moved it to fop-dev, because of the importance.

Firstly, this looks like a damned circular reference, indeed! That's  
my bad, sorry.

Since the reference to the last block is not released unless the  
reference to the Root's AreaTreeHandler is cleared, this keeps the  
entire ancestry alive, up to the PageSequence, which itself holds a  
reference to the Root? :| ... :(

Definitely worth a try to release the reference
XMLWhiteSpaceHandler->Block as soon as possible.
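
A minimal sketch of what releasing that reference could look like. The classes below are simplified stand-ins invented for illustration (WhiteSpaceHandlerSketch, BlockSketch and release() are assumptions, not FOP's actual XMLWhiteSpaceHandler API); the only point is that the handler drops its single field once the block is done instead of waiting for the next block to overwrite it.

```java
// Hypothetical stand-in for a block FO; FOP's real Block has many more members.
class BlockSketch {
    final String id;

    BlockSketch(String id) {
        this.id = id;
    }
}

// Hypothetical stand-in for the white-space handler discussed above.
class WhiteSpaceHandlerSketch {
    // The single reference that keeps the last processed block reachable.
    private BlockSketch currentBlock;

    void handleWhiteSpace(BlockSketch block) {
        // The field is re-used: each new block overwrites the previous one.
        currentBlock = block;
        // ... white-space processing would happen here ...
    }

    // Assumed hook, called at end of block or end of page-sequence:
    // clear the field so the handler no longer pins the block.
    void release() {
        currentBlock = null;
    }

    boolean holdsBlock() {
        return currentBlock != null;
    }
}
```

Whether the right moment is end-of-block or end-of-page-sequence depends on how the handler is driven, which this sketch does not model.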

OTOH, looking deeper, I'm strangely surprised no-one saw this one  
before --so surprised even that it makes me think I'm missing  
something :

Root.addChildNode(PageSequence) results in a reference to the  
PageSequence being kept in the Root's list of child nodes. Right?

AFAICT, this reference is *never* released as long as the Root object
is alive, so it seems like currently, our 'split up in page-sequences'
performance hint is completely and utterly bogus...?

Sorry to disappoint you all.

Good news is, both are rather easily fixed --at least on the surface.

Either:
a) override addChildNode() in Root, so that the PageSequences don't  
get added to the List at all; maybe only under certain circumstances  
(unresolved forward references?) should this be needed
b) call Root.removeChild(this) in PageSequence.endOfNode()
c) call Root.removeChild() from the next PageSequence's startOfNode()
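
Option (b) could be sketched roughly as follows. NodeSketch, RootSketch and PageSequenceSketch are simplified stand-ins made up for this illustration; only the method names addChildNode(), removeChild() and endOfNode() are taken from the text above, and their real signatures in FOP may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base node that keeps its children in a list.
class NodeSketch {
    protected final List<NodeSketch> children = new ArrayList<>();

    void addChildNode(NodeSketch child) {
        children.add(child);
    }

    void removeChild(NodeSketch child) {
        children.remove(child);
    }

    int childCount() {
        return children.size();
    }
}

// Hypothetical root; option (a) would instead override addChildNode()
// here so that page-sequences are never stored in the list.
class RootSketch extends NodeSketch {
}

// Hypothetical page-sequence that detaches itself when it ends.
class PageSequenceSketch extends NodeSketch {
    private final RootSketch root;

    PageSequenceSketch(RootSketch root) {
        this.root = root;
    }

    void endOfNode() {
        // Option (b): remove this page-sequence from the root's child list,
        // so the root no longer keeps the finished subtree reachable.
        root.removeChild(this);
    }
}
```

Note that the page-sequence still points at its root in this sketch; that direction is harmless, since it is the root's list that keeps finished page-sequences alive.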

Unfortunately, I am a bit stuck in the marker-property rework ATM -- 
FOText in a marker turns out to be a little bit more difficult than  
the FObj-subclasses... Decided to take care of the dubious static  
FOText.lastFOTextProcessed in one go, so that will make a nice set of  
improvements 8)

I'll make it a priority to clear this up after that, if nobody beats  
me to it.


Cheers,

Andreas

Re: Memory leak? (was: 'Possible' memory leak on fop-users)

Posted by Karthik <ka...@yahoo.com>.
Andreas,

I ran my test cases against fop-trunk as of 08/03 and am seeing a very good
boost in performance. I profiled the same test case that produced
loitering objects in FOP 0.92beta, and did NOT find any loitering objects with
the trunk code. Memory usage also seems to be very stable, and I see more
frequent garbage collections than with 0.92beta. Overall, the
process seems to use less memory than before.

Below are some comparisons from my test environment :

Total pages processed : 12000 approx (split up as 1500 pages per pdf in a loop)

1. Memory usage : FOP 0.92beta started off at 500 MB and easily went all the
way up to 1.2 GB, and the JVM crashed after processing approx. 5000 pages. The
latest version used a maximum of 750 MB (from 500 MB initially) and never went
beyond that.

2. Processing time : 0.92beta gradually slowed down from 4 minutes per
1500-page PDF to 15 minutes, at which point the JVM finally crashed. The latest
code consistently took 3-4 minutes to produce 1500 pages.

I'm not sure if the above comparison makes sense to anyone,
but I just wanted to report it for comparison's sake.

Overall, performance has been good so far, and I'll keep profiling the process
to look for any red flags.
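
For a rough view of heap usage between batches without a full profiler, something like the sketch below might do. It uses only the standard java.lang.Runtime methods; HeapLogSketch, the loop of 8 batches (12000 pages at 1500 per PDF) and the rendering step are assumptions for illustration, not part of the test setup described above.

```java
// Minimal heap-usage logger; a sketch, not a substitute for a memory profiler.
class HeapLogSketch {

    // Heap currently in use by the JVM, in bytes.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // One report line per batch, in megabytes.
    static String report(int batch) {
        long mb = 1024L * 1024L;
        long usedMb = usedHeapBytes() / mb;
        long maxMb = Runtime.getRuntime().maxMemory() / mb;
        return "batch " + batch + ": used " + usedMb + " MB of max " + maxMb + " MB";
    }

    public static void main(String[] args) {
        for (int batch = 1; batch <= 8; batch++) {
            // A hypothetical call that renders one 1500-page PDF would go here.
            System.out.println(report(batch));
        }
    }
}
```

A figure that keeps climbing from batch to batch would be a hint to look for loitering objects; a figure that levels off is what the trunk numbers above suggest.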

Let me know if you want me to track any other details.

Thanks
Karthik




Re: Memory leak?

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Aug 9, 2006, at 22:32, Jeremias Maerki wrote:

<snip />
> Now, if we can free some objects even sooner FOP might even handle files
> like the one from Luis Ferro with under 256MB one day. For that we'll
> need to know reliably when an FO is completely processed (including area
> generation). That's something that we'll also need for
> page-number-citation-last, too, BTW.

Just thought about this some more, and I wonder...
When is that exactly? After LM.addAreas()? Could we make the LMs  
clean up after themselves?
After they're done, they signal the AreaTreeHandler that the FO may  
be released --it is ultimately up to the layout algorithm to decide  
if the element list and area generation would need to be repeated,  
and even then, it may suffice to keep certain state information about  
the FObj alive, rather than the whole FObj itself.

Also, we may even free objects in the FOTree sooner: as long as the
LM references them, they won't be released anyway. We can release
them in the FOTree before the LM itself lets go of them.
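
To make the idea a bit more concrete, here is a small sketch of an LM that cleans up after itself and signals the handler. All three classes are hypothetical stand-ins (FoSketch, AreaTreeHandlerSketch, LayoutManagerSketch, and the register()/notifyFinished() methods are invented here); it ignores the case where the layout algorithm needs to repeat element-list or area generation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a formatting object.
class FoSketch {
    final String name;

    FoSketch(String name) {
        this.name = name;
    }
}

// Hypothetical handler that tracks which FOs are still being laid out.
class AreaTreeHandlerSketch {
    private final Set<FoSketch> liveFos = new HashSet<>();

    void register(FoSketch fo) {
        liveFos.add(fo);
    }

    // Assumed callback: the LM reports that the FO may be released.
    void notifyFinished(FoSketch fo) {
        liveFos.remove(fo);
    }

    int liveCount() {
        return liveFos.size();
    }
}

// Hypothetical layout manager that drops its FO after area generation.
class LayoutManagerSketch {
    private FoSketch fo;
    private final AreaTreeHandlerSketch handler;

    LayoutManagerSketch(FoSketch fo, AreaTreeHandlerSketch handler) {
        this.fo = fo;
        this.handler = handler;
        handler.register(fo);
    }

    void addAreas() {
        // ... area generation for this.fo would happen here ...
        handler.notifyFinished(fo); // signal the handler first
        fo = null;                  // then let go of the FO ourselves
    }

    boolean holdsFo() {
        return fo != null;
    }
}
```

If some state has to survive for a possible second pass, the LM could copy just that state before clearing the field, rather than keeping the whole FO.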

Also a requirement, or at least very convenient, to be able to free  
the FOs sooner, would be to have the LM-tree creation and  
initialization begin sooner --much sooner. Time to implement  
AreaTreeHandler.endBlock() or startFlow()? So that, by the time  
endPageSequence() occurs, the LM tree will already be constructed,  
maybe even some of the element lists could already be made partially  
available (?)



Cheers,

Andreas


Re: Memory leak?

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Aug 9, 2006, at 22:32, Jeremias Maerki wrote:

> Thank you, Karthik! That's very helpful information. Looks like another
> little change making a huge difference. Cool!
>
> Now, if we can free some objects even sooner FOP might even handle files
> like the one from Luis Ferro with under 256MB one day. For that we'll
> need to know reliably when an FO is completely processed (including area
> generation). That's something that we'll also need for
> page-number-citation-last, too, BTW.

Another remote idea for improvement has again to do with markers.

On a given 'containing page', for each retrieve-boundary/-position/-class-name
combination, there is only one qualifying area in the whole document
(always on a preceding page). Currently the FlowLM stores the
relevant markers in a map, so that the StaticContentLM needs to do  
little more than pick the right marker off the right map when  
resolving a RetrieveMarker.
Still, IIC, this means that, once the retrieve-markers for one page
have all been processed, a (possibly large) number of markers remain
unnecessarily referenced by their FO. Unless a second pass completely
invalidates the initial marker retrieval, there will be no more need
for a great many of those (possibly containing whole tables themselves).
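
A rough sketch of that idea, with invented names throughout (MarkerSketch, PageMarkersSketch and pageFinished() are assumptions, and the real FlowLM map is keyed and scoped differently): once a page's retrieve-markers are resolved, the map entries are dropped so the marker subtrees are no longer pinned.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a marker and its (possibly large) content.
class MarkerSketch {
    final String content;

    MarkerSketch(String content) {
        this.content = content;
    }
}

// Hypothetical per-page marker registry, keyed by marker class name only.
class PageMarkersSketch {
    private final Map<String, MarkerSketch> byClassName = new HashMap<>();

    void register(String className, MarkerSketch marker) {
        byClassName.put(className, marker);
    }

    // Resolve a retrieve-marker; returns null if no marker qualifies.
    MarkerSketch retrieve(String className) {
        return byClassName.get(className);
    }

    // Assumed hook: all retrieve-markers for the page have been processed.
    // This ignores the second-pass case mentioned above, where the
    // markers would be needed again.
    void pageFinished() {
        byClassName.clear();
    }

    int size() {
        return byClassName.size();
    }
}
```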

I'll think it over some more.

Later,

Andreas


Re: Memory leak?

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Thank you, Karthik! That's very helpful information. Looks like another
little change making a huge difference. Cool!

Now, if we can free some objects even sooner FOP might even handle files
like the one from Luis Ferro with under 256MB one day. For that we'll
need to know reliably when an FO is completely processed (including area
generation). That's something that we'll also need for
page-number-citation-last, too, BTW.

On 09.08.2006 22:21:56 kar thik wrote:
> Andreas,
> 
> I ran my test cases against fop-trunk as of 08/03 and am seeing a very
> good boost in performance. I profiled the same test case that
> produced loitering objects in FOP 0.92beta, and did NOT find any
> loitering objects with the trunk code. Memory usage also seems to
> be very stable, and I see more frequent garbage collections than with
> 0.92beta. Overall, the process seems to use less memory than
> before.
> 
> Below are some comparisons from my test environment :
> 
> Total pages processed : 12000 approx (split up as 1500 pages per pdf in a loop)
> 
> 1. Memory usage : FOP 0.92beta started off at 500 MB and easily went all
> the way up to 1.2 GB, and the JVM crashed after processing approx. 5000
> pages. The latest version used a maximum of 750 MB (from 500 MB
> initially) and never went beyond that.
> 
> 2. Processing time : 0.92beta gradually slowed down from 4 minutes per
> 1500-page PDF to 15 minutes, at which point the JVM finally crashed. The
> latest code consistently took 3-4 minutes to produce 1500 pages.
> 
> I'm not sure if the above comparison makes sense to anyone, but I just
> wanted to report it for comparison's sake.
> 
> Overall, performance has been good so far, and I'll keep profiling the
> process to look for any red flags.
> 
> Let me know, if you want to track any other details.
> 
> Thanks
> Karthik
> 
> 
> Andreas L Delmelle <a_...@pandora.be> wrote: On Aug 3, 2006, at 18:59, Jeremias Maerki wrote:
> 
> > I went looking and found that the fix is actually very easy: Just make
> > sure a page-sequence's FO children are unreferenced when a page-sequence
> > is done. The PageSequenceLM is properly disposed by the AreaTreeHandler
> > so at least the LMs were already released properly. So far I haven't
> > seen any negative side-effects (like NPEs) from that change. Let's keep
> > an eye on it, though.
> 
> Cool! Should really make a difference for the situation in the OP on  
> fop-users. IIRC, the order of magnitude there was several thousands  
> of pages...
> 
> Karthik, can you try the latest SVN trunk version, and see if things  
> improve for you? If not, don't hesitate to report back.
> 
> Still got a few less important improvements on my list, but deferring  
> the marker-property resolution is still first in line ATM. Funny,  
> didn't seem too difficult at first, until I made the first changes,  
> and got a little more than I bargained for... ;)
> 
> 
> 
> Thanks,
> 
> Andreas
> 
> 



Jeremias Maerki


Re: Memory leak?

Posted by kar thik <ka...@yahoo.com>.
Andreas,

I ran my test cases against fop-trunk as of 08/03 and am seeing a very good boost in performance. I profiled the same test case that produced loitering objects in FOP 0.92beta, and did NOT find any loitering objects with the trunk code. Memory usage also seems to be very stable, and I see more frequent garbage collections than with 0.92beta. Overall, the process seems to use less memory than before.

Below are some comparisons from my test environment :

Total pages processed : 12000 approx (split up as 1500 pages per pdf in a loop)

1. Memory usage : FOP 0.92beta started off at 500 MB and easily went all the way up to 1.2 GB, and the JVM crashed after processing approx. 5000 pages. The latest version used a maximum of 750 MB (from 500 MB initially) and never went beyond that.

2. Processing time : 0.92beta gradually slowed down from 4 minutes per 1500-page PDF to 15 minutes, at which point the JVM finally crashed. The latest code consistently took 3-4 minutes to produce 1500 pages.

I'm not sure if the above comparison makes sense to anyone, but I just wanted to report it for comparison's sake.

Overall, performance has been good so far, and I'll keep profiling the process to look for any red flags.

Let me know, if you want to track any other details.

Thanks
Karthik


Andreas L Delmelle <a_...@pandora.be> wrote:

On Aug 3, 2006, at 18:59, Jeremias Maerki wrote:

> I went looking and found that the fix is actually very easy: Just make
> sure a page-sequence's FO children are unreferenced when a page-sequence
> is done. The PageSequenceLM is properly disposed by the AreaTreeHandler
> so at least the LMs were already released properly. So far I haven't
> seen any negative side-effects (like NPEs) from that change. Let's keep
> an eye on it, though.

Cool! Should really make a difference for the situation in the OP on  
fop-users. IIRC, the order of magnitude there was several thousands  
of pages...

Karthik, can you try the latest SVN trunk version, and see if things  
improve for you? If not, don't hesitate to report back.

Still got a few less important improvements on my list, but deferring  
the marker-property resolution is still first in line ATM. Funny,  
didn't seem too difficult at first, until I made the first changes,  
and got a little more than I bargained for... ;)



Thanks,

Andreas



Re: Memory leak?

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Aug 3, 2006, at 18:59, Jeremias Maerki wrote:

> I went looking and found that the fix is actually very easy: Just make
> sure a page-sequence's FO children are unreferenced when a page-sequence
> is done. The PageSequenceLM is properly disposed by the AreaTreeHandler
> so at least the LMs were already released properly. So far I haven't
> seen any negative side-effects (like NPEs) from that change. Let's keep
> an eye on it, though.

Cool! Should really make a difference for the situation in the OP on  
fop-users. IIRC, the order of magnitude there was several thousands  
of pages...

Karthik, can you try the latest SVN trunk version, and see if things  
improve for you? If not, don't hesitate to report back.

Still got a few less important improvements on my list, but deferring  
the marker-property resolution is still first in line ATM. Funny,  
didn't seem too difficult at first, until I made the first changes,  
and got a little more than I bargained for... ;)



Thanks,

Andreas


Re: Memory leak?

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
I went looking and found that the fix is actually very easy: Just make
sure a page-sequence's FO children are unreferenced when a page-sequence
is done. The PageSequenceLM is properly disposed by the AreaTreeHandler
so at least the LMs were already released properly. So far I haven't
seen any negative side-effects (like NPEs) from that change. Let's keep
an eye on it, though.

http://svn.apache.org/viewvc?rev=428450&view=rev

On 28.07.2006 22:41:52 Andreas L Delmelle wrote:
> On Jul 28, 2006, at 21:21, Simon Pepping wrote:
> 
> > <snip />
> > IMHO we create a number of circular references in our trees. I have
> > always feared that we do not clean them up well enough. Are there
> > tools to investigate this?
> 
> In itself a circular reference is not necessarily a problem, I think.  
> As long as the circle gets broken at some point, the allocated heap  
> space will eventually be reclaimed. It's only a matter of _when_ this  
> becomes possible. Currently, it looks like the whole reference tree  
> for all FObjs in all PageSequences is cleared only when the whole  
> document is finished.
> 
> In the case of the XMLWhiteSpaceHandler, the reference to the current  
> Block seems to be kept alive slightly too long. The reference is only  
> cleared when replaced by the first block in the next sequence. To  
> allow optimal GC, I think this should be released sooner --end of the  
> page-sequence at the latest. But this would only make a difference if  
> all other references to the PageSequence are also cleared, of  
> course... If not, then the Block is referenced until end-of-document  
> anyway.
> 
> I have not yet inspected the layout-tree, but I assume a similar  
> problem is impossible there, as the PageSequenceLM is at the root of  
> each LM-tree. The LM points to its FObj, and not the other way round,  
> so all seems OK.
> 
> 
> Cheers,
> 
> Andreas



Jeremias Maerki


Re: Memory leak?

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Simon Pepping wrote:
> IMHO we create a number of circular references in our trees.

Circular references in themselves aren't a problem for the garbage
collector.
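
A toy illustration of why: a tracing collector keeps whatever is reachable from a root, so a cycle that nothing reachable points to is simply never visited. The sketch below is not how the JVM is implemented; ObjSketch and ReachabilitySketch are made-up names, and the walk only mimics the "mark" step on a hand-built object graph.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical object with outgoing references.
class ObjSketch {
    final String name;
    final List<ObjSketch> refs = new ArrayList<>();

    ObjSketch(String name) {
        this.name = name;
    }
}

class ReachabilitySketch {

    // Collect every object reachable from the given root (identity-based).
    static Set<ObjSketch> reachableFrom(ObjSketch root) {
        Set<ObjSketch> seen =
                Collections.newSetFromMap(new IdentityHashMap<ObjSketch, Boolean>());
        Deque<ObjSketch> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            ObjSketch current = todo.pop();
            // add() returns false for objects already visited, so cycles terminate.
            if (seen.add(current)) {
                for (ObjSketch ref : current.refs) {
                    todo.push(ref);
                }
            }
        }
        return seen;
    }
}
```

With a root pointing at `a`, and `a` and `b` pointing at each other, all three objects are reachable; once the root's reference to `a` is removed, the walk finds only the root, even though the a-b cycle is still intact. In the thread's terms, the problem is the root's list still pointing at the page-sequences, not the cycles inside them.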

I've found that code inspection is of limited value for detecting
memory leaks. A memory profiler like DrMem (or any more professional
tool) is better.

J.Pietschmann

Re: Memory leak?

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Jul 28, 2006, at 21:21, Simon Pepping wrote:

> <snip />
> IMHO we create a number of circular references in our trees. I have
> always feared that we do not clean them up well enough. Are there
> tools to investigate this?

In itself a circular reference is not necessarily a problem, I think.  
As long as the circle gets broken at some point, the allocated heap  
space will eventually be reclaimed. It's only a matter of _when_ this  
becomes possible. Currently, it looks like the whole reference tree  
for all FObjs in all PageSequences is cleared only when the whole  
document is finished.

In the case of the XMLWhiteSpaceHandler, the reference to the current  
Block seems to be kept alive slightly too long. The reference is only  
cleared when replaced by the first block in the next sequence. To  
allow optimal GC, I think this should be released sooner --end of the  
page-sequence at the latest. But this would only make a difference if  
all other references to the PageSequence are also cleared, of  
course... If not, then the Block is referenced until end-of-document  
anyway.

I have not yet inspected the layout-tree, but I assume a similar  
problem is impossible there, as the PageSequenceLM is at the root of  
each LM-tree. The LM points to its FObj, and not the other way round,  
so all seems OK.


Cheers,

Andreas

Re: Memory leak?

Posted by Simon Pepping <sp...@leverkruid.eu>.
On Fri, Jul 28, 2006 at 12:44:00AM +0200, Andreas L Delmelle wrote:
> On Jul 27, 2006, at 23:56, Andreas L Delmelle wrote:
> 
> OTOH, looking deeper, I'm strangely surprised no-one saw this one  
> before --so surprised even that it makes me think I'm missing  
> something :
> 
> Root.addChildNode(PageSequence) results in a reference to the  
> PageSequence being kept in the Root's list of child nodes. Right?
> 
> AFAICT, this reference is *never* released as long as the Root object
> is alive, so it seems like currently, our 'split up in page-sequences'
> performance hint is complete and utter bogus...?
> 
> Sorry to disappoint you all.

IMHO we create a number of circular references in our trees. I have
always feared that we do not clean them up well enough. Are there
tools to investigate this?

Regards, Simon 

-- 
Simon Pepping
home page: http://www.leverkruid.eu