Posted to fop-dev@xmlgraphics.apache.org by Jeremias Maerki <de...@jeremias-maerki.ch> on 2006/07/13 22:43:07 UTC

Re: Images in FOP 0.92beta

Jörg,

remember this thread on fop-users? I've just found out what's wrong.

There's absolutely nothing wrong with the PDFRenderer or the PDF library
concerning reference freeing. It does so as soon as each image is
written to the PDF, which always happens immediately.

But I found that org.apache.fop.fo.flow.ExternalGraphic unnecessarily
maintains a hard reference on a FopImage. Unnecessarily, because we just
need the intrinsic size there. The FopImage is never reset to null
after use. I fixed that and: d'oh, still not good.

I ended up in the image cache and in the Javadocs for WeakHashMap where
I found that little detail that the weak reference is on the key, not
the value. And the key is the URL (String) which is passed around in FOP.
Ok, so I changed the WeakHashMap to a HashMap and wrapped the values in
WeakReferences. Tadaaa! A PDF with 182 JPEG images with a total size of
258 MB is suddenly produced without exceptions using the VM's default
heap settings, never going beyond 26MB heap usage. *g*
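
For readers following along, the change described above can be sketched roughly as follows. This is only an illustration, not the actual FOP image cache: the class name WeakValueCache and its methods are made up for this example, and it uses generics for brevity. The point is that the String key stays strongly referenced while the value sits behind a WeakReference, so the collector may reclaim the image as soon as nothing else holds it.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache: strong String keys (e.g. URLs), weakly referenced values. */
final class WeakValueCache<V> {
    private final Map<String, WeakReference<V>> entries = new HashMap<>();

    /** Stores the value behind a weak reference; the map itself does not keep it alive. */
    synchronized void put(String url, V value) {
        entries.put(url, new WeakReference<>(value));
    }

    /** Returns the cached value, or null if it is absent or has already been collected. */
    synchronized V get(String url) {
        WeakReference<V> ref = entries.get(url);
        return (ref == null) ? null : ref.get();
    }

    /** Number of map entries, including those whose value is already gone. */
    synchronized int size() {
        return entries.size();
    }
}
```

Note that an entry whose value has been collected still occupies a slot in the map until someone removes it; get() simply returns null for it.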

Will test some more and then commit later.

On 21.06.2006 23:03:38 J.Pietschmann wrote:
> Jeremias Maerki wrote:
> > Ouch, that could explain it. No, no changes in that area. Actually,
> > images could be written to the file immediately and then released
> > instead of having to wait until the next page-sequence is finished.
> 
> While the image data is written as soon as possible, the XObject
> which also points to the image object is kept for the object dictionary
> which is written much later. There have been changes in the way the
> object dictionaries are written to the PDF which I didn't track.
> 
> > Should be easy to fix.
> 
> Unfortunately, the XObject seems to query some data from the image
> object while writing the dictionary.



Jeremias Maerki


Re: Images in FOP 0.92beta

Posted by Chris Bowditch <bo...@hotmail.com>.
thomas.deweese@kodak.com wrote:

> Hi Jeremias,
> 

<snip/>

> 
>    Well I figure that the thread will just be blocked in queue.remove
> most of the time unless it has something to do.  I don't think there
> is much overhead for a thread in the types of systems we are targeting
> (i.e. not small constrained devices).  Note that this is one thread
> that is used for all CleanerThread sub objects (so it's not like you
> are likely to spawn lots of threads).

Don't forget that a lot of folk deploy FOP/Batik inside Web containers 
or Application Servers, where spawning new Threads is considered illegal.

<snip/>

Chris



Re: Images in FOP 0.92beta

Posted by th...@kodak.com.
Hi Jeremias,

Jeremias Maerki <de...@jeremias-maerki.ch> wrote on 07/14/2006 04:26:57 PM:

> At first, I'd have preferred to avoid an extra thread if possible so I
> just added a local ReferenceQueue and used poll() to do house-keeping
> whenever a user agent signs off. I assume you don't have a
> not-too-frequently called method you could do on-demand house-keeping in,
> so the thread is probably ok. 

   Well I figure that the thread will just be blocked in queue.remove
most of the time unless it has something to do.  I don't think there
is much overhead for a thread in the types of systems we are targeting
(i.e. not small constrained devices).  Note that this is one thread
that is used for all CleanerThread sub objects (so it's not like you
are likely to spawn lots of threads).

   Some people put the cleaning in the management calls (so you
poll the queue when people add/remove elements from the hash).  I'm
not fond of that as it means you are borrowing a stranger's thread
to do your work (it just feels ugly).
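
   For comparison, the "clean up in the management calls" variant might look something like the sketch below. The class and method names (PollingSoftCache, KeyedRef, purge) are invented for this example and are not taken from Batik or FOP. It relies only on the standard java.lang.ref API: ReferenceQueue.poll() returns immediately, so the caller's thread drains whatever the collector has already enqueued and then carries on.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical soft-value cache that does its house-keeping on every put/get. */
final class PollingSoftCache<V> {

    /** Soft reference that remembers its key so the map entry can be removed later. */
    private static final class KeyedRef<T> extends SoftReference<T> {
        final String key;

        KeyedRef(String key, T value, ReferenceQueue<T> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<String, KeyedRef<V>> entries = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    synchronized void put(String key, V value) {
        purge();
        entries.put(key, new KeyedRef<>(key, value, queue));
    }

    synchronized V get(String key) {
        purge();
        KeyedRef<V> ref = entries.get(key);
        return (ref == null) ? null : ref.get();
    }

    synchronized int size() {
        return entries.size();
    }

    /** Non-blocking: drains the references the collector has already enqueued. */
    private void purge() {
        Reference<? extends V> dead;
        while ((dead = queue.poll()) != null) {
            KeyedRef<?> ref = (KeyedRef<?>) dead;
            // Remove only if the map still holds this exact reference; the key
            // may have been re-populated with a fresh value in the meantime.
            entries.remove(ref.key, ref);
        }
    }
}
```

   The trade-off is the one mentioned above: no extra thread, but dead entries linger until the next caller happens to touch the cache.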

> And given that we have Batik in memory
> anyway FOP could co-use that thread. But since I'd like to avoid
> dependencies on Batik directly if possible, can we move CleanerThread to
> XML Graphics Commons and rename it to ReferenceCleanerThread to give it
> a more descriptive name? 

   I was under the impression that most of the stuff in
batik.util will find its way into graphics commons.  As for renaming,
I don't think it's a big deal.

> The SoftReferenceCache is indeed a little odd, especially the method
> names. I think I'll skip that one for now.

   It is meant to be subclassed to provide a strongly typed interface
(notice that all the '*Impl' methods are protected), so the subclass can
provide public versions that take strongly typed parameters.

> Some other interesting things I observed while playing around for those
> interested (ATM, I'm still doing the house-keeping without the thread
> but I might rewrite):

   SoftReferences are a very powerful tool in Java; I don't think they
get enough attention in general.

> When using weak references (as the current code does but with the fixed
> behavior) FOP takes around 35 sec on my machine to produce that 182
> image PDF. Heap usage is usually around 12MB with peaks to 26MB. The
> house-keeping after the user agent retires removes around 178 references.
> 
> Switching to soft references, which is actually the recommended type for
> caches, the heap usage goes up to the 64MB maximum and pretty much stays
> there. The whole thing takes 29-30 sec on average. The house-keeping after
> the user agent retires removes between 161 and 170 references. So this
> means the VM actually keeps more references around, only freeing as many
> as it needs not to run into memory problems. And it runs faster this way.
> 
> I learned a few things today. :-)

   I guess that makes it a good day ;)

> On 14.07.2006 14:35:06 thomas.deweese wrote:
> > Hi all,
> > 
> >     Just a small comment on HashMaps with weak values:
> > 
> > Jeremias Maerki <de...@jeremias-maerki.ch> wrote on 07/13/2006 04:43:07 PM:
> > 
> > > Ok, so I changed the WeakHashMap to a HashMap and wrapped the values in
> > > WeakReferences. Tadaaa! A PDF with 182 JPEG images with a total size of
> > > 258 MB is suddenly produced without exceptions using the VM's default
> > > heap settings, never going beyond 26MB heap usage. *g*
> > 
> >    There is a potential problem with this approach that Batik ran into.
> > Unless you go a little further those weak values accumulate in the map.
> > In your case this probably isn't a big deal, but for Batik where there
> > are potentially thousands (or tens of thousands, think mouse move
> > events) of entries, these 'dead' entries start to add up.
> > 
> >    As a result Batik has batik.util.CleanerThread.  This class has
> > inner classes that subclass the various SoftReference classes with an
> > additional method 'public void cleared()'.  This method is called by
> > the CleanerThread when the object the soft reference points at is
> > cleared from memory (it uses the ReferenceQueue part of soft references).
> > 
> >    This gives you the hook you need to then de-register the entry from
> > the hash table.  This is actually an incredibly useful 'addition' to
> > the standard soft reference classes (for example I will often use
> > it to check if classes I think should go to GC really do go to GC).
> > 
> >    I should also mention that Batik has a class called 'SoftReferenceCache'
> > which is a thread safe implementation of exactly what you just implemented.
> > The interface may seem a little odd but it is designed to ensure that
> > only one party ever has to decode a resource even if multiple threads
> > request it "at the same time".
> > 
> >    Anyway just thought I would add my 2 cents...
> 
> 
> 
> Jeremias Maerki
> 


Re: Images in FOP 0.92beta

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
That was worth more than 2 cents. Thanks, Thomas. I didn't really care
too much about left-over references at first, but in a long-running
service they add up unnecessarily even if it's only a Map.Entry, a
String and a Reference instance per entry.

At first, I'd have preferred to avoid an extra thread if possible so I
just added a local ReferenceQueue and used poll() to do house-keeping
whenever a user agent signs off. I assume you don't have a
not-too-frequently called method you could do on-demand house-keeping in,
so the thread is probably ok. And given that we have Batik in memory
anyway FOP could co-use that thread. But since I'd like to avoid
dependencies on Batik directly if possible, can we move CleanerThread to
XML Graphics Commons and rename it to ReferenceCleanerThread to give it
a more descriptive name? In the beginning, this means we will have two
threads doing the same thing but it is ultimately a cleaner design in the
long run (when Batik starts using Commons).

The SoftReferenceCache is indeed a little odd, especially the method
names. I think I'll skip that one for now.

Some other interesting things I observed while playing around for those
interested (ATM, I'm still doing the house-keeping without the thread
but I might rewrite):

When using weak references (as the current code does but with the fixed
behaviour) FOP takes around 35 sec on my machine to produce that 182
image PDF. Heap usage is usually around 12MB with peaks to 26MB. The
house-keeping after the user agent retires removes around 178 references.

Switching to soft references, which is actually the recommended type for
caches, the heap usage goes up to the 64MB maximum and pretty much stays
there. The whole thing takes 29-30 sec on average. The house-keeping after
the user agent retires removes between 161 and 170 references. So this
means the VM actually keeps more references around, only freeing as many
as it needs not to run into memory problems. And it runs faster this way.

I learned a few things today. :-)

On 14.07.2006 14:35:06 thomas.deweese wrote:
> Hi all,
> 
>     Just a small comment on HashMaps with weak values:
> 
> Jeremias Maerki <de...@jeremias-maerki.ch> wrote on 07/13/2006 04:43:07 PM:
> 
> > Ok, so I changed the WeakHashMap to a HashMap and wrapped the values in
> > WeakReferences. Tadaaa! A PDF with 182 JPEG images with a total size of
> > 258 MB is suddenly produced without exceptions using the VM's default
> > heap settings, never going beyond 26MB heap usage. *g*
> 
>    There is a potential problem with this approach that Batik ran into.
> Unless you go a little further those weak values accumulate in the map.
> In your case this probably isn't a big deal, but for Batik where there
> are potentially thousands (or tens of thousands, think mouse move
> events) of entries, these 'dead' entries start to add up.
> 
>    As a result Batik has batik.util.CleanerThread.  This class has
> inner classes that subclass the various SoftReference classes with an
> additional method 'public void cleared()'.  This method is called by
> the CleanerThread when the object the soft reference points at is
> cleared from memory (it uses the ReferenceQueue part of soft references).
> 
>    This gives you the hook you need to then de-register the entry from
> the hash table.  This is actually an incredibly useful 'addition' to
> the standard soft reference classes (for example I will often use
> it to check if classes I think should go to GC really do go to GC).
> 
>    I should also mention that Batik has a class called 'SoftReferenceCache'
> which is a thread safe implementation of exactly what you just implemented.
> The interface may seem a little odd but it is designed to ensure that
> only one party ever has to decode a resource even if multiple threads
> request it "at the same time".
> 
>    Anyway just thought I would add my 2 cents...



Jeremias Maerki


Re: Images in FOP 0.92beta

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Jeremias Maerki wrote:
> remember this thread on fop-users? I've just found out what's wrong.

Great!

> There's absolutely nothing wrong with the PDFRenderer or the PDF library
> concerning reference freeing. It does so as soon as each image is
> written to the PDF, which always happens immediately.

Hm. I'm pretty sure in 0.20.5 a PDF object held a pointer, and the
object was using some data while writing a dictionary structure into
the PDF stream after all the real content was written.
[...]
> I ended up in the image cache and in the Javadocs for WeakHashMap where
> I found that little detail that the weak reference is on the key, not
> the value.

Oops, my fault.

J.Pietschmann

Re: Images in FOP 0.92beta

Posted by th...@kodak.com.
Hi all,

    Just a small comment on HashMaps with weak values:

Jeremias Maerki <de...@jeremias-maerki.ch> wrote on 07/13/2006 04:43:07 PM:

> Ok, so I changed the WeakHashMap to a HashMap and wrapped the values in
> WeakReferences. Tadaaa! A PDF with 182 JPEG images with a total size of
> 258 MB is suddenly produced without exceptions using the VM's default
> heap settings, never going beyond 26MB heap usage. *g*

   There is a potential problem with this approach that Batik ran into.
Unless you go a little further those weak values accumulate in the map.
In your case this probably isn't a big deal, but for Batik where there
are potentially thousands (or tens of thousands, think mouse move
events) of entries, these 'dead' entries start to add up.

   As a result Batik has batik.util.CleanerThread.  This class has
inner classes that subclass the various SoftReference classes with an
additional method 'public void cleared()'.  This method is called by
the CleanerThread when the object the soft reference points at is
cleared from memory (it uses the ReferenceQueue part of soft references).

   This gives you the hook you need to then de-register the entry from
the hash table.  This is actually an incredibly useful 'addition' to
the standard soft reference classes (for example I will often use
it to check if classes I think should go to GC really do go to GC).
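
   To make the idea concrete, here is a minimal sketch of such a cleaner thread. It is not Batik's actual source: the names ReferenceCleaner, Cleared and SoftRef are stand-ins chosen for this example, and only the soft-reference flavour is shown. A single daemon thread blocks in ReferenceQueue.remove() and calls cleared() on whatever reference comes out of the queue.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;

/** Hypothetical stand-in for a shared cleaner thread with a cleared() callback. */
final class ReferenceCleaner {

    /** Implemented by references that want a callback once they have been enqueued. */
    interface Cleared {
        void cleared();
    }

    /** Soft reference registered with the shared queue; subclasses supply cleared(). */
    abstract static class SoftRef<T> extends SoftReference<T> implements Cleared {
        SoftRef(T referent) {
            super(referent, QUEUE);
        }
    }

    private static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();

    static {
        // One daemon thread serves every reference created through this class.
        Thread t = new Thread(ReferenceCleaner::drain, "reference-cleaner");
        t.setDaemon(true);
        t.start();
    }

    private ReferenceCleaner() {
    }

    private static void drain() {
        while (true) {
            try {
                // Blocks until the collector enqueues a reference.
                Reference<?> ref = QUEUE.remove();
                if (ref instanceof Cleared) {
                    try {
                        ((Cleared) ref).cleared();
                    } catch (RuntimeException e) {
                        // A failing callback must not take the cleaner thread down.
                    }
                }
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}
```

   A cache would then store a SoftRef subclass that remembers its key and removes its own map entry in cleared(). Since the callback runs on the cleaner thread, the map has to be synchronized accordingly.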

   I should also mention that Batik has a class called 'SoftReferenceCache'
which is a thread safe implementation of exactly what you just implemented.
The interface may seem a little odd but it is designed to ensure that
only one party ever has to decode a resource even if multiple threads
request it "at the same time".
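
   The "only one party decodes" behaviour can be illustrated with a sketch like the one below. It does not reproduce the SoftReferenceCache interface; the class name SingleLoadSoftCache and the loader parameter are assumptions made for this example. The first thread to ask for a key marks it as loading and decodes outside the lock; threads asking for the same key in the meantime wait and then pick up the cached result.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical soft-value cache in which each key is loaded by one thread at a time. */
final class SingleLoadSoftCache<V> {
    private final Map<String, SoftReference<V>> entries = new HashMap<>();
    private final Set<String> loading = new HashSet<>();

    /** Returns the cached value, or runs the loader if no other thread is already doing so. */
    V get(String key, Function<String, V> loader) {
        synchronized (this) {
            while (true) {
                SoftReference<V> ref = entries.get(key);
                V cached = (ref == null) ? null : ref.get();
                if (cached != null) {
                    return cached;
                }
                if (loading.add(key)) {
                    break; // this thread becomes the loader for the key
                }
                try {
                    wait(); // another thread is loading this key
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for " + key, e);
                }
            }
        }
        V value = null;
        try {
            value = loader.apply(key); // the expensive decode runs outside the lock
            return value;
        } finally {
            synchronized (this) {
                if (value != null) {
                    entries.put(key, new SoftReference<>(value));
                }
                loading.remove(key);
                notifyAll(); // wake waiters, whether the load succeeded or failed
            }
        }
    }
}
```

   If the loader fails or returns null, one of the waiting threads takes over as the loader on its next pass through the loop. A value that the collector clears under memory pressure is simply decoded again on the next request.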

   Anyway just thought I would add my 2 cents...