Posted to dev@cocoon.apache.org by Robin Green <gr...@hotmail.com> on 2000/12/07 03:59:52 UTC
[RT] Caching - SoftReference Annoyances at 3am
Okay, here goes my first RT :o)
Over the last few days I've been doing a lot of Cocoon coding, trying to
code (amongst other things, to be announced later) an alternative to C1's
MemoryStore that takes advantage of the java.lang.ref.SoftReference class
provided in JDK1.2 to avoid the dreaded OutOfMemoryError as far as possible.
I've set it up as an optional compilation thing in build.xml and it will
only use the new version if the platform is JDK1.2 (needed for
SoftReferences), dropping back to the old version if not, so it'll still
maintain Cocoon 1.x's full JDK1.1-compatibility.
Hopefully this can be useful in some way in C2 as well. What it does is try
to combine the best of both approaches: using SoftReferences, which
allows the garbage collector to free part or all of the cache when memory is
low (but does not provide any prioritisation), and C1's (buggy)
prioritisation approach (i.e. try to keep recently used items, items that
take a long time to generate, etc.). Since there is no way of interfacing to
the garbage collector to tell it what's important and what's not, what
happens is still rather hacky and not-provably-optimal - it still uses a
cleanup thread to keep memory usage below user-specified limits, in the hope
that the cleanup thread will stay ahead of the garbage collector most or all
of the time (I would encourage everyone to vote for the JDK Bug Database
entry on setting priorities for SoftReferences, which would make this much
easier!). However it should work to an extent. There is quite a thorny nest
of wrapper classes etc. so it's not the simplest possible thing, but each
wrapper object only takes up a few bytes.
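
To make the combined approach a bit more concrete, here is a minimal sketch of the idea, not the actual Cocoon code. The class name SoftPriorityStore, its fields, and the priority rule (cheapest to regenerate goes first, then least recently used) are all assumptions made for illustration, and it is written against a modern JDK (generics, lambdas) rather than JDK1.2. The payload is only softly reachable, while the small wrapper keeps the metadata that the cleanup pass needs.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: soft-referenced payloads plus a priority-driven cleanup pass. */
class SoftPriorityStore {
    /** Small wrapper: the metadata stays strongly reachable, the payload does not. */
    static final class Entry {
        final SoftReference<byte[]> ref;
        final int size;           // payload size in bytes, recorded at put time
        final long generationMs;  // how long the content took to generate
        long lastUsed;            // logical clock value of the last access

        Entry(byte[] data, long generationMs, long now) {
            this.ref = new SoftReference<>(data);
            this.size = data.length;
            this.generationMs = generationMs;
            this.lastUsed = now;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long maxBytes; // user-specified limit
    private long clock;          // logical clock, so the sketch ignores wall time

    SoftPriorityStore(long maxBytes) { this.maxBytes = maxBytes; }

    synchronized void put(String key, byte[] data, long generationMs) {
        entries.put(key, new Entry(data, generationMs, ++clock));
    }

    synchronized byte[] get(String key) {
        Entry e = entries.get(key);
        if (e == null) return null;
        byte[] data = e.ref.get();
        if (data == null) { entries.remove(key); return null; } // cleared by the GC
        e.lastUsed = ++clock;
        return data;
    }

    /** One pass of the cleanup thread's work. */
    synchronized void cleanup() {
        // Drop stubs whose payload the GC has already cleared.
        entries.values().removeIf(e -> e.ref.get() == null);
        long total = 0;
        for (Entry e : entries.values()) total += e.size;
        // Assumed priority: cheapest to regenerate first, then least recently used.
        List<String> keys = new ArrayList<>(entries.keySet());
        keys.sort(Comparator
            .comparingLong((String k) -> entries.get(k).generationMs)
            .thenComparingLong(k -> entries.get(k).lastUsed));
        for (String k : keys) {
            if (total <= maxBytes) break;
            total -= entries.remove(k).size;
        }
    }

    synchronized int size() { return entries.size(); }
    synchronized boolean contains(String key) { return entries.containsKey(key); }
}
```

A background thread would call cleanup() periodically. Nothing here stops the garbage collector from clearing a high-priority entry first, which is exactly the limitation described above.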
Now, the subtleties of the java.lang.ref package aren't the easiest thing in
the world to understand, but even so, I managed to completely miss the point
of the reference queue. I thought "Great, just wait for the garbage
collector to tell you a reference has been cleared, and then remove the
cleared reference (an empty stub) from the cache". If only things were that
simple. I wrote the code, and wondered why everything immediately
disappeared from the cache. :=)
The reason is this: the reference queue does _not_ tell you when a reference
has been cleared; it tells you when the GC has determined the referent to be
(softly/weakly/phantomly) reachable. Okay, okay, so I didn't RTFM very
carefully, big deal. But the point is, I can't see any way of reliably
finding out when a reference has been cleared in order to remove it from the
cache. Sure, you can check for cleared refs on each get and put call, or use
an every-ten-seconds cleanup thread, or even chew up all your spare CPU
cycles going round and round and round the data structure looking for
cleared references - but when you've got thousands or millions of cache
entries and a busy site, none of those absolutely guarantees that you'll
find them all before an OutOfMemoryError occurs - as far as I can see. And
before anyone says "override the clear method" - nope. That won't work,
because the VM doesn't call the method directly, and it's in the JDK bug
database as "not a bug". (!?)
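
For comparison, the java.lang.ref package documentation describes soft and weak references as being cleared before they are enqueued, so draining the queue may be worth re-testing on a given VM. What follows is only a sketch under that assumption, not a claim about how any particular JDK release behaves in practice. The names QueuePurgedCache, KeyedRef and simulateCollection are hypothetical; each reference remembers its key so that the empty stub can be removed from the map without scanning it.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: remove empty stubs by draining a ReferenceQueue. */
class QueuePurgedCache<V> {
    /** A SoftReference that remembers its key so the stale mapping can be found. */
    static final class KeyedRef<V> extends SoftReference<V> {
        final String key;
        KeyedRef(String key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<String, KeyedRef<V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    synchronized void put(String key, V value) {
        purge();
        map.put(key, new KeyedRef<>(key, value, queue));
    }

    synchronized V get(String key) {
        purge();
        KeyedRef<V> ref = map.get(key);
        return ref == null ? null : ref.get();
    }

    /** Drains the queue without blocking; returns how many stubs were removed. */
    synchronized int purge() {
        int removed = 0;
        Reference<? extends V> r;
        while ((r = queue.poll()) != null) {
            if (r instanceof KeyedRef) {
                KeyedRef<?> stale = (KeyedRef<?>) r;
                // Only drop the mapping if it still points at this exact stub;
                // the key may have been re-inserted with a fresh reference.
                if (map.remove(stale.key, stale)) removed++;
            }
        }
        return removed;
    }

    synchronized int stubCount() { return map.size(); }

    /** Test hook (assumption): mimics the GC clearing and enqueuing one entry. */
    synchronized void simulateCollection(String key) {
        KeyedRef<V> ref = map.get(key);
        if (ref != null) { ref.clear(); ref.enqueue(); }
    }
}
```

The cost of purge() is proportional to the number of references actually enqueued, not to the size of the cache. Whether the GC enqueues them soon enough on a busy site is still something to measure rather than assume.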
So maybe you're storing 1KB-1MB byte arrays in the cache ready to serve up,
and compared to that the SoftReferences are tiny. But the point is, the
SoftReferences stay there. They themselves will not get garbage collected
because they are strongly referenced by the cache data structure. Again, you
can work around this by the techniques I just suggested, but none of them
are perfect. And just remember what Cocoon 1.8.1-dev now uses as a key to
index cache entries: essentially, a concatenation of all the request
headers. So if you make heavy use of sessions, say, but still cache your
generated content, you might have a new cache entry created for almost every
request - lots and lots of SoftReferences.
I guess the best solution for now would be to set a maximum number of cache
entries, say 20,000, and when that limit is reached, start expelling the
lowest priority entry each time a new entry is added. I guess I answered my
own question. :-)
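
The entry limit just described could be sketched roughly like this. It is a hedged illustration only: BoundedPriorityCache and its numeric priority argument are assumptions, values are held strongly here to keep the example short (the real thing would wrap them in SoftReferences), and a sorted set is used so that finding the lowest-priority entry does not need a full scan.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch: fixed entry limit, lowest priority expelled on insert. */
class BoundedPriorityCache<V> {
    private static final class Entry<V> {
        final String key;
        final V value;
        final long priority; // higher means more worth keeping (assumed scale)
        Entry(String key, V value, long priority) {
            this.key = key;
            this.value = value;
            this.priority = priority;
        }
    }

    private final Map<String, Entry<V>> byKey = new HashMap<>();
    // Ordered by priority, with the key as tie-breaker so distinct entries never compare equal.
    private final TreeSet<Entry<V>> byPriority = new TreeSet<>(
        Comparator.comparingLong((Entry<V> e) -> e.priority)
                  .thenComparing(e -> e.key));
    private final int maxEntries;

    BoundedPriorityCache(int maxEntries) {
        if (maxEntries < 1) throw new IllegalArgumentException("maxEntries must be >= 1");
        this.maxEntries = maxEntries;
    }

    synchronized void put(String key, V value, long priority) {
        Entry<V> old = byKey.remove(key);
        if (old != null) {
            byPriority.remove(old);            // replacing: no expulsion needed
        } else if (byKey.size() >= maxEntries) {
            Entry<V> lowest = byPriority.pollFirst();
            byKey.remove(lowest.key);          // expel the lowest-priority entry
        }
        Entry<V> e = new Entry<>(key, value, priority);
        byKey.put(key, e);
        byPriority.add(e);
    }

    synchronized V get(String key) {
        Entry<V> e = byKey.get(key);
        return e == null ? null : e.value;
    }

    synchronized int size() { return byKey.size(); }
}
```

With a limit of 20,000 the constructor call would be new BoundedPriorityCache<byte[]>(20000). Each insert costs O(log n) rather than a walk over every entry.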
Unfortunately this results in expelling things unnecessarily, if the GC has
already cleared some references - and if, on the other hand, you avoid that
by searching 20,000 or more SoftReferences for cleared references every time
you want to insert something into the cache, well, that could be quite a
performance drain. Ideally, IMO, this is something that needs a new method
in the JDK, for notification of when a reference has been cleared.
Okay, this may sound quite theoretical and extreme. But I'd like to point
out that it was recently reported on cocoon-users that a Cocoon site had
crashed with an OutOfMemoryError after over a million hits. My hunch is that
that was something to do with ever-growing Monitors (something which I've
fixed on my local machine by putting monitors into Pages so that they too
can be discarded, but only if the cached Page is), building up like a mass of
snowflakes until they cause an avalanche. If it had been a simple
misconfiguration, the error would likely have appeared earlier.