Posted to dev@cocoon.apache.org by Robin Green <gr...@hotmail.com> on 2000/12/07 03:59:52 UTC

[RT] Caching - SoftReference Annoyances at 3am

Okay, here goes my first RT :o)

Over the last few days I've been doing a lot of Cocoon coding, trying to 
code (amongst other things, to be announced later) an alternative to C1's 
MemoryStore that takes advantage of the java.lang.ref.SoftReference class 
provided in JDK1.2 to avoid the dreaded OutOfMemoryError as far as possible. 
I've set it up as an optional compilation thing in build.xml and it will 
only use the new version if the platform is JDK1.2 (needed for 
SoftReferences), dropping back to the old version if not, so it'll still 
maintain Cocoon 1.x's full JDK1.1-compatibility.

Hopefully this can be useful in some way in C2 as well. What it does is 
try to combine the best of both approaches: using SoftReferences, which 
allow the garbage collector to free part or all of the cache when memory is 
low (but do not provide any prioritisation), and C1's (buggy) 
prioritisation approach (i.e. try to keep recently used items, items that 
take a long time to generate, etc.). Since there is no way of interfacing to 
the garbage collector to tell it what's important and what's not, what 
happens is still rather hacky and not-provably-optimal - it still uses a 
cleanup thread to keep memory usage below user-specified limits, in the hope 
that the cleanup thread will stay ahead of the garbage collector most or all 
of the time (I would encourage everyone to vote for the JDK Bug Database 
entry on setting priorities for SoftReferences, which would make this much 
easier!). However it should work to an extent. There is quite a thorny nest 
of wrapper classes etc. so it's not the simplest possible thing, but each 
wrapper object only takes up a few bytes.

Now, the subtleties of the java.lang.ref package aren't the easiest thing in 
the world to understand, but even so, I managed to completely miss the point 
of the reference queue. I thought "Great, just wait for the garbage 
collector to tell you a reference has been cleared, and then remove the 
cleared reference (an empty stub) from the cache". If only things were that 
simple. I wrote the code, and wondered why everything immediately 
disappeared from the cache. :=)

The reason is this: the reference queue does _not_ tell you when a reference 
has been cleared, it tells you when the GC has determined the referent to be 
(softly/weakly/phantomly) reachable. Okay, okay, so I didn't RTFM very 
carefully, big deal. But the point is, I can't see any way of reliably 
finding out when a reference has been cleared in order to remove it from the 
cache. Sure, you can check for cleared refs on each get and put call, or use 
an every-ten-seconds cleanup thread, or even chew up all your spare CPU 
cycles going round and round and round the data structure looking for 
cleared references - but when you've got thousands or millions of cache 
entries and a busy site, none of those absolutely guarantees that you'll 
find them all before an OutOfMemoryError occurs - as far as I can see. And 
before anyone says "override the clear method" - nope. That won't work, 
because the VM doesn't call the method directly, and it's in the JDK bug 
database as "not a bug". (!?)
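
For illustration, the "check for cleared refs" workaround might be sketched 
roughly as below. This is only a sketch under assumptions: SoftCacheSweeper 
and purgeCleared are made-up names, not anything in Cocoon, and it is written 
in present-day Java (generics) rather than JDK 1.2 syntax. The only library 
call it relies on is SoftReference.get(), which returns null once the 
reference has been cleared.

```java
import java.lang.ref.SoftReference;
import java.util.Iterator;
import java.util.Map;

// Hypothetical helper (not part of Cocoon): sweeps a cache map for cleared stubs.
final class SoftCacheSweeper {
    private SoftCacheSweeper() {}

    /**
     * Removes every entry whose SoftReference has been cleared
     * and returns the number of entries removed.
     */
    static <K, V> int purgeCleared(Map<K, SoftReference<V>> cache) {
        int removed = 0;
        Iterator<Map.Entry<K, SoftReference<V>>> it = cache.entrySet().iterator();
        while (it.hasNext()) {
            // get() == null means the referent is gone and only the stub is left
            if (it.next().getValue().get() == null) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

Note that the sweep visits every entry, so calling it on each get or put 
costs time proportional to the size of the cache, and calling it from a 
periodic thread leaves stubs lying around between runs.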

So maybe you're storing 1KB-1MB byte arrays in the cache ready to serve up, 
and compared to that the SoftReferences are tiny. But the point is, the 
SoftReferences stay there. They themselves will not get garbage collected 
because they are strongly referenced by the cache data structure. Again, you 
can work around this by the techniques I just suggested, but none of them 
is perfect. And just remember what Cocoon 1.8.1-dev now uses as a key to 
index cache entries: essentially, a concatenation of all the request 
headers. So if you make heavy use of sessions, say, but still cache your 
generated content, you might have a new cache entry created for almost every 
request - lots and lots of SoftReferences.

I guess the best solution for now would be to set a maximum number of cache 
entries, say 20,000, and when that limit is reached, start expelling the 
lowest priority entry each time a new entry is added. I guess I answered my 
own question. :-)
Unfortunately this results in expelling things unnecessarily, if the GC has 
already cleared some references - and if, on the other hand, you avoid that 
by searching 20,000 or more SoftReferences for cleared references every time 
you want to insert something into the cache, well, that could be quite a 
performance drain. Ideally, IMO, this is something that needs a new method 
in the JDK, for notification of when a reference has been cleared.
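
A minimal sketch of the capped cache described above, under assumptions: 
BoundedSoftCache, Slot and the caller-supplied numeric priority are all 
hypothetical (this is not Cocoon's MemoryStore), and the code uses 
present-day Java rather than JDK 1.2 syntax. When the cap is reached it 
expels the lowest-priority entry without checking whether the GC has already 
cleared some other entry, so it has exactly the "expelling things 
unnecessarily" weakness.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a soft cache with a hard cap on the number of entries.
final class BoundedSoftCache<K, V> {
    private static final class Slot<V> {
        final SoftReference<V> ref;
        final long priority; // higher = more worth keeping (assumed to come from the caller)

        Slot(V value, long priority) {
            this.ref = new SoftReference<>(value);
            this.priority = priority;
        }
    }

    private final Map<K, Slot<V>> slots = new HashMap<>();
    private final int maxEntries;

    BoundedSoftCache(int maxEntries) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be >= 1");
        }
        this.maxEntries = maxEntries;
    }

    synchronized void put(K key, V value, long priority) {
        // Only a genuinely new key can push the cache over its limit.
        if (!slots.containsKey(key) && slots.size() >= maxEntries) {
            evictLowestPriority();
        }
        slots.put(key, new Slot<>(value, priority));
    }

    synchronized V get(K key) {
        Slot<V> slot = slots.get(key);
        if (slot == null) {
            return null;
        }
        V value = slot.ref.get();
        if (value == null) {
            slots.remove(key); // referent was cleared; drop the leftover stub
        }
        return value;
    }

    synchronized int size() {
        return slots.size();
    }

    // Linear scan for the entry with the smallest priority value.
    private void evictLowestPriority() {
        Map.Entry<K, Slot<V>> victim = null;
        for (Map.Entry<K, Slot<V>> e : slots.entrySet()) {
            if (victim == null || e.getValue().priority < victim.getValue().priority) {
                victim = e;
            }
        }
        if (victim != null) {
            slots.remove(victim.getKey());
        }
    }
}
```

The eviction scan here is linear in the number of entries and only runs when 
the cache is full; keeping the entries in a structure sorted by priority 
would avoid the scan, at the cost of more bookkeeping on every put.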

Okay, this may sound quite theoretical and extreme. But I'd like to point 
out that it was recently reported on cocoon-users that a Cocoon site had 
crashed with an OutOfMemoryError after over a million hits. My hunch is that 
that was something to do with ever-growing Monitors (something which I've 
fixed on my local machine by putting monitors into Pages so that they too 
can be discarded, but only if the cached Page is). Like the cumulative 
impact of a mass of snowflakes causing an avalanche. If it was a simple 
misconfiguration the error would likely have appeared earlier.

