Posted to modperl@perl.apache.org by DeWitt Clinton <de...@eziba.com> on 2000/02/20 01:37:35 UTC

Announce: File::Cache 0.04

Hi,

First, I'd like to announce the release of File::Cache 0.04, the first 
public release of another perl library that caches data to be shared 
between processes, this time via the filesystem:


The uploaded file
   
    File-Cache-0.04.tar.gz
   
has entered CPAN as
   
  file: $CPAN/authors/id/D/DC/DCLINTON/File-Cache-0.04.tar.gz
  size: 11270 bytes
   md5: 48257c5dd296339b54e4bca4f03624a5


Second, I'd like to explain why I wrote it.  As some of you know (and
some of you are even using it), I put IPC::Cache out on CPAN last month.  Well,
it turns out that while IPC::Cache is very easy to use, it has some 
severe performance limitations when it is pushed beyond a handful of 
cached objects.  Essentially, IPC::Cache relies on the Storable module 
to persist and de-persist complex perl datatypes.  However, due to 
the complexity of splitting that data across shared memory segments, 
even with the help of IPC::ShareLite, the entire cache must be frozen 
or thawed at once. There is definitely room for someone to figure out 
a better way to do this.
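
To make the cost concrete, here is a minimal sketch of the
whole-cache freeze/thaw pattern described above.  It is not
IPC::Cache's code: the $segment scalar merely stands in for a shared
memory segment, and blob_set/blob_get are hypothetical names used
only for illustration.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Stand-in for the shared memory segment: one frozen blob for the whole cache.
my $segment = freeze({});

# Every write thaws the entire cache, changes one key, and freezes it all again.
sub blob_set {
    my ($key, $value) = @_;
    my $cache = thaw($segment);
    $cache->{$key} = $value;
    $segment = freeze($cache);
    return length $segment;    # bytes rewritten for this single update
}

# Every read also pays for thawing every other object in the cache.
sub blob_get {
    my ($key) = @_;
    return thaw($segment)->{$key};
}

my $first = blob_set('user:1', { name => 'alice' });
blob_set("item:$_", { id => $_, payload => 'x' x 100 }) for 1 .. 500;
my $last = blob_set('user:2', { name => 'bob' });

printf "bytes rewritten: %d for the first set, %d after 500 more objects\n",
    $first, $last;
```

The number of bytes rewritten per update grows with the total size of
the cache, not with the size of the object being stored, which is why
this approach degrades as the number of cached objects grows.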

However, since our site needed a solution quickly, I wrote
File::Cache.  This module shares the exact same interface as
IPC::Cache, but does not suffer the performance penalties associated
with a moderate number of objects.  File::Cache simply stores the data 
on the filesystem and relies on the underlying OS caching and
buffering to keep performance high.
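
As a rough illustration of the one-file-per-object idea, here is a
minimal sketch using only core modules.  It is not File::Cache's
actual implementation: the directory layout, the MD5-based file
names, and the file_set/file_get helpers are all assumptions made for
the example.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);
use File::Spec;
use Digest::MD5 qw(md5_hex);

# Hypothetical cache root; a temporary directory keeps the sketch self-contained.
my $root = tempdir(CLEANUP => 1);

# Map a key to its own file; hashing keeps arbitrary keys filesystem-safe.
sub path_for {
    my ($key) = @_;
    return File::Spec->catfile($root, md5_hex($key));
}

# Write one object to one file; nothing else in the cache is touched.
sub file_set {
    my ($key, $value) = @_;
    # Wrap the value in a hash so plain scalars can be stored too.
    nstore({ value => $value }, path_for($key));
    return;
}

# Read back a single object, or return nothing if the key was never stored.
sub file_get {
    my ($key) = @_;
    my $path = path_for($key);
    return unless -e $path;
    return retrieve($path)->{value};
}

file_set('user:1', { name => 'alice' });
file_set('greeting', 'hello');
print file_get('user:1')->{name}, "\n";
```

Each get or set touches only the file for that key, so the cost no
longer depends on how many other objects are cached.  The sketch
leaves out locking and atomic writes, which a module shared between
processes would need to consider.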

For those using IPC::Cache, I encourage you to check out File::Cache.
You will be able to switch your code over simply by a global find and
replace of IPC::Cache with File::Cache.

I wrote a benchmarking script that illustrates these issues, and have
attached the results to this mail.  Quite revealing.

BTW, if Sam Tregar reads this list, I wonder if he might be interested 
in benchmarking his IPC::SharedCache code, too.  I made an honest
attempt to benchmark it, but didn't want to do SharedCache a
disservice and give inaccurate results.  I can send over the benchmark
script to anyone interested.

Happy caching,

-DeWitt






Re: Announce: File::Cache 0.04

Posted by Gunther Birznieks <gu...@extropia.com>.
Some additional comments below...

Thanks,
      Gunther

DeWitt Clinton wrote:

>
> > 4. The third thing that I would caution about is that there is a whole
> > shedload of Persistent::* modules that have come out (see Stas' post in the
> > last week). It seems like this, more than anything, could share a lot of
> > features with Cache::. It could be that Cache:: might actually be better off
> > wrapping around Persistent::.
>
> I can't find this in the CPAN tree.  The only thing in the Persistence
> namespace appears to be "Object-Persistence" and I'm not quite sure
> that's what we're looking for here.  Intuitively, this is probably
> what we'd want, but it doesn't seem to be exactly it.
>

Sorry, Stas had posted something about it a week or two ago... Here is the relevant
URL to Persistent::*. I haven't investigated it thoroughly but it also seems very
similar and it would be a shame to duplicate too much work.

http://www.bigsnow.org/persistent/

>
> > 5. Fourth, the LRU purging mechanism. If I were you I would put that in the
> > realm of CacheManager that wraps a Cache not a routine inside of Cache itself.
>
> Similar to the object expiration based on time, I think we can get
> away with having the Cache wrapper class support those features.   A
> separate CacheManager may not even be needed -- or, more accurately,
> the Cache::Cache class becomes the CacheManager.
>

Probably. But there are many different algorithms a cache can use to expire data.
LRU being just one. Abstracting it into a separate class hierarchy or at least
Cache::Manager::XXXX would allow anyone else to write their own cache expiry
routines ... you provide the one you want to implement as an example (LRU) and
others can implement at will.
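
A minimal sketch of what such a separate manager might look like
follows.  The package names Sketch::Store::Memory and
Sketch::Manager::LRU are hypothetical, as is the assumption that a
backend offers get, set, and remove; none of this comes from an
existing module.

```perl
use strict;
use warnings;

# Hypothetical minimal backend: any object with get, set, and remove would do.
package Sketch::Store::Memory;
sub new    { my ($class) = @_; return bless { data => {} }, $class }
sub get    { return $_[0]{data}{ $_[1] } }
sub set    { $_[0]{data}{ $_[1] } = $_[2]; return }
sub remove { delete $_[0]{data}{ $_[1] }; return }

# Hypothetical LRU policy that wraps a store rather than living inside it.
package Sketch::Manager::LRU;

sub new {
    my ($class, %args) = @_;
    my $self = {
        store    => $args{store},
        max_keys => $args{max_keys},
        order    => [],    # keys from least to most recently used
    };
    return bless $self, $class;
}

# Move a key to the most-recently-used end of the list.
sub _touch {
    my ($self, $key) = @_;
    @{ $self->{order} } = grep { $_ ne $key } @{ $self->{order} };
    push @{ $self->{order} }, $key;
    return;
}

sub get {
    my ($self, $key) = @_;
    my $value = $self->{store}->get($key);
    $self->_touch($key) if defined $value;
    return $value;
}

sub set {
    my ($self, $key, $value) = @_;
    $self->{store}->set($key, $value);
    $self->_touch($key);
    # Evict from the least-recently-used end until we are within the limit.
    while (@{ $self->{order} } > $self->{max_keys}) {
        $self->{store}->remove(shift @{ $self->{order} });
    }
    return;
}

package main;

my $cache = Sketch::Manager::LRU->new(
    store    => Sketch::Store::Memory->new,
    max_keys => 2,
);
$cache->set(a => 1);
$cache->set(b => 2);
$cache->get('a');       # 'a' is now more recently used than 'b'
$cache->set(c => 3);    # over the limit, so 'b' is evicted
```

Because the policy only talks to the store through get, set, and
remove, a different expiry algorithm could be written as another
manager class without touching the storage code.  The linear scan in
_touch keeps the sketch short; a real implementation would want a
cheaper bookkeeping structure.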

Of course, at a first pass, it is probably just as easy to do as you say..

Later,
  Gunther



Re: Announce: File::Cache 0.04

Posted by DeWitt Clinton <de...@eziba.com>.
Great comments, please see my replies below.  And apologies if this is 
getting a bit off-topic, but it does seem like, judging from the
number of emails I've been getting from modperlers, that there is
considerable interest in the community for this.  I'd be happy to take 
the discussion elsewhere once I'm sure that we're heading in a
direction that best suits mod_perl.

On Sun, Feb 20, 2000 at 07:48:54AM +0000, Gunther Birznieks wrote:

> In addition, I have never used IPC::Cache so forgive my ignorance on the
> interface. If the interface is not exported into the main namespace and is
> instead OO in nature, I would prefer if you defer the loading of the subclass
> until the constructor (new()) is called... eg
> 
> use Cache;
> 
> my $cache = new Cache(-type => "IPC");

You are right on here.  I love this model -- moreover, the Cache
storage providers really only need to provide a few simple methods, get,
set, purge, and clear.  The Cache::Cache class will take the objects
returned by get() and perform any necessary work on them (such as
expiration or an LRU purge).  It will require a little re-architecture
here, but everything adds up for me.

> 4. The third thing that I would caution about is that there is a whole
> shedload of Persistent::* modules that have come out (see Stas' post in the
> last week). It seems like this, more than anything, could share a lot of
> features with Cache::. It could be that Cache:: might actually be better off
> wrapping around Persistent::.

I can't find this in the CPAN tree.  The only thing in the Persistence 
namespace appears to be "Object-Persistence" and I'm not quite sure
that's what we're looking for here.  Intuitively, this is probably
what we'd want, but it doesn't seem to be exactly it.

> 5. Fourth, the LRU purging mechanism. If I were you I would put that in the
> realm of CacheManager that wraps a Cache not a routine inside of Cache itself.

Similar to the object expiration based on time, I think we can get
away with having the Cache wrapper class support those features.   A
separate CacheManager may not even be needed -- or, more accurately,
the Cache::Cache class becomes the CacheManager.

Thanks for the great advice!

-DeWitt


Re: Announce: File::Cache 0.04

Posted by Gunther Birznieks <gu...@extropia.com>.
1. I think this is a great idea. Not everything is session related.

I don't think Session feels fully appropriate to what DeWitt is trying to
accomplish. A session is a bit more than just a cache of data.

While it seems akin to the *Store.pm module in Apache::Session, that is not
what Apache::Session promotes as a public interface to the world.  And I
haven't really seen it documented that a future Apache::Session could not
decide to change the API of how it uses the *Store.pm modules internally.

Cache:: seems nice because it advertises the interface of the storage, it
fits a common interface people have been using already to do this (so it plugs
into existing programs), and it does a very specific task (sharing data among
processes) very well. I agree that Session could make use of the Cache::
mechanism though.

2. The main issue I see in your message below is the class hierarchy
definition.

Rather than

use Cache::Generic (IPC)...

I would rather stick to it as

use Cache (IPC);

In OO style, I suspect you would want the abstract class (which is what
Cache.pm essentially acts like in this case) to introduce the interface that
is specified by IPC. There is no need to call it Generic. Generic implies that
it stores data in non-exotic places, but you actually override the word
Generic by adding (IPC) to it.

So it's really an IPC Cache, not a Generic one after all, that is being imported.

Also, there is something cleaner about having all the files in the Cache
subdirectory being subclasses of Cache.pm one directory above. Otherwise it
just feels weird (at least to me) to have an ancestor be in the same
directory.

In addition, I have never used IPC::Cache so forgive my ignorance on the
interface. If the interface is not exported into the main namespace and is
instead OO in nature, I would prefer if you defer the loading of the subclass
until the constructor (new()) is called... eg

use Cache;

my $cache = new Cache(-type => "IPC");

This allows Cache to act as a factory creating the particular types of caches
on the fly. Otherwise the use statement gets hardcoded into your application
code. When you use a string like -type => "IPC", you can actually replace it
with -type => $cache_type and then make $cache_type read in from a
configuration file...
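
A minimal sketch of this factory idea follows.  The Sketch::Cache
and Sketch::Cache::Memory packages are hypothetical and are defined
inline only so that the example runs as a single file; in practice
each backend would live in its own .pm file and be pulled in by the
require call.

```perl
use strict;
use warnings;

# Hypothetical backend, defined inline so the sketch runs as a single file.
package Sketch::Cache::Memory;
sub new { my ($class, %args) = @_; return bless { data => {} }, $class }
sub set { $_[0]{data}{ $_[1] } = $_[2]; return }
sub get { return $_[0]{data}{ $_[1] } }

# Hypothetical factory: callers name a backend with a string at run time.
package Sketch::Cache;
use Carp qw(croak);

sub new {
    my ($class, %args) = @_;
    my $type = delete $args{-type} or croak "-type is required";
    croak "invalid cache type '$type'" unless $type =~ /\A\w+\z/;

    my $backend = "Sketch::Cache::$type";
    # Load the backend only now, and only if it is not already defined.
    unless ($backend->can('new')) {
        (my $file = "$backend.pm") =~ s{::}{/}g;
        eval { require $file; 1 } or croak "cannot load $backend: $@";
    }
    return $backend->new(%args);
}

package main;

# The type could just as well be read from a configuration file.
my $cache_type = 'Memory';
my $cache = Sketch::Cache->new(-type => $cache_type);
$cache->set(color => 'blue');
print ref($cache), ": ", $cache->get('color'), "\n";
```

The application code never names a backend class in a use statement,
so switching storage becomes a matter of changing one string.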

4. The third thing that I would caution about is that there is a whole
shedload of Persistent::* modules that have come out (see Stas' post in the
last week). It seems like this, more than anything, could share a lot of
features with Cache::. It could be that Cache:: might actually be better off
wrapping around Persistent::.

The argument against this is that Persistent:: might implement many more
features and hence be slower than a thinly implemented Cache:: module in terms
of number of method calls etc. I don't know what you'd run into because I've
never looked at Persistent:: in any detail.

5. Fourth, the LRU purging mechanism. If I were you I would put that in the
realm of CacheManager that wraps a Cache not a routine inside of Cache itself.
However, as another interesting analogy, I have to say that in my recent
attempt at a SessionManager class (to manage sessions), I have implemented
garbage collection based on last access time, creation time, and modify
time... So it again seems like there is a Session <-> Cache parallel here.
Don't know what to say about this. May require more thought.

Later,
   Gunther

DeWitt Clinton wrote:

> On Sat, Feb 19, 2000 at 04:50:46PM -0800, Joshua Chamas wrote:
>
> > Does this have any performance advantages to using Apache::Session
> > with a file backing store ?  Would you point out the differences
> > in interface between the two that might lead developers to
> > choosing one over the other ?
>
> You are definitely thinking along the right lines here!  I agree,
> there are a lot of similarities between my cache interface and
> Apache::Session.
>
> What I'm providing is a simple (from the caller's point of view) way
> of sticking data somewhere and getting it back conveniently, with an
> easy mechanism for object expiration.
>
> The idea of inverting the namespaces is spot on -- I am thinking of
> creating the Cache::* namespace, and then creating Cache::IPC,
> Cache::File, Cache::DBI, Cache::Memory, etc, modules.  Most
> importantly, I'd then add Cache::Generic, which would allow "use
> Cache::Generic (IPC)" or "use Cache::Generic (File)" and similar
> constructs.  Ideally, classes such as Cache::File would implement an
> interface that would force them to support a subset of caching
> functionality (get, set, purge, clear, etc.)  This way, programmers
> could switch easily between the implementations based on what sort of
> performance characteristics they are looking for.
>
> Again, this is very similar to Apache::Session.  What I really see
> making sense is for the underlying stores for Apache::Session to wrap
> the Cache::* classes, not vice-versa.  The Session stuff is actually
> more functional and specific than the generic cache, never mind that
> the Apache namespace isn't always appropriate.  But since there is a
> lot of good code in the Apache::Session stores, they would definitely
> be the right place to get the implementation details.
>
> I would gladly accept anyone interested in helping with this.  And, as
> always, sanity checks are most welcome -- if this is way out of line,
> please stop me before I get any deeper.  :)
>
> BTW, this interface isn't Perl specific at all -- we use the same
> thing for a Java cache, which is shared between threads in an application.
>
> -DeWitt
>
> PS:  The one other feature I would like to add to the cache is an
> optional LRU purging mechanism.  Any takers?


Re: Announce: File::Cache 0.04

Posted by DeWitt Clinton <de...@eziba.com>.
On Sat, Feb 19, 2000 at 04:50:46PM -0800, Joshua Chamas wrote:

> Does this have any performance advantages to using Apache::Session
> with a file backing store ?  Would you point out the differences
> in interface between the two that might lead developers to 
> choosing one over the other ?

You are definitely thinking along the right lines here!  I agree,
there are a lot of similarities between my cache interface and 
Apache::Session.  

What I'm providing is a simple (from the caller's point of view) way
of sticking data somewhere and getting it back conveniently, with an
easy mechanism for object expiration.

The idea of inverting the namespaces is spot on -- I am thinking of
creating the Cache::* namespace, and then creating Cache::IPC,
Cache::File, Cache::DBI, Cache::Memory, etc, modules.  Most
importantly, I'd then add Cache::Generic, which would allow "use
Cache::Generic (IPC)" or "use Cache::Generic (File)" and similar
constructs.  Ideally, classes such as Cache::File would implement an
interface that would force them to support a subset of caching
functionality (get, set, purge, clear, etc.)  This way, programmers
could switch easily between the implementations based on what sort of
performance characteristics they are looking for.
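
A minimal sketch of such a shared interface follows, with one
in-memory backend that supports per-object expiration.  The package
names, the optional $expires_in argument to set, and the reading of
purge as "drop expired entries" versus clear as "drop everything" are
assumptions made for illustration, not the final Cache::* design.

```perl
use strict;
use warnings;

# Hypothetical abstract interface: each backend must supply these four methods.
package Sketch::Cache::Base;
use Carp qw(croak);

for my $method (qw(get set purge clear)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$method" } = sub {
        croak ref($_[0]) . " must implement $method()";
    };
}

# Hypothetical in-memory backend with per-object expiration.
package Sketch::Cache::Memory;
our @ISA = ('Sketch::Cache::Base');

sub new { my ($class) = @_; return bless { data => {} }, $class }

# Store a value; $expires_in is an optional lifetime in seconds.
sub set {
    my ($self, $key, $value, $expires_in) = @_;
    my $expires_at = defined $expires_in ? time() + $expires_in : undef;
    $self->{data}{$key} = { value => $value, expires_at => $expires_at };
    return;
}

sub _expired {
    my ($entry) = @_;
    return defined $entry->{expires_at} && $entry->{expires_at} <= time();
}

# Return the value, or undef if the key is missing or has expired.
sub get {
    my ($self, $key) = @_;
    my $entry = $self->{data}{$key} or return;
    if (_expired($entry)) {
        delete $self->{data}{$key};
        return;
    }
    return $entry->{value};
}

# purge drops only expired entries; clear drops everything.
sub purge {
    my ($self) = @_;
    my $data = $self->{data};
    delete @{$data}{ grep { _expired($data->{$_}) } keys %$data };
    return;
}

sub clear { $_[0]{data} = {}; return }

package main;

my $cache = Sketch::Cache::Memory->new;
$cache->set(stale => 'old', -1);      # negative lifetime: already expired
$cache->set(fresh => 'new', 3600);
$cache->purge;
```

Any other backend that inherits from the base class and overrides the
same four methods could be dropped in without changing calling code.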

Again, this is very similar to Apache::Session.  What I really see
making sense is for the underlying stores for Apache::Session to wrap
the Cache::* classes, not vice-versa.  The Session stuff is actually
more functional and specific than the generic cache, never mind that
the Apache namespace isn't always appropriate.  But since there is a
lot of good code in the Apache::Session stores, they would definitely
be the right place to get the implementation details.

I would gladly accept anyone interested in helping with this.  And, as
always, sanity checks are most welcome -- if this is way out of line,
please stop me before I get any deeper.  :)

BTW, this interface isn't Perl specific at all -- we use the same
thing for a Java cache, which is shared between threads in an application.

-DeWitt

PS:  The one other feature I would like to add to the cache is an
optional LRU purging mechanism.  Any takers?



Re: Announce: File::Cache 0.04

Posted by Joshua Chamas <jo...@chamas.com>.
DeWitt Clinton wrote:
> 
> Hi,
> 
> First, I'd like to announce the release of File::Cache 0.04, the first
> public release of another perl library that caches data to be shared
> between processes, this time via the filesystem:
> 
> The uploaded file
> 
>     File-Cache-0.04.tar.gz
> 

Does this have any performance advantages to using Apache::Session
with a file backing store ?  Would you point out the differences
in interface between the two that might lead developers to 
choosing one over the other ?

What I'm thinking is that if you init Apache::Session with
a fixed identifier, it should have the same functionality
as doing the same with File::Cache, maybe?

There seems to be something similar afoot here too, let's
say you were to implement a DBIish way of doing this, then 
you might have DBI::Cache ... now if you reverse the namespace,
like Cache::DBI,File,IPC you seem to have the makings of what 
Apache::Session has already done with lower level backing stores.

-- Joshua
_________________________________________________________________
Joshua Chamas			        Chamas Enterprises Inc.
NodeWorks >> free web link monitoring	Huntington Beach, CA  USA 
http://www.nodeworks.com                1-714-625-4051