Posted to modperl@perl.apache.org by Clinton Gormley <cl...@drtech.co.uk> on 2003/03/07 18:11:23 UTC

Optimising cache performance

I'd appreciate some feedback on my logic to optimise my cache (under
mod_perl 1).

I'm building a site which will have a large number of fairly complicated
objects (each of which would require 5-20 queries to build from scratch)
that are read frequently and updated relatively rarely.

I'm planning a two level cache : 
    1) Live objects in each mod_perl process
    2) Serialised objects in a database

The logic goes as follows (a rough Perl sketch follows the read-only steps) :
NORMAL READ-ONLY REQUEST
1) REQUEST FROM BROWSER
     * Request comes from browser to view object 12345
       (responding to this request may involve accessing 10 other
       objects)
2) PURGE OUTDATED LIVE OBJECTS
    * mod_perl process runs a query to look for the IDs of any objects
      that have been updated since the last time this
      query was run (last_modified_time).
    * Any object IDs returned by this query have their objects removed
      from the in-memory, process-specific mod_perl cache
3) REQUEST IS PROCESSED
    * Any objects required by this request are retrieved first from 
      the in-memory cache.
    * If they are not present,
          * the process looks in the serialised object cache in the
            database.
          * If not present there either,
                * the object is constructed from scratch from the
                  relational DB and stored in the serialised object
                  cache.
          * The retrieved object is stored in the in-memory live
            object cache.
4) TRIM LIVE OBJECT CACHE
    * Any live objects that are not in the 1000 most recently accessed 
      objects are deleted from the in-memory cache
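
In rough Perl terms, the read path (step 3) looks something like this -
a sketch only, with the table name, column names and build_from_scratch()
all invented for the example:

  use Storable qw(freeze thaw);

  our (%LIVE_CACHE, $dbh);   # per-process cache and a cached DBI handle

  sub fetch_object {
      my ($id) = @_;

      # First stop: the in-memory live object cache
      return $LIVE_CACHE{$id} if exists $LIVE_CACHE{$id};

      # Second stop: the serialised object cache in the database
      my ($frozen) = $dbh->selectrow_array(
          'SELECT data FROM object_cache WHERE id = ?', undef, $id);

      my $obj;
      if (defined $frozen) {
          $obj = thaw($frozen);
      }
      else {
          # Last resort: build from the relational DB (5-20 queries)
          # and store the frozen copy for everyone else
          $obj = build_from_scratch($id);
          $dbh->do('INSERT INTO object_cache (id, data) VALUES (?, ?)',
                   undef, $id, freeze($obj));
      }

      # Keep a live copy for later requests in this process
      $LIVE_CACHE{$id} = $obj;
      return $obj;
  }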


UPDATE REQUEST
Steps as above except (a sketch follows these steps) : 
3a) UPDATING OBJECT
     * Any objects that are modified 
         * are deleted from the serialised object cache in the DB
         * and are deleted from the in-memory cache for this mod_perl 
            process only
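
As a sketch (same invented names as above) :

  sub invalidate_object {
      my ($id) = @_;

      # Delete the serialised copy so every process has to rebuild it
      $dbh->do('DELETE FROM object_cache WHERE id = ?', undef, $id);

      # Drop the live copy in this process only - other processes
      # find out via the last_modified_time check in step 2
      delete $LIVE_CACHE{$id};
  }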


This means that at the start of every request, each process has access
to the most up to date versions of each object with a small (hopefully)
penalty to pay in the form of the query checking for last_modified_time.
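
The check itself is just one query per request - something like this
(a sketch; it assumes a last_modified column on the objects table):

  our $LAST_CHECK ||= '1970-01-01 00:00:00';   # epoch on first request

  my $ids = $dbh->selectcol_arrayref(
      'SELECT id FROM objects WHERE last_modified > ?',
      undef, $LAST_CHECK);
  delete @LIVE_CACHE{@$ids};

  # Use the DB clock rather than the web server's, to avoid skew
  $LAST_CHECK = $dbh->selectrow_array('SELECT CURRENT_TIMESTAMP');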

Does this sound reasonable, or is it overkill?

many thanks

Clinton Gormley


Re: Optimising cache performance

Posted by Perrin Harkins <pe...@elem.com>.
Clinton Gormley wrote:
> For now it's not a distributed system, and I have been using 
> Cache::FileCache.  But that still means freezing and thawing objects - 
> which I'm trying to minimise.

Other things (IPC::MM, MLDBM::Sync, Cache::Mmap, BerkeleyDB) are 
significantly faster than Cache::FileCache.  If you have tons of free 
memory, then go ahead and cache things in memory.  My feeling is that 
the very small amount of time the fastest of these systems spend freezing 
and thawing is more than made up for by the huge memory savings, which 
allow you to run more server processes.

> When you say that Cache::Mmap is only limited by the size of your disk, 
> is that because the file in memory gets written to disk as part of VM? ( 
> I don't see any other mention of files in the docs.) Which presumably 
> means resizing your VM to make space for the cache?

That's right, it uses your system's mmap() call.  I've never needed to 
adjust the amount of VM I have because of memory-mapping a file, but I 
suppose it could happen.  This would be a good question for the author 
of the module, or an expert on your system's mmap() implementation.

> I see the author of IPC::MM has an e-toys address - was this something 
> you used at e-toys?

It was used at one point, although not in the version of the system that 
I wrote about.  He originally wrote it as a wrapper around the mm 
library, and I asked if he could put in a shared hash just for fun.  It 
turned out to be very fast, largely because the sharing and the hash (or 
btree) are implemented in C.  The Perl part is just an interface to it.

> I know very little about shared memory segments, 
> but is MM used to share small data objects, rather than to keep large 
> caches in shared memory?

It's a shared hash.  You can put whatever you want into it.  Apache uses 
mm to share data between processes.
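
From memory, the interface looks roughly like this (check the IPC::MM 
docs for the exact names):

  use IPC::MM;
  use Storable qw(freeze);

  my $mm     = mm_create(65536, '/tmp/mm_lockfile');
  my $mmhash = mm_make_hash($mm);

  tie my %hash, 'IPC::MM::Hash', $mmhash;

  # Values are plain strings, so freeze objects before storing them
  $hash{'12345'} = freeze({ some => 'object' });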

> Ralph Engelschall writes in the MM documentation :
> "The maximum size of a continuous shared memory segment one can allocate 
> depends on the underlaying platform. This cannot be changed, of course. 
> But currently the high-level malloc(3)-style API just uses a single 
> shared memory segment as the underlaying data structure for an MM object 
> which means that the maximum amount of memory an MM object represents 
> also depends on the platform."
> 
> What implications does this have on the size of the cache that can be 
> created with IPC::MM?

It varies by platform, but I believe that on Linux it means each 
individual hash is limited to 64MB.  So maybe I spoke too soon about 
having unlimited storage, but you should be able to have as many hashes 
as you want.

If you're seriously concerned about storage limits like these, you could 
use one of the other options like MLDBM::Sync or BerkeleyDB, which use 
disk storage.
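
MLDBM::Sync, for example, is just a tied hash (option details from 
memory, so treat this as approximate):

  use MLDBM::Sync;
  use MLDBM qw(DB_File Storable);   # DB_File backend, Storable serialisation
  use Fcntl qw(:DEFAULT);

  tie my %cache, 'MLDBM::Sync', '/tmp/object_cache', O_CREAT|O_RDWR, 0640;

  $cache{'12345'} = { some => 'object' };   # frozen and thawed for you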

- Perrin


Re: Optimising cache performance

Posted by Clinton Gormley <cl...@drtech.co.uk>.
Thanks for your feedback - a couple more questions


> First, I'm assuming this is for a distributed system running on multiple 
> servers.  If not, you should just download one of the cache modules from 
> CPAN.  They're good.
> 

For now it's not a distributed system, and I have been using
Cache::FileCache.  But that still means freezing and thawing objects -
which I'm trying to minimise.

> I suggest you use either Cache::Mmap or IPC::MM for your local cache. 
> They are both very fast and will save you memory.  Also, Cache::Mmap is 
> only limited by the size of your disk, so you don't have to do any purging.
> 

When you say that Cache::Mmap is only limited by the size of your disk,
is that because the file in memory gets written to disk as part of VM?
(I don't see any other mention of files in the docs.) Presumably that
means resizing your VM to make space for the cache?

> You seem to be taking a lot of care to ensure that everything always has 
> the latest version of the data.  If you can handle slightly out-of-date 

Call me anal ;) Most of the time it wouldn't really matter, but
sometimes it could be extremely off-putting.

> If everything really does have to be 100% up-to-date, then what you're 
> doing is reasonable.  It would be nice to not do the step that checks 
> for outdated objects before processing the request, but instead do it in 
> a cleanup handler, although that could lead to stale data being used now 
> and then.

Yes - I had considered that.

> If you were using a shared cache like Cache::Mmap, you could have a cron 
> job or a separate Perl daemon that simply purges outdated objects every 
> minute or so, and leave that out of your mod_perl code completely.
> 

I see the author of IPC::MM has an e-toys address - was this something
you used at e-toys?  I know very little about shared memory segments,
but is MM used to share small data objects, rather than to keep large
caches in shared memory?

Ralph Engelschall writes in the MM documentation : 
"The maximum size of a continuous shared memory segment one can allocate
depends on the underlaying platform. This cannot be changed, of course.
But currently the high-level malloc(3)-style API just uses a single
shared memory segment as the underlaying data structure for an MM object
which means that the maximum amount of memory an MM object represents
also depends on the platform."

What implications does this have on the size of the cache that can be
created with IPC::MM?


thanks

Clinton Gormley

Re: Optimising cache performance

Posted by Cory 'G' Watson <gp...@cafes.net>.
On Friday, March 7, 2003, at 02:20  PM, Perrin Harkins wrote:

> Cory 'G' Watson wrote:
>> I'm not sure if my way would fit in with your objects Clinton, but I 
>> have some code in the commit() method of all my objects which, when 
>> it is called, removes any cached copies of the object.  That's how I 
>> stay up to date.
>
> Why wouldn't it simply update the version in the cache when you 
> commit?  Also, do you have a way of synchronizing changes across 
> multiple machines?

I suppose it could, but I use it as a poor man's cache cleaning.  I 
suppose it would boost performance to do what you suggest.  I'll just 
implement a cache cleaner elsewhere.

I only run on one machine, so I don't do any synchronization.  I hope 
to have that problem some day ;)

Cory 'G' Watson
http://gcdb.spleck.net


Re: Optimising cache performance

Posted by Perrin Harkins <pe...@elem.com>.
Cory 'G' Watson wrote:
> I'm not sure if my way would fit in with your objects Clinton, but I 
> have some code in the commit() method of all my objects which, when it 
> is called, removes any cached copies of the object.  That's how I stay 
> up to date.

Why wouldn't it simply update the version in the cache when you commit?
Also, do you have a way of synchronizing changes across multiple machines?

- Perrin


Re: Optimising cache performance

Posted by Cory 'G' Watson <gp...@cafes.net>.
On Friday, March 7, 2003, at 12:45  PM, Perrin Harkins wrote:
> You seem to be taking a lot of care to ensure that everything always 
> has the latest version of the data.  If you can handle slightly 
> out-of-date data, I would suggest that you simply keep objects in the 
> local cache with a time-to-live (which Cache::Mmap or Cache::FileCache 
> can do for you) and just look at the local version until it expires.  
> You would end up building the objects once per server, but that isn't 
> so bad.

I'm not sure if my way would fit in with your objects Clinton, but I 
have some code in the commit() method of all my objects which, when it 
is called, removes any cached copies of the object.  That's how I stay 
up to date.
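
It's nothing fancy - roughly this, with the method names made up for 
the example and a Cache::Cache-style remove():

  sub commit {
      my $self = shift;

      $self->_save_to_db;                # write the changes out

      # Drop any cached copy; the next read rebuilds and re-caches it
      $self->cache->remove($self->id);
  }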

Cory 'G' Watson
http://gcdb.spleck.net


Re: Optimising cache performance

Posted by Perrin Harkins <pe...@elem.com>.
Clinton Gormley wrote:
> I'd appreciate some feedback on my logic to optimise my cache (under 
> mod_perl 1)

First, I'm assuming this is for a distributed system running on multiple 
servers.  If not, you should just download one of the cache modules from 
CPAN.  They're good.

> I'm planning a two level cache :
>     1) Live objects in each mod_perl process
>     2) Serialised objects in a database

I suggest you use either Cache::Mmap or IPC::MM for your local cache. 
They are both very fast and will save you memory.  Also, Cache::Mmap is 
only limited by the size of your disk, so you don't have to do any purging.
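
Usage is roughly like this (from memory, so check the docs for the 
exact option names):

  use Cache::Mmap;

  my $cache = Cache::Mmap->new('/var/cache/objects.cmm', {
      buckets    => 256,      # number of buckets in the file
      bucketsize => 65536,    # bytes per bucket
  });

  $cache->write('12345', $object);    # serialised with Storable for you
  my $obj = $cache->read('12345');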

You seem to be taking a lot of care to ensure that everything always has 
the latest version of the data.  If you can handle slightly out-of-date 
data, I would suggest that you simply keep objects in the local cache 
with a time-to-live (which Cache::Mmap or Cache::FileCache can do for 
you) and just look at the local version until it expires.  You would end 
up building the objects once per server, but that isn't so bad.
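
With Cache::FileCache, which you're already using, that's just:

  use Cache::FileCache;

  my $cache = Cache::FileCache->new({
      namespace          => 'objects',
      default_expires_in => 300,    # whatever staleness you can live with
  });

  $cache->set('12345', $object);    # or set('12345', $object, '5 minutes')
  my $obj = $cache->get('12345');   # undef once expired - rebuild and re-set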

If everything really does have to be 100% up-to-date, then what you're 
doing is reasonable.  It would be nice to not do the step that checks 
for outdated objects before processing the request, but instead do it in 
a cleanup handler, although that could lead to stale data being used now 
and then.
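
In mod_perl 1 that's just a register_cleanup() call - a sketch, with 
purge_outdated_objects() standing in for your step 2:

  use Apache::Constants qw(OK);

  sub handler {
      my $r = shift;
      # ... serve the request from the cache as usual ...

      # Runs after the response has gone out to the client
      $r->register_cleanup(\&purge_outdated_objects);
      return OK;
  }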

If you were using a shared cache like Cache::Mmap, you could have a cron 
job or a separate Perl daemon that simply purges outdated objects every 
minute or so, and leave that out of your mod_perl code completely.
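
The daemon version is just a loop around the same purge - a sketch, 
assuming Cache::Mmap's delete() and with ids_modified_since_last_check() 
standing in for your last_modified_time query:

  #!/usr/bin/perl
  use strict;
  use Cache::Mmap;

  my $cache = Cache::Mmap->new('/var/cache/objects.cmm', { buckets => 256 });

  while (1) {
      $cache->delete($_) for ids_modified_since_last_check();
      sleep 60;
  }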

Yet another way to handle a distributed cache is to have each write to 
the cache send updates to the other caches using something like 
Spread::Queue.  This is a bit more complex, but it means you don't need 
a second tier in your cache to share updates.

- Perrin