Posted to modperl@perl.apache.org by Clinton Gormley <cl...@drtech.co.uk> on 2003/03/07 18:11:23 UTC
Optimising cache performance
I'd appreciate some feedback on my logic to optimise my cache (under
mod_perl 1)
I'm building a site which will have a large number of fairly complicated
objects (each of which would require 5-20 queries to build from scratch)
which are read frequently and updated relatively seldom.
I'm planning a two level cache :
1) Live objects in each mod_perl process
2) Serialised objects in a database
The logic goes as follows :
NORMAL READ-ONLY REQUEST
1) REQUEST FROM BROWSER
* Request comes from the browser to view object 12345
(responding to this request may involve accessing 10 other
objects)
2) PURGE OUTDATED LIVE OBJECTS
* The mod_perl process runs a query to look for the IDs of any objects
that have been updated since the last time this
query was run (last_modified_time).
* Any object IDs returned by this request have their objects removed
from the in-memory mod_perl process specific cache
3) REQUEST IS PROCESSED
* Any objects required by this request are retrieved first from
the in-memory cache.
* If they are not present,
* the process looks in the serialised object cache in the
database.
* If not present there either,
* the object is constructed from scratch from the relational
DB and stored in the serialised object cache
* the retrieved object is stored in the in-memory live object
cache
4) TRIM LIVE OBJECT CACHE
* Any live objects that are not in the 1000 most recently accessed
objects are deleted from the in-memory cache
UPDATE REQUEST
Steps as above except :
3a) UPDATING OBJECT
* Any objects that are modified
* are deleted from the serialised object cache in the DB
* and are deleted from the in-memory cache for this mod_perl
process only
This means that at the start of every request, each process has access
to the most up-to-date version of each object, with a small (hopefully)
penalty to pay in the form of the query checking for last_modified_time.
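To make the read path concrete, it would look something like this (just
a sketch - the Cache::FileCache backend, the table and column names, and
build_object() are placeholders for whatever I actually end up using):

  use DBI;
  use Cache::FileCache;

  # per-process live object cache (one per mod_perl child)
  our %LIVE_CACHE;
  our $LAST_CHECK = time;

  # second-level cache of serialised objects (placeholder backend)
  my $frozen_cache = Cache::FileCache->new({ namespace => 'objects' });

  # step 2: purge live objects updated since we last looked
  sub purge_outdated {
      my ($dbh) = @_;
      my $ids = $dbh->selectcol_arrayref(
          'SELECT id FROM objects WHERE last_modified_time > ?',
          undef, $LAST_CHECK
      );
      delete @LIVE_CACHE{@$ids};
      $LAST_CHECK = time;
  }

  # step 3: two-level lookup, falling back to a full rebuild
  sub fetch_object {
      my ($dbh, $id) = @_;

      return $LIVE_CACHE{$id} if exists $LIVE_CACHE{$id};

      my $obj = $frozen_cache->get($id);      # serialised object cache
      unless ($obj) {
          $obj = build_object($dbh, $id);     # the 5-20 queries
          $frozen_cache->set($id, $obj);
      }
      return $LIVE_CACHE{$id} = $obj;
  }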
Does this sound reasonable, or is it overkill?
many thanks
Clinton Gormley
Re: Optimising cache performance
Posted by Perrin Harkins <pe...@elem.com>.
Clinton Gormley wrote:
> For now it's not a distributed system, and I have been using
> Cache::FileCache. But that still means freezing and thawing objects -
> which I'm trying to minimise.
Other things (IPC::MM, MLDBM::Sync, Cache::Mmap, BerkeleyDB) are
significantly faster than Cache::FileCache. If you have tons of free
memory, then go ahead and cache things in memory. My feeling is that
the very small amount of time that the fastest of these systems use to
freeze and thaw is totally made up for by the huge memory savings, which
allow you to run more server processes.
> When you say that Cache::Mmap is only limited by the size of your disk,
> is that because the file in memory gets written to disk as part of VM? (
> I don't see any other mention of files in the docs.) Which presumably
> means resizing your VM to make space for the cache?
That's right, it uses your system's mmap() call. I've never needed to
adjust the amount of VM I have because of memory-mapping a file, but I
suppose it could happen. This would be a good question for the author
of the module, or an expert on your system's mmap() implementation.
> I see the author of IPC::MM has an e-toys address - was this something
> you used at e-toys?
It was used at one point, although not in the version of the system that
I wrote about. He originally wrote it as a wrapper around the mm
library, and I asked if he could put in a shared hash just for fun. It
turned out to be very fast, largely because the sharing and the hash (or
btree) are implemented in C. The Perl part is just an interface to it.
> I know very little about shared memory segments,
> but is MM used to share small data objects, rather than to keep large
> caches in shared memory?
It's a shared hash. You can put whatever you want into it. Apache uses
mm to share data between processes.
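From memory, using it looks roughly like this (check the IPC::MM docs for
the exact function and tie-class names - values are plain strings, so you
still freeze/thaw complex objects yourself, e.g. with Storable):

  use IPC::MM;
  use Storable qw(freeze thaw);

  # a shared memory pool, and a shared hash living inside it
  my $mm     = IPC::MM::mm_create(65536, '/tmp/mm_lockfile');
  my $mmhash = IPC::MM::mm_make_hash($mm);

  tie my %cache, 'IPC::MM::Hash', $mmhash;

  my $object = { id => 12345, name => 'example' };   # whatever you want to share

  # serialise complex objects before storing them
  $cache{'object:12345'} = freeze($object);
  my $copy = thaw($cache{'object:12345'});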
> Ralph Engelschall writes in the MM documentation :
> "The maximum size of a continuous shared memory segment one can allocate
> depends on the underlaying platform. This cannot be changed, of course.
> But currently the high-level malloc(3)-style API just uses a single
> shared memory segment as the underlaying data structure for an MM object
> which means that the maximum amount of memory an MM object represents
> also depends on the platform."
>
> What implications does this have for the size of the cache that can be
> created with IPC::MM?
It varies by platform, but I believe that on Linux it means each
individual hash is limited to 64MB. So maybe I spoke too soon about
having unlimited storage, but you should be able to have as many hashes
as you want.
If you're seriously concerned about storage limits like these, you could
use one of the other options like MLDBM::Sync or BerkeleyDB, which use
disk storage.
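With MLDBM::Sync, for example, that's roughly this (a sketch only; the
DBM backend, file path, and stored object are placeholders):

  use Fcntl qw(:DEFAULT);
  use MLDBM qw(DB_File Storable);   # store nested Perl structures via Storable
  use MLDBM::Sync;

  # disk-backed, shared between processes, with locking handled for you
  tie my %cache, 'MLDBM::Sync', '/tmp/object_cache.dbm', O_CREAT|O_RDWR, 0640
      or die "tie failed: $!";

  my $object = { id => 12345 };     # placeholder object
  $cache{12345} = $object;          # frozen to disk behind the scenes
  my $copy = $cache{12345};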
- Perrin
Re: Optimising cache performance
Posted by Clinton Gormley <cl...@drtech.co.uk>.
Thanks for your feedback - a couple more questions
> First, I'm assuming this is for a distributed system running on multiple
> servers. If not, you should just download one of the cache modules from
> CPAN. They're good.
>
For now it's not a distributed system, and I have been using
Cache::FileCache. But that still means freezing and thawing objects -
which I'm trying to minimise.
> I suggest you use either Cache::Mmap or IPC::MM for your local cache.
> They are both very fast and will save you memory. Also, Cache::Mmap is
> only limited by the size of your disk, so you don't have to do any purging.
>
When you say that Cache::Mmap is only limited by the size of your disk,
is that because the file in memory gets written to disk as part of VM? (
I don't see any other mention of files in the docs.) Which presumably
means resizing your VM to make space for the cache?
> You seem to be taking a lot of care to ensure that everything always has
> the latest version of the data. If you can handle slightly out-of-date
Call me anal ;) Most of the time it wouldn't really matter, but
sometimes it could be extremely off-putting.
> If everything really does have to be 100% up-to-date, then what you're
> doing is reasonable. It would be nice to not do the step that checks
> for outdated objects before processing the request, but instead do it in
> a cleanup handler, although that could lead to stale data being used now
> and then.
Yes - had considered that.
> If you were using a shared cache like Cache::Mmap, you could have a cron
> job or a separate Perl daemon that simply purges outdated objects every
> minute or so, and leave that out of your mod_perl code completely.
>
I see the author of IPC::MM has an e-toys address - was this something
you used at e-toys? I know very little about shared memory segments,
but is MM used to share small data objects, rather than to keep large
caches in shared memory?
Ralph Engelschall writes in the MM documentation :
"The maximum size of a continuous shared memory segment one can allocate
depends on the underlaying platform. This cannot be changed, of course.
But currently the high-level malloc(3)-style API just uses a single
shared memory segment as the underlaying data structure for an MM object
which means that the maximum amount of memory an MM object represents
also depends on the platform."
What implications does this have for the size of the cache that can be
created with IPC::MM?
thanks
Clinton Gormley
Re: Optimising cache performance
Posted by Cory 'G' Watson <gp...@cafes.net>.
On Friday, March 7, 2003, at 02:20 PM, Perrin Harkins wrote:
> Cory 'G' Watson wrote:
>> I'm not sure if my way would fit in with your objects Clinton, but I
>> have some code in the commit() method of all my objects which, when
>> it is called, removes any cached copies of the object. That's how I
>> stay up to date.
>
> Why wouldn't it simply update the version in the cache when you
> commit? Also, do you have a way of synchronizing changes across
> multiple machines?
I suppose it could, but I use it as a poor man's cache cleaning. I
suppose it would boost performance to do what you suggest. I'll just
implement a cache cleaner elsewhere.
I only run on one machine, so I don't do any synchronization. I hope
to have that problem some day ;)
Cory 'G' Watson
http://gcdb.spleck.net
Re: Optimising cache performance
Posted by Perrin Harkins <pe...@elem.com>.
Cory 'G' Watson wrote:
> I'm not sure if my way would fit in with your objects Clinton, but I
> have some code in the commit() method of all my objects which, when it
> is called, removes any cached copies of the object. That's how I stay
> up to date.
Why wouldn't it simply update the version in the cache when you commit?
Also, do you have a way of synchronizing changes across multiple machines?
- Perrin
Re: Optimising cache performance
Posted by Cory 'G' Watson <gp...@cafes.net>.
On Friday, March 7, 2003, at 12:45 PM, Perrin Harkins wrote:
> You seem to be taking a lot of care to ensure that everything always
> has the latest version of the data. If you can handle slightly
> out-of-date data, I would suggest that you simply keep objects in the
> local cache with a time-to-live (which Cache::Mmap or Cache::FileCache
> can do for you) and just look at the local version until it expires.
> You would end up building the objects once per server, but that isn't
> so bad.
I'm not sure if my way would fit in with your objects Clinton, but I
have some code in the commit() method of all my objects which, when it
is called, removes any cached copies of the object. That's how I stay
up to date.
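Roughly speaking, something along these lines (a sketch only - the save
method and the cache handle are placeholders):

  # commit() writes the object out and then drops any cached copy,
  # so the next read has to fetch a fresh version
  sub commit {
      my ($self) = @_;

      $self->_save_to_db;                       # placeholder for the real write
      $self->cache->remove($self->cache_key);   # e.g. a Cache::FileCache handle

      return $self;
  }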
Cory 'G' Watson
http://gcdb.spleck.net
Re: Optimising cache performance
Posted by Perrin Harkins <pe...@elem.com>.
Clinton Gormley wrote:
> I'd appreciate some feedback on my logic to optimise my cache (under
> mod_perl 1)
First, I'm assuming this is for a distributed system running on multiple
servers. If not, you should just download one of the cache modules from
CPAN. They're good.
> I'm planning a two level cache :
> 1) Live objects in each mod_perl process
> 2) Serialised objects in a database
I suggest you use either Cache::Mmap or IPC::MM for your local cache.
They are both very fast and will save you memory. Also, Cache::Mmap is
only limited by the size of your disk, so you don't have to do any purging.
You seem to be taking a lot of care to ensure that everything always has
the latest version of the data. If you can handle slightly out-of-date
data, I would suggest that you simply keep objects in the local cache
with a time-to-live (which Cache::Mmap or Cache::FileCache can do for
you) and just look at the local version until it expires. You would end
up building the objects once per server, but that isn't so bad.
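With Cache::FileCache, for instance, that's just a matter of setting an
expiry time when you store the object (a sketch; the names and the
5-minute TTL are only examples):

  use Cache::FileCache;

  my $cache = Cache::FileCache->new({
      namespace          => 'objects',
      default_expires_in => 300,        # objects may be up to 5 minutes stale
  });

  sub fetch_object {
      my ($id) = @_;
      my $obj = $cache->get($id);
      unless ($obj) {
          $obj = build_object($id);     # the expensive rebuild
          $cache->set($id, $obj);       # uses the 5-minute default TTL
      }
      return $obj;
  }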
If everything really does have to be 100% up-to-date, then what you're
doing is reasonable. It would be nice to not do the step that checks
for outdated objects before processing the request, but instead do it in
a cleanup handler, although that could lead to stale data being used now
and then.
If you were using a shared cache like Cache::Mmap, you could have a cron
job or a separate Perl daemon that simply purges outdated objects every
minute or so, and leave that out of your mod_perl code completely.
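The daemon itself could be a tiny loop along these lines (a sketch - the
DSN, query, and cache handle are placeholders for whatever the servers
actually share; Cache::FileCache stands in for the shared cache here):

  #!/usr/bin/perl
  use strict;
  use DBI;
  use Cache::FileCache;

  my $dbh   = DBI->connect('dbi:mysql:mysite', 'user', 'pass',
                           { RaiseError => 1 });
  my $cache = Cache::FileCache->new({ namespace => 'objects' });

  # every minute, drop anything modified since the last pass
  my $last_check = time;
  while (1) {
      my $ids = $dbh->selectcol_arrayref(
          'SELECT id FROM objects WHERE last_modified_time > ?',
          undef, $last_check
      );
      $last_check = time;
      $cache->remove($_) for @$ids;
      sleep 60;
  }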
Yet another way to handle a distributed cache is to have each write to
the cache send updates to the other caches using something like
Spread::Queue. This is a bit more complex, but it means you don't need
a second tier in your cache to share updates.
- Perrin