Posted to modperl@perl.apache.org by Bill Whillers <mo...@whohasit.com> on 2005/07/06 23:01:24 UTC

MLDBM::Sync / BerkeleyDB

Instead of using Apache::Reload to monitor periodically re-cached flat files 
of session configuration data, I'm considering implementing MLDBM::Sync or 
BerkeleyDB.

The amount of data needing to be loaded during each session is relatively 
small and variable (10-25k) but typically, these configurations are used to 
load hashes, arrays and other types. --- (surprise, right?)

For high to very high volume sites, can anyone suggest a best approach, or 
share feedback or case study results on either of these (or other) approaches?


Thanks in advance!

Bill

Re: MLDBM::Sync / BerkeleyDB

Posted by Perrin Harkins <pe...@elem.com>.
On Fri, 2005-07-08 at 12:35 -0700, Bill Whillers wrote:
>  1. What do you consider "simple hash-like operations"?  - The target 
> application currently stores single hashes, lists, etc. as 1 per column with 
> 1 row per user (i.e. "select * from tbl where u=user") - RO's sit within a 
> single column.

I tested it using SQL like "SELECT value FROM table WHERE key = ?",
using DBI best practices like prepare_cached() and bind_columns().  The
contents of the value column were a hash serialized with Storable.
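
A minimal sketch of that kind of lookup, for illustration only. The table
and column names (session_config, user_key, value) and the helper names
load_config and freeze_config are hypothetical, and $dbh is assumed to be
an already-connected DBI handle. The column binding uses bind_columns(),
which is the DBI method for binding fetch output to variables.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Hypothetical table and column names; the thread does not name them.
my $SELECT_SQL = 'SELECT value FROM session_config WHERE user_key = ?';

# Serialize a config structure for storage in the value column.
# nfreeze() writes network byte order, so any app server can thaw it.
sub freeze_config {
    my ($config) = @_;
    return nfreeze($config);
}

# Fetch and deserialize one user's configuration.
# $dbh is assumed to be a connected DBI database handle.
sub load_config {
    my ($dbh, $user_key) = @_;

    # prepare_cached() reuses the statement handle on later calls
    my $sth = $dbh->prepare_cached($SELECT_SQL);
    $sth->execute($user_key);

    # bind_columns() makes fetch() write the column into $frozen
    my $frozen;
    $sth->bind_columns(\$frozen);
    my $found = $sth->fetch;
    $sth->finish;    # release the cached handle for the next caller

    return unless $found && defined $frozen;
    return thaw($frozen);
}
```

Called as "my $config = load_config($dbh, $user);", this returns the
original hash reference, or undef when no row matches.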

>  2. With combined application/web servers balanced and moving the db to a 
> locally networked location is this still at least roughly the case?

Yes, still pretty fast and faster than every CPAN cache/IPC module
except Cache::FastMmap, BerkeleyDB, and IPC::MM.

- Perrin


Re: MLDBM::Sync / BerkeleyDB

Posted by Bill Whillers <mo...@whohasit.com>.
This is enormously helpful, especially after a few days of reading often 
conflicting information.

I remember optimistically reading this statement you made elsewhere:

> Incidentally, a local MySQL server doing simple hash-like operations

 1. What do you consider "simple hash-like operations"?  - The target 
application currently stores single hashes, lists, etc. as 1 per column with 
1 row per user (i.e. "select * from tbl where u=user") - RO's sit within a 
single column.
 2. With combined application/web servers balanced and moving the db to a 
locally networked location is this still at least roughly the case? 


Thank you!


On Friday 08 July 2005 09:07, Perrin Harkins wrote:
> I meant to respond to this one earlier, and your Storable question
> reminded me.
>
> On Wed, 2005-07-06 at 14:01 -0700, Bill Whillers wrote:
> > Instead of using Apache::Reload to monitor periodically re-cached flat
> > files of session configuration data, I'm considering implementing
> > MLDBM::Sync or BerkeleyDB.
> >
> > The amount of data needing to be loaded during each session is relatively
> > small and variable (10-25k) but typically, these configurations are used
> > to load hashes, arrays and other types. --- (surprise, right?)
>
> You should probably take a look at this:
> http://cpan.robm.fastmail.fm/cache_perf.html
>
> MLDBM::Sync is pretty good, and solid.  BerkeleyDB is very fast, and
> I've had good luck with it, but some people have data corruption issues
> with it, so consider how valuable this data is and test carefully.
> Cache::FastMmap is very fast, but is a lossy cache -- it will drop
> things if the cache gets too big.  That's probably not appropriate for
> session data unless you know for sure how big your data will be.
>
> Incidentally, a local MySQL server doing simple hash-like operations is
> faster than most of the cache modules on CPAN, including MLDBM::Sync,
> but not BerkeleyDB or Cache::FastMmap.  You should consider that before
> going to a lot of trouble with one of these other modules.
>
> - Perrin

Re: MLDBM::Sync / BerkeleyDB

Posted by Perrin Harkins <pe...@elem.com>.
I meant to respond to this one earlier, and your Storable question
reminded me.

On Wed, 2005-07-06 at 14:01 -0700, Bill Whillers wrote:
> Instead of using Apache::Reload to monitor periodically re-cached flat files 
> of session configuration data, I'm considering implementing MLDBM::Sync or 
> BerkeleyDB.
> 
> The amount of data needing to be loaded during each session is relatively 
> small and variable (10-25k) but typically, these configurations are used to 
> load hashes, arrays and other types. --- (surprise, right?)

You should probably take a look at this:
http://cpan.robm.fastmail.fm/cache_perf.html

MLDBM::Sync is pretty good, and solid.  BerkeleyDB is very fast, and
I've had good luck with it, but some people have data corruption issues
with it, so consider how valuable this data is and test carefully.
Cache::FastMmap is very fast, but is a lossy cache -- it will drop
things if the cache gets too big.  That's probably not appropriate for
session data unless you know for sure how big your data will be.
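
For reference, a rough sketch of what the MLDBM::Sync option can look
like. The file location and the sample data are made up for illustration.
The MLDBM::Sync::SDBM_File backend is chosen here on the assumption that
values will exceed plain SDBM_File's roughly 1K limit, which 10-25k
configurations would.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT);          # O_CREAT and O_RDWR
use File::Temp qw(tempdir);
use MLDBM::Sync;
use MLDBM qw(MLDBM::Sync::SDBM_File Storable);

# Hypothetical location; a real site would use a fixed shared path.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/session_config";

# Each read or write through the tie is locked implicitly.
tie my %config, 'MLDBM::Sync', $file, O_CREAT | O_RDWR, 0640
    or die "Cannot tie $file: $!";

# Storing a reference serializes the whole structure with Storable.
$config{bill} = { theme => 'dark', recent => [ 1, 2, 3 ] };

# Nested data cannot be changed in place through an MLDBM tie:
# fetch a copy, modify it, then store it back.
my $entry = $config{bill};
$entry->{theme} = 'light';
$config{bill} = $entry;

my $theme = $config{bill}{theme};
```

The fetch, modify, store pattern matters because an in-place assignment
such as $config{bill}{theme} = 'light' only changes a temporary copy and
is never written back.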

Incidentally, a local MySQL server doing simple hash-like operations is
faster than most of the cache modules on CPAN, including MLDBM::Sync,
but not BerkeleyDB or Cache::FastMmap.  You should consider that before
going to a lot of trouble with one of these other modules.

- Perrin