Posted to modperl@perl.apache.org by "Differentiated Software Solutions Pvt. Ltd" <di...@vsnl.com> on 2000/11/08 12:39:34 UTC

Re: Fast DB access

Hi,

We are returning after extensive tests of various options suggested.

First, we are not entering into the debate about whether a well-designed
database can handle lots of queries. Assume that we have an app (an
adserver) which databases don't support well, i.e., one with fairly complex
queries that have to be serviced quickly.

Some of the things we've found are:
1. DBD::RAM is quite slow. We presume this is because the SQL statements
have to be parsed every time we make a request.
2. Building the entire DB into a hash variable inside the mod_perl program
is the fastest. We found it to be 25 times faster than querying a
postgres database.
3. We have a problem rebuilding this database in RAM, even if we do it only
every 1000 requests. We tried dbm and found it a good compromise:
it is about 8 times faster than querying postgres.
4. Another surprising finding: we built a denormalised db on the
Linux file system itself, using the directory and file name as the key on
which we wanted to search. We found that dbm was faster than this.
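
A minimal sketch of the two lookup styles compared in items 2 and 3, written in Python purely for illustration (the tests above were run under mod_perl). The record names, the `ADS` table and the helper functions are hypothetical, and nothing here reproduces the timings reported above.

```python
import dbm

# Hypothetical ad records keyed by a lookup string (illustrative only).
ADS = {"sports:banner": "ad-101", "news:banner": "ad-202"}

def lookup_hash(key):
    """Item 2: the whole table lives in process memory as a hash."""
    return ADS.get(key)

def build_dbm(path, records):
    """Item 3: write every record into a freshly created dbm file."""
    with dbm.open(path, "n") as db:
        for key, value in records.items():
            db[key.encode("utf-8")] = value.encode("utf-8")

def lookup_dbm(path, key):
    """Item 3: open the dbm file read-only and fetch a single key."""
    with dbm.open(path, "r") as db:
        try:
            return db[key.encode("utf-8")].decode("utf-8")
        except KeyError:
            return None
```

The hash lookup never leaves the process, while the dbm lookup reads from a file that a separate job can rebuild without the server reloading the whole table.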

We're carrying out more tests to see how scalable dbm is. We hope these
findings are useful to others.

Thanks for all the help.

Murali
Differentiated Software Solutions Pvt. Ltd.
176, Ground Floor, 6th Main,
2nd Block, RT Nagar
Bangalore - 560032
Phone : 91 80 3431470
www.diffs-india.com
----- Original Message -----
From: Francesc Guasch <fr...@etsetb.upc.es>
To: Differentiated Software Solutions Pvt. Ltd <di...@vsnl.com>
Cc: <mo...@apache.org>
Sent: Wednesday, October 11, 2000 1:56 PM
Subject: Re: Fast DB access

> > "Differentiated Software Solutions Pvt. Ltd" wrote:
> >
> > Hi,
> >
> > We have an application where we will have to service as high as 50
> > queries a second.
> > We've discovered that most databases just cannot keep pace.
> >
> > The only option we know is to service queries out of flat files.
>
> There is a DBD module: DBD::RAM. If you have enough memory
> or there is not much data, it could be what you need.
>
> I have also recently seen a post about a new DBD module for
> CSV files, in addition to DBD::CSV. Try
>
> http://search.cpan.org
>
> --
>  - frankie -

Re: Fast DB access

Posted by "Differentiated Software Solutions Pvt. Ltd" <di...@vsnl.com>.

Dear Tim,

As you rightly pointed out, our data is not volatile. It gets
updated once an hour by another process (a cron job). Concurrency is not
really an issue, because the web processes never update the data.

We're now continuing our benchmarks on scaling, basically to find the point
at which dbm degrades. We are increasing the number of entries in the dbm
file to see when it breaks.

Murali
----- Original Message -----
From: Tim Sweetman <ti...@aldigital.co.uk>
To: Differentiated Software Solutions Pvt. Ltd <di...@vsnl.com>
Cc: <mo...@apache.org>
Sent: Thursday, November 09, 2000 8:59 PM
Subject: Re: Fast DB access


> Hi,
>
> Firstly, thanks for bringing these results back to the mailing list...
> having seen this sort of problem previously, but without (IIRC) having
> done side-by-side comparisons between these various techniques, I'm keen
> to see what you find.
>
> "Differentiated Software Solutions Pvt. Ltd" wrote:
> > 2. Building the entire DB into a hash variable inside the mod_perl program
> > is the fastest.... we found it to be 25 times faster than querying a
> > postgres database !!
> > 3. We have a problem rebuilding this database in the ram.... even say every
> > 1000 requests. We tried using dbm and found it a good compromise solution.
> > We found that it is about 8 times faster than postgres querying.
>
> I assume from this that your data changes, but slowly, and you're
> getting better performance by accepting that your data be slightly out
> of date.
>
> > 4. Another surprising finding was.... we built a denormalised db on the
> > Linux file system itself, by using the directory and file name as the key on
> > which we wanted to search. We found that dbm was faster than this.
>
> I'm curious about how you're dealing with the concurrency aspect with
> solutions 2-3. My guess is that, for 2, you're simply storing a hash in
> the memory, which means that each Apache child has its own copy. There
> will, every 1000 requests in that child, be the overhead of querying the
> DB & rebuilding the hash.
>
> 3 presumably means having only _one_ DBMfile. Do the CGI/mod-Perl
> processes rebuild this periodically, or is this done offline by another
> process? Do the CGI/mod-Perl processes have to wait while writes are
> going on?
>
> Cheers
>
> --
> Tim Sweetman
> A L Digital
> ---- moving sideways --->

Re: Fast DB access

Posted by Tim Sweetman <ti...@aldigital.co.uk>.

Hi,

Firstly, thanks for bringing these results back to the mailing list...
having seen this sort of problem previously, but without (IIRC) having
done side-by-side comparisons between these various techniques, I'm keen
to see what you find.

"Differentiated Software Solutions Pvt. Ltd" wrote:
> 2. Building the entire DB into a hash variable inside the mod_perl program
> is the fastest.... we found it to be 25 times faster than querying a
> postgres database !!
> 3. We have a problem rebuilding this database in the ram.... even say every
> 1000 requests. We tried using dbm and found it a good compromise solution.
> We found that it is about 8 times faster than postgres querying.

I assume from this that your data changes, but slowly, and you're
getting better performance by accepting that your data be slightly out
of date.

> 4. Another surprising finding was.... we built a denormalised db on the
> Linux file system itself, by using the directory and file name as the key on
> which we wanted to search. We found that dbm was faster than this.

I'm curious about how you're dealing with the concurrency aspect with
solutions 2-3. My guess is that, for 2, you're simply storing a hash in
the memory, which means that each Apache child has its own copy. There
will, every 1000 requests in that child, be the overhead of querying the
DB & rebuilding the hash.
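
A rough sketch of the per-child pattern guessed at here, in Python for illustration only (the thread itself concerns mod_perl). `CachedTable`, `loader` and `rebuild_every` are hypothetical names; the loader stands for whatever queries the database and returns the rebuilt hash.

```python
REBUILD_EVERY = 1000  # requests served before this process reloads its copy

class CachedTable:
    """Per-process copy of the lookup table, reloaded every N lookups."""

    def __init__(self, loader, rebuild_every=REBUILD_EVERY):
        self._loader = loader            # callable that returns a fresh dict
        self._rebuild_every = rebuild_every
        self._served = 0
        self._table = loader()           # initial load when the process starts

    def lookup(self, key):
        # Pay the rebuild cost once the request budget has been used up.
        if self._served >= self._rebuild_every:
            self._table = self._loader()
            self._served = 0
        self._served += 1
        return self._table.get(key)
```

Each server process would hold its own `CachedTable`, so the rebuild cost is paid once per process per interval rather than once overall.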

3 presumably means having only _one_ DBMfile. Do the CGI/mod-Perl
processes rebuild this periodically, or is this done offline by another
process? Do the CGI/mod-Perl processes have to wait while writes are
going on?

Cheers

--
Tim Sweetman
A L Digital
---- moving sideways --->

Re: Fast DB access

Posted by Perrin Harkins <pe...@primenet.com>.

On Thu, 9 Nov 2000, Differentiated Software Solutions Pvt. Ltd wrote:
> When we rebuild the hash in the RAM it takes too much time.

Did you try using Storable as the data format?  It has a function to load
from files which is very fast.
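
As an illustration of the idea, here is a small Python sketch that uses `pickle` as a stand-in for Storable (it is not Storable itself): an offline job dumps the whole table to one file, and each server process loads it back in a single call instead of rebuilding it row by row. The function names are hypothetical.

```python
import pickle

def save_table(path, table):
    """Dump the whole lookup table to one file (e.g. from the offline job)."""
    with open(path, "wb") as fh:
        pickle.dump(table, fh, protocol=pickle.HIGHEST_PROTOCOL)

def load_table(path):
    """Load the complete table back into memory in one call."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```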

- Perrin

Re: Fast DB access

Posted by "Differentiated Software Solutions Pvt. Ltd" <di...@vsnl.com>.

Hi,

When we rebuild the hash in RAM it takes too much time.
My colleagues will answer the other questions.

Murali

----- Original Message -----
From: Perrin Harkins <pe...@primenet.com>
To: Differentiated Software Solutions Pvt. Ltd <di...@vsnl.com>
Cc: <mo...@apache.org>
Sent: Thursday, November 09, 2000 12:19 AM
Subject: Re: Fast DB access


> "Differentiated Software Solutions Pvt. Ltd" wrote:
> > 3. We have a problem rebuilding this database in the ram.... even say every
> > 1000 requests.
>
> What problem are you having with it?
>
> > We tried using dbm and found it a good compromise solution.
> > We found that it is about 8 times faster than postgres querying.
>
> Some dbm implementations are faster than others.  Depending on your data
> size, you may want to try a couple of them.
>
> > 4. Another surprising finding was.... we built a denormalised db on the
> > Linux file system itself, by using the directory and file name as the key on
> > which we wanted to search. We found that dbm was faster than this.
>
> Did you end up with a large number of files in one directory?  When
> using the file system in this way, it's a common practice to hash the
> key you're using and then split that across multiple directories to
> prevent too many files from building up in one and slowing things down.
>
> For example:
>
> "my_key" --> "dHodeifehH" --> /usr/local/data/dH/odeifehH
>
> Also, you could try using mmap for reading the files, or possibly the
> Cache::Mmap module.
>
> > We're carrying out more tests to see how scaleable is dbm.
>
> If you're using read-only data, you can leave the dbm handles persistent
> between connections.  That will speed things up.
>
> You could look at BerkeleyDB, which has a built-in shared memory buffer
> and page-level locking.
>
> You could also try IPC::MM, which offers a shared memory hash written in
> C with a perl interface.
>
> > Hope these findings are useful to others.
>
> They are.  Keep 'em coming.
>
> - Perrin

Re: Fast DB access

Posted by barries <ba...@slaysys.com>.

On Wed, Nov 08, 2000 at 10:49:00AM -0800, Perrin Harkins wrote:
> 
> Also, you could try using mmap for reading the files, or possibly the
> Cache::Mmap module.

If you do play with mmap, note that it can lose some or all of its
efficiency in SMP environments, or so I've read.

- Barrie

Re: Fast DB access

Posted by Perrin Harkins <pe...@primenet.com>.

"Differentiated Software Solutions Pvt. Ltd" wrote:
> 3. We have a problem rebuilding this database in the ram.... even say every
> 1000 requests.

What problem are you having with it?

> We tried using dbm and found it a good compromise solution.
> We found that it is about 8 times faster than postgres querying.

Some dbm implementations are faster than others.  Depending on your data
size, you may want to try a couple of them.

> 4. Another surprising finding was.... we built a denormalised db on the
> Linux file system itself, by using the directory and file name as the key on
> which we wanted to search. We found that dbm was faster than this.

Did you end up with a large number of files in one directory?  When
using the file system in this way, it's a common practice to hash the
key you're using and then split that across multiple directories to
prevent too many files from building up in one and slowing things down.

For example:

"my_key" --> "dHodeifehH" --> /usr/local/data/dH/odeifehH

Also, you could try using mmap for reading the files, or possibly the
Cache::Mmap module.
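
The directory-splitting scheme described above (hash the key, then use the first characters of the hash as a subdirectory) might be sketched like this. It is in Python for illustration only; the choice of SHA-256, the two-character prefix and the name `path_for_key` are assumptions, not what anyone in this thread actually used.

```python
import hashlib
import os

def path_for_key(key, root="/usr/local/data", prefix_len=2):
    """Map a key to root/<first chars of hash>/<rest of hash>."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # The leading characters pick the subdirectory, so files spread across
    # many small directories instead of piling up in one.
    return os.path.join(root, digest[:prefix_len], digest[prefix_len:])
```

With a two-character hex prefix the files spread over at most 256 subdirectories; a longer prefix or a second level gives more.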

> We're carrying out more tests to see how scaleable is dbm.

If you're using read-only data, you can leave the dbm handles persistent
between connections.  That will speed things up.

You could look at BerkeleyDB, which has a built-in shared memory buffer
and page-level locking.

You could also try IPC::MM, which offers a shared memory hash written in
C with a perl interface.

> Hope these findings are useful to others.

They are.  Keep 'em coming.

- Perrin

Re: Fast DB access

Posted by "Differentiated Software Solutions Pvt. Ltd" <di...@vsnl.com>.

Yes. The tables were indexed.
Otherwise we might have seen even more spectacular results !!!!

Murali
----- Original Message ----- 
From: G.W. Haywood <ge...@www.jubileegroup.co.uk>
To: Differentiated Software Solutions Pvt. Ltd <di...@vsnl.com>
Cc: <mo...@apache.org>
Sent: Wednesday, November 08, 2000 5:44 PM
Subject: Re: Fast DB access


> Hi there,
> 
> On Wed, 8 Nov 2000, Differentiated Software Solutions Pvt. Ltd wrote:
> 
> > We are returning after extensive tests of various options suggested.
> 
> Did you try different indexing mechanisms in your tests?
> 
> 73,
> Ged.
>

Re: Fast DB access

Posted by "G.W. Haywood" <ge...@www.jubileegroup.co.uk>.

Hi there,

On Wed, 8 Nov 2000, Differentiated Software Solutions Pvt. Ltd wrote:

> We are returning after extensive tests of various options suggested.

Did you try different indexing mechanisms in your tests?

73,
Ged.