Posted to modperl@perl.apache.org by Rob Nagler <na...@bivio.net> on 2001/10/30 07:17:58 UTC

Re: Neo-Classical Transaction Processing

Perrin Harkins writes:
> The trouble here should be obvious: sooner or later it becomes hard to scale
> the database. You can cache the read-only data, but the read/write data
> isn't so simple.

Good point.  Fortunately, the problem isn't new.  

> Theoretically, the big players like Oracle and DB2 offer clustering
> solutions to deal with this, but they don't seem to get used very
> often.

Oracle was built on an SMP assumption.  They added clustering later.
It doesn't scale well, which is probably why you haven't heard of
people using their parallel server solutions.  I don't know much about
DB2, but I'm pretty sure it assumes shared memory.  Tandem's Non-Stop
SQL is a shared nothing architecture.  It scales well, but isn't cheap
to walk in the door.

> Other sites find ways to divide their traffic up (users 1 - n go to
> this database, n - m go to that one, etc.)

Partitioning is a great way to get scalability, if you can do it.
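
To make the quoted scheme concrete, here is a minimal sketch of range
partitioning by user id. The cut-off values, the database names, and the
function database_for_user are all hypothetical, chosen only for
illustration; a real site would load the ranges from configuration and
hand back a connection rather than a name.

```python
import bisect

# Hypothetical layout (assumption, not from the thread):
# users 1..999 -> db_a, users 1000..1999 -> db_b, users 2000 and up -> db_c.
UPPER_BOUNDS = [1000, 2000]            # first user id NOT served by each range
DATABASES = ["db_a", "db_b", "db_c"]   # one more entry than UPPER_BOUNDS


def database_for_user(user_id):
    """Return the name of the database that owns this user's rows."""
    if user_id < 1:
        raise ValueError("user ids start at 1")
    # bisect_right counts how many bounds are <= user_id, which is the
    # index of the range the id falls into.
    return DATABASES[bisect.bisect_right(UPPER_BOUNDS, user_id)]


if __name__ == "__main__":
    for uid in (1, 999, 1000, 2500):
        print(uid, database_for_user(uid))
```

The "if you can do it" part is the catch: this only works when a request
touches a single user's data. Anything that spans users (reports, shared
inventory) still has to go to more than one database.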

> However, you can usually scale up enough just by getting a bigger
> box to run your database on until you reach the realm of
> Yahoo and Amazon, so this doesn't become an issue for most sites.

I agree.  This is why I think Apache/mod_perl is a great solution for
the majority of web apps.  The scaling issues supposedly being solved
by J2EE don't exist.

On another note, one of the ways to make sure your database scales
better is to keep the database as simple as possible.  I've seen a lot
of solutions which rely on stored procedures to "get performance".
All this does is make the database slower and more of a bottleneck.

> But how can you actually make a shared nothing system for a commerce web
> site?  They may not be sharing local memory, but you'll need read/write
> access to the same data, which means shared locking and waiting somewhere
> along the line.

I meant "shared nothing" in the sense of multiprocessor architectures.
SMP (symmetric multiprocessing) relies on shared memory.  This is the
J2EE/E10K model.  "Shared nothing" is the Neo-Classical model.  Really
these are NUMAs (non-uniform memory architectures), because most of the
individual servers are themselves SMPs.  Here's a classic from
Stonebraker on the subject:

http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf

DeWitt has a lot of papers on parallelism and distributed db design:
http://www.cs.wisc.edu/~dewitt/

Cheers,
Rob