Posted to modperl@perl.apache.org by Tim Sweetman <ti...@aldigital.co.uk> on 2000/12/06 12:36:33 UTC
J2EE and distributed commits (was Re: mod_perl advocacy project resurrection)

I'll bite, 'cuz I think I've run several times recently into this sort
of issue. I've not done anything with J2EE, so there's a risk that I've
misunderstood _that_. Now, from the top:

> On Tue, 5 Dec 2000, Jeffrey W. Baker wrote:
> > With J2EE you get the complete illusion that you are doing txns across as
> > many data sources on as many systems and vendors as you want, but behind
> > the illusion there is the nonzero risk that the data is inconsistent.  In
> > a real transactional system, you can never have inconsistent data.

Logged relational databases have evolved to ensure "atomic"[1]
transactions. This is to avoid embarrassing incidents like me giving you
a dollar[2], but a system crash midway means that we both end up with
the dollar. Or neither of us has it.

If the application is killed mid-transaction (by a SIGKILL, say), the
RDBMS "rolls back" to the beginning of the transaction (I've still got
my dollar).

If the RDBMS itself is killed, or a digger goes through the power
cable, the system is built so that it can restore itself to its previous
consistent state.
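
To make the rollback behaviour above concrete, here is a minimal sketch
using Python's bundled sqlite3 module. The table layout, the account
names and the crash_midway flag are all made up for illustration, and
the "crash" is just an exception raised between the debit and the
credit, not a real SIGKILL.

```python
import sqlite3

def make_bank():
    # Hypothetical schema: one row per account holder.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (owner TEXT PRIMARY KEY, dollars INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO accounts VALUES ('me', 1), ('you', 0)")
    conn.commit()
    return conn

def transfer(conn, giver, taker, amount, crash_midway=False):
    try:
        # "with conn" commits if the block finishes, and rolls back
        # if an exception escapes from it.
        with conn:
            conn.execute(
                "UPDATE accounts SET dollars = dollars - ? WHERE owner = ?",
                (amount, giver),
            )
            if crash_midway:
                raise RuntimeError("simulated failure between debit and credit")
            conn.execute(
                "UPDATE accounts SET dollars = dollars + ? WHERE owner = ?",
                (amount, taker),
            )
    except RuntimeError:
        pass  # the debit has already been undone by the rollback

def balances(conn):
    return dict(conn.execute("SELECT owner, dollars FROM accounts ORDER BY owner"))

bank = make_bank()
transfer(bank, "me", "you", 1, crash_midway=True)
print(balances(bank))  # {'me': 1, 'you': 0}: I've still got my dollar
transfer(bank, "me", "you", 1)
print(balances(bank))  # {'me': 0, 'you': 1}
```

Either both updates land or neither does; there is no state in which the
dollar exists twice or not at all.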

A variety of safeguards are used so that transactions are very hard to
lose. Often, a TX is not regarded as committed until it's been written
to disk. Logs are periodically written to tape, in case the disk gets
scrunched (or the RDBMS software blows up).

In short:
The losing of transactions is Greatly Discouraged, but can happen.
_Inconsistent_ processing of transactions should be *impossible*.

Distributed transactions, where you have two systems (say my bank and
yours are on separate machines), simply cannot be made as reliable.
The root of the problem is that a single system has _one_ log that
records whether the transaction has happened. With two systems, the
record must be made in two places, and there is always an instant in
time - however small - when one system can crash, leaving the other
believing that the transaction has been successful. (IIRC, textbooks
sometimes call this the "red & white army problem"; it is more usually
known as the "two generals' problem" or "two-army problem").
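
The window described above can be sketched with two independent
databases standing in for the two banks. This is a deliberately naive
illustration, not a two-phase commit implementation: the function names,
the schema and the crash_between_commits flag are invented, and the
second machine's crash is imitated by rolling back its uncommitted work.

```python
import sqlite3

def make_bank(owner, dollars):
    # Each call creates a separate database, i.e. a separate log.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (owner TEXT PRIMARY KEY, dollars INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO accounts VALUES (?, ?)", (owner, dollars))
    conn.commit()
    return conn

def balance(conn, owner):
    row = conn.execute(
        "SELECT dollars FROM accounts WHERE owner = ?", (owner,)
    ).fetchone()
    return row[0]

def naive_distributed_transfer(my_bank, your_bank, amount,
                               crash_between_commits=False):
    my_bank.execute(
        "UPDATE accounts SET dollars = dollars - ? WHERE owner = 'me'", (amount,)
    )
    your_bank.execute(
        "UPDATE accounts SET dollars = dollars + ? WHERE owner = 'you'", (amount,)
    )
    my_bank.commit()  # my bank's log now says the transfer happened
    if crash_between_commits:
        # Stand-in for your bank's machine dying before its own commit.
        your_bank.rollback()
        return
    your_bank.commit()

my_bank = make_bank("me", 1)
your_bank = make_bank("you", 0)
naive_distributed_transfer(my_bank, your_bank, 1, crash_between_commits=True)
print(balance(my_bank, "me"), balance(your_bank, "you"))  # 0 0
```

Each database is perfectly consistent on its own terms, yet between them
the dollar has vanished.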

This sort of problem is *not* confined to finance. You may, for
instance, be maintaining lists of usernames & passwords in multiple
locations.

There are a variety of approaches to this sort of problem:
1. Use a single database.

2. Make one database the "slave" of the other.
   If the DB is too big to copy, you can use a "hybrid" approach where
   changes are propagated, but the DBs are periodically sync'd. Or use
   something like rsync, which makes two database/file trees/whatever
   identical without brute-force copying the whole thing.

3. Be careful with foreign keys on other people's servers
   ... since there may be no way to find out when those become invalid,
   or point to something else.
   (Hence, the dreaded "link rot" suffered by search engines).
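
As a rough sketch of the "hybrid" variant of approach 2, the following
uses plain dictionaries in place of the two databases (think of the
username/password lists mentioned earlier). The propagate and resync
helpers are hypothetical names, and the resync only touches entries that
differ, loosely in the spirit of rsync rather than a model of how rsync
actually works.

```python
def propagate(replica, key, value):
    # Apply one change to the replica; value None means "delete".
    if value is None:
        replica.pop(key, None)
    else:
        replica[key] = value

def resync(master, replica):
    # Make the replica identical to the master, changing only the
    # entries that differ. Returns how many entries had to be repaired.
    repaired = 0
    for key in list(replica):
        if key not in master:
            del replica[key]
            repaired += 1
    for key, value in master.items():
        if replica.get(key) != value:
            replica[key] = value
            repaired += 1
    return repaired

master = {"alice": "hash1", "bob": "hash2"}
replica = dict(master)

# A change that is propagated normally.
master["carol"] = "hash3"
propagate(replica, "carol", "hash3")

# Two changes whose propagation is lost (network down, say).
master["bob"] = "hash9"
del master["alice"]

print(replica == master)        # False: the copies have drifted
print(resync(master, replica))  # 2
print(replica == master)        # True
```

Between syncs the replica can be stale, which is the price paid for not
needing a commit that spans both machines.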

As evidenced by the WWW's popularity explosion, loosely coupled
distributed systems are in many ways very powerful. Trying to force
"everything" via a single system, whilst tackling problems of
consistency & transaction processing, can be crippling. Different
approaches suit different apps, but pretending the problem isn't there
isn't generally a good approach.

Hope that helps.

--
Tim Sweetman
A L Digital
'Now you see this one-eyed midget shouting the word "now"...'
 --- Bob Dylan

[1] That's unsplittable, like atoms are, except the radioactive ones, or
    if you're cheating and have a particle accelerator.

[2] I'm guessing there are lots of Americans here, and going with the
    majority. When Euros are in wider use, I'll use that as my
    metasyntactic currency unit.