Posted to dev@jackrabbit.apache.org by Jukka Zitting <ju...@gmail.com> on 2007/10/09 01:02:04 UTC

Clustering and consistency

Hi,

The recent thread about William Louth's JXInsight findings got me
thinking about how the current clustering feature achieves consistency
across cluster nodes.

The current clustering solution uses a journal where it records all
the actions taken by the cluster nodes. However, this journal is only
polled periodically by the nodes to sync up with the rest of the
cluster. Meanwhile the actual workspace persistence stores might be
out of sync with what Jackrabbit expects. This could be a problem
since Jackrabbit core makes a rather fundamental assumption about
always being in control of the persistence store.

The issue shouldn't be that critical for deployments with mostly read
access, or write access happening only on a single cluster node, or
even multiple nodes writing to separate branches of the content tree.
But do we have a good picture of what will happen if multiple cluster
nodes concurrently update the same content nodes?

BR,

Jukka Zitting

Re: Clustering and consistency

Posted by Dominique Pfister <do...@day.com>.
Hi Jukka

On 09/10/2007, Jukka Zitting <ju...@gmail.com> wrote:
> Looking at the code I never realized this was happening, but now that
> you mentioned it I was able to find the relevant calls in
> DefaultRecordProducer and AppendRecord. Is there a rationale for
> having the calls in these locations, or should we consider refactoring
> the code somehow? I'd find something like journal.append(record)
> followed by journal.unlock() more intuitive than record.update(), but
> I guess that would spread both the journal reference and the locking
> functionality to a number of places around Jackrabbit.

I agree that the "lock and sync" functionality is located in a place
where it is hard to find. The current design was influenced by the need
for an abstraction of the journal and by some bugs (notably
https://issues.apache.org/jira/browse/JCR-884), which made it
necessary to move more information (and responsibility) to the Record
class. I can refactor the necessary parts if you think that it's
worthwhile.

Dominique

Re: Clustering and consistency

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 10/9/07, Dominique Pfister <do...@day.com> wrote:
> > But do we have a good picture of what will happen if multiple cluster
> > nodes concurrently update the same content nodes?
>
> What happens is the following: before actually being able to store its
> changes, a cluster node has to "lock and sync", i.e. it must first
> obtain a non-shareable lock on the journal (which is either a file or
> a database table) and will then read all changes up to the latest
> revision.

Ah, you're right! That should work fine.

Looking at the code I never realized this was happening, but now that
you mentioned it I was able to find the relevant calls in
DefaultRecordProducer and AppendRecord. Is there a rationale for
having the calls in these locations, or should we consider refactoring
the code somehow? I'd find something like journal.append(record)
followed by journal.unlock() more intuitive than record.update(), but
I guess that would spread both the journal reference and the locking
functionality to a number of places around Jackrabbit.
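In code, the contrast might look something like the sketch below. This is purely hypothetical: none of these class or method names are the actual Jackrabbit API, it just illustrates the two call shapes being discussed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch contrasting the two API shapes discussed above;
// the class and method names are illustrative, not Jackrabbit's code.
public class ApiShapesDemo {

    static class Journal {
        final List<String> entries = new ArrayList<>();
        boolean locked = true;        // assume the caller already holds the lock
        void append(Record r) { entries.add(r.data); }
        void unlock() { locked = false; }
    }

    // Current shape: the record carries a reference back to the journal
    // and releases the lock itself inside update().
    static class Record {
        final Journal journal;
        final String data;
        Record(Journal journal, String data) { this.journal = journal; this.data = data; }
        void update() {               // appends and unlocks in one hidden step
            journal.append(this);
            journal.unlock();
        }
    }

    public static void main(String[] args) {
        // Suggested shape: the caller drives the journal explicitly.
        Journal j1 = new Journal();
        Record r1 = new Record(j1, "change-1");
        j1.append(r1);
        j1.unlock();

        // Current shape: the record hides the same two steps.
        Journal j2 = new Journal();
        new Record(j2, "change-2").update();

        System.out.println(j1.entries + " locked=" + j1.locked);
        System.out.println(j2.entries + " locked=" + j2.locked);
    }
}
```

Both variants end in the same state; the difference is only whether the journal reference and the unlock call live in the caller or inside the record.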

BR,

Jukka Zitting

Re: Clustering and consistency

Posted by Dominique Pfister <do...@day.com>.
Hi Jukka,

> The issue shouldn't be that critical for deployments with mostly read
> access, or write access happening only on a single cluster node, or
> even multiple nodes writing to separate branches of the content tree.
> But do we have a good picture of what will happen if multiple cluster
> nodes concurrently update the same content nodes?

What happens is the following: before actually being able to store its
changes, a cluster node has to "lock and sync", i.e. it must first
obtain a non-shareable lock on the journal (which is either a file or
a database table) and will then read all changes up to the latest
revision. If one of those changes conflicts with the change this
cluster node is currently trying to save, the whole update process
will fail with a StaleItemException, just as if another session in
the same repository had made that change.
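To make the protocol concrete, here is a rough sketch in plain Java. It is a simulation only, assuming invented names (SimpleJournal, lockAndSync) and a ReentrantLock standing in for the file or database lock; it is not the actual Jackrabbit implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the "lock and sync" protocol described above;
// class and method names are illustrative, not Jackrabbit's actual API.
class SimpleJournal {
    private final ReentrantLock lock = new ReentrantLock(); // non-shareable lock
    private final List<String> records = new ArrayList<>(); // appended change sets

    // A cluster node calls this before storing its changes: take the
    // exclusive lock, then replay everything newer than localRevision.
    int lockAndSync(int localRevision, List<String> syncTarget) {
        lock.lock();
        for (int r = localRevision; r < records.size(); r++) {
            syncTarget.add(records.get(r));
        }
        return records.size(); // the latest revision
    }

    void append(String changeSet) { records.add(changeSet); }

    void unlock() { lock.unlock(); }
}

class StaleItemException extends RuntimeException {
    StaleItemException(String msg) { super(msg); }
}

public class LockAndSyncDemo {
    public static void main(String[] args) {
        SimpleJournal journal = new SimpleJournal();
        journal.append("nodeA: update /content/x"); // earlier change by node A

        // Node B, still at revision 0, tries to save a conflicting change.
        List<String> replayed = new ArrayList<>();
        int latest = journal.lockAndSync(0, replayed);
        try {
            boolean conflict = replayed.stream()
                    .anyMatch(c -> c.contains("/content/x"));
            if (conflict) {
                throw new StaleItemException("concurrent change to /content/x");
            }
            journal.append("nodeB: update /content/x");
        } catch (StaleItemException e) {
            System.out.println("update failed: " + e.getMessage());
        } finally {
            journal.unlock(); // always release the journal lock
        }
        System.out.println("latest revision seen: " + latest);
    }
}
```

The key point the sketch shows: because the lock is taken before the replay, a node always sees every committed change before deciding whether its own change is stale.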

Kind regards
Dominique