You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-dev@jackrabbit.apache.org by Michael Dürig <md...@apache.org> on 2013/01/17 18:15:28 UTC

Conflict handling through rebasing branches

Hi,

There are various places in Oak where conflicting updates to the content 
tree can occur: committing changes against a non root revision, merging 
changes from a branch into trunk and synchronisation of cluster nodes.

The Microkernel API currently doesn't specify a contract for merging 
conflicting changes but leaves it mostly to the underlying 
implementation. As discussed before [1] this is not satisfactory and I'd 
like to tighten the contract.

As announced earlier, I spent some time implementing various approaches 
for rebasing branches in Oak. Apart from the other problems this solves 
and which have been discussed at length, this also lends itself nicely 
for a central, clean and efficient way of handling conflicts (I briefly 
mentioned this in [2] already).

The core idea is to try to resolve all conflicts in a branch through 
rebasing it on top of the current trunk. After successfully rebasing a 
branch, merging it to trunk is as simple as fast forwarding the head of 
the trunk to the head of the branch.

Here are a naive sample implementations of the relevant methods in 
pseudo code. Note how commit is implemented in terms of branch and 
merge. I has not to be implemented that way but rather the observable 
behaviour should be like this.

/* Rebase branch on top of the head of the trunk. Returns false if
    a conflict occurs, true otherwise. */
boolean rebase(branch) {
     // See https://github.com/mduerig/jackrabbit-oak/commits/OAK-536
     // and https://github.com/mduerig/jackrabbit-oak/commits/OAK-536-2
     // for two possible implementations
}

int merge(branch) {
     while(true) {
         if (!rebase(branch)) {
             throw new MicroKernelException("merge conflict");
         }

         atomic {
             if (branch.baseRevision == trunk.headRevision) {
                 trunk.headRevision = branch.headRevision
                 return trunk.headRevision;
             }
         }
     }
}

int commit(baseRevision, jsop) {
     branch = createBranchFrom(baseRevision)
     branch.apply(jsop)
     if (!rebase(branch)) {
         throw new MicroKernelException("merge conflict");
     }
     return merge(branch)
}

Michael

[1] http://markmail.org/message/4xwfwbax3kpoysbp
[2] http://markmail.org/message/niam2xs2ora5rufi


Re: Conflict handling through rebasing branches

Posted by Michael Dürig <md...@apache.org>.

On 18.1.13 9:57, Jukka Zitting wrote:
> Hi,
>
> On Thu, Jan 17, 2013 at 7:15 PM, Michael Dürig <md...@apache.org> wrote:
>> int merge(branch) {
>>      while(true) {
>>          if (!rebase(branch)) {
>>              throw new MicroKernelException("merge conflict");
>>          }
>>
>>          atomic {
>>              if (branch.baseRevision == trunk.headRevision) {
>>                  trunk.headRevision = branch.headRevision
>>                  return trunk.headRevision;
>>              }
>>          }
>>      }
>> }
>
> One thing that came up already earlier, but should be reiterated here:
> We have commit hooks that update indexes and possibly other
> repository-wide content structures. It is quite likely for these to
> cause merge conflicts. Also, the rebase operation may end up
> invalidating some content constraints, so it should be possible to
> re-run validators between the rebase operation and the final
> headRevision update.

Thanks for the heads-up/reminder. Yes, I figure we should then not do an 
implicit rebase in merge(branch) but rather rely on the caller doing 
that. merge() and commit() could then simply fail with a conflict error 
when the changes are not based on the latest trunk. This makes the whole 
logic in Microkernel even easier and keeps conflict detection/annotation 
strictly in one single place.

>
> Another, more minor point is that the loop above will behave badly if
> the system is already overloaded and can't process merges as fast as
> requested. A retry limit that causes the operation to fail after a
> fixed amount of time would be safer.

I left this out to keep things simple. But taking this further we might 
even think of some kind of exponential back-off to take some load of a 
busy system.

[...]


> The trouble here is that the commit hooks are in oak-core, so we'd
> need to have oak-core instead of the MicroKernel be in charge of this
> coordinating this work. One way of doing it would be to expose the
> rebase() operation as a part of the MicroKernel API and to require
> merge() and commit() to simply reject all changes that aren't based on
> the latest head revision.

Right. See above ;-)

Michael

>
> BR,
>
> Jukka Zitting
>

Re: Conflict handling through rebasing branches

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Thu, Jan 17, 2013 at 7:15 PM, Michael Dürig <md...@apache.org> wrote:
> int merge(branch) {
>     while(true) {
>         if (!rebase(branch)) {
>             throw new MicroKernelException("merge conflict");
>         }
>
>         atomic {
>             if (branch.baseRevision == trunk.headRevision) {
>                 trunk.headRevision = branch.headRevision
>                 return trunk.headRevision;
>             }
>         }
>     }
> }

One thing that came up already earlier, but should be reiterated here:
We have commit hooks that update indexes and possibly other
repository-wide content structures. It is quite likely for these to
cause merge conflicts. Also, the rebase operation may end up
invalidating some content constraints, so it should be possible to
re-run validators between the rebase operation and the final
headRevision update.

Another, more minor point is that the loop above will behave badly if
the system is already overloaded and can't process merges as fast as
requested. A retry limit that causes the operation to fail after a
fixed amount of time would be safer.

Thus I propose the following update to the logic:

    int merge(branch) {
        for (int i = 0; i < MAX_REBASE_COUNT; i++) {
            if (!rebase(branch)) {
                throw new CommitFailedException("conflict");
            }

            runCommitHooks(branch);

            atomic {
                if (branch.baseRevision == trunk.headRevision) {
                     trunk.headRevision = branch.headRevision;
                     return trunk.headRevision;
                 }
             }
        }
        throw new CommitFailedException("too busy");
    }

The trouble here is that the commit hooks are in oak-core, so we'd
need to have oak-core instead of the MicroKernel be in charge of this
coordinating this work. One way of doing it would be to expose the
rebase() operation as a part of the MicroKernel API and to require
merge() and commit() to simply reject all changes that aren't based on
the latest head revision.

BR,

Jukka Zitting

Re: Conflict handling through rebasing branches

Posted by Michael Dürig <md...@apache.org>.

On 18.1.13 7:35, Jukka Zitting wrote:
> Hi,
>
> On Thu, Jan 17, 2013 at 7:15 PM, Michael Dürig <md...@apache.org> wrote:
>> The core idea is to try to resolve all conflicts in a branch through
>> rebasing it on top of the current trunk. After successfully rebasing a
>> branch, merging it to trunk is as simple as fast forwarding the head of the
>> trunk to the head of the branch.
>

[...]

>
> The use of "atomic" in the pseudo-code is still a bit troublesome in
> terms of fully distributed operation, though for now I'd be fine with
> it as long as the critical section can be kept really small (as it is
> in the pseudo-code).

I didn't mean this mechanism to be used globally but rather for a single 
cluster node only. For cluster sync. I was more thinking along the lines 
you sketch below.

>
> An extension for a bit more write-concurrency would be to have each
> cluster node to maintain their own headRevision as essentially a local
> master branch of the global state of the repository. Any local changes
> would first get merged to that local master branch before being pushed
> to update the global headRevision.

Exactly.

The main problem with such a
> mechanism is that there may be no good way to resolve conflicts. One
> potential solution would be to order the cluster nodes in some
> hierarchy of increased authority. Changes from nodes with higher
> authority would always override concurrent changes coming from below,
> with a single master node or (preferably) a small fully synchronized
> cluster acting as the ultimate authority. A client that wants to
> ensure that its changes won't get accidentally overwritten by other
> clients could write directly to that master, while others that don't
> require such guarantees could gain extra write concurrency by
> committing their changes to other cluster nodes with lower authority.

Or add the same kind of conflict annotations and rely on clients to 
resolve those. Such a client could well be a human. At least initially 
and/or partially. As Thomas notes, we don't have much experience with 
what kind of conflicts to expect and how often each type of conflict 
would appear. Since conflict handlers are plug-able in oak--core, such 
an approach would allow us to gain this experience and implement 
specific automated resolution algorithms tailored to specific settings 
as it turns out to be necessary.

On top of that, specific/custom applications might lead to new classes 
of conflicts. These could also be handled through specific/custom 
algorithms as it turns out to be necessary.

Michael



Re: Conflict handling through rebasing branches

Posted by Michael Dürig <md...@apache.org>.

On 18.1.13 7:35, Jukka Zitting wrote:
> Hi,
>
> On Thu, Jan 17, 2013 at 7:15 PM, Michael Dürig <md...@apache.org> wrote:
>> The core idea is to try to resolve all conflicts in a branch through
>> rebasing it on top of the current trunk. After successfully rebasing a
>> branch, merging it to trunk is as simple as fast forwarding the head of the
>> trunk to the head of the branch.
>
> +1
>
> In general, instead of throwing an exception (especially a generic one
> indistinguishable from other problems), I'd prefer for commit or merge
> to rather just return the revision of the new partially rebased
> branch, together with the appropriate conflict markers. The client
> would then have a chance to resolve those conflicts and re-merge using
> higher level context information than what's available in the
> MicroKernel.

Yes this is also my idea. I left it out from the pseudo code to keep it 
as simple as possible as I didn't want to get into details about 
conflict resolution but concentrate on the conflict free case.

The first POC implementation I did [1] does in fact exactly this: it 
annotates conflicts in much the same way like AnnotatingConflictHandler 
does it currently in oak-core. See the contract for rebase [2] for what 
constitutes a conflict and how each conflict is annotated.

Michael

[1] https://github.com/mduerig/jackrabbit-oak/commits/OAK-536
[2] 
https://github.com/mduerig/jackrabbit-oak/commit/63868f6f1b03cf1dfe2793b32be588fc075cfdcc#oak-mk-api/src/main/java/org/apache/jackrabbit/mk/api/MicroKernel.java

Re: Conflict handling through rebasing branches

Posted by Thomas Mueller <mu...@adobe.com>.
Hi,

Do we have any real-world data about how many conflicts we typically get?
I would rather avoid adding a complex mechanism for something that occurs
very rarely. Also, why do we need to merge conflicts, and not just throw
the exception to the client? What kind of conflicts can be merged, how
common are they, and do they really need to be merged? The reason I ask is
that in the database world I have never heard about merging conflicts.
Instead, typically locks are used (even in PostgreSQL and Oracle that
support MVCC).

Regards,
Thomas


On 1/18/13 8:35 AM, "Jukka Zitting" <ju...@gmail.com> wrote:

>Hi,
>
>On Thu, Jan 17, 2013 at 7:15 PM, Michael Dürig <md...@apache.org> wrote:
>> The core idea is to try to resolve all conflicts in a branch through
>> rebasing it on top of the current trunk. After successfully rebasing a
>> branch, merging it to trunk is as simple as fast forwarding the head of
>>the
>> trunk to the head of the branch.
>
>+1
>
>In general, instead of throwing an exception (especially a generic one
>indistinguishable from other problems), I'd prefer for commit or merge
>to rather just return the revision of the new partially rebased
>branch, together with the appropriate conflict markers. The client
>would then have a chance to resolve those conflicts and re-merge using
>higher level context information than what's available in the
>MicroKernel.
>
>The use of "atomic" in the pseudo-code is still a bit troublesome in
>terms of fully distributed operation, though for now I'd be fine with
>it as long as the critical section can be kept really small (as it is
>in the pseudo-code).
>
>An extension for a bit more write-concurrency would be to have each
>cluster node to maintain their own headRevision as essentially a local
>master branch of the global state of the repository. Any local changes
>would first get merged to that local master branch before being pushed
>to update the global headRevision. The main problem with such a
>mechanism is that there may be no good way to resolve conflicts. One
>potential solution would be to order the cluster nodes in some
>hierarchy of increased authority. Changes from nodes with higher
>authority would always override concurrent changes coming from below,
>with a single master node or (preferably) a small fully synchronized
>cluster acting as the ultimate authority. A client that wants to
>ensure that its changes won't get accidentally overwritten by other
>clients could write directly to that master, while others that don't
>require such guarantees could gain extra write concurrency by
>committing their changes to other cluster nodes with lower authority.
>
>BR,
>
>Jukka Zitting


Re: Conflict handling through rebasing branches

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Thu, Jan 17, 2013 at 7:15 PM, Michael Dürig <md...@apache.org> wrote:
> The core idea is to try to resolve all conflicts in a branch through
> rebasing it on top of the current trunk. After successfully rebasing a
> branch, merging it to trunk is as simple as fast forwarding the head of the
> trunk to the head of the branch.

+1

In general, instead of throwing an exception (especially a generic one
indistinguishable from other problems), I'd prefer for commit or merge
to rather just return the revision of the new partially rebased
branch, together with the appropriate conflict markers. The client
would then have a chance to resolve those conflicts and re-merge using
higher level context information than what's available in the
MicroKernel.

The use of "atomic" in the pseudo-code is still a bit troublesome in
terms of fully distributed operation, though for now I'd be fine with
it as long as the critical section can be kept really small (as it is
in the pseudo-code).

An extension for a bit more write-concurrency would be to have each
cluster node to maintain their own headRevision as essentially a local
master branch of the global state of the repository. Any local changes
would first get merged to that local master branch before being pushed
to update the global headRevision. The main problem with such a
mechanism is that there may be no good way to resolve conflicts. One
potential solution would be to order the cluster nodes in some
hierarchy of increased authority. Changes from nodes with higher
authority would always override concurrent changes coming from below,
with a single master node or (preferably) a small fully synchronized
cluster acting as the ultimate authority. A client that wants to
ensure that its changes won't get accidentally overwritten by other
clients could write directly to that master, while others that don't
require such guarantees could gain extra write concurrency by
committing their changes to other cluster nodes with lower authority.

BR,

Jukka Zitting

Re: Conflict handling through rebasing branches

Posted by Michael Dürig <md...@apache.org>.
I added a Wiki page [1] summarising and amending this proposal.

Michael

[1] 
http://wiki.apache.org/jackrabbit/Conflict%20handling%20through%20rebasing%20branches

On 17.1.13 17:15, Michael Dürig wrote:
>
> Hi,
>
> There are various places in Oak where conflicting updates to the content
> tree can occur: committing changes against a non root revision, merging
> changes from a branch into trunk and synchronisation of cluster nodes.
>
> The Microkernel API currently doesn't specify a contract for merging
> conflicting changes but leaves it mostly to the underlying
> implementation. As discussed before [1] this is not satisfactory and I'd
> like to tighten the contract.
>
> As announced earlier, I spent some time implementing various approaches
> for rebasing branches in Oak. Apart from the other problems this solves
> and which have been discussed at length, this also lends itself nicely
> for a central, clean and efficient way of handling conflicts (I briefly
> mentioned this in [2] already).
>
> The core idea is to try to resolve all conflicts in a branch through
> rebasing it on top of the current trunk. After successfully rebasing a
> branch, merging it to trunk is as simple as fast forwarding the head of
> the trunk to the head of the branch.
>
> Here are a naive sample implementations of the relevant methods in
> pseudo code. Note how commit is implemented in terms of branch and
> merge. I has not to be implemented that way but rather the observable
> behaviour should be like this.
>
> /* Rebase branch on top of the head of the trunk. Returns false if
>     a conflict occurs, true otherwise. */
> boolean rebase(branch) {
>      // See https://github.com/mduerig/jackrabbit-oak/commits/OAK-536
>      // and https://github.com/mduerig/jackrabbit-oak/commits/OAK-536-2
>      // for two possible implementations
> }
>
> int merge(branch) {
>      while(true) {
>          if (!rebase(branch)) {
>              throw new MicroKernelException("merge conflict");
>          }
>
>          atomic {
>              if (branch.baseRevision == trunk.headRevision) {
>                  trunk.headRevision = branch.headRevision
>                  return trunk.headRevision;
>              }
>          }
>      }
> }
>
> int commit(baseRevision, jsop) {
>      branch = createBranchFrom(baseRevision)
>      branch.apply(jsop)
>      if (!rebase(branch)) {
>          throw new MicroKernelException("merge conflict");
>      }
>      return merge(branch)
> }
>
> Michael
>
> [1] http://markmail.org/message/4xwfwbax3kpoysbp
> [2] http://markmail.org/message/niam2xs2ora5rufi