
Posted to oak-dev@jackrabbit.apache.org by Michael Dürig <md...@apache.org> on 2012/12/12 16:46:21 UTC

Conflict handling in Oak

Hi,

Currently the Microkernel contract does not specify a merge policy but 
is free to try to merge conflicting changes or throw an exception. I 
think this is problematic in various ways:

1) Automatic merging may violate the principle of least surprise. It can
be arbitrarily complex and still be incorrect wrt. different use cases
which need different merge strategies for the same conflict.

2) Furthermore merges should be correctly mirrored in the journal. 
According to the Microkernel API: "deleting a node is allowed if the 
node existed in the given revision, even if it was deleted in the 
meantime." So the following should currently not fail (it does though, 
see OAK-507):

     String base = mk.getHeadRevision();
     String r1 = mk.commit("-a", base);
     String r2 = mk.commit("-a", base);

At this point the journal up to revision r2 should contain only a
single -a operation. I'm quite sure this is currently not the case and
the journal will contain two -a operations: one for revision r1 and
another for revision r2.
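
To make the expected behaviour concrete, here is a rough sketch using a toy
model. ToyJournalKernel, its integer revisions and its flat set of node names
are assumptions made up for this illustration and are not MicroKernel code.
The second delete against the same base succeeds, but contributes nothing to
the journal because the node is already gone in the head revision.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: node names at the root, integer revisions, one journal entry per commit.
class ToyJournalKernel {
    // revisions.get(i) holds the node names present at revision i
    private final List<Set<String>> revisions = new ArrayList<>();
    // journal.get(i) holds the diff that led from revision i to revision i + 1
    private final List<String> journal = new ArrayList<>();

    ToyJournalKernel(Set<String> initialNodes) {
        revisions.add(new HashSet<>(initialNodes));
    }

    int getHeadRevision() {
        return revisions.size() - 1;
    }

    // Deleting is allowed if the node existed in the base revision,
    // even if it was deleted in the meantime.
    int delete(String name, int baseRevision) {
        if (!revisions.get(baseRevision).contains(name)) {
            throw new IllegalStateException(name + " did not exist in revision " + baseRevision);
        }
        Set<String> head = new HashSet<>(revisions.get(getHeadRevision()));
        boolean removed = head.remove(name);
        revisions.add(head);
        // Record only what actually changed relative to the previous head.
        journal.add(removed ? "-" + name : "");
        return getHeadRevision();
    }

    // Non-empty journal entries for all commits up to and including toRevision.
    List<String> getJournal(int toRevision) {
        List<String> entries = new ArrayList<>();
        for (int i = 0; i < toRevision; i++) {
            if (!journal.get(i).isEmpty()) {
                entries.add(journal.get(i));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        ToyJournalKernel mk = new ToyJournalKernel(Collections.singleton("a"));
        int base = mk.getHeadRevision();
        mk.delete("a", base);
        int r2 = mk.delete("a", base);
        System.out.println(mk.getJournal(r2));
    }
}
```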

3) Throwing an unspecific MicroKernelException leaves the API consumer
with no clue about what caused a commit to fail. Retrying a commit after
some client side conflict resolution becomes hit and miss. See OAK-442.


To address 1) I suggest we define a set of clear-cut cases where any
Microkernel implementation MUST merge. For the other cases I'm not sure
whether we should make them MUST NOT, SHOULD NOT or MAY merge.

To address 2) my preferred solution would be to drop getJournal entirely
from the Microkernel API. However, this means rebasing a branch would
need to go into the Microkernel (OAK-464). Otherwise every merge defined
for 1) would need to make sure the journal is adjusted accordingly.
Another possibility here is to leave the journal unadjusted. However,
we would then need to specify MUST NOT for the other merges in 1),
because only then can clients of the journal know how to interpret the
journal (respectively the conflicts contained therein).

To address 3) I'd simply derive a more specific exception from 
MicroKernelException and throw that in the case of a conflict. See OAK-496.
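
A minimal sketch of what this could look like. The MicroKernelException
below is only a local stand-in so that the example compiles on its own;
ConflictException, its path field and the commitWithRetry helper are
hypothetical names and not part of the current API.

```java
import java.util.function.Supplier;

// Local stand-in for the real MicroKernelException, so the sketch is self-contained.
class MicroKernelException extends RuntimeException {
    MicroKernelException(String message) {
        super(message);
    }
}

// Hypothetical: a more specific exception signalling that a commit failed due to a conflict.
class ConflictException extends MicroKernelException {
    private final String path;

    ConflictException(String path, String message) {
        super(message);
        this.path = path;
    }

    String getPath() {
        return path;
    }
}

class ConflictRetrySketch {
    // Retries only on conflicts; any other MicroKernelException propagates immediately.
    static String commitWithRetry(Supplier<String> commit, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ConflictException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return commit.get();
            } catch (ConflictException e) {
                // A real client would resolve the conflict at e.getPath() before retrying.
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String revision = commitWithRetry(() -> {
            if (calls[0]++ < 2) {
                throw new ConflictException("/a", "node was changed concurrently");
            }
            return "r3";
        }, 5);
        System.out.println(revision + " after " + calls[0] + " attempts");
    }
}
```

With a dedicated subtype, a client can tell a conflict apart from any other
failure and retry only in that case.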

Michael

Re: Conflict handling in Oak

Posted by Michael Dürig <md...@apache.org>.

On 18.12.12 9:13, Marcel Reutegger wrote:
> hi,
>
>> Just remember that "MAY" is difficult to handle by developers: Can I depend
>> on it or not ?
>
> no, you can't.
>
>> What if the "MAY" feature does not exist?
>
> in this context it means, a commit will fail because of a conflict. for oak-jcr that
> will usually mean it throws an InvalidItemStateException.
>
>> What if I develop on
>> an implementation providing the "MAY" feature and then running on an
>> implementation not providing the "MAY" feature ?
>
> you will get a different behavior and will be advised to read the documentation ;)

It will make different Microkernel implementations less interchangeable
though. The differences in the observable effect between an
implementation which is able to merge a certain conflict and one
that isn't might be *very* subtle and might lead to errors which are
hard to foresee and diagnose. I think we are better off if we just
specify the merges which are MUST and make all others MUST NOT.

Michael

>
>> In essence, a "MAY" feature basically must be considered as non-existing :-(
>>
>> All in all, please don't use "MAY". Thanks from a developer ;-)
>
> that's a valid alternative.
>
> what do others think?
>
> regards
>   marcel
>

RE: Conflict handling in Oak

Posted by Marcel Reutegger <mr...@adobe.com>.

hi,

> Just remember that "MAY" is difficult to handle by developers: Can I depend
> on it or not ?

no, you can't.

> What if the "MAY" feature does not exist?

in this context it means, a commit will fail because of a conflict. for oak-jcr that
will usually mean it throws an InvalidItemStateException. 

> What if I develop on
> an implementation providing the "MAY" feature and then running on an
> implementation not providing the "MAY" feature ?

you will get a different behavior and will be advised to read the documentation ;)

> In essence, a "MAY" feature basically must be considered as non-existing :-(
> 
> All in all, please don't use "MAY". Thanks from a developer ;-)

that's a valid alternative. 

what do others think?

regards
 marcel

OT: normative language, was: Conflict handling in Oak

Posted by Julian Reschke <ju...@gmx.de>.

On 2012-12-18 15:12, Tommaso Teofili wrote:
>
> On 18/dic/2012, at 09:49, Felix Meschberger wrote:
>
>> Hi,
>>
>> Just remember that "MAY" is difficult to handle by developers: Can I depend on it or not ? What if the "MAY" feature does not exist ? What if I develop on an implementation providing the "MAY" feature and then running on an implementation not providing the "MAY" feature ?
>>
>> In essence, a "MAY" feature basically must be considered as non-existing :-(
>>
>> All in all, please don't use "MAY". Thanks from a developer ;-)
>
> I remember such a pain when dealing with browser compliance to HTTP spec some years ago, SHOULD / MAY [NOT] were my enemies :-)
 > ...

...in which case you should read the new spec(s) and provide feedback 
before they get finalized :-) (-> 
http://tools.ietf.org/wg/httpbis/trac/wiki)

Best regards, Julian

Re: Conflict handling in Oak

Posted by Tommaso Teofili <te...@adobe.com>.

On 18/dic/2012, at 09:49, Felix Meschberger wrote:

> Hi,
> 
> Just remember that "MAY" is difficult to handle by developers: Can I depend on it or not ? What if the "MAY" feature does not exist ? What if I develop on an implementation providing the "MAY" feature and then running on an implementation not providing the "MAY" feature ?
> 
> In essence, a "MAY" feature basically must be considered as non-existing :-(
> 
> All in all, please don't use "MAY". Thanks from a developer ;-)

I remember such a pain when dealing with browser compliance to HTTP spec some years ago, SHOULD / MAY [NOT] were my enemies :-)
Apart from that, I agree with MichaelM (with the exception that I'd keep MAY out using either MUST or MUST NOT, with a slight preference on MUST).
My 0.02 cents,

Tommaso

> 
> Regards
> Felix
> 
> Am 18.12.2012 um 09:37 schrieb Marcel Reutegger:
> 
>> Hi,
>> 
>>> To address 1) I suggest we define a set of clear cut cases where any
>>> Microkernel implementations MUST merge. For the other cases I'm not sure
>>> whether we should make them MUST NOT, SHOULD NOT or MAY merge.
>> 
>> I agree and I think three cases are sufficient. MUST, MUST NOT and MAY.
>> MUST is for conflicts we know are easy and straight forward to resolve.
>> MUST NOT is for conflicts that are known to be problematic because there's
>> no clean resolution strategy.
>> MAY is for conflicts that have a defined resolution but we think happen
>> rarely and is not worth implementing.
>> 
>> I don't see how SHOULD NOT is useful in this context.
>> 
>> regards
>> marcel
>

Re: Conflict handling in Oak

Posted by Michael Marth <mm...@adobe.com>.

Agree with Felix, we should stay away from MAY, especially if we want to achieve clarity for Oak-Core about what it can expect the MK to do.

On Dec 18, 2012, at 9:49 AM, Felix Meschberger wrote:

> Hi,
> 
> Just remember that "MAY" is difficult to handle by developers: Can I depend on it or not ? What if the "MAY" feature does not exist ? What if I develop on an implementation providing the "MAY" feature and then running on an implementation not providing the "MAY" feature ?
> 
> In essence, a "MAY" feature basically must be considered as non-existing :-(
> 
> All in all, please don't use "MAY". Thanks from a developer ;-)
> 
> Regards
> Felix
> 
> Am 18.12.2012 um 09:37 schrieb Marcel Reutegger:
> 
>> Hi,
>> 
>>> To address 1) I suggest we define a set of clear cut cases where any
>>> Microkernel implementations MUST merge. For the other cases I'm not sure
>>> whether we should make them MUST NOT, SHOULD NOT or MAY merge.
>> 
>> I agree and I think three cases are sufficient. MUST, MUST NOT and MAY.
>> MUST is for conflicts we know are easy and straight forward to resolve.
>> MUST NOT is for conflicts that are known to be problematic because there's
>> no clean resolution strategy.
>> MAY is for conflicts that have a defined resolution but we think happen
>> rarely and is not worth implementing.
>> 
>> I don't see how SHOULD NOT is useful in this context.
>> 
>> regards
>> marcel
>

Re: Conflict handling in Oak

Posted by Felix Meschberger <fm...@adobe.com>.

Hi,

Just remember that "MAY" is difficult to handle by developers: Can I depend on it or not ? What if the "MAY" feature does not exist ? What if I develop on an implementation providing the "MAY" feature and then running on an implementation not providing the "MAY" feature ?

In essence, a "MAY" feature basically must be considered as non-existing :-(

All in all, please don't use "MAY". Thanks from a developer ;-)

Regards
Felix

Am 18.12.2012 um 09:37 schrieb Marcel Reutegger:

> Hi,
> 
>> To address 1) I suggest we define a set of clear cut cases where any
>> Microkernel implementations MUST merge. For the other cases I'm not sure
>> whether we should make them MUST NOT, SHOULD NOT or MAY merge.
> 
> I agree and I think three cases are sufficient. MUST, MUST NOT and MAY.
> MUST is for conflicts we know are easy and straight forward to resolve.
> MUST NOT is for conflicts that are known to be problematic because there's
> no clean resolution strategy.
> MAY is for conflicts that have a defined resolution but we think happen
> rarely and is not worth implementing.
> 
> I don't see how SHOULD NOT is useful in this context.
> 
> regards
> marcel

RE: Conflict handling in Oak

Posted by Marcel Reutegger <mr...@adobe.com>.

Hi,

> To address 1) I suggest we define a set of clear cut cases where any
> Microkernel implementations MUST merge. For the other cases I'm not sure
> whether we should make them MUST NOT, SHOULD NOT or MAY merge.

I agree and I think three cases are sufficient: MUST, MUST NOT and MAY.
MUST is for conflicts we know are easy and straightforward to resolve.
MUST NOT is for conflicts that are known to be problematic because there's
no clean resolution strategy.
MAY is for conflicts that have a defined resolution but that we think happen
rarely and are not worth implementing.

I don't see how SHOULD NOT is useful in this context.

regards
 marcel

Re: Conflict handling in Oak

Posted by Michael Dürig <mi...@gmail.com>.


On 18.12.12 12:44, Stefan Guggisberg wrote:
> On Tue, Dec 18, 2012 at 12:49 PM, Michael Dürig <md...@apache.org> wrote:
>>
>> This is a bit more complicated. In fact it is the other way around: if two
>> journal entries commute, the corresponding differences on the nodes do not
>> conflict regarding the definition I gave.
>>
>> OTOH non conflicting changes could still lead to non commuting journal
>> entries and thus merging such changes would require journals to be adjusted.
>
> why should a journal need to be adjusted? MicroKernel#getJournal returns the
> exact diffs of successive revisions.

See the beginning of the thread and OAK-532.
Michael


>
> cheers
> stefan
>
>> I'll rephrase below.
>>
>>
>> On 18.12.12 11:09, Michael Dürig wrote:
>>>
>>>
>>>
>>> On 18.12.12 9:38, Thomas Mueller wrote:
>>>>
>>>> What I suggest should be merged within the MicroKernel:
>>>>
>>>> * Two sessions concurrently add different child nodes to a node
>>>> ("/test/a"
>>>> and "/test/b"): this is merged as it's not really a conflict
>>>>
>>>> * Two sessions concurrently delete different child nodes ("/test/a" and
>>>> "/test/b"): this is merged
>>>>
>>>> * Two sessions concurrently move different child nodes to another
>>>> location
>>>
>>>
>>> I think this can be summed up as:
>>>
>>> Only merge non conflicting changes wrt. the children of a node. The
>>> children of any nodes are its child nodes and its properties. Two
>>> changes to the children of a node conflict if these children have the
>>> same name.
>>
>>
>> If there are no other conflicts (*), merge changes wrt. the children of a
>> node. The children of any nodes are its child nodes and its properties. Two
>> changes to the children of a node conflict if these children have the same
>> name.
>>
>> (*) see Tom's initial post for what constitutes "other conflicts".
>>
>> This additional complication somewhat unnecessarily restricts the set of
>> mergeable changes. That's why I came up with the proposal to drop support
>> for the getJournal() API.
>>
>> Michael
>>
>>
>>
>>>
>>> This has the beauty of simplicity and as Tom notes below also does not
>>> require the journal to be corrected.
>>>
>>>>
>>>> The reason for this is to allow concurrently manipulating child nodes if
>>>> there are many child nodes (concurrent repository loading).
>>>>
>>>> With this rules, I believe that "2) Furthermore merges should be
>>>> correctly
>>>> mirrored in the journal" wouldn't be required, as there are no merges
>>>> that
>>>> would cause the journal to change.
>>>
>>>
>>> Right. The reason for this is - and that's again a very nice property of
>>> this approach - that for these conflicts the corresponding journal
>>> entries commute.
>>>
>>>
>>> In addition it would be nice to annotate conflicts in some way. This is
>>> quite easy to do and would allow upper layers to resolve conflicts based
>>> on specific business logic. Currently we do something along these lines
>>> with the AnnotatingConflictHandler [1] in oak-core.
>>>
>>> Michael
>>>
>>>
>>> [1]
>>>
>>> https://github.com/jukka/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/commit/AnnotatingConflictHandler.java
>>>
>>>
>>>
>>
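
For illustration, the same-name rule quoted above could be sketched as
follows. ChildNameConflictSketch and its methods are hypothetical names, and
the sketch covers only the name check for a single parent node; the "other
conflicts" mentioned in the rephrased definition are not modelled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ChildNameConflictSketch {
    // Names of children (child nodes and properties) of one parent
    // that were changed by both sessions.
    static Set<String> conflictingNames(Set<String> changedByOurs, Set<String> changedByTheirs) {
        Set<String> both = new HashSet<>(changedByOurs);
        both.retainAll(changedByTheirs);
        return both;
    }

    // Changes to the children of a node are mergeable if no child name was touched by both sides.
    static boolean canMerge(Set<String> changedByOurs, Set<String> changedByTheirs) {
        return conflictingNames(changedByOurs, changedByTheirs).isEmpty();
    }

    public static void main(String[] args) {
        Set<String> ours = new HashSet<>(Arrays.asList("a"));
        Set<String> theirs = new HashSet<>(Arrays.asList("b"));
        // Two sessions adding /test/a and /test/b touch different names under /test.
        System.out.println(canMerge(ours, theirs));
    }
}
```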

Re: Conflict handling in Oak

Posted by Stefan Guggisberg <st...@gmail.com>.

On Tue, Dec 18, 2012 at 12:49 PM, Michael Dürig <md...@apache.org> wrote:
>
> This is a bit more complicated. In fact it is the other way around: if two
> journal entries commute, the corresponding differences on the nodes do not
> conflict regarding the definition I gave.
>
> OTOH non conflicting changes could still lead to non commuting journal
> entries and thus merging such changes would require journals to be adjusted.

why should a journal need to be adjusted? MicroKernel#getJournal returns the
exact diffs of successive revisions.

cheers
stefan

> I'll rephrase below.
>
>
> On 18.12.12 11:09, Michael Dürig wrote:
>>
>>
>>
>> On 18.12.12 9:38, Thomas Mueller wrote:
>>>
>>> What I suggest should be merged within the MicroKernel:
>>>
>>> * Two sessions concurrently add different child nodes to a node
>>> ("/test/a"
>>> and "/test/b"): this is merged as it's not really a conflict
>>>
>>> * Two sessions concurrently delete different child nodes ("/test/a" and
>>> "/test/b"): this is merged
>>>
>>> * Two sessions concurrently move different child nodes to another
>>> location
>>
>>
>> I think this can be summed up as:
>>
>> Only merge non conflicting changes wrt. the children of a node. The
>> children of any nodes are its child nodes and its properties. Two
>> changes to the children of a node conflict if these children have the
>> same name.
>
>
> If there are no other conflicts (*), merge changes wrt. the children of a
> node. The children of any nodes are its child nodes and its properties. Two
> changes to the children of a node conflict if these children have the same
> name.
>
> (*) see Tom's initial post for what constitutes "other conflicts".
>
> This additional complication somewhat unnecessarily restricts the set of
> mergeable changes. That's why I came up with the proposal to drop support
> for the getJournal() API.
>
> Michael
>
>
>
>>
>> This has the beauty of simplicity and as Tom notes below also does not
>> require the journal to be corrected.
>>
>>>
>>> The reason for this is to allow concurrently manipulating child nodes if
>>> there are many child nodes (concurrent repository loading).
>>>
>>> With this rules, I believe that "2) Furthermore merges should be
>>> correctly
>>> mirrored in the journal" wouldn't be required, as there are no merges
>>> that
>>> would cause the journal to change.
>>
>>
>> Right. The reason for this is - and that's again a very nice property of
>> this approach - that for these conflicts the corresponding journal
>> entries commute.
>>
>>
>> In addition it would be nice to annotate conflicts in some way. This is
>> quite easy to do and would allow upper layers to resolve conflicts based
>> on specific business logic. Currently we do something along these lines
>> with the AnnotatingConflictHandler [1] in oak-core.
>>
>> Michael
>>
>>
>> [1]
>>
>> https://github.com/jukka/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/commit/AnnotatingConflictHandler.java
>>
>>
>>
>

Re: Conflict handling in Oak

Posted by Michael Dürig <md...@apache.org>.

On 18.12.12 14:12, Stefan Guggisberg wrote:
>> 3) Adjust the journals to reflect the changes introduced through merges.
> irrelevant since the journal is expected to be consistent
>

I wonder how you would go about guaranteeing that in the face of
concurrent modifications without falling back to fully synchronised
commits. After all this is the topic of this thread and a more profound
contribution would be appreciated.

Michael

Re: Conflict handling in Oak

Posted by Stefan Guggisberg <st...@gmail.com>.

On Tue, Dec 18, 2012 at 2:55 PM, Michael Dürig <md...@apache.org> wrote:
>
>
> On 18.12.12 12:35, Thomas Mueller wrote:
>>
>> Hi,
>>
>>> OTOH non conflicting changes could still lead to non commuting journal
>>> entries and thus merging such changes would require journals to be
>>> adjusted.
>>
>>
>> Could you give an example?
>
>
> Consider the revision you get after
>
> +/a:{}
> +/x:{}
>
> on an initially empty tree.
>
> Now session 1 does
>
>>/a:/x/a
>
> and session 2 does
>
> -/a
>
> concurrently on that revision.
>
> The resulting trees could be merged by my first definition. However, the two
> journals from session 1 and session 2 do not commute. You could neither do
>
>>/a:/x/a
> -/a
>
> nor could you do
>
> -/a
>>/a:/x/a
>
> That's why I included "If there are no other conflicts" referring to the
> list of conflicts you gave earlier in my updated definition. With this the
> resulting trees could not be merged at all in the first place.
>
>
> Generally I think there are four ways to deal with this situation:
>
> 1) Make the definition of conflicts sufficiently strong to exclude such
> cases. That's Tom's proposal from this Thread.
>
> 2) Allow inconsistent journals.

-1, i consider that a bug

>
> 3) Adjust the journals to reflect the changes introduced through merges.

irrelevant since the journal is expected to be consistent

>
> 4) Drop journal support.

-1, i fail to see why that would be required.

cheers
stefan


>
> Michael
>
>
>>
>> Regards,
>> Thomas
>>
>
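
The move/delete example quoted above can be replayed with a small toy model.
This is only a sketch under simplifying assumptions: the tree is a flat set
of paths, descendants and parent checks are ignored, and CommuteSketch and
its operations are made-up names rather than MicroKernel code. Each
operation applies on its own to the base revision, but neither ordering of
the two applies.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class CommuteSketch {
    interface Op {
        void apply(Set<String> tree);
    }

    // -path: fails if there is no node at the path.
    static Op remove(String path) {
        return tree -> {
            if (!tree.remove(path)) {
                throw new IllegalStateException("no node at " + path);
            }
        };
    }

    // >from:to: fails if the source is missing or the target already exists.
    static Op move(String from, String to) {
        return tree -> {
            if (!tree.contains(from) || tree.contains(to)) {
                throw new IllegalStateException("cannot move " + from + " to " + to);
            }
            tree.remove(from);
            tree.add(to);
        };
    }

    // True if all operations can be applied in the given order to a copy of the base tree.
    static boolean applies(Set<String> base, Op... ops) {
        Set<String> tree = new TreeSet<>(base);
        try {
            for (Op op : ops) {
                op.apply(tree);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Set<String> base = new TreeSet<>(Arrays.asList("/a", "/x"));
        Op session1 = move("/a", "/x/a");
        Op session2 = remove("/a");
        System.out.println("session 1 alone: " + applies(base, session1));
        System.out.println("session 2 alone: " + applies(base, session2));
        System.out.println("1 then 2: " + applies(base, session1, session2));
        System.out.println("2 then 1: " + applies(base, session2, session1));
    }
}
```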

Re: Conflict handling in Oak

Posted by Dominik Süß <do...@gmail.com>.

Hi, although I did not have the opportunity to jump in as planned I'm still
following changes and had some thoughts about that as well.

On Tue, Dec 18, 2012 at 4:51 PM, Michael Dürig <md...@apache.org> wrote:

>
> Right. However, degrading moves to remove/add node operations limits the
> size of sub trees which can be moved: if the moved sub tree (serialised to
> json add node operations) do not fit into heap, moves wont work at all.
>

When reading about splitting up a move operation into an add and a remove
operation I realized that automatic merging might lead to strange
constellations. If a node is moved and the creation worked while the
removal ended in a conflict (worst case: two concurrent moves), the create
would be performed without any annotation, so it is not that easy to
perform a client side resolution without the awareness of a dedicated move
operation.

IMHO a commit that needs a merge should always fail but return the
necessary diff information for client side resolution. This information
could either be in a form that allows the client to just "accept" a
merge proposal or, for conflicts that cannot be resolved automatically,
give the client the opportunity to implement a "mine"/"theirs" resolution
logic. That way it is up to the client to define how relaxed the system
should behave. The (core) API could even provide the tooling to auto-accept
or sequentially accept merge proposals, so the remaining implementation
effort for a client/binding can be lowered.
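
To sketch the idea (all names here are hypothetical and nothing like this
exists in the API today): the failed commit carries the conflicting values,
and the client plugs in a resolution strategy such as "mine" or "theirs".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MergeProposalSketch {
    // Hypothetical description of one conflicting value.
    static final class Conflict {
        final String path;
        final String mine;
        final String theirs;

        Conflict(String path, String mine, String theirs) {
            this.path = path;
            this.mine = mine;
            this.theirs = theirs;
        }
    }

    // Hypothetical failure that carries the information needed for client side resolution.
    static final class CommitConflictException extends Exception {
        final List<Conflict> conflicts;

        CommitConflictException(List<Conflict> conflicts) {
            super(conflicts.size() + " conflict(s)");
            this.conflicts = new ArrayList<>(conflicts);
        }
    }

    interface Resolver {
        String resolve(Conflict conflict);
    }

    static final Resolver MINE = conflict -> conflict.mine;
    static final Resolver THEIRS = conflict -> conflict.theirs;

    // Applies one strategy to every reported conflict and returns path -> chosen value.
    static Map<String, String> resolve(CommitConflictException failure, Resolver resolver) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Conflict conflict : failure.conflicts) {
            resolved.put(conflict.path, resolver.resolve(conflict));
        }
        return resolved;
    }

    public static void main(String[] args) {
        CommitConflictException failure = new CommitConflictException(
                Arrays.asList(new Conflict("/a/title", "my title", "their title")));
        System.out.println(resolve(failure, MINE));
        System.out.println(resolve(failure, THEIRS));
    }
}
```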

I hope I got everything right and my thoughts do not irritate too much.

Best regards
Dominik

Re: Conflict handling in Oak

Posted by Michael Dürig <md...@apache.org>.


On 18.12.12 15:30, Stefan Guggisberg wrote:
> On Tue, Dec 18, 2012 at 4:08 PM, Michael Dürig <md...@apache.org> wrote:
>>
>>
>> On 18.12.12 14:43, Thomas Mueller wrote:
>>>
>>> Hi,
>>>
>>>>>> 2) Allow inconsistent journals.
>>>>>
>>>>>
>>>>> I guess we don't want that. But the question is how close the journal
>>>>> has
>>>>> to match the original commit, specially "move" and "copy" operations. If
>>>>> they need to be preserved (do they?), then it's complicated.
>>>>
>>>>
>>>> There is no use for a journal which is not accurate. After all, if we
>>>> consider implementing rebase (OAK-464) on top of the journal, it has to
>>>> be accurate.
>>>
>>>
>>> Yes, I think we should have a consistent journal, if we have a journal.
>>>
>>> But the question is how close the journal has to match the original
>>> commit, specially "move" and "copy" operations.
>>>
>>> So, do "move" and "copy" operations need to be preserved, or can they be
>>> converted to "add node" / "remove node"?
>>
>>
>> Now we are getting somewhere: This is exactly the original topic of OAK-464.
>> If the Microkernel converts moves to add/remove, implementing rebase on top
>> of that results in moves of big sub trees to become *very* expensive.
>
> IIRC we didn't consider efficient move operations a design goal.
> i guess we can live with non-optimized move operations.

Right. However, degrading moves to remove/add node operations limits the
size of subtrees which can be moved: if the moved subtree (serialised
to JSON add node operations) does not fit into the heap, moves won't work
at all.
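
A rough illustration of the size difference, using the shorthand diff
notation from this thread. MoveSizeSketch is a made-up helper that assumes a
subtree with childCount empty child nodes; it is not how any Microkernel
serialises its journal. The move operation stays the same size regardless of
the subtree, while the add/remove form grows with every node.

```java
class MoveSizeSketch {
    // A move is a single, constant-size operation.
    static String moveOp(String from, String to) {
        return ">" + from + ":" + to;
    }

    // The same change expressed as "add the whole subtree at the target, then remove the source".
    static String addRemoveOps(String from, String to, int childCount) {
        StringBuilder sb = new StringBuilder("+" + to + ":{");
        for (int i = 0; i < childCount; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("\"n").append(i).append("\":{}");
        }
        sb.append("}\n-").append(from);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(moveOp("/a", "/b").length());
        System.out.println(addRemoveOps("/a", "/b", 10).length());
        System.out.println(addRemoveOps("/a", "/b", 100000).length());
    }
}
```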

Michael

>
> cheers
> stefan
>
>>
>> Michael
>>
>>>
>>>
>>> Regards,
>>> Thomas
>>>
>>>
>>

Re: Conflict handling in Oak

Posted by Stefan Guggisberg <st...@gmail.com>.

On Tue, Dec 18, 2012 at 4:08 PM, Michael Dürig <md...@apache.org> wrote:
>
>
> On 18.12.12 14:43, Thomas Mueller wrote:
>>
>> Hi,
>>
>>>>> 2) Allow inconsistent journals.
>>>>
>>>>
>>>> I guess we don't want that. But the question is how close the journal
>>>> has
>>>> to match the original commit, specially "move" and "copy" operations. If
>>>> they need to be preserved (do they?), then it's complicated.
>>>
>>>
>>> There is no use for a journal which is not accurate. After all, if we
>>> consider implementing rebase (OAK-464) on top of the journal, it has to
>>> be accurate.
>>
>>
>> Yes, I think we should have a consistent journal, if we have a journal.
>>
>> But the question is how close the journal has to match the original
>> commit, specially "move" and "copy" operations.
>>
>> So, do "move" and "copy" operations need to be preserved, or can they be
>> converted to "add node" / "remove node"?
>
>
> Now we are getting somewhere: This is exactly the original topic of OAK-464.
> If the Microkernel converts moves to add/remove, implementing rebase on top
> of that results in moves of big sub trees to become *very* expensive.

IIRC we didn't consider efficient move operations a design goal.
i guess we can live with non-optimized move operations.

cheers
stefan

>
> Michael
>
>>
>>
>> Regards,
>> Thomas
>>
>>
>

Re: Conflict handling in Oak

Posted by Stefan Guggisberg <st...@gmail.com>.

On Tue, Dec 18, 2012 at 5:12 PM, Michael Dürig <md...@apache.org> wrote:
>
>
> On 18.12.12 16:05, Mete Atamel wrote:
>>
>> In MongoMK, getJournal basically returns the jsonDiff from the commit, at
>> least in the simple case when there is no path to filter.
>
>
> And AFAIK this is the same for the H2 MK.

currently, yes.

cheers
stefan

>
> Michael
>
>
>>
>> -Mete
>>
>> On 12/18/12 4:57 PM, "Thomas Mueller" <mu...@adobe.com> wrote:
>>
>>> Hi,
>>>
>>>> "But the question is how close the journal has to match the original
>>>> commit, specially "move" and "copy" operations.
>>>
>>>
>>> Yes. There are various degrees of how close the journal is to the commit.
>>> One option is: the commit is preserved 1:1. The other extreme is: moves
>>> are fully converted to add+remove. But there are options in the middle,
>>> for example if the original operation included "move /a /b", and the
>>> journal wouldn't return it 1:1, but instead "add /b, then move /a/x to
>>> /b/x, and remove /a". I thought this is what the MicroKernelImpl does in
>>> some cases (if there are multiple operations), and I don't think it's a
>>> problem.
>>>
>>> Regards,
>>> Thomas
>>>
>>>
>>
>

Re: Conflict handling in Oak

Posted by Michael Dürig <md...@apache.org>.


On 18.12.12 16:05, Mete Atamel wrote:
> In MongoMK, getJournal basically returns the jsonDiff from the commit, at
> least in the simple case when there is no path to filter.

And AFAIK this is the same for the H2 MK.

Michael

>
> -Mete
>
> On 12/18/12 4:57 PM, "Thomas Mueller" <mu...@adobe.com> wrote:
>
>> Hi,
>>
>>> "But the question is how close the journal has to match the original
>>> commit, specially "move" and "copy" operations.
>>
>> Yes. There are various degrees of how close the journal is to the commit.
>> One option is: the commit is preserved 1:1. The other extreme is: moves
>> are fully converted to add+remove. But there are options in the middle,
>> for example if the original operation included "move /a /b", and the
>> journal wouldn't return it 1:1, but instead "add /b, then move /a/x to
>> /b/x, and remove /a". I thought this is what the MicroKernelImpl does in
>> some cases (if there are multiple operations), and I don't think it's a
>> problem.
>>
>> Regards,
>> Thomas
>>
>>
>

Re: Conflict handling in Oak

Posted by Mete Atamel <ma...@adobe.com>.

In MongoMK, getJournal basically returns the jsonDiff from the commit, at
least in the simple case when there is no path to filter.

-Mete

On 12/18/12 4:57 PM, "Thomas Mueller" <mu...@adobe.com> wrote:

>Hi,
>
>>"But the question is how close the journal has to match the original
>>commit, specially "move" and "copy" operations.
>
>Yes. There are various degrees of how close the journal is to the commit.
>One option is: the commit is preserved 1:1. The other extreme is: moves
>are fully converted to add+remove. But there are options in the middle,
>for example if the original operation included "move /a /b", and the
>journal wouldn't return it 1:1, but instead "add /b, then move /a/x to
>/b/x, and remove /a". I thought this is what the MicroKernelImpl does in
>some cases (if there are multiple operations), and I don't think it's a
>problem.
>
>Regards,
>Thomas
>
>

Re: Conflict handling in Oak

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>"But the question is how close the journal has to match the original
>commit, specially "move" and "copy" operations.

Yes. There are various degrees of how close the journal is to the commit.
One option is: the commit is preserved 1:1. The other extreme is: moves
are fully converted to add+remove. But there are options in the middle,
for example if the original operation included "move /a /b", and the
journal wouldn't return it 1:1, but instead "add /b, then move /a/x to
/b/x, and remove /a". I thought this is what the MicroKernelImpl does in
some cases (if there are multiple operations), and I don't think it's a
problem.

Regards,
Thomas

Re: Conflict handling in Oak

Posted by Michael Dürig <md...@apache.org>.


On 18.12.12 15:34, Thomas Mueller wrote:
> Hi,
>
>>> So, do "move" and "copy" operations need to be preserved, or can they be
>>> converted to "add node" / "remove node"?
>>
>> Now we are getting somewhere: This is exactly the original topic of
>> OAK-464. If the Microkernel converts moves to add/remove, implementing
>> rebase on top of that results in moves of big sub trees to become *very*
>> expensive.
>
> As far as I know, in MongoDB, moves are implemented as "copy & delete", so
> I don't think performance is a problem.
>
> But I guess memory usage would be a problem if the whole subtree has to be
> put in a Json document. MicroKernel.getJournal could lead to out of memory
> for large move & copy operations, unless those operations are at least
> somewhat preserved in the journal. As far as I understand, both
> MicroKernel implementations do return move operations, so I guess it's not
> a problem in practice(?)

Yes but it was you who said earlier:

"But the question is how close the journal has to match the original
commit, specially "move" and "copy" operations.

So, do "move" and "copy" operations need to be preserved, or can they be
converted to "add node" / "remove node"?"

in order to circumvent the problem of inconsistent journals.

Which leaves me a bit clueless about what you mean by "not a problem in
practice".

Michael


>
> Regards,
> Thomas
>

Re: Conflict handling in Oak

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>>So, do "move" and "copy" operations need to be preserved, or can they be
>> converted to "add node" / "remove node"?
>
>Now we are getting somewhere: This is exactly the original topic of
>OAK-464. If the Microkernel converts moves to add/remove, implementing
>rebase on top of that results in moves of big sub trees to become *very*
>expensive.

As far as I know, in MongoDB, moves are implemented as "copy & delete", so
I don't think performance is a problem.

But I guess memory usage would be a problem if the whole subtree has to be
put in a JSON document. MicroKernel.getJournal could lead to out of memory
for large move & copy operations, unless those operations are at least
somewhat preserved in the journal. As far as I understand, both
MicroKernel implementations do return move operations, so I guess it's not
a problem in practice(?)

Regards,
Thomas

Re: Conflict handling in Oak

Posted by Michael Dürig <md...@apache.org>.


On 18.12.12 14:43, Thomas Mueller wrote:
> Hi,
>
>>>> 2) Allow inconsistent journals.
>>>
>>> I guess we don't want that. But the question is how close the journal
>>> has
>>> to match the original commit, specially "move" and "copy" operations. If
>>> they need to be preserved (do they?), then it's complicated.
>>
>> There is no use for a journal which is not accurate. After all, if we
>> consider implementing rebase (OAK-464) on top of the journal, it has to
>> be accurate.
>
> Yes, I think we should have a consistent journal, if we have a journal.
>
> But the question is how close the journal has to match the original
> commit, specially "move" and "copy" operations.
>
> So, do "move" and "copy" operations need to be preserved, or can they be
> converted to "add node" / "remove node"?

Now we are getting somewhere: This is exactly the original topic of
OAK-464. If the Microkernel converts moves to add/remove, implementing
rebase on top of that makes moves of big subtrees *very* expensive.

Michael

>
>
> Regards,
> Thomas
>
>

Re: Conflict handling in Oak

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>>>2) Allow inconsistent journals.
>>
>> I guess we don't want that. But the question is how close the journal
>>has
>> to match the original commit, specially "move" and "copy" operations. If
>> they need to be preserved (do they?), then it's complicated.
>
>There is no use for a journal which is not accurate. After all, if we
>consider implementing rebase (OAK-464) on top of the journal, it has to
>be accurate.

Yes, I think we should have a consistent journal, if we have a journal.

But the question is how closely the journal has to match the original
commit, especially for "move" and "copy" operations.

So, do "move" and "copy" operations need to be preserved, or can they be
converted to "add node" / "remove node"?


Regards,
Thomas

Re: Conflict handling in Oak

Posted by Michael Dürig <md...@apache.org>.


On 18.12.12 14:25, Thomas Mueller wrote:
> Hi,
>
>> 1) Make the definition of conflicts sufficiently strong to exclude such
>> cases. That's Tom's proposal from this Thread.
>
> Ah, OK, I thought you meant it could still be a problem even with my
> proposal.
>
> I guess failing on (node-level-) conflicts would be the most simple
> solution, as a start. It would also simplify checking node type
> constraints within oak-core I guess (if we actually want to have strict
> checks).
>
> At the beginning, I would probably not try to merge conflicts in oak-core,
> and simply fail the commit. If it turns out to be a problem in reality, we
> could still change it. Unless, of course, we already know it's a problem?

Yes, this matches the way we currently do it through 
AnnotatingConflictHandler in oak-core. We just mark the conflicts and 
later fail the commit with the ConflictValidator if such markers are 
present.
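
As a rough sketch of that mark-then-fail pattern: the types below are hypothetical stand-ins and not the oak-core classes, and the marker name is an assumption made for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins; not the oak-core classes themselves.
final class MarkThenFailSketch {

    /** Marker key used by this sketch only. */
    static final String CONFLICT_MARKER = ":conflict";

    /** Thrown when a commit still contains conflict markers. */
    static final class CommitFailed extends Exception {
        CommitFailed(String message) {
            super(message);
        }
    }

    /** Flat tree model: node path -> (property name -> value). */
    static final class Tree {
        final Map<String, Map<String, String>> nodes = new TreeMap<>();

        Map<String, String> node(String path) {
            return nodes.computeIfAbsent(path, p -> new TreeMap<>());
        }
    }

    /** Step 1: record the conflict on the node instead of failing right away. */
    static void annotate(Tree tree, String path, String description) {
        tree.node(path).put(CONFLICT_MARKER, description);
    }

    /** Step 2: fail the commit if any node still carries a marker. */
    static void validate(Tree tree) throws CommitFailed {
        List<String> conflicted = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : tree.nodes.entrySet()) {
            if (e.getValue().containsKey(CONFLICT_MARKER)) {
                conflicted.add(e.getKey());
            }
        }
        if (!conflicted.isEmpty()) {
            throw new CommitFailed("Unresolved conflicts at " + conflicted);
        }
    }
}
```

Separating the two steps leaves room for an upper layer to remove markers between them if it knows how to resolve a given conflict.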


>
>> 2) Allow inconsistent journals.
>
> I guess we don't want that. But the question is how close the journal has
> to match the original commit, specially "move" and "copy" operations. If
> they need to be preserved (do they?), then it's complicated.

There is no use for a journal which is not accurate. After all, if we 
consider implementing rebase (OAK-464) on top of the journal, it has to 
be accurate.

Michael

>
> Regards,
> Thomas
>

Re: Conflict handling in Oak

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

I would probably initially only implement "strict conflict detection" in
the MongoMK, if it's not already implemented in that way(?).

I don't currently see a need for both MicroKernel implementations to
behave in exactly the same way until we have a clear picture of what the
best solution is.

Regards,
Thomas






On 12/18/12 3:25 PM, "Thomas Mueller" <mu...@adobe.com> wrote:

>Hi,
>
>>1) Make the definition of conflicts sufficiently strong to exclude such
>>cases. That's Tom's proposal from this Thread.
>
>Ah, OK, I thought you meant it could still be a problem even with my
>proposal.
>
>I guess failing on (node-level-) conflicts would be the most simple
>solution, as a start. It would also simplify checking node type
>constraints within oak-core I guess (if we actually want to have strict
>checks).
>
>At the beginning, I would probably not try to merge conflicts in oak-core,
>and simply fail the commit. If it turns out to be a problem in reality, we
>could still change it. Unless, of course, we already know it's a problem?
>
>>2) Allow inconsistent journals.
>
>I guess we don't want that. But the question is how close the journal has
>to match the original commit, specially "move" and "copy" operations. If
>they need to be preserved (do they?), then it's complicated.
>
>Regards,
>Thomas
>

Re: Conflict handling in Oak

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>1) Make the definition of conflicts sufficiently strong to exclude such
>cases. That's Tom's proposal from this Thread.

Ah, OK, I thought you meant it could still be a problem even with my
proposal.

I guess failing on (node-level) conflicts would be the simplest
solution, as a start. It would also simplify checking node type
constraints within oak-core I guess (if we actually want to have strict
checks).

At the beginning, I would probably not try to merge conflicts in oak-core,
and simply fail the commit. If it turns out to be a problem in reality, we
could still change it. Unless, of course, we already know it's a problem?

>2) Allow inconsistent journals.

I guess we don't want that. But the question is how closely the journal has
to match the original commit, especially "move" and "copy" operations. If
they need to be preserved (do they?), then it's complicated.

Regards,
Thomas

Re: Conflict handling in Oak

Posted by Michael Dürig <md...@apache.org>.

On 18.12.12 12:35, Thomas Mueller wrote:
> Hi,
>
>> OTOH non conflicting changes could still lead to non commuting journal
>> entries and thus merging such changes would require journals to be
>> adjusted.
>
> Could you give an example?

Consider the revision you get after

+/a:{}
+/x:{}

on an initially empty tree.

Now session 1 does

 >/a:/x/a

and session 2 does

-/a

concurrently on that revision.

The resulting trees could be merged by my first definition. However, the
two journals from session 1 and session 2 do not commute. You could
neither do

 >/a:/x/a
-/a

nor could you do

-/a
 >/a:/x/a

That's why I included "If there are no other conflicts" referring to the 
list of conflicts you gave earlier in my updated definition. With this 
the resulting trees could not be merged at all in the first place.
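
The non-commuting part can be checked with a small sketch. It assumes a toy model where the tree is just a set of absolute paths and only the "-" and ">" operations are supported; the names are hypothetical and this is not MicroKernel code.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model of a tree as a set of absolute paths; names are hypothetical.
final class CommuteSketch {

    /** Applies "-path" or ">from:to" to the tree; throws if the source is missing. */
    static void apply(Set<String> tree, String op) {
        if (op.startsWith("-")) {
            tree.removeAll(subtree(tree, op.substring(1)));
        } else if (op.startsWith(">")) {
            String[] parts = op.substring(1).split(":");
            moveSubtree(tree, parts[0], parts[1]);
        } else {
            throw new IllegalArgumentException("Unsupported operation: " + op);
        }
    }

    /** Collects the node at root and everything below it. */
    private static Set<String> subtree(Set<String> tree, String root) {
        Set<String> result = new TreeSet<>();
        for (String p : tree) {
            if (p.equals(root) || p.startsWith(root + "/")) {
                result.add(p);
            }
        }
        if (result.isEmpty()) {
            throw new IllegalStateException("No such node: " + root);
        }
        return result;
    }

    private static void moveSubtree(Set<String> tree, String from, String to) {
        Set<String> moved = subtree(tree, from);
        tree.removeAll(moved);
        for (String p : moved) {
            tree.add(to + p.substring(from.length()));
        }
    }

    /** Returns true if the operations can be replayed on a copy of base in the given order. */
    static boolean canReplay(Set<String> base, String... ops) {
        Set<String> tree = new TreeSet<>(base);
        try {
            for (String op : ops) {
                apply(tree, op);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

With the base tree { /a, /x }, each operation replays fine on its own, but canReplay returns false for both orderings of the pair, because whichever operation runs second no longer finds /a.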

Generally I think there are four ways to deal with this situation:

1) Make the definition of conflicts sufficiently strong to exclude such 
cases. That's Tom's proposal from this Thread.

2) Allow inconsistent journals.

3) Adjust the journals to reflect the changes introduced through merges.

4) Drop journal support.

Michael

>
> Regards,
> Thomas
>

Re: Conflict handling in Oak

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>OTOH non conflicting changes could still lead to non commuting journal
>entries and thus merging such changes would require journals to be
>adjusted.

Could you give an example?

Regards,
Thomas

Re: Conflict handling in Oak

Posted by Michael Dürig <md...@apache.org>.

This is a bit more complicated. In fact it is the other way around: if
two journal entries commute, the corresponding differences on the nodes
do not conflict regarding the definition I gave.

OTOH non conflicting changes could still lead to non commuting journal 
entries and thus merging such changes would require journals to be 
adjusted. I'll rephrase below.

On 18.12.12 11:09, Michael Dürig wrote:
>
>
> On 18.12.12 9:38, Thomas Mueller wrote:
>> What I suggest should be merged within the MicroKernel:
>>
>> * Two sessions concurrently add different child nodes to a node
>> ("/test/a"
>> and "/test/b"): this is merged as it's not really a conflict
>>
>> * Two sessions concurrently delete different child nodes ("/test/a" and
>> "/test/b"): this is merged
>>
>> * Two sessions concurrently move different child nodes to another
>> location
>
> I think this can be summed up as:
>
> Only merge non conflicting changes wrt. the children of a node. The
> children of any nodes are its child nodes and its properties. Two
> changes to the children of a node conflict if these children have the
> same name.

If there are no other conflicts (*), merge changes wrt. the children of
a node. The children of a node are its child nodes and its
properties. Two changes to the children of a node conflict if these
children have the same name.

(*) see Tom's initial post for what constitutes "other conflicts".

This additional complication somewhat unnecessarily restricts the set of 
mergeable changes. That's why I came up with the proposal to drop 
support for the getJournal() API.

Michael

>
> This has the beauty of simplicity and as Tom notes below also does not
> require the journal to be corrected.
>
>>
>> The reason for this is to allow concurrently manipulating child nodes if
>> there are many child nodes (concurrent repository loading).
>>
>> With this rules, I believe that "2) Furthermore merges should be
>> correctly
>> mirrored in the journal" wouldn't be required, as there are no merges
>> that
>> would cause the journal to change.
>
> Right. The reason for this is - and that's again a very nice property of
> this approach - that for these conflicts the corresponding journal
> entries commute.
>
>
> In addition it would be nice to annotate conflicts in some way. This is
> quite easy to do and would allow upper layers to resolve conflicts based
> on specific business logic. Currently we do something along these lines
> with the AnnotatingConflictHandler [1] in oak-core.
>
> Michael
>
>
> [1]
> https://github.com/jukka/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/commit/AnnotatingConflictHandler.java
>
>
>

Re: Conflict handling in Oak

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

>In addition it would be nice to annotate conflicts in some way. This is
>quite easy to do and would allow upper layers to resolve conflicts based
>on specific business logic. Currently we do something along these lines
>with the AnnotatingConflictHandler [1] in oak-core.

Sure, that would make sense. We would need to define exactly how to mark
conflicts. I wonder if the MicroKernel should stop at the first
conflict (as it does now) or if it should try to continue and mark
multiple conflicts. Continuing might be a bit tricky to implement, but for
an efficient conflict resolution it would be better I guess (to avoid many
roundtrips). Should we define a JSON format for conflicts?

Regards,
Thomas

Re: Conflict handling in Oak

Posted by Michael Dürig <md...@apache.org>.

On 18.12.12 9:38, Thomas Mueller wrote:
> What I suggest should be merged within the MicroKernel:
>
> * Two sessions concurrently add different child nodes to a node ("/test/a"
> and "/test/b"): this is merged as it's not really a conflict
>
> * Two sessions concurrently delete different child nodes ("/test/a" and
> "/test/b"): this is merged
>
> * Two sessions concurrently move different child nodes to another location

I think this can be summed up as:

Only merge non-conflicting changes wrt. the children of a node. The
children of a node are its child nodes and its properties. Two
changes to the children of a node conflict if these children have the
same name.

This has the beauty of simplicity and as Tom notes below also does not 
require the journal to be corrected.
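
A minimal sketch of this rule, assuming each session's changes are summarized as a map from parent path to the names of the children it touched. The representation and all names are hypothetical; this is not MicroKernel code.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the name-based rule with hypothetical types; not MicroKernel code.
final class ChildNameConflictSketch {

    /**
     * Each change set maps a parent path to the names of the children
     * (child nodes and properties alike) that a session changed under it.
     * Returns the conflicting "parent/name" entries; empty means mergeable.
     */
    static Set<String> conflicts(Map<String, Set<String>> ours,
                                 Map<String, Set<String>> theirs) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : ours.entrySet()) {
            Set<String> other = theirs.get(e.getKey());
            if (other == null) {
                continue; // the other session did not touch this parent
            }
            for (String name : e.getValue()) {
                if (other.contains(name)) {
                    result.add(e.getKey() + "/" + name);
                }
            }
        }
        return result;
    }
}
```

For example, one session touching "a" and another touching "b" under /test yields no conflicts, while both touching "x" under /test yields the single entry /test/x.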

>
> The reason for this is to allow concurrently manipulating child nodes if
> there are many child nodes (concurrent repository loading).
>
> With this rules, I believe that "2) Furthermore merges should be correctly
> mirrored in the journal" wouldn't be required, as there are no merges that
> would cause the journal to change.

Right. The reason for this is - and that's again a very nice property of 
this approach - that for these conflicts the corresponding journal 
entries commute.

In addition it would be nice to annotate conflicts in some way. This is 
quite easy to do and would allow upper layers to resolve conflicts based 
on specific business logic. Currently we do something along these lines 
with the AnnotatingConflictHandler [1] in oak-core.

Michael

[1] 
https://github.com/jukka/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/commit/AnnotatingConflictHandler.java

Re: Conflict handling in Oak

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

I think it would be better if we clearly define the rules in the
MicroKernel API. There are various edge cases, and having too much freedom
in the MicroKernel API will make oak-core more complicated I guess.

We should also look at what we did in Jackrabbit 2.x and what problems we
ran into. As far as I understand, Jackrabbit 2.x tries to merge concurrent
updates within jackrabbit-core, but we did run into some problems
(concurrent modifications to a single node, concurrently adding child
nodes, concurrently changing and deleting a node). There is an eventing
mechanism within jackrabbit-core so that changes are applied in real time
in other sessions; I believe this mechanism is problematic
performance-wise if there are hundreds of open sessions (there is an open
issue for that). I wonder if we should try to emulate what Jackrabbit 2.x
did, or rather use a different behaviour, now that we have MVCC.

In the past, we discussed that multiple cluster nodes could contain the
same (logical) data, for example "/lib", and concurrently update this
area, and the cluster nodes would then synchronize (merge the changes).
This would require that all changes can be merged, but it would be rather
complex, and we would probably need to consult the upper layer on how to
merge (so that node type or other restrictions are not violated, or so
that the index doesn't become corrupt).

With the current MongoDB architecture, there is no shared data: each
MongoDB shard is responsible for a part of the data, but there is no
overlap (for replica sets, all the writes occur on the master, and the
slaves only support read operations). With this architecture, we wouldn't
need to merge changes to the same node; instead, we could throw an
exception to the user of the JCR API.

Within the Microkernel, I wonder if we actually should merge conflicting
updates at all. So, I propose the second session fails (concurrent update
within the MicroKernel) for:

* Two sessions concurrently add a node "/test" (even if the node has the
same properties and values): the second session fails. Reason: this could
be incorrectly interpreted as a same name sibling.

* Two sessions concurrently update the same property of a node to a
different value ("/test/x=1" and "/test/x=2"): the second session fails

* Two sessions concurrently update a property of a node ("/test/x=1" and
"/test/y=2"): the second session fails. This is to simplify checking node
type constraints.

* Two sessions concurrently update the same property of a node to the same
value ("/test/x=1" and "/test/x=1"): the second session fails. This might
seem strange, but let's assume originally the value was "itemsInStock=10",
then a session updates that to "itemsInStock=5" and another session
updates it as well - for a stock keeping (or reservation) application it
would be better if the second update would fail.

* Two sessions concurrently move a node to another location: the second
session fails

* Two sessions concurrently move a node to the same location: the second
session fails (similar reason as for concurrent update)

* One session moves a node, another updates a property: the second session
fails

* One session moves a node, another deletes a node: the second session
fails

* Two sessions concurrently delete a node: the second session fails. This
also seems strange, but it would make it consistent with other concurrent
updates. A possible use case is a reservation system, where each node is
an available seat (so that deleting an available seat would make it
unavailable for other sessions).

* One session moves a node and another deletes, moves, or updates a child
node: the second session fails

What I suggest should be merged within the MicroKernel:

* Two sessions concurrently add different child nodes to a node ("/test/a"
and "/test/b"): this is merged as it's not really a conflict

* Two sessions concurrently delete different child nodes ("/test/a" and
"/test/b"): this is merged

* Two sessions concurrently move different child nodes to another location

The reason for this is to allow concurrently manipulating child nodes if
there are many child nodes (concurrent repository loading).

With these rules, I believe that "2) Furthermore merges should be correctly
mirrored in the journal" wouldn't be required, as there are no merges that
would cause the journal to change.

As for "3) Throwing an unspecific MicrokernelException": yes, this should
be changed. We could also include the line number and position within the
journal in the exception object. But I'm not sure if oak-core should try
to merge (validating node type constraints) or rather also throw to the
JCR API caller.
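
Such an exception could look roughly like the sketch below. The subclass name and its fields are assumptions, not an existing API, and the parent class is a local stand-in so that the example compiles on its own.

```java
// Stand-in for the real MicroKernelException, so the sketch compiles on its own.
class MicroKernelExceptionStandIn extends RuntimeException {
    MicroKernelExceptionStandIn(String message) {
        super(message);
    }
}

// Hypothetical subclass: the name and fields are assumptions, not an existing API.
final class ConflictExceptionSketch extends MicroKernelExceptionStandIn {
    private final String path;
    private final int line;
    private final int position;

    ConflictExceptionSketch(String path, int line, int position, String reason) {
        super("Conflict at " + path + " (line " + line
                + ", position " + position + "): " + reason);
        this.path = path;
        this.line = line;
        this.position = position;
    }

    /** Path of the node or property that caused the conflict. */
    String getPath() {
        return path;
    }

    /** Line within the journal where the conflicting operation was found. */
    int getLine() {
        return line;
    }

    /** Position within that line. */
    int getPosition() {
        return position;
    }
}
```

A caller could then catch the specific type, inspect the path, and decide whether to retry or to pass the failure on to the JCR API caller.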

Regards,
Thomas

On 12/17/12 5:05 PM, "Michael Marth" <mm...@adobe.com> wrote:

>Hi,
>
>you raise a very important point for a distributed MK implementation.
>
>I agree with your suggestions for 1 and 3.
>Re 2 I would prefer to specify the MUST NOTs (which we would probably
>have to do anyway if we specify the MUSTs IMO)
>
>Michael
>
>On Dec 12, 2012, at 4:46 PM, Michael Dürig wrote:
>
>> Hi,
>> 
>> Currently the Microkernel contract does not specify a merge policy but
>> is free to try to merge conflicting changes or throw an exception. I
>> think this is problematic in various ways:
>> 
>> 1) Automatic merging may violate the principal of least surprise. It
>>can 
>> be arbitrary complex and still be incorrect wrt. different use cases
>> which need different merge strategies for the same conflict.
>> 
>> 2) Furthermore merges should be correctly mirrored in the journal.
>> According to the Microkernel API: "deleting a node is allowed if the
>> node existed in the given revision, even if it was deleted in the
>> meantime." So the following should currently not fail (it does though,
>> see OAK-507):
>> 
>>     String base = mk.getHeadRevision();
>>     String r1 = mk.commit("-a", base)
>>     String r2 = mk.commit("-a", base)
>> 
>> At this point retrieving the journal up to revision r2 should only
>> contain a single -a operation. I'm quite sure this is currently not the
>> case and the journal will contain two -a operations. One for revision
>>r1 
>> and another for revision r2.
>> 
>> 3) Throwing an unspecific MicrokernelException leaves the API consumer
>> with no clue on what caused a commit to fail. Retrying a commit after
>> some client side conflict resolution becomes a hit and miss. See
>>OAK-442.
>> 
>> 
>> To address 1) I suggest we define a set of clear cut cases where any
>> Microkernel implementations MUST merge. For the other cases I'm not
>>sure 
>> whether we should make them MUST NOT, SHOULD NOT or MAY merge.
>> 
>> To address 2) My preferred solution would be to drop getJournal
>>entirely 
>> from the Microkernel API. However, this means rebasing a branch would
>> need to go into the Microkernel (OAK-464). Otherwise every merge
>>defined 
>> for 1) would need to take care the journal is adjusted accordingly.
>> Another possibility here is to leave the journal unadjusted. However
>> then we need to specify MUST NOT for other merges in 1). Because only
>> then can clients of the journal know how to interpret the journal
>> (receptively the conflicts contained therein).
>> 
>> To address 3) I'd simply derive a more specific exception from
>> MicroKernelException and throw that in the case of a conflict. See
>>OAK-496.
>> 
>> Michael
>

Re: Conflict handling in Oak

Posted by Michael Marth <mm...@adobe.com>.

Hi,

you raise a very important point for a distributed MK implementation.

I agree with your suggestions for 1 and 3.
Re 2 I would prefer to specify the MUST NOTs (which we would probably have to do anyway if we specify the MUSTs IMO)

Michael

On Dec 12, 2012, at 4:46 PM, Michael Dürig wrote:

> Hi,
> 
> Currently the Microkernel contract does not specify a merge policy but 
> is free to try to merge conflicting changes or throw an exception. I 
> think this is problematic in various ways:
> 
> 1) Automatic merging may violate the principal of least surprise. It can 
> be arbitrary complex and still be incorrect wrt. different use cases 
> which need different merge strategies for the same conflict.
> 
> 2) Furthermore merges should be correctly mirrored in the journal. 
> According to the Microkernel API: "deleting a node is allowed if the 
> node existed in the given revision, even if it was deleted in the 
> meantime." So the following should currently not fail (it does though, 
> see OAK-507):
> 
>     String base = mk.getHeadRevision();
>     String r1 = mk.commit("-a", base)
>     String r2 = mk.commit("-a", base)
> 
> At this point retrieving the journal up to revision r2 should only 
> contain a single -a operation. I'm quite sure this is currently not the 
> case and the journal will contain two -a operations. One for revision r1 
> and another for revision r2.
> 
> 3) Throwing an unspecific MicrokernelException leaves the API consumer 
> with no clue on what caused a commit to fail. Retrying a commit after 
> some client side conflict resolution becomes a hit and miss. See OAK-442.
> 
> 
> To address 1) I suggest we define a set of clear cut cases where any 
> Microkernel implementations MUST merge. For the other cases I'm not sure 
> whether we should make them MUST NOT, SHOULD NOT or MAY merge.
> 
> To address 2) My preferred solution would be to drop getJournal entirely 
> from the Microkernel API. However, this means rebasing a branch would 
> need to go into the Microkernel (OAK-464). Otherwise every merge defined 
> for 1) would need to take care the journal is adjusted accordingly.
> Another possibility here is to leave the journal unadjusted. However 
> then we need to specify MUST NOT for other merges in 1). Because only 
> then can clients of the journal know how to interpret the journal 
> (receptively the conflicts contained therein).
> 
> To address 3) I'd simply derive a more specific exception from 
> MicroKernelException and throw that in the case of a conflict. See OAK-496.
> 
> Michael

Re: Conflict handling in Oak

Posted by Stefan Guggisberg <st...@gmail.com>.

On Tue, Dec 18, 2012 at 2:30 PM, Michael Dürig <mi...@gmail.com> wrote:
>
>
> On 18.12.12 11:30, Stefan Guggisberg wrote:
>>
>> On Wed, Dec 12, 2012 at 4:46 PM, Michael Dürig <md...@apache.org> wrote:
>>>
>>> Hi,
>>>
>>> Currently the Microkernel contract does not specify a merge policy but is
>>> free to try to merge conflicting changes or throw an exception. I think
>>> this
>>> is problematic in various ways:
>>>
>>> 1) Automatic merging may violate the principal of least surprise. It can
>>> be
>>> arbitrary complex and still be incorrect wrt. different use cases which
>>> need
>>> different merge strategies for the same conflict.
>>>
>>> 2) Furthermore merges should be correctly mirrored in the journal.
>>> According
>>> to the Microkernel API: "deleting a node is allowed if the node existed
>>> in
>>> the given revision, even if it was deleted in the meantime." So the
>>> following should currently not fail (it does though, see OAK-507):
>>>
>>>      String base = mk.getHeadRevision();
>>>      String r1 = mk.commit("-a", base)
>>>      String r2 = mk.commit("-a", base)
>>>
>>> At this point retrieving the journal up to revision r2 should only
>>> contain a
>>> single -a operation. I'm quite sure this is currently not the case and
>>> the
>>> journal will contain two -a operations. One for revision r1 and another
>>> for
>>> revision r2.
>>
>>
>> if that's the case then it's a bug. the journal must IMO contain the exact
>> diff
>> from a revision to its predecessor.
>
>
> See OAK-532.

thanks
stefan

>
> Michael
>
>
>>
>> cheers
>> stefan
>>
>>>
>>> 3) Throwing an unspecific MicrokernelException leaves the API consumer
>>> with
>>> no clue on what caused a commit to fail. Retrying a commit after some
>>> client
>>> side conflict resolution becomes a hit and miss. See OAK-442.
>>>
>>>
>>> To address 1) I suggest we define a set of clear cut cases where any
>>> Microkernel implementations MUST merge. For the other cases I'm not sure
>>> whether we should make them MUST NOT, SHOULD NOT or MAY merge.
>>>
>>> To address 2) My preferred solution would be to drop getJournal entirely
>>> from the Microkernel API. However, this means rebasing a branch would
>>> need
>>> to go into the Microkernel (OAK-464). Otherwise every merge defined for
>>> 1)
>>> would need to take care the journal is adjusted accordingly.
>>> Another possibility here is to leave the journal unadjusted. However then
>>> we
>>> need to specify MUST NOT for other merges in 1). Because only then can
>>> clients of the journal know how to interpret the journal (receptively the
>>> conflicts contained therein).
>>>
>>> To address 3) I'd simply derive a more specific exception from
>>> MicroKernelException and throw that in the case of a conflict. See
>>> OAK-496.
>>>
>>> Michael

Re: Conflict handling in Oak

Posted by Michael Dürig <mi...@gmail.com>.


On 18.12.12 11:30, Stefan Guggisberg wrote:
> On Wed, Dec 12, 2012 at 4:46 PM, Michael Dürig <md...@apache.org> wrote:
>> Hi,
>>
>> Currently the Microkernel contract does not specify a merge policy but is
>> free to try to merge conflicting changes or throw an exception. I think this
>> is problematic in various ways:
>>
>> 1) Automatic merging may violate the principal of least surprise. It can be
>> arbitrary complex and still be incorrect wrt. different use cases which need
>> different merge strategies for the same conflict.
>>
>> 2) Furthermore merges should be correctly mirrored in the journal. According
>> to the Microkernel API: "deleting a node is allowed if the node existed in
>> the given revision, even if it was deleted in the meantime." So the
>> following should currently not fail (it does though, see OAK-507):
>>
>>      String base = mk.getHeadRevision();
>>      String r1 = mk.commit("-a", base)
>>      String r2 = mk.commit("-a", base)
>>
>> At this point retrieving the journal up to revision r2 should only contain a
>> single -a operation. I'm quite sure this is currently not the case and the
>> journal will contain two -a operations. One for revision r1 and another for
>> revision r2.
>
> if that's the case then it's a bug. the journal must IMO contain the exact diff
> from a revision to its predecessor.

See OAK-532.

Michael

>
> cheers
> stefan
>
>>
>> 3) Throwing an unspecific MicrokernelException leaves the API consumer with
>> no clue on what caused a commit to fail. Retrying a commit after some client
>> side conflict resolution becomes a hit and miss. See OAK-442.
>>
>>
>> To address 1) I suggest we define a set of clear cut cases where any
>> Microkernel implementations MUST merge. For the other cases I'm not sure
>> whether we should make them MUST NOT, SHOULD NOT or MAY merge.
>>
>> To address 2) My preferred solution would be to drop getJournal entirely
>> from the Microkernel API. However, this means rebasing a branch would need
>> to go into the Microkernel (OAK-464). Otherwise every merge defined for 1)
>> would need to take care the journal is adjusted accordingly.
>> Another possibility here is to leave the journal unadjusted. However then we
>> need to specify MUST NOT for other merges in 1). Because only then can
>> clients of the journal know how to interpret the journal (receptively the
>> conflicts contained therein).
>>
>> To address 3) I'd simply derive a more specific exception from
>> MicroKernelException and throw that in the case of a conflict. See OAK-496.
>>
>> Michael

Re: Conflict handling in Oak

Posted by Stefan Guggisberg <st...@gmail.com>.

On Wed, Dec 12, 2012 at 4:46 PM, Michael Dürig <md...@apache.org> wrote:
> Hi,
>
> Currently the Microkernel contract does not specify a merge policy but is
> free to try to merge conflicting changes or throw an exception. I think this
> is problematic in various ways:
>
> 1) Automatic merging may violate the principal of least surprise. It can be
> arbitrary complex and still be incorrect wrt. different use cases which need
> different merge strategies for the same conflict.
>
> 2) Furthermore merges should be correctly mirrored in the journal. According
> to the Microkernel API: "deleting a node is allowed if the node existed in
> the given revision, even if it was deleted in the meantime." So the
> following should currently not fail (it does though, see OAK-507):
>
>     String base = mk.getHeadRevision();
>     String r1 = mk.commit("-a", base)
>     String r2 = mk.commit("-a", base)
>
> At this point retrieving the journal up to revision r2 should only contain a
> single -a operation. I'm quite sure this is currently not the case and the
> journal will contain two -a operations. One for revision r1 and another for
> revision r2.

if that's the case then it's a bug. the journal must IMO contain the exact diff
from a revision to its predecessor.
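
a minimal sketch of what "exact diff" means for the double delete above. it assumes a toy model where a revision is just the set of top-level node names; all names are hypothetical and this is not the MicroKernel implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model: a revision is the set of top-level node names. Hypothetical names.
final class ExactDiffJournalSketch {

    /** Journal entry = exact diff from the predecessor state to the new state. */
    static List<String> diff(Set<String> before, Set<String> after) {
        List<String> ops = new ArrayList<>();
        for (String name : before) {
            if (!after.contains(name)) {
                ops.add("-" + name);
            }
        }
        for (String name : after) {
            if (!before.contains(name)) {
                ops.add("+" + name);
            }
        }
        return ops;
    }

    /** Applies a delete that was prepared against an older base revision. */
    static Set<String> deleteIfPresent(Set<String> head, String name) {
        Set<String> next = new TreeSet<>(head);
        next.remove(name); // deleting an already deleted node changes nothing
        return next;
    }
}
```

with this model the first "-a" produces a journal entry containing "-a", and the second one produces an empty entry, because the state after r2 equals the state after r1.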

cheers
stefan

>
> 3) Throwing an unspecific MicrokernelException leaves the API consumer with
> no clue on what caused a commit to fail. Retrying a commit after some client
> side conflict resolution becomes a hit and miss. See OAK-442.
>
>
> To address 1) I suggest we define a set of clear cut cases where any
> Microkernel implementations MUST merge. For the other cases I'm not sure
> whether we should make them MUST NOT, SHOULD NOT or MAY merge.
>
> To address 2) My preferred solution would be to drop getJournal entirely
> from the Microkernel API. However, this means rebasing a branch would need
> to go into the Microkernel (OAK-464). Otherwise every merge defined for 1)
> would need to take care the journal is adjusted accordingly.
> Another possibility here is to leave the journal unadjusted. However then we
> need to specify MUST NOT for other merges in 1). Because only then can
> clients of the journal know how to interpret the journal (receptively the
> conflicts contained therein).
>
> To address 3) I'd simply derive a more specific exception from
> MicroKernelException and throw that in the case of a conflict. See OAK-496.
>
> Michael