Posted to oak-dev@jackrabbit.apache.org by Marcel Reutegger <mr...@adobe.com> on 2013/03/05 10:27:33 UTC

Consistency aka Isolation Level (was: OAK-638 Avoid branch/merge for small commits)

Hi,

I think it's better to discuss on the mailing list instead of an already
closed issue...

Michael wrote:
> It should be possible to combine this with the "branch, rebase,
> fast forward merge" approach I described above : we just need to
> make the fast forward merge a bit more clever such that it could
> detect and merge changes in distinct areas of the tree instead of
> just giving up when there was concurrent change.

Right, but then why do we branch and rebase in the first place, when
we need the merge (and commit!) to do the same (the more clever
bit you mentioned) in a concurrent write scenario?

The only reason I see right now is the consistency we gain with the
conflict handling Michael proposed [0] for the MicroKernel API
and Jukka implemented in the SegmentMK with the NodeStore
API.

Jukka's implementation of the conflict handling provides
Serializable Snapshot Isolation because it re-runs the hooks on
a rebased branch whenever a concurrent change is detected
before it merges. If we allow it to be more clever, we lose
the Serializable part of the isolation level. Though, I think that's
quite OK. See below.
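
To make the mechanism concrete, here is a minimal sketch of a merge that
rebases, re-runs the hook and only then publishes the result. It is not
the SegmentMK code: the class RebaseMergeSketch, the flat path-to-value
map standing in for the tree, and the requireOneOnCall hook are all
hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

/** Hypothetical sketch; none of these names are Oak API. */
final class RebaseMergeSketch {
    /** Current head: an immutable snapshot of the tree, flattened to path -> value. */
    private final AtomicReference<Map<String, String>> head =
            new AtomicReference<>(Map.of());

    Map<String, String> head() {
        return head.get();
    }

    /**
     * Rebases 'changes' onto the latest head, runs 'hook' against the
     * rebased state, and publishes with a compare-and-set. If another
     * commit got in first, the loop rebases and runs the hook again.
     */
    Map<String, String> merge(Map<String, String> changes,
                              UnaryOperator<Map<String, String>> hook) {
        while (true) {
            Map<String, String> base = head.get();
            Map<String, String> rebased = new HashMap<>(base);
            rebased.putAll(changes);
            // The hook always sees the rebased state, never a stale snapshot.
            Map<String, String> validated = Map.copyOf(hook.apply(rebased));
            if (head.compareAndSet(base, validated)) {
                return validated;
            }
        }
    }

    /** Example hook (an assumption): at least one of /alice and /bob must be "on". */
    static Map<String, String> requireOneOnCall(Map<String, String> tree) {
        boolean alice = "on".equals(tree.get("/alice"));
        boolean bob = "on".equals(tree.get("/bob"));
        if (!alice && !bob) {
            throw new IllegalStateException("nobody is on call");
        }
        return tree;
    }

    public static void main(String[] args) {
        RebaseMergeSketch store = new RebaseMergeSketch();
        store.merge(Map.of("/alice", "on", "/bob", "on"), RebaseMergeSketch::requireOneOnCall);
        store.merge(Map.of("/alice", "off"), RebaseMergeSketch::requireOneOnCall);
        try {
            store.merge(Map.of("/bob", "off"), RebaseMergeSketch::requireOneOnCall);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the hook is evaluated against the state that is actually
published, a commit that would break the invariant is rejected no matter
how the concurrent commits interleave.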

With the MicroKernel the story is a bit different. The conflict
handling model allows the commit to rebase internally, but
without re-running the commit hooks: validation and changes
performed by the commit hook are already part of the JSOP.
This means the MicroKernel will exhibit 'write skew' and not
provide Serializable Snapshot Isolation. This was already described
a while ago by Michael [1].
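
A small sketch of how write skew can arise in that model. Everything
here is hypothetical (WriteSkewSketch, the flat path-to-value map, the
"at least one of /alice and /bob is on" invariant); it only mimics the
idea that validation happens once against the commit's own snapshot and
the later rebase checks for write-write conflicts only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of write skew; not MicroKernel code. */
final class WriteSkewSketch {
    /** The invariant a validator would enforce (an assumption for this example). */
    static boolean invariantHolds(Map<String, String> tree) {
        return "on".equals(tree.get("/alice")) || "on".equals(tree.get("/bob"));
    }

    /** Validates the changes against the snapshot only, once, before commit. */
    static Map<String, String> prepare(Map<String, String> snapshot,
                                       Map<String, String> changes) {
        Map<String, String> view = new HashMap<>(snapshot);
        view.putAll(changes);
        if (!invariantHolds(view)) {
            throw new IllegalStateException("validation failed");
        }
        return changes;
    }

    /**
     * Rebases without re-validating: fails only if a path written by this
     * commit was changed in head since the snapshot was taken.
     */
    static Map<String, String> commit(Map<String, String> head,
                                      Map<String, String> snapshot,
                                      Map<String, String> changes) {
        for (String path : changes.keySet()) {
            if (!Objects.equals(head.get(path), snapshot.get(path))) {
                throw new IllegalStateException("conflict on " + path);
            }
        }
        Map<String, String> next = new HashMap<>(head);
        next.putAll(changes);
        return next;
    }

    /** Two commits start from the same snapshot and write disjoint paths. */
    static Map<String, String> run() {
        Map<String, String> base = Map.of("/alice", "on", "/bob", "on");
        Map<String, String> c1 = prepare(base, Map.of("/alice", "off"));
        Map<String, String> c2 = prepare(base, Map.of("/bob", "off"));
        Map<String, String> head = commit(base, base, c1);
        return commit(head, base, c2);
    }

    public static void main(String[] args) {
        System.out.println("invariant holds: " + invariantHolds(run()));
    }
}
```

Both commits pass validation and neither conflicts with the other, yet
the resulting head violates the invariant.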

I think this is OK and we shouldn't require the MicroKernel or
the NodeStore to provide stricter consistency guarantees
than Snapshot Isolation. I don't see how we can add the Serializable
part without severely impacting throughput in a distributed write
scenario.

If we still need Serializable Snapshot Isolation for some of our
Validators, we can use other techniques [2] to ensure consistency.
Materializing the conflict is one option and sometimes just happens
automatically. E.g. with a unique index on jcr:uuid we can ensure
consistency (every referenceable node has a unique UUID) even
with an implementation that only provides snapshot isolation.
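
As an illustration of materializing the conflict, the sketch below has
every commit that adds a referenceable node also write an index entry
keyed by the UUID. Two commits using the same UUID then touch the same
path and collide on an ordinary write-write conflict. The names
(MaterializedConflictSketch, the /index/uuid/ location, the flat map)
are assumptions for this example and not the actual Oak index layout.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch; the index path and all names are made up. */
final class MaterializedConflictSketch {
    /** Snapshot-isolation style commit: only write-write conflicts are detected. */
    static Map<String, String> commit(Map<String, String> head,
                                      Map<String, String> snapshot,
                                      Map<String, String> changes) {
        for (String path : changes.keySet()) {
            if (!Objects.equals(head.get(path), snapshot.get(path))) {
                throw new IllegalStateException("conflict on " + path);
            }
        }
        Map<String, String> next = new HashMap<>(head);
        next.putAll(changes);
        return next;
    }

    /** Changes for adding a referenceable node plus its unique index entry. */
    static Map<String, String> addNode(String path, String uuid) {
        return Map.of(
                path + "/jcr:uuid", uuid,
                "/index/uuid/" + uuid, path); // the materialized conflict
    }

    /** Two commits from the same snapshot add different nodes with the same UUID. */
    static boolean secondCommitRejected() {
        Map<String, String> base = Map.of();
        Map<String, String> head = commit(base, base, addNode("/a", "u1"));
        try {
            commit(head, base, addNode("/b", "u1"));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("duplicate UUID rejected: " + secondCommitRejected());
    }
}
```

Without the index entry the two commits would write only /a/jcr:uuid and
/b/jcr:uuid, which are distinct paths, and both would succeed.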

What I propose is to make this explicit in the JavaDoc of the MicroKernel
and the NodeStore API. The NodeStore API in particular lacks quite a few
details. Maybe we can just reference the relevant parts in the MicroKernel.

Regards
 Marcel

[0] http://wiki.apache.org/jackrabbit/Conflict%20handling%20through%20rebasing%20branches
[1] http://wiki.apache.org/jackrabbit/Transactional%20model%20of%20the%20Microkernel%20based%20Jackrabbit%20prototype
[2] http://en.wikipedia.org/wiki/Snapshot_isolation#Serializable_Snapshot_Isolation


Re: Consistency aka Isolation Level (was: OAK-638 Avoid branch/merge for small commits)

Posted by Michael Dürig <md...@apache.org>.

On 6.3.13 8:23, Marcel Reutegger wrote:
> Hi,
>
>> Just to further clarify, the approach where private branches are rebased
>> and then merged into trunk is not too different from what the initial
>> implementation of Microkernel.commit() (H2) tried to do: rebase and then
>> merge. The difference is that we can "take rebase out of the lock" if
>> we perform it on a private branch.
>
> I have the impression you assume a specific implementation. Some
> implementations could simply do what databases usually do and only
> synchronize (or lock) on the nodes they write to. Concurrent writes
> in distinct areas of the repository will not block in this case.

No, not really. AFAICT locking individual nodes would be an
implementation variant of rebasing a private branch. That is, it has the
same observable behaviour. I mentioned this already in my initial post
on this [1]: "Note how commit is implemented in terms of branch and
merge. It has not to be implemented that way but rather the observable
behaviour should be like this."

[1] http://markmail.org/message/wtaarmdtgyf5lvjt

>
> regards
>   marcel
>

RE: Consistency aka Isolation Level (was: OAK-638 Avoid branch/merge for small commits)

Posted by Marcel Reutegger <mr...@adobe.com>.
Hi, 

> Just to further clarify, the approach where private branches are rebased
> and then merged into trunk is not too different from what the initial
> implementation of Microkernel.commit() (H2) tried to do: rebase and then
> merge. The difference is that we can "take rebase out of the lock" if
> we perform it on a private branch.

I have the impression you assume a specific implementation. Some
implementations could simply do what databases usually do and only
synchronize (or lock) on the nodes they write to. Concurrent writes
in distinct areas of the repository will not block in this case.
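
A rough sketch of what such an implementation could look like: one lock
per written path, always acquired in sorted order so that two writers
with overlapping paths cannot deadlock. NodeLockSketch and its flat
path-to-value map are hypothetical; no existing MicroKernel
implementation is implied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of per-node locking; not an Oak implementation. */
final class NodeLockSketch {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Map<String, String> nodes = new ConcurrentHashMap<>();

    /** Locks only the paths being written, then applies the changes. */
    void write(Map<String, String> changes) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            // Sorted acquisition order avoids deadlock between writers.
            for (String path : new TreeSet<>(changes.keySet())) {
                ReentrantLock lock = locks.computeIfAbsent(path, p -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            nodes.putAll(changes);
        } finally {
            // Release in reverse order of acquisition.
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }

    String read(String path) {
        return nodes.get(path);
    }

    public static void main(String[] args) throws InterruptedException {
        NodeLockSketch store = new NodeLockSketch();
        Thread t1 = new Thread(() -> store.write(Map.of("/a/x", "1")));
        Thread t2 = new Thread(() -> store.write(Map.of("/b/y", "2")));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(store.read("/a/x") + " " + store.read("/b/y"));
    }
}
```

Writers that touch disjoint paths never wait for each other; only
writers sharing a path are serialised.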

regards
 marcel

Re: Consistency aka Isolation Level (was: OAK-638 Avoid branch/merge for small commits)

Posted by Michael Dürig <md...@apache.org>.

On 5.3.13 12:57, Michael Dürig wrote:
>> Jukka's implementation of the conflict handling provides
>> Serializable Snapshot Isolation because it re-runs the hooks on
>> a rebased branch whenever a concurrent change is detected
>> before it merges. If we allow it to be more clever, we lose
>> the Serializable part of the isolation level. Though, I think that's
>> quite OK. See below.
>>
>> With the MicroKernel the story is a bit different. The conflict
>> handling model allows the commit to rebase internally, but
>> without re-running the commit hooks: validation and changes
>> performed by the commit hook are already part of the JSOP.
>> This means the MicroKernel will exhibit 'write skew' and not
>> provide Serializable Snapshot Isolation. This was already described
>> a while ago by Michael [1].
>
> My take on this was: when we rebase internally anyway, why not make this
> rebase available externally so branches could rebase themselves and thus
> make it easier for the commit later on? AFAICS the internal rebase would
> either have to be on something which pretty much resembles a private
> branch (optimistic locking) or would need to be synchronised for
> serialising concurrent commits. The latter would result in a very fat
> lock and is not what we want. FYI the H2 based Microkernel tries to
> implement commit without such a fat lock and fails. See OAK-532.

Just to further clarify, the approach where private branches are rebased
and then merged into trunk is not too different from what the initial
implementation of Microkernel.commit() (H2) tried to do: rebase and then
merge. The difference is that we can "take rebase out of the lock" if
we perform it on a private branch.

Michael

Re: Consistency aka Isolation Level (was: OAK-638 Avoid branch/merge for small commits)

Posted by Michael Dürig <md...@apache.org>.

On 6.3.13 8:16, Marcel Reutegger wrote:
> Hi,
>
>> My take on this was: when we rebase internally anyway, why not make this
>> rebase available externally so branches could rebase themselves and thus
>> make it easier for the commit later on? AFAICS the internal rebase would
>> either have to be on something which pretty much resembles a private
>> branch (optimistic locking) or would need to be synchronised for
>> serialising concurrent commits. The latter would result in a very fat
>> lock and is not what we want. FYI the H2 based Microkernel tries to
>> implement commit without such a fat lock and fails. See OAK-532.
>
> I don't think this is an implementation issue, but rather a design
> problem as you noted in the discussion on the dev list referenced in
> OAK-532 (http://markmail.org/message/4xwfwbax3kpoysbp)
>
> Concurrent deletes of nodes must IMO fail for the reasons you
> stated and the test you provided in OAK-532.

Ack.

>
> I think we should remove the example from the MK.commit()
> JavaDoc and refer to the conflict definition of MK.rebase(). After
> all, this is how I understand your proposal [0] and its implication
> for MK.commit().

Ack.

>
> What is the reason MK.commit() explicitly says that deleting a
> concurrently deleted node must be merged?

Legacy.

Michael

>
> regards
>   marcel
>
> [0] http://wiki.apache.org/jackrabbit/Conflict%20handling%20through%20rebasing%20branches
>

RE: Consistency aka Isolation Level (was: OAK-638 Avoid branch/merge for small commits)

Posted by Marcel Reutegger <mr...@adobe.com>.
Hi,

> My take on this was: when we rebase internally anyway, why not make this
> rebase available externally so branches could rebase themselves and thus
> make it easier for the commit later on? AFAICS the internal rebase would
> either have to be on something which pretty much resembles a private
> branch (optimistic locking) or would need to be synchronised for
> serialising concurrent commits. The latter would result in a very fat
> lock and is not what we want. FYI the H2 based Microkernel tries to
> implement commit without such a fat lock and fails. See OAK-532.

I don't think this is an implementation issue, but rather a design
problem as you noted in the discussion on the dev list referenced in
OAK-532 (http://markmail.org/message/4xwfwbax3kpoysbp)

Concurrent deletes of nodes must IMO fail for the reasons you
stated and the test you provided in OAK-532.

I think we should remove the example from the MK.commit()
JavaDoc and refer to the conflict definition of MK.rebase(). After
all, this is how I understand your proposal [0] and its implication
for MK.commit().

What is the reason MK.commit() explicitly says that deleting a
concurrently deleted node must be merged?

regards
 marcel

[0] http://wiki.apache.org/jackrabbit/Conflict%20handling%20through%20rebasing%20branches


Re: Consistency aka Isolation Level (was: OAK-638 Avoid branch/merge for small commits)

Posted by Michael Dürig <md...@apache.org>.

On 5.3.13 9:27, Marcel Reutegger wrote:
> Hi,
>
> I think it's better to discuss on the mailing list instead of an already
> closed issue...
>
> Michael wrote:
>> It should be possible to combine this with the "branch, rebase,
>> fast forward merge" approach I described above : we just need to
>> make the fast forward merge a bit more clever such that it could
>> detect and merge changes in distinct areas of the tree instead of
>> just giving up when there was concurrent change.
>
> Right, but then why do we branch and rebase in the first place, when
> we need the merge (and commit!) to do the same (the more clever
> bit you mentioned) in a concurrent write scenario?
>
> The only reason I see right now is the consistency we gain with the
> conflict handling Michael proposed [0] for the MicroKernel API
> and Jukka implemented in the SegmentMK with the NodeStore
> API.

My idea was to balance things out: rebase on a branch is for handling the
heavy-duty stuff, while the clever merge should only take care of very
simple things like changes in separate subtrees. But see below.
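
For illustration, the "very simple things" check could be as small as
the following sketch: two sets of changed paths can be merged without a
full rebase if no path in one set equals, contains or is contained in a
path of the other. DisjointSubtreesSketch and its methods are
hypothetical names; paths are assumed to be absolute and without a
trailing slash.

```java
import java.util.Set;

/** Hypothetical sketch of a disjoint-subtree check; not Oak code. */
final class DisjointSubtreesSketch {
    /** True if the paths are equal or one is an ancestor of the other. */
    static boolean overlaps(String a, String b) {
        if (a.equals("/") || b.equals("/")) {
            return true; // the root contains everything
        }
        return a.equals(b)
                || a.startsWith(b + "/")
                || b.startsWith(a + "/");
    }

    /** True if no changed path of ours touches a subtree changed by theirs. */
    static boolean canMergeWithoutRebase(Set<String> ours, Set<String> theirs) {
        for (String p : ours) {
            for (String q : theirs) {
                if (overlaps(p, q)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canMergeWithoutRebase(Set.of("/a/x"), Set.of("/b/y")));
        System.out.println(canMergeWithoutRebase(Set.of("/a"), Set.of("/a/x")));
    }
}
```

Anything that fails this check would fall back to the rebase on the
private branch.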

>
> Jukka's implementation of the conflict handling provides
> Serializable Snapshot Isolation because it re-runs the hooks on
> a rebased branch whenever a concurrent change is detected
> before it merges. If we allow it to be more clever, we lose
> the Serializable part of the isolation level. Though, I think that's
> quite OK. See below.
>
> With the MicroKernel the story is a bit different. The conflict
> handling model allows the commit to rebase internally, but
> without re-running the commit hooks: validation and changes
> performed by the commit hook are already part of the JSOP.
> This means the MicroKernel will exhibit 'write skew' and not
> provide Serializable Snapshot Isolation. This was already described
> a while ago by Michael [1].

My take on this was: when we rebase internally anyway, why not make this 
rebase available externally so branches could rebase themselves and thus 
make it easier for the commit later on? AFAICS the internal rebase would 
either have to be on something which pretty much resembles a private 
branch (optimistic locking) or would need to be synchronised for 
serialising concurrent commits. The latter would result in a very fat 
lock and is not what we want. FYI the H2 based Microkernel tries to 
implement commit without such a fat lock and fails. See OAK-532.

Michael

>
> I think this is OK and we shouldn't require the MicroKernel or
> the NodeStore to provide stricter consistency guarantees
> than Snapshot Isolation. I don't see how we can add the Serializable
> part without severely impacting throughput in a distributed write
> scenario.
>
> If we still need Serializable Snapshot Isolation for some of our
> Validators, we can use other techniques [2] to ensure consistency.
> Materializing the conflict is one option and sometimes just happens
> automatically. E.g. with a unique index on jcr:uuid we can ensure
> consistency (every referenceable node has a unique UUID) even
> with an implementation that only provides snapshot isolation.
>
> What I propose is to make this explicit in the JavaDoc of the MicroKernel
> and the NodeStore API. The NodeStore API in particular lacks quite a few
> details. Maybe we can just reference the relevant parts in the MicroKernel.
>
> Regards
>   Marcel
>
> [0] http://wiki.apache.org/jackrabbit/Conflict%20handling%20through%20rebasing%20branches
> [1] http://wiki.apache.org/jackrabbit/Transactional%20model%20of%20the%20Microkernel%20based%20Jackrabbit%20prototype
> [2] http://en.wikipedia.org/wiki/Snapshot_isolation#Serializable_Snapshot_Isolation
>