You are viewing a plain text version of this content. The canonical link for it is here.

Posted to oak-dev@jackrabbit.apache.org by Michael Dürig <md...@apache.org> on 2014/05/02 11:38:49 UTC

Re: Avoiding conflicts (OAK-1717)

On 30.4.14 9:41 , Davide Giannella wrote:
> On 29/04/2014 16:06, Davide Giannella wrote:
>> ...
>>
>> Serving searches could be done either by the `:next` stream or by the
>> `:next` and then merged on the fly with the current `:next-ID` in the
>> iterator while returning the resultset.
>>
>> Any thoughts on the above?
> More thinking around it... if can't find a way to deliver merged
> searches on the fly it doesn't worth the effort as it will work as an
> asynchronous index with some extra complexity.
>

Another idea that came to mind, which might be easier to implement and 
require fewer changes to the existing code:

Stay with the linked list / skip list approach but defer inserting new 
keys to a background operation. Instead new keys are put into a 
"pending-non-conflicting-id" container until picked up by a background 
thread and merged into the linked list. Conflicts are simply handled by 
a first committer wins strategy in the same way we handle the entire 
index update currently.

When such an index is used, the "pending-non-conflicting-id" container 
would have to be considered and its content would have to be merged into 
the linked list on the fly. This should be rather easy by first sorting 
these extra keys and then on the fly merging the linked list and the 
sorted extra keys.

I think such an approach is traded off very well: the index is always 
consistent, however depending on the number of keys in the 
"pending-non-conflicting-id" container performance might degrade a bit 
until the background process has caught up again. Additionally this 
approach is only slightly more complicated than what we already have.

Michael

Re: Avoiding conflicts (OAK-1717)

Posted by Davide Giannella <gi...@gmail.com>.

On 06/05/2014 20:29, Michael Dürig wrote:
> On 6.5.14 2:59 , Davide Giannella wrote:
>> On 02/05/2014 10:38, Michael Dürig wrote:
>>> Stay with the linked list / skip list approach but defer inserting new
>>> keys to a background operation. Instead new keys are put into a
>>> "pending-non-conflicting-id" container until picked up by a background
>>> thread and merged into the linked list. Conflicts are simply handled
>>> by a first committer wins strategy in the same way we handle the
>>> entire index update currently.
>>>
>>> When such an index is used, the "pending-non-conflicting-id" container
>>> would have to be considered and its content would have to be merged
>>> into the linked list on the fly. This should be rather easy by first
>>> sorting these extra keys and then on the fly merging the linked list
>>> and the sorted extra keys.
>> ...
> ...

Had a long off-line discussion about the topic with Michael where we
brainstormed structures, use-cases and conflicts.

We're going to go for one of the following:

# Solution 1

:index : {
   _pending : {
      prop-550e8400-e29b-41d4-a716-446655440000 : {
        foo="/path/to/the/content" },
      prop-550e8400-e29b-41d4-a716-446655440000 : {
        bar="-/path/to/the/content" } <== deletion
   }
}

where each cluster node/session add a

prop-UUID : { $property-value = "${operation-if-needed}/pathtocontent" }

then the async merger will parse those, apply to the actual structure
and delete from the _pending area.

## Pros

definitely avoid conflicts
works as a sort of ChangeLog

## Cons

will be harder the merging of results.

# Solution 2

:index : {
    keyA : {
        path : { to : { content : {match=true} } }
    },
    keyB : {
        path : { to : { content : {deleted=true} } }
    }
}

where each cluster use the _pending area as it would be a real index (no
order in here though) and will never physically delete a node.

The async merger will parse and merge into the actual index cleaning-up
while doing it.

## Pros

Easy for on-the-fly merging while serving queries

## Cons

Could not avoid conflict as we expect.

Will update the ticket accordingly for the records.

Cheers
Davide

Re: Avoiding conflicts (OAK-1717)

Posted by Michael Dürig <md...@apache.org>.


On 6.5.14 2:59 , Davide Giannella wrote:
> On 02/05/2014 10:38, Michael Dürig wrote:
>> Stay with the linked list / skip list approach but defer inserting new
>> keys to a background operation. Instead new keys are put into a
>> "pending-non-conflicting-id" container until picked up by a background
>> thread and merged into the linked list. Conflicts are simply handled
>> by a first committer wins strategy in the same way we handle the
>> entire index update currently.
>>
>> When such an index is used, the "pending-non-conflicting-id" container
>> would have to be considered and its content would have to be merged
>> into the linked list on the fly. This should be rather easy by first
>> sorting these extra keys and then on the fly merging the linked list
>> and the sorted extra keys.
> Interesting idea. We could have the index with a structure like
>
> :index : {
>      _pending : {
>          keyA : {...},
>          keyB : {...},
>          ...
>          keyN : {...}
>      },
>      :start : {...},
>      ...
>      keyN : {...}
> }
>
> where under `_pending` it's kept, as you said, a list of keys+documents
> but unlinked. This is maintained with a simple NodeBuilder.child(). The
> Async agent will be the only one taking care of doing
> NodeBuilder.remove() once merged.
>
> Will it be an unmanageable conflict if we have the agent on ClusterA
> removing the key "foobar" and in the same cycle ClusterB add a new
> document under "foobar"?
>
> Still not sure how I could manage deletions and edits (documents that
> changed key) in this scenario.

I think we can. At least if we can rely on the conflict resolution being 
implemented as I described earlier [1]. You'd need to double check this 
for the DocumentNS though. @Marcel, could you shed some light on this?

The keyXXX nodes from above would need to contain properties with unique 
names that in turn point to the respective nodes for that key. When such 
a node is removed the property would get removed, but not the parent 
node, event if it happens to be empty. This would be left to the async 
maintenance task.

For example say we initially have

keyA : { p5254=/foo/bar }

now someone adds /bar/baz and someone else adds /quz/qoz and someone 
else again removes /foo/bar for the same keyA. This would result in:

keyA : { p6475=/bar/baz, p7489=/quz/qoz }

This would even work without conflict if two or more cluster nodes would 
at the same time remove /foo/bar.

As special case is when the same node for keyA, say /x/y, is added 
concurrently by multiple cluster nodes. This would now result in

keyA : { p6475=/bar/baz, p7489=/quz/qoz, p9873=/x/y, p1255=/x/y }

I.e. /x/y get's duplicated, which needs to be properly handled by the 
async maintenance task and also when using the index before it has been 
maintained.


[1] 
https://issues.apache.org/jira/browse/OAK-1553?focusedCommentId=13940731&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13940731


>> I think such an approach is traded off very well: the index is always
>> consistent, however depending on the number of keys in the
>> "pending-non-conflicting-id" container performance might degrade a bit
>> until the background process has caught up again. Additionally this
>> approach is only slightly more complicated than what we already have.
> Well we could limit the sorting on the _pending area only to a
> configurable limit to avoid OOM. Let's say 1000 keys. It could be a
> decent compromise.

I think it is a decent compromise and we should not worry about it too 
much at this point.

>
> In the meanwhile I was investigating the B-Tree (BT). It would work on
> paper of course and it has the same O() of the SkipList (SL). As
> advantage it could serve ascending and descending order off the same
> index. On the other hand it sounds as it will be less efficient in
> serving range queries. In the case of the SL all we have to do is
> seeking the correct start and then just "browse through", while in case
> of the BT to navigate it we'll have to continuously going up and down
> the tree, in-memory sorting of the current bucket and seeking where to
> go. Plus in the case of `lastModified` dates where we always have to add
> a greater element, it will require the update of all the buckets from
> top to leaf.
>
> What I'm not sure around the BT is the ability to manage conflict. Let's
> say we have
>
> A Z
> |  \
> .. MQZ
>      / | \
>        OQ XYZ
>
> if the same instant ClusterA deletes O and Q the results will be
>
> A Z
> |  \
> .. MZ
>      /  \
>         XYZ
>
> and ClusterB adds P with the results will be
>
> A Z
> |  \
> .. MQZ
>      / | \
>       OPQ XYZ
>
> how do we expect mongo to manage this situation during the cluster sync?
> Each letter is a node and the line points to subnodes.

My initial idea was that the btree maintenance would only be done by the 
async task while added keys would be kept in a "pending" bucket... at 
which point I realised that we could very well do the same thing with 
the current linked/skip list approach, which probably performs better, 
is easier to implement and closer to what we already have.

Michael

>
> Any thoughts on this?
>
> D.
>

Re: Avoiding conflicts (OAK-1717)

Posted by Davide Giannella <gi...@gmail.com>.

On 02/05/2014 10:38, Michael Dürig wrote:
> Stay with the linked list / skip list approach but defer inserting new
> keys to a background operation. Instead new keys are put into a
> "pending-non-conflicting-id" container until picked up by a background
> thread and merged into the linked list. Conflicts are simply handled
> by a first committer wins strategy in the same way we handle the
> entire index update currently.
>
> When such an index is used, the "pending-non-conflicting-id" container
> would have to be considered and its content would have to be merged
> into the linked list on the fly. This should be rather easy by first
> sorting these extra keys and then on the fly merging the linked list
> and the sorted extra keys.
Interesting idea. We could have the index with a structure like

:index : {
    _pending : {
        keyA : {...},
        keyB : {...},
        ...
        keyN : {...}
    },
    :start : {...},
    ...
    keyN : {...}
}

where under `_pending` it's kept, as you said, a list of keys+documents
but unlinked. This is maintained with a simple NodeBuilder.child(). The
Async agent will be the only one taking care of doing
NodeBuilder.remove() once merged.

Will it be an unmanageable conflict if we have the agent on ClusterA
removing the key "foobar" and in the same cycle ClusterB add a new
document under "foobar"?

Still not sure how I could manage deletions and edits (documents that
changed key) in this scenario.
> I think such an approach is traded off very well: the index is always
> consistent, however depending on the number of keys in the
> "pending-non-conflicting-id" container performance might degrade a bit
> until the background process has caught up again. Additionally this
> approach is only slightly more complicated than what we already have.
Well we could limit the sorting on the _pending area only to a
configurable limit to avoid OOM. Let's say 1000 keys. It could be a
decent compromise.

In the meanwhile I was investigating the B-Tree (BT). It would work on
paper of course and it has the same O() of the SkipList (SL). As
advantage it could serve ascending and descending order off the same
index. On the other hand it sounds as it will be less efficient in
serving range queries. In the case of the SL all we have to do is
seeking the correct start and then just "browse through", while in case
of the BT to navigate it we'll have to continuously going up and down
the tree, in-memory sorting of the current bucket and seeking where to
go. Plus in the case of `lastModified` dates where we always have to add
a greater element, it will require the update of all the buckets from
top to leaf.

What I'm not sure around the BT is the ability to manage conflict. Let's
say we have

A Z
|  \
.. MQZ
    / | \
      OQ XYZ

if the same instant ClusterA deletes O and Q the results will be

A Z
|  \
.. MZ
    /  \
       XYZ

and ClusterB adds P with the results will be

A Z
|  \
.. MQZ
    / | \
     OPQ XYZ

how do we expect mongo to manage this situation during the cluster sync?
Each letter is a node and the line points to subnodes.

Any thoughts on this?

D.