You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-dev@jackrabbit.apache.org by Michael Dürig <md...@apache.org> on 2014/05/02 11:38:49 UTC
Re: Avoiding conflicts (OAK-1717)
On 30.4.14 9:41 , Davide Giannella wrote:
> On 29/04/2014 16:06, Davide Giannella wrote:
>> ...
>>
>> Serving searches could be done either by the `:next` stream or by the
>> `:next` and then merged on the fly with the current `:next-ID` in the
>> iterator while returning the resultset.
>>
>> Any thoughts on the above?
> More thinking around it... if can't find a way to deliver merged
> searches on the fly it doesn't worth the effort as it will work as an
> asynchronous index with some extra complexity.
>
Another idea that came to mind, which might be easier to implement and
require fewer changes to the existing code:
Stay with the linked list / skip list approach but defer inserting new
keys to a background operation. Instead new keys are put into a
"pending-non-conflicting-id" container until picked up by a background
thread and merged into the linked list. Conflicts are simply handled by
a first committer wins strategy in the same way we handle the entire
index update currently.
When such an index is used, the "pending-non-conflicting-id" container
would have to be considered and its content would have to be merged into
the linked list on the fly. This should be rather easy by first sorting
these extra keys and then on the fly merging the linked list and the
sorted extra keys.
I think such an approach is traded off very well: the index is always
consistent, however depending on the number of keys in the
"pending-non-conflicting-id" container performance might degrade a bit
until the background process has caught up again. Additionally this
approach is only slightly more complicated than what we already have.
Michael
Re: Avoiding conflicts (OAK-1717)
Posted by Davide Giannella <gi...@gmail.com>.
On 06/05/2014 20:29, Michael Dürig wrote:
> On 6.5.14 2:59 , Davide Giannella wrote:
>> On 02/05/2014 10:38, Michael Dürig wrote:
>>> Stay with the linked list / skip list approach but defer inserting new
>>> keys to a background operation. Instead new keys are put into a
>>> "pending-non-conflicting-id" container until picked up by a background
>>> thread and merged into the linked list. Conflicts are simply handled
>>> by a first committer wins strategy in the same way we handle the
>>> entire index update currently.
>>>
>>> When such an index is used, the "pending-non-conflicting-id" container
>>> would have to be considered and its content would have to be merged
>>> into the linked list on the fly. This should be rather easy by first
>>> sorting these extra keys and then on the fly merging the linked list
>>> and the sorted extra keys.
>> ...
> ...
Had a long off-line discussion about the topic with Michael where we
brainstormed structures, use-cases and conflicts.
We're going to go for one of the following:
# Solution 1
:index : {
_pending : {
prop-550e8400-e29b-41d4-a716-446655440000 : {
foo="/path/to/the/content" },
prop-550e8400-e29b-41d4-a716-446655440000 : {
bar="-/path/to/the/content" } <== deletion
}
}
where each cluster node/session add a
prop-UUID : { $property-value = "${operation-if-needed}/pathtocontent" }
then the async merger will parse those, apply to the actual structure
and delete from the _pending area.
## Pros
definitely avoid conflicts
works as a sort of ChangeLog
## Cons
will be harder the merging of results.
# Solution 2
:index : {
keyA : {
path : { to : { content : {match=true} } }
},
keyB : {
path : { to : { content : {deleted=true} } }
}
}
where each cluster use the _pending area as it would be a real index (no
order in here though) and will never physically delete a node.
The async merger will parse and merge into the actual index cleaning-up
while doing it.
## Pros
Easy for on-the-fly merging while serving queries
## Cons
Could not avoid conflict as we expect.
Will update the ticket accordingly for the records.
Cheers
Davide
Re: Avoiding conflicts (OAK-1717)
Posted by Michael Dürig <md...@apache.org>.
On 6.5.14 2:59 , Davide Giannella wrote:
> On 02/05/2014 10:38, Michael Dürig wrote:
>> Stay with the linked list / skip list approach but defer inserting new
>> keys to a background operation. Instead new keys are put into a
>> "pending-non-conflicting-id" container until picked up by a background
>> thread and merged into the linked list. Conflicts are simply handled
>> by a first committer wins strategy in the same way we handle the
>> entire index update currently.
>>
>> When such an index is used, the "pending-non-conflicting-id" container
>> would have to be considered and its content would have to be merged
>> into the linked list on the fly. This should be rather easy by first
>> sorting these extra keys and then on the fly merging the linked list
>> and the sorted extra keys.
> Interesting idea. We could have the index with a structure like
>
> :index : {
> _pending : {
> keyA : {...},
> keyB : {...},
> ...
> keyN : {...}
> },
> :start : {...},
> ...
> keyN : {...}
> }
>
> where under `_pending` it's kept, as you said, a list of keys+documents
> but unlinked. This is maintained with a simple NodeBuilder.child(). The
> Async agent will be the only one taking care of doing
> NodeBuilder.remove() once merged.
>
> Will it be an unmanageable conflict if we have the agent on ClusterA
> removing the key "foobar" and in the same cycle ClusterB add a new
> document under "foobar"?
>
> Still not sure how I could manage deletions and edits (documents that
> changed key) in this scenario.
I think we can. At least if we can rely on the conflict resolution being
implemented as I described earlier [1]. You'd need to double check this
for the DocumentNS though. @Marcel, could you shed some light on this?
The keyXXX nodes from above would need to contain properties with unique
names that in turn point to the respective nodes for that key. When such
a node is removed the property would get removed, but not the parent
node, event if it happens to be empty. This would be left to the async
maintenance task.
For example say we initially have
keyA : { p5254=/foo/bar }
now someone adds /bar/baz and someone else adds /quz/qoz and someone
else again removes /foo/bar for the same keyA. This would result in:
keyA : { p6475=/bar/baz, p7489=/quz/qoz }
This would even work without conflict if two or more cluster nodes would
at the same time remove /foo/bar.
As special case is when the same node for keyA, say /x/y, is added
concurrently by multiple cluster nodes. This would now result in
keyA : { p6475=/bar/baz, p7489=/quz/qoz, p9873=/x/y, p1255=/x/y }
I.e. /x/y get's duplicated, which needs to be properly handled by the
async maintenance task and also when using the index before it has been
maintained.
[1]
https://issues.apache.org/jira/browse/OAK-1553?focusedCommentId=13940731&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13940731
>> I think such an approach is traded off very well: the index is always
>> consistent, however depending on the number of keys in the
>> "pending-non-conflicting-id" container performance might degrade a bit
>> until the background process has caught up again. Additionally this
>> approach is only slightly more complicated than what we already have.
> Well we could limit the sorting on the _pending area only to a
> configurable limit to avoid OOM. Let's say 1000 keys. It could be a
> decent compromise.
I think it is a decent compromise and we should not worry about it too
much at this point.
>
> In the meanwhile I was investigating the B-Tree (BT). It would work on
> paper of course and it has the same O() of the SkipList (SL). As
> advantage it could serve ascending and descending order off the same
> index. On the other hand it sounds as it will be less efficient in
> serving range queries. In the case of the SL all we have to do is
> seeking the correct start and then just "browse through", while in case
> of the BT to navigate it we'll have to continuously going up and down
> the tree, in-memory sorting of the current bucket and seeking where to
> go. Plus in the case of `lastModified` dates where we always have to add
> a greater element, it will require the update of all the buckets from
> top to leaf.
>
> What I'm not sure around the BT is the ability to manage conflict. Let's
> say we have
>
> A Z
> | \
> .. MQZ
> / | \
> OQ XYZ
>
> if the same instant ClusterA deletes O and Q the results will be
>
> A Z
> | \
> .. MZ
> / \
> XYZ
>
> and ClusterB adds P with the results will be
>
> A Z
> | \
> .. MQZ
> / | \
> OPQ XYZ
>
> how do we expect mongo to manage this situation during the cluster sync?
> Each letter is a node and the line points to subnodes.
My initial idea was that the btree maintenance would only be done by the
async task while added keys would be kept in a "pending" bucket... at
which point I realised that we could very well do the same thing with
the current linked/skip list approach, which probably performs better,
is easier to implement and closer to what we already have.
Michael
>
> Any thoughts on this?
>
> D.
>
Re: Avoiding conflicts (OAK-1717)
Posted by Davide Giannella <gi...@gmail.com>.
On 02/05/2014 10:38, Michael Dürig wrote:
> Stay with the linked list / skip list approach but defer inserting new
> keys to a background operation. Instead new keys are put into a
> "pending-non-conflicting-id" container until picked up by a background
> thread and merged into the linked list. Conflicts are simply handled
> by a first committer wins strategy in the same way we handle the
> entire index update currently.
>
> When such an index is used, the "pending-non-conflicting-id" container
> would have to be considered and its content would have to be merged
> into the linked list on the fly. This should be rather easy by first
> sorting these extra keys and then on the fly merging the linked list
> and the sorted extra keys.
Interesting idea. We could have the index with a structure like
:index : {
_pending : {
keyA : {...},
keyB : {...},
...
keyN : {...}
},
:start : {...},
...
keyN : {...}
}
where under `_pending` it's kept, as you said, a list of keys+documents
but unlinked. This is maintained with a simple NodeBuilder.child(). The
Async agent will be the only one taking care of doing
NodeBuilder.remove() once merged.
Will it be an unmanageable conflict if we have the agent on ClusterA
removing the key "foobar" and in the same cycle ClusterB add a new
document under "foobar"?
Still not sure how I could manage deletions and edits (documents that
changed key) in this scenario.
> I think such an approach is traded off very well: the index is always
> consistent, however depending on the number of keys in the
> "pending-non-conflicting-id" container performance might degrade a bit
> until the background process has caught up again. Additionally this
> approach is only slightly more complicated than what we already have.
Well we could limit the sorting on the _pending area only to a
configurable limit to avoid OOM. Let's say 1000 keys. It could be a
decent compromise.
In the meanwhile I was investigating the B-Tree (BT). It would work on
paper of course and it has the same O() of the SkipList (SL). As
advantage it could serve ascending and descending order off the same
index. On the other hand it sounds as it will be less efficient in
serving range queries. In the case of the SL all we have to do is
seeking the correct start and then just "browse through", while in case
of the BT to navigate it we'll have to continuously going up and down
the tree, in-memory sorting of the current bucket and seeking where to
go. Plus in the case of `lastModified` dates where we always have to add
a greater element, it will require the update of all the buckets from
top to leaf.
What I'm not sure around the BT is the ability to manage conflict. Let's
say we have
A Z
| \
.. MQZ
/ | \
OQ XYZ
if the same instant ClusterA deletes O and Q the results will be
A Z
| \
.. MZ
/ \
XYZ
and ClusterB adds P with the results will be
A Z
| \
.. MQZ
/ | \
OPQ XYZ
how do we expect mongo to manage this situation during the cluster sync?
Each letter is a node and the line points to subnodes.
Any thoughts on this?
D.