Posted to oak-issues@jackrabbit.apache.org by "Tomek Rękawek (JIRA)" <ji...@apache.org> on 2015/12/02 14:28:11 UTC

[jira] [Comment Edited] (OAK-3559) Bulk document updates in MongoDocumentStore

    [ https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986971#comment-14986971 ] 

Tomek Rękawek edited comment on OAK-3559 at 12/2/15 1:27 PM:
-------------------------------------------------------------

h4. New bulk update method

The patch adds a new {{createOrUpdate(Collection<T> collection, List<UpdateOp> updateOps)}} method to the {{DocumentStore}} interface. The MongoDB implementation uses the Bulk API, while the RDB and Memory document stores have been extended with a naive implementation that iterates over {{updateOps}}. The Mongo implementation works as follows:

1. For each {{UpdateOp}}, try to read the corresponding document from the cache and add any documents found to {{oldDocs}}.
2. Collect all {{UpdateOp}}s whose documents are not yet in {{oldDocs}} and read those documents in a single {{find()}} call. Add the results to {{oldDocs}}.
3. Prepare a bulk update. For each remaining {{UpdateOp}}, add the following operation:
    * Find the document with the same id and the same {{mod_count}} as in {{oldDocs}}.
    * Apply the changes from the {{UpdateOp}}.

4. Execute the bulk update.

If another process modifies one of the target documents between steps 2 and 3, its {{mod_count}} is incremented as well, so the bulk update fails for every concurrently modified document. The method then removes the failed documents from {{oldDocs}} and restarts from step 2. It gives up after the third iteration.
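A minimal, self-contained sketch of steps 1 to 4 plus the retry loop, assuming a hypothetical in-memory model: {{Doc}}, {{Op}} and {{BulkSketch}} are illustrative names, not Oak classes; the cache and the collection are plain maps; and a mod count of -1 stands for "document does not exist yet". In the real store, steps 3 and 4 are a single Bulk API call whose filter includes the id and {{mod_count}}. Here, each conditional update is applied in a loop so the example stays runnable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical document model: a mod count plus a map of properties (not Oak's Document class).
final class Doc {
    final long modCount;
    final Map<String, String> props;

    Doc(long modCount, Map<String, String> props) {
        this.modCount = modCount;
        this.props = props;
    }
}

// Hypothetical update operation: sets one property on one document (not Oak's UpdateOp).
final class Op {
    final String id, key, value;

    Op(String id, String key, String value) {
        this.id = id;
        this.key = key;
        this.value = value;
    }
}

final class BulkSketch {
    static final int MAX_ITERATIONS = 3;
    static final long MISSING = -1; // expected mod count for a document that does not exist yet

    final Map<String, Doc> db = new HashMap<>();    // stands in for the MongoDB collection
    final Map<String, Doc> cache = new HashMap<>(); // stands in for the document cache

    // Applies op only if the stored mod count equals the expected one; returns false otherwise.
    private boolean conditionalApply(Op op, long expected) {
        Doc current = db.get(op.id);
        long actual = current == null ? MISSING : current.modCount;
        if (actual != expected) {
            return false; // concurrently modified (or created): mod count no longer matches
        }
        Map<String, String> props = new HashMap<>();
        if (current != null) {
            props.putAll(current.props);
        }
        props.put(op.key, op.value);
        Doc updated = new Doc(current == null ? 1 : actual + 1, props);
        db.put(op.id, updated);
        cache.put(op.id, updated);
        return true;
    }

    // Returns the ops still not applied after MAX_ITERATIONS (empty on full success).
    List<Op> createOrUpdate(List<Op> ops) {
        Map<String, Long> oldDocs = new HashMap<>(); // id -> mod count seen when the doc was read
        // Step 1: use whatever the cache already holds.
        for (Op op : ops) {
            Doc cached = cache.get(op.id);
            if (cached != null) {
                oldDocs.put(op.id, cached.modCount);
            }
        }
        List<Op> remaining = new ArrayList<>(ops);
        for (int i = 0; i < MAX_ITERATIONS && !remaining.isEmpty(); i++) {
            // Step 2: read every document not yet known (one find() call in the real store).
            for (Op op : remaining) {
                if (!oldDocs.containsKey(op.id)) {
                    Doc d = db.get(op.id);
                    oldDocs.put(op.id, d == null ? MISSING : d.modCount);
                }
            }
            // Steps 3 and 4: conditional update on (id, mod count) for each remaining op.
            List<Op> failed = new ArrayList<>();
            for (Op op : remaining) {
                if (!conditionalApply(op, oldDocs.get(op.id))) {
                    failed.add(op);
                }
            }
            // Drop stale entries so the failed documents are re-read in the next iteration.
            for (Op op : failed) {
                oldDocs.remove(op.id);
            }
            remaining = failed;
        }
        return remaining;
    }
}
```

For example, if the cache holds a stale copy of a document, the first iteration fails for it, the stale entry is dropped from {{oldDocs}}, and the second iteration re-reads and updates it.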

h4. Changes in the Commit class

{{Commit#applyToDocumentStore}} now uses the new method. If the bulk call fails (e.g. the Mongo implementation still has conflicting documents after three iterations), the commit falls back to the classic approach of applying the updates one after another.
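A sketch of the fallback, assuming (hypothetically) that the bulk call reports which operations it could not apply and that only those are then applied individually. {{FallbackSketch}} and its parameters are illustrative and do not reflect the actual {{Commit}} code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch of the bulk-then-fallback pattern (not the actual Commit code).
final class FallbackSketch {
    /**
     * @param ops    the update operations of one commit
     * @param bulk   bulk update; assumed to return the operations it could NOT apply
     * @param single applies one operation on its own; returns false if it failed
     * @return the operations that failed even when applied individually
     */
    static <T> List<T> apply(List<T> ops, Function<List<T>, List<T>> bulk, Predicate<T> single) {
        List<T> notApplied = bulk.apply(ops);
        List<T> failed = new ArrayList<>();
        // Classic approach: apply the remaining updates one after another.
        for (T op : notApplied) {
            if (!single.test(op)) {
                failed.add(op);
            }
        }
        return failed;
    }
}
```

Applying only the leftovers avoids re-applying updates that the bulk call already wrote. Whether the real fallback works this way is not stated above.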

h4. Changes in the CommitQueue and ConflictException

With bulk updates, a single commit may conflict with several revisions at the same time. For this reason {{ConflictException}} now contains a list of revisions rather than a single revision. To resolve such conflicts in {{DocumentNodeStoreBranch#merge0}}, {{CommitQueue#suspendUntil()}} has been extended as well: it now accepts a list of revisions and suspends execution until all of them are visible.
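A simplified sketch of both changes, assuming revisions are plain strings. The hypothetical {{ConflictExceptionSketch}} carries a list of conflicting revisions, and {{CommitQueueSketch#suspendUntilAll}} blocks until every one of them has been marked visible or a timeout expires. The class names and the timeout parameter are assumptions of this sketch, not Oak's API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for ConflictException: carries every conflicting revision, not just one.
final class ConflictExceptionSketch extends Exception {
    final List<String> conflictRevisions;

    ConflictExceptionSketch(String message, Collection<String> revisions) {
        super(message);
        this.conflictRevisions = Collections.unmodifiableList(new ArrayList<>(revisions));
    }
}

// Hypothetical stand-in for CommitQueue: tracks which revisions are visible.
final class CommitQueueSketch {
    private final Set<String> visible = new HashSet<>();

    // Marks a revision as visible and wakes up every suspended thread.
    synchronized void markVisible(String revision) {
        visible.add(revision);
        notifyAll();
    }

    // Suspends the caller until all revisions are visible or the timeout expires.
    // Returns true if all of them became visible in time.
    synchronized boolean suspendUntilAll(Collection<String> revisions, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!visible.containsAll(revisions)) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            wait(remaining); // releases the monitor; the loop re-checks after each wake-up
        }
        return true;
    }
}
```

A caller such as {{merge0}} would catch the exception, pass its revision list to {{suspendUntilAll}}, and retry the merge once it returns.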


was (Author: tomek.rekawek):
The pull request has been created here:
https://github.com/apache/jackrabbit-oak/pull/43

The patch can be downloaded from:
https://patch-diff.githubusercontent.com/raw/apache/jackrabbit-oak/pull/43.diff

> Bulk document updates in MongoDocumentStore
> -------------------------------------------
>
>                 Key: OAK-3559
>                 URL: https://issues.apache.org/jira/browse/OAK-3559
>             Project: Jackrabbit Oak
>          Issue Type: Sub-task
>          Components: mongomk
>            Reporter: Tomek Rękawek
>             Fix For: 1.4
>
>         Attachments: OAK-3559.patch
>
>
> Use the MongoDB [Bulk API|https://docs.mongodb.org/manual/reference/method/Bulk/#Bulk] to implement the [batch version of the createOrUpdate method|OAK-3662].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)