Posted to oak-issues@jackrabbit.apache.org by "Vikas Saurabh (JIRA)" <ji...@apache.org> on 2015/12/06 18:42:10 UTC

[jira] [Commented] (OAK-3733) Sometimes hierarchy conflict between concurrent add/delete isn't detected

    [ https://issues.apache.org/jira/browse/OAK-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043966#comment-15043966 ] 

Vikas Saurabh commented on OAK-3733:
------------------------------------

The interesting revisions are:
# {{r151233e54e1-0-4}} (the conflicting rev) - it's marked committed at {{2:p/r1512340e9c1-0-4/0}}. The revision is marked {{_deleted=false}} starting from {{9:/oak:index/event.job.topic/:index/enc_value/var/eventing/jobs/assigned/server_uuid}} (and downwards). The revision is also marked correctly in {{_commitRoot}} on the parent of the created hierarchy - the {{assigned}} node.
# {{r151233e5114-0-2}} (the rev which deleted one of the parents and either went undetected OR didn't detect the hierarchy change mentioned above) - it's marked committed at {{:p/r151233e5114-0-2/0}}. The revision is marked {{_deleted=true}} starting from {{5:/oak:index/event.job.topic/:index/enc_value/var}} up to {{8:/oak:index/event.job.topic/:index/enc_value/var/eventing/jobs/assigned}} (and most probably a sibling hierarchy that I didn't capture in mongoexport).
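
The numeric prefixes on the document ids above ({{5:}}, {{8:}}, {{9:}}) match the depth of each path. A minimal sketch of that relationship, assuming the id is simply {{<depth>:<path>}}; {{DocIdSketch}}, {{depthOf}} and {{idFromPath}} are hypothetical names used for illustration, not Oak's own code:

```java
public class DocIdSketch {

    // Assumption: depth is the number of path segments, with the root "/" at depth 0.
    static int depthOf(String path) {
        if (path.equals("/")) {
            return 0;
        }
        int depth = 0;
        for (int i = 0; i < path.length(); i++) {
            if (path.charAt(i) == '/') {
                depth++;
            }
        }
        return depth;
    }

    // Assumption: the document id is the depth, a colon, then the absolute path.
    static String idFromPath(String path) {
        return depthOf(path) + ":" + path;
    }

    public static void main(String[] args) {
        String assigned = "/oak:index/event.job.topic/:index/enc_value/var/eventing/jobs/assigned";
        // Prints the id of the node both revisions touched (depth 8).
        System.out.println(idFromPath(assigned));
        // Prints the id of the first node created below it (depth 9).
        System.out.println(idFromPath(assigned + "/server_uuid"));
    }
}
```

This only helps when reading the mongoexport: the {{_deleted=true}} entries at depths 5 to 8 and the {{_deleted=false}} entries from depth 9 downwards meet at the {{assigned}} node.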

Depending on which session was committed first, either #1 or #2 should have detected a conflict due to changes on the {{assigned}} node. Given that the revision timestamps are only milliseconds apart, most probably both cluster ids 2 and 4 treated the other rev as being from the future.
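
To make "milliseconds apart" concrete, here is a rough sketch that pulls the timestamp and cluster id out of the two revision strings. It assumes the {{r<hex timestamp>-<hex counter>-<hex clusterId>}} layout; {{RevisionSketch}} and its methods are hypothetical helpers written for this comment, not Oak's {{Revision}} class:

```java
public class RevisionSketch {

    // Splits "r151233e54e1-0-4" into {timestamp, counter, clusterId}, all still hex strings.
    static String[] parts(String rev) {
        if (!rev.startsWith("r")) {
            throw new IllegalArgumentException("not a revision: " + rev);
        }
        String[] p = rev.substring(1).split("-");
        if (p.length != 3) {
            throw new IllegalArgumentException("not a revision: " + rev);
        }
        return p;
    }

    // Assumption: the first segment is milliseconds since the epoch, in hex.
    static long timestampOf(String rev) {
        return Long.parseLong(parts(rev)[0], 16);
    }

    // Assumption: the last segment is the cluster id, in hex.
    static int clusterIdOf(String rev) {
        return Integer.parseInt(parts(rev)[2], 16);
    }

    public static void main(String[] args) {
        String add = "r151233e54e1-0-4";    // rev #1, created the hierarchy
        String delete = "r151233e5114-0-2"; // rev #2, deleted the parents
        long deltaMs = timestampOf(add) - timestampOf(delete);
        // Prints: cluster 4 is 973 ms after cluster 2
        System.out.println("cluster " + clusterIdOf(add) + " is " + deltaMs
                + " ms after cluster " + clusterIdOf(delete));
    }
}
```

Under that assumption the two commits are under a second apart, which is consistent with each cluster node not yet having seen the other's change.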

> Sometimes hierarchy conflict between concurrent add/delete isn't detected
> -------------------------------------------------------------------------
>
>                 Key: OAK-3733
>                 URL: https://issues.apache.org/jira/browse/OAK-3733
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: core, documentmk
>            Reporter: Vikas Saurabh
>            Assignee: Vikas Saurabh
>         Attachments: mongoexport.zip
>
>
> I'm not sure of the exact set of events that led to an incident on one of our test clusters. The cluster is running 3 AEM instances based on an oak build at 1.3.10.r1713699, backed by a single mongo 3 instance.
> Unfortunately, we found the issue too late and the logs had rolled over. Here's the exception that showed up over and over as workflow jobs were being processed (or rather, trying to be):
> {noformat}
> ....
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: javax.jcr.InvalidItemStateException: OakMerge0004: OakMerge0004: The node 8:/oak:index/event.job.topic/:index/com%2Fadobe%2Fgranite%2Fworkflow%2Ftransient%2Fjob%2Fetc%2Fworkflow%2Fmodels%2Fdam-xmp-writeback%2Fjcr_content%2Fmodel/var/eventing/jobs/assigned was already added in revision
> r151233e54e1-0-4, before
> r15166378b6a-0-2 (retries 5, 6830 ms)
>         at org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:239)
>         at org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:212)
>         at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.newRepositoryException(SessionDelegate.java:669)
>         at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.save(SessionDelegate.java:495)
>         at org.apache.jackrabbit.oak.jcr.session.SessionImpl$8.performVoid(SessionImpl.java:419)
>         at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.performVoid(SessionDelegate.java:273)
>         at org.apache.jackrabbit.oak.jcr.session.SessionImpl.save(SessionImpl.java:416)
>         at org.apache.sling.jcr.resource.internal.helper.jcr.JcrResourceProvider.commit(JcrResourceProvider.java:634)
>         ... 16 common frames omitted
> Caused by: org.apache.jackrabbit.oak.api.CommitFailedException: OakMerge0004: OakMerge0004: The node 8:/oak:index/event.job.topic/:index/com%2Fadobe%2Fgranite%2Fworkflow%2Ftransient%2Fjob%2Fetc%2Fworkflow%2Fmodels%2Fdam-xmp-writeback%2Fjcr_content%2Fmodel/var/eventing/jobs/assigned was already added in revision
> r151233e54e1-0-4, before
> r15166378b6a-0-2 (retries 5, 6830 ms)
>         at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBranch.merge0(DocumentNodeStoreBranch.java:200)
>         at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBranch.merge(DocumentNodeStoreBranch.java:123)
>         at org.apache.jackrabbit.oak.plugins.document.DocumentRootBuilder.merge(DocumentRootBuilder.java:158)
>         at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore.merge(DocumentNodeStore.java:1497)
>         at org.apache.jackrabbit.oak.core.MutableRoot.commit(MutableRoot.java:247)
>         at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.commit(SessionDelegate.java:346)
>         at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.save(SessionDelegate.java:493)
>         ... 20 common frames omitted
> Caused by: org.apache.jackrabbit.oak.plugins.document.ConflictException: The node 8:/oak:index/event.job.topic/:index/com%2Fadobe%2Fgranite%2Fworkflow%2Ftransient%2Fjob%2Fetc%2Fworkflow%2Fmodels%2Fdam-xmp-writeback%2Fjcr_content%2Fmodel/var/eventing/jobs/assigned was already added in revision
> r151233e54e1-0-4, before
> r15166378b6a-0-2
>         at org.apache.jackrabbit.oak.plugins.document.Commit.checkConflicts(Commit.java:582)
>         at org.apache.jackrabbit.oak.plugins.document.Commit.createOrUpdateNode(Commit.java:487)
>         at org.apache.jackrabbit.oak.plugins.document.Commit.applyToDocumentStore(Commit.java:371)
>         at org.apache.jackrabbit.oak.plugins.document.Commit.applyToDocumentStore(Commit.java:265)
>         at org.apache.jackrabbit.oak.plugins.document.Commit.applyInternal(Commit.java:234)
>         at org.apache.jackrabbit.oak.plugins.document.Commit.apply(Commit.java:219)
>         at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBranch.persist(DocumentNodeStoreBranch.java:290)
>         at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBranch.persist(DocumentNodeStoreBranch.java:260)
>         at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBranch.access$300(DocumentNodeStoreBranch.java:54)
>         at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBranch$InMemory.merge(DocumentNodeStoreBranch.java:498)
>         at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreBranch.merge0(DocumentNodeStoreBranch.java:180)
>         ... 26 common frames omitted
> ....
> {noformat}
> Running the following removed the repo corruption and restored workflow processing:
> {noformat}
> oak.removeDescendantsAndSelf("/oak:index/event.job.topic/:index/com%2Fadobe%2Fgranite%2Fworkflow%2Ftransient%2Fjob%2Fetc%2Fworkflow%2Fmodels%2Fdam-xmp-writeback%2Fjcr_content%2Fmodel/var/eventing/jobs/assigned")
> {noformat}
> Attaching [mongoexport output|^mongoexport.zip] for {{/oak:index/event.job.topic/:index/com%2Fadobe%2Fgranite%2Fworkflow%2Ftransient%2Fjob%2Fetc%2Fworkflow%2Fmodels%2Fdam-xmp-writeback%2Fjcr_content%2Fmodel/var/eventing/jobs/assigned/6a389a6a-a8bf-4038-b57b-cb441c6ac557/com.adobe.granite.workflow.transient.job.etc.workflow.models.dam-xmp-writeback.jcr_content.model/2015/11/19/23/54/6a389a6a-a8bf-4038-b57b-cb441c6ac557_10}} (the hierarchy created at {{r151233e54e1-0-4}}). I've renamed a few path elements in the export to make it more readable, though (e.g. {{:index/com%2Fadobe%2Fgranite%2Fworkflow%2Ftransient%2Fjob%2Fetc%2Fworkflow%2Fmodels%2Fdam-xmp-writeback%2Fjcr_content%2Fmodel}} -> {{enc_value}}).
> [~mreutegg], I'm assigning it to myself for now, but I think this would require your expertise all the way :).


