
Posted to oak-issues@jackrabbit.apache.org by "Marcel Reutegger (JIRA)" <ji...@apache.org> on 2016/02/11 09:42:18 UTC

[jira] [Commented] (OAK-3488) LastRevRecovery for self async?

    [ https://issues.apache.org/jira/browse/OAK-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142428#comment-15142428 ] 

Marcel Reutegger commented on OAK-3488:
---------------------------------------

Continued work in a GitHub branch: https://github.com/mreutegg/jackrabbit-oak/tree/OAK-3488

With the changes on the GitHub branch, the lastRev recovery mechanism now also checks whether the recovering cluster node is still alive and, if it is not, breaks the recovery lock and takes over the recovery.
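
To illustrate the idea, here is a minimal, self-contained sketch of "break the recovery lock if its holder is no longer alive". It is not the code from the branch: the class RecoveryLockSketch, the NodeInfo record, its fields, and the lease-based liveness check are all hypothetical names and assumptions made for illustration. A real implementation would need an atomic conditional update in the underlying document store rather than an in-memory synchronized method.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names are taken from Oak.
class RecoveryLockSketch {

    /** Hypothetical per-cluster-node record. */
    static final class NodeInfo {
        final int id;
        volatile long leaseEndTime;   // millis; the node counts as alive while leaseEndTime > now
        volatile Integer recoveryBy;  // id of the node holding the recovery lock, or null

        NodeInfo(int id, long leaseEndTime) {
            this.id = id;
            this.leaseEndTime = leaseEndTime;
        }
    }

    private final Map<Integer, NodeInfo> nodes = new ConcurrentHashMap<>();

    void register(int id, long leaseEndTime) {
        nodes.put(id, new NodeInfo(id, leaseEndTime));
    }

    /** Assumption: a node is alive as long as its lease has not expired. */
    boolean isAlive(int id, long now) {
        NodeInfo n = nodes.get(id);
        return n != null && n.leaseEndTime > now;
    }

    /**
     * Tries to acquire the recovery lock on targetId for recovererId.
     * If another node holds the lock but is no longer alive, the lock
     * is broken and handed to the caller.
     */
    synchronized boolean acquireRecoveryLock(int targetId, int recovererId, long now) {
        NodeInfo target = nodes.get(targetId);
        if (target == null) {
            return false; // unknown cluster node, nothing to recover
        }
        Integer holder = target.recoveryBy;
        if (holder != null && holder != recovererId && isAlive(holder, now)) {
            return false; // a live node is already running the recovery
        }
        // Lock is free, already ours, or held by a dead node: take it over.
        target.recoveryBy = recovererId;
        return true;
    }

    public static void main(String[] args) {
        RecoveryLockSketch s = new RecoveryLockSketch();
        s.register(1, 0L);     // node that was not shut down properly
        s.register(2, 1000L);  // first recoverer, lease ends at t=1000
        s.register(3, 5000L);  // second recoverer, lease ends at t=5000
        System.out.println(s.acquireRecoveryLock(1, 2, 500L));  // lock is free
        System.out.println(s.acquireRecoveryLock(1, 3, 500L));  // holder 2 still alive
        System.out.println(s.acquireRecoveryLock(1, 3, 2000L)); // holder 2 lease expired
    }
}
```

In this sketch the second recoverer only succeeds once the first recoverer's lease has expired, which is the take-over behaviour described above.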

> LastRevRecovery for self async?
> -------------------------------
>
>                 Key: OAK-3488
>                 URL: https://issues.apache.org/jira/browse/OAK-3488
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: documentmk
>            Reporter: Julian Reschke
>            Assignee: Marcel Reutegger
>              Labels: resilience
>             Fix For: 1.6
>
>         Attachments: OAK-3488.patch
>
>
> Currently, when a cluster node starts and discovers that it wasn't properly shut down, it first runs the complete LastRevRecovery and only continues startup when done.
> However, when it fails to acquire the recovery lock, which implies that a different cluster node is already running the recovery on its behalf, it simply skips recovery and continues startup.
> So what is it? Is running the recovery before proceeding critical or not? If it is, this code in {{LastRevRecoveryAgent}} needs to change:
> {code}
>         //TODO What if recovery is being performed for current clusterNode by some other node
>         //should we halt the startup
>         if(!lockAcquired){
>             log.info("Last revision recovery already being performed by some other node. " +
>                     "Would not attempt recovery");
>             return 0;
>         }
> {code}
> If it's not critical, we may want to always run the recovery asynchronously.
> cc [~mreutegg]  and [~chetanm]


