Posted to yarn-issues@hadoop.apache.org by "Daniel Templeton (JIRA)" <ji...@apache.org> on 2016/12/02 00:46:58 UTC

[jira] [Updated] (YARN-5694) ZKRMStateStore can prevent the transition to standby in branch-2.7 if the ZK node is unreachable

     [ https://issues.apache.org/jira/browse/YARN-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Templeton updated YARN-5694:
-----------------------------------
    Attachment: YARN-5694.branch-2.6.002.patch

Uploading new branch-2.6 patch to fix the test.

> ZKRMStateStore can prevent the transition to standby in branch-2.7 if the ZK node is unreachable
> ------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5694
>                 URL: https://issues.apache.org/jira/browse/YARN-5694
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.3
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>              Labels: oct16-medium
>         Attachments: YARN-5694.001.patch, YARN-5694.002.patch, YARN-5694.003.patch, YARN-5694.004.patch, YARN-5694.004.patch, YARN-5694.005.patch, YARN-5694.006.patch, YARN-5694.007.patch, YARN-5694.008.patch, YARN-5694.branch-2.6.001.patch, YARN-5694.branch-2.6.002.patch, YARN-5694.branch-2.7.001.patch, YARN-5694.branch-2.7.002.patch, YARN-5694.branch-2.7.004.patch, YARN-5694.branch-2.7.005.patch
>
>
> {{ZKRMStateStore.doStoreMultiWithRetries()}} holds the lock while trying to talk to ZK.  If the connection fails, it retries while still holding the lock.  The retries are intended to be strictly time limited, but when the ZK node is unreachable the time limit is not honored, and the thread ends up holding the lock for over an hour.  Transitioning the RM to standby requires that same lock, so in exactly the case where the RM should be transitioning to standby, the {{VerifyActiveStatusThread}} blocks it from happening.
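
The locking pattern described above can be illustrated with a minimal, self-contained sketch. This is not the ZKRMStateStore code and not the attached patch. The class LockedRetrySketch and the methods tryWrite(), storeWithRetries(), and tryTransitionToStandby() are hypothetical names made up for illustration. The sketch assumes a write that always fails (standing in for an unreachable ZK node) and shows only that a caller needing the same lock cannot proceed until the retry loop gives up, which is why the loop needs a hard overall deadline.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockedRetrySketch {
    private final ReentrantLock storeLock = new ReentrantLock();
    private final CountDownLatch lockHeld = new CountDownLatch(1);

    // Hypothetical stand-in for a ZK write against an unreachable node:
    // it never succeeds.
    private boolean tryWrite() {
        return false;
    }

    // Retries while holding the lock, but gives up once an overall
    // deadline has passed, so the lock is released in bounded time.
    boolean storeWithRetries(long maxMillis, long retryIntervalMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxMillis);
        storeLock.lock();
        try {
            lockHeld.countDown(); // signal that the lock is now held
            while (System.nanoTime() - deadline < 0) {
                if (tryWrite()) {
                    return true;
                }
                Thread.sleep(retryIntervalMillis);
            }
            return false;
        } finally {
            storeLock.unlock();
        }
    }

    // Stand-in for the transition to standby: it needs the same lock.
    // tryLock with a timeout is used here only so the demo can observe
    // the blocking instead of hanging.
    boolean tryTransitionToStandby(long waitMillis) throws InterruptedException {
        if (!storeLock.tryLock(waitMillis, TimeUnit.MILLISECONDS)) {
            return false;
        }
        try {
            return true;
        } finally {
            storeLock.unlock();
        }
    }

    // Returns {transition succeeded during retries, transition succeeded after}.
    static boolean[] demo() {
        LockedRetrySketch store = new LockedRetrySketch();
        Thread writer = new Thread(() -> {
            try {
                store.storeWithRetries(1000, 50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            writer.start();
            store.lockHeld.await(); // wait until the writer owns the lock
            boolean during = store.tryTransitionToStandby(50);
            writer.join();          // retries have hit the deadline
            boolean after = store.tryTransitionToStandby(50);
            return new boolean[] {during, after};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("standby while retrying: " + result[0]);
        System.out.println("standby after retries end: " + result[1]);
    }
}
```

In this sketch the transition attempt made while the writer is still retrying should fail, and the attempt made after the deadline expires should succeed. If the deadline were not enforced, as in the failure described in this issue, the second attempt would stay blocked for as long as the retries continue.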



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
