Posted to yarn-issues@hadoop.apache.org by "Karthik Kambatla (JIRA)" <ji...@apache.org> on 2013/11/04 08:49:18 UTC

[jira] [Commented] (YARN-1222) Make improvements in ZKRMStateStore for fencing

    [ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812663#comment-13812663 ] 

Karthik Kambatla commented on YARN-1222:
----------------------------------------

bq. deleteWithRetries() - The new logic does not seem identical to the older logic.
Good catch! It is not identical, but should have the same effect. Instead of using a single ZKAction that does both exists() and delete(), the second patch called exists() first (with retries) and then delete() separately, which can lead to unexpected behavior. Hence the return null.

bq. why do we do an exists check instead of catching the NoNodeException after calling delete directly?
Agree, deleting and catching NoNodeException makes more sense. The exists() followed by delete() comes from earlier patches on YARN-353, and I am not sure of the reasoning behind it. We don't seem to be using the watch anywhere. The updated patch (-3.patch) adopts the delete-and-catch approach.
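
For illustration only, here is a minimal sketch of the delete-and-catch idea. It is not the code from the patch: FakeZk, NoNodeException, and deleteIfPresent are hypothetical in-memory stand-ins, so the snippet runs without a ZooKeeper server. With the real client, the equivalent exception would be KeeperException.NoNodeException thrown by ZooKeeper.delete().

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for ZooKeeper's KeeperException.NoNodeException.
class NoNodeException extends Exception {
    NoNodeException(String path) {
        super("no node: " + path);
    }
}

// Hypothetical in-memory stand-in for a ZooKeeper client; not the real API.
class FakeZk {
    private final Set<String> nodes = new HashSet<>();

    void create(String path) {
        nodes.add(path);
    }

    boolean exists(String path) {
        return nodes.contains(path);
    }

    // Mirrors the real client's behavior of throwing when the znode is absent.
    void delete(String path) throws NoNodeException {
        if (!nodes.remove(path)) {
            throw new NoNodeException(path);
        }
    }
}

class DeleteSketch {
    // Delete directly; a missing znode is treated as "nothing to do"
    // rather than being checked for up front with exists().
    static boolean deleteIfPresent(FakeZk zk, String path) {
        try {
            zk.delete(path);
            return true;   // the znode was there and is now gone
        } catch (NoNodeException e) {
            return false;  // the znode was already gone
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        zk.create("/rmstore/app_1");
        System.out.println(deleteIfPresent(zk, "/rmstore/app_1")); // true
        System.out.println(deleteIfPresent(zk, "/rmstore/app_1")); // false
    }
}
```

The point of the single call is that there is no window between a check and the delete; the outcome of the delete itself says whether the znode was present.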

bq. Should this method only change the ACL's pertaining to other RM instances and not every ACL on the znode? Alternatively, if the assumption is that the root znode etc are manipulated only by the RM's then can we simply remove all older ACL's and set a new ACL for the current RM.
Only the RMs manipulate the znode structure. Here, we are using the ZKStoreACLs to generate the ZKRootNodeACLs. IIUC, the ZKStoreACLs define who has access to the store. That could be used to define the users that can run the RM with ZKRMStateStore; an example could be that only adminuser is allowed. The patch gives adminuser as much access to the ZKRootNode as possible, minus create-delete permissions. In addition, it gives the RM exclusive create-delete perms on the ZKRootNode. I think this is required if we want to support this behavior even for an ACL list where, say, each element gives one user access to run the RM. One extreme case could be two RMs running, each started by a different user, with both users allowed by the ACLs.
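
To make the ACL derivation concrete, here is a rough sketch under stated assumptions. AclEntry, RootAclSketch, buildRootAcls, and the "rm1" identity are hypothetical names for illustration, not the patch's code. The permission bit values mirror ZooKeeper's ZooDefs.Perms constants, and "digest" is one of ZooKeeper's built-in ACL schemes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal ACL entry; the real type is org.apache.zookeeper.data.ACL.
final class AclEntry {
    final int perms;
    final String scheme;
    final String id;

    AclEntry(int perms, String scheme, String id) {
        this.perms = perms;
        this.scheme = scheme;
        this.id = id;
    }
}

class RootAclSketch {
    // Bit values as in ZooKeeper's ZooDefs.Perms.
    static final int READ = 1;
    static final int WRITE = 2;
    static final int CREATE = 4;
    static final int DELETE = 8;
    static final int ADMIN = 16;
    static final int CREATE_DELETE = CREATE | DELETE;

    // Derive root-znode ACLs from the store ACLs:
    // 1. every store ACL keeps its permissions minus create-delete;
    // 2. one extra entry gives create-delete to the active RM only.
    static List<AclEntry> buildRootAcls(List<AclEntry> storeAcls, String rmId) {
        List<AclEntry> rootAcls = new ArrayList<>();
        for (AclEntry acl : storeAcls) {
            rootAcls.add(new AclEntry(acl.perms & ~CREATE_DELETE, acl.scheme, acl.id));
        }
        rootAcls.add(new AclEntry(CREATE_DELETE, "digest", rmId));
        return rootAcls;
    }

    public static void main(String[] args) {
        List<AclEntry> storeAcls = new ArrayList<>();
        int all = READ | WRITE | CREATE | DELETE | ADMIN;
        storeAcls.add(new AclEntry(all, "digest", "adminuser"));

        for (AclEntry acl : buildRootAcls(storeAcls, "rm1")) {
            System.out.println(acl.id + " -> " + acl.perms);
        }
        // adminuser -> 19  (READ | WRITE | ADMIN)
        // rm1 -> 12        (CREATE | DELETE)
    }
}
```

With two RMs started by different allowed users, each RM would install its own create-delete entry when it takes over, so the other RM can no longer create or delete children of the root znode.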

Does that sound reasonable? Should we handle it any other way?

> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>
>                 Key: YARN-1222
>                 URL: https://issues.apache.org/jira/browse/YARN-1222
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the child of the root znode. This is to achieve fencing by modifying the create/delete permissions on the root znode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)