Posted to common-issues@hadoop.apache.org by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org> on 2012/03/27 00:28:26 UTC

[jira] [Commented] (HADOOP-8217) Edge case split-brain race in ZK-based auto-failover

    [ https://issues.apache.org/jira/browse/HADOOP-8217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238915#comment-13238915 ] 

Todd Lipcon commented on HADOOP-8217:
-------------------------------------

My thinking for the solution is the following:
- add a parameter to transitionToStandby/transitionToActive which is a {{long logicalTime}}
- when the ZKFC acquires the lock znode, it makes a note of the zxid (ZK transaction ID)
- when it then asks the old active to go to standby, or asks its own node to go active, it passes that zxid as the {{logicalTime}} argument
- the NN itself maintains a record of the highest zxid it has seen. If it receives a state transition request with an older zxid, it ignores the request.

This would solve the race as described, since when ZKFC2 calls NN1.transitionToStandby(), it hands NN1 a higher zxid than ZKFC1 saw. So when ZKFC1 then asks it to go active, the request is denied.
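To make the idea concrete, here is a rough sketch of the NN-side check. It is only an illustration under assumptions: the class name {{FencedNameNode}}, the boolean return values, and the in-memory {{highestZxid}} field are hypothetical and are not the existing HAServiceProtocol API.

```java
// Hypothetical sketch of the proposal, not actual Hadoop code.
// The NN remembers the highest zxid it has seen and ignores any
// transition request that carries an older one.
final class FencedNameNode {
    enum State { STANDBY, ACTIVE }

    private State state = State.STANDBY;
    // Highest zxid seen in any accepted transition request (in memory only).
    private long highestZxid = Long.MIN_VALUE;

    /** Returns true if the request was applied, false if it was stale. */
    synchronized boolean transitionToActive(long logicalTime) {
        return transition(State.ACTIVE, logicalTime);
    }

    /** Returns true if the request was applied, false if it was stale. */
    synchronized boolean transitionToStandby(long logicalTime) {
        return transition(State.STANDBY, logicalTime);
    }

    private boolean transition(State target, long logicalTime) {
        if (logicalTime < highestZxid) {
            // Request comes from a ZKFC that held the lock earlier: ignore it.
            return false;
        }
        // An equal zxid is accepted, so the current lock holder can issue
        // more than one request with the zxid it recorded.
        highestZxid = logicalTime;
        state = target;
        return true;
    }

    synchronized State getState() {
        return state;
    }
}
```

With made-up zxids: if ZKFC1 recorded zxid 100 and ZKFC2 later recorded zxid 200, then once NN1 has accepted transitionToStandby(200), a late transitionToActive(100) from ZKFC1 returns false and NN1 stays in standby.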

There is still potentially some race involving the NNs restarting quickly and "forgetting" the highest zxid. I'm not sure whether the right solution there is to record the info persistently, or to attach a UUID to each NN startup, and use that to make sure we don't target a newer instance of a NN with an RPC that was meant for an earlier one.
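For the UUID variant, something along these lines might work. Again this is just a sketch under assumptions: {{NameNodeInstance}}, {{getInstanceId()}}, and the extra {{targetInstanceId}} argument are hypothetical names, and how the ZKFC would learn the ID (for example, from an earlier health-check RPC) is left open.

```java
import java.util.UUID;

// Hypothetical sketch, not actual Hadoop code.
// Each NN process picks a fresh random ID at startup. A caller must name
// the instance it intends to target, so an RPC meant for an earlier
// incarnation of the NN is rejected by a restarted one.
final class NameNodeInstance {
    private final UUID instanceId = UUID.randomUUID();
    private boolean active = false;

    UUID getInstanceId() {
        return instanceId;
    }

    /** Returns true if applied, false if addressed to another incarnation. */
    synchronized boolean transitionToActive(UUID targetInstanceId) {
        if (!instanceId.equals(targetInstanceId)) {
            return false;
        }
        active = true;
        return true;
    }

    synchronized boolean isActive() {
        return active;
    }
}
```

This does not replace the zxid check; it only covers the case where the NN restarted and lost its in-memory record of the highest zxid.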

Other creative solutions appreciated :)
                
> Edge case split-brain race in ZK-based auto-failover
> ----------------------------------------------------
>
>                 Key: HADOOP-8217
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8217
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> As discussed in HADOOP-8206, the current design for automatic failover has the following race:
> - ZKFC1 gets active lock
> - ZKFC1 is about to send transitionToActive() and the machine freezes (e.g. GC pause + swapping)
> - ZKFC1 loses its ZK lock, ZKFC2 gets ZK lock
> - ZKFC2 calls transitionToStandby on NN1, and transitions NN2 to active
> - ZKFC1 wakes up from the pause and calls transitionToActive() on NN1, so both NN1 and NN2 are now active (split brain)
> This is rare, since it requires ZKFC1 to freeze longer than its ZK session timeout, but worth fixing, since the results can be disastrous.
