Posted to dev@zookeeper.apache.org by "Rakesh R (Commented) (JIRA)" <ji...@apache.org> on 2011/09/28 09:06:45 UTC

[jira] [Commented] (ZOOKEEPER-1209) LeaderElection recipe doesn't handle the split-brain issue, n/w disconnection can bring both the client nodes to be in ELECTED

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116251#comment-13116251 ] 

Rakesh R commented on ZOOKEEPER-1209:
-------------------------------------

I will upload a patch soon that follows the approach below. Please let me know if there is a better solution.

*Approach:*
IMO it is better for the LES f/w to handle these ZooKeeper events ('Disconnected', 'SyncConnected' and 'Expiry') rather than stay silent. This will help users move to a safe state instead of remaining in the same state (ELECTED/READY).

Provide an 'EventProcessor-Thread', one thread per LES. This thread will execute events after a time-bounded delay. After picking the first event, the processor will wait for the configured ‘eventDelayTimeout’ and then pick the latest event present in the queue (if any). Finally, the processor will execute only that most recent event. The delay is a grace period that absorbs brief network fluctuations; the default value of ‘eventDelayTimeout’ could be ‘sessionTimeOut/2’.
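
The delayed processing described above could look roughly like the sketch below. It is only an illustration, not the patch itself: the names SessionEvent, DelayedEventProcessor, submit() and awaitLatest() are hypothetical, and the handler stands in for whatever the LES would do with the surviving event.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical event kinds, mirroring the three ZooKeeper session events named above.
enum SessionEvent { DISCONNECTED, SYNC_CONNECTED, EXPIRED }

// Sketch of a per-LES event processor: wait a grace period, then act on the newest event only.
class DelayedEventProcessor implements Runnable {
    private final BlockingQueue<SessionEvent> queue = new LinkedBlockingQueue<>();
    private final long eventDelayMillis;          // assumed default: sessionTimeout / 2
    private final Consumer<SessionEvent> handler; // receives the event that survives the delay

    DelayedEventProcessor(long eventDelayMillis, Consumer<SessionEvent> handler) {
        this.eventDelayMillis = eventDelayMillis;
        this.handler = handler;
    }

    // Intended to be called from the watcher callback; the queue is unbounded, so this never blocks.
    void submit(SessionEvent event) {
        queue.add(event);
    }

    // One cycle: block for the first event, wait out the grace period,
    // then return the most recent queued event (older ones are discarded).
    SessionEvent awaitLatest() throws InterruptedException {
        SessionEvent latest = queue.take();
        Thread.sleep(eventDelayMillis);
        SessionEvent next;
        while ((next = queue.poll()) != null) {
            latest = next;
        }
        return latest;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                handler.accept(awaitLatest());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and exit on shutdown
        }
    }
}
```

With this shape, a 'Disconnected' followed quickly by a 'SyncConnected' results in only the 'SyncConnected' being handled. The processor would be started once per LES, e.g. new Thread(processor, "EventProcessor-Thread").start().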

All the watch events (‘Disconnected’, ‘SyncConnected’ and ‘Expiry’) from the ZooKeeper server will be given to this processor. It will have the following logic:

+Disconnected logic:+
Introduce a new state, NEUTRAL, to represent the disconnection. A client that sees it knows the node has been disconnected from ZooKeeper and can go into a safe mode.
1) If the LeaderElectionSupport state is not STOP, dispatch a NEUTRAL event to the user so that the user application can act upon it. This will help it go to a safe state rather than remain in the ELECTED state.

+SyncConnected logic:+
1) Check whether my ephemeral node ‘leaderOffer.getnodePath()’ is present in ZooKeeper.
2) If yes, go to determineElectionStatus(). This will decide the state (ELECTED/READY).
3) If no, call makeOffer() and then determineElectionStatus(). This will first create the ephemeral node and then go to the leader determination phase.
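
The three steps above can be sketched as follows. ElectionOps and SyncConnectedHandler are hypothetical names used only to keep the example self-contained; in the real LES the existence check would presumably be a ZooKeeper exists() call on the offer path, and the methods would have to deal with KeeperException and InterruptedException.

```java
// Hypothetical abstraction over the LES internals, so the decision logic can be shown in isolation.
interface ElectionOps {
    // Assumed to check the offer's ephemeral node, e.g. via zooKeeper.exists(path, false) != null.
    boolean offerNodeExists();
    // Creates the ephemeral offer node again.
    void makeOffer();
    // Decides ELECTED/READY from the current set of offers.
    void determineElectionStatus();
}

class SyncConnectedHandler {
    // On 'SyncConnected': re-create the offer only if it vanished, then re-run leader determination.
    static void onSyncConnected(ElectionOps ops) {
        if (!ops.offerNodeExists()) {
            ops.makeOffer();
        }
        ops.determineElectionStatus();
    }
}
```

Either way the node ends in determineElectionStatus(), so a reconnecting node never keeps a stale ELECTED state without re-checking it.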

+Expiry logic:+
The serving cluster or standalone ZooKeeper has expired this session. This implies that the user must create a new client connection (instantiate a new ZooKeeper instance) to access the ensemble again.

1) On receipt of Expiry, dispatch a STOP event to the client. This notifies the client, which can then restart the LeaderElectionSupport with a new ZooKeeper client session.
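
Taken together, the Disconnected and Expiry rules amount to a small mapping from (current LES state, session event) to the state that gets dispatched. A minimal sketch, assuming an abbreviated state list and the hypothetical names LesState, ConnEvent and SessionEventRules:

```java
import java.util.Optional;

// Abbreviated, hypothetical state list; NEUTRAL is the new state proposed above.
enum LesState { ELECTED, READY, NEUTRAL, STOP }

// Only the two session events covered by these rules.
enum ConnEvent { DISCONNECTED, EXPIRED }

class SessionEventRules {
    // Returns the state to dispatch to the user, or empty if nothing should be dispatched.
    static Optional<LesState> stateToDispatch(LesState current, ConnEvent event) {
        switch (event) {
            case DISCONNECTED:
                // A stopped LES stays silent; anything else is told to go NEUTRAL.
                return current == LesState.STOP
                        ? Optional.empty()
                        : Optional.of(LesState.NEUTRAL);
            case EXPIRED:
                // The session is gone, so the client must restart with a new ZooKeeper session.
                return Optional.of(LesState.STOP);
            default:
                return Optional.empty();
        }
    }
}
```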

Thanks,
Rakesh
                
> LeaderElection recipe doesn't handle the split-brain issue, n/w disconnection can bring both the client nodes to be in ELECTED
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1209
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1209
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: recipes
>    Affects Versions: 3.3.3
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>
> *Case1-* N/w disconnection can bring both the client nodes to be in ELECTED state. Current LeaderElectionSupport(LES) f/w handles only 'NodeDeletion'.
>  
> Consider the scenario where ELECTED and READY nodes are running. Say the ELECTED node's n/w fails and it is "Disconnected" from ZooKeeper. It will still behave as ELECTED, as it is not getting any events from the LeaderElectionSupport (LES) framework.
> After the session timeout, the node in the READY state will be notified by a 'NodeDeleted' event and will go to the ELECTED state.
> *Problem:* 
> Both nodes become ELECTED, so the user finally sees two Master (ELECTED) nodes, which causes inconsistencies.
> *Case2-* Also, let's say the user has started only one client node and it becomes ELECTED. After some time the n/w connection to the ZooKeeper server is lost and the session expires.
> *Problem:*
> The client node will still be in the ELECTED state. If the user starts a second client node some time later, again both nodes become ELECTED.
