You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@helix.apache.org by "Swaroop Jagadish (JIRA)" <ji...@apache.org> on 2013/01/25 03:17:13 UTC
[jira] [Commented] (HELIX-26) Better support for handling network partition and process freeze

    [ https://issues.apache.org/jira/browse/HELIX-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562282#comment-13562282 ] 

Swaroop Jagadish commented on HELIX-26:
---------------------------------------

There are two distinct cases to consider
1) Process freeze
2) Network partition

Borrowing ideas heavily from https://ramcloud.stanford.edu/wiki/display/ramcloud/Distributed+Leases, we can have the following strategy

1) Process freeze
Objective: Don't be active if coordinator thinks you are dead as soon as possible

a) When disconnection from zk has been processed by the helix client thread after unfreezing, it needs to assume that the global view has diverged from the local view and send a "reset" so that the participant can put itself in a "limbo" state till it hears from the coordinator/zk and after a timeout enters "zombie" state.

b) Before the disconnection event is processed,  we need to guard against not being active when local view has diverged from global view. The participant can ping another random peer participant or the coordinator periodically (every 10ms?). Pinging a random peer participant scales better than every participant pinging the coordinator every 10ms. In response to a ping, if the participant learns that it is a "zombie", it puts itself in a "limbo" state till it hears from the coordinator/zk and after a timeout enters "zombie" state.

2) Network partition
Assume coordinator and zk are in the same partition
 Objective: All nodes who cannot reach coordinator/zk need to enter zombie state as quickly as possible

 a) If a ping response to a peer participant yields no response, the participant puts itself in a "limbo" state till it hears from the coordinator/zk and after a timeout enters "zombie" state (same as 1b)

 b) If the response to a ping request is "I'm in limbo or I'm a zombie", the participant puts itself in a "limbo" state till it hears from the coordinator/zk and after a timeout enters "zombie" state. The participant can optionally wait for a response before entering "limbo" in order to avoid overly aggressive spread of "limbo" infection - its important to ensure coordinator response time is much lesser than ping interval. Hence keeping coordinator load light is important
This scheme ensures the disconnected partition quickly converges to a "zombie" state  
 


                
> Better support for handling network partition and process freeze
> ----------------------------------------------------------------
>
>                 Key: HELIX-26
>                 URL: https://issues.apache.org/jira/browse/HELIX-26
>             Project: Apache Helix
>          Issue Type: Improvement
>            Reporter: kishore gopalakrishna
>
> Handling network partition is tricky in distributed systems. Zookeeper allows us to solve this upto some degree with the use of heart beat. But this is not sufficient in large scale systems with many nodes. One of the problems is that once the client detects disconnect which happens on the client side, the options are
> 1. Put your self in a pause state until you reconnect.
> 2. Continue what ever you are doing until notified of session expiry.
> Unfortunately 1 is too agressive and 2 is too passive. Since Helix comes with the centralized controller, its possible to have a more middle ground solution where once the participant receives a disconnect event, it can check with co-ordinator(s)/peers to check if it can continue operating.
> The challenge here for the node to detect if it belongs to the same partition as of the co-ordinator or not. So its goal is to reach the controller, if it cannot reach the controller it has to disable/fence itself.
> As of now Helix simply provides the state if its disconnected from the cluster and user can either chose 1) or 2).
> This JIRA aims to investigate better ways to enhance network partition detection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira