Posted to commits@helix.apache.org by "Kanak Biscuitwala (JIRA)" <ji...@apache.org> on 2013/11/13 01:29:18 UTC

[jira] [Updated] (HELIX-26) Better support for handling network partition and process freeze

     [ https://issues.apache.org/jira/browse/HELIX-26?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kanak Biscuitwala updated HELIX-26:
-----------------------------------

    Fix Version/s:     (was: 0.6.2-incubating)

> Better support for handling network partition and process freeze
> ----------------------------------------------------------------
>
>                 Key: HELIX-26
>                 URL: https://issues.apache.org/jira/browse/HELIX-26
>             Project: Apache Helix
>          Issue Type: Improvement
>    Affects Versions: 0.6.0-incubating
>            Reporter: kishore gopalakrishna
>            Assignee: Swaroop Jagadish
>
> Handling network partitions is tricky in distributed systems. ZooKeeper lets us solve this to some degree through heartbeats, but that is not sufficient in large-scale systems with many nodes. One problem is that once the client detects a disconnect, which happens on the client side, its options are:
> 1. Put yourself in a paused state until you reconnect.
> 2. Continue whatever you are doing until notified of session expiry.
> Unfortunately, 1 is too aggressive and 2 is too passive. Since Helix comes with a centralized controller, it's possible to have a middle-ground solution: once the participant receives a disconnect event, it can check with the coordinator(s)/peers to determine whether it can continue operating.
> The challenge here is for the node to detect whether it belongs to the same partition as the coordinator. So its goal is to reach the controller; if it cannot reach the controller, it has to disable/fence itself.
> As of now, Helix simply reports whether it is disconnected from the cluster, and the user can choose either 1) or 2).
> This JIRA aims to investigate better ways to enhance network partition detection.
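
The middle-ground option described in the issue (on disconnect, try to reach the controller and fence only if that fails) could be sketched roughly as follows. This is an illustration under stated assumptions, not Helix code: the class PartitionGuard, its Mode enum, and the controllerReachable probe are hypothetical names, and how the controller would actually be probed (RPC, TCP connect, asking peers) is left open by the issue.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: none of these names are part of the Helix or ZooKeeper APIs.
final class PartitionGuard {
    enum Mode { ACTIVE, FENCED }

    // Assumed probe that returns true if the controller answered.
    private final BooleanSupplier controllerReachable;
    private final int maxAttempts;
    private final long retryDelayMillis;
    private Mode mode = Mode.ACTIVE;

    PartitionGuard(BooleanSupplier controllerReachable, int maxAttempts, long retryDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.controllerReachable = controllerReachable;
        this.maxAttempts = maxAttempts;
        this.retryDelayMillis = retryDelayMillis;
    }

    // Intended to be called when the client reports a disconnect event.
    synchronized Mode onDisconnect() {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (controllerReachable.getAsBoolean()) {
                // Same side of the partition as the controller: keep operating.
                mode = Mode.ACTIVE;
                return mode;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(retryDelayMillis);
                } catch (InterruptedException e) {
                    // Fail safe: stop probing and fence if interrupted.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        // Controller unreachable after all attempts: disable/fence this node.
        mode = Mode.FENCED;
        return mode;
    }

    // Assumption: a plain reconnect lifts the fence; session expiry is not modeled here.
    synchronized Mode onReconnect() {
        mode = Mode.ACTIVE;
        return mode;
    }

    synchronized Mode mode() {
        return mode;
    }
}
```

With this shape, a participant whose probe succeeds behaves like option 2 (it keeps running), while one that cannot reach the controller falls back to option 1 (it pauses), which is the trade-off the issue is asking for.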



--
This message was sent by Atlassian JIRA
(v6.1#6144)