
Posted to dev@curator.apache.org by "Jordan Zimmerman (JIRA)" <ji...@apache.org> on 2015/08/22 01:04:45 UTC

[jira] [Commented] (CURATOR-246) Parent task for adding a SESSION_LOST connection state, etc.

    [ https://issues.apache.org/jira/browse/CURATOR-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707619#comment-14707619 ] 

Jordan Zimmerman commented on CURATOR-246:
------------------------------------------

Implementation notes so far:

* It makes more sense to alter the meaning of the current LOST state than to add a new state.
* Now is a good time to fix a very old problem. Every API call bottlenecks through RetryLoop.callWithRetry(). The first thing this method does is call client.internalBlockUntilConnectedOrTimedOut(). If the connection doesn't succeed, the actual API call fails and the retry policy signals a retry, which calls client.internalBlockUntilConnectedOrTimedOut() again. This is not reasonable behavior, and it makes having a true LOST session event more difficult. So, if the new behavior is enabled, a timeout during connection will immediately throw KeeperException.ConnectionLossException without retrying.
* ConnectionStateManager has been altered so that the event poller will post a LOST state if the configured session timeout elapses
* When the new behavior is enabled, the background sync() call is no longer made when the Disconnected event is received. It is no longer necessary because the ConnectionStateManager now watches for session timeout.
* The Base testing class now runs each test twice: once in pre-3.0 mode and once with enableSessionExpiredState set to true.
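
To illustrate the session-timeout idea from the notes above, here is a minimal, self-contained sketch in plain Java. It is not Curator's ConnectionStateManager code; the class SessionLossTracker, its State enum, and its method names are hypothetical and exist only for this example. It assumes the poller is handed the current time explicitly, and that LOST is posted once the configured session timeout has elapsed since the disconnect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Apache Curator.
final class SessionLossTracker {
    enum State { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    private final long sessionTimeoutMs;
    private final List<State> posted = new ArrayList<>();
    private State current;        // null until the first connection
    private long suspendedAtMs;   // meaningful only while current == SUSPENDED

    SessionLossTracker(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    // Called when the underlying connection is (re)established.
    void onConnected() {
        if (current == State.CONNECTED || current == State.RECONNECTED) {
            return; // already connected, nothing to post
        }
        post(current == null ? State.CONNECTED : State.RECONNECTED);
    }

    // Called when the connection drops; only SUSPENDED is posted at this point.
    void onDisconnected(long nowMs) {
        if (current == State.CONNECTED || current == State.RECONNECTED) {
            suspendedAtMs = nowMs;
            post(State.SUSPENDED);
        }
    }

    // Called periodically by an event poller: posts LOST once the session
    // timeout has elapsed without a reconnection.
    void poll(long nowMs) {
        if (current == State.SUSPENDED && nowMs - suspendedAtMs >= sessionTimeoutMs) {
            post(State.LOST);
        }
    }

    // Returns a copy of every state posted so far, in order.
    List<State> posted() {
        return new ArrayList<>(posted);
    }

    private void post(State s) {
        current = s;
        posted.add(s);
    }

    public static void main(String[] args) {
        SessionLossTracker t = new SessionLossTracker(1000);
        t.onConnected();
        t.onDisconnected(100);
        t.poll(600);   // 500 ms since the disconnect: still SUSPENDED
        t.poll(1100);  // 1000 ms since the disconnect: LOST is posted
        t.poll(1200);  // already LOST: nothing further is posted
        System.out.println(t.posted()); // prints [CONNECTED, SUSPENDED, LOST]
    }
}
```

In this sketch a brief disconnect that ends before the timeout yields only SUSPENDED followed by RECONNECTED, so an application could wait for LOST before taking an expensive action such as a new leader election.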

> Parent task for adding a SESSION_LOST connection state, etc.
> ------------------------------------------------------------
>
>                 Key: CURATOR-246
>                 URL: https://issues.apache.org/jira/browse/CURATOR-246
>             Project: Apache Curator
>          Issue Type: New Feature
>          Components: Framework, Recipes
>            Reporter: Dong Lei
>
> Spark now leverages Curator to manage its connections to ZK and to perform leader election.
> Currently, whenever a ZK session gets disconnected, the ConnectionStateManager notices, marks the state as SUSPENDED, and a new leader election is triggered,
> even though the ZK session may be able to reconnect to another machine very soon.
> I wonder if we can tolerate such brief network instability without triggering a leader election, because the reaction of an upper-layer application (like Spark) to a new leader can be very costly.


