Posted to commits@helix.apache.org by "Hao Zhang (JIRA)" <ji...@apache.org> on 2018/07/19 19:25:00 UTC
[jira] [Created] (HELIX-742) ZkHelixManager should consider session expire when detecting connection flapping
Hao Zhang created HELIX-742:
-------------------------------
Summary: ZkHelixManager should consider session expire when detecting connection flapping
Key: HELIX-742
URL: https://issues.apache.org/jira/browse/HELIX-742
Project: Apache Helix
Issue Type: Task
Reporter: Hao Zhang
In production we are seeing a problem caused by an infinite expiry-connect loop. Each iteration of the loop causes a live instance change and triggers massive state transitions. As a result, the controller overloads ZK with thousands of messages and brings down the cluster.
Currently, when ZkHelixManager detects connection flapping, it counts only disconnects, not session expiries. We need to take session expiry into consideration as well.
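To illustrate the idea, here is a minimal sketch of a flapping detector that counts session expiries alongside disconnects in one sliding time window. It is not the actual ZkHelixManager code: the class FlappingDetector, its method names, and the windowMs and maxEvents parameters are all hypothetical, and the caller is assumed to pass in the current time in milliseconds.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sliding-window flapping detector (not Helix's implementation).
 * Disconnects and session expiries are recorded in the same window, so an
 * expiry-connect loop is detected just like a disconnect-connect loop.
 */
class FlappingDetector {
  private final long windowMs;   // assumed: length of the observation window
  private final int maxEvents;   // assumed: events tolerated within the window
  private final Deque<Long> eventTimestamps = new ArrayDeque<>();

  FlappingDetector(long windowMs, int maxEvents) {
    this.windowMs = windowMs;
    this.maxEvents = maxEvents;
  }

  /** Call when the ZooKeeper connection is lost. */
  synchronized void onDisconnected(long nowMs) {
    record(nowMs);
  }

  /** Call when the ZooKeeper session expires; counted the same way. */
  synchronized void onSessionExpired(long nowMs) {
    record(nowMs);
  }

  /** True if more than maxEvents events fall within the last windowMs. */
  synchronized boolean isFlapping(long nowMs) {
    evictOlderThan(nowMs - windowMs);
    return eventTimestamps.size() > maxEvents;
  }

  private void record(long nowMs) {
    eventTimestamps.addLast(nowMs);
    evictOlderThan(nowMs - windowMs);
  }

  // Drop events that have aged out of the window.
  private void evictOlderThan(long cutoffMs) {
    while (!eventTimestamps.isEmpty() && eventTimestamps.peekFirst() < cutoffMs) {
      eventTimestamps.removeFirst();
    }
  }
}
```

With a detector like this, the manager could stop reconnecting (for example, by disconnecting itself) once isFlapping returns true, rather than re-creating its live instance node on every new session.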
AC:
* Follow up on this ticket with a plan to consolidate semantics and behavior
* Complete the code and test it