Posted to dev@zookeeper.apache.org by "Ben Sherman (JIRA)" <ji...@apache.org> on 2017/06/11 19:00:20 UTC
[jira] [Resolved] (ZOOKEEPER-2783) follower disconnects and cannot reconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ben Sherman resolved ZOOKEEPER-2783.
------------------------------------
Resolution: Duplicate
Assignee: Ben Sherman
Fix Version/s: 3.4.11, 3.5.4
Same symptoms and fix as ZOOKEEPER-1748. Closing as a duplicate.
> follower disconnects and cannot reconnect
> -----------------------------------------
>
> Key: ZOOKEEPER-2783
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2783
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.4.10
> Environment: centos 7, AWS EC2
> Reporter: Ben Sherman
> Assignee: Ben Sherman
> Fix For: 3.5.4, 3.4.11
>
> Attachments: fail3.log, fail5.log
>
>
> We have a 5-node cluster running 3.4.10 (we saw this in .8 and .9 as well), and sometimes a node gets a read timeout, drops all its connections, and tries to re-establish itself in the quorum. It can usually do this in a few seconds, but last night it took about 15 minutes to reconnect.
> These are 5 servers in AWS, and we've tried tuning the timeouts, but the reconnect attempts are exceeding any reasonable timeout and still failing.
> In the attached logs, 5 is a follower, 3 is the leader. 5 loses connectivity at 11:21:34. 3 sees the disconnect at the same moment.
> 5 tries to re-establish the quorum, but cannot do it until the connections to the other servers expire at 11:37:02. After the connections are re-established, 5 connects immediately.
> At 11:41:08, the operator restarted the server, and it reconnected normally.
> I suspect there is a problem with stale connections to the rest of the quorum - the other services on this box were fine (monitoring, puppet) and able to establish new connections with no problems.
> I posed this problem to the zookeeper-users list and was asked to open a ticket.