Posted to dev@zookeeper.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/11 09:13:00 UTC
[jira] [Commented] (ZOOKEEPER-2836) QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123064#comment-16123064 ]
ASF GitHub Bot commented on ZOOKEEPER-2836:
-------------------------------------------
GitHub user bitgaoshu opened a pull request:
https://github.com/apache/zookeeper/pull/334
ZOOKEEPER-2836 SocketTimeoutException
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/bitgaoshu/zookeeper fix/ZOOKEEPER-2836
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zookeeper/pull/334.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #334
----
commit 313406fdb0bd247c897409d8dbf800f6a6d62ce4
Author: fengwei <fe...@oneapm.com>
Date: 2017-08-11T09:10:06Z
ZOOKEEPER-2836 SocketTimeoutException
----
> QuorumCnxManager.Listener Thread Better handling of SocketTimeoutException
> --------------------------------------------------------------------------
>
> Key: ZOOKEEPER-2836
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2836
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection, quorum
> Affects Versions: 3.4.6
> Environment: Machine: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.78-1 x86_64 GNU/Linux
> Java Version: jdk64/jdk1.8.0_40
> zookeeper version: 3.4.6.2.3.2.0-2950
> Reporter: Amarjeet Singh
> Priority: Critical
>
> The QuorumCnxManager Listener thread blocks on ServerSocket accept, but we are getting SocketTimeoutException on our boxes after 49 days 17 hours. As per the current code there are 3 retries, after which it logs "As I'm leaving the listener thread, I won't be able to participate in leader election any longer: $<hostname>/$<ip>:3888". Once server nodes reach this state and we restart or add a new node, it fails to join the cluster and logs 'WARN QuorumPeer<myid=1>/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@383 - Cannot open channel to 3 at election address $<hostname>/$<ip>:3888'.
> As no timeout is specified for the ServerSocket, it should never time out, but there are previously discussed issues where people have seen this behavior and added explicit checks for SocketTimeoutException (for example https://issues.apache.org/jira/browse/KARAF-3325 ).
> I think we need to handle SocketTimeoutException along similar lines in ZooKeeper as well.
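For illustration only, the handling suggested in the report could look roughly like the Java sketch below. It is not the ZooKeeper Listener code and not the change in pull request 334. The class name TolerantAcceptLoop, the method acceptOne, and the maxTimeouts bound are all hypothetical. The idea is that a SocketTimeoutException from accept() is treated as benign and retried without consuming the retry budget, while other IOExceptions still count toward it.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch: not ZooKeeper's QuorumCnxManager.Listener and not the
// patch from pull request 334. All names here are made up for illustration.
class TolerantAcceptLoop {
    private final ServerSocket server;
    private final int maxRetries;  // limit for genuine (non-timeout) accept failures
    private int failures = 0;      // non-timeout IOExceptions seen so far
    private int timeoutsSeen = 0;  // timeouts seen so far, tracked for visibility only

    TolerantAcceptLoop(ServerSocket server, int maxRetries) {
        this.server = server;
        this.maxRetries = maxRetries;
    }

    /**
     * Accepts one connection. A timeout is retried without touching the
     * failure counter. The maxTimeouts bound is an assumption added only so
     * that this sketch terminates; a long-lived listener would keep looping.
     */
    Socket acceptOne(int maxTimeouts) throws IOException {
        int timeoutsThisCall = 0;
        while (true) {
            try {
                return server.accept();
            } catch (SocketTimeoutException e) {
                // Benign: nobody connected in time. Do not count it as a failure.
                timeoutsSeen++;
                if (++timeoutsThisCall > maxTimeouts) {
                    throw e;
                }
            } catch (IOException e) {
                // Genuine accept failure: this one consumes the retry budget.
                if (++failures >= maxRetries) {
                    throw e;
                }
            }
        }
    }

    int failures() { return failures; }

    int timeoutsSeen() { return timeoutsSeen; }
}
```

SocketTimeoutException is a subclass of IOException, so its catch clause has to come first. Whether a timeout should also be logged, or the socket re-bound after repeated timeouts, is left open here.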
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)