Posted to dev@zookeeper.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/09/29 05:03:21 UTC

[jira] [Commented] (ZOOKEEPER-1748) TCP keepalive for leader election connections

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531799#comment-15531799 ] 

ASF GitHub Bot commented on ZOOKEEPER-1748:
-------------------------------------------

GitHub user gnethercutt opened a pull request:

    https://github.com/apache/zookeeper/pull/83

    enable TCP keepalive for the leadership election/quorum socket

    Use TCP keep-alives for election/quorum peer connections.
    
    This is the smallest change that addresses [ZOOKEEPER-1748](https://issues.apache.org/jira/browse/ZOOKEEPER-1748). It is required to avoid silent packet delivery failures on long-lived connections in AWS (among other environments).
    
    See also:
    - [VPC security group connection tracking](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#security-group-connection-tracking)
    - [Using TCP keepalives](http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html)
    - [Zookeeper internals](https://zookeeper.apache.org/doc/r3.4.8/zookeeperInternals.html)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gnethercutt/zookeeper election_tcp_keepalive

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/83.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #83
    
----
commit bcab41003d91dc121e368337317710c2434bece8
Author: Glenn Nethercutt <gl...@inin.com>
Date:   2016-09-28T18:22:41Z

    enable TCP keepalive for the leadership election/quorum socket

----


> TCP keepalive for leader election connections
> ---------------------------------------------
>
>                 Key: ZOOKEEPER-1748
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1748
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: leaderElection
>    Affects Versions: 3.4.5, 3.5.0
>         Environment: Linux, Java 1.7
>            Reporter: Antal Sasvári
>            Assignee: Daniel Peon
>            Priority: Minor
>             Fix For: 3.5.3, 3.6.0
>
>         Attachments: Zookeeper-1748-add_tcp_keepalive.patch
>
>
> In our system we encountered the following problem:
> If the system is stable and no leader election takes place, the leader election port connections stay open for a very long time without any packets being sent on them.
> Some network elements silently drop an established TCP connection after a timeout if no packets are sent on it. In this case the ZK servers do not notice the connection loss. This causes additional delay when the next leader election starts, because the TCP connections are no longer alive.
> We would like to be able to enable TCP keepalive on the leader election sockets, to prevent such network elements from timing out the connections due to inactivity.
> This could be controlled by a new config parameter called tcpKeepAlive in the ZooKeeper configuration file. It would apply only to algorithm 3 (TCP-based fast leader election) and would default to false.
> If tcpKeepAlive is set to true, the TCP keepalive flag should be enabled for the leader election sockets in QuorumCnxManager.setSockOpts() by calling sock.setKeepAlive(true).
> We have tested this change successfully in our environment.
> Please comment whether you see any problem with this. If not, I am going to submit a patch.
> I've been told that Apache ActiveMQ, for example, also has a config option for a similar purpose, called transport.keepalive.
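
For readers following the thread, the proposal in the issue description can be sketched roughly as below. This is a minimal illustration, not the actual patch or the real QuorumCnxManager code: the class KeepAliveSketch, the helper readTcpKeepAlive, and the simplified setSockOpts signature are hypothetical names invented for the example. Only the tcpKeepAlive parameter name, its default of false, and the call to Socket.setKeepAlive(true) come from the issue text.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.Socket;
import java.net.SocketException;
import java.util.Properties;

// Hypothetical sketch class; it is not part of ZooKeeper.
final class KeepAliveSketch {

    // Reads the proposed "tcpKeepAlive" flag from a parsed configuration.
    // Defaults to false when the key is absent, as the issue suggests.
    static boolean readTcpKeepAlive(Properties cfg) {
        return Boolean.parseBoolean(cfg.getProperty("tcpKeepAlive", "false"));
    }

    // Simplified stand-in for QuorumCnxManager.setSockOpts(): it only
    // applies the keepalive flag and leaves all other options untouched.
    static void setSockOpts(Socket sock, boolean tcpKeepAlive) throws SocketException {
        sock.setKeepAlive(tcpKeepAlive);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the ZooKeeper configuration file contents.
        Properties cfg = new Properties();
        cfg.load(new StringReader("tcpKeepAlive=true\n"));

        // An unconnected socket is enough to show the option being set.
        try (Socket sock = new Socket()) {
            setSockOpts(sock, readTcpKeepAlive(cfg));
            System.out.println("SO_KEEPALIVE enabled: " + sock.getKeepAlive());
        }
    }
}
```

Note that SO_KEEPALIVE only turns probing on. The idle time before the first probe is controlled by the operating system (on Linux, the net.ipv4.tcp_keepalive_time sysctl, which defaults to 7200 seconds), so it may need to be lowered below the idle timeout of the network element that drops the connections.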



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)