Posted to issues@airavata.apache.org by "Marcus Christie (JIRA)" <ji...@apache.org> on 2017/05/30 19:21:05 UTC

[jira] [Commented] (AIRAVATA-2321) Thousands of Zookeeper client connections in TIME_WAIT

    [ https://issues.apache.org/jira/browse/AIRAVATA-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16029979#comment-16029979 ] 

Marcus Christie commented on AIRAVATA-2321:
-------------------------------------------

This might be a red herring, but a recent issue with Logstash also generated lots of TIME_WAIT connections: https://lists.apache.org/thread.html/acb745986b563c0eaf500b7b5d07d8aaa592735e8e75ff9d57281227@%3Cdev.airavata.apache.org%3E  In any case, it would be good to check Kafka/Logstash the next time we see a large number of Zookeeper connections in TIME_WAIT.

> Thousands of Zookeeper client connections in TIME_WAIT
> ------------------------------------------------------
>
>                 Key: AIRAVATA-2321
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2321
>             Project: Airavata
>          Issue Type: Bug
>          Components: Airavata Orchestrator, GFac
>    Affects Versions: 0.17
>            Reporter: Marcus Christie
>            Assignee: Marcus Christie
>
> On gw56.iu.xsede.org, where the develop branch of Airavata is deployed, there are currently over 4,000 Zookeeper connections in the TIME_WAIT state.
> {noformat}
> [airavata@gw56 ~]$ netstat -anp --tcp | grep 2181 | grep TIME_WAIT | wc -l
> (Not all processes could be identified, non-owned process info
>  will not be shown, you would have to be root to see it all.)
> 4758
> {noformat}
> This number has stayed fairly constant while I've been watching it. On gw77.iu.xsede.org, where the master branch is deployed, there are no such TIME_WAIT connections.
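Here is a sketch of the same count without `netstat`. It assumes a Linux host, where TIME_WAIT sockets are listed in `/proc/net/tcp` and `/proc/net/tcp6` with state code `06`. Unlike `grep 2181`, it matches the port number exactly. It also splits the count by which end holds port 2181. Normally only the side that closes a connection first enters TIME_WAIT. A large "local port" count would therefore suggest the Zookeeper server is closing connections, and a large "remote port" count would suggest clients are closing them. The function names are illustrative and not part of Airavata.

```python
"""Count TCP sockets in TIME_WAIT on a given port (Linux only).

Reads /proc/net/tcp and /proc/net/tcp6 instead of parsing netstat output.
"""

TIME_WAIT = "06"  # state code for TIME_WAIT in /proc/net/tcp


def count_time_wait(lines, port):
    """Return (local, remote): TIME_WAIT sockets whose local port is `port`,
    and TIME_WAIT sockets whose remote port is `port`."""
    local_hits = remote_hits = 0
    for line in lines:
        fields = line.split()
        # Skip the header row ("sl local_address ...") and short lines.
        if len(fields) < 4 or fields[0] == "sl":
            continue
        local_addr, remote_addr, state = fields[1], fields[2], fields[3]
        if state != TIME_WAIT:
            continue
        # Addresses look like "0100007F:0885"; the port is the hex part after ':'.
        local_port = int(local_addr.rsplit(":", 1)[1], 16)
        remote_port = int(remote_addr.rsplit(":", 1)[1], 16)
        if local_port == port:
            local_hits += 1  # this end (e.g. the Zookeeper server) closed first
        elif remote_port == port:
            remote_hits += 1  # the client end closed first
    return local_hits, remote_hits


def main(port=2181):
    local_total = remote_total = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                local_hits, remote_hits = count_time_wait(f, port)
        except FileNotFoundError:
            continue  # IPv6 disabled, or not running on Linux
        local_total += local_hits
        remote_total += remote_hits
    print(f"TIME_WAIT with local port {port}: {local_total}")
    print(f"TIME_WAIT with remote port {port}: {remote_total}")


if __name__ == "__main__":
    main()
```

When client and server run on the same host, both ends of each connection appear in these files. The split count then shows directly which side is doing the closing.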
> I looked into this a bit and wrote the following on HipChat:
> {quote}
> [5:41 PM] Marcus Christie: From what I've been reading, I think the TIME_WAIT problem must be coming from Zookeeper clients connecting and then closing over and over again.
> [5:42 PM] Marcus Christie: A TCP connection will stay in TIME_WAIT for about 4 minutes after it is closed http://stackoverflow.com/questions/10726049/what-is-the-reason-for-time-wait-connection-increasing-i...
> [5:44 PM] Marcus Christie: There are consistently about 4,000 connections in TIME_WAIT. If they hang around for 4 minutes (240 seconds), then roughly 16.7 new connections (4,000 / 240) must be created (and eventually closed) each second.
> {quote}
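The estimate above is an application of Little's law, N = λ·T. The number of sockets sitting in TIME_WAIT (N) equals the rate at which connections are closed (λ) times how long each one stays in that state (T). Below is a minimal sketch of the arithmetic. The 240-second figure comes from the chat, and the 60-second figure is an alternative assumption. On Linux the kernel holds TIME_WAIT for a fixed 60 seconds (`TCP_TIMEWAIT_LEN`), which would make the implied rate about four times higher. The function name `implied_rate` is hypothetical.

```python
def implied_rate(sockets_in_time_wait, time_wait_seconds):
    """Connections closed per second implied by Little's law (N = rate * T)."""
    return sockets_in_time_wait / time_wait_seconds


if __name__ == "__main__":
    # 4,000 is the rough figure from the chat; 4,758 is the netstat count above.
    for count in (4000, 4758):
        # 240 s is the duration assumed in the chat; 60 s is the fixed Linux value.
        for seconds in (240, 60):
            rate = implied_rate(count, seconds)
            print(f"{count} sockets / {seconds} s = {rate:.1f} new connections per second")
```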
> Other things:
> * [~smarru] has already tried purging old logs; [see the Zookeeper docs|https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_strengthsAndLimitations].
> * Zookeeper has [some administrative commands|https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkCommands] that report its own statistics, such as the number of connections.
> ** To run one, connect with telnet and type the command name, e.g.:
> {noformat}
> telnet localhost 2181
> stat
> {noformat}
> * Useful links on TIME_WAIT:
> ** http://serverfault.com/questions/329845/how-to-forcibly-close-a-socket-in-time-wait
> ** http://stackoverflow.com/questions/10726049/what-is-the-reason-for-time-wait-connection-increasing-in-java
> ** http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html
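The `telnet` session above can also be scripted. This is a minimal sketch that assumes Zookeeper listens on `localhost:2181` and replies to `stat` with `Key: value` lines, as 3.4.x servers do. Newer releases may only answer four-letter words enabled through the `4lw.commands.whitelist` setting. The helper names `four_letter_word` and `parse_stat` are hypothetical. Each probe is itself a short-lived TCP connection, so polling it in a tight loop would add to the TIME_WAIT count being investigated.

```python
import socket


def four_letter_word(cmd="stat", host="localhost", port=2181, timeout=5.0):
    """Send a Zookeeper four-letter command and return the raw reply.

    The server writes its reply and then closes the connection, so we
    read until EOF.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_stat(reply):
    """Turn the 'Key: value' lines of a `stat` reply into a dict.

    Indented client lines (e.g. ' /127.0.0.1:54321[0](...)') and the
    'Clients:' heading are skipped.
    """
    stats = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(": ")
        if sep and not line.startswith(" "):
            stats[key.strip()] = value.strip()
    return stats


if __name__ == "__main__":
    try:
        stats = parse_stat(four_letter_word("stat"))
        print("Connections:", stats.get("Connections"))
    except OSError as exc:  # refused, timed out, etc.
        print("Could not reach Zookeeper:", exc)
```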



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)