Posted to issues@zookeeper.apache.org by "Ryan Ruel (Jira)" <ji...@apache.org> on 2021/12/15 16:24:00 UTC

[jira] [Updated] (ZOOKEEPER-4428) ZooKeeper Server leaks "SyncThread" threads when leadership connection times out and is reestablished

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Ruel updated ZOOKEEPER-4428:
---------------------------------
    Summary: ZooKeeper Server leaks "SyncThread" threads when leadership connection times out and is reestablished   (was: ZooKeeper leaks "SyncThread" threads when leadership connection times out and is reestablished )

> ZooKeeper Server leaks "SyncThread" threads when leadership connection times out and is reestablished 
> ------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4428
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4428
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.6.3
>         Environment: Steps to reproduce:
>  1. On a follower node of an established ZooKeeper ensemble, run the following command to determine the number of SyncThreads:
>     ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc
>  2. Run the following iptables command on the leader to drop traffic coming from the follower used in Step 1:
>     iptables -A INPUT -s <Follower IP Address> -j DROP
>  3. Watch the ZooKeeper logs on the nodes and wait for the connection to drop due to timeout.
>  4. Run the following iptables command on the leader to re-enable traffic coming from the follower used in Step 1:
>     iptables -D INPUT -s <Follower IP Address> -j DROP
>  5. Watch the ZooKeeper logs on the nodes and wait for the connection to the leader to be reestablished.
>  6. On the follower node (used in Step 1), check the number of SyncThreads again. The value should have increased by one and should stay pinned there indefinitely:
>     ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc
>            Reporter: Ryan Ruel
>            Priority: Major
>
> In a production environment with some connectivity problems, the ZooKeeper server was found to be using over 1000 threads named "SyncThread", none of which were ever freed.
> The server logs indicate that these nodes were experiencing connection timeouts to the leader.
> A test environment (described in the "Environment" field of this ticket) showed that these connection timeouts appear to be what leaks the threads.
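
The thread count check from the reproduction steps can be wrapped in a small helper so the before and after values are easy to compare. This is only a sketch: the function name count_sync_threads is made up for illustration, and the process name mdtzookeeper is taken from the ticket and will differ in other deployments.

```shell
#!/bin/sh
# Count the lines of `ps -T` output (read from stdin) that mention SyncThread.
# `grep -c` prints the number of matching lines; it exits non-zero when the
# count is 0, so `|| true` keeps the helper usable under `set -e`.
count_sync_threads() {
    grep -c 'SyncThread' || true
}

# Example usage on a follower (process name as given in the ticket):
#   before=$(ps -T -p "$(pidof mdtzookeeper)" | count_sync_threads)
#   ... block and unblock traffic on the leader, wait for the reconnect ...
#   after=$(ps -T -p "$(pidof mdtzookeeper)" | count_sync_threads)
#   echo "SyncThread delta: $((after - before))"
```

If the behaviour described in the ticket is present, the delta printed after each timeout and reconnect cycle would be expected to be 1 and not to drop back over time.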



--
This message was sent by Atlassian Jira
(v8.20.1#820001)