Posted to issues@zookeeper.apache.org by "Ryan Ruel (Jira)" <ji...@apache.org> on 2021/12/15 16:24:00 UTC
[jira] [Updated] (ZOOKEEPER-4428) ZooKeeper Server leaks "SyncThread" threads when leadership connection times out and is reestablished
[ https://issues.apache.org/jira/browse/ZOOKEEPER-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan Ruel updated ZOOKEEPER-4428:
---------------------------------
Summary: ZooKeeper Server leaks "SyncThread" threads when leadership connection times out and is reestablished (was: ZooKeeper leaks "SyncThread" threads when leadership connection times out and is reestablished )
> ZooKeeper Server leaks "SyncThread" threads when leadership connection times out and is reestablished
> ------------------------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-4428
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4428
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.6.3
> Environment: # On a follower node in an established ZooKeeper ensemble, issue the following command to determine the number of SyncThreads:
> ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc
> # Issue the following iptables command on the leader to drop traffic coming from the follower used in Step 1:
> iptables -A INPUT -s <Follower IP Address> -j DROP
> # Watch the zookeeper logs on the nodes and wait for the connection to drop due to timeout.
> # Issue the following iptables command on the leader to re-enable traffic coming from the follower used in Step 1:
> iptables -D INPUT -s <Follower IP Address> -j DROP
> # Watch the zookeeper logs on the nodes and wait for the connection to the leader to reestablish.
> # On the follower node (used in Step 1), check the number of SyncThreads again. The value will have increased by one and stays at that level indefinitely:
> ps -T -p `pidof mdtzookeeper` | grep SyncThread | wc
> Reporter: Ryan Ruel
> Priority: Major
>
> In a production environment with some connectivity problems, the ZooKeeper server was found to be running over 1000 threads named "SyncThread" that were never freed.
> The server logs indicate that these nodes were experiencing connection timeouts to the leader.
> A test environment (described in the "Environment" field of this ticket) showed that these connection timeouts appear to be what leaks the threads.
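The thread check from the reproduction steps can be wrapped in a small helper so that the counts before and after the simulated timeout are easy to compare. This is a minimal sketch, not part of the original report: the function name count_sync_threads is hypothetical, and the process name mdtzookeeper is taken from the report and will differ on other installations.

```shell
#!/bin/sh
# Hypothetical helper: count the lines of `ps -T` output (read from stdin)
# that mention "SyncThread". `grep -c` prints 0 but exits non-zero when
# nothing matches, so `|| true` keeps the helper usable under `set -e`.
count_sync_threads() {
    grep -c 'SyncThread' || true
}

# Example use against a live follower (assumes the process name from the report):
#   before=$(ps -T -p "$(pidof mdtzookeeper)" | count_sync_threads)
#   ... block and then unblock the follower on the leader with iptables ...
#   after=$(ps -T -p "$(pidof mdtzookeeper)" | count_sync_threads)
#   echo "SyncThread count: before=$before after=$after"
```

If the leak described in the report is present, the second count is expected to be higher than the first and to stay that way.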
--
This message was sent by Atlassian Jira
(v8.20.1#820001)