Posted to dev@zookeeper.apache.org by "Rickey Visinski (Jira)" <ji...@apache.org> on 2023/12/05 22:01:00 UTC

[jira] [Created] (ZOOKEEPER-4777) Zookeeper becomes unresponsive when using native GSSAPI

Rickey Visinski created ZOOKEEPER-4777:
------------------------------------------

             Summary: Zookeeper becomes unresponsive when using native GSSAPI
                 Key: ZOOKEEPER-4777
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4777
             Project: ZooKeeper
          Issue Type: Bug
          Components: kerberos, server
    Affects Versions: 3.8.3, 3.8.2, 3.7.2, 3.6.4, 3.7.1, 3.6.2, 3.5.7, 3.5.6, 3.4.14, 3.4.13
         Environment: RHEL 7 and OpenJDK Runtime Environment (build 1.8.0_392-b08)
                      RHEL 8 and OpenJDK Runtime Environment (Red_Hat-17.0.9.0.9-1) (build 17.0.9+9-LTS)
            Reporter: Rickey Visinski


The ZooKeeper ensemble starts up properly once quorum is established. A leader is elected and starts serving requests. After a while the leader gets stuck: it keeps accepting requests but does not process them. The same happens on the other participants. They accept requests, but because the leader is not processing them, the requests keep piling up.

This causes a sudden increase in the number of CLOSE_WAIT connections on the ZooKeeper servers. When this happens, the ensemble is completely unresponsive, causing connection loss and timeouts. Once the CLOSE_WAIT connections start to accumulate, the number of open connections on each server spikes from about 200 to as high as 100000 within a few minutes.

Thread dumps taken during the hang consistently show a {{NIOServerCxnFactory}} selector thread blocked, waiting for a lock in {{org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer}}:
{code:java}
tdump_zkdev14.i.ia55.net_1694037623.logs-"NIOServerCxnFactory.SelectorThread-0" #16 daemon prio=5 os_prio=0 cpu=9126323.70ms elapsed=25935.16s tid=0x00007f9118702320 nid=0x20ed94 waiting for monitor entry  [0x00007f907e635000]
tdump_zkdev14.i.ia55.net_1694037623.logs:   java.lang.Thread.State: BLOCKED (on object monitor)
tdump_zkdev14.i.ia55.net_1694037623.logs-	at org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:42)
tdump_zkdev14.i.ia55.net_1694037623.logs-	- waiting to lock <0x0000000700391098> (a org.apache.zookeeper.Login)
tdump_zkdev14.i.ia55.net_1694037623.logs-	at org.apache.zookeeper.server.ZooKeeperSaslServer.<init>(ZooKeeperSaslServer.java:38) {code}
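
For anyone trying to confirm the same pattern, here is a minimal sketch of how BLOCKED threads and the lock they are waiting on can be listed with the standard {{java.lang.management}} API. The class name {{BlockedThreadReport}} and its method are hypothetical names made up for this illustration, not part of ZooKeeper. The sketch inspects the JVM it runs in; pointing it at a live ZooKeeper server would additionally need a remote JMX connection, which is assumed and not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of ZooKeeper.
final class BlockedThreadReport {

    /** Returns one line per thread in this JVM that is BLOCKED on an object monitor. */
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // false, false: we only need the lock each thread is waiting for,
        // not the full list of monitors and synchronizers it already holds.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName()
                        + " waiting to lock " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Prints nothing when no thread is blocked at the moment of the dump.
        for (String line : blockedThreads()) {
            System.out.println(line);
        }
    }
}
```

If the symptom matches, the selector threads should show up in this list waiting on the same {{org.apache.zookeeper.Login}} monitor as in the dump above, and the "held by" part names the thread that currently owns it.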

This seems to be related to https://issues.apache.org/jira/browse/ZOOKEEPER-2230

 

Thanks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)