Posted to dev@zookeeper.apache.org by "Rushabh Shah (Jira)" <ji...@apache.org> on 2021/09/02 20:09:00 UTC

[jira] [Created] (ZOOKEEPER-4367) Zookeeper#Login thread leak in case of Sasl AuthFailed.

Rushabh Shah created ZOOKEEPER-4367:
---------------------------------------

             Summary: Zookeeper#Login thread leak in case of Sasl AuthFailed.
                 Key: ZOOKEEPER-4367
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4367
             Project: ZooKeeper
          Issue Type: Bug
          Components: java client, kerberos
    Affects Versions: 3.4.13
            Reporter: Rushabh Shah


We are seeing thousands of Zookeeper#Login threads leaking in our production clusters.
[ZooKeeperSaslClient#createSaslClient|https://github.com/apache/zookeeper/blob/branch-3.4.13/src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java#L205] creates the Login thread.
[ZooKeeperSaslClient#createSaslToken|https://github.com/apache/zookeeper/blob/branch-3.4.13/src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java#L310] throws a SaslException, which propagates all the way back to the [ClientCnxn#SendThread#run|https://github.com/apache/zookeeper/blob/branch-3.4.13/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1074] method.

[ClientCnxn#SendThread#run|https://github.com/apache/zookeeper/blob/branch-3.4.13/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1075-L1078] handles the SaslException by setting the state to AUTH_FAILED, queueing the eventOfDeath for the EventThread, and exiting/cleaning up the SendThread, but we DON'T close the zookeeperSaslClient, which in turn would shut down the Login thread.
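
To make the missing cleanup step concrete, here is a minimal, self-contained sketch of the pattern. It does not use the real ZooKeeper classes: RefreshingLogin and SendLoopSketch are hypothetical stand-ins for the login object and the send loop, and the thread name is made up. It only illustrates releasing the login's background thread on the authentication-failure path, and is not a proposed patch.

```java
import javax.security.sasl.SaslException;

/** Hypothetical stand-in for a login object that owns a background refresh thread. */
class RefreshingLogin {
    private final Thread refresher;

    RefreshingLogin() {
        refresher = new Thread(() -> {
            try {
                // Stand-in for "sleep until the next TGT refresh".
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                // Interrupted by shutdown(): return and let the thread end.
            }
        }, "hypothetical-login-refresh");
        refresher.setDaemon(true);
        refresher.start();
    }

    /** Interrupts the refresh thread and waits up to five seconds for it to exit. */
    void shutdown() {
        refresher.interrupt();
        try {
            refresher.join(5000);
        } catch (InterruptedException e) {
            // Preserve the caller's interrupt status.
            Thread.currentThread().interrupt();
        }
    }

    boolean isRefreshThreadAlive() {
        return refresher.isAlive();
    }
}

/** Hypothetical stand-in for a send loop that hits an authentication failure. */
class SendLoopSketch {
    enum State { CONNECTING, AUTH_FAILED }

    State state = State.CONNECTING;
    final RefreshingLogin login = new RefreshingLogin();

    void run() {
        try {
            // Stand-in for the SASL token exchange failing.
            throw new SaslException("GSS initiate failed");
        } catch (SaslException e) {
            state = State.AUTH_FAILED;
        } finally {
            // The step described above as missing: release the login's
            // thread on every exit path, including authentication failure.
            login.shutdown();
        }
    }
}
```

With this shape, calling run() on a new SendLoopSketch leaves state at AUTH_FAILED and the refresh thread stopped. Whether the equivalent call belongs in the SendThread cleanup or elsewhere in the client is a question for the ZooKeeper maintainers.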

Logs are added below for one failed connection.
{noformat}
`20210831053800.393 jute.maxbuffer value is 4194304 Bytes
`20210831053800.393 Initiating client connection, connectString=<zookeeper-ensemble string> sessionTimeout=4000 watcher=org.apache.curator.ConnectionState@7b974f93

`20210831053800.401 zookeeper.request.timeout value is 10000. feature enabled=
`20210831053800.404 Client successfully logged in.
`20210831053800.405 Client will use GSSAPI as SASL mechanism.
`20210831053800.405 TGT refresh sleeping until: Wed Sep 01 00:59:06 GMT 2021
`20210831053800.405 TGT refresh thread started.
`20210831053800.405 TGT valid starting at:        Tue Aug 31 05:38:00 GMT 2021
`20210831053800.405 TGT expires:                  Wed Sep 01 05:38:00 GMT 2021

`20210831053800.407 Opening socket connection to server <zookeeper-server-1>. Will attempt to SASL-authenticate using Login Context section 'Client'

`20210831053800.419 Socket connection established to <zookeeper-server-1>, initiating session

`20210831053800.435 Session establishment complete on server <zookeeper-server-1>, sessionid = 0x1000004066cc52b, negotiated timeout = 6000

`20210831053800.438 An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)]) occurred when evaluating Zookeeper Quorum Member's  received SASL token. This may be caused by Java's being unable to resolve the Zookeeper Quorum Member's hostname correctly. You may want to try to adding '-Dsun.net.spi.nameservice.provider.1=dns,sun' to your client's JVMFLAGS environment. Zookeeper Client will go to AUTH_FAILED state.

`20210831053800.438 EventThread shut down for session: 0x1000004066cc52b

`20210831053800.438 SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)]) occurred when evaluating Zookeeper Quorum Member's  received SASL token. This may be caused by Java's being unable to resolve the Zookeeper Quorum Member's hostname correctly. You may want to try to adding '-Dsun.net.spi.nameservice.provider.1=dns,sun' to your client's JVMFLAGS environment. Zookeeper Client will go to AUTH_FAILED state.
{noformat}


What is the correct way to shut down the Login thread in case of a SaslException?
We use the Curator framework to connect to ZooKeeper.

We fixed a similar bug in ZOOKEEPER-3059, where we were leaking EventThreads.
This one is similar, except that it leaks Login threads. Please help.
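
For anyone trying to confirm the same leak in a running JVM, a rough sketch of a probe is below. It counts live threads that have at least one stack frame in a class whose name starts with a given prefix, using Thread.getAllStackTraces(). ThreadLeakProbe and ParkedWorker are hypothetical names; ParkedWorker exists only to demonstrate the probe.

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;

class ThreadLeakProbe {
    /** Counts live threads with at least one stack frame in a class whose name starts with classPrefix. */
    static int countThreadsRunning(String classPrefix) {
        int count = 0;
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().startsWith(classPrefix)) {
                    count++;
                    break; // count each thread at most once
                }
            }
        }
        return count;
    }
}

/** Hypothetical worker that parks in run() until released; used only for the demonstration. */
class ParkedWorker implements Runnable {
    private final CountDownLatch started = new CountDownLatch(1);
    private final CountDownLatch release = new CountDownLatch(1);
    private final Thread thread = new Thread(this, "parked-worker");

    @Override
    public void run() {
        started.countDown();
        try {
            release.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Starts the worker and blocks until it is executing run(). */
    void startAndAwaitRunning() {
        thread.setDaemon(true);
        thread.start();
        try {
            started.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Releases the worker and waits up to five seconds for its thread to exit. */
    void stopAndJoin() {
        release.countDown();
        try {
            thread.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In our case the prefix would presumably be something like "org.apache.zookeeper.Login", but that is an assumption about where the leaked threads are parked. A plain jstack dump gives the same information without any code changes.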
