Posted to issues@zookeeper.apache.org by "Colvin Cowie (Jira)" <ji...@apache.org> on 2020/12/10 09:52:00 UTC

[jira] [Commented] (ZOOKEEPER-2966) Flaky NullPointerException while closing client connection

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247136#comment-17247136 ] 

Colvin Cowie commented on ZOOKEEPER-2966:
-----------------------------------------

Hello, I encountered this same bug in ZooKeeper 3.6.2, in the context of the SolrJ client. We hit the NPE when the SolrZkClient tries to connect to ZooKeeper, a DNS error causes an exception, and the SolrZkClient then immediately calls close on the `ClientCnxn`. See https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/common/cloud/SolrZkClient.java#L158-L204

{noformat}
java.lang.NullPointerException: null
        at org.apache.zookeeper.ClientCnxnSocketNetty.onClosing(ClientCnxnSocketNetty.java:247) ~[zookeeper-3.6.2.jar:3.6.2]
        at org.apache.zookeeper.ClientCnxn$SendThread.close(ClientCnxn.java:1445) ~[zookeeper-3.6.2.jar:3.6.2]
        at org.apache.zookeeper.ClientCnxn.disconnect(ClientCnxn.java:1488) ~[zookeeper-3.6.2.jar:3.6.2]
        at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1517) ~[zookeeper-3.6.2.jar:3.6.2]
        at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:1614) ~[zookeeper-3.6.2.jar:3.6.2]
        at org.apache.solr.common.cloud.SolrZooKeeper.close(SolrZooKeeper.java:97) ~[solr-solrj-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:39:18]
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:198) ~[solr-solrj-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:39:18]
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:127) ~[solr-solrj-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:39:18]
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:122) ~[solr-solrj-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:39:18]
        at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:109) ~[solr-solrj-8.7.0.jar:8.7.0 2dc63e901c60cda27ef3b744bc554f1481b3b067 - atrisharma - 2020-10-29 19:39:18]
{noformat}

This happens if `ClientCnxnSocketNetty.onClosing()` is called before `connect(...)` (or if `connect(...)` is never called), because the `firstConnect` `CountDownLatch` is only initialized in `connect(...)`.
https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java#L129
https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java#L247
A null check in `onClosing()` would avoid the NPE, but I don't know whether a larger change is required, e.g. some synchronization around `connect(...)` and `onClosing()`.
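
To illustrate the kind of null check I mean, here is a minimal standalone sketch. It is not the ZooKeeper source: `LazyLatchSocket` and its methods are hypothetical names, and I am assuming the close path simply counts down the latch that `connect(...)` creates. It only shows the ordering problem and the guard, and does not address whether extra synchronization is needed.

{code:java}
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for the socket class; NOT the real ClientCnxnSocketNetty. */
class LazyLatchSocket {
    // Assigned only in connect(), so it stays null until connect() has run.
    private volatile CountDownLatch firstConnect;

    /** Mirrors the pattern described above: the latch is created on connect. */
    void connect() {
        firstConnect = new CountDownLatch(1);
    }

    /** Unguarded close path: throws NullPointerException if connect() never ran. */
    void onClosingUnguarded() {
        firstConnect.countDown();
    }

    /** Guarded close path: read the field once and skip it when it was never set. */
    void onClosingGuarded() {
        CountDownLatch latch = firstConnect;
        if (latch != null) {
            latch.countDown();
        }
    }

    /** Remaining latch count, or -1 when connect() was never called. */
    long pendingCount() {
        CountDownLatch latch = firstConnect;
        return latch == null ? -1 : latch.getCount();
    }

    public static void main(String[] args) {
        LazyLatchSocket neverConnected = new LazyLatchSocket();
        try {
            neverConnected.onClosingUnguarded();
        } catch (NullPointerException e) {
            System.out.println("unguarded close before connect: NPE");
        }
        neverConnected.onClosingGuarded();
        System.out.println("guarded close before connect: no exception");

        LazyLatchSocket connected = new LazyLatchSocket();
        connected.connect();
        connected.onClosingGuarded();
        System.out.println("pending after connect + close: " + connected.pendingCount());
    }
}
{code}

The guarded variant copies the field into a local before the null check, so a concurrent `connect()` cannot change what is checked versus what is used. Initializing the latch eagerly in the constructor would be another option, but that changes more behaviour than a null check.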

The code in [3.5.3|https://github.com/apache/zookeeper/blame/1507f67a06175155003722297daeb60bc912af1d/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java#L206] looks very similar, so I expect it was the same issue when this was originally raised.

> Flaky NullPointerException while closing client connection
> ----------------------------------------------------------
>
>                 Key: ZOOKEEPER-2966
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2966
>             Project: ZooKeeper
>          Issue Type: Sub-task
>          Components: java client
>    Affects Versions: 3.5.3
>            Reporter: Enrico Olivelli
>            Priority: Critical
>
> It is not always reproducible, I get this in system tests of client applications.
> ZK client 3.5.3, stacktrace self-explains
> {code:java}
> java.lang.NullPointerException
>     at org.apache.zookeeper.ClientCnxnSocketNetty.onClosing(ClientCnxnSocketNetty.java:206)
>     at org.apache.zookeeper.ClientCnxn$SendThread.close(ClientCnxn.java:1395)
>     at org.apache.zookeeper.ClientCnxn.disconnect(ClientCnxn.java:1440)
>     at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1467)
>     at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:1319){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)