You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "Colvin Cowie (Jira)" <ji...@apache.org> on 2020/05/21 10:10:00 UTC

[jira] [Commented] (SOLR-14504) ZkController LiveNodesListener has NullPointerException in startup race

    [ https://issues.apache.org/jira/browse/SOLR-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113034#comment-17113034 ] 

Colvin Cowie commented on SOLR-14504:
-------------------------------------

So with breakpoints on in a debugger it's easy to reproduce this, but I'm not seeing a _nice_ way to write a test for it, at least not in the ZkControllerTest. There's a lot of stuff happening in the constructor of the ZkController (both registerLiveNodesListener and getSolrCloudManager are called, so the race only exists between those calls) and registerLiveNodesListener is private.

 

I've gotten part way to testing it by constructing the ZkController on another thread, but then the NullPointerException is just lost in a callback thread's stack... The NullPointerException causing the Core container load to fail can only happen when a node lost event occurs in the initial {{at org.apache.solr.cloud.ZkController.registerLiveNodesListener(ZkController.java:1035)}}

 

 

> ZkController LiveNodesListener has NullPointerException in startup race
> -----------------------------------------------------------------------
>
>                 Key: SOLR-14504
>                 URL: https://issues.apache.org/jira/browse/SOLR-14504
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 7.7, 7.7.1, 7.7.2, 8.0, 8.1, 8.2, 7.7.3, 8.1.1, 8.3, 8.4, 8.3.1, 8.5, 8.4.1, 8.5.1
>            Reporter: Colvin Cowie
>            Priority: Minor
>
> If a NODELOST event happens before the cloudManager is initialized then a NullPointerException will occur on this line [https://github.com/apache/lucene-solr/blob/c18666ad05afc02979c150aacd4810cff02e43f3/solr/core/src/java/org/apache/solr/cloud/ZkController.java#L1020]
> {code:java}
> byte[] json = Utils.toJSON(Collections.singletonMap("timestamp", cloudManager.getTimeSource().getEpochTimeNs())); {code}
> Rather than accessing cloudManager directly, getSolrCloudManager() should be called.
>  
> This happens very rarely, but if it happens it will stop Solr starting, result in "CoreContainer is either not initialized or shutting down". Snippet from 8.3.1
> {noformat}
> 2020-05-19 03:44:40.241 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Waiting for client to connect to ZooKeeper
> 2020-05-19 03:44:40.245 INFO  (zkConnectionManagerCallback-11-thread-1) [   ] o.a.s.c.c.ConnectionManager zkClient has connected
> 2020-05-19 03:44:40.245 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
> 2020-05-19 03:44:40.359 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Waiting for client to connect to ZooKeeper
> 2020-05-19 03:44:40.361 INFO  (zkConnectionManagerCallback-13-thread-1) [   ] o.a.s.c.c.ConnectionManager zkClient has connected
> 2020-05-19 03:44:40.361 INFO  (main) [   ] o.a.s.c.c.ConnectionManager Client is connected to ZooKeeper
> 2020-05-19 03:44:40.417 INFO  (main) [   ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (0) -> (1)
> 2020-05-19
>  03:44:56.606 INFO  (zkCallback-12-thread-2) [   ] 
> o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> 
> (0)
> 2020-05-19 03:44:56.614 ERROR (main) [   ] 
> o.a.s.s.SolrDispatchFilter Could not start Solr. Check solr/home 
> property and the logs
> 2020-05-19 03:44:56.639 ERROR (main) [   ] o.a.s.c.SolrCore null:java.lang.NullPointerException
> 	at org.apache.solr.cloud.ZkController.lambda$registerLiveNodesListener$10(ZkController.java:1020)
> 	at org.apache.solr.common.cloud.ZkStateReader.registerLiveNodesListener(ZkStateReader.java:880)
> 	at org.apache.solr.cloud.ZkController.registerLiveNodesListener(ZkController.java:1035)
> 	at org.apache.solr.cloud.ZkController.init(ZkController.java:917)
> 	at org.apache.solr.cloud.ZkController.<init>(ZkController.java:473)
> 	at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:115)
> 	at org.apache.solr.core.CoreContainer.load(CoreContainer.java:631){noformat}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org