Posted to dev@lucene.apache.org by "Zhaohui Ma (JIRA)" <ji...@apache.org> on 2018/12/12 07:27:00 UTC

[jira] [Updated] (SOLR-13061) Solr replica remaining down status when hitting the maxQueueSize as 20000

     [ https://issues.apache.org/jira/browse/SOLR-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhaohui Ma updated SOLR-13061:
------------------------------
       Priority: Blocker  (was: Critical)
    Description: 
1. Cluster info: 6 nodes, 30 Solr servers

1000 collections, 10 shards per collection, 3 replicas per shard

The exception occurred while restarting the Solr cluster.
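For scale, the figures above can be multiplied out. This is a back-of-the-envelope sketch, assuming (as the stack trace in section 4 suggests) that every core publishes at least one state message through ZkController.preRegister during startup. The class and method names are illustrative only.

```java
/**
 * Back-of-the-envelope estimate of state-update messages produced by a full
 * restart, compared with the Overseer queue limit quoted in this report.
 * Assumption: each core enqueues at least one message from preRegister.
 */
public class RestartLoadEstimate {
    // Cluster figures taken from the report.
    static final int COLLECTIONS = 1000;
    static final int SHARDS_PER_COLLECTION = 10;
    static final int REPLICAS_PER_SHARD = 3;
    static final int SOLR_SERVERS = 30;
    // Threshold named in the report (STATE_UPDATE_MAX_QUEUE in Overseer).
    static final int STATE_UPDATE_MAX_QUEUE = 20000;

    /** Every replica is one core, and each core registers on startup. */
    static int totalCores() {
        return COLLECTIONS * SHARDS_PER_COLLECTION * REPLICAS_PER_SHARD;
    }

    public static void main(String[] args) {
        int cores = totalCores();
        System.out.println("Total cores: " + cores);                                     // 30000
        System.out.println("Cores per Solr server: " + cores / SOLR_SERVERS);            // 1000
        System.out.println("Exceeds queue limit: " + (cores > STATE_UPDATE_MAX_QUEUE));  // true
    }
}
```

With 30,000 cores against a 20,000-entry limit, a full restart can fill the queue whenever the Overseer drains it more slowly than the nodes publish. The actual margin depends on the drain rate, which this arithmetic does not model.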

 

2. The exception occurred while restarting the Solr cluster. The problem is that no exception handler is defined for "java.lang.IllegalStateException: queue is full", which is thrown when the state-update queue reaches the threshold STATE_UPDATE_MAX_QUEUE (20000) defined in Overseer. As a result, the core fails during preRegister and never comes up again.
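The failure mode can be modelled with a simplified bounded queue. This is a hypothetical, in-memory sketch: Solr's ZkDistributedQueue stores items as ZooKeeper nodes and its internals differ. Only the observable behaviour is taken from the trace, namely that offer() throws IllegalStateException("queue is full") at the limit and does not wait or retry.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical model of a fail-fast bounded queue. NOT Solr's
 * ZkDistributedQueue; it only mirrors the behaviour seen in the log.
 */
public class BoundedStateQueue {
    private final int maxQueueSize;
    private final Deque<byte[]> items = new ArrayDeque<>();

    public BoundedStateQueue(int maxQueueSize) {
        this.maxQueueSize = maxQueueSize;
    }

    /** Enqueue a message, throwing immediately (no wait, no retry) when full. */
    public synchronized void offer(byte[] data) {
        if (maxQueueSize > 0 && items.size() >= maxQueueSize) {
            throw new IllegalStateException("queue is full");
        }
        items.addLast(data);
    }

    /** Consumer side (the Overseer, in the real system) removes one item. */
    public synchronized byte[] poll() {
        return items.pollFirst();
    }

    public synchronized int size() {
        return items.size();
    }

    public static void main(String[] args) {
        BoundedStateQueue q = new BoundedStateQueue(2);
        q.offer(new byte[0]);
        q.offer(new byte[0]);
        try {
            q.offer(new byte[0]); // third publish hits the limit
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage()); // caught: queue is full
        }
    }
}
```

Because the exception propagates straight out of offer(), any caller that does not catch it (here, the core-creation path) aborts, which matches the "never comes up again" symptom.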

 

3. Suggestions:

a. Is the value of STATE_UPDATE_MAX_QUEUE (20000) reasonable?

b. The IllegalStateException should be handled, and retry logic should be added.
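Suggestion (b) could look roughly like the following retry loop with exponential backoff. QueueOffer, offerWithRetry, and the delay parameters are hypothetical names chosen for illustration, not Solr APIs. Matching on the "queue is full" message text is an assumption taken from the log and would be brittle in real code.

```java
import java.util.concurrent.TimeUnit;

/**
 * Sketch of suggestion (b): retry a publish that fails because the queue is
 * full, backing off exponentially so the consumer can drain it.
 * QueueOffer and offerWithRetry are hypothetical, not Solr APIs.
 */
public final class RetryingPublisher {

    /** Anything that enqueues a message and may throw when the queue is full. */
    @FunctionalInterface
    public interface QueueOffer {
        void offer(byte[] data);
    }

    private RetryingPublisher() {}

    /**
     * Tries queue.offer(data) up to maxAttempts times. After each "queue is full"
     * failure it sleeps, starting at initialDelayMs and doubling up to maxDelayMs.
     * Any other IllegalStateException, or the final failure, is rethrown.
     */
    public static void offerWithRetry(QueueOffer queue, byte[] data, int maxAttempts,
                                      long initialDelayMs, long maxDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                queue.offer(data);
                return; // enqueued successfully
            } catch (IllegalStateException e) {
                // Assumption: the message text from the log identifies a full queue.
                if (!"queue is full".equals(e.getMessage()) || attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    TimeUnit.MILLISECONDS.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    e.addSuppressed(ie);
                    throw e;
                }
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
    }

    public static void main(String[] args) {
        // Fake queue that reports "full" twice, then accepts the message.
        int[] calls = {0};
        QueueOffer flaky = data -> {
            if (++calls[0] <= 2) {
                throw new IllegalStateException("queue is full");
            }
        };
        offerWithRetry(flaky, new byte[0], 5, 10, 1000);
        System.out.println("succeeded after " + calls[0] + " attempts"); // succeeded after 3 attempts
    }
}
```

Backing off instead of failing immediately gives the Overseer time to drain the queue; capping both the delay and the attempt count keeps a node from hanging indefinitely during startup.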

 

4. The detailed error is given below.

2018-12-12 11:20:24,737 | ERROR | coreContainerWorkExecutor-2-thread-1-processing-n:8.5.165.7:21101_solr | Error waiting for SolrCore to be created | org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:578)
 java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [collection9_shard1_replica3]
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 at java.util.concurrent.FutureTask.get(FutureTask.java:192)
 at org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:574)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
 Caused by: org.apache.solr.common.SolrException: Unable to create core [collection9_shard1_replica3]
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1087)
 at org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:546)
 ... 5 more
 Caused by: java.lang.IllegalStateException: queue is full
 at org.apache.solr.cloud.ZkDistributedQueue.offer(ZkDistributedQueue.java:311)
 at org.apache.solr.cloud.ZkController.publish(ZkController.java:1346)
 at org.apache.solr.cloud.ZkController.publish(ZkController.java:1245)
 at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1634)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1061)
 ... 6 more
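In the trace, the actual cause sits three levels deep: FutureTask.get wraps the SolrException in an ExecutionException, and the SolrException wraps the IllegalStateException. A caller that wanted to react to the full queue (for example, by retrying core creation) would need to walk the cause chain. The helper below is a hypothetical sketch; RuntimeException stands in for org.apache.solr.common.SolrException so it compiles without Solr on the classpath.

```java
import java.util.concurrent.ExecutionException;

/**
 * Hypothetical helper that unwraps a cause chain shaped like the one in the
 * log (ExecutionException -> SolrException -> IllegalStateException).
 */
public final class RootCause {
    private RootCause() {}

    /** Follows getCause() to the deepest exception (depth-limited to avoid cycles). */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        for (int depth = 0; current.getCause() != null && depth < 100; depth++) {
            current = current.getCause();
        }
        return current;
    }

    /** True if the root cause matches the full-queue failure seen in the log. */
    public static boolean isQueueFull(Throwable t) {
        Throwable root = rootCause(t);
        return root instanceof IllegalStateException
                && "queue is full".equals(root.getMessage());
    }

    public static void main(String[] args) {
        // RuntimeException stands in for SolrException here.
        Throwable chain = new ExecutionException(
                new RuntimeException("Unable to create core [collection9_shard1_replica3]",
                        new IllegalStateException("queue is full")));
        System.out.println(isQueueFull(chain)); // true
    }
}
```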



> Solr replica remaining down status when hitting the maxQueueSize as 20000
> -------------------------------------------------------------------------
>
>                 Key: SOLR-13061
>                 URL: https://issues.apache.org/jira/browse/SOLR-13061
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 7.2.1
>         Environment: Cluster info: 6 nodes, 30 Solr servers
> 1000 collections, 10 shards per collection, 3 replicas per shard
> The exception occurred while restarting the Solr cluster.
>            Reporter: Zhaohui Ma
>            Priority: Blocker
>              Labels: performance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
