Posted to issues@lucene.apache.org by "Tomas Eduardo Fernandez Lobbe (Jira)" <ji...@apache.org> on 2019/10/22 22:01:00 UTC

[jira] [Created] (SOLR-13859) ADDREPLICA stuck in OverseerCollectionMessageHandler.waitToSeeReplicasInState

Tomas Eduardo Fernandez Lobbe created SOLR-13859:
----------------------------------------------------

             Summary: ADDREPLICA stuck in OverseerCollectionMessageHandler.waitToSeeReplicasInState
                 Key: SOLR-13859
                 URL: https://issues.apache.org/jira/browse/SOLR-13859
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
            Reporter: Tomas Eduardo Fernandez Lobbe


I've noticed this every now and then in tests: the ADDREPLICA command times out, and the exception seems to show the command stuck in {{OverseerCollectionMessageHandler.waitToSeeReplicasInState(OverseerCollectionMessageHandler.java:699)}}. Here is the relevant section of the log:
{noformat}
   [junit4]   2> 160264 INFO  (qtp1234431125-235) [n:127.0.0.1:56754_solr     ] o.a.s.h.a.CollectionsHandler Invoked Collection Action :addreplica with params action=ADDREPLICA&collection=tlog_replica_test_remove_leader&shard=shard1&type=TLOG&wt=javabin&version=2 and sendToOCPQueue=true
   [junit4]   2> 160269 INFO  (OverseerThreadFactory-14-thread-5-processing-n:127.0.0.1:56754_solr) [n:127.0.0.1:56754_solr c:tlog_replica_test_remove_leader s:shard1   ] o.a.s.c.a.c.AddReplicaCmd Node Identified 127.0.0.1:56754_solr for creating new replica of shard shard1 for collection tlog_replica_test_remove_leader
   [junit4]   2> 160271 INFO  (OverseerThreadFactory-14-thread-5-processing-n:127.0.0.1:56754_solr) [n:127.0.0.1:56754_solr c:tlog_replica_test_remove_leader s:shard1   ] o.a.s.c.a.c.AddReplicaCmd Returning CreateReplica command.
   [junit4]   2> 160274 INFO  (OverseerStateUpdate-72113680894263303-127.0.0.1:56754_solr-n_0000000000) [n:127.0.0.1:56754_solr     ] o.a.s.c.o.SliceMutator createReplica() {
   [junit4]   2>   "operation":"addreplica",
   [junit4]   2>   "collection":"tlog_replica_test_remove_leader",
   [junit4]   2>   "shard":"shard1",
   [junit4]   2>   "core":"tlog_replica_test_remove_leader_shard1_replica_t5",
   [junit4]   2>   "state":"down",
   [junit4]   2>   "base_url":"http://127.0.0.1:56754/solr",
   [junit4]   2>   "node_name":"127.0.0.1:56754_solr",
   [junit4]   2>   "type":"TLOG"} 
   [junit4]   2> 160385 INFO  (zkCallback-163-thread-3) [     ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/tlog_replica_test_remove_leader/state.json] for collection [tlog_replica_test_remove_leader] has occurred - updating... (live nodes size: [2])
   [junit4]   2> 160385 INFO  (zkCallback-163-thread-2) [     ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/tlog_replica_test_remove_leader/state.json] for collection [tlog_replica_test_remove_leader] has occurred - updating... (live nodes size: [2])
   [junit4]   2> 210134 INFO  (qtp1234431125-603) [n:127.0.0.1:56754_solr     ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=1
   [junit4]   2> 210269 INFO  (qtp50249358-694) [n:127.0.0.1:56755_solr     ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.core.tlog_replica_test_remove_leader.shard1.replica_t2:INDEX.sizeInBytes&key=solr.core.tlog_replica_test_remove_leader.shard1.replica_t2:UPDATE./update.requests&key=solr.core.tlog_replica_test_remove_leader.shard1.replica_t2:QUERY./select.requests} status=0 QTime=131
   [junit4]   2> 210272 INFO  (qtp50249358-689) [n:127.0.0.1:56755_solr     ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=1
   [junit4]   2> 250262 INFO  (TEST-TestTlogReplica.testRemoveLeader-seed#[9E36ECDD7B3349CD]) [     ] o.a.s.c.TestTlogReplica tearDown deleting collection
   [junit4]   2> 250265 INFO  (qtp1234431125-603) [n:127.0.0.1:56754_solr     ] o.a.s.h.a.CollectionsHandler Invoked Collection Action :delete with params name=tlog_replica_test_remove_leader&action=DELETE&wt=javabin&version=2 and sendToOCPQueue=true
   [junit4]   2> 270281 INFO  (qtp1234431125-600) [n:127.0.0.1:56754_solr     ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=1
   [junit4]   2> 270334 INFO  (qtp50249358-689) [n:127.0.0.1:56755_solr     ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.core.tlog_replica_test_remove_leader.shard1.replica_t2:INDEX.sizeInBytes&key=solr.core.tlog_replica_test_remove_leader.shard1.replica_t2:UPDATE./update.requests&key=solr.core.tlog_replica_test_remove_leader.shard1.replica_t2:QUERY./select.requests} status=0 QTime=49
   [junit4]   2> 270337 INFO  (qtp50249358-694) [n:127.0.0.1:56755_solr     ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=1
   [junit4]   2> 280342 ERROR (OverseerThreadFactory-14-thread-5-processing-n:127.0.0.1:56754_solr) [n:127.0.0.1:56754_solr c:tlog_replica_test_remove_leader s:shard1   ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: tlog_replica_test_remove_leader operation: addreplica failed:org.apache.solr.common.SolrException: Timed out waiting to see all replicas: [tlog_replica_test_remove_leader_shard1_replica_t5] in cluster state. Last state: DocCollection(tlog_replica_test_remove_leader//collections/tlog_replica_test_remove_leader/state.json/6)={
   [junit4]   2>   "pullReplicas":"0",
   [junit4]   2>   "replicationFactor":"0",
   [junit4]   2>   "shards":{"shard1":{
   [junit4]   2>       "range":"80000000-7fffffff",
   [junit4]   2>       "state":"active",
   [junit4]   2>       "replicas":{"core_node4":{
   [junit4]   2>           "core":"tlog_replica_test_remove_leader_shard1_replica_t2",
   [junit4]   2>           "base_url":"http://127.0.0.1:56755/solr",
   [junit4]   2>           "node_name":"127.0.0.1:56755_solr",
   [junit4]   2>           "state":"active",
   [junit4]   2>           "type":"TLOG",
   [junit4]   2>           "force_set_state":"false"}}}},
   [junit4]   2>   "router":{"name":"compositeId"},
   [junit4]   2>   "maxShardsPerNode":"100",
   [junit4]   2>   "autoAddReplicas":"false",
   [junit4]   2>   "nrtReplicas":"0",
   [junit4]   2>   "tlogReplicas":"2"}
   [junit4]   2>        at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.waitToSeeReplicasInState(OverseerCollectionMessageHandler.java:699)
   [junit4]   2>        at org.apache.solr.cloud.api.collections.AddReplicaCmd.getReplicaParams(AddReplicaCmd.java:263)
   [junit4]   2>        at org.apache.solr.cloud.api.collections.AddReplicaCmd.addReplica(AddReplicaCmd.java:172)
   [junit4]   2>        at org.apache.solr.cloud.api.collections.AddReplicaCmd.call(AddReplicaCmd.java:93)
   [junit4]   2>        at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:263)
   [junit4]   2>        at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
   [junit4]   2>        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
   [junit4]   2>        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   [junit4]   2>        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   [junit4]   2>        at java.base/java.lang.Thread.run(Thread.java:835)
   [junit4]   2> 
   [junit4]   2> 280364 INFO  (qtp1234431125-235) [n:127.0.0.1:56754_solr c:tlog_replica_test_remove_leader    ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={action=ADDREPLICA&collection=tlog_replica_test_remove_leader&shard=shard1&type=TLOG&wt=javabin&version=2} status=500 QTime=120106
{noformat}
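
For readers unfamiliar with this failure mode, the log above is consistent with a "poll until visible, else time out" wait: the error is raised roughly 120 seconds after the request arrives (the request reports QTime=120106), and the last cluster state it prints lists only replica_t2, not the new replica_t5. The following is a minimal sketch of that general pattern, not Solr's actual implementation. The class name {{ReplicaWaitSketch}}, the method {{waitForCores}}, and its parameters are hypothetical, and the cluster state is stood in for by a plain {{Supplier}} of core names.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical illustration only; this is NOT the Solr implementation.
class ReplicaWaitSketch {

    // Polls visibleCores until it contains every expected core name.
    // Throws TimeoutException if that has not happened within timeoutMs.
    static void waitForCores(Supplier<Set<String>> visibleCores,
                             Collection<String> expected,
                             long timeoutMs,
                             long pollMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            Set<String> seen = visibleCores.get();
            if (seen.containsAll(expected)) {
                return; // every expected core is visible
            }
            if (System.nanoTime() - deadline >= 0) {
                // Report what was last seen, so the failure can be diagnosed.
                throw new TimeoutException("Timed out waiting to see all replicas: "
                        + expected + ". Last state: " + seen);
            }
            Thread.sleep(pollMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed stand-in for the cluster state: only replica_t2 ever shows up.
        Set<String> state = Set.of("tlog_replica_test_remove_leader_shard1_replica_t2");
        List<String> expected = List.of("tlog_replica_test_remove_leader_shard1_replica_t5");
        try {
            waitForCores(() -> state, expected, 200, 20);
            System.out.println("replica visible");
        } catch (TimeoutException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the wait can only end by timeout when the expected core never appears in the observed state, which matches the symptom reported here. It does not explain why replica_t5 is missing from state.json after the {{SliceMutator createReplica()}} message.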


