Posted to issues@lucene.apache.org by "Tomas Eduardo Fernandez Lobbe (Jira)" <ji...@apache.org> on 2019/10/22 22:01:00 UTC
[jira] [Created] (SOLR-13859) ADDREPLICA stuck in OverseerCollectionMessageHandler.waitToSeeReplicasInState
Tomas Eduardo Fernandez Lobbe created SOLR-13859:
----------------------------------------------------
Summary: ADDREPLICA stuck in OverseerCollectionMessageHandler.waitToSeeReplicasInState
Key: SOLR-13859
URL: https://issues.apache.org/jira/browse/SOLR-13859
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: SolrCloud
Reporter: Tomas Eduardo Fernandez Lobbe
I have noticed this every now and then in tests: the ADDREPLICA command times out, and the exception shows that the command is stuck in {{OverseerCollectionMessageHandler.waitToSeeReplicasInState(OverseerCollectionMessageHandler.java:699)}}. Here is the relevant section of the log:
{noformat}
[junit4] 2> 160264 INFO (qtp1234431125-235) [n:127.0.0.1:56754_solr ] o.a.s.h.a.CollectionsHandler Invoked Collection Action :addreplica with params action=ADDREPLICA&collection=tlog_replica_test_remove_leader&shard=shard1&type=TLOG&wt=javabin&version=2 and sendToOCPQueue=true
[junit4] 2> 160269 INFO (OverseerThreadFactory-14-thread-5-processing-n:127.0.0.1:56754_solr) [n:127.0.0.1:56754_solr c:tlog_replica_test_remove_leader s:shard1 ] o.a.s.c.a.c.AddReplicaCmd Node Identified 127.0.0.1:56754_solr for creating new replica of shard shard1 for collection tlog_replica_test_remove_leader
[junit4] 2> 160271 INFO (OverseerThreadFactory-14-thread-5-processing-n:127.0.0.1:56754_solr) [n:127.0.0.1:56754_solr c:tlog_replica_test_remove_leader s:shard1 ] o.a.s.c.a.c.AddReplicaCmd Returning CreateReplica command.
[junit4] 2> 160274 INFO (OverseerStateUpdate-72113680894263303-127.0.0.1:56754_solr-n_0000000000) [n:127.0.0.1:56754_solr ] o.a.s.c.o.SliceMutator createReplica() {
[junit4] 2> "operation":"addreplica",
[junit4] 2> "collection":"tlog_replica_test_remove_leader",
[junit4] 2> "shard":"shard1",
[junit4] 2> "core":"tlog_replica_test_remove_leader_shard1_replica_t5",
[junit4] 2> "state":"down",
[junit4] 2> "base_url":"http://127.0.0.1:56754/solr",
[junit4] 2> "node_name":"127.0.0.1:56754_solr",
[junit4] 2> "type":"TLOG"}
[junit4] 2> 160385 INFO (zkCallback-163-thread-3) [ ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/tlog_replica_test_remove_leader/state.json] for collection [tlog_replica_test_remove_leader] has occurred - updating... (live nodes size: [2])
[junit4] 2> 160385 INFO (zkCallback-163-thread-2) [ ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/tlog_replica_test_remove_leader/state.json] for collection [tlog_replica_test_remove_leader] has occurred - updating... (live nodes size: [2])
[junit4] 2> 210134 INFO (qtp1234431125-603) [n:127.0.0.1:56754_solr ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=1
[junit4] 2> 210269 INFO (qtp50249358-694) [n:127.0.0.1:56755_solr ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.core.tlog_replica_test_remove_leader.shard1.replica_t2:INDEX.sizeInBytes&key=solr.core.tlog_replica_test_remove_leader.shard1.replica_t2:UPDATE./update.requests&key=solr.core.tlog_replica_test_remove_leader.shard1.replica_t2:QUERY./select.requests} status=0 QTime=131
[junit4] 2> 210272 INFO (qtp50249358-689) [n:127.0.0.1:56755_solr ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=1
[junit4] 2> 250262 INFO (TEST-TestTlogReplica.testRemoveLeader-seed#[9E36ECDD7B3349CD]) [ ] o.a.s.c.TestTlogReplica tearDown deleting collection
[junit4] 2> 250265 INFO (qtp1234431125-603) [n:127.0.0.1:56754_solr ] o.a.s.h.a.CollectionsHandler Invoked Collection Action :delete with params name=tlog_replica_test_remove_leader&action=DELETE&wt=javabin&version=2 and sendToOCPQueue=true
[junit4] 2> 270281 INFO (qtp1234431125-600) [n:127.0.0.1:56754_solr ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=1
[junit4] 2> 270334 INFO (qtp50249358-689) [n:127.0.0.1:56755_solr ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.core.tlog_replica_test_remove_leader.shard1.replica_t2:INDEX.sizeInBytes&key=solr.core.tlog_replica_test_remove_leader.shard1.replica_t2:UPDATE./update.requests&key=solr.core.tlog_replica_test_remove_leader.shard1.replica_t2:QUERY./select.requests} status=0 QTime=49
[junit4] 2> 270337 INFO (qtp50249358-694) [n:127.0.0.1:56755_solr ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=1
[junit4] 2> 280342 ERROR (OverseerThreadFactory-14-thread-5-processing-n:127.0.0.1:56754_solr) [n:127.0.0.1:56754_solr c:tlog_replica_test_remove_leader s:shard1 ] o.a.s.c.a.c.OverseerCollectionMessageHandler Collection: tlog_replica_test_remove_leader operation: addreplica failed:org.apache.solr.common.SolrException: Timed out waiting to see all replicas: [tlog_replica_test_remove_leader_shard1_replica_t5] in cluster state. Last state: DocCollection(tlog_replica_test_remove_leader//collections/tlog_replica_test_remove_leader/state.json/6)={
[junit4] 2> "pullReplicas":"0",
[junit4] 2> "replicationFactor":"0",
[junit4] 2> "shards":{"shard1":{
[junit4] 2> "range":"80000000-7fffffff",
[junit4] 2> "state":"active",
[junit4] 2> "replicas":{"core_node4":{
[junit4] 2> "core":"tlog_replica_test_remove_leader_shard1_replica_t2",
[junit4] 2> "base_url":"http://127.0.0.1:56755/solr",
[junit4] 2> "node_name":"127.0.0.1:56755_solr",
[junit4] 2> "state":"active",
[junit4] 2> "type":"TLOG",
[junit4] 2> "force_set_state":"false"}}}},
[junit4] 2> "router":{"name":"compositeId"},
[junit4] 2> "maxShardsPerNode":"100",
[junit4] 2> "autoAddReplicas":"false",
[junit4] 2> "nrtReplicas":"0",
[junit4] 2> "tlogReplicas":"2"}
[junit4] 2> at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.waitToSeeReplicasInState(OverseerCollectionMessageHandler.java:699)
[junit4] 2> at org.apache.solr.cloud.api.collections.AddReplicaCmd.getReplicaParams(AddReplicaCmd.java:263)
[junit4] 2> at org.apache.solr.cloud.api.collections.AddReplicaCmd.addReplica(AddReplicaCmd.java:172)
[junit4] 2> at org.apache.solr.cloud.api.collections.AddReplicaCmd.call(AddReplicaCmd.java:93)
[junit4] 2> at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:263)
[junit4] 2> at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
[junit4] 2> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
[junit4] 2> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[junit4] 2> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[junit4] 2> at java.base/java.lang.Thread.run(Thread.java:835)
[junit4] 2>
[junit4] 2> 280364 INFO (qtp1234431125-235) [n:127.0.0.1:56754_solr c:tlog_replica_test_remove_leader ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections params={action=ADDREPLICA&collection=tlog_replica_test_remove_leader&shard=shard1&type=TLOG&wt=javabin&version=2} status=500 QTime=120106
{noformat}
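To make the failure mode concrete, here is a minimal sketch of the kind of wait loop the error message suggests: poll a view of the cluster state until every expected core name shows up, or give up after a timeout. This is not Solr's actual implementation of {{waitToSeeReplicasInState}}; the class {{WaitForReplicasSketch}}, the method {{waitToSeeCores}}, and its parameters are hypothetical names used only for illustration. The core names are taken from the log above, where the new replica {{replica_t5}} never appears in the last observed state and only {{replica_t2}} is listed.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class WaitForReplicasSketch {

    /**
     * Hypothetical helper (not a Solr API): polls visibleCores until it contains
     * every name in expectedCores, or until timeoutMs has elapsed.
     * Returns the core names that are still missing (empty if all were seen).
     */
    static Set<String> waitToSeeCores(Supplier<Set<String>> visibleCores,
                                      Set<String> expectedCores,
                                      long timeoutMs,
                                      long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            // Recompute the missing set from a fresh read of the state on every pass.
            Set<String> missing = new HashSet<>(expectedCores);
            missing.removeAll(visibleCores.get());
            if (missing.isEmpty() || System.nanoTime() - deadline >= 0) {
                return missing;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and report what is still missing.
                Thread.currentThread().interrupt();
                return missing;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for the cluster state in the log: only replica_t2 is ever visible.
        Set<String> visible = Set.of("tlog_replica_test_remove_leader_shard1_replica_t2");
        Set<String> expected = Set.of("tlog_replica_test_remove_leader_shard1_replica_t5");

        // Short timeout for the demo; the request in the log reports QTime=120106.
        Set<String> missing = waitToSeeCores(() -> visible, expected, 200, 50);
        if (!missing.isEmpty()) {
            System.out.println("Timed out waiting to see all replicas: " + missing);
        }
    }
}
```

In this sketch the loop can only succeed if the state it reads is eventually updated with the new core. If the state seen by the waiting thread never includes the replica, the call blocks for the full timeout, which matches the behavior in the log.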