Posted to dev@lucene.apache.org by "Cao Manh Dat (JIRA)" <ji...@apache.org> on 2017/01/09 08:24:58 UTC

[jira] [Created] (SOLR-9945) LIR should check the node is recovering before bring it down

Cao Manh Dat created SOLR-9945:
----------------------------------

             Summary: LIR should check the node is recovering before bring it down
                 Key: SOLR-9945
                 URL: https://issues.apache.org/jira/browse/SOLR-9945
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Cao Manh Dat


When a node is recovering, the leader can hit an exception while trying to send an update to the buffering node. The leader then starts the LIR (leader-initiated recovery) process: it first sets the node's state to DOWN, then sends a recovery op to the node.
At the same time, PrepRecoveryOp makes the leader wait a very long time for the node's state to become RECOVERING.
This scenario can be reproduced easily with the following test:
{code}
// Create a collection with 1 shard and 2 replicas, at most one replica per node.
String collection = "collection2";
CollectionAdminRequest
    .createCollection(collection, "config", 1, 2)
    .setMaxShardsPerNode(1)
    .process(cluster.getSolrClient());
// Wait up to 30 seconds for the replicas to finish any initial recovery.
AbstractDistribZkTestBase.waitForRecoveriesToFinish(collection, cluster.getSolrClient().getZkStateReader(),
    false, true, 30);
CloudSolrClient cloudClient = cluster.getSolrClient();

// Pick the non-leader replica of shard1 and the Jetty instance that hosts it.
DocCollection docCollection = cloudClient.getZkStateReader().getClusterState().getCollection(collection);
Slice slice = docCollection.getSlice("shard1");
Replica replicaNode = slice.getReplicas(replica -> replica != slice.getLeader()).get(0);
JettySolrRunner replicaRunner = cluster.getReplicaJetty(replicaNode);

// Index one document while both replicas are up.
new UpdateRequest()
    .add(sdoc("id", "1"))
    .process(cloudClient, collection);
// Stop the replica, then index a document that it misses while it is down.
ChaosMonkey.stop(replicaRunner);
new UpdateRequest()
    .add(sdoc("id", "2"))
    .process(cloudClient, collection);
// Restart the replica and send another update right away, which can reach it while it is still recovering.
ChaosMonkey.start(replicaRunner);
new UpdateRequest()
    .add(sdoc("id", "3"))
    .process(cloudClient, collection);
// Wait up to 60 seconds for the restarted replica to finish recovering.
AbstractDistribZkTestBase.waitForRecoveriesToFinish(collection, cluster.getSolrClient().getZkStateReader(),
    false, true, 60);
// Clean up.
CollectionAdminRequest
    .deleteCollection(collection)
    .process(cloudClient);
{code}
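
The check that the summary asks for might look roughly like the sketch below. It is a toy model only, not Solr's implementation: the class LirGuardSketch, the ReplicaState enum, the in-memory states map (a stand-in for the replica states kept in ZooKeeper) and the method maybeStartLir are all hypothetical names introduced for illustration. The idea is simply that the leader skips the "set DOWN" step when the replica is already RECOVERING.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model (hypothetical names throughout): none of these types are Solr classes.
public class LirGuardSketch {

    public enum ReplicaState { ACTIVE, RECOVERING, DOWN }

    // Stand-in for the replica states that the real cluster keeps in ZooKeeper.
    public static final Map<String, ReplicaState> states = new ConcurrentHashMap<>();

    /**
     * Called by the (modelled) leader after an update to a replica failed.
     * Returns true if the replica was marked DOWN, false if LIR was skipped
     * because the replica is already recovering.
     */
    public static synchronized boolean maybeStartLir(String replica) {
        ReplicaState current = states.get(replica);
        if (current == ReplicaState.RECOVERING) {
            // Already recovering: leave the state alone so that anything
            // waiting to observe RECOVERING is not held up.
            return false;
        }
        // Otherwise follow the flow from the description: mark the replica DOWN
        // (a real leader would then send it a recovery request).
        states.put(replica, ReplicaState.DOWN);
        return true;
    }

    public static void main(String[] args) {
        states.put("replica1", ReplicaState.RECOVERING);
        states.put("replica2", ReplicaState.ACTIVE);

        // prints "false RECOVERING"
        System.out.println(maybeStartLir("replica1") + " " + states.get("replica1"));
        // prints "true DOWN"
        System.out.println(maybeStartLir("replica2") + " " + states.get("replica2"));
    }
}
```

In a real cluster the state check and the state change would have to be coordinated through ZooKeeper rather than a local lock; the synchronized method here only keeps the toy model consistent.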



