Posted to dev@lucene.apache.org by "Cao Manh Dat (JIRA)" <ji...@apache.org> on 2017/01/09 08:26:58 UTC

[jira] [Updated] (SOLR-9945) LIR should check the node is recovering before bringing it down

     [ https://issues.apache.org/jira/browse/SOLR-9945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cao Manh Dat updated SOLR-9945:
-------------------------------
    Description: 
When a node is recovering, the leader can hit an exception while trying to send an update to the buffering node. The leader then starts the LIR process: it first sets the node's state to DOWN, then sends a recovery OP to the node.
At the same time, PrepRecoveryOp makes the leader wait for a very long time for the node's state to become RECOVERING.
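
The fix suggested by the title is for LIR to check the replica's published state before bringing it down: if the replica is already RECOVERING (it has begun buffering updates and asked the leader to prep recovery), there is no need to publish it as DOWN again. Below is a minimal sketch of that guard, using hypothetical helper names (maybeStartLeaderInitiatedRecovery, getReplicaState, markReplicaDown, sendRecoveryCommand) rather than the actual ZkController/LIR methods:
{code}
// Illustrative sketch only: the helper methods below are placeholders,
// not the real Solr LIR / ZkController API.
void maybeStartLeaderInitiatedRecovery(String collection, String shard, Replica replica) {
  // Read the replica's currently published state from the cluster state in ZK
  Replica.State state = getReplicaState(collection, shard, replica);
  if (state == Replica.State.RECOVERING) {
    // The replica is already buffering updates and has asked the leader to
    // prep recovery; publishing DOWN now would only force PrepRecoveryOp
    // to keep waiting until the replica reaches RECOVERING again.
    return;
  }
  // Current LIR behaviour: mark the replica DOWN, then tell it to recover
  markReplicaDown(collection, shard, replica);
  sendRecoveryCommand(collection, shard, replica);
}
{code}
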
This scenario can easily be reproduced with the following test:
{code}
String collection = "collection2";
// Create a 1-shard, 2-replica collection and wait until both replicas are active
CollectionAdminRequest
    .createCollection(collection, "config", 1, 2)
    .setMaxShardsPerNode(1)
    .process(cluster.getSolrClient());
AbstractDistribZkTestBase.waitForRecoveriesToFinish(collection, cluster.getSolrClient().getZkStateReader(),
    false, true, 30);
CloudSolrClient cloudClient = cluster.getSolrClient();

// Pick the non-leader replica of shard1 and its Jetty instance
DocCollection docCollection = cloudClient.getZkStateReader().getClusterState().getCollection(collection);
Slice slice = docCollection.getSlice("shard1");
Replica replicaNode = slice.getReplicas(replica -> replica != slice.getLeader()).get(0);
JettySolrRunner replicaRunner = cluster.getReplicaJetty(replicaNode);

// Index a document while both replicas are up
new UpdateRequest()
    .add(sdoc("id", "1"))
    .process(cloudClient, collection);
// Stop the replica and index another document, so the replica misses an update
ChaosMonkey.stop(replicaRunner);
new UpdateRequest()
    .add(sdoc("id", "2"))
    .process(cloudClient, collection);
// Restart the replica; it begins recovery and buffers incoming updates
ChaosMonkey.start(replicaRunner);
// Index while the replica is recovering; a send failure here can make the leader start LIR
new UpdateRequest()
    .add(sdoc("id", "3"))
    .process(cloudClient, collection);
AbstractDistribZkTestBase.waitForRecoveriesToFinish(collection, cluster.getSolrClient().getZkStateReader(),
    false, true, 60);
CollectionAdminRequest
    .deleteCollection(collection)
    .process(cloudClient);
{code}

Here are some logs; note that for collection2 the replica has already published itself as recovering, yet PrepRecoveryOp still observes currentState=down and keeps waiting, and the StreamingSolrClients broken-pipe error at the end is the leader failing to send an update to that replica:
{code}
10942 INFO  (searcherExecutor-53-thread-1-processing-n:127.0.0.1:36014_solr x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4) [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4 x:collection1_shard2_replica2] o.a.s.c.SolrCore [collection1_shard2_replica2] Registered new searcher Searcher@510d1ea5[collection1_shard2_replica2] main{ExitableDirectoryReader(UninvertingDirectoryReader())}
10942 INFO  (searcherExecutor-56-thread-1-processing-n:127.0.0.1:36014_solr x:collection1_shard1_replica2 s:shard1 c:collection1 r:core_node3) [n:127.0.0.1:36014_solr c:collection1 s:shard1 r:core_node3 x:collection1_shard1_replica2] o.a.s.c.SolrCore [collection1_shard1_replica2] Registered new searcher Searcher@505d7839[collection1_shard1_replica2] main{ExitableDirectoryReader(UninvertingDirectoryReader())}
10958 INFO  (coreZkRegister-47-thread-1-processing-n:127.0.0.1:36014_solr x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4) [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4 x:collection1_shard2_replica2] o.a.s.c.ZkController Core needs to recover:collection1_shard2_replica2
10964 INFO  (updateExecutor-20-thread-1-processing-n:127.0.0.1:36014_solr x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4) [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4 x:collection1_shard2_replica2] o.a.s.u.DefaultSolrCoreState Running recovery
10970 INFO  (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4) [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4 x:collection1_shard2_replica2] o.a.s.c.RecoveryStrategy Starting recovery process. recoveringAfterStartup=true
10971 INFO  (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4) [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4 x:collection1_shard2_replica2] o.a.s.c.RecoveryStrategy ###### startupVersions=[[-1556034709770731520]]
10971 INFO  (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4) [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4 x:collection1_shard2_replica2] o.a.s.c.RecoveryStrategy Begin buffering updates. core=[collection1_shard2_replica2]
10971 INFO  (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4) [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4 x:collection1_shard2_replica2] o.a.s.u.UpdateLog Starting to buffer updates. FSUpdateLog{state=ACTIVE, tlog=null}
10971 INFO  (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4) [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4 x:collection1_shard2_replica2] o.a.s.c.RecoveryStrategy Publishing state of core [collection1_shard2_replica2] as recovering, leader is [http://127.0.0.1:42549/solr/collection1_shard2_replica1/] and I am [http://127.0.0.1:36014/solr/collection1_shard2_replica2/]
11129 INFO  (recoveryExecutor-21-thread-1-processing-n:127.0.0.1:36014_solr x:collection1_shard2_replica2 s:shard2 c:collection1 r:core_node4) [n:127.0.0.1:36014_solr c:collection1 s:shard2 r:core_node4 x:collection1_shard2_replica2] o.a.s.c.RecoveryStrategy Sending prep recovery command to [http://127.0.0.1:42549/solr]; [WaitForState: action=PREPRECOVERY&core=collection1_shard2_replica1&nodeName=127.0.0.1:36014_solr&coreNodeName=core_node4&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true]
11131 INFO  (zkCallback-12-thread-2-processing-n:127.0.0.1:42549_solr) [n:127.0.0.1:42549_solr    ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/collection1/state.json] for collection [collection1] has occurred - updating... (live nodes size: [2])
11131 INFO  (zkCallback-23-thread-1-processing-n:127.0.0.1:36014_solr) [n:127.0.0.1:36014_solr    ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/collection1/state.json] for collection [collection1] has occurred - updating... (live nodes size: [2])
11133 INFO  (qtp344759431-38) [n:127.0.0.1:42549_solr    ] o.a.s.h.a.PrepRecoveryOp Going to wait for coreNodeName: core_node4, state: recovering, checkLive: true, onlyIfLeader: true, onlyIfLeaderActive: true
11140 INFO  (qtp344759431-38) [n:127.0.0.1:42549_solr    ] o.a.s.h.a.PrepRecoveryOp Will wait a max of 183 seconds to see collection1_shard2_replica1 (shard2 of collection1) have state: recovering
11140 INFO  (qtp344759431-38) [n:127.0.0.1:42549_solr    ] o.a.s.h.a.PrepRecoveryOp In WaitForState(recovering): collection=collection1, shard=shard2, thisCore=collection1_shard2_replica1, leaderDoesNotNeedRecovery=false, isLeader? true, live=true, checkLive=true, currentState=recovering, localState=active, nodeName=127.0.0.1:36014_solr, coreNodeName=core_node4, onlyIfActiveCheckResult=false, nodeProps: core_node4:{"core":"collection1_shard2_replica2","base_url":"http://127.0.0.1:36014/solr","node_name":"127.0.0.1:36014_solr","state":"recovering"}
11140 INFO  (qtp344759431-38) [n:127.0.0.1:42549_solr    ] o.a.s.h.a.PrepRecoveryOp Waited coreNodeName: core_node4, state: recovering, checkLive: true, onlyIfLeader: true for: 0 seconds.
11140 INFO  (qtp344759431-38) [n:127.0.0.1:42549_solr    ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={nodeName=127.0.0.1:36014_solr&onlyIfLeaderActive=true&core=collection1_shard2_replica1&coreNodeName=core_node4&action=PREPRECOVERY&checkLive=true&state=recovering&onlyIfLeader=true&wt=javabin&version=2} status=0 QTime=7
11160 INFO  (coreLoadExecutor-52-thread-3-processing-n:127.0.0.1:36014_solr) [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1 x:collection2_shard1_replica2] o.a.s.s.SolrIndexSearcher Opening [Searcher@28717b92[collection2_shard1_replica2] main]
11161 INFO  (coreLoadExecutor-52-thread-3-processing-n:127.0.0.1:36014_solr) [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1 x:collection2_shard1_replica2] o.a.s.r.ManagedResourceStorage Configured ZooKeeperStorageIO with znodeBase: /configs/config
11162 INFO  (coreLoadExecutor-52-thread-3-processing-n:127.0.0.1:36014_solr) [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1 x:collection2_shard1_replica2] o.a.s.r.ManagedResourceStorage Loaded null at path _rest_managed.json using ZooKeeperStorageIO:path=/configs/config
11162 INFO  (coreLoadExecutor-52-thread-3-processing-n:127.0.0.1:36014_solr) [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1 x:collection2_shard1_replica2] o.a.s.h.ReplicationHandler Commits will be reserved for  10000
11162 INFO  (searcherExecutor-54-thread-1-processing-n:127.0.0.1:36014_solr x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1) [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1 x:collection2_shard1_replica2] o.a.s.c.SolrCore [collection2_shard1_replica2] Registered new searcher Searcher@28717b92[collection2_shard1_replica2] main{ExitableDirectoryReader(UninvertingDirectoryReader(Uninverting(_0(7.0.0):C1)))}
11168 INFO  (coreZkRegister-47-thread-1-processing-n:127.0.0.1:36014_solr x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1) [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1 x:collection2_shard1_replica2] o.a.s.c.ZkController Core needs to recover:collection2_shard1_replica2
11168 INFO  (updateExecutor-20-thread-2-processing-n:127.0.0.1:36014_solr x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1) [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1 x:collection2_shard1_replica2] o.a.s.u.DefaultSolrCoreState Running recovery
11168 INFO  (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1) [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1 x:collection2_shard1_replica2] o.a.s.c.RecoveryStrategy Starting recovery process. recoveringAfterStartup=true
11169 INFO  (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1) [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1 x:collection2_shard1_replica2] o.a.s.c.RecoveryStrategy ###### startupVersions=[[1556034713508904960]]
11169 INFO  (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1) [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1 x:collection2_shard1_replica2] o.a.s.c.RecoveryStrategy Begin buffering updates. core=[collection2_shard1_replica2]
11169 INFO  (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1) [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1 x:collection2_shard1_replica2] o.a.s.u.UpdateLog Starting to buffer updates. FSUpdateLog{state=ACTIVE, tlog=null}
11169 INFO  (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1) [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1 x:collection2_shard1_replica2] o.a.s.c.RecoveryStrategy Publishing state of core [collection2_shard1_replica2] as recovering, leader is [http://127.0.0.1:42549/solr/collection2_shard1_replica1/] and I am [http://127.0.0.1:36014/solr/collection2_shard1_replica2/]
11171 INFO  (recoveryExecutor-21-thread-2-processing-n:127.0.0.1:36014_solr x:collection2_shard1_replica2 s:shard1 c:collection2 r:core_node1) [n:127.0.0.1:36014_solr c:collection2 s:shard1 r:core_node1 x:collection2_shard1_replica2] o.a.s.c.RecoveryStrategy Sending prep recovery command to [http://127.0.0.1:42549/solr]; [WaitForState: action=PREPRECOVERY&core=collection2_shard1_replica1&nodeName=127.0.0.1:36014_solr&coreNodeName=core_node1&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true]
11174 INFO  (qtp344759431-37) [n:127.0.0.1:42549_solr    ] o.a.s.h.a.PrepRecoveryOp Going to wait for coreNodeName: core_node1, state: recovering, checkLive: true, onlyIfLeader: true, onlyIfLeaderActive: true
11175 INFO  (qtp344759431-37) [n:127.0.0.1:42549_solr    ] o.a.s.h.a.PrepRecoveryOp Will wait a max of 183 seconds to see collection2_shard1_replica1 (shard1 of collection2) have state: recovering
11175 INFO  (qtp344759431-37) [n:127.0.0.1:42549_solr    ] o.a.s.h.a.PrepRecoveryOp In WaitForState(recovering): collection=collection2, shard=shard1, thisCore=collection2_shard1_replica1, leaderDoesNotNeedRecovery=false, isLeader? true, live=true, checkLive=true, currentState=down, localState=active, nodeName=127.0.0.1:36014_solr, coreNodeName=core_node1, onlyIfActiveCheckResult=false, nodeProps: core_node1:{"core":"collection2_shard1_replica2","base_url":"http://127.0.0.1:36014/solr","node_name":"127.0.0.1:36014_solr","state":"down"}
11278 INFO  (zkCallback-23-thread-1-processing-n:127.0.0.1:36014_solr) [n:127.0.0.1:36014_solr    ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/collection2/state.json] for collection [collection2] has occurred - updating... (live nodes size: [2])
11278 INFO  (zkCallback-12-thread-2-processing-n:127.0.0.1:42549_solr) [n:127.0.0.1:42549_solr    ] o.a.s.c.c.ZkStateReader A cluster state change: [WatchedEvent state:SyncConnected type:NodeDataChanged path:/collections/collection2/state.json] for collection [collection2] has occurred - updating... (live nodes size: [2])
11323 ERROR (updateExecutor-7-thread-1-processing-http:////127.0.0.1:36014//solr//collection2_shard1_replica2 x:collection2_shard1_replica1 r:core_node2 n:127.0.0.1:42549_solr s:shard1 c:collection2) [n:127.0.0.1:42549_solr c:collection2 s:shard1 r:core_node2 x:collection2_shard1_replica1] o.a.s.u.StreamingSolrClients error
java.net.SocketException: Broken pipe (Write failed)
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
	at org.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:126)
	at org.apache.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:138)
	at org.apache.http.impl.io.SessionOutputBufferImpl.flush(SessionOutputBufferImpl.java:146)
	at org.apache.http.impl.io.ChunkedOutputStream.close(ChunkedOutputStream.java:205)
	at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:159)
	at org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:162)
	at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:237)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:122)
	at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:61)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
	at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:323)
	at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:186)
	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}

> LIR should check the node is recovering before bringing it down
> ---------------------------------------------------------------
>
>                 Key: SOLR-9945
>                 URL: https://issues.apache.org/jira/browse/SOLR-9945
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Cao Manh Dat
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org