Posted to solr-user@lucene.apache.org by Abhishek Mishra <so...@gmail.com> on 2020/12/07 09:55:24 UTC

Inconsistent recovery status of replicas

Hello guys,
I am using SolrCloud 7.7 on Kubernetes. When adding a replica we sometimes
see inconsistent behavior: after a successful addition, the nodes go into
recovery status. Sometimes it takes 2-3 minutes to recover, while other times
it takes more than an hour. We are getting the error shown below.
We have 4 shards, and each shard has around 7GB of data. Looking at the system
metrics, we see high bandwidth usage between the leader and the new replica
node. Is there any way to rate-limit this bandwidth usage, like the
configuration we had for it in master-slave mode? maxMbpersec or something
like that?
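
To put rough numbers on the bandwidth question, here is a back-of-the-envelope sketch of how long a full copy of one ~7GB shard would take at a few transfer rates. The rates are hypothetical values chosen for illustration, not measurements from our cluster, and the function name is made up for this example. It also assumes a full index copy with no other overhead, which is a simplification.

```python
def transfer_seconds(index_gb: float, rate_mb_per_sec: float) -> float:
    """Seconds needed to copy index_gb gigabytes at rate_mb_per_sec MB/s."""
    if rate_mb_per_sec <= 0:
        raise ValueError("rate must be positive")
    return index_gb * 1024 / rate_mb_per_sec


SHARD_GB = 7  # approximate shard size mentioned above

# Hypothetical rate caps in MB/s, for illustration only.
for rate in (10, 50, 100):
    minutes = transfer_seconds(SHARD_GB, rate) / 60
    print(f"{rate:>4} MB/s -> {minutes:.1f} min")
```

Even at a cap of 10 MB/s this estimate comes out at roughly 12 minutes per shard, so the raw copy time alone would not seem to account for a recovery that takes more than an hour.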

Error

> 2020-12-01 13:40:34.983 ERROR (recoveryExecutor-4-thread-1-processing-n:solr-olxid-statefulset-pull-9.solr-olxid-statefulset-headless.relevance:8983_solr x:olxid-20200531_d6e431ec_shard2_replica_p3955 c:olxid-20200531_d6e431ec s:shard2 r:core_node3956) [c:olxid-20200531_d6e431ec s:shard2 r:core_node3956 x:olxid-20200531_d6e431ec_shard2_replica_p3955] o.a.s.c.RecoveryStrategy Error while trying to recover:org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://solr-olxid-statefulset-tlog-7.solr-olxid-statefulset-headless.relevance:8983/solr/olxid-20200531_d6e431ec_shard2_replica_t139
> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:654)
> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
> 	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
> 	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211)
> 	at org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:287)
> 	at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:215)
> 	at org.apache.solr.cloud.RecoveryStrategy.doReplicateOnlyRecovery(RecoveryStrategy.java:382)
> 	at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:328)
> 	at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:307)
> 	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
> 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> 	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> 	at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.net.SocketTimeoutException: Read timed out
> 	at java.base/java.net.SocketInputStream.socketRead0(Native Method)
> 	at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:115)
> 	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:168)
> 	at java.base/java.net.SocketInputStream.read(SocketInputStream.java:140)
> 	at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
> 	at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
> 	at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
> 	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
> 	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
> 	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
> 	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
> 	at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
> 	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
> 	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
> 	at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:120)
> 	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
> 	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
> 	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
> 	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
> 	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
> 	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
> 	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
> 	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:542)
> 	... 16 more
> 2020-12-01 13:40:34.983 ERROR (recoveryExecutor-4-thread-1-processing-n:solr-olxid-statefulset-pull-9.solr-olxid-statefulset-headless.relevance:8983_solr x:olxid-20200531_d6e431ec_shard2_replica_p3955 c:olxid-20200531_d6e431ec s:shard2 r:core_node3956) [c:olxid-20200531_d6e431ec s:shard2 r:core_node3956 x:olxid-20200531_d6e431ec_shard2_replica_p3955] o.a.s.c.RecoveryStrategy Recovery failed - trying again... (1)