Posted to issues@lucene.apache.org by "Endika Posadas (Jira)" <ji...@apache.org> on 2020/05/05 14:50:00 UTC
[jira] [Resolved] (SOLR-14458) Solr Replica locked in recovering
state after a Zookeeper disconnection
[ https://issues.apache.org/jira/browse/SOLR-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Endika Posadas resolved SOLR-14458.
-----------------------------------
Resolution: Duplicate
> Solr Replica locked in recovering state after a Zookeeper disconnection
> -----------------------------------------------------------------------
>
> Key: SOLR-14458
> URL: https://issues.apache.org/jira/browse/SOLR-14458
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: 8.4.1
> Environment: A Solr cluster with 2 replicas, each holding 2 shards, split across 2 Windows VMs.
> They use a 3-node ZooKeeper ensemble running across 3 VMs.
> Reporter: Endika Posadas
> Priority: Major
> Attachments: image-2020-05-05-09-47-27-854.png, replica7.log, solr-thread-dump.log, solr.log, solrrecovering.png
>
>
> In a Solr cluster, a Solr instance containing two shards lost its connection to ZooKeeper. Upon reconnecting, it checked its status with the leader and started a recovery. However, it is stuck in the recovering state without making further progress (it has been like that for days now).
>
> Upon checking a thread dump, `recoveryExecutor-7-thread-3-processing-n` is trying to acquire the lock to create a new Index Writer: `at org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)` (after `lock(iwLock.writeLock());`). However, the ReentrantLock it is waiting for is never released. Moreover, no thread can be found holding the lock, leaving restarting Solr as the only solution.
> There is no error in the logs that helps with the issue. I have attached solr.log and a grep of the lines for node 7, as well as a thread dump.
>
> There is also no other recovery currently running. In Solr metrics, 4 recoveries have started, 3 have completed and 1 is running (forever).
>
> My hypothesis is that org.apache.solr.update.DefaultSolrCoreState#closeIndexWriter(org.apache.solr.core.SolrCore, boolean) was called once but for some reason openIndexWriter was skipped.
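
The hypothesis above can be illustrated with a minimal Java sketch. It does not use any Solr code: the class LeakedWriteLockDemo and its methods are hypothetical, and it assumes that iwLock is a java.util.concurrent.locks.ReentrantReadWriteLock (the report only shows a call to iwLock.writeLock()). It shows that a write lock taken by a thread that later terminates stays held, so a later acquirer waits while no live thread appears as the owner in a thread dump.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LeakedWriteLockDemo {
    // Hypothetical stand-in for the iwLock field mentioned in the report.
    static final ReentrantReadWriteLock iwLock = new ReentrantReadWriteLock();

    // Mimics a "close" step that takes the write lock and relies on a
    // later "open" step to release it. Here the "open" step never runs.
    static void closeWithoutReopen() {
        iwLock.writeLock().lock();
        // The matching unlock() is intentionally skipped.
    }

    // Mimics a recovery thread that needs the write lock.
    // Returns true only if the lock could be acquired within the timeout.
    static boolean tryRecover(long timeoutMs) throws InterruptedException {
        boolean acquired = iwLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        if (acquired) {
            iwLock.writeLock().unlock();
        }
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread closer = new Thread(LeakedWriteLockDemo::closeWithoutReopen, "closer");
        closer.start();
        closer.join(); // The owning thread has terminated; the lock is still held.

        System.out.println("write locked: " + iwLock.isWriteLocked()); // true
        System.out.println("recovered:    " + tryRecover(200));        // false
    }
}
```

Because java.util.concurrent locks are not released when their owning thread exits, this pattern would be consistent with a waiter that never progresses and a thread dump that shows no holder. Whether this is what actually happened in the reported cluster is not established by the report.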
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org