Posted to issues@lucene.apache.org by "Endika Posadas (Jira)" <ji...@apache.org> on 2020/05/05 14:50:00 UTC

[jira] [Resolved] (SOLR-14458) Solr Replica locked in recovering state after a Zookeeper disconnection

     [ https://issues.apache.org/jira/browse/SOLR-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Endika Posadas resolved SOLR-14458.
-----------------------------------
    Resolution: Duplicate

> Solr Replica locked in recovering state after a Zookeeper disconnection
> -----------------------------------------------------------------------
>
>                 Key: SOLR-14458
>                 URL: https://issues.apache.org/jira/browse/SOLR-14458
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 8.4.1
>         Environment: A Solr cluster with 2 replicas, each holding 2 shards, split across 2 Windows VMs.
> It uses a 3-node ZooKeeper ensemble running on 3 VMs.
>            Reporter: Endika Posadas
>            Priority: Major
>         Attachments: image-2020-05-05-09-47-27-854.png, replica7.log, solr-thread-dump.log, solr.log, solrrecovering.png
>
>
> In a Solr cluster, a Solr instance containing two shards lost its connection to ZooKeeper. Upon reconnecting, it checked its status with the leader and started a recovery. However, it is stuck in the recovering state without making further progress (it has been like that for days now).
>  
> Upon checking a thread dump, `recoveryExecutor-7-thread-3-processing-n` is trying to acquire the lock to create a new IndexWriter: `at org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)` (after `lock(iwLock.writeLock());`). However, the ReentrantLock it is waiting for is never released. Moreover, no thread can be found holding the lock, leaving a Solr restart as the only solution.
> There is no error in the logs that helps explain the issue. I have attached solr.log and a grep of the node 7 lines, as well as a thread dump.
>  
> There is also no other recovery currently running. In Solr metrics, 4 recoveries have started, 3 have completed and 1 is running (forever).
>  
> My hypothesis is that org.apache.solr.update.DefaultSolrCoreState#closeIndexWriter(org.apache.solr.core.SolrCore, boolean) was called once but for some reason openIndexWriter was skipped.
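
The hypothesis above can be illustrated with a minimal, self-contained sketch. It is not Solr's code: `CoreState`, `closeWriter`, `openWriter`, and `tryAcquireForRecovery` are hypothetical stand-ins, and the sketch assumes (as the hypothesis implies) that the close call takes the write lock and leaves it held until the matching open call releases it. It also shows one way a lock can stay held with no visible owner: the acquiring thread has already exited. Whether that is what happened in this report is not known.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class StuckWriterLockDemo {

    /** Hypothetical stand-in for a core state that guards its index writer with a lock. */
    static final class CoreState {
        private final ReentrantReadWriteLock iwLock = new ReentrantReadWriteLock();

        /** Takes the write lock and keeps it; openWriter() is expected to release it. */
        void closeWriter() {
            iwLock.writeLock().lock();
        }

        /** Releases the write lock taken by closeWriter(); must run on the same thread. */
        void openWriter() {
            iwLock.writeLock().unlock();
        }

        /** Models a recovery thread: try to take the write lock, give up after timeoutMs. */
        boolean tryAcquireForRecovery(long timeoutMs) {
            try {
                boolean acquired = iwLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
                if (acquired) {
                    iwLock.writeLock().unlock();
                }
                return acquired;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for the lock", e);
            }
        }
    }

    /** Runs close (and optionally open) on a short-lived thread, then probes the lock. */
    static boolean recoveryCanAcquire(boolean callOpen) {
        CoreState state = new CoreState();
        Thread worker = new Thread(() -> {
            state.closeWriter();
            if (callOpen) {
                state.openWriter();
            }
        });
        worker.start();
        try {
            // After join() the worker no longer exists, so it would not appear in a thread dump.
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the worker", e);
        }
        return state.tryAcquireForRecovery(200);
    }

    public static void main(String[] args) {
        System.out.println("close + open: " + recoveryCanAcquire(true));
        System.out.println("close only:   " + recoveryCanAcquire(false));
    }
}
```

In this sketch the paired case lets the probing thread acquire the lock, while the close-only case times out. With an unbounded wait instead of a 200 ms timeout, the probing thread would block indefinitely, which matches the symptom described above.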



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
