Posted to issues@lucene.apache.org by "Endika Posadas (Jira)" <ji...@apache.org> on 2020/05/04 10:01:00 UTC

[jira] [Updated] (SOLR-14458) Solr Replica locked in recovering state after a Zookeeper disconnection

     [ https://issues.apache.org/jira/browse/SOLR-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Endika Posadas updated SOLR-14458:
----------------------------------
    Description: 
In a Solr cluster, a Solr instance containing two shards lost its connection to ZooKeeper. Upon reconnecting, it checked its status with the leader and started a recovery. However, it is stuck in the recovering state without making further progress (it has been like that for days now).

 

Upon checking a thread dump, `recoveryExecutor-7-thread-3-processing-n` is trying to acquire the lock to create a new Index Writer: `at org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)` (after `lock(iwLock.writeLock());`). However, the ReentrantLock it's waiting for is never released. Moreover, no thread can be found holding the lock, leaving a Solr restart as the only solution.

There is no error in the logs that helps with the issue. I have attached solr.log and a grep of the node 7 lines, as well as a thread dump.

 

There is also no other recovery currently running. In Solr metrics, 4 recoveries have started, 3 have completed and 1 is running (forever).

 

My hypothesis is that org.apache.solr.update.DefaultSolrCoreState#closeIndexWriter(org.apache.solr.core.SolrCore, boolean) was called once but for some reason openIndexWriter was skipped.

  was:
In a Solr cluster, a Solr instance containing two shards lost its connection to ZooKeeper. Upon reconnecting, it checked its status with the leader and started a recovery. However, it is stuck in the recovering state without making further progress (it has been like that for days now).

 

Upon checking a thread dump, `recoveryExecutor-7-thread-3-processing-n` is trying to acquire the lock to create a new Index Writer: `at org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)` (after `lock(iwLock.writeLock());`). However, the ReentrantLock it's waiting for is never released. Moreover, no thread can be found holding the lock, leaving a Solr restart as the only solution.

There is no error in the logs that helps with the issue. I have attached solr.log and a grep of the node 7 lines, as well as a thread dump.

 

My hypothesis is that org.apache.solr.update.DefaultSolrCoreState#closeIndexWriter(org.apache.solr.core.SolrCore, boolean) was called once but for some reason openIndexWriter was skipped.


> Solr Replica locked in recovering state after a Zookeeper disconnection
> -----------------------------------------------------------------------
>
>                 Key: SOLR-14458
>                 URL: https://issues.apache.org/jira/browse/SOLR-14458
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 8.4.1
>         Environment: A Solr cluster with 2 replicas, each with 2 shards, split across 2 Windows VMs.
> They use a 3-replica ZooKeeper across 3 VMs.
>            Reporter: Endika Posadas
>            Priority: Major
>         Attachments: replica7.log, solr-thread-dump.log, solr.log
>
>
> In a Solr cluster, a Solr instance containing two shards lost its connection to ZooKeeper. Upon reconnecting, it checked its status with the leader and started a recovery. However, it is stuck in the recovering state without making further progress (it has been like that for days now).
>  
> Upon checking a thread dump, `recoveryExecutor-7-thread-3-processing-n` is trying to acquire the lock to create a new Index Writer: `at org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)` (after `lock(iwLock.writeLock());`). However, the ReentrantLock it's waiting for is never released. Moreover, no thread can be found holding the lock, leaving a Solr restart as the only solution.
> There is no error in the logs that helps with the issue. I have attached solr.log and a grep of the node 7 lines, as well as a thread dump.
>  
> There is also no other recovery currently running. In Solr metrics, 4 recoveries have started, 3 have completed and 1 is running (forever).
>  
> My hypothesis is that org.apache.solr.update.DefaultSolrCoreState#closeIndexWriter(org.apache.solr.core.SolrCore, boolean) was called once but for some reason openIndexWriter was skipped.
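
The hypothesis above can be illustrated outside Solr with a minimal, self-contained sketch. It is not Solr's code: the class `LeakedWriteLockDemo`, the field `iwLock` and the methods `closeWithoutOpen` and `tryRecover` are hypothetical stand-ins, and the sketch assumes that the "close" step takes a write lock which only a matching "open" step would release. It shows one way a write lock can stay held while no live thread owns it: the thread that took the lock has already exited. Whether this is what happened in SOLR-14458 is not established.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LeakedWriteLockDemo {
    // Hypothetical stand-in for the index-writer lock; not Solr's actual field.
    static final ReentrantReadWriteLock iwLock = new ReentrantReadWriteLock();

    /** Simulates a "close" that takes the write lock with no matching "open" to release it. */
    static void closeWithoutOpen() {
        Thread closer = new Thread(() -> iwLock.writeLock().lock(), "closer");
        closer.start();
        try {
            closer.join(); // the thread ends while still owning the write lock
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Simulates the recovery thread: tries to take the write lock, bounded by a timeout. */
    static boolean tryRecover(long timeoutMs) {
        try {
            boolean acquired = iwLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                iwLock.writeLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("before leak, acquired: " + tryRecover(100)); // true
        closeWithoutOpen();
        System.out.println("after leak, acquired: " + tryRecover(100));  // false
        System.out.println("write-locked: " + iwLock.isWriteLocked());   // true
    }
}
```

In this sketch the lock reports itself as write-locked, yet the owning thread no longer exists, so a thread dump taken at that point would show only the waiter. An unbounded `lock()` call in place of the timed `tryLock` would wait forever, which matches the symptom described in the report.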



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
