Posted to issues@lucene.apache.org by "Endika Posadas (Jira)" <ji...@apache.org> on 2020/05/04 09:55:00 UTC

[jira] [Created] (SOLR-14458) Solr Replica locked in recovering state after a Zookeeper disconnection

Endika Posadas created SOLR-14458:
-------------------------------------

             Summary: Solr Replica locked in recovering state after a Zookeeper disconnection
                 Key: SOLR-14458
                 URL: https://issues.apache.org/jira/browse/SOLR-14458
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
    Affects Versions: 8.4.1
         Environment: A Solr cluster with 2 replicas, each holding 2 shards, split across 2 Windows VMs.
It uses a 3-replica ZooKeeper across 3 VMs.
            Reporter: Endika Posadas
         Attachments: replica7.log, solr-thread-dump.log, solr.log

In a Solr cluster, a Solr instance containing two shards lost its connection to ZooKeeper. Upon reconnecting, it checked its status with the leader and started a recovery. However, it is stuck in the recovering state without making further progress (it has been like that for days now).

Upon checking a thread dump, `recoveryExecutor-7-thread-3-processing-n` is trying to acquire the lock to create a new IndexWriter: `at org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)` (after `lock(iwLock.writeLock());`). However, the ReentrantLock it is waiting for is never released. Moreover, no thread can be found holding the lock, leaving a Solr restart as the only solution.

There are no errors in the logs that help explain the issue. I have attached solr.log and a grep of the node 7 lines, as well as a thread dump.

My hypothesis is that org.apache.solr.update.DefaultSolrCoreState#closeIndexWriter(org.apache.solr.core.SolrCore, boolean) was called once but for some reason openIndexWriter was skipped.
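
To illustrate how such a state could arise, here is a minimal, self-contained Java sketch. It is not Solr code: the class `StuckWriterLockSketch` and its methods are hypothetical, and it assumes (as the hypothesis implies) that the close call takes the write lock and leaves it held until a paired open call releases it. If the open call never happens and the thread that called close has since finished, the lock stays held, later callers wait indefinitely, and no live thread appears as the owner.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; names do not come from Solr.
public class StuckWriterLockSketch {
    // Stand-in for the iwLock mentioned in the report.
    private final ReentrantReadWriteLock iwLock = new ReentrantReadWriteLock();

    // Assumed pattern: closing takes the write lock and returns with it still held.
    void closeWriter() {
        iwLock.writeLock().lock();
    }

    // Assumed pattern: the paired open call is the only place the lock is released.
    void openWriter() {
        iwLock.writeLock().unlock();
    }

    // What a later recovery thread would experience when asking for the same lock.
    boolean recoveryCanLock(long timeoutMs) throws InterruptedException {
        if (iwLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            iwLock.writeLock().unlock();
            return true;
        }
        return false;
    }

    // Returns { ownerThreadAlive, lockStillHeld, recoveryCouldLock }.
    static boolean[] demo() {
        StuckWriterLockSketch state = new StuckWriterLockSketch();
        try {
            // This thread calls close but never open, then terminates.
            Thread closer = new Thread(state::closeWriter, "closer");
            closer.start();
            closer.join();
            return new boolean[] {
                closer.isAlive(),
                state.iwLock.isWriteLocked(),
                state.recoveryCanLock(200)
            };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("owner thread alive:  " + r[0]); // false
        System.out.println("write lock held:     " + r[1]); // true
        System.out.println("recovery could lock: " + r[2]); // false
    }
}
```

In this sketch the lock is still reported as held even though its owning thread has exited, which would match a thread dump where a waiter is visible but no holder can be found. Whether this is what actually happened in the reported cluster is not confirmed.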



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
