Posted to solr-user@lucene.apache.org by Justin Sweeney <ju...@gmail.com> on 2020/04/23 23:28:14 UTC

Solr 8.2 Cloud Replication Locked

Hi all,

We are running Solr 8.2 Cloud in a cluster where we have a single TLOG
replica per shard and multiple PULL replicas for each shard. We have
noticed an issue recently where some of the PULL replicas stop replicating
from the masters. They will start a replication, which logs:

o.a.s.h.IndexFetcher Number of files in latest index in master:

Then nothing else is logged by IndexFetcher after that. I took thread dumps
on a few instances, and we see the following, where the IndexFetcher thread
seems to be stuck waiting to acquire the index write lock. I don’t see
anything else in the thread dump indicating a deadlock. Any ideas here?

"indexFetcher-19-thread-1" #468 prio=5 os_prio=0 cpu=285847.01ms elapsed=62993.13s tid=0x00007fa8fc004800 nid=0x254 waiting on condition [0x00007ef584ede000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
    - parking to wait for <0x00000003aa5e4ad8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.6/LockSupport.java:234)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(java.base@11.0.6/AbstractQueuedSynchronizer.java:980)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(java.base@11.0.6/AbstractQueuedSynchronizer.java:1288)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(java.base@11.0.6/ReentrantReadWriteLock.java:1131)
    at org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)
    at org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:240)
    at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:569)
    at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:351)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:424)
    at org.apache.solr.handler.ReplicationHandler.lambda$setupPolling$13(ReplicationHandler.java:1193)
    at org.apache.solr.handler.ReplicationHandler$$Lambda$668/0x0000000800d0f440.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.6/Executors.java:515)
    at java.util.concurrent.FutureTask.runAndReset(java.base@11.0.6/FutureTask.java:305)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.6/ScheduledThreadPoolExecutor.java:305)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.6/ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.6/ThreadPoolExecutor.java:628)
    at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
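
For anyone reading the dump above: the top frames show a timed
WriteLock.tryLock on a ReentrantReadWriteLock, parked in TIMED_WAITING. Here
is a minimal standalone sketch of how a thread ends up in that state. It is
not Solr code. The class, method, and thread names are hypothetical, and the
100 ms timeout is an assumption chosen for illustration. It only shows that a
timed write-lock attempt keeps failing for as long as some other thread holds
the read (or write) lock on the same ReentrantReadWriteLock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; nothing here comes from the Solr code base.
class WriteLockWaitSketch {

    // Attempts a timed write-lock acquisition and reports whether it succeeded.
    // If readerHolds is true, a second thread holds the read lock for the
    // whole duration of the attempt.
    public static boolean tryWriteLock(boolean readerHolds) {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        CountDownLatch readerReady = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread reader = new Thread(() -> {
            if (readerHolds) {
                rw.readLock().lock();
            }
            readerReady.countDown();
            try {
                release.await(); // keep the read lock until told to let go
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                if (readerHolds) {
                    rw.readLock().unlock();
                }
            }
        }, "hypothetical-reader");
        reader.start();

        try {
            readerReady.await();
            // While this call waits, a thread dump shows the calling thread as
            // TIMED_WAITING (parking) under WriteLock.tryLock.
            boolean acquired = rw.writeLock().tryLock(100, TimeUnit.MILLISECONDS);
            if (acquired) {
                rw.writeLock().unlock();
            }
            release.countDown();
            reader.join();
            return acquired;
        } catch (InterruptedException e) {
            release.countDown();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read lock held elsewhere: acquired=" + tryWriteLock(true));
        System.out.println("lock free:                acquired=" + tryWriteLock(false));
    }
}
```

If the caller retries such an attempt in a loop (an assumption on my part, not
something I have verified in DefaultSolrCoreState), the thread would look
stuck for as long as the other holder never releases. So the thing to look for
in the full dump is whichever thread holds the read or write side of
<0x00000003aa5e4ad8>.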