Posted to users@solr.apache.org by Al...@swisscom.com on 2022/10/11 20:09:18 UTC

Problem in recovering...

Hi everyone,

We have a 3-node SolrCloud cluster deployed with the Solr Operator, running Solr 8.11.1.
We created a collection with 2 shards and 3 replicas per shard.

We indexed around 45 million documents, but for the past few days one node has been “missing” 4.3 million documents and has remained in the “recovering” state.
Recovery never completes; it seems to be stuck.

In the logs I get the following error messages:

10/11/2022, 9:33:37 PM  ERROR false x:Documents_shard2_replica_n10  IndexFetcher    Error fetching file, doing one retry...
10/11/2022, 9:33:37 PM  WARN false  x:Documents_shard2_replica_n10  IndexFetcher    Error in fetching file: _itx6_Lucene84_0.doc (downloaded 583008256 of 798710576 bytes)
10/11/2022, 9:33:39 PM  ERROR false x:Documents_shard2_replica_n10  IndexFetcher    Error deleting file: _itx6_Lucene84_0.doc
10/11/2022, 9:33:39 PM  ERROR false x:Documents_shard2_replica_n10  ReplicationHandler  Index fetch failed
10/11/2022, 9:33:39 PM  ERROR false x:Documents_shard2_replica_n10  RecoveryStrategy    Error while trying to recover
10/11/2022, 9:33:39 PM  ERROR false x:Documents_shard2_replica_n10  RecoveryStrategy    Recovery failed - trying again... (0)
10/11/2022, 9:34:37 PM  WARN false  x:Documents_shard2_replica_n10  IndexFetcher    Error in fetching file: _fby0.fdt (downloaded 2404384768 of 2924978973 bytes)
10/11/2022, 9:34:37 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _mkqu.fdt (downloaded 2736783360 of 2937231251 bytes)
10/11/2022, 9:34:40 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _mkqu.fdt (downloaded 2736783360 of 2937231251 bytes)
10/11/2022, 9:34:41 PM  WARN false  x:Documents_shard2_replica_n10  IndexFetcher    Error in fetching file: _fby0.fdt (downloaded 2404384768 of 2924978973 bytes)
10/11/2022, 9:34:41 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _mkqu.fdt (downloaded 2736783360 of 2937231251 bytes)
10/11/2022, 9:34:42 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _mkqu.fdt (downloaded 2736783360 of 2937231251 bytes)
10/11/2022, 9:34:44 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _mkqu.fdt (downloaded 2736783360 of 2937231251 bytes)
10/11/2022, 9:34:44 PM  WARN false  x:Documents_shard2_replica_n10  IndexFetcher    Error in fetching file: _fby0.fdt (downloaded 2404384768 of 2924978973 bytes)
10/11/2022, 9:34:45 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _mkqu.fdt (downloaded 2736783360 of 2937231251 bytes)
10/11/2022, 9:34:46 PM  ERROR false x:Documents_shard1_replica_n4   IndexFetcher    Error fetching file, doing one retry...
10/11/2022, 9:34:46 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _mkqu.fdt (downloaded 2736783360 of 2937231251 bytes)
10/11/2022, 9:34:47 PM  ERROR false x:Documents_shard1_replica_n4   IndexFetcher    Error deleting file: _mkqu.fdt
10/11/2022, 9:34:48 PM  ERROR false x:Documents_shard1_replica_n4   ReplicationHandler  Index fetch failed
10/11/2022, 9:34:48 PM  ERROR false x:Documents_shard1_replica_n4   RecoveryStrategy    Error while trying to recover
10/11/2022, 9:34:48 PM  ERROR false x:Documents_shard1_replica_n4   RecoveryStrategy    Recovery failed - trying again... (0)
10/11/2022, 9:36:23 PM  WARN false  x:Documents_shard2_replica_n10  IndexFetcher    Error in fetching file: _mk2d.fdt (downloaded 1245708288 of 2947895758 bytes)
10/11/2022, 9:36:23 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _ismo_Lucene84_0.tim (downloaded 111149056 of 230410466 bytes)
10/11/2022, 9:36:25 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _ismo_Lucene84_0.tim (downloaded 111149056 of 230410466 bytes)
10/11/2022, 9:36:25 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _ismo_Lucene84_0.tim (downloaded 111149056 of 230410466 bytes)
10/11/2022, 9:36:26 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _ismo_Lucene84_0.tim (downloaded 111149056 of 230410466 bytes)
10/11/2022, 9:36:26 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _ismo_Lucene84_0.tim (downloaded 111149056 of 230410466 bytes)
10/11/2022, 9:36:27 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _ismo_Lucene84_0.tim (downloaded 111149056 of 230410466 bytes)
10/11/2022, 9:36:28 PM  ERROR false x:Documents_shard1_replica_n4   IndexFetcher    Error fetching file, doing one retry...
10/11/2022, 9:36:28 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _ismo_Lucene84_0.tim (downloaded 111149056 of 230410466 bytes)
10/11/2022, 9:36:28 PM  ERROR false x:Documents_shard1_replica_n4   IndexFetcher    Error deleting file: _ismo_Lucene84_0.tim
10/11/2022, 9:36:29 PM  ERROR false x:Documents_shard1_replica_n4   ReplicationHandler  Index fetch failed
10/11/2022, 9:36:29 PM  ERROR false x:Documents_shard1_replica_n4   RecoveryStrategy    Error while trying to recover
10/11/2022, 9:36:29 PM  ERROR false x:Documents_shard1_replica_n4   RecoveryStrategy    Recovery failed - trying again... (1)
10/11/2022, 9:37:20 PM  WARN false  x:Documents_shard2_replica_n10  IndexFetcher    Error in fetching file: _q41p.fdt (downloaded 983564288 of 2951956658 bytes)
10/11/2022, 9:37:20 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _f1ie.fdt (downloaded 2085617664 of 2922856230 bytes)
10/11/2022, 9:37:30 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _f1ie.fdt (downloaded 2085617664 of 2922856230 bytes)
10/11/2022, 9:37:41 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _f1ie.fdt (downloaded 2085617664 of 2922856230 bytes)
10/11/2022, 9:37:44 PM  WARN false  x:Documents_shard2_replica_n10  IndexFetcher    Error in fetching file: _q41p.fdt (downloaded 983564288 of 2951956658 bytes)
10/11/2022, 9:37:52 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _f1ie.fdt (downloaded 2085617664 of 2922856230 bytes)
10/11/2022, 9:38:04 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _f1ie.fdt (downloaded 2085617664 of 2922856230 bytes)
10/11/2022, 9:38:11 PM  WARN false  x:Documents_shard2_replica_n10  IndexFetcher    Error in fetching file: _q41p.fdt (downloaded 983564288 of 2951956658 bytes)
10/11/2022, 9:38:15 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _f1ie.fdt (downloaded 2085617664 of 2922856230 bytes)
10/11/2022, 9:38:26 PM  ERROR false x:Documents_shard1_replica_n4   IndexFetcher    Error fetching file, doing one retry...
10/11/2022, 9:38:26 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _f1ie.fdt (downloaded 2085617664 of 2922856230 bytes)
10/11/2022, 9:38:37 PM  ERROR false x:Documents_shard1_replica_n4   IndexFetcher    Error deleting file: _f1ie.fdt
10/11/2022, 9:38:37 PM  ERROR false x:Documents_shard1_replica_n4   ReplicationHandler  Index fetch failed
10/11/2022, 9:38:37 PM  ERROR false x:Documents_shard1_replica_n4   RecoveryStrategy    Error while trying to recover
10/11/2022, 9:38:37 PM  ERROR false x:Documents_shard1_replica_n4   RecoveryStrategy    Recovery failed - trying again... (2)
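Across the retries above, each failing file appears to stop at the same byte count every time (for example, _mkqu.fdt always at 2736783360 of 2937231251 bytes), while the file that fails changes from one recovery attempt to the next. A minimal stdlib-only sketch (Java 16 or later), assuming the exact log layout quoted above, can group the "Error in fetching file" warnings by core and file to make that pattern easier to see. The class name FetchFailureSummary and its regex are hypothetical helpers for reading these logs, not part of Solr.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: groups IndexFetcher "Error in fetching file" warnings
// by core and file. The regex assumes the log layout quoted in this post.
public class FetchFailureSummary {

    // Captures the core name, the Lucene file name and the two byte counts.
    private static final Pattern FETCH_ERROR = Pattern.compile(
        "x:(\\S+)\\s+IndexFetcher\\s+Error in fetching file: (\\S+) "
            + "\\(downloaded (\\d+) of (\\d+) bytes\\)");

    // One entry per (core, file): latest byte counts and number of warnings seen.
    public record Failure(String core, String file, long downloaded, long total, int count) {
        public double percent() {
            return 100.0 * downloaded / total;
        }
    }

    // Keys are "<core> <file>"; iteration order follows the first occurrence.
    public static Map<String, Failure> summarize(List<String> lines) {
        Map<String, Failure> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = FETCH_ERROR.matcher(line);
            if (!m.find()) {
                continue; // not an "Error in fetching file" warning
            }
            String key = m.group(1) + " " + m.group(2);
            Failure prev = result.get(key);
            int count = (prev == null) ? 1 : prev.count() + 1;
            result.put(key, new Failure(m.group(1), m.group(2),
                Long.parseLong(m.group(3)), Long.parseLong(m.group(4)), count));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = (args.length > 0)
            ? Files.readAllLines(Path.of(args[0]))
            : List.of( // two lines copied from the log above
                "10/11/2022, 9:34:37 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _mkqu.fdt (downloaded 2736783360 of 2937231251 bytes)",
                "10/11/2022, 9:34:40 PM  WARN false  x:Documents_shard1_replica_n4   IndexFetcher    Error in fetching file: _mkqu.fdt (downloaded 2736783360 of 2937231251 bytes)");
        for (Failure f : summarize(lines).values()) {
            System.out.printf("%s %s: %d warning(s), at %d of %d bytes (%.1f%%)%n",
                f.core(), f.file(), f.count(), f.downloaded(), f.total(), f.percent());
        }
    }
}
```

Pass a log file path as the first argument to summarize a real log, or run it without arguments to summarize the two built-in sample lines.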

At one point I also got this stack trace:

10/11/2022, 9:43:01 PM  ERROR true  x:Documents_shard1_replica_n4   IndexFetcher    Error deleting file: _mkqu.fdt
java.nio.file.NoSuchFileException: /var/solr/data/Documents_shard1_replica_n4/data/index.20221011193935739/_mkqu.fdt
    at java.base/sun.nio.fs.UnixException.translateToIOException(Unknown Source)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source)
    at java.base/sun.nio.fs.UnixFileSystemProvider.implDelete(Unknown Source)
    at java.base/sun.nio.fs.AbstractFileSystemProvider.delete(Unknown Source)
    at java.base/java.nio.file.Files.delete(Unknown Source)
    at org.apache.lucene.store.FSDirectory.privateDeleteFile(FSDirectory.java:370)
    at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:339)
    at org.apache.lucene.store.NRTCachingDirectory.deleteFile(NRTCachingDirectory.java:118)
    at org.apache.solr.handler.IndexFetcher$DirectoryFile.delete(IndexFetcher.java:1948)
    at org.apache.solr.handler.IndexFetcher$FileFetcher.cleanup(IndexFetcher.java:1857)
    at org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1743)
    at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1718)
    at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:1109)
    at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:619)
    at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:384)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:458)
    at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:252)
    at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:683)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:339)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:318)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
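The innermost frames show where this exception originates: Lucene's FSDirectory.privateDeleteFile calls java.nio.file.Files.delete, which throws NoSuchFileException when the path does not exist. So, while cleaning up after the failed download (the IndexFetcher$FileFetcher.cleanup frame), IndexFetcher apparently tried to delete a partial _mkqu.fdt that was no longer present in index.20221011193935739. The sketch below reproduces only that JDK behavior; the class name DeleteMissingFileDemo and the temporary file are hypothetical, and this is not Solr or Lucene code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Reproduces only the JDK behavior behind the trace above: Files.delete throws
// NoSuchFileException when the path no longer exists. This is not Solr code.
public class DeleteMissingFileDemo {

    // Deletes the file twice and reports what the second call threw.
    static String deleteTwice(Path file) throws IOException {
        Files.delete(file); // first delete succeeds
        try {
            Files.delete(file); // second delete: the file is already gone
            return "no exception";
        } catch (NoSuchFileException e) {
            return "NoSuchFileException";
        }
    }

    // Creates a throwaway directory and file, runs the demo, then cleans up.
    public static String runDemo() {
        try {
            Path dir = Files.createTempDirectory("demo-index");
            Path partial = Files.createFile(dir.resolve("_mkqu.fdt")); // stand-in name only
            String result = deleteTwice(partial);
            // deleteIfExists signals a missing file by returning false instead of throwing.
            boolean deletedAgain = Files.deleteIfExists(partial);
            Files.delete(dir);
            return result + ", deleteIfExists=" + deletedAgain;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```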


I ran the following command on each index (without the -exorcise option):

java -cp lucene-core-8.11.1.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex <pathToIndex>

On all 6 indexes I got the same message: “No problems were detected with this index”

What should I do to recover from this situation?

Thank you in advance for all the help you can give me!

Kind regards,
Alessandro