Posted to solr-user@lucene.apache.org by wg85907 <ge...@sina.com> on 2017/09/15 10:55:55 UTC

Meet CorruptIndexException while shutting down one node in SolrCloud

Hi team,
        Currently I am using Solr 4.10 in tomcat. I have a one-shard Solr
Cloud with 3 replicas, and I set the heap size to 15GB for each node. Because
we have a large data volume and a high query load, we frequently hit full GC
pauses. We investigated and found that much of the heap was being used by
Solr's field cache. To work around this, we began rebooting the tomcat
instances one by one on a schedule. We don't kill any process; we run the
script "catalina.sh stop" to shut tomcat down gracefully. To keep messages
from backing up, we receive messages from users continuously and send an
update request to Solr as soon as a new message arrives. This means Solr may
get update requests during shutdown, and I think that is why we get the
CorruptIndexException. Since we started doing these reboots, we always get a
CorruptIndexException. The trace is as below:
2017-09-14 04:25:49,241 ERROR[commitScheduler-15-thread-1][R31609](CommitTracker) - auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:607)
        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.index.CorruptIndexException: liveDocs.count()=33574 info.docCount=34156 info.getDelCount()=584 (filename=_1uvck_k.del)
        at org.apache.lucene.codecs.lucene40.Lucene40LiveDocsFormat.readLiveDocs(Lucene40LiveDocsFormat.java:96)
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:116)
        at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:144)
        at org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:282)
        at org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3271)
        at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3262)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:421)
        at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:279)
        at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251)
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1476)
        ... 10 more


        Since we shut Solr down gracefully, I think Solr should be robust
enough to handle this case. Please give me some advice about why this happens
and what we can do to avoid it. PS: below is part of our solrConfig content:

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

Regards,
Geng, Wei




Re: Meet CorruptIndexException while shutting down one node in SolrCloud

Posted by wg85907 <ge...@sina.com>.
Hi Erick,
        Thanks for your advice that having openSearcher set to true is
unnecessary in my case. As for the CorruptIndexException issue, I still think
Solr should handle it well, because I always shut tomcat down gracefully.
         Recently I ran a couple of tests on this issue. When I keep posting
update requests to Solr and stop one of the three tomcat nodes in a
single-shard cluster, it is easy to reproduce the CorruptIndexException, no
matter whether the stopped node is the leader or a replica. So I think this
is a bug in Solr. Any idea how I can avoid this issue? For example, could I
remove a node from ZooKeeper before stopping it? Also, please let me know
whether rebooting the tomcat nodes is the only way to resolve the memory
issue; if I can control the field cache size, then the reboots are
unnecessary.
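
        One approach we are considering (not yet verified on our side) is
enabling docValues on the fields we sort and facet on, so those lookups use
the on-disk docValues structures instead of the on-heap FieldCache. A sketch
of the schema.xml change, where "category" is just a stand-in for one of our
sort/facet fields:

<!-- hypothetical field; docValues keeps sort/facet data off-heap -->
<field name="category" type="string" indexed="true" stored="false"
       docValues="true"/>

Note that we would need to reindex after adding docValues. Would that remove
the need for the scheduled reboots?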

Below is the trace from when I start tomcat and first hit the
CorruptIndexException:
2017-09-19 10:18:57,614 ERROR [RecoveryThread][RQ-Init] (SolrException.java:142) - SnapPull failed :org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1565)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1677)
        at org.apache.solr.handler.SnapPuller.openNewSearcherAndUpdateCommitPoint(SnapPuller.java:673)
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:493)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:337)
        at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:163)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:447)
        at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
Caused by: org.apache.lucene.index.CorruptIndexException: liveDocs.count()=10309577 info.docCount=15057819 info.getDelCount()=4748252 (filename=_4y65a_13g.del)
        at org.apache.lucene.codecs.lucene40.Lucene40LiveDocsFormat.readLiveDocs(Lucene40LiveDocsFormat.java:96)
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:116)
        at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:144)
        at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:238)
        at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:104)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:422)
        at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:279)
        at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251)
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1476)
        ... 7 more


Regards,
Geng, Wei





Re: Meet CorruptIndexException while shutting down one node in SolrCloud

Posted by Erick Erickson <er...@gmail.com>.
bq: This means Solr may get update requests during shutdown, and I
think that is why we get the CorruptIndexException.

This is unlikely; Solr should handle this quite well. More likely you
encountered some other issue. One possibility is that you hit a
disk-full situation and that was the root cause.
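
If you want to check whether the index is actually corrupt on disk, one
option is to run Lucene's CheckIndex tool against the affected core. A
sketch (the jar version and index path here are assumptions; point them
at your own install):

java -cp lucene-core-4.10.4.jar org.apache.lucene.index.CheckIndex \
     /var/solr/data/collection1/index

Run it read-only first; it reports per-segment status. The -fix option
drops unreadable segments, losing their documents, so take a backup
before using it.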

I'll add as an aside that having openSearcher set to true in your
autoCommit setting _and_ setting autoSoftCommit is unnecessary; choose
one or the other.

See: https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
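
For example, a sketch keeping your existing maxTime values
(openSearcher=false makes the hard commit purely about durability,
while the soft commit controls when changes become visible):

<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>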

Best,
Erick
