Posted to solr-user@lucene.apache.org by Bruno Osiek <ba...@gmail.com> on 2014/11/08 18:47:32 UTC

Help with SolrCloud exceptions while recovering

Hi,

I am a newbie SolrCloud enthusiast. My goal is to implement an
infrastructure to enable text analysis (clustering, classification,
information extraction, sentiment analysis, etc).

My development environment is a single machine: quad-core processor, 16GB
RAM and a 1TB HD.

I have set up Apache Flume with Twitter as the source and SolrCloud (running
within JBoss AS 7) as the sink, using a ZooKeeper ensemble (5 servers) to
upload the configuration and manage the cluster.

The pseudo-distributed cluster consists of one collection with three shards,
each with three replicas.

Everything runs smoothly for a while. After about 50,000 tweets have been
committed (CloudSolrServer actually commits every batch of 500 documents),
SolrCloud randomly starts logging exceptions: Lucene file not found,
IndexWriter cannot be opened, unsuccessful replication, and the like.
Recovery starts, but it never succeeds and the replica eventually goes down.

I have tried different Solr versions (4.10.2, 4.9.1 and lastly 4.8.1) with
the same results.

I have looked everywhere for help before writing this email. My guess right
now is that the problem lies in the connection between SolrCloud and
Zookeeper, although I haven't seen any exception pointing that way.

Any reference or help will be welcome.

Cheers,
B.

Re: Help with SolrCloud exceptions while recovering

Posted by Erick Erickson <er...@gmail.com>.
Glad to hear that! Thanks for closing this out.

Best,
Erick

Re: Help with SolrCloud exceptions while recovering

Posted by Bruno Osiek <ba...@gmail.com>.
Erick,

Once again thank you very much for your attention.

Now my pseudo-distributed SolrCloud is configured with no inconsistency. An
additional problem was starting JBoss with "solr.data.dir" set to a path
Solr did not expect (it was not even underneath the solr.home directory).

This thread (
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201206.mbox/%3CCAO8xR5Zv8O-s6ZN7yPAXPzPoURQjKnBsm59mBE6H3DPFYkgcNA@mail.gmail.com%3E)
explains the inconsistency.

I found no need to change the Solr data directory. After commenting out this
property in JBoss's standalone.xml and setting
"<lockType>${solr.lock.type:native}</lockType>", everything started working
properly.
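
For reference, the offending setting would have been a JBoss system property
along these lines in standalone.xml; the path value here is purely
hypothetical, the point is only that it pointed outside solr.home:

     <system-properties>
         <!-- Commenting this out lets Solr fall back to its default
              data directory underneath solr.home/<core>/data -->
         <!-- <property name="solr.data.dir" value="/some/unrelated/path"/> -->
     </system-properties>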

Regards,
Bruno

Re: Help with SolrCloud exceptions while recovering

Posted by Erick Erickson <er...@gmail.com>.
OK, we're _definitely_ in the speculative realm here, so don't think
I know more than I do ;)...

The next thing I'd try is to go back to "native" as the lock type, on the
theory that the lock type wasn't your problem; it was the too-frequent
commits.

bq: This file "_1.nvm" once existed. Was deleted during one auto commit, but
remains somewhere in a queue for deletion

Assuming Unix, this is entirely expected. Searchers have all the files open,
and commits do background merges, which may delete segments. So the current
searcher may have a file open even though it has been "merged away"; when
the searcher closes, the file will actually, truly disappear.

It's more complicated on Windows, but eventually that's what happens.
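
If it helps to see the Unix behavior in isolation, here is a small
standalone Java sketch (not Solr code, just an illustration of POSIX unlink
semantics): deleting a file that a reader still holds open removes only the
directory entry, and the bytes stay readable until the last handle closes.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class DeletedButOpen {
        public static void main(String[] args) throws IOException {
            File f = File.createTempFile("segment", ".nvm");
            FileOutputStream out = new FileOutputStream(f);
            out.write("index data".getBytes("UTF-8"));
            out.close();

            FileInputStream in = new FileInputStream(f);
            // On Unix this delete succeeds even though 'in' is open:
            // only the directory entry goes away, the inode survives.
            System.out.println("deleted: " + f.delete());   // true
            System.out.println("exists:  " + f.exists());   // false

            byte[] buf = new byte[32];
            int n = in.read(buf);
            // The open stream still sees the "deleted" bytes.
            System.out.println("read:    " + new String(buf, 0, n, "UTF-8"));
            in.close(); // now the disk space is actually reclaimed
        }
    }

On Windows the delete() call above would typically fail instead, which is
why the cleanup story differs there.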

Anyway, keep us posted. If this continues to occur, please open a new thread;
that might catch the eye of people who are deep into Lucene file locking...

Best,
Erick

Re: Help with SolrCloud exceptions while recovering

Posted by Bruno Osiek <ba...@gmail.com>.
Hi Erick,

Thank you very much for your reply.
I disabled client commits while setting up commits in solrconfig.xml as follows:

     <autoCommit>
       <maxTime>${solr.autoCommit.maxTime:300000}</maxTime>
       <openSearcher>false</openSearcher>
     </autoCommit>

     <autoSoftCommit>
       <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
     </autoSoftCommit>
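
On the client side the indexing loop now just adds batches and never calls
commit(); here is a minimal SolrJ sketch of that pattern (the ZooKeeper
hosts, collection name and field names are all hypothetical):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class TweetIndexer {
        public static void main(String[] args) throws Exception {
            CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
            server.setDefaultCollection("tweets"); // hypothetical collection name

            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (int i = 0; i < 500; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "tweet-" + i);
                doc.addField("text", "tweet body " + i); // hypothetical fields
                batch.add(doc);
            }
            server.add(batch);
            // No server.commit() here: visibility is governed entirely by
            // the autoCommit/autoSoftCommit settings in solrconfig.xml.
            server.shutdown();
        }
    }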

The picture changed for the better: no more index corruption or endless
replication attempts, and now, 16 hours since start-up and more than 142k
tweets downloaded, all shards and replicas are "active".

One problem remains, though. While auto committing, Solr logs the following
stack trace:

00:00:40,383 ERROR [org.apache.solr.update.CommitTracker]
(commitScheduler-25-thread-1) auto commit
error...:org.apache.solr.common.SolrException: *Error opening new searcher*
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1550)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1662)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:603)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
*Caused by: java.lang.RuntimeException: java.io.FileNotFoundException:
_1.nvm*
at
org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:252)
at
org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:238)
at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
at java.util.TimSort.sort(TimSort.java:203)
at java.util.TimSort.sort(TimSort.java:173)
at java.util.Arrays.sort(Arrays.java:659)
at java.util.Collections.sort(Collections.java:217)
at
org.apache.lucene.index.TieredMergePolicy.findMerges(TieredMergePolicy.java:286)
at
org.apache.lucene.index.IndexWriter.updatePendingMerges(IndexWriter.java:2017)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1986)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:407)
at
org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:287)
at
org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:272)
at
org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:251)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1461)
... 10 more
*Caused by: java.io.FileNotFoundException: _1.nvm*
at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:260)
at
org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:177)
at
org.apache.lucene.index.SegmentCommitInfo.sizeInBytes(SegmentCommitInfo.java:141)
at org.apache.lucene.index.MergePolicy.size(MergePolicy.java:513)
at
org.apache.lucene.index.TieredMergePolicy$SegmentByteSizeDescending.compare(TieredMergePolicy.java:242)
... 24 more

This file "_1.nvm" once existed. Was deleted during one auto commit, but
remains somewhere in a queue for deletion. I believe the consequence is that
in the SolrCloud Admin UI -> Core Admin -> Stats, the "Current" status is
off for replica number 3 of all shards. If I understand correctly, this
means that changes to the index are not becoming visible.

Once again I tried to find possible reasons for that situation, but none of
the threads I found seems to reflect my case.

My lock type is set to <lockType>${solr.lock.type:single}</lockType>. This
is due to a lock.wait timeout error with both "native" and "simple" when
trying to create the collection through the Collections API. There is a
thread discussing this issue:

http://lucene.472066.n3.nabble.com/unable-to-load-core-after-cluster-restart-td4098731.html

The only caveat is that "single" should only be used if "there is no
possibility of another process trying to modify the index", and I cannot
guarantee that. Could that be the cause of the file-not-found exception?

Thanks once again for your help.

Regards,
Bruno.

Re: Help with SolrCloud exceptions while recovering

Posted by Erick Erickson <er...@gmail.com>.
First: for tweets, committing every 500 docs is much too frequent,
especially from the client, and super-especially if you have multiple
clients running. I'd recommend you just configure solrconfig this way as a
place to start, and do NOT commit from any clients (a minimal sketch of the
config follows the list):
1> a hard commit (openSearcher=false) every minute (or maybe 5 minutes)
2> a soft commit every minute
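
In solrconfig.xml that suggestion would look roughly like this (the
one-minute hard commit variant; the maxTime values are the knobs to tune):

     <autoCommit>
       <maxTime>60000</maxTime>        <!-- hard commit every minute -->
       <openSearcher>false</openSearcher>
     </autoCommit>

     <autoSoftCommit>
       <maxTime>60000</maxTime>        <!-- soft commit every minute -->
     </autoSoftCommit>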

The latter governs how long it'll be between when a doc is indexed and when
it can be searched.

Here's a long post about how all this works:
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/


As for the rest, it's definitely a puzzle. If it continues, a complete stack
trace would be a good thing to start with.

Best,
Erick
