Posted to java-user@lucene.apache.org by vivek sar <vi...@gmail.com> on 2007/10/05 03:30:40 UTC

Help with Lucene Indexer crash recovery

Hi,

 We are using Lucene 2.3. The problem we are facing is that, quite
often, if our application is stopped (killed or crashed) while the
Indexer is doing its job, the next time we bring up the application
the Indexer fails to run with the following exception:

2007-10-04 12:29:53,089 ERROR [PS thread 10] IndexerJob - Full-text
indexer failed to index
java.io.FileNotFoundException:
/opt/manager/apps/conf/index/MasterIndex/_llb.cfs (No such file or
directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(Unknown Source)
        at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
        at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
        at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
        at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:70)
        at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:181)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:167)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:131)
        at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:206)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:610)

The search also doesn't work after this.

Looks like the index was left in some weird state (it might be
corrupted). I was wondering if there is a tool or a way to repair the
index if we are not able to open it at run-time?

Thanks,
-vivek

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Help with Lucene Indexer crash recovery

Posted by Karl Wettin <ka...@gmail.com>.
5 okt 2007 kl. 21.50 skrev vivek sar:

> Once the writer.addIndexes is done I call writer.optimize()

No biggie, but IndexWriter.addIndexes() will automatically optimize,  
so that is one line of code you can get rid of.

> it may take hours to re-index

/Perhaps/ using IndexWriter.addIndexesNoOptimize(), closing the
index, making it accessible and then optimizing it in a new thread
could bring the "master index" up for use noticeably sooner.


-- 
karl



Re: Help with Lucene Indexer crash recovery

Posted by Chris Hostetter <ho...@fucit.org>.
: Once in a while we kill the running application using "kill -9". I

To quote a great man, who frequently quotes another great man: "Well 
there's your problem!"

stop using "kill -9" ... i'll say it again because it's important, and 
i'm even going to violate etiquette and use all caps because it's *that*
important...

	STOP USING KILL -9

...it's an abhorrent practice that too many people make a habit of. SIGKILL
(the signal sent when you run "kill -9") is meant to be a last resort only
if you can't get a rogue process to stop by any other means.  Instead of
using kill -9, add some sort of notification mechanism to your application
so you can trigger graceful shutdowns, or at the very least just use
"kill" (no -9) so that the process (the JVM) can at least exit on its own
and do basic buffer flushing and file handle closing.

: I don't have any shutdown hook right now, but I'm thinking of adding
: one for graceful index closing.  We use following merge parameters,

When you use SIGKILL the process has no idea it's about to die ... it is 
given no notice, it is wiped off the face of the earth in one blinding 
atomic action -- so a shutdown hook isn't going to do you any good if you 
keep using kill -9.
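
A minimal sketch of the graceful-shutdown idea above, using only the
JDK. The class and method names are hypothetical, and the Closeable is
assumed to be a thin adapter whose close() calls IndexWriter.close().
The hook runs on normal exit and on plain "kill" (SIGTERM); it never
runs on "kill -9".

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch only: GracefulShutdown and its methods are hypothetical names.
final class GracefulShutdown {

    // Registers a JVM shutdown hook that closes the given resource.
    // Returns the hook thread so the caller can deregister it later.
    static Thread closeOnShutdown(final Closeable resource) {
        Thread hook = new Thread(() -> closeQuietly(resource),
                                 "index-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    // Closes the resource and reports whether close() succeeded.
    static boolean closeQuietly(Closeable resource) {
        try {
            resource.close();
            return true;
        } catch (IOException e) {
            System.err.println("close failed: " + e);
            return false;
        }
    }
}
```

It would be registered once at startup, for example
GracefulShutdown.closeOnShutdown(() -> writer.close()), where writer
is assumed to be the application's IndexWriter.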

http://en.wikipedia.org/wiki/SIGKILL
http://speculation.org/garrick/kill-9.html

-Hoss




Re: Help with Lucene Indexer crash recovery

Posted by Chris Hostetter <ho...@fucit.org>.
: That said, it should never in fact cause index corruption, as far as I
: know.  Lucene is "semi-transactional": at any & all moments you should
: be able to destroy the JVM and the index will be unharmed. I would
: really like to get to the bottom of why this is not the case here.

At any point you can shut down the JVM and the index will be unharmed, but
"destroying" it with "kill -9" goes a little further than that.

Lucene can't make that claim because the JVM can't even guarantee that
bytes are written to physical disk when we close() an OutputStream -- all
it guarantees is that the bytes have been handed to the OS.  When you "kill
-9" a process the OS is free to make *EVERYTHING* about that process 
vanish without cleaning up after it ... i'm pretty sure even pending IO 
operations are fair game for disappearing.
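
For illustration, this is how a Java program can ask the OS to push
bytes through to the storage device instead of only handing them over
on close(). It is a JDK-only sketch with hypothetical names, and it is
not a description of what Lucene 2.2 does internally.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch only: DurableWrite and writeAndSync are hypothetical names.
final class DurableWrite {

    // Writes data to file, then forces content and metadata to the
    // storage device before the channel is closed.
    static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);   // may write fewer bytes than requested
            }
            ch.force(true);      // flush file content and metadata
        }
    }
}
```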



-Hoss




Re: Help with Lucene Indexer crash recovery

Posted by Michael McCandless <lu...@mikemccandless.com>.
"vivek sar" <vi...@gmail.com> wrote:
> Sorry, I'm using Lucene 2.2. We are using Lucene to index our database
> (Oracle) into documents for full-text search feature. Here is the
> process of indexing,
> 
> 1) Have two IndexWriters which run in two different threads and write
> to two different directories (temporary indexes). They both read from
> the same queue (db resultset queue) and then write to the index. Close
> the indexwriters once done.
> 2) Once the IndexWriters are done we start the MasterIndex, which is
> another IndexWriter. This merges the indexes in those two temporary
> indexes.
> 3) Once the writer.addIndexes is done I call writer.optimize() and
> then writer.close().
> 4) Our IndexSearcher reads only from the MasterIndex

This process sounds fine, though as Karl pointed out you could let the
reader open the index before you start the optimize.  You could also
consider skipping the optimize entirely, unless the search latency is
in fact too high (or throughput too low) without it.
 
> Once in a while we kill the running application using "kill -9". I
> think if the IndexWriter is in the process of merging and we kill it
> we run into this problem. It has already happened a few times in the
> last week. I do clean up the lock if there is a write.lock at the
> startup of the system. I cannot recreate the index as it may take
> hours to re-index.

As Hoss pointed out, "kill -9" really should be a means of last
resort.

That said, it should never in fact cause index corruption, as far as I
know.  Lucene is "semi-transactional": at any & all moments you should
be able to destroy the JVM and the index will be unharmed. I would
really like to get to the bottom of why this is not the case here.

So you've noticed that if kill -9 is sent while the addIndexes is
happening then that can lead to this corruption?  If possible, could
you use IndexWriter.setInfoStream(...) during at least that step to
get verbose details about what the writer is doing, and then capture
that output & post it the next time you get this error to happen?
That would go a long ways to getting to the root cause here.
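
One way to capture that output, sketched with JDK classes only: open
an appending, auto-flushing PrintStream on a log file and hand it to
the writer. The class name, method name and log file name are
assumptions; the only Lucene call involved is the
IndexWriter.setInfoStream(...) mentioned above, shown as a comment.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;

// Sketch only: InfoStreamLog and open are hypothetical names.
final class InfoStreamLog {

    // Opens logFile in append mode with autoflush enabled, so every
    // println is handed to the OS as soon as it is written.
    static PrintStream open(File logFile) throws IOException {
        return new PrintStream(new FileOutputStream(logFile, true), true);
    }

    // Intended use (needs Lucene on the classpath, so left as a comment):
    //   PrintStream info = InfoStreamLog.open(new File("indexwriter-info.log"));
    //   writer.setInfoStream(info);  // writer: the MasterIndex IndexWriter
    //   ... addIndexes / optimize / close as before ...
    //   info.close();
}
```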

Which OS and file system are you using?  Are all these steps happening
on a single machine & JVM?

> I don't have any shutdown hook right now, but I'm thinking of adding
> one for graceful index closing.  We use following merge parameters,
> 
> mergeFactor=100
> maxMergeDocs=99999
> maxBufferedDocs=1000

Seems OK.

> I can try out your tool, is it something that can be integrated into
> the application itself? So, basically I'm looking to catch the
> "FileNotFoundException" and take some action to recover from it.

Well, once the tool has been tested and shown to be bug-free then you
could in theory use this as a live recovery inside the application.
But for starters I would run it from the command line without the
-fix option.  Be very careful: this is totally new code and it could make
your situation even worse, if it has any bugs.  And remember when the
tool works, it will have removed a whole segment from your index which
means possibly a great many documents are now gone.

Also, it would be far better to get to the root cause & fix it,
instead of having to use this tool perpetually.

Mike



Re: Help with Lucene Indexer crash recovery

Posted by vivek sar <vi...@gmail.com>.
Thanks for the response Michael.

Sorry, I'm using Lucene 2.2. We are using Lucene to index our database
(Oracle) into documents for full-text search feature. Here is the
process of indexing,

1) Have two IndexWriters which run in two different threads and write
to two different directories (temporary indexes). They both read from
the same queue (db resultset queue) and then write to the index. Close
the indexwriters once done.
2) Once the IndexWriters are done we start the MasterIndex, which is
another IndexWriter. This merges the indexes in those two temporary
indexes.
3) Once the writer.addIndexes is done I call writer.optimize() and
then writer.close().
4) Our IndexSearcher reads only from the MasterIndex

Once in a while we kill the running application using "kill -9". I
think if the IndexWriter is in the process of merging and we kill it
we run into this problem. It has already happened a few times in the
last week. I do clean up the lock if there is a write.lock at the
startup of the system. I cannot recreate the index as it may take
hours to re-index.

I don't have any shutdown hook right now, but I'm thinking of adding
one for graceful index closing.  We use following merge parameters,

mergeFactor=100
maxMergeDocs=99999
maxBufferedDocs=1000

I can try out your tool, is it something that can be integrated into
the application itself? So, basically I'm looking to catch the
"FileNotFoundException" and take some action to recover from it.

Thanks,
-vivek






Re: Help with Lucene Indexer crash recovery

Posted by Michael McCandless <lu...@mikemccandless.com>.
"vivek sar" <vi...@gmail.com> wrote:

> We are using Lucene 2.3.

Do you mean Lucene 2.2?  Your stack trace seems to line up with 2.2,
and 2.3 isn't quite released yet.

> The problem we are facing is that, quite often, if our application
> is stopped (killed or crashed) while the Indexer is doing its job,
> the next time we bring up the application the Indexer fails to run
> with the following exception:

> 2007-10-04 12:29:53,089 ERROR [PS thread 10] IndexerJob - Full-text
> indexer failed to index
> java.io.FileNotFoundException:
> /opt/manager/apps/conf/index/MasterIndex/_llb.cfs (No such file or
> directory)
>         at java.io.RandomAccessFile.open(Native Method)
>         at java.io.RandomAccessFile.<init>(Unknown Source)
>         at
>         org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
>         at
>         org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
>         at
>         org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
>         at
>         org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:70)
>         at
>         org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:181)
>         at
>         org.apache.lucene.index.SegmentReader.get(SegmentReader.java:167)
>         at
>         org.apache.lucene.index.SegmentReader.get(SegmentReader.java:131)
>         at
>         org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:206)
>         at
>         org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:610)
> 
> The search also doesn't work after this.

Can you share some details of how you are using Lucene, and, how/why
it's killed or crashed so often?  When it crashes, do you get an
exception from Lucene (which could be the root cause here)?

What OS and filesystem is the index on?  Are you changing any default
settings like autoCommit, lock factory & lock file location, etc?

Even if Lucene (JVM) is killed, the index should not become corrupt in
this particular way, unless the IO system fails to complete its
"write" operations.  Lucene always writes & closes new segments files
(_llb.cfs) before writing the segments_N file that refers to them.
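
The ordering described above (write and close the data file, then
write the small file that refers to it) can be sketched with JDK
classes only. Everything here is illustrative: the class, the method
names and the one-line text "pointer" file are assumptions, and the
real segments_N file has its own binary format. The dangling() check
mirrors the failure in the stack trace, where the pointer survives but
the file it names does not.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: PublishAfterWrite and its methods are hypothetical names.
final class PublishAfterWrite {

    // Writes and closes the data file first, and only then writes the
    // pointer file that names it.
    static void commit(Path dir, String dataName, byte[] data,
                       String pointerName) throws IOException {
        Files.write(dir.resolve(dataName), data);
        Files.write(dir.resolve(pointerName),
                    dataName.getBytes(StandardCharsets.UTF_8));
    }

    // Returns true when the pointer names a data file that is missing.
    static boolean dangling(Path dir, String pointerName) throws IOException {
        byte[] raw = Files.readAllBytes(dir.resolve(pointerName));
        String dataName = new String(raw, StandardCharsets.UTF_8);
        return !Files.exists(dir.resolve(dataName));
    }
}
```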

> Looks like the index was left in some weird state (it might be
> corrupted). I was wondering if there is a tool or a way to repair the
> index if we are not able to open it at run-time?

I just took a first stab at just such a tool, here:

  https://issues.apache.org/jira/browse/LUCENE-1020

Please be very very careful!: I just wrote this code and it could have
some horrible bug that destroys your index.  So make a backup of your
index first.

Could you first run that tool without the "-fix" option and post back
the resulting output?

Mike
