Posted to dev@lucene.apache.org by Arvind Srinivasan <lu...@ziplip.com> on 2005/05/26 18:13:31 UTC

Potential Segment corruption

Hi,
We have seen Lucene segments become corrupt under the following situation.
During a merge of segments, the following sequence of operations takes place:
  (1) The index is locked.
  (2) A new segment name is obtained by calling newSegmentName(), which
      basically increments segmentInfos.counter.
  (3) Data is written to the new segment.
  (4) The segments file is rewritten.
  (5) Old segments are deleted/marked for deletion.
Corruption becomes possible when an exception occurs during step (3),
preventing the commit to the segments file, e.g. no disk space, an
unreliable network share, bad merging segments, etc.
Because the segments file is not replaced, there is no corruption
immediately; however, on the next merge operation the index will become
corrupt. [There is one scenario in which the corruption may not occur: when
the new segment is bigger than the failed one.] I am not sure of the effect
of this on the compound file store.
The cause of this issue can be traced to segmentInfos.counter. Because the
counter is not committed to the segments file, the next merge operation will
use the same failed segment name, and any standard Directory implementation
will probably write the segment to the same file location. Note that the
merge operation opens the segment files in read-write mode, so we start
with a non-empty file.
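The failure mode described above can be reproduced outside of Lucene. Below is a minimal, self-contained Java sketch (not Lucene code; the class and method names are invented for illustration) showing how reopening an existing file in "rw" mode without truncating it leaves stale bytes from an earlier, longer write at the tail:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative sketch (not Lucene code): reusing a segment file name after
// a failed merge. RandomAccessFile in "rw" mode does not truncate, so a
// shorter rewrite leaves stale bytes from the earlier write at the tail.
public class StaleTailDemo {

    static long rewriteWithoutTruncate() throws IOException {
        File f = File.createTempFile("segment", ".dat");
        f.deleteOnExit();

        // First (failed) merge writes 17 bytes to the new segment file.
        try (RandomAccessFile out = new RandomAccessFile(f, "rw")) {
            out.writeBytes("LONG-SEGMENT-DATA");
        }

        // The counter was never committed, so the next merge reuses the
        // same name. "rw" does not truncate: only 5 bytes are overwritten.
        try (RandomAccessFile out = new RandomAccessFile(f, "rw")) {
            out.writeBytes("SHORT");
        }
        return f.length();  // still 17, not 5 -- the stale tail survives
    }

    public static void main(String[] args) throws IOException {
        System.out.println("file length after rewrite = "
                + rewriteWithoutTruncate());
    }
}
```

Here the second, shorter write leaves the file at its original 17-byte length; the 12 trailing stale bytes are exactly the kind of leftover data that corrupts the index on a subsequent merge.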
Some options are:
(1) Commit the counter after the newSegmentName call, so that we never
    reuse a segment name.
(2) Add a callback API to the Directory interface for new segment creation,
    allowing the Directory implementation to clean up on a new segment write.
(3) Provide a rollback mechanism in the event of merge failure (using the
    deleteable functionality).
(4) For the compound file store, the file must be empty (possibly it can
    use the callback in (2) to clean up).
We should apply as many of these as necessary to make the merge code robust
against potential failures. I think that with the increasing adoption of
Lucene, we need to think about data corruption and recovery issues.
More later,

Arvind.
     

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Potential Segment corruption

Posted by Doug Cutting <cu...@apache.org>.
Doug Cutting wrote:
> I think the fix is much simpler.  This is a bug in FSDirectory. 
> Directory.createOutput() should always create a new empty file, and 
> FSDirectory's implementation does not ensure this.  It should try to 
> delete the file before opening it and/or call 
> RandomAccessFile.setLength(0).
> 
> I've attached a patch.

I have now committed this patch.

Doug



Re: Potential Segment corruption

Posted by Doug Cutting <cu...@apache.org>.
Doug Cutting wrote:
> I've attached a patch.  Does this fix things for you?

Oops.  That had a bug.

Here's a revised patch.  It now passes all unit tests.

Doug

Re: Potential Segment corruption

Posted by Doug Cutting <cu...@apache.org>.
Arvind Srinivasan wrote:
> Some options are:
> (1) Commit the counter after the newSegmentName call, so that we never
>     reuse a segment name.
> (2) Add a callback API to the Directory interface for new segment creation,
>     allowing the Directory implementation to clean up on a new segment write.
> (3) Provide a rollback mechanism in the event of merge failure (using the
>     deleteable functionality).
> (4) For the compound file store, the file must be empty (possibly it can
>     use the callback in (2) to clean up).

I think the fix is much simpler.  This is a bug in FSDirectory. 
Directory.createOutput() should always create a new empty file, and 
FSDirectory's implementation does not ensure this.  It should try to 
delete the file before opening it and/or call RandomAccessFile.setLength(0).
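To make the intent concrete, here is a hedged Java sketch of that approach (the helper below is illustrative, not the actual FSDirectory patch): delete any pre-existing file, then truncate to zero length, so the output provably starts empty:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative sketch of the fix described above (the helper name is
// hypothetical, not the committed FSDirectory code): ensure that creating
// an output always yields an empty file, even if a stale file with the
// same name was left behind by a failed merge.
public class CreateOutputFix {

    static RandomAccessFile createOutput(File f) throws IOException {
        // Try to remove any stale file first; if the delete fails (e.g. on
        // some platforms while the file is open), setLength(0) below still
        // guarantees an empty file.
        f.delete();
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        raf.setLength(0);  // belt and braces: force the file to be empty
        return raf;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("segment", ".dat");
        f.deleteOnExit();
        // Simulate leftovers from a failed merge.
        try (RandomAccessFile out = new RandomAccessFile(f, "rw")) {
            out.writeBytes("STALE-DATA-FROM-A-FAILED-MERGE");
        }
        try (RandomAccessFile out = createOutput(f)) {
            System.out.println("length on open = " + out.length());
        }
    }
}
```

With this contract on createOutput, the counter reuse described earlier becomes harmless: rewriting under the same segment name can no longer pick up stale bytes.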

I've attached a patch.  Does this fix things for you?

Doug