Posted to dev@lucene.apache.org by Doug Cutting <cu...@lucene.com> on 2002/07/10 18:57:49 UTC

Re: Making Lucene Transactional

Scott Ganyo wrote:
> How about this?  I'll admit it punts a little, but I still think it could be
> a working model:
> 
> A-tomicity - A single call to the file system would commit the transaction.
> In the case of an IndexWriter, calling close() already does this with a
> simple rename operation (at least on Unix).  Adding an abort() would throw
> away the new files.  Not yet sure of how to achieve this once Document
> deletes are thrown in the mix...

The new files will be silently overwritten, so, strictly speaking, abort 
does not need to delete them.  The only thing abort would need to do is 
remove the lock file: removing the new files would be a courtesy.
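A minimal sketch of that commit/abort split, using java.nio.file rather than Lucene's own directory code. The file names "segments.new" and "write.lock" and the class CommitSketch are hypothetical, chosen only for illustration. The sketch assumes the platform replaces an existing target on an atomic rename, as Unix does.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CommitSketch {
    // Hypothetical file names, for illustration only.
    static final String PENDING = "segments.new";
    static final String LIVE = "segments";
    static final String LOCK = "write.lock";

    /** Commit: publish the pending file with a single rename, then release the lock. */
    static void commit(Path dir) throws IOException {
        // Whether an existing target is replaced is platform specific;
        // a Unix rename() replaces it.
        Files.move(dir.resolve(PENDING), dir.resolve(LIVE),
                StandardCopyOption.ATOMIC_MOVE);
        Files.deleteIfExists(dir.resolve(LOCK));
    }

    /** Abort: only releasing the lock is required. */
    static void abort(Path dir) throws IOException {
        // Courtesy: the next writer would overwrite this file anyway.
        Files.deleteIfExists(dir.resolve(PENDING));
        // Required: otherwise no other writer can open the index.
        Files.deleteIfExists(dir.resolve(LOCK));
    }
}
```

Readers only ever open "segments", so they see either the old index or the new one, never a partially written state.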

> C-onsistency - Document deletes must somehow be tracked and applied in a
> single operation along with any Document adds.  Again, though, I'm not sure
> of how the deletes could be accomplished with the current file format.

Currently deletions are represented as an (optional) bit-vector file in 
each segment index indicating which documents are deleted.  When segments 
are merged, data for deleted documents is dropped, and the new index 
created has no deletions file.

To implement your proposal I think I would move deletions to a global 
bit vector file that is named in the "segments" file.  That way the 
atomic action of installing a new "segments" file would also update the 
deletions.  This would be a little tricky, since this vector must be 
updated whenever segments are merged.  In particular, when a segment 
with deletions is merged, the deletions vector is shortened, and bits in 
the vector must be shifted down.  This adds a factor proportional to the 
size of the index to every merge, which is bad, but the bit shifting is 
probably fast enough that this would not be an issue.
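To make the shifting concrete, here is a rough sketch using java.util.BitSet. The class GlobalDeletions and the method afterMerge are hypothetical names, not part of Lucene, and the sketch assumes the merged segments cover one contiguous range of document numbers [from, to).

```java
import java.util.BitSet;

public class GlobalDeletions {
    /**
     * Returns the global deletions vector after merging documents [from, to).
     * Deleted documents inside that range are dropped by the merge, so their
     * bits disappear and every later bit shifts down by the number dropped.
     */
    static BitSet afterMerge(BitSet deleted, int numDocs, int from, int to) {
        BitSet result = new BitSet(numDocs);
        int next = 0; // document number in the renumbered index
        for (int doc = 0; doc < numDocs; doc++) {
            boolean inMerge = doc >= from && doc < to;
            if (inMerge && deleted.get(doc)) {
                continue; // dropped by the merge: no bit in the new vector
            }
            if (deleted.get(doc)) {
                result.set(next); // still deleted, but outside the merged range
            }
            next++;
        }
        return result;
    }
}
```

The loop visits every document number in the index, not just those in the merged segments, which is the cost proportional to index size mentioned above.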

Alternately, one could construct a file of "links" to the current 
deletions file for each segment, and point to this "links" file from the 
"segments" file.  That would enable atomic updates of deletions along 
with everything else, but also keep deletions files per segment.
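One way to picture the "links" idea is as an immutable table from segment name to the name of that segment's current deletions file. The class DeletionLinks and file names such as "_a.del.2" below are hypothetical, used only to illustrate the indirection. Nothing here is an existing Lucene format.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class DeletionLinks {
    /** Segment name -> name of that segment's current deletions file. */
    final Map<String, String> links;

    DeletionLinks(Map<String, String> links) {
        this.links = Collections.unmodifiableMap(new LinkedHashMap<>(links));
    }

    /**
     * Returns a new table in which one segment points at a freshly written
     * deletions file. The old table is untouched, so readers holding it keep
     * a consistent view until a new "segments" file naming the new table
     * is installed.
     */
    DeletionLinks withDeletions(String segment, String deletionsFile) {
        Map<String, String> copy = new LinkedHashMap<>(links);
        copy.put(segment, deletionsFile);
        return new DeletionLinks(copy);
    }
}
```

Under this scheme a delete writes a new per-segment deletions file and a new links table, and neither becomes visible until the "segments" file is swapped.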

> I-solation - Just force write transactions to be serialized.  Lucene does
> this with IndexWriters anyway.  We could enforce a one-to-one relationship
> between transactions and IndexWriters...

Doesn't the lock file do this already?

> D-urability - Lucene would attempt to do its best.  Once it is written to
> the disk, however, it is outside of Lucene's domain.  Wouldn't a journaled
> filesystem take care of this?

I don't think this is Lucene's responsibility.  Lucene should be able to 
assume a non-corrupt filesystem.

Doug


