Posted to dev@lucene.apache.org by Scott Ganyo <sc...@eTapestry.com> on 2002/06/27 18:00:58 UTC

Making Lucene Transactional

That's interesting.  So it would be a very small change to add transactional
(and even 2-phase commit) capabilities to the writer?  What about deletes?
Since they use the reader, would it still be possible to allow a 2-phase
commit/abort on that?

I would very much like to have a 2-phase commit in Lucene in order to ensure
that it is always in sync with my database.  I always thought that I'd end
up having to write custom code to store the Lucene index in the database,
but maybe that wouldn't be necessary...?

Scott

> -----Original Message-----
> From: Doug Cutting [mailto:cutting@lucene.com]
> Sent: Thursday, June 27, 2002 10:36 AM
> To: Lucene Users List
> Subject: Re: Stress Testing Lucene
> 
> 
> It's very hard to leave an index in a bad state.  Updating the
> "segments" file atomically updates the index.  So the only way to
> corrupt things is to only partly update the segments file.  But that
> too is hard, since it's first written to a temporary file, which is
> then renamed "segments".  The only vulnerability I know of is that in
> Java on Win32 you can't atomically rename a file to something that
> already exists, so Lucene has to first remove the old version.  So if
> you were to crash between the time that the old version of "segments"
> is removed and the new version is moved into place, then the index
> would be corrupt, because it would have no "segments" file.
> 
> Doug

Re: Making Lucene Transactional

Posted by Dmitry Serebrennikov <dm...@earthlink.net>.
>>>-----Original Message-----
>>>From: Doug Cutting [mailto:cutting@lucene.com]
>>>Sent: Thursday, June 27, 2002 10:36 AM
>>>To: Lucene Users List
>>>Subject: Re: Stress Testing Lucene
>>>
>>>It's very hard to leave an index in a bad state.  Updating the
>>>"segments" file atomically updates the index.  So the only way to
>>>corrupt things is to only partly update the segments file.  But that
>>>too is hard, since it's first written to a temporary file, which is
>>>then renamed "segments".
>>>
We could further protect against this one by writing a checksum of some 
sort at the end of the segments file and then re-reading it and 
verifying the checksum before renaming the temporary segments file to 
"segments". This way we'll know that only fully written segments files 
are made active.
The checksum can also be used to verify integrity of the other index 
segment components. I guess there is always a chance that the disk 
driver is caching the writes.
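The checksum idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Lucene code: the class `ChecksummedSegmentsWriter`, its methods, and the choice of a trailing 8-byte CRC32 are all hypothetical, and it uses the `java.nio.file` API, which is much newer than this thread. The final rename here is a plain replace; whether it is atomic depends on the platform and file system.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.CRC32;

// Hypothetical helper, not part of Lucene: write, verify, then rename.
final class ChecksummedSegmentsWriter {

    /** Computes a CRC32 over data[offset, offset + length). */
    static long crc(byte[] data, int offset, int length) {
        CRC32 crc = new CRC32();
        crc.update(data, offset, length);
        return crc.getValue();
    }

    /** Returns true if the file ends with an 8-byte CRC32 that matches the bytes before it. */
    static boolean isIntact(Path file) throws IOException {
        byte[] all = Files.readAllBytes(file);
        if (all.length < Long.BYTES) {
            return false; // too short to even hold the checksum
        }
        int payloadLength = all.length - Long.BYTES;
        long stored = ByteBuffer.wrap(all, payloadLength, Long.BYTES).getLong();
        return stored == crc(all, 0, payloadLength);
    }

    /** Writes payload plus checksum to "segments.tmp", re-reads it, and only then renames it. */
    static void commit(Path indexDir, byte[] payload) throws IOException {
        Path tmp = indexDir.resolve("segments.tmp");
        Path live = indexDir.resolve("segments");

        ByteBuffer buf = ByteBuffer.allocate(payload.length + Long.BYTES);
        buf.put(payload).putLong(crc(payload, 0, payload.length));
        Files.write(tmp, buf.array());

        // Re-read the temporary file so that only a fully written file is made active.
        if (!isIntact(tmp)) {
            throw new IOException("segments.tmp failed checksum verification");
        }
        // Plain replace; atomicity is platform-dependent (assumption, see lead-in).
        Files.move(tmp, live, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

As noted above, re-reading may be served from a cache rather than from the disk, so a passing check here does not prove the bytes have reached stable storage.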

>>>The only vulnerability I know of is that in Java on Win32 you can't
>>>atomically rename a file to something that already exists, so Lucene
>>>has to first remove the old version.  So if you were to crash between
>>>the time that the old version of "segments" is removed and the new
>>>version is moved into place, then the index would be corrupt, because
>>>it would have no "segments" file.
>>>
Perhaps we could also protect against this one by simply removing the 
old segments file (is that atomic by itself?) and then letting the next 
IndexReader look for the temporary file when it sees that there is no 
"segments" file and rename it. There might be a case where two competing 
IndexReaders do the "segments" file check at the same time, find that it 
is not there, go after the "segments.tmp" and try to rename it. But in 
this case only the first one will succeed and the following one will 
find that the "segments.tmp" is no longer there (or that another 
"segments" file already exists), in which case it should look for the 
"segments" file again and proceed.

Would these two changes make the index at least as reliable as the disk 
driver?
Dmitry.
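
The reader-side recovery described above might look something like the sketch below. It is a hypothetical illustration, not Lucene's actual behavior: `SegmentsRecovery` and `locateSegments` are invented names, and the code relies on `java.nio.file.Files.move` failing when the target already exists, which postdates this thread. A real implementation would also need to confirm that "segments.tmp" was fully written before promoting it.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene.
final class SegmentsRecovery {

    /** Returns the live "segments" file, promoting "segments.tmp" if "segments" is missing. */
    static Path locateSegments(Path indexDir) throws IOException {
        Path live = indexDir.resolve("segments");
        Path tmp = indexDir.resolve("segments.tmp");

        if (Files.exists(live)) {
            return live;
        }
        try {
            // No REPLACE_EXISTING: the move fails if another reader already created "segments".
            Files.move(tmp, live);
        } catch (NoSuchFileException | FileAlreadyExistsException e) {
            // A competing reader won the race (or there is nothing to promote);
            // fall through and look for "segments" again.
        }
        if (Files.exists(live)) {
            return live;
        }
        throw new IOException("no segments file found in " + indexDir);
    }
}
```

Two readers racing through this method should both end up returning the same "segments" path: the loser's rename fails and it simply re-checks for the file.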

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Making Lucene Transactional

Posted by Brian Goetz <br...@quiotix.com>.
> That's interesting.  So it would be a very small change to add transactional
> (and even 2-phase commit) capabilities to the writer?  What about deletes?
> Since they use the reader, would it still be possible to allow a 2-phase
> commit/abort on that?

I think you're not using "transactional" in the same sense as Doug is.

Very few file systems are transactional, although some offer a small
number of atomic operations, such as rename.  This doesn't make them
transactional, but it allows application writers (that's us) to write
apps that are _less likely_ to be victimized by system failure.  But
Lucene still writes blocks to disk via the file system, without a
transaction log, and since disk drivers do things like defer or
reorder disk writes, we could still lose if the system crashed at the
wrong time.  Still, we do a lot to reduce this risk beyond that of
most file-based applications.

> I would very much like to have a 2-phase commit in Lucene in order to ensure
> that it is always in sync with my database.  I always thought that I'd end
> up having to write custom code to store the Lucene index in the database,
> but maybe that wouldn't be necessary...?

Two-phase commit is a whole different beast; it involves
coordinating multiple transactional resource managers (which Lucene
isn't) with a separate transaction monitor, using a protocol such as
XA or OTS.  We're nowhere near that.

Storing the index in a database would be a good start, although the
Directory interface is really designed around the assumptions of a
file system.  Still, that would not get us all the way there -- you'd
need to introduce transaction demarcation methods into the Lucene API,
so that these could be passed to the DBDirectory, and we would know
which groups of updates should be considered atomic.
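
To make "transaction demarcation methods" concrete, here is a minimal sketch of what such an API might look like. Everything in it is an assumption for illustration: `TransactionalStore` and `InMemoryTransactionalStore` are hypothetical names, Lucene's Directory has no such methods, and the in-memory map merely stands in for a database-backed store whose `begin`/`commit`/`rollback` would presumably map onto the database's own transactions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical API sketch; not an existing Lucene interface.
interface TransactionalStore {
    void begin();                                   // start a group of updates
    void writeFile(String name, byte[] contents);   // staged until commit
    void commit();                                  // make the whole group visible
    void rollback();                                // discard the whole group
    byte[] readFile(String name);                   // sees committed data only
}

// In-memory stand-in for a database-backed implementation (assumption).
final class InMemoryTransactionalStore implements TransactionalStore {
    private final Map<String, byte[]> committed = new HashMap<>();
    private Map<String, byte[]> pending; // null when no transaction is open

    @Override
    public void begin() {
        if (pending != null) {
            throw new IllegalStateException("transaction already open");
        }
        pending = new HashMap<>();
    }

    @Override
    public void writeFile(String name, byte[] contents) {
        requireOpen();
        pending.put(name, contents.clone());
    }

    @Override
    public void commit() {
        requireOpen();
        committed.putAll(pending); // all staged files become visible together
        pending = null;
    }

    @Override
    public void rollback() {
        requireOpen();
        pending = null; // staged files are simply dropped
    }

    @Override
    public byte[] readFile(String name) {
        byte[] data = committed.get(name);
        return data == null ? null : data.clone();
    }

    private void requireOpen() {
        if (pending == null) {
            throw new IllegalStateException("no transaction open");
        }
    }
}
```

The point of the sketch is only the demarcation: a caller brackets a group of writes with `begin()` and `commit()`, and nothing in the group is visible to readers until the commit.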

And that still doesn't get us close to 2PC; we'd still have to support
XA for that, and I don't see any good reason to undertake that level
of effort at this point.  

However, I think revisiting Directory with an eye towards making it
something that can be efficiently implemented on either a DB or a file
system would be worthwhile.  

