Posted to java-user@lucene.apache.org by Christoph Kiehl <ki...@subshell.com> on 2004/10/27 14:17:09 UTC

Backup strategies

Hi,

I'm curious about your strategies for backing up indexes based on 
FSDirectory. If I do a file-based copy, I suspect I will get corrupted 
data because of concurrent write access.
My current favorite is to create an empty index and use 
IndexWriter.addIndexes() to copy the current index state, but I'm not 
sure about the performance of this solution.

How do you make your backups?

Regards,
Christoph


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Backup strategies

Posted by Justin Swanhart <gr...@gmail.com>.
I would suggest that you create a lock file that your index writing
process checks for. If the writer encounters the lock file, it closes
the IndexWriter and waits until the lock file is removed.  After you
create the lock file, wait a few seconds to make sure the writer
process has quiesced, then create a snapshot of the filesystem.  Remove
the lock file and back up the snapshot with your favorite backup tool
(excluding the lock file), then drop the snapshot.
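
To make the sequence above concrete, here is a minimal shell sketch of
both sides of the handshake. Everything named in it is an assumption
made for illustration: the lock file name backup.lock, the
QUIESCE_SECONDS variable, and the two function names. The snapshot
command itself is passed in as arguments, because it depends entirely
on which filesystem or volume manager is in use.

```shell
#!/bin/sh
# Hypothetical names throughout: backup.lock, QUIESCE_SECONDS, and both
# function names are invented for this sketch.

# Backup side: create the lock file, give the writer time to notice it,
# run the caller-supplied snapshot command, then release the lock.
backup_with_lock() {    # usage: backup_with_lock <index_dir> <snapshot_cmd...>
    index_dir=$1
    shift
    lock="$index_dir/backup.lock"
    : > "$lock" || return 1            # create an empty lock file
    sleep "${QUIESCE_SECONDS:-5}"      # wait for the writer to quiesce
    "$@"                               # take the snapshot
    status=$?
    rm -f "$lock"                      # let the writer resume
    return "$status"
}

# Writer side: call this before opening the IndexWriter again; it blocks
# for as long as the lock file exists.
wait_while_backup_runs() {    # usage: wait_while_backup_runs <index_dir>
    while [ -e "$1/backup.lock" ]; do
        sleep 1
    done
}
```

The lock is removed as soon as the snapshot command returns, so the
writer is only paused for the quiesce delay plus the time the snapshot
takes, not for the whole backup run.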

Swany

On Wed, 27 Oct 2004 14:40:20 +0200, Christoph Kiehl <ki...@subshell.com> wrote:
> Christiaan Fluit wrote:
> 
> > I have no practical experience with backing up an online index, but I
> > would try to find out the details of the write lock mechanism used by
> > Lucene at the file level. You can then create a backup component that
> > write-locks the index and does a regular file copy of the index dir.
> > During backup time searches can continue while updates will be
> > temporarily blocked.
> 
> The problem with this approach is that it will not only block write
> operations, but these operations will also time out, which will
> lead to exceptions. To prevent this you must implement some queuing,
> which is what I would like to avoid.
> 
> Regards,
> Christoph
> 
> 
> 
> 


Re: Backup strategies

Posted by Christoph Kiehl <ki...@subshell.com>.
Christiaan Fluit wrote:

> I have no practical experience with backing up an online index, but I 
> would try to find out the details of the write lock mechanism used by 
> Lucene at the file level. You can then create a backup component that 
> write-locks the index and does a regular file copy of the index dir. 
> During backup time searches can continue while updates will be 
> temporarily blocked.

The problem with this approach is that it will not only block write 
operations, but these operations will also time out, which will 
lead to exceptions. To prevent this you must implement some queuing, 
which is what I would like to avoid.

Regards,
Christoph




Re: Backup strategies

Posted by Christiaan Fluit <ch...@aduna.biz>.
Christoph Kiehl wrote:
> I'm curious about your strategy to backup indexes based on FSDirectory. 
> If I do a file based copy I suspect I will get corrupted data because of 
> concurrent write access.
> My current favorite is to create an empty index and use 
> IndexWriter.addIndexes() to copy the current index state. But I'm not 
> sure about the performance of this solution.

I have no practical experience with backing up an online index, but I 
would try to find out the details of the write lock mechanism used by 
Lucene at the file level. You can then create a backup component that 
write-locks the index and does a regular file copy of the index dir. 
During backup time searches can continue while updates will be 
temporarily blocked.

But as I said, I'm only speculating...


Chris
--




Re: Backup strategies

Posted by Nader Henein <ns...@bayt.net>.
We've recently implemented something similar: the backup process 
creates a file (much like the lock files used during indexing) that the 
IndexWriter recognizes (after a tweak), so it doesn't attempt to start 
an indexing run or a delete while the file is there. It wasn't that 
much work, actually.

Nader

Doug Cutting wrote:

> Christoph Kiehl wrote:
>
>> I'm curious about your strategy to backup indexes based on 
>> FSDirectory. If I do a file based copy I suspect I will get corrupted 
>> data because of concurrent write access.
>> My current favorite is to create an empty index and use 
>> IndexWriter.addIndexes() to copy the current index state. But I'm not 
>> sure about the performance of this solution.
>>
>> How do you make your backups?
>
>
> A safe way to backup is to have your indexing process, when it knows 
> the index is stable (e.g., just after calling IndexWriter.close()), 
> make a checkpoint copy of the index by running a shell command like 
> "cp -lpr index index.YYYMMDDHHmmSS".  This is very fast and requires 
> little disk space, since it creates only a new directory of hard 
> links.  Then you can separately back this up and subsequently remove it.
>
> This is also a useful way to replicate indexes.  On the master 
> indexing server periodically perform "cp -lpr" as above.  Then search 
> slaves can use rsync to pull down the latest version of the index.  If 
> a very small mergefactor is used (e.g., 2) then the index will have 
> only a few segments, so that searches are fast.  On the slave, 
> periodically find the latest index.YYYMMDDHHmmSS, use "cp -lpr index/ 
> index.YYYMMDDHHmmSS" and 'rsync --delete master:index.YYYMMDDHHmmSS 
> index.YYYMMDDHHmmSS' to efficiently get a local copy, and finally "ln 
> -fsn index.YYYMMDDHHmmSS index" to publish the new version of the index.
>
> Doug
>
>


Re: Backup strategies

Posted by Doug Cutting <cu...@apache.org>.
Christoph Kiehl wrote:
> I'm curious about your strategy to backup indexes based on FSDirectory. 
> If I do a file based copy I suspect I will get corrupted data because of 
> concurrent write access.
> My current favorite is to create an empty index and use 
> IndexWriter.addIndexes() to copy the current index state. But I'm not 
> sure about the performance of this solution.
> 
> How do you make your backups?

A safe way to back up is to have your indexing process, when it knows the 
index is stable (e.g., just after calling IndexWriter.close()), make a 
checkpoint copy of the index by running a shell command like "cp -lpr 
index index.YYYMMDDHHmmSS".  This is very fast and requires little disk 
space, since it creates only a new directory of hard links.  Then you 
can separately back this up and subsequently remove it.
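
As a rough illustration of the checkpoint step, here is a minimal shell
sketch. The function name checkpoint_index is made up for this example.
It assumes GNU cp (for the -l option) and a checkpoint directory on the
same filesystem as the index, since hard links cannot cross
filesystems. Note that hard links share file contents with the live
index, so the checkpoint only stays stable as long as the indexer
replaces or deletes files rather than modifying them in place.

```shell
#!/bin/sh
# Sketch only: checkpoint_index is a hypothetical helper name.
# Call it when the index is known to be stable, for example right after
# the indexing process has closed its IndexWriter.
checkpoint_index() {    # usage: checkpoint_index <index_dir>
    index_dir=$1
    stamp=$(date +%Y%m%d%H%M%S)          # e.g. 20041027141709
    checkpoint="${index_dir}.${stamp}"
    # -l: hard-link files instead of copying their data
    # -p: preserve modes and timestamps
    # -r: recurse into the index directory
    cp -lpr "$index_dir" "$checkpoint" || return 1
    printf '%s\n' "$checkpoint"          # report the checkpoint path
}
```

A backup job could then archive the directory whose path is printed and
remove it afterwards with rm -r, which drops only the extra links and
leaves the live index untouched.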

This is also a useful way to replicate indexes.  On the master indexing 
server periodically perform "cp -lpr" as above.  Then search slaves can 
use rsync to pull down the latest version of the index.  If a very small 
mergefactor is used (e.g., 2) then the index will have only a few 
segments, so that searches are fast.  On the slave, periodically find 
the latest index.YYYMMDDHHmmSS, use "cp -lpr index/ index.YYYMMDDHHmmSS" 
and 'rsync --delete master:index.YYYMMDDHHmmSS index.YYYMMDDHHmmSS' to 
efficiently get a local copy, and finally "ln -fsn index.YYYMMDDHHmmSS 
index" to publish the new version of the index.
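
The slave-side steps can be sketched as three small shell functions.
This is an illustration under assumptions, not a tested replication
setup: "master" is a placeholder host name, the function names are
invented here, "index" is assumed to be a symlink to the current
version, and the -a flag and trailing slashes on the rsync command are
additions to the command as quoted above. GNU cp is assumed for -l.

```shell
#!/bin/sh
# Slave-side sketch. Run from the directory that holds the "index" symlink.
# "master" is a placeholder host; the function names are hypothetical.

# Step 1: seed the new version with hard links to the current one, so that
# files unchanged on the master cost no extra disk space or transfer time.
seed_version() {       # usage: seed_version <stamp>
    cp -lpr index/ "index.$1"
}

# Step 2: pull the master's checkpoint over the seeded copy. By default
# rsync writes a changed file under a temporary name and renames it into
# place, so files still hard-linked from older versions are not modified.
pull_version() {       # usage: pull_version <stamp>
    rsync -a --delete "master:index.$1/" "index.$1/"
}

# Step 3: repoint the "index" symlink at the new version.
publish_version() {    # usage: publish_version <stamp>
    ln -fsn "index.$1" index
}
```

Called in order with the timestamp of the master's latest checkpoint,
these reproduce the sequence described above; older index.* directories
can be removed once no searcher has them open.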

Doug

