Posted to java-user@lucene.apache.org by Christoph Kiehl <ki...@subshell.com> on 2004/10/27 14:17:09 UTC
Backup strategies
Hi,
I'm curious about your strategy for backing up indexes based on FSDirectory.
If I do a file-based copy, I suspect I will get corrupted data because of
concurrent write access.
My current favorite is to create an empty index and use
IndexWriter.addIndexes() to copy the current index state. But I'm not
sure about the performance of this solution.
How do you make your backups?
Regards,
Christoph
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: Backup strategies
Posted by Justin Swanhart <gr...@gmail.com>.
I would suggest that you create a lock file for your index writing
process. If the writer encounters the lock file, it closes the
IndexWriter and keeps it closed until the lock file is removed. After
you create the lock file, wait a few seconds to make sure the writer
process has quiesced, then create a snapshot of the filesystem. Remove
the lock file and back up the snapshot with your favorite backup tool
(exclude the lock file), then drop the snapshot.
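
The procedure above might look like this on the backup side. This is only a sketch: the marker name backup.lock, the function name backup_with_lock and the default wait of 5 seconds are made up for illustration, and a hard-link copy (GNU "cp -lpr") stands in for a real filesystem snapshot such as an LVM snapshot, which the message does not specify.

```shell
#!/bin/sh
# Backup side of the lock-file procedure (illustrative sketch).
# Assumptions: marker name "backup.lock", GNU cp for -l, and a
# hard-link copy used in place of a true filesystem snapshot.

backup_with_lock() {
    index_dir=${1%/}   # live index directory (trailing slash removed)
    archive=$2         # tar file to write
    wait_secs=${3:-5}  # time given to the writer to quiesce

    lock="$index_dir/backup.lock"
    snap="$index_dir.snapshot.$$"

    # 1. Ask the indexing process to close its IndexWriter.
    touch "$lock" || return 1

    # 2. Give the writer a moment to finish and close.
    sleep "$wait_secs"

    # 3. Take the snapshot (stand-in: a directory of hard links).
    if ! cp -lpr "$index_dir" "$snap"; then
        rm -f "$lock"
        return 1
    fi

    # 4. Remove the lock so indexing can resume during the archive step.
    rm -f "$lock"

    # 5. Back up the snapshot without the lock file, then drop it.
    tar -cf "$archive" --exclude=backup.lock -C "$snap" .
    status=$?
    rm -rf "$snap"
    return "$status"
}
```

A call such as `backup_with_lock /data/index /backups/index.tar 10` (paths are hypothetical) would run the whole cycle. The indexing process still has to honor the lock file on its own.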
Swany
On Wed, 27 Oct 2004 14:40:20 +0200, Christoph Kiehl <ki...@subshell.com> wrote:
> Christiaan Fluit wrote:
>
> > I have no practical experience with backing up an online index, but I
> > would try to find out the details of the write lock mechanism used by
> > Lucene at the file level. You can then create a backup component that
> > write-locks the index and does a regular file copy of the index dir.
> > During backup time searches can continue while updates will be
> > temporarily blocked.
>
> The problem with this approach is that this will not only block write
> operations but you will get timeouts for these operations which will
> lead to exceptions. To prevent this you must implement some queuing,
> which is what I would like to avoid.
>
> Regards,
> Christoph
Re: Backup strategies
Posted by Christoph Kiehl <ki...@subshell.com>.
Christiaan Fluit wrote:
> I have no practical experience with backing up an online index, but I
> would try to find out the details of the write lock mechanism used by
> Lucene at the file level. You can then create a backup component that
> write-locks the index and does a regular file copy of the index dir.
> During backup time searches can continue while updates will be
> temporarily blocked.
The problem with this approach is that this will not only block write
operations but you will get timeouts for these operations which will
lead to exceptions. To prevent this you must implement some queuing,
which is what I would like to avoid.
Regards,
Christoph
Re: Backup strategies
Posted by Christiaan Fluit <ch...@aduna.biz>.
Christoph Kiehl wrote:
> I'm curious about your strategy to backup indexes based on FSDirectory.
> If I do a file based copy I suspect I will get corrupted data because of
> concurrent write access.
> My current favorite is to create an empty index and use
> IndexWriter.addIndexes() to copy the current index state. But I'm not
> sure about the performance of this solution.
I have no practical experience with backing up an online index, but I
would try to find out the details of the write lock mechanism used by
Lucene at the file level. You can then create a backup component that
write-locks the index and does a regular file copy of the index dir.
During backup time searches can continue while updates will be
temporarily blocked.
But as I said, I'm only speculating...
Chris
Re: Backup strategies
Posted by Nader Henein <ns...@bayt.net>.
We've recently implemented something similar: the backup process
creates a file (much like the lock files used during indexing) that the
IndexWriter recognizes (a tweak), so it doesn't attempt to start an
indexing run or a delete while the file is there. It wasn't that much
work, actually.
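
The writer-side check could be sketched roughly as follows. Note the swap: the message describes a tweak inside the IndexWriter itself, whereas this sketch puts the check in a wrapper script around the indexing job. The names wait_for_backup and backup.marker and the 60-second default timeout are hypothetical.

```shell
#!/bin/sh
# Wait until the backup marker file is gone before indexing starts.
# Returns 0 once the marker is absent, 1 if it is still present
# after the timeout (in seconds) has elapsed.

wait_for_backup() {
    marker=$1          # file created by the backup process
    timeout=${2:-60}   # maximum number of seconds to wait
    waited=0

    while [ -e "$marker" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1   # backup still running, give up
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

An indexing job could then be gated with `wait_for_backup /data/index/backup.marker 300 && run_indexing_job`, where run_indexing_job is a placeholder for whatever starts the writer. There is a small window between the check and the start of indexing, so the backup process should still wait briefly after creating the marker.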
Nader
Doug Cutting wrote:
> Christoph Kiehl wrote:
>
>> I'm curious about your strategy to backup indexes based on
>> FSDirectory. If I do a file based copy I suspect I will get corrupted
>> data because of concurrent write access.
>> My current favorite is to create an empty index and use
>> IndexWriter.addIndexes() to copy the current index state. But I'm not
>> sure about the performance of this solution.
>>
>> How do you make your backups?
>
>
> A safe way to back up is to have your indexing process, when it knows
> the index is stable (e.g., just after calling IndexWriter.close()),
> make a checkpoint copy of the index by running a shell command like
> "cp -lpr index index.YYYMMDDHHmmSS". This is very fast and requires
> little disk space, since it creates only a new directory of hard
> links. Then you can separately back this up and subsequently remove it.
>
> This is also a useful way to replicate indexes. On the master
> indexing server periodically perform "cp -lpr" as above. Then search
> slaves can use rsync to pull down the latest version of the index. If
> a very small mergefactor is used (e.g., 2) then the index will have
> only a few segments, so that searches are fast. On the slave,
> periodically find the latest index.YYYMMDDHHmmSS, use "cp -lpr index/
> index.YYYMMDDHHmmSS" and 'rsync --delete master:index.YYYMMDDHHmmSS
> index.YYYMMDDHHmmSS' to efficiently get a local copy, and finally "ln
> -fsn index.YYYMMDDHHmmSS index" to publish the new version of the index.
>
> Doug
Re: Backup strategies
Posted by Doug Cutting <cu...@apache.org>.
Christoph Kiehl wrote:
> I'm curious about your strategy to backup indexes based on FSDirectory.
> If I do a file based copy I suspect I will get corrupted data because of
> concurrent write access.
> My current favorite is to create an empty index and use
> IndexWriter.addIndexes() to copy the current index state. But I'm not
> sure about the performance of this solution.
>
> How do you make your backups?
A safe way to back up is to have your indexing process, when it knows the
index is stable (e.g., just after calling IndexWriter.close()), make a
checkpoint copy of the index by running a shell command like "cp -lpr
index index.YYYMMDDHHmmSS". This is very fast and requires little disk
space, since it creates only a new directory of hard links. Then you
can separately back this up and subsequently remove it.
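
As a rough sketch, the checkpoint step could be wrapped in a small shell function. The function name checkpoint_index is made up, the timestamp comes from date(1), and the -l option requires GNU cp. It is meant to be called by the indexing process right after IndexWriter.close().

```shell
#!/bin/sh
# Create a hard-link checkpoint of a stable index directory and
# print the name of the checkpoint directory on stdout.

checkpoint_index() {
    index_dir=${1%/}                 # e.g. "index" (trailing slash removed)
    stamp=$(date +%Y%m%d%H%M%S)      # timestamp suffix for the checkpoint
    checkpoint="$index_dir.$stamp"

    # Refuse to reuse an existing checkpoint name; cp would otherwise
    # nest the copy inside the existing directory.
    if [ -e "$checkpoint" ]; then
        return 1
    fi

    # -l: hard links instead of copies, -p: preserve attributes,
    # -r: recurse into the directory.
    cp -lpr "$index_dir" "$checkpoint" || return 1
    printf '%s\n' "$checkpoint"
}
```

A backup job could then run something like `dir=$(checkpoint_index /data/index) && tar -cf /backups/index.tar -C "$dir" . && rm -rf "$dir"` (paths are hypothetical). Because hard links share data with the live index, the checkpoint stays intact only as long as index files are replaced rather than overwritten in place; that is an assumption worth verifying for your setup.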
This is also a useful way to replicate indexes. On the master indexing
server periodically perform "cp -lpr" as above. Then search slaves can
use rsync to pull down the latest version of the index. If a very small
mergefactor is used (e.g., 2) then the index will have only a few
segments, so that searches are fast. On the slave, periodically find
the latest index.YYYMMDDHHmmSS, use "cp -lpr index/ index.YYYMMDDHHmmSS"
and 'rsync --delete master:index.YYYMMDDHHmmSS index.YYYMMDDHHmmSS' to
efficiently get a local copy, and finally "ln -fsn index.YYYMMDDHHmmSS
index" to publish the new version of the index.
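
The slave-side steps might be combined like this. It is a sketch under some assumptions: the function name pull_index and its arguments are made up, the -a option and the trailing slashes on the rsync paths are additions in this sketch so that the directory contents are synced recursively, and "index" on the slave is assumed to be either absent or a symlink created by an earlier run.

```shell
#!/bin/sh
# Pull one checkpoint from the master and publish it on the slave.
# Usage: pull_index SOURCE STAMP [BASE_DIR]
#   SOURCE   rsync source, e.g. master:index.<stamp> ("master" is a
#            placeholder host name) or a local path
#   STAMP    timestamp suffix of the checkpoint to fetch
#   BASE_DIR directory holding "index" and "index.<stamp>" (default ".")

pull_index() (
    # The body runs in a subshell so the cd does not affect the caller.
    src=${1%/}
    stamp=$2
    base=${3:-.}
    new="index.$stamp"

    cd "$base" || exit 1

    # Seed the new directory with hard links to the current local
    # version, so rsync only has to transfer what changed.
    if [ -d index/ ] && [ ! -e "$new" ]; then
        cp -lpr index/ "$new" || exit 1
    fi
    mkdir -p "$new" || exit 1

    # Fetch new and changed files, delete files the master dropped.
    # By default rsync writes a fresh file and renames it into place,
    # so files hard-linked into older versions are left untouched.
    rsync -a --delete "$src/" "$new/" || exit 1

    # Publish: repoint the "index" symlink at the new version.
    ln -fsn "$new" index
)
```

For example, `pull_index master:index.20041027141709 20041027141709 /data` would leave /data/index pointing at the freshly synced directory (host and paths are hypothetical). Avoid rsync's --inplace option here, since it would modify files that older versions still share through hard links.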
Doug