Posted to java-user@lucene.apache.org by Lance Norskog <go...@gmail.com> on 2013/06/06 07:38:18 UTC

Re: Taking backup of a Lucene index

The simple answer (that somehow nobody gave) is that you can make a copy 
of an index directory at any time. Indexes are changed in "generations". 
The segment* files describe the current generation of files. All active 
indexing goes on in new files. In a commit, all new files are flushed to 
disk and then the segment* files change. At any point in this sequence, 
all of the files in the directory form one consistent index.
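
To make the generation naming concrete, here is a minimal sketch (JDK only, not taken from Lucene) that finds the newest segments_N file in an index directory. It assumes the usual naming scheme in which N is the generation number written in base 36; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical helper, for illustration only. */
final class CurrentGeneration {

    private static final String PREFIX = "segments_";

    /** Returns the name of the segments_N file with the highest generation, if any. */
    static Optional<String> latestSegmentsFile(Path indexDir) throws IOException {
        String best = null;
        long bestGen = -1;
        // The glob matches segments_1, segments_a, ... but not segments.gen.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, PREFIX + "*")) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                long gen;
                try {
                    // Assumption: the suffix is the generation in base 36.
                    gen = Long.parseLong(name.substring(PREFIX.length()), Character.MAX_RADIX);
                } catch (NumberFormatException e) {
                    continue; // not a generation file, skip it
                }
                if (gen > bestGen) {
                    bestGen = gen;
                    best = name;
                }
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Comparing parsed generations rather than raw names matters because a plain string sort would put segments_z after segments_10, even though the latter is the newer generation.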

This isn't like MySQL or other databases where you have to shut down the 
DB to get a safe copy of the files.

Lance

On 04/17/2013 03:57 AM, Ashish Sarna wrote:
> I want to take a backup of a Lucene index. I need to ensure that index files
> would not change when I take their backup.
>
> I am concerned about the housekeeping/merge/optimization activities which
> Lucene performs internally. I am not sure when/how these activities are
> performed by Lucene and how we can prevent them.
>
> My application (which allows indexing and searching over the created
> indexes) keeps running in the background. I can ensure that nothing is
> written to the indexes by my application when I take their backup, but I am
> not sure whether indexes would change in some manner when a search is
> performed over them.
>
> How can I ensure that an index would not change (i.e., no
> housekeeping/merge/optimization activity is performed by Lucene) when I take
> its backup?
>
> Any help would be much appreciated.
>
> PS: Currently I am using Lucene 2.9.4 but wish to upgrade it to 3.6.2.
>
> Regards
>
> Ashish


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Taking backup of a Lucene index

Posted by Shai Erera <se...@gmail.com>.
Hi

Taking a backup of the index by doing a naive file copy is not a good
approach. As you mentioned, Lucene does background merging and if your
application suddenly commits, old segment files may be deleted. Also, your
backup will most probably include files that were not committed yet.

Rather, you should use SnapshotDeletionPolicy to take a snapshot of the
index, then copy all the files referenced by the snapshot.
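
The copy step can be sketched with the JDK alone. The helper below is hypothetical (the class and method names are not part of Lucene). It assumes you have already taken a snapshot through SnapshotDeletionPolicy and obtained the file names of that commit point, for example from IndexCommit.getFileNames(). The snapshot and release calls themselves are left out because their signatures differ between Lucene versions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collection;

/** Hypothetical helper: copies exactly the files a snapshot refers to. */
final class SnapshotFileCopy {

    /**
     * Copies each named file from indexDir to backupDir and returns the
     * total number of bytes copied.
     *
     * Assumption: fileNames is the list reported by the snapshot's commit
     * point, and the snapshot is not released until this method returns,
     * so none of these files is deleted during the copy.
     */
    static long copyCommitFiles(Path indexDir, Path backupDir, Collection<String> fileNames)
            throws IOException {
        Files.createDirectories(backupDir);
        long bytes = 0;
        for (String name : fileNames) {
            Path source = indexDir.resolve(name);
            Path target = backupDir.resolve(name);
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            bytes += Files.size(target);
        }
        return bytes;
    }
}
```

Because only the listed names are copied, files that belong to uncommitted or in-progress segments are left out of the backup.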

You can also try the new Replicator module (will be available in Lucene
4.4) to take periodic backups of the index with very few steps required on
your end.
You can read about it here:
http://shaierera.blogspot.com/2013/05/the-replicator.html

Shai


On Thu, Jun 6, 2013 at 11:14 AM, Daniel Penning <dp...@gamona.de> wrote:

> I do my backups by creating a new index at the backup target and copying
> everything over with IndexWriter#addIndexes(IndexReader... readers). In
> the future I am also planning on using a RateLimitedDirectoryWrapper to
> reduce the influence of the running backup on the rest of the system.
>
> On 06.06.2013 09:43, Thomas Matthijs wrote:
>
>> On Thu, Jun 6, 2013 at 7:38 AM, Lance Norskog <go...@gmail.com> wrote:
>>
>>> The simple answer (that somehow nobody gave) is that you can make a copy
>>> of an index directory at any time. Indexes are changed in "generations".
>>> The segment* files describe the current generation of files. All active
>>> indexing goes on in new files. In a commit, all new files are flushed to
>>> disk and then the segment* files change. At any point in this sequence,
>>> all of the files in the directory form one consistent index.
>>>
>>> This isn't like MySQL or other databases where you have to shut down the
>>> DB to get a safe copy of the files.
>>
>> If you just do a naive copy, where it gets a file list first and then
>> copies the files, segments can be merged during the copy and deleted by
>> Lucene, resulting in an incomplete backup. That is why you need the
>> snapshot policy to keep them around until the copy is completed.
>>
>> If you have very few updates and don't mind risking a broken index, or you
>> just loop rsync until both sides are equal, then indeed you don't need
>> anything else.

Re: Taking backup of a Lucene index

Posted by Daniel Penning <dp...@gamona.de>.
I do my backups by creating a new index at the backup target and copying
everything over with IndexWriter#addIndexes(IndexReader... readers). In
the future I am also planning on using a RateLimitedDirectoryWrapper to
reduce the influence of the running backup on the rest of the system.

On 06.06.2013 09:43, Thomas Matthijs wrote:
> On Thu, Jun 6, 2013 at 7:38 AM, Lance Norskog <go...@gmail.com> wrote:
>
>> The simple answer (that somehow nobody gave) is that you can make a copy
>> of an index directory at any time. Indexes are changed in "generations".
>> The segment* files describe the current generation of files. All active
>> indexing goes on in new files. In a commit, all new files are flushed to
>> disk and then the segment* files change. At any point in this sequence, all
>> of the files in the directory form one consistent index.
>>
>> This isn't like MySQL or other databases where you have to shut down the
>> DB to get a safe copy of the files.
>
> If you just do a naive copy, where it gets a file list first and then
> copies the files, segments can be merged during the copy and deleted by
> Lucene, resulting in an incomplete backup. That is why you need the
> snapshot policy to keep them around until the copy is completed.
>
> If you have very few updates and don't mind risking a broken index, or you
> just loop rsync until both sides are equal, then indeed you don't need
> anything else.
>


Re: Taking backup of a Lucene index

Posted by Thomas Matthijs <li...@selckin.be>.
On Thu, Jun 6, 2013 at 7:38 AM, Lance Norskog <go...@gmail.com> wrote:

> The simple answer (that somehow nobody gave) is that you can make a copy
> of an index directory at any time. Indexes are changed in "generations".
> The segment* files describe the current generation of files. All active
> indexing goes on in new files. In a commit, all new files are flushed to
> disk and then the segment* files change. At any point in this sequence, all
> of the files in the directory form one consistent index.
>
> This isn't like MySQL or other databases where you have to shut down the
> DB to get a safe copy of the files.


If you just do a naive copy, where it gets a file list first and then
copies the files, segments can be merged during the copy and deleted by
Lucene, resulting in an incomplete backup. That is why you need the
snapshot policy to keep them around until the copy is completed.

If you have very few updates and don't mind risking a broken index, or you
just loop rsync until both sides are equal, then indeed you don't need
anything else.
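
For anyone without rsync at hand, the "loop until both sides are equal" idea could look roughly like the sketch below. It is a JDK-only approximation, not a replacement for rsync: it treats the index as a flat directory and considers a file unchanged when name and size match, which is an assumption (rsync's default quick check also looks at modification time). All names are made up for illustration, and it carries the same risk of a broken copy under heavy indexing.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: repeats a one-way sync until a pass changes nothing. */
final class MirrorUntilStable {

    /** One pass over a flat directory; returns how many changes were needed. */
    static int syncOnce(Path source, Path target) throws IOException {
        Files.createDirectories(target);
        int changes = 0;
        Set<String> present = new HashSet<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(source)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) continue;
                String name = file.getFileName().toString();
                Path copy = target.resolve(name);
                try {
                    if (!Files.exists(copy) || Files.size(copy) != Files.size(file)) {
                        Files.copy(file, copy, StandardCopyOption.REPLACE_EXISTING);
                        changes++;
                    }
                    present.add(name);
                } catch (NoSuchFileException e) {
                    // The file vanished mid-pass (e.g. merged away): force another pass.
                    changes++;
                }
            }
        }
        // Collect stale copies first, then delete them.
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> copies = Files.newDirectoryStream(target)) {
            for (Path copy : copies) {
                if (Files.isRegularFile(copy) && !present.contains(copy.getFileName().toString())) {
                    stale.add(copy);
                }
            }
        }
        for (Path copy : stale) {
            Files.deleteIfExists(copy);
            changes++;
        }
        return changes;
    }

    /** Returns the number of the first pass that found nothing to do, or -1 if none did. */
    static int mirrorUntilStable(Path source, Path target, int maxPasses) throws IOException {
        for (int pass = 1; pass <= maxPasses; pass++) {
            if (syncOnce(source, target) == 0) return pass;
        }
        return -1;
    }
}
```

A return value of -1 means the index kept changing for all maxPasses passes, in which case the copy should not be trusted and the snapshot approach is the safer route.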