Posted to solr-user@lucene.apache.org by Chris Harris <ry...@gmail.com> on 2009/09/25 01:27:00 UTC

Use cases for ReplicationHandler's backup facility?

The ReplicationHandler (http://wiki.apache.org/solr/SolrReplication)
has support for "backups", which can be triggered in one of two ways:

1. in response to startup/commit/optimize events (configured through
the backupAfter tag inside the handler's requestHandler tag in
solrconfig.xml)
2. by manually hitting http://master_host:port/solr/replication?command=backup

These backups get placed in directories named, e.g.
"snapshot.20090924033521", inside the solr data directory.
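
For anyone who wants to script around this, here is a minimal Python sketch of
the two pieces described above: building the manual backup URL and finding the
snapshot directories in the data directory. The function names are made up for
illustration, and the yyyyMMddHHmmss timestamp format is an assumption inferred
from the example directory name, not something confirmed by the docs.

```python
from datetime import datetime
from pathlib import Path
from urllib.parse import urlencode


def backup_url(host: str, port: int) -> str:
    """Build the manual backup URL in the form quoted above."""
    query = urlencode({"command": "backup"})
    return f"http://{host}:{port}/solr/replication?{query}"


def snapshot_time(dirname: str) -> datetime:
    """Parse the suffix of a name like 'snapshot.20090924033521'.

    Assumes the suffix is yyyyMMddHHmmss (inferred from the example name).
    """
    prefix = "snapshot."
    if not dirname.startswith(prefix):
        raise ValueError(f"not a snapshot directory name: {dirname!r}")
    return datetime.strptime(dirname[len(prefix):], "%Y%m%d%H%M%S")


def list_snapshots(data_dir: str) -> list:
    """Return snapshot directories under data_dir, oldest first."""
    found = []
    for path in Path(data_dir).iterdir():
        if not path.is_dir():
            continue
        try:
            found.append((snapshot_time(path.name), path))
        except ValueError:
            # Not a snapshot directory (e.g. the live index directory).
            continue
    return [path for _, path in sorted(found, key=lambda item: item[0])]
```

Fetching the URL returned by backup_url (for example with
urllib.request.urlopen) would be the scripted equivalent of option 2.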

According to the docs, these backups are not necessary for replication
to work. My question is: What use case *are* they meant to address?

The first potential use case that came to mind was that maybe I would
be able to restore my index from these snapshot directories should it
ever become corrupted. (I could just do something like "rm -r data; mv
snapshot.20090924033521 data".) That appears not to be one of the
intended use cases, though; if it were, then I imagine the snapshot
directories would contain the entire index, whereas they seem to
contain only deltas of one form or another.

Thanks,
Chris

Re: Use cases for ReplicationHandler's backup facility?

Posted by Chris Harris <ry...@gmail.com>.
2009/9/24 Noble Paul നോബിള്‍  नोब्ळ् <no...@corp.aol.com>:
> On Fri, Sep 25, 2009 at 4:57 AM, Chris Harris <ry...@gmail.com> wrote:
>> The ReplicationHandler (http://wiki.apache.org/solr/SolrReplication)
>> has support for "backups", which can be triggered in one of two ways:
>>
>> 1. in response to startup/commit/optimize events (specified through
>> the backupAfter tag specified in the handler's requestHandler tag in
>> solrconfig.xml)
>> 2. by manually hitting http://master_host:port/solr/replication?command=backup
>>
>> These backups get placed in directories named, e.g.
>> "snapshot.20090924033521", inside the solr data directory.
>>
>> According to the docs, these backups are not necessary for replication
>> to work. My question is: What use case *are* they meant to address?
>>
>> The first potential use case that came to mind was that maybe I would
>> be able to restore my index from these snapshot directories should it
>> ever become corrupted. (I could just do something like "rm -r data; mv
>> snapshot.20090924033521 data".) That appears not to be one of the
>> intended use cases, though; if it were, then I imagine the snapshot
>> directories would contain the entire index, whereas they seem to
>> contain only deltas of one form or another.
> Yes, the only reason to take a backup should be for restoration/archival
> They should contain all the files required for the latest commit point.

To be clear, you'd have to write your own code to make any kind of
restore from these snapshot directories possible, right? (That
is, the handler itself doesn't implement any kind of "restore", nor
can you restore by using simple filesystem commands like cp -r or mv.)

For example, the most straightforward case would be if you limited
yourself to only doing backups after each optimize; that's
straightforward in that each snapshot directory should contain all the
segment files required for a particular point-in-time view of the
index. However, it still wouldn't contain the Lucene segments_N file,
and it seems like to implement an index restore you'd need to try to
reconstitute that somehow.
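
One way to check this on a given snapshot is to look for a Lucene commit file
directly. The sketch below is only an illustration: the helper name is made up,
and the file-name pattern assumes the usual Lucene convention of
segments_<generation> with the generation written in lowercase base 36.

```python
import re
from pathlib import Path

# Lucene commit files are named segments_<generation>; segments.gen is a
# separate helper file and is deliberately not matched here.
SEGMENTS_FILE = re.compile(r"^segments_[0-9a-z]+$")


def commit_point_files(snapshot_dir: str) -> list:
    """Return the names of any segments_N files found in snapshot_dir."""
    return sorted(
        path.name
        for path in Path(snapshot_dir).iterdir()
        if path.is_file() and SEGMENTS_FILE.match(path.name)
    )
```

An empty list would support the observation above; a non-empty one would mean
the snapshot does carry its own commit point.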

Re: Use cases for ReplicationHandler's backup facility?

Posted by Chris Harris <ry...@gmail.com>.
2009/9/24 Noble Paul നോബിള്‍  नोब्ळ् <no...@corp.aol.com>:
> Yes, the only reason to take a backup should be for restoration/archival
> They should contain all the files required for the latest commit point.

Ok, I think I get it now. I assumed "all the files required for the
latest commit point" meant that the backup would only contain those
index files added since the last commit. But instead what you mean is
that it contains all the files still required for the index as of the
last commit, regardless of how long ago they were created. This is
indeed the behavior I see if I play around with the replication
handler on a toy index.
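
To spell out the two readings, here is a toy Python model. It does not touch
Solr or Lucene at all; the commit history is just a list of sets of file names,
and the file names themselves are invented for the example.

```python
def added_since_previous_commit(commits: list) -> set:
    """What I first assumed: only the files new in the latest commit."""
    if len(commits) < 2:
        return set(commits[-1])
    return set(commits[-1]) - set(commits[-2])


def required_for_latest_commit(commits: list) -> set:
    """What the backup actually holds: every file the latest commit needs."""
    return set(commits[-1])


# Hypothetical history: the second commit keeps _0.cfs and adds _1.cfs.
history = [
    {"_0.cfs", "segments_1"},
    {"_0.cfs", "_1.cfs", "segments_2"},
]
delta = added_since_previous_commit(history)
full = required_for_latest_commit(history)
```

Here delta leaves out _0.cfs, while full includes it even though it was written
before the previous commit.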

I think I may be having some problems with the backup facility on much
larger indexes, but I'll start a new thread for that.

Thanks,
Chris

Re: Use cases for ReplicationHandler's backup facility?

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
On Fri, Sep 25, 2009 at 4:57 AM, Chris Harris <ry...@gmail.com> wrote:
> The ReplicationHandler (http://wiki.apache.org/solr/SolrReplication)
> has support for "backups", which can be triggered in one of two ways:
>
> 1. in response to startup/commit/optimize events (specified through
> the backupAfter tag specified in the handler's requestHandler tag in
> solrconfig.xml)
> 2. by manually hitting http://master_host:port/solr/replication?command=backup
>
> These backups get placed in directories named, e.g.
> "snapshot.20090924033521", inside the solr data directory.
>
> According to the docs, these backups are not necessary for replication
> to work. My question is: What use case *are* they meant to address?
>
> The first potential use case that came to mind was that maybe I would
> be able to restore my index from these snapshot directories should it
> ever become corrupted. (I could just do something like "rm -r data; mv
> snapshot.20090924033521 data".) That appears not to be one of the
> intended use cases, though; if it were, then I imagine the snapshot
> directories would contain the entire index, whereas they seem to
> contain only deltas of one form or another.
Yes, the only reason to take a backup should be for restoration/archival.
They should contain all the files required for the latest commit point.


>
> Thanks,
> Chris
>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com