Posted to solr-user@lucene.apache.org by Tim Heckman <th...@gmail.com> on 2010/12/14 16:31:07 UTC

Solr 1.4 replication, cleaning up old indexes

When using the index replication over HTTP that was introduced in Solr
1.4, what is the recommended way to periodically clean up old indexes
on the slaves?

I found references to the snapcleaner script, but that seems to be for
the older ssh/rsync replication model.


thanks,
Tim

Re: Solr 1.4 replication, cleaning up old indexes

Posted by Shawn Heisey <so...@elyograg.org>.
On 12/14/2010 9:13 AM, Tim Heckman wrote:
> Once per day in the morning, I run a full index + optimize into an "on
> deck" core. When this is complete, I swap the "on deck" with the live
> core. A side-effect of this is that the version number / generation of
> the live index just went backwards, since the "on deck" core does not
> receive the 3x-per-hour deltas during the rest of the day.
>
> The index directories that hang around have timestamps corresponding
> to the daily full update, when the version number goes backward.

Now that you mention it, I too have really only noticed this problem when 
I'm fiddling with things and doing a full reindex, which is the only time 
I swap cores on my master servers.  Since full reindexes don't happen very 
often, that explains why I don't see the problem 95% of the time.

I found SOLR-1781 and posted a comment on it.

https://issues.apache.org/jira/browse/SOLR-1781

Shawn


Re: Solr 1.4 replication, cleaning up old indexes

Posted by Tim Heckman <th...@gmail.com>.
On Tue, Dec 14, 2010 at 10:37 AM, Shawn Heisey <so...@elyograg.org> wrote:
> It's supposed to take care of removing the old indexes on its own - when
> everything is working, it builds an index.<timestamp> directory, replicates,
> swaps that directory in to replace the index directory, and deletes the one
> with the timestamp.  I have not been able to figure out what circumstances make this
> process break down and cause Solr to simply use the timestamp directory
> as-is, without deleting the old one.  For me, it works most of the time.
>  I'm running 1.4.1.

Interesting. I'm also running 1.4.1. Looking more closely at my index
directories and my update strategy, I see a pattern.

I run delta updates into my "live" replicated core every 20 minutes
during the day. Each time this happens, of course, the index version and
generation are incremented.

Once per day in the morning, I run a full index + optimize into an "on
deck" core. When this is complete, I swap the "on deck" with the live
core. A side-effect of this is that the version number / generation of
the live index just went backwards, since the "on deck" core does not
receive the 3x-per-hour deltas during the rest of the day.

The index directories that hang around have timestamps corresponding
to the daily full update, when the version number goes backward.
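(For reference, a daily swap like the one described above is typically done with the CoreAdmin SWAP action; this is just a sketch, and the host, port, and core names "live" / "ondeck" are placeholders for whatever this setup actually uses:)

```shell
# CoreAdmin SWAP: exchange the "ondeck" core with the live one.
# Host, port, and core names below are placeholders.
curl 'http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=ondeck'
```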

Re: Solr 1.4 replication, cleaning up old indexes

Posted by Shawn Heisey <so...@elyograg.org>.
On 12/14/2010 8:31 AM, Tim Heckman wrote:
> When using the index replication over HTTP that was introduced in Solr
> 1.4, what is the recommended way to periodically clean up old indexes
> on the slaves?
>
> I found references to the snapcleaner script, but that seems to be for
> the older ssh/rsync replication model.

It's supposed to take care of removing the old indexes on its own - when 
everything is working, it builds an index.<timestamp> directory, 
replicates, swaps that directory in to replace the index directory, and 
deletes the one with the timestamp.  I have not been able to figure out what 
circumstances make this process break down and cause Solr to simply use 
the timestamp directory as-is, without deleting the old one.  For me, it 
works most of the time.  I'm running 1.4.1.

Shawn
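(For anyone who needs to clean up by hand when this breaks down, here is a rough sketch of a cleanup function for a slave's data directory. It assumes the layout Shawn describes: the live directory is whatever `index.properties` points at, or plain `index` when no `index.properties` exists. The data directory path is a placeholder, and it would be safest to stop the slave or disable replication polling before running anything like this:)

```shell
#!/bin/sh
# Sketch only: remove index.<timestamp> directories that are no longer
# referenced by index.properties in a slave core's data directory.
clean_stale_indexes() {
    data_dir="$1"
    # index.properties names the directory the core is actually using,
    # e.g. "index=index.20101214093000"; fall back to plain "index".
    active=$(sed -n 's/^index=//p' "$data_dir/index.properties" 2>/dev/null)
    [ -n "$active" ] || active=index
    for dir in "$data_dir"/index.*; do
        [ -d "$dir" ] || continue                    # skips index.properties itself
        [ "$(basename "$dir")" = "$active" ] && continue
        echo "removing stale $dir"
        rm -rf "$dir"
    done
}
```

Usage would be something like `clean_stale_indexes /var/solr/data` (again, the path is a placeholder for the core's dataDir).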