Posted to solr-user@lucene.apache.org by escher2k <es...@yahoo.com> on 2007/02/13 20:59:19 UTC

Incremental replication...

I was wondering whether the scripts provided with Solr do incremental
replication. Looking at the snapshooter script, it seems like the whole index
directory is copied over. Is that correct? If so, isn't performance a
problem over the long run? Thanks in advance for the clarification (I hope
I am wrong!).


Re: Incremental replication...

Posted by Kevin Lewandowski <ke...@gmail.com>.
snapshooter copies all files but most files in the snapshot
directories are hard links pointing to segments in the main index
directory. So only new segments end up getting copied.

We've been running replication on discogs.com for several months and
it works great.


Re: Incremental replication...

Posted by Bill Au <bi...@gmail.com>.
FYI, additional information on replication is available in the Solr wiki:

http://wiki.apache.org/solr/CollectionDistribution

Bill


Re: Incremental replication...

Posted by Bertrand Delacretaz <bd...@apache.org>.
On 2/13/07, escher2k <es...@yahoo.com> wrote:

> ...At least from looking at the snapshooter script, it doesn't
> seem to be doing anything specific...

The snapshooter script only makes an "instant snapshot" of the index
directory using cp -lr. This does not involve any copying of index
data.
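
A minimal sketch of why the snapshot is "instant", for anyone who wants to
check it locally. The temporary directory and the file name _0.cfs are made
up for the demo, and it assumes a cp that supports -l (GNU coreutils does).
After cp -lr, the file in the snapshot and the file in the index are the same
inode, so no index data was copied:

```shell
# Sketch only: paths and file names are hypothetical; assumes GNU cp (-l = hard link).
work=$(mktemp -d)
mkdir "$work/index"
printf 'segment data\n' > "$work/index/_0.cfs"

# Same kind of call the snapshooter fragment uses: link the files, do not copy them.
cp -lr "$work/index" "$work/snapshot.demo"

# -ef is true when both paths refer to the same device and inode.
if [ "$work/index/_0.cfs" -ef "$work/snapshot.demo/_0.cfs" ]; then
    same_inode=yes
else
    same_inode=no
fi
echo "same inode: $same_inode"

# Remove the demo directory.
rm -rf "$work"
```

Because both names point at the same data on disk, the snapshot costs only
directory entries, however large the index is.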

The actual replication is done using rsync in the other scripts, by
copying the index snapshot elsewhere.

Rsync only copies what has changed since the last copy, and not many
files change in a Lucene index when adding documents, so it's correct
that replication uses little bandwidth when adding documents.

Index optimization, OTOH, causes much larger changes in the index
directory, so after an optimization rsync will usually have much more
data to transfer.
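
A rough sketch of that behaviour with plain rsync between two local
directories. This is not what the Solr distribution scripts run; the
directory names and file names are invented, and only the standard rsync
options -a, -n, -i and --delete are used. The dry run after the "commit"
lists only the newly added file:

```shell
# Sketch only: directory and file names are hypothetical; both ends are local.
work=$(mktemp -d)
mkdir "$work/master" "$work/replica"
printf 'old segment\n' > "$work/master/_0.cfs"

second_run=""
if command -v rsync >/dev/null 2>&1; then
    # First sync: the replica is empty, so everything is transferred.
    rsync -a --delete "$work/master/" "$work/replica/"

    # Simulate adding documents: one new segment file, the old one untouched.
    printf 'new segment\n' > "$work/master/_1.cfs"

    # -n is a dry run, -i itemizes what the next sync would change.
    second_run=$(rsync -a -n -i --delete "$work/master/" "$work/replica/")
    echo "$second_run"
else
    echo "rsync not installed, skipping demo"
fi

# Remove the demo directory.
rm -rf "$work"
```

After an optimize the old segment files are replaced by newly written ones,
so the same dry run would list nearly everything in the directory.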

-Bertrand

RE: Incremental replication...

Posted by escher2k <es...@yahoo.com>.

Graham Stead-2 wrote:
> 
> We have used replication for a few weeks now and it generally works well.
> 
> I believe you'll find that commit operations cause only new segments to be
> transferred, whereas optimize operations cause the entire index to be
> transferred. Therefore, the amount of data transferred really depends on
> how frequently you index new data and how often you call <commit/> and
> <optimize/>.
> 
> Hope this helps,
> -Graham

Thanks Graham. At least from looking at the snapshooter script, it doesn't
seem to be doing anything specific. The following is a fragment from the
script:

snap_name=snapshot.`date +"%Y%m%d%H%M%S"`
name=${data_dir}/${snap_name}
temp=${data_dir}/temp-${snap_name}

if [[ -d ${name} ]]
then
    logMessage snapshot directory ${name} already exists
    logExit aborted 1
fi

if [[ -d ${temp} ]]
then
    logMessage snapshoting of ${name} in progress
    logExit aborted 1
fi

# clean up after INT/TERM
trap 'echo cleaning up, please wait ...;/bin/rm -rf ${name} ${temp};logExit aborted 13' INT TERM

logMessage taking snapshot ${name}

# take a snapshot using hard links into temporary location
# then move it into place atomically
cp -lr ${data_dir}/index ${temp}
mv ${temp} ${name}


RE: Incremental replication...

Posted by Graham Stead <gs...@ieee.org>.
We have used replication for a few weeks now and it generally works well.

I believe you'll find that commit operations cause only new segments to be
transferred, whereas optimize operations cause the entire index to be
transferred. Therefore, the amount of data transferred really depends on how
frequently you index new data and how often you call <commit/> and
<optimize/>.

Hope this helps,
-Graham
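
To put a rough number on the commit-versus-optimize point above, here is a
sketch that compares two hard-link snapshots and adds up the size of the
files that are new in the later one. The snapshot names, file names and
contents are made up, it assumes a cp that supports -l, and it is only an
estimate of what a sync would have to send, not a measurement of rsync
itself:

```shell
# Sketch only: snapshot names and contents are hypothetical; assumes GNU cp (-l = hard link).
work=$(mktemp -d)
mkdir "$work/index"
printf 'aaaa' > "$work/index/_0.cfs"
cp -lr "$work/index" "$work/snapshot.1"

# Simulate a commit: one new 2-byte segment file, existing files untouched.
printf 'bb' > "$work/index/_1.cfs"
cp -lr "$work/index" "$work/snapshot.2"

# A file in snapshot.2 that is not the same inode as the file of the same
# name in snapshot.1 counts as new data.
new_bytes=0
for f in "$work/snapshot.2"/*; do
    old="$work/snapshot.1/$(basename "$f")"
    if [ ! "$f" -ef "$old" ]; then
        size=$(wc -c < "$f")
        new_bytes=$((new_bytes + size))
    fi
done
echo "bytes in new files: $new_bytes"

# Remove the demo directory.
rm -rf "$work"
```

Here only _1.cfs is counted. If an optimize had rewritten the index into new
files between the two snapshots, almost every file would be counted instead.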