Posted to solr-user@lucene.apache.org by Robert Stewart <bs...@gmail.com> on 2012/08/07 18:25:03 UTC

replication from lucene to solr

Hi,

I have a client who uses Lucene in a home-grown CMS they
developed in Java.  They have a lot of code that uses the Lucene API
directly and they can't change it now.  But they also need to use SOLR
for some other apps which must use the same Lucene index data.  So I
need a good way to periodically replicate the Lucene index to
SOLR.  I know how to make efficient Lucene index snapshots from within
their CMS Java app (basically using the same method as the old
replication scripts, using hard-links, etc.) - assuming I have a new
index snapshot, how can I tell a running SOLR instance to start using
the new index snapshot instead of its current index, and also how can
I configure SOLR to use the latest "snapshot" directory on re-start?
Assume I create new index snapshots into a directory such that each
new snapshot is a folder named with a timestamp in the format
YYYYMMDDHHMMSS.  Is there any way to configure SOLR to look someplace
for new index snapshots (some multi-core setup, perhaps)?

Thanks!
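
A minimal sketch of the snapshot layout described above, using only the
JDK. It assumes the intended folder name is year, month, day, hour,
minute, second (yyyyMMddHHmmss), so that names sort chronologically. The
class and method names (SnapshotDirs, createSnapshot, latestSnapshot) are
hypothetical, and the sketch assumes the index is not being changed while
the links are made; a live index would need its commit point held open by
the application first. Hard links also require the snapshot root to be on
the same filesystem as the index.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene or Solr.
final class SnapshotDirs {
    // Assumed folder name format: sorts lexicographically in time order.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Hard-links every regular file of indexDir into a new timestamped
    // folder under snapshotsRoot and returns that folder.
    static Path createSnapshot(Path indexDir, Path snapshotsRoot, LocalDateTime when)
            throws IOException {
        Path target = snapshotsRoot.resolve(STAMP.format(when));
        Files.createDirectories(target);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                // Skip Lucene's lock file; it is not index data.
                if (Files.isRegularFile(file) && !name.equals("write.lock")) {
                    Files.createLink(target.resolve(name), file);
                }
            }
        }
        return target;
    }

    // Returns the newest snapshot folder, judged by its 14-digit name.
    static Optional<Path> latestSnapshot(Path snapshotsRoot) throws IOException {
        try (Stream<Path> entries = Files.list(snapshotsRoot)) {
            return entries
                    .filter(Files::isDirectory)
                    .filter(p -> p.getFileName().toString().matches("\\d{14}"))
                    .max(Comparator.comparing((Path p) -> p.getFileName().toString()));
        }
    }
}
```

The path returned by latestSnapshot could be resolved at start-up and
handed to SOLR as the index location, for instance through the dataDir
setting in solrconfig.xml; that wiring is an assumption about the
deployment, not something settled in this thread.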

Re: replication from lucene to solr

Posted by Lance Norskog <go...@gmail.com>.
Look at how the older rsync-based replication scripts work: snapshooter
takes a hard-linked snapshot of the master index, and snappuller uses the
Unix rsync program to very efficiently spot and copy updated files. The
pull runs from each query slave, just like Java replication. Unlike Java
replication, it just uses the SSH copy protocol, and does not talk to the
master indexing Solr program.

You can run the snapshooter against any directory with a Lucene index.
An actively updated index will work great.

The key to this replicator is that Lucene never saves inconsistent
data on disk: it writes new data, then updates the master list of
what is new data, then deletes the old data. You can copy a Lucene
index at any point in time and it will be consistent.
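
The "master list" here is the segments_N file, where N is a generation
number that Lucene writes in base 36; a copied directory is read from its
newest segments_N. As a rough sanity check on a copy, the sketch below
finds the newest generation purely from file names. The class and method
names (CommitPoints, latestGeneration) are hypothetical, and it only
inspects names; it does not verify that the files the commit refers to
are present, which Lucene's own reader would do when opening the index.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene or Solr.
final class CommitPoints {
    private static final String PREFIX = "segments_";

    // Returns the highest commit generation found in indexDir, or empty
    // if the directory holds no segments_N file at all.
    static OptionalLong latestGeneration(Path indexDir) throws IOException {
        try (Stream<Path> files = Files.list(indexDir)) {
            return files
                    .map(p -> p.getFileName().toString())
                    // Excludes "segments.gen" and non-commit files.
                    .filter(n -> n.startsWith(PREFIX) && n.length() > PREFIX.length())
                    // The suffix is the generation in radix 36.
                    .mapToLong(n -> Long.parseLong(
                            n.substring(PREFIX.length()), Character.MAX_RADIX))
                    .max();
        }
    }
}
```

A snapshot folder for which latestGeneration returns empty has no commit
to open and should not be handed to a searcher.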

-- 
Lance Norskog
goksron@gmail.com