Posted to solr-user@lucene.apache.org by Paolo Castagna <ca...@googlemail.com> on 2010/12/09 11:30:43 UTC

Solr replication, HAproxy and data management

Hi,
we are using Solr v1.4.x with multi-cores and a master/slaves configuration.
We also use HAProxy [1] to load balance search requests amongst slaves.
Finally, we use MapReduce to create new Solr indexes.

I'd like to share with you what I am doing when I need to:

  1. add a new index
  2. replace an existing index with a new/updated one
  3. add a slave
  4. remove a slave (or a slave died)

I am interested in knowing what the best practices are in these scenarios.


1. add a new index

Copy the index on the master in the correct location.
Use CREATE [2] to load the new index:
http://host:port/solr/admin/cores?action=CREATE&name=[...]&instanceDir=[...]&dataDir=[...]
Use CREATE to create a new empty index/core on each slave.
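
For what it's worth, these CREATE calls are easy to script. A minimal
sketch (the host names, core name and directories are made up; only the
CoreAdmin parameters come from [2]):

```python
import urllib.parse

def core_admin_url(base, action, **params):
    # Build a CoreAdmin URL, e.g. .../admin/cores?action=CREATE&name=...
    query = urllib.parse.urlencode({"action": action, **params})
    return "%s/solr/admin/cores?%s" % (base, query)

# CREATE the new core on the master, pointing at the copied index
# (hypothetical host and directories):
print(core_admin_url("http://master:8983", "CREATE",
                     name="books",
                     instanceDir="books",
                     dataDir="books/data"))

# ... and an empty core with the same name on each slave:
for slave in ("http://slave1:8983", "http://slave2:8983"):
    print(core_admin_url(slave, "CREATE", name="books", instanceDir="books"))
```

Each URL can then be fetched with urllib.request.urlopen (or curl) to
actually issue the command.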


2. replace an existing index with a new/updated one

Copy the index on the master in the correct location.
Use CREATE [2] to load the new index.
Use SWAP [3] to swap the old index with the new one.
http://host:port/solr/admin/cores?action=SWAP&core=[...]&other=[...]
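
The CREATE-then-SWAP sequence could be scripted like this; a sketch with
hypothetical host and core names (the actual HTTP call is left commented
out):

```python
import urllib.parse
import urllib.request

MASTER = "http://master:8983"  # hypothetical host

def core_admin(action, **params):
    # Build (and, if uncommented, issue) a CoreAdmin command on the master.
    url = "%s/solr/admin/cores?%s" % (
        MASTER, urllib.parse.urlencode({"action": action, **params}))
    # urllib.request.urlopen(url)  # uncomment to actually send the command
    return url

# 1. load the freshly copied index under a temporary core name
core_admin("CREATE", name="books-new",
           instanceDir="books-new", dataDir="books-new/data")

# 2. swap it with the live core; searches keep being served throughout
core_admin("SWAP", core="books", other="books-new")
```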

Updates for that core on the master can continue during the operation,
can't they?

Or

Use UNLOAD [4] to remove the core from the master.
http://host:port/solr/admin/cores?action=UNLOAD&core=[...]
Copy the index on the master in the correct location.
Use CREATE [2] to load the new index.
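
The UNLOAD variant as the same kind of sketch (hypothetical host, core
name and paths):

```python
import urllib.parse

MASTER = "http://master:8983"  # hypothetical host

def core_admin_url(action, **params):
    # Build a CoreAdmin URL for the master.
    query = urllib.parse.urlencode({"action": action, **params})
    return "%s/solr/admin/cores?%s" % (MASTER, query)

# 1. remove the core from the master (queue updates in the meantime)
print(core_admin_url("UNLOAD", core="books"))

# 2. copy the new index into place, e.g.
#    shutil.copytree("/mapreduce/output/books", "/opt/solr/books/data")

# 3. load the new index under the same core name
print(core_admin_url("CREATE", name="books",
                     instanceDir="books", dataDir="books/data"))
```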

Updates for that core on the master are not possible (but we queue
updates, so for us it just means delaying a few updates for a few seconds).

Doing this I noticed something strange, though I am not sure what the
problem was: the index version and generation on the master were
different from the index version and generation on the slave, but
replication did not happen. A RELOAD on the master seemed to trigger the
replication.
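
To debug that, it may help to compare what the per-core
ReplicationHandler reports on each side, and to force a poll from the
slave instead of RELOADing; a sketch with hypothetical hosts
(indexversion, details and fetchindex are ReplicationHandler commands):

```python
import urllib.parse

def replication_url(base, core, command):
    # Per-core ReplicationHandler URL, e.g. .../solr/books/replication?command=details
    query = urllib.parse.urlencode({"command": command})
    return "%s/solr/%s/replication?%s" % (base, core, query)

# compare index version/generation as reported on both sides
print(replication_url("http://master:8983", "books", "indexversion"))
print(replication_url("http://slave1:8983", "books", "details"))

# force the slave to poll the master now, instead of waiting for the
# next scheduled poll
print(replication_url("http://slave1:8983", "books", "fetchindex"))
```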

Also... I know I should not do it, but... what happens if you swap
the directories on disk while Solr is running?


3. add a slave

Install/configure and start up a new slave.
Use CREATE [2] to create new empty indexes/cores.
The slave will start to replicate indexes from the master.
Add the new slave to the HAProxy pool.

This way, however, I need to CREATE all the cores, one by one.
Is there a way to replicate all the cores available on the master?
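
I have not found a single command for that, but one workaround might be
to list the master's cores with STATUS and CREATE each one on the new
slave; a sketch (hypothetical hosts, and it assumes the STATUS response
keeps its usual <lst name="status"> XML shape):

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

MASTER = "http://master:8983"     # hypothetical host
NEW_SLAVE = "http://slave3:8983"  # hypothetical host

def core_names(status_xml):
    # STATUS lists each core as <lst name="corename"> under <lst name="status">
    root = ET.fromstring(status_xml)
    return [lst.get("name")
            for parent in root.iter("lst") if parent.get("name") == "status"
            for lst in parent.findall("lst")]

def sync_cores():
    # ask the master which cores it has ...
    with urllib.request.urlopen(MASTER + "/solr/admin/cores?action=STATUS") as r:
        names = core_names(r.read())
    # ... and create an empty core with each name on the new slave
    for name in names:
        query = urllib.parse.urlencode(
            {"action": "CREATE", "name": name, "instanceDir": name})
        urllib.request.urlopen("%s/solr/admin/cores?%s" % (NEW_SLAVE, query))
```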


4. remove a slave

Remove the slave from HAProxy pool.

Or

HAProxy automatically removes it from the pool, if dead.



Does all this seem sensible to you?

Do you have best practices, suggestions to share?

Thank you,
Paolo


  [1] http://haproxy.1wt.eu/
  [2] http://wiki.apache.org/solr/CoreAdmin#CREATE
  [3] http://wiki.apache.org/solr/CoreAdmin#SWAP
  [4] http://wiki.apache.org/solr/CoreAdmin#UNLOAD

Re: Solr replication, HAproxy and data management

Posted by Paolo Castagna <ca...@googlemail.com>.
Paolo Castagna wrote:
> Hi,
> we are using Solr v1.4.x with multi-cores and a master/slaves 
> configuration.
> We also use HAProxy [1] to load balance search requests amongst slaves.
> Finally, we use MapReduce to create new Solr indexes.
> 
> I'd like to share with you what I am doing when I need to:
> 
>  1. add a new index
>  2. replace an existing index with a new/updated one
>  3. add a slave
>  4. remove a slave (or a slave died)
> 
> I am interested in knowing what the best practices are in these scenarios.

[...]

> Does all this seem sensible to you?
> 
> Do you have best practices, suggestions to share?


Well, maybe those two questions are too broad...

I have a very specific one, related to all this.

Let's say I have a Solr master with multi-cores and I want to add a new
slave. Can I tell the slave to replicate all the indexes from the master?
How?

Any comments or advice regarding my original message are still more than
welcome.

Thank you,
Paolo