Posted to solr-user@lucene.apache.org by "michael.boom" <my...@yahoo.com> on 2013/11/08 15:39:10 UTC

Merging shards and replicating changes in SolrCloud

Here's the background of this topic:
I have set up a collection with 4 shards, replicationFactor=2, on two
machines.
I started to index documents, but after hitting some update deadlocks and
restarting servers, my shard ranges in the ZK state got nulled (I'm using
implicit routing). Indexing continued without me noticing, and all new
documents were indexed into shard1, creating a huge imbalance with
shards 2, 3 and 4.
Of course, I want to fix this and get my index into 4 shards, evenly
distributed.

What I'm thinking of doing is:
1. On machine 1, merge shards 2, 3 and 4 into shard1 using
http://wiki.apache.org/solr/MergingSolrIndexes
(At this point, what happens to the replica of shard1 on machine 2? Will
SolrCloud try to replicate shard1 from machine 1?)
2. On machine 2, unload the shard1,2,3,4 cores.
3. On machine 1, split shard1 into shard1_0 and shard1_1, then split shard1_0
and shard1_1 again to get 4 equal shards: 1_0_0, 1_0_1, 1_1_0, 1_1_1.
(Will the shard ranges for the new shards be correct if shard1's range was
"null" to begin with?)
4. On machine 1, unload shard1.
5. Rename shards 1_0_0, 1_0_1, 1_1_0, 1_1_1 to 1, 2, 3, 4.
6. Replicate shards 1, 2, 3, 4 to machine 2.

Do you see any problems with this scenario? Is there anything that could be
done in a more efficient way?
Thank you



-----
Thanks,
Michael
--
View this message in context: http://lucene.472066.n3.nabble.com/Merging-shards-and-replicating-changes-in-SolrCloud-tp4099997.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Merging shards and replicating changes in SolrCloud

Posted by "michael.boom" <my...@yahoo.com>.
Thanks for the comments, Shalin. I ended up doing just that: reindexing from
the ground up.



-----
Thanks,
Michael

Re: Merging shards and replicating changes in SolrCloud

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Comments inline:

On Fri, Nov 8, 2013 at 8:09 PM, michael.boom <my...@yahoo.com> wrote:

> Here's the background of this topic:
> I have set up a collection with 4 shards, replicationFactor=2, on two
> machines.
> I started to index documents, but after hitting some update deadlocks and
> restarting servers, my shard ranges in the ZK state got nulled (I'm using
> implicit routing). Indexing continued without me noticing, and all new
> documents were indexed into shard1, creating a huge imbalance with
> shards 2, 3 and 4.
> Of course, I want to fix this and get my index into 4 shards, evenly
> distributed.
>

If you are using implicit routing then the shard ranges should be null.
Shard ranges are only used when the router is compositeId.
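
To make the difference concrete, here is a rough sketch of how a compositeId
collection divides the signed 32-bit hash space into one contiguous range per
shard. It only illustrates the idea and is not Solr's actual partitioning
code; the function names are made up for this example. With implicit routing
no such ranges exist, which is why they show up as null.

```python
def partition_hash_ranges(num_shards):
    """Split the signed 32-bit hash space into contiguous, near-equal ranges.

    Illustration only: an assumed, simplified stand-in for what a
    compositeId router does when a collection is created.
    """
    lo, hi = -2**31, 2**31 - 1
    total = hi - lo + 1
    ranges = []
    start = lo
    for i in range(num_shards):
        # Last value of the i-th slice (inclusive).
        end = lo + (total * (i + 1)) // num_shards - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def to_hex(value):
    """Render a signed 32-bit value as unsigned hex, without leading zeros."""
    return format(value & 0xFFFFFFFF, "x")


for number, (start, end) in enumerate(partition_hash_ranges(4), 1):
    print(f"shard{number}: {to_hex(start)}-{to_hex(end)}")
```

For 4 shards this gives 80000000-bfffffff, c0000000-ffffffff, 0-3fffffff and
40000000-7fffffff.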


>
> What I'm thinking of doing is:
> 1. On machine 1, merge shards 2, 3 and 4 into shard1 using
> http://wiki.apache.org/solr/MergingSolrIndexes
> (At this point, what happens to the replica of shard1 on machine 2? Will
> SolrCloud try to replicate shard1 from machine 1?)
>

Index merge is a core admin command. It is not SolrCloud-aware. Therefore,
I think that merging will not automatically replicate shard1 on machine 1 to
the other replicas unless a recovery is requested for some reason.
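
For reference, a minimal sketch of what such a core admin merge request could
look like. It only builds the URL for the CoreAdmin mergeindexes action; the
host, port and core names are placeholders, not values from this cluster. The
request goes to a single Solr node, which is the reason nothing else in the
cloud learns about the merge.

```python
from urllib.parse import urlencode


def merge_indexes_url(base_url, target_core, source_cores):
    """Build a CoreAdmin mergeindexes URL that merges source cores into a target.

    base_url and the core names are hypothetical; adjust them to your setup.
    """
    params = [("action", "mergeindexes"), ("core", target_core)]
    # srcCore may be repeated, once per core to merge in.
    params += [("srcCore", name) for name in source_cores]
    return f"{base_url}/admin/cores?{urlencode(params)}"


print(merge_indexes_url(
    "http://machine1:8983/solr",
    "collection1_shard1_replica1",
    ["collection1_shard2_replica1", "collection1_shard3_replica1"],
))
```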


> 2. On machine 2, unload the shard1,2,3,4 cores.
> 3. On machine 1, split shard1 into shard1_0 and shard1_1, then split
> shard1_0 and shard1_1 again to get 4 equal shards: 1_0_0, 1_0_1, 1_1_0,
> 1_1_1.
> (Will the shard ranges for the new shards be correct if shard1's range was
> "null" to begin with?)
>

No, shard splitting does not work with implicit routing. It works only if
the router is compositeId.
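
As a sketch of that constraint, the helper below builds a Collections API
SPLITSHARD URL but refuses to do so unless the router is compositeId. The
helper name, the host and the collection name are assumptions for
illustration, and the router name is passed in by the caller rather than read
from ZooKeeper.

```python
from urllib.parse import urlencode


def split_shard_url(base_url, collection, shard, router):
    """Build a Collections API SPLITSHARD URL, guarding on the router type.

    `router` is supplied by the caller here; looking it up in the cluster
    state is left out of this sketch.
    """
    if router != "compositeId":
        raise ValueError(
            f"cannot split {shard}: router is {router!r}, not 'compositeId'"
        )
    params = {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    return f"{base_url}/admin/collections?{urlencode(params)}"


print(split_shard_url("http://machine1:8983/solr", "collection1", "shard1", "compositeId"))
```

Calling it with router="implicit" raises a ValueError instead of producing a
request that could not succeed.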


> 4. on machine 1 unload shard1
> 5. rename shards 1_0_0, 1_0_1, 1_1_0, 1_1_1 to 1,2,3,4.
> 6. replicate shard 1,2,3,4 to machine 2
>
> Do you see any problems with this scenario? Is there anything that could be
> done in a more efficient way?
> Thank you
>
>
>
Unfortunately, no. If you had only inserts on your index and you always
searched across the entire cluster (i.e. you don't care which shard a
document ends up in), then you could have used the core admin split API to
re-balance the cluster. I think you should just re-index everything and
start again.
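
For completeness, the core admin split mentioned above could be requested
roughly as follows. This is only a sketch that builds the CoreAdmin SPLIT
URL; the host and core names are placeholders, and as far as I know the
target cores have to exist before the call is made.

```python
from urllib.parse import urlencode


def core_split_url(base_url, source_core, target_cores):
    """Build a CoreAdmin SPLIT URL that divides one core's index among targets.

    All names are hypothetical; the split works at the core level on one
    node and does not update SolrCloud's routing information.
    """
    params = [("action", "SPLIT"), ("core", source_core)]
    # targetCore may be repeated, once per destination core.
    params += [("targetCore", name) for name in target_cores]
    return f"{base_url}/admin/cores?{urlencode(params)}"


print(core_split_url(
    "http://machine1:8983/solr",
    "collection1_shard1_replica1",
    ["collection1_shard2_replica1", "collection1_shard3_replica1"],
))
```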





-- 
Regards,
Shalin Shekhar Mangar.