Posted to solr-user@lucene.apache.org by Jeffery Yuan <yu...@gmail.com> on 2016/10/07 16:43:01 UTC

Whether replicationFactor=2 makes sense?

We are trying to build our SolrCloud servers. We want to increase
replicationFactor, but don't want to set it to 3 because we have a lot of data.

So I am wondering whether it makes sense to set replicationFactor to 2, and
what the impact would be. Would this cause problems for replica leader
election, such as split brain?

Thanks



--
View this message in context: http://lucene.472066.n3.nabble.com/Whether-replicationFactor-2-makes-sense-tp4300204.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Whether replicationFactor=2 makes sense?

Posted by Jeffery Yuan <yu...@gmail.com>.
Thanks Erick Erickson, that totally makes sense to me now :)




Re: Whether replicationFactor=2 makes sense?

Posted by Erick Erickson <er...@gmail.com>.
You are correct, that's the whole point of SolrCloud.

The other thing replicas gain you is the ability to
serve more queries, since each query goes to only a single
replica of each shard.

Best,
Erick

On Fri, Oct 7, 2016 at 4:02 PM, Jeffery Yuan <yu...@gmail.com> wrote:
> Thanks so much for your reply, Erick Erickson.
>
> We want to increase replicationFactor from 1 to 2, but I am wondering
> what the advantage of doing so is.
> Will this make our system more robust and resilient to temporary
> network failures?
>
> Say we have 3 machines and split the data into 3 shards. If we set
> replicationFactor to 2,
> machine A contains shard 1 and shard 2, machine B contains shard 2 and
> shard 3, and machine C contains shard 3 and shard 1.
>
> If machine A is down or has a temporary network issue, can the system
> continue to work?
> -- I would guess so. As you suggested, Zookeeper is used to maintain
> cluster info, so Zookeeper will figure it out and choose a new leader if
> needed, and the system will keep running.
>
> Thanks again

Re: Whether replicationFactor=2 makes sense?

Posted by Jeffery Yuan <yu...@gmail.com>.
Thanks so much for your reply, Erick Erickson.

We want to increase replicationFactor from 1 to 2, but I am wondering
what the advantage of doing so is.
Will this make our system more robust and resilient to temporary
network failures?

Say we have 3 machines and split the data into 3 shards. If we set
replicationFactor to 2,
machine A contains shard 1 and shard 2, machine B contains shard 2 and
shard 3, and machine C contains shard 3 and shard 1.

If machine A is down or has a temporary network issue, can the system
continue to work?
-- I would guess so. As you suggested, Zookeeper is used to maintain
cluster info, so Zookeeper will figure it out and choose a new leader if
needed, and the system will keep running.
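
A minimal Python sketch of the layout described above, assuming the
hypothetical machine names A, B, C and shard names shard1..shard3. It only
checks which shards still have a live replica when a machine is unreachable;
it does not model Zookeeper or leader election.

```python
# Hypothetical layout from the message above: 3 machines, 3 shards,
# replicationFactor=2, so every shard lives on exactly two machines.
PLACEMENT = {
    "A": {"shard1", "shard2"},
    "B": {"shard2", "shard3"},
    "C": {"shard3", "shard1"},
}


def live_replicas(placement, down):
    """Count surviving replicas per shard when the machines in `down` are unreachable."""
    counts = {}
    for machine, shards in placement.items():
        for shard in shards:
            counts.setdefault(shard, 0)
            if machine not in down:
                counts[shard] += 1
    return counts


def all_shards_available(placement, down):
    """True if every shard keeps at least one live replica."""
    return all(n > 0 for n in live_replicas(placement, down).values())


if __name__ == "__main__":
    print(all_shards_available(PLACEMENT, {"A"}))       # one machine down
    print(all_shards_available(PLACEMENT, {"A", "B"}))  # two machines down
```

With only machine A down, every shard still has at least one replica left.
With A and B both down, shard2 loses both of its copies.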

Thanks again





Re: Whether replicationFactor=2 makes sense?

Posted by Erick Erickson <er...@gmail.com>.
Sure, replicationFactor=2 is fine. Solr goes to a lot of effort to
avoid split-brain issues using Zookeeper.
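
For reference, a minimal Python sketch of how a collection with
replicationFactor=2 might be requested through the Collections API. The host,
collection name and shard count are placeholders, not taken from this thread.
maxShardsPerNode is included on the assumption that 3 shards x 2 replicas are
spread over 3 nodes, and other parameters such as the config set are omitted.
The snippet only builds the URL and sends nothing.

```python
from urllib.parse import urlencode

# Placeholder base URL, not from this thread.
SOLR_BASE = "http://localhost:8983/solr"


def create_collection_url(name, num_shards, replication_factor, max_shards_per_node):
    """Build a Collections API CREATE URL (no request is sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Assumption: needed when a node must host more than one replica.
        "maxShardsPerNode": max_shards_per_node,
    }
    return SOLR_BASE + "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical collection name; 3 shards x 2 replicas on 3 nodes.
    print(create_collection_url("mycollection", 3, 2, 2))
```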

I think you're conflating Solr replication with Zookeeper's quorum. The Solr
replicationFactor has nothing to do with quorum; in that respect, having 2
replicas is the same as having 3.
Solr uses Zookeeper's quorum sensing to ensure that all Solr nodes
have a consistent picture of the cluster. Solr will refuse to index data if
_Zookeeper_ loses quorum.
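
To make the distinction concrete, a small Python sketch of the majority rule
behind Zookeeper's quorum. The ensemble sizes are illustrative and the function
names are made up for this example; note that replicationFactor does not appear
anywhere in the calculation.

```python
def quorum_size(ensemble_size):
    """Smallest number of Zookeeper servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size, servers_up):
    """True if enough Zookeeper servers are up to form a majority."""
    return servers_up >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size):
    """How many Zookeeper servers can fail before quorum is lost."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Columns: ensemble size, quorum size, tolerated failures.
    for n in (1, 2, 3, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

A 2-server ensemble tolerates no failures, which is why odd ensemble sizes
such as 3 or 5 are the usual choice for Zookeeper. The number of Solr replicas
is a separate decision.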

But whether Solr has 2 or 3 replicas is not relevant. Solr indexes data through
the leader of each shard, and that keeps all replicas consistent.

As far as other impacts, adding a replica will have an impact on indexing
throughput; you'll have to see whether that makes any difference in your
situation. The cost is usually on the order of 10% or so, YMMV. And it applies
only to the first replica you add, i.e. going from leader-only to 2 replicas
costs, say, 10% of throughput, but adding yet another replica does NOT cost
another 10%, since the leader->replica updates are done in parallel.
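
The rule of thumb above can be written as a rough Python model. The 10% figure
is the ballpark from this message, not a measurement, and the function name and
baseline rate are made up for illustration.

```python
def estimated_indexing_rate(leader_only_rate, replication_factor,
                            first_replica_cost=0.10):
    """Rough estimate of indexing throughput for a given replicationFactor.

    Assumption: the cost is paid once, when going from leader-only to two
    replicas, because the leader forwards updates to replicas in parallel.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if replication_factor == 1:
        return leader_only_rate
    return leader_only_rate * (1.0 - first_replica_cost)


if __name__ == "__main__":
    baseline = 1000.0  # hypothetical docs/sec with leader-only shards
    for rf in (1, 2, 3):
        print(rf, estimated_indexing_rate(baseline, rf))
```

Actual numbers depend on hardware, document size and commit settings, so this
is only a starting point for your own measurements.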

Best,
Erick

On Fri, Oct 7, 2016 at 9:43 AM, Jeffery Yuan <yu...@gmail.com> wrote:
> We are trying to build our SolrCloud servers. We want to increase
> replicationFactor, but don't want to set it to 3 because we have a lot of data.
>
> So I am wondering whether it makes sense to set replicationFactor to 2, and
> what the impact would be. Would this cause problems for replica leader
> election, such as split brain?
>
> Thanks