Posted to solr-user@lucene.apache.org by Johannes Knaus <Kn...@mpdl.mpg.de> on 2017/04/12 16:30:17 UTC

What does the replication factor parameter in collections api do?

Hi,

I am still quite new to Solr. I have the following setup:
A SolrCloud setup with 
38 nodes, 
maxShardsPerNode=2, 
implicit routing with routing field, 
and replication factor=2.

Now I want to add replicas. This works fine if I first increase maxShardsPerNode to a higher number and then add the replicas.
So far, so good. I can confirm the changed maxShardsPerNode value and the added replicas in the Admin UI.
However, the Solr Admin UI still shows a replication factor of 2.
I am a little confused about what the replicationFactor parameter actually does in my case:

1) What does that mean? Does Solr make use of all the replicas I have, or only of two?
2) Do I need to increase the replication factor value as well to really have more replicas available and usable? If so, do I need to restart/reload the collection, re-upload configs to ZooKeeper, or anything like that?
3) Or is replicationFactor just a parameter that is needed for the first start of SolrCloud and can be ignored afterwards?

Thank you very much for your help,
All the best,
Johannes


Re: AW: What does the replication factor parameter in collections api do?

Posted by Johannes Knaus <Kn...@mpdl.mpg.de>.
Thank you all very much for your answers. That definitely explains it.
All the best,
Johannes

> On 13.04.2017 at 17:03, Erick Erickson <er...@gmail.com> wrote:
> 
> bq: Why is it possible then to alter replicationFactor via
> MODIFYCOLLECTION in the collections API
> 
> Because MODIFYCOLLECTION just changes properties in the collection
> definition generically and replicationFactor just happens to be one.
> IOW there's no overarching reason.
> 
> It would be extra work to disallow that one case, and doing so could
> introduce errors without changing any functionality, so nobody was
> willing to put in the effort.
> 
> Best,
> Erick

Re: AW: What does the replication factor parameter in collections api do?

Posted by Erick Erickson <er...@gmail.com>.
bq: Why is it possible then to alter replicationFactor via
MODIFYCOLLECTION in the collections API

Because MODIFYCOLLECTION just changes properties in the collection
definition generically and replicationFactor just happens to be one.
IOW there's no overarching reason.

It would be extra work to disallow that one case, and doing so could
introduce errors without changing any functionality, so nobody was
willing to put in the effort.
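
To illustrate the point that replicationFactor is only a stored property, here is a minimal Python sketch that compares the property with the number of replicas actually present. The data is hypothetical, and its shape is only assumed to mirror what the Collections API CLUSTERSTATUS action returns; the collection, shard, and core names are placeholders.

```python
# Hypothetical, trimmed-down cluster state; the shape is assumed to mirror
# what the Collections API CLUSTERSTATUS action returns.
cluster_status = {
    "cluster": {
        "collections": {
            "mycollection": {
                "replicationFactor": "2",  # value recorded at creation time
                "shards": {
                    "shardA": {
                        "replicas": {
                            "core_node1": {"state": "active"},
                            "core_node2": {"state": "active"},
                            "core_node3": {"state": "active"},  # added later
                        }
                    },
                    "shardB": {
                        "replicas": {
                            "core_node4": {"state": "active"},
                            "core_node5": {"state": "active"},
                        }
                    },
                },
            }
        }
    }
}


def replicas_per_shard(status, collection):
    """Return {shard name: number of replicas actually present}."""
    shards = status["cluster"]["collections"][collection]["shards"]
    return {name: len(shard["replicas"]) for name, shard in shards.items()}


coll = cluster_status["cluster"]["collections"]["mycollection"]
counts = replicas_per_shard(cluster_status, "mycollection")
print("replicationFactor property:", coll["replicationFactor"])
print("actual replicas per shard:", counts)
```

In this made-up state the property still reads 2 while shardA holds three replicas, which matches what the Admin UI was reported to show.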

Best,
Erick

On Thu, Apr 13, 2017 at 5:48 AM, Shawn Heisey <ap...@elyograg.org> wrote:
> If you use a very specific storage method for your indexes -- HDFS --
> then replicationFactor has meaning beyond initial collection creation,
> in conjunction with the "autoAddReplicas" feature.
>
> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS#RunningSolronHDFS-AutomaticallyAddReplicasinSolrCloud
>
> If you are NOT utilizing the very specific HDFS storage engine, then
> everything you were told applies.  With standard storage mechanisms,
> replicationFactor has zero meaning after initial collection creation,
> and changing the value will have no effect.
>
> Thanks,
> Shawn
>

Re: AW: What does the replication factor parameter in collections api do?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/13/2017 3:22 AM, Johannes Knaus wrote:
> Ok. Thank you for your quick reply. Though I still feel a little
> uneasy. Why is it possible then to alter replicationFactor via
> MODIFYCOLLECTION in the collections API? What would be the use case
> for this parameter at all then? 

If you use a very specific storage method for your indexes -- HDFS --
then replicationFactor has meaning beyond initial collection creation,
in conjunction with the "autoAddReplicas" feature.

https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS#RunningSolronHDFS-AutomaticallyAddReplicasinSolrCloud

If you are NOT utilizing the very specific HDFS storage engine, then
everything you were told applies.  With standard storage mechanisms,
replicationFactor has zero meaning after initial collection creation,
and changing the value will have no effect.
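
For the HDFS case, a collection that opts in to autoAddReplicas might be created with a Collections API call along the lines of the Python sketch below. The host, collection name, and numeric values are assumptions for illustration only, and the HDFS storage configuration that the feature depends on is not shown.

```python
from urllib.parse import urlencode

# Assumption: hypothetical host and collection name, for illustration only.
SOLR_BASE = "http://localhost:8983/solr"

# CREATE parameters; autoAddReplicas is the flag discussed in this reply.
create_params = {
    "action": "CREATE",
    "name": "hdfs_collection",
    "numShards": 2,
    "replicationFactor": 2,
    "maxShardsPerNode": 2,
    "autoAddReplicas": "true",
}

create_url = f"{SOLR_BASE}/admin/collections?{urlencode(create_params)}"
print(create_url)
```

The sketch only builds the URL; sending it (for example with a browser or an HTTP client) is left out on purpose.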

Thanks,
Shawn


AW: What does the replication factor parameter in collections api do?

Posted by Johannes Knaus <Kn...@mpdl.mpg.de>.
Ok. Thank you for your quick reply. 
Though I still feel a little uneasy. Why is it possible then to alter replicationFactor via MODIFYCOLLECTION in the collections API? What would be the use case for this parameter at all then?


-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: Wednesday, 12 April 2017 19:36
To: solr-user
Subject: Re: What does the replication factor parameter in collections api do?

Really <3>. replicationFactor is used to set up your collection initially; you have to be able to change your topology afterwards, so it's ignored thereafter.

Once your replica is added, it's automatically made use of by the collection.


Re: What does the replication factor parameter in collections api do?

Posted by Erick Erickson <er...@gmail.com>.
Really <3>. replicationFactor is used to set up your collection
initially; you have to be able to change your topology afterwards, so
it's ignored thereafter.

Once your replica is added, it's automatically made use of by the collection.
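
As a rough sketch of the two steps described in the question (raise maxShardsPerNode, then add a replica), the corresponding Collections API URLs could be built in Python as follows. The host, the collection name "mycollection", the shard name "shardA", and the new limit of 3 are placeholders, not values from the original setup.

```python
from urllib.parse import urlencode

# Assumption: a Solr node reachable at this base URL (hypothetical host).
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


# Step 1: raise maxShardsPerNode so a node may host another replica.
raise_limit = collections_api_url(
    "MODIFYCOLLECTION", collection="mycollection", maxShardsPerNode=3
)

# Step 2: add one replica to a named shard ("shardA" is a placeholder).
add_replica = collections_api_url(
    "ADDREPLICA", collection="mycollection", shard="shardA"
)

print(raise_limit)
print(add_replica)
```

Note that neither call touches replicationFactor; per the answer above, the new replica is used as soon as it has been added.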
