Posted to solr-user@lucene.apache.org by Sourav Moitra <so...@gmail.com> on 2018/10/08 01:28:41 UTC

Deciding on the number of Shards and Replica

Hello all,

I am a Solr newbie. I am trying to set up three servers running both a
ZooKeeper ensemble and Solr in cloud mode. Each server has 4 cores and
16 GB of RAM. To start with, I have given ZooKeeper an Xmx value of
6144M and Solr an Xmx value of 2048. We have created 3 shards with 3
replicas each. Each replica turned out to be 3 GB in size, and I am
planning to host multiple such collections.

Now my question is: what problems do you see with this kind of setup?
How can I improve the setup?
What is the rule of thumb for the number of shards and replicas?
Is there any correlation between the number of servers and the number
of shards and replicas?

Thank you for looking into this.

Sourav Moitra
https://souravmoitra.com

Re: Deciding on the number of Shards and Replica

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/7/2018 7:28 PM, Sourav Moitra wrote:
> I am a Solr newbie. I am trying to set up three servers running both a
> ZooKeeper ensemble and Solr in cloud mode. Each server has 4 cores and
> 16 GB of RAM. To start with, I have given ZooKeeper an Xmx value of
> 6144M and Solr an Xmx value of 2048. We have created 3 shards with 3
> replicas each. Each replica turned out to be 3 GB in size, and I am
> planning to host multiple such collections.
>
> Now my question is: what problems do you see with this kind of setup?
> How can I improve the setup?
> What is the rule of thumb for the number of shards and replicas?
> Is there any correlation between the number of servers and the number
> of shards and replicas?

In a nutshell: There are no generic answers, no rule of thumb.  None at all.

https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

With some more detailed information, we can provide a GUESS about how 
you should size things.  But that's all it will be, and it could be a 
completely wrong guess.

Why are you giving 6GB of memory to zookeeper?  Unless you're going to 
have a LOT of shard replicas and servers in your cloud, I can't imagine 
each ZK server needing more than about 512MB, and it might even run with 
far less.

Some questions that will be important to answer:

How many documents are in that 3GB shard replica?  How much index data 
(both document count and size on disk) do you expect each machine to be 
handling?  16GB might be nowhere near enough total memory for the 
system, but without more information I can't even guess about that.
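
To make the arithmetic behind these questions concrete, here is a rough
Python sketch using only the numbers from the original post. It is not a
sizing formula. It assumes that "2048" means 2048 MB, that "3 shards and 3
replica each" means 9 replicas in total, and that those replicas are
spread evenly over the 3 servers. The function names are made up for
illustration, and the calculation ignores JVM overhead beyond the heap
and the memory the OS itself needs.

```python
MB_PER_GB = 1024

def ram_left_for_disk_cache_mb(total_ram_mb, zk_heap_mb, solr_heap_mb):
    """RAM not claimed by the two JVM heaps (an upper bound, since the OS
    and non-heap JVM memory are ignored)."""
    return total_ram_mb - zk_heap_mb - solr_heap_mb

def index_per_server_mb(shards, replicas_per_shard, replica_size_mb,
                        servers, collections=1):
    """Index data per server, ASSUMING replicas are distributed evenly."""
    total_mb = shards * replicas_per_shard * replica_size_mb * collections
    return total_mb / servers

total_ram_mb = 16 * MB_PER_GB

# Heaps as described in the original post (6144M for ZooKeeper).
current_cache_mb = ram_left_for_disk_cache_mb(total_ram_mb, 6144, 2048)

# Same server with the ZooKeeper heap reduced to about 512MB.
reduced_zk_cache_mb = ram_left_for_disk_cache_mb(total_ram_mb, 512, 2048)

# One collection: 3 shards x 3 replicas x 3GB, over 3 servers.
one_collection_mb = index_per_server_mb(3, 3, 3 * MB_PER_GB, servers=3)

print(f"Free for disk cache now:        {current_cache_mb} MB")
print(f"Free with 512MB ZooKeeper heap: {reduced_zk_cache_mb} MB")
print(f"Index per server, 1 collection: {one_collection_mb:.0f} MB")
```

Under those assumptions, a single collection already puts about 9GB of
index on each server, against roughly 8GB of RAM not taken by the two
heaps. Whether that is a problem depends on the document counts and query
load asked about here, which is why these numbers can only support a
guess.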

Do you know how many queries per second the cloud is likely to receive?

I saw a nearly identical question on the IRC channel a couple of hours 
ago.  I had to leave, and when I made it back, the person asking the 
question had left.

Thanks,
Shawn