Posted to solr-user@lucene.apache.org by "Oakley, Craig (NIH/NLM/NCBI) [C]" <cr...@nih.gov> on 2018/06/04 22:36:30 UTC

sharding guidelines

I have a sharding question.

We have a collection (one shard, two replicas, currently running Solr 6.6) which sometimes becomes unresponsive on the non-leader node. It is 214 gigabytes, and we were wondering whether there is a rule of thumb for how large to allow a core to grow before sharding. I have a reference in my notes from the 2015 Solr conference in Austin: "baseline no more than 100 million docs/shard" and "ideal shard-to-memory ratio: if at all possible the index should fit into RAM, but other than that it gets really specific really fast". That was several versions ago, though, so I wanted to ask whether these suggestions have been revised.

Thanks
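
(For anyone comparing index size to RAM as the notes above suggest: the CoreAdmin STATUS call reports each core's size on disk. Below is a minimal SolrJ sketch, assuming SolrJ 6.6; the class name and the localhost:8983 URL are illustrative placeholders, not from this thread.)

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;
    import org.apache.solr.client.solrj.response.CoreAdminResponse;
    import org.apache.solr.common.util.NamedList;

    public class IndexSizeCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical node URL; point this at one of your own Solr nodes.
            try (HttpSolrClient client =
                    new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
                // CoreAdmin STATUS (no core name = all cores) reports per-core
                // index details, including size on disk.
                CoreAdminResponse status = CoreAdminRequest.getStatus(null, client);
                for (int i = 0; i < status.getCoreStatus().size(); i++) {
                    String core = status.getCoreStatus().getName(i);
                    NamedList<?> index =
                            (NamedList<?>) status.getCoreStatus().getVal(i).get("index");
                    // Compare sizeInBytes with the RAM left over for the OS page
                    // cache to gauge how much of the index can stay cached.
                    System.out.println(core + ": " + index.get("size")
                            + " (" + index.get("sizeInBytes") + " bytes)");
                }
            }
        }
    }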

Re: sharding guidelines

Posted by Erik Hatcher <er...@gmail.com>.
I’d say that 100M/shard applies to the smallest-document use case possible, such as straight-up log items with only a timestamp, an id, and a short message.

In other contexts, with big full-text docs, 10M/shard is closer to the maximum.

How many documents do you have in your collection?
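
(One quick way to answer that: a rows=0 query against the collection returns numFound without fetching any documents. A minimal SolrJ sketch follows; the ZooKeeper address and collection name are placeholders.)

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class DocCount {
        public static void main(String[] args) throws Exception {
            // ZooKeeper address and collection name are placeholders.
            try (CloudSolrClient client =
                    new CloudSolrClient.Builder().withZkHost("zk1:2181").build()) {
                SolrQuery q = new SolrQuery("*:*");
                q.setRows(0); // only numFound is needed, not the documents
                QueryResponse rsp = client.query("mycollection", q);
                System.out.println("docs in collection: "
                        + rsp.getResults().getNumFound());
            }
        }
    }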

	Erik Hatcher
	Senior Solutions Architect
	Lucidworks.com



> On Jun 4, 2018, at 6:36 PM, Oakley, Craig (NIH/NLM/NCBI) [C] <cr...@nih.gov> wrote:
> 
> I have a sharding question.
> 
> We have a collection (one shard, two replicas, currently running Solr 6.6) which sometimes becomes unresponsive on the non-leader node. It is 214 gigabytes, and we were wondering whether there is a rule of thumb for how large to allow a core to grow before sharding. I have a reference in my notes from the 2015 Solr conference in Austin: "baseline no more than 100 million docs/shard" and "ideal shard-to-memory ratio: if at all possible the index should fit into RAM, but other than that it gets really specific really fast". That was several versions ago, though, so I wanted to ask whether these suggestions have been revised.
> 
> Thanks


Re: sharding guidelines

Posted by Emir Arnautović <em...@sematext.com>.
In case you missed them, the following blog posts might come in handy:

https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
http://www.od-bits.com/2018/01/solrelasticsearch-capacity-planning.html

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 6 Jun 2018, at 00:12, Shawn Heisey <el...@elyograg.org> wrote:
> 
> On 6/4/2018 4:36 PM, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
>> We have a collection (one shard, two replicas, currently running Solr 6.6) which sometimes becomes unresponsive on the non-leader node. It is 214 gigabytes, and we were wondering whether there is a rule of thumb for how large to allow a core to grow before sharding. I have a reference in my notes from the 2015 Solr conference in Austin: "baseline no more than 100 million docs/shard" and "ideal shard-to-memory ratio: if at all possible the index should fit into RAM, but other than that it gets really specific really fast". That was several versions ago, though, so I wanted to ask whether these suggestions have been revised.
> 
> In a word, no.
> 
> It is impossible to give generic advice.  One person may have very good performance with 300 million docs in a single index.  Another may have terrible performance with half a million docs per shard.  It depends on a lot of things, including but not limited to the specs of the servers you use, exactly what is in your index, how you have configured Solr, and the nature of your queries.
> 
> Thanks,
> Shawn
> 


Re: sharding guidelines

Posted by Shawn Heisey <el...@elyograg.org>.
On 6/4/2018 4:36 PM, Oakley, Craig (NIH/NLM/NCBI) [C] wrote:
> We have a collection (one shard, two replicas, currently running Solr 6.6) which sometimes becomes unresponsive on the non-leader node. It is 214 gigabytes, and we were wondering whether there is a rule of thumb for how large to allow a core to grow before sharding. I have a reference in my notes from the 2015 Solr conference in Austin: "baseline no more than 100 million docs/shard" and "ideal shard-to-memory ratio: if at all possible the index should fit into RAM, but other than that it gets really specific really fast". That was several versions ago, though, so I wanted to ask whether these suggestions have been revised.

In a word, no.

It is impossible to give generic advice.  One person may have very good 
performance with 300 million docs in a single index.  Another may have 
terrible performance with half a million docs per shard.  It depends on 
a lot of things, including but not limited to the specs of the servers 
you use, exactly what is in your index, how you have configured Solr, 
and the nature of your queries.

Thanks,
Shawn
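
(If and when splitting does become necessary, the Collections API SPLITSHARD action is the usual route. A minimal SolrJ sketch follows; the ZooKeeper address, collection, and shard names are placeholders, not from this thread.)

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class SplitShardSketch {
        public static void main(String[] args) throws Exception {
            // ZooKeeper address, collection, and shard names are placeholders.
            try (CloudSolrClient client =
                    new CloudSolrClient.Builder().withZkHost("zk1:2181").build()) {
                // SPLITSHARD splits shard1 into two sub-shards on the leader's
                // node; budget extra disk space while the split runs, and move
                // replicas afterwards if needed.
                CollectionAdminRequest.splitShard("mycollection")
                        .setShardName("shard1")
                        .process(client);
            }
        }
    }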