Posted to solr-user@lucene.apache.org by Muhammad Imad Qureshi <im...@yahoo.com.INVALID> on 2017/04/05 02:38:20 UTC

Number of shards - Best practice

Hi
I was recently told that ideally the number of shards in a SOLR cluster should be equal to a power of 2. If this is indeed a best practice, then what is the rationale behind this recommendation?
Thanks
Imad

Re: Number of shards - Best practice

Posted by Mikhail Khludnev <mk...@apache.org>.
FWIW, you can pass an arbitrary number of hash ranges to SPLITSHARD (its
ranges parameter), so a shard can be split into any number of sub-shards,
not just two.
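To make this concrete, here is a minimal Python sketch. It assumes the default compositeId router, where each shard owns an inclusive range of signed 32-bit hash values that Solr displays in hex (for example 80000000-ffffffff and 0-7fffffff in a two-shard collection). The helper names split_hash_range and splitshard_url, the host, and the collection name are hypothetical; only the SPLITSHARD action and its ranges parameter come from the Collections API. Read the parent shard's actual range from CLUSTERSTATUS before using anything like this.

```python
from urllib.parse import urlencode

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def split_hash_range(lo: int, hi: int, n: int) -> str:
    """Split the inclusive hash range [lo, hi] into n contiguous sub-ranges.

    lo and hi are signed 32-bit ints (assumed compositeId router). Each
    endpoint is written as unpadded lowercase hex of its two's-complement
    form, e.g. -2**31 -> "80000000", -1 -> "ffffffff".
    """
    if not (INT32_MIN <= lo <= hi <= INT32_MAX):
        raise ValueError("need INT32_MIN <= lo <= hi <= INT32_MAX")
    total = hi - lo + 1
    if not 1 <= n <= total:
        raise ValueError("n must be between 1 and the size of the range")
    base, extra = divmod(total, n)
    parts, start = [], lo
    for i in range(n):
        # The first `extra` pieces get one more hash value, so the pieces
        # cover [lo, hi] exactly, with no gaps or overlaps.
        end = start + base + (1 if i < extra else 0) - 1
        parts.append(f"{start & 0xFFFFFFFF:x}-{end & 0xFFFFFFFF:x}")
        start = end + 1
    return ",".join(parts)


def splitshard_url(solr_base: str, collection: str, shard: str, ranges: str) -> str:
    """Build a Collections API SPLITSHARD request with explicit ranges."""
    query = urlencode({"action": "SPLITSHARD", "collection": collection,
                       "shard": shard, "ranges": ranges})
    return f"{solr_base}/admin/collections?{query}"


if __name__ == "__main__":
    # Hypothetical: shard1 of a two-shard collection owns 0-7fffffff;
    # split it three ways instead of two.
    ranges = split_hash_range(0, 0x7FFFFFFF, 3)
    print(ranges)
    print(splitshard_url("http://localhost:8983/solr", "mycollection",
                         "shard1", ranges))
```

Equal-sized pieces are just one choice. The sketch only guarantees that the sub-ranges are contiguous, non-overlapping, and exactly cover the parent range, so one request yields three sub-shards here rather than the two a plain SPLITSHARD creates.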

On Wed, Apr 5, 2017 at 5:39 PM, Erick Erickson <er...@gmail.com>
wrote:

> You may be confusing the number of shards you configure with how they
> expand using the SPLITSHARD command. That command splits one shard into
> two, so in that sense Solr collections can grow by a factor of 2. But
> that says nothing about the number of shards you start with. For
> example, I can start with 3 shards, then split each of them with
> SPLITSHARD and have 6, split again and have 12, etc.
>
> Best,
> Erick
>
> On Tue, Apr 4, 2017 at 9:22 PM, Walter Underwood <wu...@wunderwood.org>
> wrote:
> >> On Apr 4, 2017, at 7:38 PM, Muhammad Imad Qureshi
> >> <im...@yahoo.com.INVALID> wrote:
> >>
> >> Hi
> >> I was recently told that ideally the number of shards in a SOLR cluster
> >> should be equal to a power of 2. If this is indeed a best practice, then
> >> what is the rationale behind this recommendation?
> >> Thanks
> >> Imad
> >
> > I don’t know of any such recommendation. Assuming you are not RAM or
> > disk limited, going to two or three shards won’t help a lot. If those get
> > you out of a bottleneck, you’ll see a difference.
> >
> > I believe that some of Solr's query cost is proportional to the number
> > of distinct terms in the index (the vocabulary). A rule of thumb is that
> > the vocabulary is proportional to the square root of the total number of
> > terms in the index, which is often related to the number of documents.
> > Under this assumption, four shards give a 2X speedup: each shard holds
> > about a quarter of the terms, so its vocabulary is about sqrt(1/4) = 1/2
> > the size. That has worked for me.
> >
> > wunder
> > Walter Underwood
> > wunder@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
>



-- 
Sincerely yours
Mikhail Khludnev

Re: Number of shards - Best practice

Posted by Erick Erickson <er...@gmail.com>.
You may be confusing the number of shards you configure with how they
expand using the SPLITSHARD command. That command splits one shard into
two, so in that sense Solr collections can grow by a factor of 2. But
that says nothing about the number of shards you start with. For
example, I can start with 3 shards, then split each of them with
SPLITSHARD and have 6, split again and have 12, etc.
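This arithmetic can be sketched in Python, assuming every shard is split in each round (SPLITSHARD acts on one shard at a time, so doubling the collection means splitting each shard). The function names are illustrative only, and the pieces parameter is a hypothetical knob for splitting each shard into more than two sub-shards.

```python
def shard_counts(initial_shards: int, rounds: int, pieces: int = 2) -> list[int]:
    """Shard count after each round in which *every* shard is split.

    pieces=2 models a plain SPLITSHARD; larger values model splitting each
    shard into more sub-shards via explicit hash ranges.
    """
    counts = [initial_shards]
    for _ in range(rounds):
        counts.append(counts[-1] * pieces)
    return counts


def is_power_of_two(n: int) -> bool:
    # A positive power of two has exactly one bit set.
    return n > 0 and n & (n - 1) == 0


if __name__ == "__main__":
    counts = shard_counts(3, 2)
    print(counts, [is_power_of_two(c) for c in counts])
    # prints [3, 6, 12] [False, False, False]
```

Starting from 3 shards, none of the resulting counts is a power of 2: the factor of 2 describes each split, not the total.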

Best,
Erick

On Tue, Apr 4, 2017 at 9:22 PM, Walter Underwood <wu...@wunderwood.org> wrote:
>> On Apr 4, 2017, at 7:38 PM, Muhammad Imad Qureshi <im...@yahoo.com.INVALID> wrote:
>>
>> Hi
>> I was recently told that ideally the number of shards in a SOLR cluster should be equal to a power of 2. If this is indeed a best practice, then what is the rationale behind this recommendation?
>> Thanks
>> Imad
>
> I don’t know of any such recommendation. Assuming you are not RAM or disk limited, going to two or three shards won’t help a lot. If those get you out of a bottleneck, you’ll see a difference.
>
> I believe that some of Solr's query cost is proportional to the number of distinct terms in the index (the vocabulary). A rule of thumb is that the vocabulary is proportional to the square root of the total number of terms in the index, which is often related to the number of documents. Under this assumption, four shards give a 2X speedup: each shard holds about a quarter of the terms, so its vocabulary is about sqrt(1/4) = 1/2 the size. That has worked for me.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>

Re: Number of shards - Best practice

Posted by Walter Underwood <wu...@wunderwood.org>.
> On Apr 4, 2017, at 7:38 PM, Muhammad Imad Qureshi <im...@yahoo.com.INVALID> wrote:
> 
> Hi
> I was recently told that ideally the number of shards in a SOLR cluster should be equal to a power of 2. If this is indeed a best practice, then what is the rationale behind this recommendation?
> Thanks
> Imad

I don’t know of any such recommendation. Assuming you are not RAM or disk limited, going to two or three shards won’t help a lot. If those get you out of a bottleneck, you’ll see a difference.

I believe that some of Solr's query cost is proportional to the number of distinct terms in the index (the vocabulary). A rule of thumb is that the vocabulary is proportional to the square root of the total number of terms in the index, which is often related to the number of documents. Under this assumption, four shards give a 2X speedup: each shard holds about a quarter of the terms, so its vocabulary is about sqrt(1/4) = 1/2 the size. That has worked for me.
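This estimate can be reproduced with a back-of-the-envelope Python sketch. It assumes the square-root rule of thumb (a form of Heaps' law with exponent 0.5), documents and terms spread evenly across shards, and per-query cost proportional to a shard's vocabulary; it ignores the cost of merging shard responses. The function names and the index size are hypothetical.

```python
def vocabulary_size(total_terms: float, k: float = 1.0, beta: float = 0.5) -> float:
    """Heaps'-law-style estimate of distinct terms: k * total_terms**beta.

    beta = 0.5 is the square-root rule of thumb; k is an assumed constant
    that cancels out of the speedup ratio below.
    """
    return k * total_terms ** beta


def estimated_speedup(total_terms: float, num_shards: int, beta: float = 0.5) -> float:
    """Whole-index vocabulary divided by one shard's vocabulary.

    Assumes terms are split evenly across shards and that per-query cost
    scales linearly with a shard's vocabulary.
    """
    per_shard_terms = total_terms / num_shards
    return vocabulary_size(total_terms, beta=beta) / vocabulary_size(per_shard_terms, beta=beta)


if __name__ == "__main__":
    total = 1e8  # hypothetical index size, in total terms
    for shards in (2, 3, 4, 5, 8):
        print(f"{shards} shards -> ~{estimated_speedup(total, shards):.2f}x")
```

Under this model the speedup is just the square root of the shard count (about 1.41x for 2 shards, 1.73x for 3, 2.00x for 4), so nothing special happens at powers of 2.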

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)