Posted to users@solr.apache.org by mtn search <se...@gmail.com> on 2021/06/25 16:15:25 UTC

Number of Collections in a SolrCloud

Hello,

I am interested to learn what others have experienced in terms of hitting a
limit for the number of collections supported by a SolrCloud instance.

Also, does anyone have any tips/questions for evaluating when to create a
new SolrCloud and begin adding new collections to it rather than grow the
original SolrCloud instance?

I realize there are likely a number of characteristics of a SolrCloud to
evaluate.  My guess is that network resources will be the key factor.  I am
thinking of a SolrCloud with a 5- or 7-node ZooKeeper ensemble, with
collections containing 10-30 million docs, small doc size, heavy indexing,
and a small query load.

Thanks,
Matt

Re: Number of Collections in a SolrCloud

Posted by "Natarajan, Rajeswari" <ra...@sap.com.INVALID>.
Hi Brian and everyone,

How many Solr nodes per SolrCloud do you have to support 1,000 collections, and what are the replication factor and memory allotted?
The maximum number of collections on a SolrCloud has been discussed on the mailing list several times. I am interested in the specifics, and I assume this is in production.

Thanks,
Rajeswari





Re: Number of Collections in a SolrCloud

Posted by Brian Lininger <br...@veeva.com>.
Hi Matt,
Solr instance == Solr JVM.  80-90M docs is the total count of docs across
all collections.  We typically have several hundred collections per Solr
instance, as we have a multi-tenant service and we keep all data segregated
by tenant.
Brian




Re: Number of Collections in a SolrCloud

Posted by mtn search <se...@gmail.com>.
Thanks Brian!  Valuable information!

Follow-up question.  When you say Solr instance, in each case do you mean
SolrCloud instance?  It seems so when you speak of replica count; however,
when you stated 80-90 million docs I wondered if you meant Solr collection.

Matt


Re: Number of Collections in a SolrCloud

Posted by David Smiley <ds...@apache.org>.
I second Brian's experience.  Specific version & numbers reached vary
somewhat.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley



Re: Number of Collections in a SolrCloud

Posted by Brian Lininger <br...@veeva.com>.
Hi Matt,
We're currently running Solr 6.6.6 using SolrCloud.  Depending on the
application and load, we've been able to stably run upwards of 1,000
collections without a problem in a single SolrCloud.  We try to keep the
total replica count per Solr instance to less than 500, but have run
600-700 replicas per Solr instance without issue if the user load is
light.  Our Solr document sizes are pretty large, but we're able to handle
80-90M docs per instance with 700-800G of total index size.  300B docs does
seem quite large, but if your docs aren't huge and you've got enough shards
in your collection then I wouldn't be surprised if it worked fine.  The one
thing we learned is that we had to change the number of threads Solr uses
for loading replicas because of our high replica counts: with 8 threads,
startup would take forever (look at 'coreLoadThreads').  At the very least,
perf test something on a scale similar to what you're planning and see how
it scales.
Best of Luck,
Brian
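
The replica guideline above (fewer than 500 replicas per Solr instance) can
be turned into a rough sizing check.  This is only a sketch: the shard count
and replication factor used in the example are assumptions, not figures from
this thread, and the function names are hypothetical.

```python
import math


def total_replicas(collections: int, shards_per_collection: int,
                   replication_factor: int) -> int:
    """Total replica (core) count across the whole SolrCloud."""
    return collections * shards_per_collection * replication_factor


def min_solr_instances(collections: int, shards_per_collection: int,
                       replication_factor: int,
                       max_replicas_per_instance: int = 500) -> int:
    """Smallest number of Solr JVMs that keeps each one at or under the cap.

    The default cap of 500 is the target quoted in the message above;
    everything else is an input you have to supply for your own cluster.
    """
    replicas = total_replicas(collections, shards_per_collection,
                              replication_factor)
    return math.ceil(replicas / max_replicas_per_instance)


if __name__ == "__main__":
    # Assumed layout: 1,000 collections, 1 shard each, replication factor 2.
    print(total_replicas(1000, 1, 2))      # 2000
    print(min_solr_instances(1000, 1, 2))  # 4
```

A result like this only bounds the replica count per JVM; heap, index size
and query load still need a perf test at similar scale.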



-- 


*Brian Lininger*
Technical Architect, Infrastructure & Search
*Veeva Systems *
brian.lininger@veeva.com

*Zoom:* https://veeva.zoom.us/j/8113896271

www.veeva.com



Re: Number of Collections in a SolrCloud

Posted by mtn search <se...@gmail.com>.
I am guessing that hitting the limit on the number of collections within a
SolrCloud is not a common experience.  I wanted to raise this question
again in case anyone has any lessons learned or things to consider.  We are
currently planning work to migrate 300 billion plus docs from the master
nodes of a legacy master/slave installation to SolrCloud.  I figure that we
will push the limits of a single SolrCloud instance.

Thanks again,
Matt
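
For a sense of scale, the figures in this thread can be combined into a
rough collection count.  This is a back-of-the-envelope sketch only: it
assumes the 300 billion docs would be split into collections of 10-30
million docs each (the size mentioned in the original post), which the
thread does not state outright, and the helper name is hypothetical.

```python
def collections_needed(total_docs: int, docs_per_collection: int) -> int:
    """Number of collections needed, rounding up (integer ceiling division)."""
    return -(-total_docs // docs_per_collection)


TOTAL_DOCS = 300_000_000_000  # "300 billion plus" from the message above

# Assumed collection sizes: 10-30 million docs, per the original post.
fewest = collections_needed(TOTAL_DOCS, 30_000_000)
most = collections_needed(TOTAL_DOCS, 10_000_000)

if __name__ == "__main__":
    print(fewest)  # 10000
    print(most)    # 30000
```

Under that assumption the total lands in the tens of thousands of
collections, which is why splitting the load across more than one SolrCloud
is a reasonable question to ask.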
