Posted to users@solr.apache.org by Wei <we...@gmail.com> on 2023/01/09 21:29:59 UTC

Maximum number of shards and nodes in solr cloud

Hi team,

Is there a practical limit on the number of shards and nodes in a solr
cloud? We need to scale up the solr cloud and wonder if there is concern
when increasing to a couple of hundred shards and several thousand nodes in
a single cloud.  Any suggestions?

Thanks,
Wei

Re: Maximum number of shards and nodes in solr cloud

Posted by Walter Underwood <wu...@wunderwood.org>.
> On Jan 9, 2023, at 1:29 PM, Wei <we...@gmail.com> wrote:
> 
> Is there a practical limit on the number of shards and nodes in a solr
> cloud? We need to scale up the solr cloud and wonder if there is concern
> when increasing to a couple of hundred shards and several thousand nodes in
> a single cloud.  Any suggestions?

It was challenging to manage with 8 shards and a replication factor of 8. At that point, we scaled vertically to bigger AWS instances. It scaled smoothly up to 72 CPU instances.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


Re: Maximum number of shards and nodes in solr cloud

Posted by Wei <we...@gmail.com>.
Thanks Justin, that's really helpful.  Are the challenges with state.json
updates caused by the number of replicas or the number of shards?  And is
the Per Replica State feature available in Solr 8.8 and later?

Regards,
Wei



On Tue, Jan 10, 2023 at 12:02 PM Justin Sweeney <ju...@gmail.com>
wrote:

> We are running clusters with 100s of nodes in a single Solr Cloud with many
> collections, some of which have up to 2,048 shards. Generally, the
> challenges we have seen have been related to handling updates to
> state.json, particularly on restarts of nodes in the cluster, since a node
> reboot results in a state change for each of its replicas in state.json.
> With a large number of shards, this can mean frequent reading/writing of a
> large state.json file, which can impact how quickly nodes are able to
> restart. Also, state.json itself can exceed Zookeeper's default file size
> limit of 1MB. That limit can be raised in Zookeeper's configuration, but it
> is something to be aware of. I'm working on upstreaming a PR to allow
> compression of state.json in ZK, which would help a bit here:
> https://github.com/apache/solr/pull/1267.
>
> Also, at FullStory, working with Ishan/Noble, we developed an alternative
> means of managing replica up/down state called Per Replica State, which you
> can read more about here: https://searchscale.com/blog/prs/. This has been
> effective for us when operating at large scale, since it limits how much we
> have to read/write state.json.
>
>
>
> On Tue, Jan 10, 2023 at 12:21 PM Wei <we...@gmail.com> wrote:
>
> > Thanks Walter. We have a 1-to-1 mapping: each physical server hosts a
> > single solr core, so CPU resource per node is sufficient. Because of
> > the constant index size growth, and because each shard can only hold X
> > million documents and we don't want shards to reach their maximum
> > capacity, the number of shards can grow into the several hundreds. My
> > question is, from the solr cloud/Zookeeper perspective, is there a hard
> > limit or performance impact when the number of shards (and
> > correspondingly the number of nodes) reaches a certain threshold?
> > Another option I see is to split the data into multiple separate small
> > solr clouds, but then we have to handle the aggregation of results from
> > different clouds outside of solr, which has challenges like how to
> > compare respective solr scores, handle sorting, etc.
> >
> > Thanks,
> > Wei
> >
> > On Mon, Jan 9, 2023 at 7:39 PM Walter Underwood <wu...@wunderwood.org>
> > wrote:
> >
> > > > On Jan 9, 2023, at 1:29 PM, Wei <we...@gmail.com> wrote:
> > > >
> > > > Is there a practical limit on the number of shards and nodes in a
> > > > solr cloud? We need to scale up the solr cloud and wonder if there
> > > > is concern when increasing to a couple of hundred shards and
> > > > several thousand nodes in a single cloud. Any suggestions?
> > >
> > > It was challenging to manage with 8 shards and a replication factor
> > > of 8. At that point, we scaled vertically to bigger AWS instances.
> > > It scaled smoothly up to 72 CPU instances.
> > >
> > > wunder
> > > Walter Underwood
> > > wunder@wunderwood.org
> > > http://observer.wunderwood.org/  (my blog)
> > >
> > >
> > >
> >
>

Re: Maximum number of shards and nodes in solr cloud

Posted by Shawn Heisey <ap...@elyograg.org>.
On 1/10/2023 1:02 PM, Justin Sweeney wrote:
> I'm working on upstreaming a PR to allow
> compression of state.json in ZK which would help a bit here:
> https://github.com/apache/solr/pull/1267.

Compression is an interesting idea.  I'm not sure it would help with the 
problems I found here:

https://issues.apache.org/jira/browse/SOLR-7191

With that, every state.json was pretty small ... but there were a LOT of 
them.  The bottleneck in that situation seems to be the overseer.  It 
takes a nontrivial amount of time for each entry in the overseer queue 
to process.  Restarting a Solr node with thousands of collections causes 
a HUGE number of updates to go into the overseer queue.  If the overseer 
could do its job a lot faster, that problem might go away.
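
If anyone wants to watch that bottleneck directly, the queue depth is easy 
to poll. Here is a rough sketch using the Python kazoo client (the ZK host 
is a placeholder, and verify the /overseer/queue path against your Solr 
version):

    # Sketch: poll the depth of Solr's overseer work queue in ZooKeeper.
    # Assumes the kazoo client library; the ZK host is a placeholder.
    import time
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1:2181")
    zk.start()
    try:
        while True:
            # Each child znode is one pending overseer work item.
            queue = zk.get_children("/overseer/queue")
            print(f"overseer queue depth: {len(queue)}")
            time.sleep(5)
    finally:
        zk.stop()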

Thanks,
Shawn

Re: Maximum number of shards and nodes in solr cloud

Posted by Justin Sweeney <ju...@gmail.com>.
We are running clusters with 100s of nodes in a single Solr Cloud with many
collections, some of which have up to 2,048 shards. Generally, the
challenges we have seen have been related to handling updates to
state.json, particularly on restarts of nodes in the cluster, since a node
reboot results in a state change for each of its replicas in state.json.
With a large number of shards, this can mean frequent reading/writing of a
large state.json file, which can impact how quickly nodes are able to
restart. Also, state.json itself can exceed Zookeeper's default file size
limit of 1MB. That limit can be raised in Zookeeper's configuration, but it
is something to be aware of. I'm working on upstreaming a PR to allow
compression of state.json in ZK, which would help a bit here:
https://github.com/apache/solr/pull/1267.
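
To get a feel for how close a collection is to that limit, something like
the sketch below works (Python with the kazoo client; the ZK host and
collection name are placeholders). The limit itself is ZooKeeper's
jute.maxbuffer system property, which has to be raised on both the ZK
servers and the clients:

    # Sketch: compare a collection's state.json size against ZooKeeper's
    # default znode size limit (jute.maxbuffer, default 0xfffff bytes).
    # The ZK host and collection name below are placeholders.
    from kazoo.client import KazooClient

    DEFAULT_JUTE_MAXBUFFER = 0xFFFFF  # ~1MB

    zk = KazooClient(hosts="zk1:2181")
    zk.start()
    try:
        data, stat = zk.get("/collections/mycollection/state.json")
        pct = stat.dataLength / DEFAULT_JUTE_MAXBUFFER
        print(f"state.json: {stat.dataLength} bytes ({pct:.0%} of default limit)")
    finally:
        zk.stop()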

Also, at FullStory, working with Ishan/Noble, we developed an alternative
means of managing replica up/down state called Per Replica State, which you
can read more about here: https://searchscale.com/blog/prs/. This has been
effective for us when operating at large scale, since it limits how much we
have to read/write state.json.
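
If you want to try it, PRS is just a flag at collection creation time. A
minimal sketch with Python's requests library (the host, collection name,
and shard/replica counts are placeholders; check that your Solr version
supports the perReplicaState flag):

    # Sketch: create a collection with Per Replica State enabled.
    # Host, collection name, and shard/replica counts are placeholders.
    import requests

    resp = requests.get(
        "http://localhost:8983/solr/admin/collections",
        params={
            "action": "CREATE",
            "name": "mycollection",
            "numShards": 4,
            "replicationFactor": 2,
            "perReplicaState": "true",  # each replica's state gets its own znode
        },
    )
    resp.raise_for_status()
    print(resp.json())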



On Tue, Jan 10, 2023 at 12:21 PM Wei <we...@gmail.com> wrote:

> Thanks Walter. We have a 1-to-1 mapping: each physical server hosts a
> single solr core, so CPU resource per node is sufficient. Because of the
> constant index size growth, and because each shard can only hold X million
> documents and we don't want shards to reach their maximum capacity, the
> number of shards can grow into the several hundreds. My question is, from
> the solr cloud/Zookeeper perspective, is there a hard limit or performance
> impact when the number of shards (and correspondingly the number of nodes)
> reaches a certain threshold? Another option I see is to split the data
> into multiple separate small solr clouds, but then we have to handle the
> aggregation of results from different clouds outside of solr, which has
> challenges like how to compare respective solr scores, handle sorting,
> etc.
>
> Thanks,
> Wei
>
> On Mon, Jan 9, 2023 at 7:39 PM Walter Underwood <wu...@wunderwood.org>
> wrote:
>
> > > On Jan 9, 2023, at 1:29 PM, Wei <we...@gmail.com> wrote:
> > >
> > > Is there a practical limit on the number of shards and nodes in a
> > > solr cloud? We need to scale up the solr cloud and wonder if there
> > > is concern when increasing to a couple of hundred shards and several
> > > thousand nodes in a single cloud. Any suggestions?
> >
> > It was challenging to manage with 8 shards and a replication factor of 8.
> > At that point, we scaled vertically to bigger AWS instances. It scaled
> > smoothly up to 72 CPU instances.
> >
> > wunder
> > Walter Underwood
> > wunder@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> >
>

Re: Maximum number of shards and nodes in solr cloud

Posted by Wei <we...@gmail.com>.
Thanks Walter. We have a 1-to-1 mapping: each physical server hosts a
single solr core, so CPU resource per node is sufficient. Because of the
constant index size growth, and because each shard can only hold X million
documents and we don't want shards to reach their maximum capacity, the
number of shards can grow into the several hundreds. My question is, from
the solr cloud/Zookeeper perspective, is there a hard limit or performance
impact when the number of shards (and correspondingly the number of nodes)
reaches a certain threshold? Another option I see is to split the data
into multiple separate small solr clouds, but then we have to handle the
aggregation of results from different clouds outside of solr, which has
challenges like how to compare respective solr scores, handle sorting,
etc.
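
To make that concrete, the kind of client-side merge we would have to
maintain ourselves looks roughly like the sketch below (Python with the
requests library; the cloud hosts and collection name are placeholders).
The raw scores from separate clouds are not directly comparable, since each
cloud computes its own index statistics, which is exactly the problem:

    # Sketch: naive client-side merge of top-k results from separate clouds.
    # Hosts and collection name are placeholders. Caveat: relevance scores
    # from independent clusters use per-cluster IDF stats, so sorting on
    # them across clouds is not strictly correct.
    import heapq
    import requests

    CLOUDS = ["http://cloud-a:8983", "http://cloud-b:8983"]

    def search(base_url, query, rows=10):
        resp = requests.get(
            f"{base_url}/solr/mycollection/select",
            params={"q": query, "rows": rows, "fl": "id,score", "wt": "json"},
        )
        resp.raise_for_status()
        return resp.json()["response"]["docs"]

    def merged_top_k(query, k=10):
        all_docs = [d for cloud in CLOUDS for d in search(cloud, query, rows=k)]
        return heapq.nlargest(k, all_docs, key=lambda d: d["score"])

    print(merged_top_k("laptop"))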

Thanks,
Wei

On Mon, Jan 9, 2023 at 7:39 PM Walter Underwood <wu...@wunderwood.org>
wrote:

> > On Jan 9, 2023, at 1:29 PM, Wei <we...@gmail.com> wrote:
> >
> > Is there a practical limit on the number of shards and nodes in a solr
> > cloud? We need to scale up the solr cloud and wonder if there is concern
> > when increasing to a couple of hundred shards and several thousand
> > nodes in a single cloud. Any suggestions?
>
> It was challenging to manage with 8 shards and a replication factor of 8.
> At that point, we scaled vertically to bigger AWS instances. It scaled
> smoothly up to 72 CPU instances.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>
