Posted to solr-user@lucene.apache.org by Webster Homer <we...@sial.com> on 2017/08/08 18:35:48 UTC

How many collections in a solrcloud are too many, how to determine this?

We have a SolrCloud environment with 4 Solr nodes and a 3-node
ZooKeeper ensemble. All of the collections are configured to have 2 shards
with 2 replicas each. In this environment we have 14 different collections.
Some of these collections are hardly touched; others have a fairly heavy
search and update load.
One collection has near-real-time updates every minute and constant
searches, but it is not very large.
Another has a fairly constant search load with updates of a few records
every 15 minutes. 6 collections are search heavy but update light (1 full
load per week with daily partials).

Updates reach production via CDCR from an "authoring" cloud, which
replicates to two production clouds.
We often see issues with replicas not being updated and tlogs accumulating.

We have autoCommit and autoSoftCommit set on all our collections, and CDCR
logs disabled. We are running Solr 6.2.

We also run into errors saying that "no live solr Servers available to
service the request" even though all nodes appear healthy. So I've been
wondering if we just have too many collections for the number of nodes.

Are there telltale diagnostics that could determine whether the servers are
overloaded?

Are there any guidelines for the number of collections vs. the number of
nodes in a SolrCloud?

We run our ZooKeeper nodes via supervisord, and all of this is behind
firewalls, so the ZooKeeper JMX interface is useless to us. How do we get
good diagnostics from ZooKeeper? I know that sometimes problems go away
when we restart the ZooKeeper nodes and the Solr nodes.

Thanks

-- 


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, 
you must not copy this message or attachment or disclose the contents to 
any other person. If you have received this transmission in error, please 
notify the sender immediately and delete the message and any attachment 
from your system. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not accept liability for any omissions or errors in this 
message which may arise as a result of E-Mail-transmission or for damages 
resulting from any unauthorized changes of the content of this message and 
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not guarantee that this message is free of viruses and does 
not accept liability for any damages caused by any virus transmitted 
therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.

Re: How many collections in a solrcloud are too many, how to determine this?

Posted by Yago Riveiro <ya...@gmail.com>.
I have a cluster (12 nodes) with 664 collections, 12 shards each and replication factor 2.

The main bottleneck will be ZooKeeper: it is too easy to overflow the overseer queue when a node drops out due to a GC pause. Another problem is that the time to restart a node will increase from seconds to minutes.
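A rough way to watch the overseer queue, sketched under assumptions: the Collections API has an OVERSEERSTATUS action whose response reports queue sizes. The base URL below is hypothetical, and the field names in the sample response should be checked against what your Solr version actually returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def overseer_status_url(base_url):
    """Build the Collections API URL for the OVERSEERSTATUS action."""
    query = urlencode({"action": "OVERSEERSTATUS", "wt": "json"})
    return base_url.rstrip("/") + "/admin/collections?" + query


def queue_sizes(status):
    """Keep only the top-level fields whose names end in 'queue_size'."""
    return {k: v for k, v in status.items() if k.endswith("queue_size")}


def fetch_overseer_status(base_url, timeout=10):
    """Fetch and parse the status from a live node (needs network access)."""
    with urlopen(overseer_status_url(base_url), timeout=timeout) as resp:
        return json.load(resp)


# Example call against a hypothetical node:
#   print(queue_sizes(fetch_overseer_status("http://solr1.example.com:8983/solr")))
```

A queue size that keeps growing after a node restart or a GC pause is the situation described above.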

The tradeoff is not easy; it depends on the number of machines, the volume of data, the hardware and so on.

--

/Yago Riveiro


Re: How many collections in a solrcloud are too many, how to determine this?

Posted by Webster Homer <we...@sial.com>.
Yes, we do see replicas go into recovery.

Most of our clouds are hosted in Google Cloud, so flaky networks are
probably not an issue, though firewalls to the clouds can be.


Re: How many collections in a solrcloud are too many, how to determine this?

Posted by Erick Erickson <er...@gmail.com>.
So in total you have 56 replicas, correct? This shouldn't be a
problem, we've seen many more replicas than that. Many many many.
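The arithmetic behind the 56, as a small sketch using only the numbers from the original post (14 collections, 2 shards, 2 replicas per shard, 4 nodes). The even spread across nodes is an assumption; actual placement may differ.

```python
def replica_counts(collections, shards, replicas_per_shard, nodes):
    """Return (total replicas, average replicas per node)."""
    total = collections * shards * replicas_per_shard
    return total, total / nodes


total, per_node = replica_counts(14, 2, 2, 4)
print(total, per_node)  # 56 14.0
```

So each node hosts roughly 14 cores if the replicas are spread evenly.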

Do you ever see any replicas go into recovery? One common problem is
that GC exceeds the timeouts for, say, Zookeeper to contact nodes and
they'll cycle through recovery. Have you captured the GC logs and seen
if you have large stop-the-world GC pauses?
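A minimal sketch of that GC log check, assuming a Java 8 style log written with -XX:+PrintGCApplicationStoppedTime, which produces "Total time for which application threads were stopped" lines. The log path and the threshold are placeholders; the threshold worth using is whatever your zkClientTimeout is set to, in seconds.

```python
import re

# Matches the pause lines written when PrintGCApplicationStoppedTime is on.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def long_pauses(lines, threshold_seconds):
    """Return (pause_seconds, line) for every pause at or above the threshold."""
    pauses = []
    for line in lines:
        match = STOPPED.search(line)
        if match:
            seconds = float(match.group(1))
            if seconds >= threshold_seconds:
                pauses.append((seconds, line.rstrip()))
    return pauses


# Example call with a hypothetical path and a 15 second threshold:
#   with open("/var/solr/logs/solr_gc.log") as log:
#       for seconds, line in long_pauses(log, 15.0):
#           print(seconds, line)
```

Pauses that approach or exceed the ZooKeeper session timeout are the ones likely to line up with replicas cycling through recovery.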

In short, what you've described should be easily handled. My guess is
GC pauses, I/O contention and/or flaky networks....

Best,
Erick


Re: How many collections in a solrcloud are too many, how to determine this?

Posted by Toke Eskildsen <to...@kb.dk>.
On Tue, 2017-08-08 at 13:35 -0500, Webster Homer wrote:
> We have a SolrCloud environment with 4 Solr nodes and a 3-node
> ZooKeeper ensemble. All of the collections are configured to have 2
> shards with 2 replicas.

Quick sanity check: Why 2 shards/collection?

There is a non-trivial overhead going from 1 to more than 1 shard. If
your collections are not too large, chances are that you will lower
your hardware requirements (and/or improve response times) by using
only 1 shard/collection.

- Toke Eskildsen, Royal Danish Library
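For anyone wanting to try the single-shard layout, here is a hedged sketch that only builds a Collections API CREATE URL with numShards=1. The base URL, collection name and configset name are hypothetical, and keeping replicationFactor at 2 is an assumption carried over from the original setup.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, config_name,
                          num_shards=1, replication_factor=2):
    """Build a Collections API CREATE request for a single-shard collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
        "wt": "json",
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


# Hypothetical names throughout; this only prints the URL, it sends nothing.
print(create_collection_url("http://solr1.example.com:8983/solr",
                            "products_1shard", "products_conf"))
```

With 14 collections this layout would mean 28 replicas in total instead of 56, and no distributed query step within a collection.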