Posted to solr-user@lucene.apache.org by Hongxu Ma <in...@outlook.com> on 2019/08/29 03:27:12 UTC

Question: Solr perform well with thousands of replicas?

Hi
I have a SolrCloud cluster, but it becomes unstable when the collection count is large: about 1000 replicas/cores per Solr node.

To solve this issue, I have read the performance guide:
https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems

I noticed there is a sentence in the SolrCloud section:
"Recent Solr versions perform well with thousands of replicas."

I want to know: does this mean a single Solr node can handle thousands of replicas, or that a whole cluster can (and if so, what is the size of the cluster)?

My Solr versions are 7.3.1 and 6.6.2 (they appear to perform about the same).

Thanks for your help.


Re: Question: Solr perform well with thousands of replicas?

Posted by Hongxu Ma <in...@outlook.com>.
Hi Erick
Thanks for your help.

Before visiting the wiki/mailing list, I already knew Solr is unstable with 1000+ collections and should be safe with 10-100 collections.
But in a specific environment, what is the exact number at which Solr begins to become unstable? I don't know.

So I am deploying a test cluster to find that number, and then I will try to push it higher (to save cost).
That's my purpose: quantitative analysis --> how many replicas can be supported in my environment?
Once I have that number, I will adjust my application: when it is near the maximum, prevent the creation of too many indexes or show a warning message to the user.
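
One way to implement that guard, for illustration only, is to count existing replicas through the Collections API CLUSTERSTATUS call before allowing a new index to be created. A rough Python sketch; the base URL and the threshold are placeholders, not values anyone has validated:

    import requests

    SOLR = "http://localhost:8983/solr"   # placeholder node address
    MAX_REPLICAS = 2000                   # assumed limit, to be replaced by the tested number

    def total_replicas():
        # CLUSTERSTATUS lists every collection, shard and replica in the cluster
        resp = requests.get(SOLR + "/admin/collections",
                            params={"action": "CLUSTERSTATUS", "wt": "json"})
        resp.raise_for_status()
        collections = resp.json()["cluster"]["collections"]
        return sum(len(shard["replicas"])
                   for coll in collections.values()
                   for shard in coll["shards"].values())

    def allow_new_index():
        n = total_replicas()
        if n >= MAX_REPLICAS:
            print("warning: cluster already holds %d replicas, refusing to create more" % n)
            return False
        return True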

________________________________
From: Erick Erickson <er...@gmail.com>
Sent: Monday, September 2, 2019 21:20
To: solr-user@lucene.apache.org <so...@lucene.apache.org>
Subject: Re: Question: Solr perform well with thousands of replicas?

> why so many collection/replica: it's our customer needs, for example: each database table mappings a collection.

I always cringe when I see statements like this. What this means is that your customer doesn’t understand search and needs guidance in the proper use of any search technology, Solr included.

Solr is _not_ an RDBMS. Simply mapping the DB tables onto collections will almost certainly result in a poor experience. Next the customer will want to ask Solr to do the same thing a DB does, e.g. run a join across 10 tables, which will be abysmal. Solr isn’t designed for that. Some brilliant RDBMS people have spent many years making DBs do what they do and do it well.

That said, RDBMSs have poor search capabilities, they aren’t built to solve the search problem.

I suspect the time you spend making Solr load a thousand cores will be wasted. Once you do get them loaded, performance will be horrible. IMO you’d be far better off helping the customer define their problem so they properly model their search problem. This may mean that the result will be a hybrid where Solr is used for the free-text search and the RDBMS uses the results of the search to do something. Or vice versa.

FWIW
Erick

> On Sep 2, 2019, at 5:55 AM, Hongxu Ma <in...@outlook.com> wrote:
>
> Thanks @Jörn and @Erick
> I enlarged my JVM memory, so far it's stable (but used many memory).
> And I will check lower-level errors according to your suggestion if error happens.
>
> About my scenario:
>
>  *   why so many collection/replica: it's our customer needs, for example: each database table mappings a collection.
>  *   this env is just a test cluster: I want to verify the max collection number solr can support stably.
>
>
> ________________________________
> From: Erick Erickson <er...@gmail.com>
> Sent: Friday, August 30, 2019 20:05
> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
> Subject: Re: Question: Solr perform well with thousands of replicas?
>
> “no registered leader” is the effect of some problem usually, not the root cause. In this case, for instance, you could be running out of file handles and see other errors like “too many open files”. That’s just one example.
>
> One common problem is that Solr needs a lot of file handles and the system defaults are too low. We usually recommend you start with 65K file handles (ulimit) and bump up the number of processes to 65K too.
>
> So, to throw some numbers out: with 1,000 replicas, let’s say you have 50 segments in the index in each replica. Each segment consists of multiple files (I’m skipping “compound files” here as an advanced topic), so each segment has, let’s say, 10 files. 1,000 * 50 * 10 would require 500,000 file handles on your system.
>
> Bottom line: look for other, lower-level errors in the log to try to understand what limit you’re running into.
>
> All that said, there’ll be a number of “gotchas” when running that many replicas on a particular node. I second Jörn’s question...
>
> Best,
> Erick
>
>> On Aug 30, 2019, at 3:18 AM, Jörn Franke <jo...@gmail.com> wrote:
>>
>> What is the reason for this number of replicas? Solr should work fine, but maybe it is worth to consolidate some collections to avoid also administrative overhead.
>>
>>> Am 29.08.2019 um 05:27 schrieb Hongxu Ma <in...@outlook.com>:
>>>
>>> Hi
>>> I have a solr-cloud cluster, but it's unstable when collection number is big: 1000 replica/core per solr node.
>>>
>>> To solve this issue, I have read the performance guide:
>>> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems
>>>
>>> I noted there is a sentence on solr-cloud section:
>>> "Recent Solr versions perform well with thousands of replicas."
>>>
>>> I want to know does it mean a single solr node can handle thousands of replicas? or a solr cluster can (if so, what's the size of the cluster?)
>>>
>>> My solr version is 7.3.1 and 6.6.2 (looks they are the same in performance)
>>>
>>> Thanks for you help.
>>>
>


Re: Question: Solr perform well with thousands of replicas?

Posted by Erick Erickson <er...@gmail.com>.
> why so many collection/replica: it's our customer needs, for example: each database table mappings a collection.

I always cringe when I see statements like this. What this means is that your customer doesn’t understand search and needs guidance in the proper use of any search technology, Solr included.

Solr is _not_ an RDBMS. Simply mapping the DB tables onto collections will almost certainly result in a poor experience. Next the customer will want to ask Solr to do the same thing a DB does, e.g. run a join across 10 tables, which will be abysmal. Solr isn’t designed for that. Some brilliant RDBMS people have spent many years making DBs do what they do and do it well.

That said, RDBMSs have poor search capabilities, they aren’t built to solve the search problem.

I suspect the time you spend making Solr load a thousand cores will be wasted. Once you do get them loaded, performance will be horrible. IMO you’d be far better off helping the customer define their problem so they properly model their search problem. This may mean that the result will be a hybrid where Solr is used for the free-text search and the RDBMS uses the results of the search to do something. Or vice versa.

FWIW
Erick

> On Sep 2, 2019, at 5:55 AM, Hongxu Ma <in...@outlook.com> wrote:
> 
> Thanks @Jörn and @Erick
> I enlarged my JVM memory, so far it's stable (but used many memory).
> And I will check lower-level errors according to your suggestion if error happens.
> 
> About my scenario:
> 
>  *   why so many collection/replica: it's our customer needs, for example: each database table mappings a collection.
>  *   this env is just a test cluster: I want to verify the max collection number solr can support stably.
> 
> 
> ________________________________
> From: Erick Erickson <er...@gmail.com>
> Sent: Friday, August 30, 2019 20:05
> To: solr-user@lucene.apache.org <so...@lucene.apache.org>
> Subject: Re: Question: Solr perform well with thousands of replicas?
> 
> “no registered leader” is the effect of some problem usually, not the root cause. In this case, for instance, you could be running out of file handles and see other errors like “too many open files”. That’s just one example.
> 
> One common problem is that Solr needs a lot of file handles and the system defaults are too low. We usually recommend you start with 65K file handles (ulimit) and bump up the number of processes to 65K too.
> 
> So, to throw some numbers out: with 1,000 replicas, let’s say you have 50 segments in the index in each replica. Each segment consists of multiple files (I’m skipping “compound files” here as an advanced topic), so each segment has, let’s say, 10 files. 1,000 * 50 * 10 would require 500,000 file handles on your system.
> 
> Bottom line: look for other, lower-level errors in the log to try to understand what limit you’re running into.
> 
> All that said, there’ll be a number of “gotchas” when running that many replicas on a particular node. I second Jörn’s question...
> 
> Best,
> Erick
> 
>> On Aug 30, 2019, at 3:18 AM, Jörn Franke <jo...@gmail.com> wrote:
>> 
>> What is the reason for this number of replicas? Solr should work fine, but maybe it is worth to consolidate some collections to avoid also administrative overhead.
>> 
>>> Am 29.08.2019 um 05:27 schrieb Hongxu Ma <in...@outlook.com>:
>>> 
>>> Hi
>>> I have a solr-cloud cluster, but it's unstable when collection number is big: 1000 replica/core per solr node.
>>> 
>>> To solve this issue, I have read the performance guide:
>>> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems
>>> 
>>> I noted there is a sentence on solr-cloud section:
>>> "Recent Solr versions perform well with thousands of replicas."
>>> 
>>> I want to know does it mean a single solr node can handle thousands of replicas? or a solr cluster can (if so, what's the size of the cluster?)
>>> 
>>> My solr version is 7.3.1 and 6.6.2 (looks they are the same in performance)
>>> 
>>> Thanks for you help.
>>> 
> 


Re: Question: Solr perform well with thousands of replicas?

Posted by Hongxu Ma <in...@outlook.com>.
Thanks @Jörn and @Erick
I enlarged my JVM memory, and so far it's stable (but it uses a lot of memory).
And I will check for lower-level errors, as you suggested, if errors happen.

About my scenario:

  *   Why so many collections/replicas: it's what our customer needs; for example, each database table maps to a collection.
  *   This env is just a test cluster: I want to verify the maximum collection count Solr can support stably (a sketch of such a test loop follows this list).
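
A minimal sketch of such a test loop, assuming one shard and two replicas per test collection and a placeholder base URL; it simply creates collections until the Collections API starts reporting failures:

    import requests

    SOLR = "http://localhost:8983/solr"   # placeholder node address

    def create_test_collection(i):
        # Collections API CREATE; status 0 in the responseHeader means it worked
        resp = requests.get(SOLR + "/admin/collections",
                            params={"action": "CREATE",
                                    "name": "loadtest_%d" % i,
                                    "numShards": 1,
                                    "replicationFactor": 2,
                                    "wt": "json"},
                            timeout=300)
        return resp.ok and resp.json().get("responseHeader", {}).get("status") == 0

    count = 0
    while create_test_collection(count):
        count += 1
    print("creation started failing after %d collections" % count)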


________________________________
From: Erick Erickson <er...@gmail.com>
Sent: Friday, August 30, 2019 20:05
To: solr-user@lucene.apache.org <so...@lucene.apache.org>
Subject: Re: Question: Solr perform well with thousands of replicas?

“no registered leader” is the effect of some problem usually, not the root cause. In this case, for instance, you could be running out of file handles and see other errors like “too many open files”. That’s just one example.

One common problem is that Solr needs a lot of file handles and the system defaults are too low. We usually recommend you start with 65K file handles (ulimit) and bump up the number of processes to 65K too.

So, to throw some numbers out: with 1,000 replicas, let’s say you have 50 segments in the index in each replica. Each segment consists of multiple files (I’m skipping “compound files” here as an advanced topic), so each segment has, let’s say, 10 files. 1,000 * 50 * 10 would require 500,000 file handles on your system.

Bottom line: look for other, lower-level errors in the log to try to understand what limit you’re running into.

All that said, there’ll be a number of “gotchas” when running that many replicas on a particular node. I second Jörn’s question...

Best,
Erick

> On Aug 30, 2019, at 3:18 AM, Jörn Franke <jo...@gmail.com> wrote:
>
> What is the reason for this number of replicas? Solr should work fine, but maybe it is worth to consolidate some collections to avoid also administrative overhead.
>
>> Am 29.08.2019 um 05:27 schrieb Hongxu Ma <in...@outlook.com>:
>>
>> Hi
>> I have a solr-cloud cluster, but it's unstable when collection number is big: 1000 replica/core per solr node.
>>
>> To solve this issue, I have read the performance guide:
>> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems
>>
>> I noted there is a sentence on solr-cloud section:
>> "Recent Solr versions perform well with thousands of replicas."
>>
>> I want to know does it mean a single solr node can handle thousands of replicas? or a solr cluster can (if so, what's the size of the cluster?)
>>
>> My solr version is 7.3.1 and 6.6.2 (looks they are the same in performance)
>>
>> Thanks for you help.
>>


Re: Question: Solr perform well with thousands of replicas?

Posted by Erick Erickson <er...@gmail.com>.
 “no registered leader” is the effect of some problem usually, not the root cause. In this case, for instance, you could be running out of file handles and see other errors like “too many open files”. That’s just one example.

One common problem is that Solr needs a lot of file handles and the system defaults are too low. We usually recommend you start with 65K file handles (ulimit) and bump up the number of processes to 65K too.

So, to throw some numbers out: with 1,000 replicas, let’s say you have 50 segments in the index in each replica. Each segment consists of multiple files (I’m skipping “compound files” here as an advanced topic), so each segment has, let’s say, 10 files. 1,000 * 50 * 10 would require 500,000 file handles on your system.
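
The same arithmetic, plus the limit actually in force on the node, can be checked with a few lines of Python; the per-replica numbers below are the illustrative ones from the paragraph above, not measurements:

    import resource

    replicas_per_node = 1000
    segments_per_replica = 50   # illustrative number from the example above
    files_per_segment = 10      # illustrative number from the example above

    needed = replicas_per_node * segments_per_replica * files_per_segment
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)   # what "ulimit -n" reports
    print("~%d descriptors needed; soft limit %d, hard limit %d" % (needed, soft, hard))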

Bottom line: look for other, lower-level errors in the log to try to understand what limit you’re running into.

All that said, there’ll be a number of “gotchas” when running that many replicas on a particular node. I second Jörn’s question...

Best,
Erick

> On Aug 30, 2019, at 3:18 AM, Jörn Franke <jo...@gmail.com> wrote:
> 
> What is the reason for this number of replicas? Solr should work fine, but maybe it is worth to consolidate some collections to avoid also administrative overhead.
> 
>> Am 29.08.2019 um 05:27 schrieb Hongxu Ma <in...@outlook.com>:
>> 
>> Hi
>> I have a solr-cloud cluster, but it's unstable when collection number is big: 1000 replica/core per solr node.
>> 
>> To solve this issue, I have read the performance guide:
>> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems
>> 
>> I noted there is a sentence on solr-cloud section:
>> "Recent Solr versions perform well with thousands of replicas."
>> 
>> I want to know does it mean a single solr node can handle thousands of replicas? or a solr cluster can (if so, what's the size of the cluster?)
>> 
>> My solr version is 7.3.1 and 6.6.2 (looks they are the same in performance)
>> 
>> Thanks for you help.
>> 


Re: Question: Solr perform well with thousands of replicas?

Posted by Jörn Franke <jo...@gmail.com>.
What is the reason for this number of replicas? Solr should work fine, but it may be worth consolidating some collections, also to avoid administrative overhead.
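
If the table schemas are similar enough, one common consolidation pattern is a single collection with a field recording the source table, filtered per query. A rough sketch; the collection and field names here are made up for illustration:

    import requests

    SOLR = "http://localhost:8983/solr"   # placeholder node address

    # Index rows from several tables into one collection, tagging each document
    doc = {"id": "orders-42", "source_table": "orders", "text_all": "sample row contents"}
    requests.post(SOLR + "/combined/update", params={"commitWithin": 10000}, json=[doc])

    # Query only the documents that came from the "orders" table
    requests.get(SOLR + "/combined/select",
                 params={"q": "text_all:laptop", "fq": "source_table:orders", "wt": "json"})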

> Am 29.08.2019 um 05:27 schrieb Hongxu Ma <in...@outlook.com>:
> 
> Hi
> I have a solr-cloud cluster, but it's unstable when collection number is big: 1000 replica/core per solr node.
> 
> To solve this issue, I have read the performance guide:
> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems
> 
> I noted there is a sentence on solr-cloud section:
> "Recent Solr versions perform well with thousands of replicas."
> 
> I want to know does it mean a single solr node can handle thousands of replicas? or a solr cluster can (if so, what's the size of the cluster?)
> 
> My solr version is 7.3.1 and 6.6.2 (looks they are the same in performance)
> 
> Thanks for you help.
> 

Re: Question: Solr perform well with thousands of replicas?

Posted by Erick Erickson <er...@gmail.com>.
There are two factors:
1> the raw number of replicas on a Solr node.
2> total resources Solr needs.

You say “…it’s unstable…”. _How_ is it unstable? What symptoms are you seeing?

You might want to review: https://cwiki.apache.org/confluence/display/solr/UsingMailingLists

And note: as you add more cores, you put more pressure on memory, I/O, etc. So
whether it’s the raw number of cores or you’re just exhausting memory, overloading
your CPU, etc. is hard to say without more information.
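
If it helps to narrow that down, recent Solr versions (6.4 and later) expose a Metrics API; a small sketch that pulls just the JVM heap figures (the exact response layout varies a little between versions):

    import requests

    SOLR = "http://localhost:8983/solr"   # placeholder node address

    # group=jvm limits the output to JVM metrics; prefix narrows it to heap figures
    resp = requests.get(SOLR + "/admin/metrics",
                        params={"group": "jvm", "prefix": "memory.heap", "wt": "json"})
    print(resp.json().get("metrics", {}))  # heap used/committed/max for this node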

Best,
Erick

> On Aug 29, 2019, at 1:31 AM, Hendrik Haddorp <he...@gmx.net> wrote:
> 
> Hi,
> 
> we are usually using Solr Clouds with 5 nodes and up to 2000 collections
> and a replication factor of 2. So we have close to 1000 cores per node.
> That is on Solr 7.6 but I believe 7.3 worked as well. We tuned a few
> caches down to a minimum as otherwise the memory usage goes up a lot.
> The Solr UI is having some problems with a high number of collections,
> like lots of timeouts when loading the status.
> 
> Older Solr versions had problem with the overseer queue in ZooKeeper. If
> you restarted too many nodes at once then the queue got too long and
> Solr died and required some help and cleanup to start at all again.
> 
> regards,
> Hendrik
> 
> On 29.08.19 05:27, Hongxu Ma wrote:
>> Hi
>> I have a solr-cloud cluster, but it's unstable when collection number is big: 1000 replica/core per solr node.
>> 
>> To solve this issue, I have read the performance guide:
>> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems
>> 
>> I noted there is a sentence on solr-cloud section:
>> "Recent Solr versions perform well with thousands of replicas."
>> 
>> I want to know does it mean a single solr node can handle thousands of replicas? or a solr cluster can (if so, what's the size of the cluster?)
>> 
>> My solr version is 7.3.1 and 6.6.2 (looks they are the same in performance)
>> 
>> Thanks for you help.
>> 
>> 
> 


Re: Question: Solr perform well with thousands of replicas?

Posted by Hendrik Haddorp <he...@gmx.net>.
Hi,

we usually run Solr Cloud clusters with 5 nodes and up to 2000 collections
and a replication factor of 2, so we have close to 1000 cores per node.
That is on Solr 7.6, but I believe 7.3 worked as well. We tuned a few
caches down to a minimum, as otherwise the memory usage goes up a lot.
The Solr UI has some problems with a high number of collections,
such as lots of timeouts when loading the status.
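
For illustration, those per-core caches live in each collection's solrconfig.xml; a minimal example of turning them down, with sizes that are only placeholders rather than recommended values:

    <query>
      <filterCache      class="solr.FastLRUCache" size="64" initialSize="16" autowarmCount="0"/>
      <queryResultCache class="solr.LRUCache"     size="64" initialSize="16" autowarmCount="0"/>
      <documentCache    class="solr.LRUCache"     size="64" initialSize="16" autowarmCount="0"/>
    </query>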

Older Solr versions had problems with the overseer queue in ZooKeeper. If
you restarted too many nodes at once, the queue got too long and
Solr died, requiring manual help and cleanup to start at all again.

regards,
Hendrik

On 29.08.19 05:27, Hongxu Ma wrote:
> Hi
> I have a solr-cloud cluster, but it's unstable when collection number is big: 1000 replica/core per solr node.
>
> To solve this issue, I have read the performance guide:
> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems
>
> I noted there is a sentence on solr-cloud section:
> "Recent Solr versions perform well with thousands of replicas."
>
> I want to know does it mean a single solr node can handle thousands of replicas? or a solr cluster can (if so, what's the size of the cluster?)
>
> My solr version is 7.3.1 and 6.6.2 (looks they are the same in performance)
>
> Thanks for you help.
>
>


Re: Question: Solr perform well with thousands of replicas?

Posted by Hongxu Ma <in...@outlook.com>.
Hi guys
Thanks for your helpful replies!

More details about my env:
Cluster:
A 4-host GCP (Google Cloud) cluster; each host: 16-core CPU, 60 GB memory, 2 TB HDD.
I set up 2 Solr nodes on each host, and there are 1000+ replicas on each Solr node.
(Sorry for omitting this before: 2 Solr nodes per host, so there are 2000+ replicas on each host...)
ZooKeeper has 3 instances, reusing the Solr hosts (on a separate disk).
Workload:
Just indexing tens of millions of records (total size near 100 GB) into dozens (nearly 100) of indexes with 30 concurrent clients; no search operations at the same time (I will do a search test later).
Error:
"Unstable" means there are many Solr errors in the log and Solr requests fail,
e.g. "No registered leader was found after waiting for 4000ms , collection ..."

@ Hendrik
After seeing your reply, I realized my replica count was too big, so I reduced it to 720 replicas on each host (by reducing the shard count); after that, all my indexing requests succeeded. (happy)
But I saw the JVM peak memory usage is 24 GB (via the Solr web UI), which is big enough to be risky in the future (my JVM Xmx is 32 GB).
So could you give me some guidance on reducing the memory usage? (like the cache tuning you mentioned: "tuned a few caches down to a minimum")

@ Erick
I gave details above, please check.

@ Shawn
Thanks for your info; that's bad news...
I hope SolrCloud can handle more collections in the future.


________________________________
From: Shawn Heisey <ap...@elyograg.org>
Sent: Thursday, August 29, 2019 21:58
To: solr-user@lucene.apache.org <so...@lucene.apache.org>
Subject: Re: Question: Solr perform well with thousands of replicas?

On 8/28/2019 9:27 PM, Hongxu Ma wrote:
> I have a solr-cloud cluster, but it's unstable when collection number is big: 1000 replica/core per solr node.
>
> To solve this issue, I have read the performance guide:
> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems
>
> I noted there is a sentence on solr-cloud section:
> "Recent Solr versions perform well with thousands of replicas."

The SolrPerformanceProblems wiki page is my work.  I only wrote that
sentence because other devs working in SolrCloud code told me that was
the case.  Based on things said by people (including your comments on
this thread), I think newer versions probably aren't any better, and
that sentence needs to be removed from the wiki page.

See this issue that I created a few years ago:

https://issues.apache.org/jira/browse/SOLR-7191

This issue was closed with a 6.3 fix version ... but nothing was
committed with a tag for the issue, so I have no idea why it was closed.
  I think the problems described there are still there in recent Solr
versions, and MIGHT be even worse than they were in 4.x and 5.x.

> I want to know does it mean a single solr node can handle thousands of replicas? or a solr cluster can (if so, what's the size of the cluster?)

A single standalone Solr instance can handle lots of indexes, but Solr
startup is probably going to be slow.

No matter how many nodes there are, SolrCloud has problems with
thousands of collections or replicas due to issues with the overseer
queue getting enormous.  When I created SOLR-7191, I found that
restarting a node in a cloud with thousands of replicas (cores) can
result in a performance death spiral.

I haven't ever administered a production setup with thousands of
indexes, I've only done some single machine testing for the issue I
created.  I need to repeat it with 8.x and see what happens.  But I have
very little free time these days.

Thanks,
Shawn

Re: Question: Solr perform well with thousands of replicas?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/28/2019 9:27 PM, Hongxu Ma wrote:
> I have a solr-cloud cluster, but it's unstable when collection number is big: 1000 replica/core per solr node.
> 
> To solve this issue, I have read the performance guide:
> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems
> 
> I noted there is a sentence on solr-cloud section:
> "Recent Solr versions perform well with thousands of replicas."

The SolrPerformanceProblems wiki page is my work.  I only wrote that 
sentence because other devs working in SolrCloud code told me that was 
the case.  Based on things said by people (including your comments on 
this thread), I think newer versions probably aren't any better, and 
that sentence needs to be removed from the wiki page.

See this issue that I created a few years ago:

https://issues.apache.org/jira/browse/SOLR-7191

This issue was closed with a 6.3 fix version ... but nothing was 
committed with a tag for the issue, so I have no idea why it was closed. 
  I think the problems described there are still there in recent Solr 
versions, and MIGHT be even worse than they were in 4.x and 5.x.

> I want to know does it mean a single solr node can handle thousands of replicas? or a solr cluster can (if so, what's the size of the cluster?)

A single standalone Solr instance can handle lots of indexes, but Solr 
startup is probably going to be slow.

No matter how many nodes there are, SolrCloud has problems with 
thousands of collections or replicas due to issues with the overseer 
queue getting enormous.  When I created SOLR-7191, I found that 
restarting a node in a cloud with thousands of replicas (cores) can 
result in a performance death spiral.

I haven't ever administered a production setup with thousands of 
indexes, I've only done some single machine testing for the issue I 
created.  I need to repeat it with 8.x and see what happens.  But I have 
very little free time these days.

Thanks,
Shawn