Posted to solr-user@lucene.apache.org by Sharif Shahrair <sh...@m2sys.com> on 2018/06/27 11:10:37 UTC

Maximum number of SolrCloud collections in limited hardware resource

Hi Guys,
We have a use case where we need to create a large number of
collections (1000 to 1500) in a SolrCloud cluster. Most of the collections
will hold a very limited number of documents (100 to 1000), and some
collections are empty. We are using a single shard with 2 replicas. For each
replica we are using a machine with 12 GB RAM and a 32 GB SSD.

Now the problem is, when we create about 1400 collections (all of them
empty, i.e. no documents added yet), the Solr service goes down with an
out-of-memory exception. We have a few questions:

1. When we create collections, each collection takes about 8 MB to 12 MB
of memory before any document is added. Is there any way to configure
SolrCloud so that it uses less memory per collection initially (say 1 MB per
collection)? Then we would be able to create 1500 collections using about
3 GB of the machine's RAM.

2. Is there any way to clear/flush the caches of SolrCloud, especially for
collections that haven't been accessed for a while? (Maybe we could take
those inactive collections out of memory and load them back when they are
needed again.)

3. Is there any way to reclaim garbage memory from SolrCloud (for example,
memory left behind by deleting documents and collections)?

Our goal is to create the maximum number of collections without increasing
the hardware resources, while keeping the most frequently accessed
collections and documents in memory. We'll appreciate your help.

Best Regards,
Sharif Shahriar Ahmed
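
For context, the scenario above can be reproduced through the Collections
API; the host, naming scheme, and configset below are illustrative, not from
the thread:

    # Create 1500 single-shard collections with 2 replicas each
    for i in $(seq 1 1500); do
      curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=tenant$(printf '%04d' $i)&numShards=1&replicationFactor=2&collection.configName=_default"
    done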

Re: Maximum number of SolrCloud collections in limited hardware resource

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Does it need to be SolrCloud? If it is just for replication, maybe the data
can simply be indexed twice from the client, or you can use old-style
(master/slave) replication, and then use LotsOfCores autoloading.

Regards,
    Alex

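For reference, a minimal sketch of the LotsOfCores autoloading Alex
mentions. It applies to standalone (non-SolrCloud) mode only, as Shawn notes
below; the core name and cache size here are illustrative:

    <!-- solr.xml: keep at most 100 transient cores loaded at any one time -->
    <solr>
      <int name="transientCacheSize">100</int>
    </solr>

    # core.properties for each rarely-used core
    name=tenant0042
    transient=true
    loadOnStartup=false

A transient core is loaded on its first request and evicted, roughly LRU,
once more than transientCacheSize transient cores are loaded.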

Re: Maximum number of SolrCloud collections in limited hardware resource

Posted by Sharif Shahriar <sh...@m2sys.com>.
Thanks a lot, Shawn, for your detailed reply.




Re: Maximum number of SolrCloud collections in limited hardware resource

Posted by Sharif Shahriar <sh...@m2sys.com>.
Hi Erick,
Setting the size parameter to 0 in solrconfig.xml can stop document caching,
but it can't control how much memory a collection takes initially when it is
created, right?





Re: Maximum number of SolrCloud collections in limited hardware resource

Posted by Erick Erickson <er...@gmail.com>.
Just set the size parameter in solrconfig.xml to 0.

Best,
Erick

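For illustration, a sketch of what that looks like in the <query> section of
solrconfig.xml, with the cache classes Solr 7.x ships by default; size="0"
effectively disables each cache:

    <filterCache class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>

As Sharif notes above, this only limits cache memory; it does not change the
baseline overhead of loading a core.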

Re: Maximum number of SolrCloud collections in limited hardware resource

Posted by Sharif Shahriar <sh...@m2sys.com>.
Hi Emir,
Thanks a lot for your reply. You mentioned:

> If you stick with multiple collections, you can turn off caches completely,
> monitor latency and turn on caches for collections when it is reaching some
> threshold.

How can this be done? Is there any configuration to turn off caches
completely in SolrCloud?





Re: Maximum number of SolrCloud collections in limited hardware resource

Posted by Emir Arnautović <em...@sematext.com>.
Hi,
It is probably best to merge some (or all) of your collections and add a discriminator field that is used to filter out each tenant's documents. If you go with multiple collections serving multiple tenants, you have to have logic on top of it to resolve a tenant to a collection. Unfortunately, Solr does not have aliases with filtering, like ES, which would come in handy in such cases.
If you stick with multiple collections, you can turn off caches completely, monitor latency, and turn caches back on for collections whose latency reaches some threshold.
Caches are invalidated on commit, so submitting a dummy doc and committing should invalidate them. An alternative is to reload the collection.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

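For illustration, Emir's two suggestions expressed as concrete requests; the
host, collection, and field names here are made up for the example:

    # All tenants share one collection; a filter query isolates one tenant
    curl "http://localhost:8983/solr/shared/select?q=*:*&fq=tenant_id:tenant0042"

    # Drop a collection's caches by reloading it via the Collections API
    curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=tenant0042"

Reloading opens new searchers for the collection, which discards the caches
attached to the old ones.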


Re: Maximum number of SolrCloud collections in limited hardware resource

Posted by Shawn Heisey <el...@elyograg.org>.
On 6/27/2018 5:10 AM, Sharif Shahrair wrote:
> Now the problem is, when we create about 1400 collections (all of them
> empty, i.e. no documents added yet), the Solr service goes down with an
> out-of-memory exception. We have a few questions:
>
> 1. When we create collections, each collection takes about 8 MB to 12 MB
> of memory before any document is added. Is there any way to configure
> SolrCloud so that it uses less memory per collection initially (say 1 MB
> per collection)? Then we would be able to create 1500 collections using
> about 3 GB of the machine's RAM.

Solr doesn't have a setting that dictates how much memory it allocates for a
collection.  It allocates what it needs, and if the heap size is too small
for that, then you get an OOME.
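
For reference, the heap is set when Solr starts; a sketch using the bin/solr
flags of the Solr 7.x era (the 8g figure and ZooKeeper addresses are
illustrative):

    # Start Solr in SolrCloud mode (-c) against a 3-node ZooKeeper
    # ensemble (-z), with an 8 GB heap (-m)
    bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181 -m 8g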

You're going to need a lot more than two Solr servers to handle that 
many collections, and they're going to need more than 12GB of memory.  
You should already have at least three servers in your setup, because 
ZooKeeper requires three servers for redundancy.

http://zookeeper.apache.org/doc/r3.4.12/zookeeperAdmin.html#sc_zkMulitServerSetup

Handling a large number of collections is one area where SolrCloud needs 
improvement.  Work is constantly happening towards this goal, but it's a 
very complex piece of software, so making design changes is not trivial.

> 2. Is there any way to clear/flush the caches of SolrCloud, especially for
> collections that haven't been accessed for a while? (Maybe we could take
> those inactive collections out of memory and load them back when they are
> needed again.)

Unfortunately, the functionality that allows index cores to be unloaded 
(which we colloquially call "LotsOfCores") does not work when Solr is 
running in SolrCloud mode.  SolrCloud functionality would break if its cores 
were unloaded.  It would take a fair amount of development effort to allow 
the two features to work together.

> 3. Is there any way to reclaim garbage memory from SolrCloud (for example,
> memory left behind by deleting documents and collections)?

Java handles garbage collection automatically.  It's possible to explicitly 
ask the system to collect garbage, but any good Java programming guide will 
recommend that programmers NOT trigger GC explicitly.  While it might be 
possible for Solr's memory usage to become more efficient through 
development effort, it's already pretty good.  To our knowledge, Solr does 
not currently have any memory leak bugs, and if any are found, they are 
taken seriously and fixed as fast as we can.
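
For reference, the explicit trigger Shawn mentions can even be issued from
outside the JVM with the standard JDK jcmd tool; the JVM treats it as a
hint:

    # <pid> is the Solr JVM's process id; GC.run requests a full GC
    jcmd <pid> GC.run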

> Our goal is to create the maximum number of collections without
> increasing the hardware resources, while keeping the most frequently
> accessed collections and documents in memory. We'll appreciate your help.

That goal will require a fair amount of hardware.  You may have no 
choice but to increase your hardware resources.

Thanks,
Shawn