Posted to users@solr.apache.org by Jan Høydahl <ja...@cominvent.com> on 2022/08/05 11:50:10 UTC

Re: Autoscaling

Hi,

With multiple tenants, scaling on the #tenants axis is simply a matter of adding new collections to the cluster. That should be fairly simple with K8s and the Solr Operator. First add N new nodes to your EKS cluster, then use --scale on your SolrCloud resource to add more pods, which will pop up as "empty" Solr nodes in the cluster. Finally, create the new collection(s) with the desired number of shards/replicas, and let the new PlacementPlugins introduced in Solr 9 (https://solr.apache.org/guide/solr/latest/configuration-guide/replica-placement-plugins.html) take care of placing the new collection on the best pods (typically the new, empty ones).
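A sketch of that flow, under assumptions: the EKS cluster name ("my-eks"), nodegroup ("solr-nodes"), SolrCloud resource ("example"), and tenant collection name are all hypothetical, and the cluster-mutating commands are shown commented out since they need a live environment:

```shell
# 1) Add N worker nodes to the EKS cluster (managed nodegroup; not run here):
#      eksctl scale nodegroup --cluster=my-eks --name=solr-nodes --nodes=9

# 2) Add Solr pods through the Solr Operator's SolrCloud resource:
#      kubectl scale solrcloud example --replicas=9

# 3) Create the tenant's collection; the Solr 9 placement plugins decide
#    which (typically new, empty) pods the replicas land on:
SOLR="http://localhost:8983/solr/admin/collections"
CREATE_URL="$SOLR?action=CREATE&name=tenant42&numShards=4&replicationFactor=2"
#      curl "$CREATE_URL"
echo "$CREATE_URL"
```

Note that the CREATE call names no nodes at all; node selection is left entirely to the configured placement plugin.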

Should a tenant start to see slowness due to too many docs per shard, you could then either migrate that collection to a new one with more shards, or build into your app's control plane a feature that performs SPLITSHARD <https://solr.apache.org/guide/solr/latest/deployment-guide/shard-management.html#splitshard> + MOVEREPLICA <https://solr.apache.org/guide/solr/latest/deployment-guide/replica-management.html#movereplica> on that collection. It looks like MOVEREPLICA does not support automatically picking a targetNode using the placement logic, which would have made the operation much simpler.
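A control-plane sketch of that SPLITSHARD + MOVEREPLICA sequence. The collection, shard, and node names here are hypothetical, the curl calls are commented out since they need a live cluster, and (per the note above) the targetNode must be picked by your own logic, e.g. from node metrics:

```shell
SOLR="http://localhost:8983/solr/admin/collections"

# 1) Split the hot shard; async returns immediately, poll with REQUESTSTATUS:
SPLIT_URL="$SOLR?action=SPLITSHARD&collection=tenant42&shard=shard1&async=split-t42-s1"
#   curl "$SPLIT_URL"
#   curl "$SOLR?action=REQUESTSTATUS&requestid=split-t42-s1"

# 2) Move one of the new sub-shard replicas to a less-loaded node, chosen by
#    the control plane itself -- MOVEREPLICA will not pick one for you:
MOVE_URL="$SOLR?action=MOVEREPLICA&collection=tenant42&shard=shard1_0&sourceNode=solr-3:8983_solr&targetNode=solr-9:8983_solr"
#   curl "$MOVE_URL"

echo "$SPLIT_URL"
echo "$MOVE_URL"
```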

Jan

> 18. jul. 2022 kl. 09:00 skrev Kaminski, Adi <Ad...@verint.com.INVALID>:
> 
> Shawn - thanks for your response!
> 
> The 1M index was just an example. For instance, we are planning to have multiple customers on the same SolrCloud cluster (each customer/tenant = one collection). Some customers may have 1-2M docs (small ones),
> some will have 3-12M docs (medium ones), and some will have 20-80M docs (large ones). If we migrate 100 such customers of different sizes, we will eventually end up with 1B+ docs in the same SolrCloud cluster (depending on the ratio of large vs. medium vs. small ones, of course).
> 
> The thing is that we cannot project the growth of each customer (Solr collection) other than by relying on the size/quota the customer had in their on-prem deployment before we migrate to the cloud.
> We would also like to avoid static tuning (#shards) followed by manual operations management (such as splits, rebalancing if supported, etc.) based on some rules.
> 
> That's why we are asking whether Solr has some automatic capabilities to ease the maintenance work and simplify the tuning (we understand that some exist in Solr 8.11 but are planned to be deprecated starting with Solr 9.x).
> 
> Alternatively, if there are some other best practices to meet our use case, we'll be happy to hear some direction.
> 
> Thanks in advance,
> Adi
> 
> -----Original Message-----
> From: Shawn Heisey <ap...@elyograg.org>
> Sent: Monday, July 18, 2022 12:42 AM
> To: users@solr.apache.org
> Subject: Re: Autoscaling
> 
> On 7/17/22 11:25, Kaminski, Adi wrote:
>> For example, if we have 10 shards of 100k documents each (1M total) for best and optimized ingestion/query performance... adding more documents will make it sensible to add an 11th shard, and reaching 1.1M total will eventually make it sensible to add a 12th.
> 
> One million total documents is actually a pretty small index and, as you were told in another reply, is not big enough in most situations to require sharding, unless your hardware has very little CPU/memory/storage.
> 
>> Is it reasonable to use some automation of collections API, splitting shards accordingly to some strategy (largest, oldest, etc.) ?
> 
> In a typical scenario, every shard will be approximately equal in size, and will contain documents of any age.  If you have a 10 shard index and you split one of the shards, then you will have 9 shards of relatively equal size and two shards that are each half the size of the other 9. To correctly redistribute the load, you would need to split ALL the shards, so you would end up with 20 shards, or some other multiple of 10, the starting point.
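To illustrate the point about splitting ALL the shards: with the default hash (compositeId) routing, going from 10 evenly sized shards to 20 means one SPLITSHARD per existing shard. A sketch, with a hypothetical collection name and the curl calls commented out:

```shell
SOLR="http://localhost:8983/solr/admin/collections"
COLLECTION="big"

# Splitting only shard1 would leave two sub-shards at half the size of the
# other nine; to redistribute load evenly, split all ten:
for N in 1 2 3 4 5 6 7 8 9 10; do
  URL="$SOLR?action=SPLITSHARD&collection=$COLLECTION&shard=shard$N&async=split-$N"
  #   curl "$URL"
  echo "$URL"
done
```

In practice each async split would need to be polled to completion (and old parent shards cleaned up) before the collection is fully rebalanced.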
> 
> In my last reply, I mentioned the implicit router.  This is the router you would need to use if you want to organize your shards by something like date.  But then every single document you index must indicate what shard it will end up on -- there is no automatic routing.
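The implicit-router setup described above might look like the following sketch (hypothetical collection, shard, and field names; curl calls commented out). With router.field, Solr reads the target shard name from a field in each document instead of hashing its id:

```shell
SOLR="http://localhost:8983/solr/admin/collections"

# Create a collection whose shards are named by month; no hash routing:
CREATE_URL="$SOLR?action=CREATE&name=logs&router.name=implicit&shards=2022_07,2022_08&router.field=month_s"
#   curl "$CREATE_URL"

# Every indexed document must then carry month_s=2022_07 (or 2022_08) to
# land on the matching shard -- there is no automatic routing:
DOC='{"id":"1","month_s":"2022_07","msg":"hello"}'
#   curl -X POST -H 'Content-Type: application/json' \
#        "http://localhost:8983/solr/logs/update?commit=true" -d "[$DOC]"

echo "$CREATE_URL"
```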
> 
>> Aren't there some out-of-the-box capabilities in the SolrCloud search engine? Or maybe some libraries/operators on top to simplify k8s deployments, not only for queries and automatic pod scaling but also for automating data storage optimization (per volume, date, or any other custom logic)?
> 
> I have no idea what you are asking here.
> 
> Thanks,
> Shawn
> 
> 
> 
> This electronic message may contain proprietary and confidential information of Verint Systems Inc., its affiliates and/or subsidiaries. The information is intended to be for the use of the individual(s) or entity(ies) named above. If you are not the intended recipient (or authorized to receive this e-mail for the intended recipient), you may not use, copy, disclose or distribute to anyone this message or any information contained in this message. If you have received this electronic message in error, please notify us by replying to this e-mail.