Posted to solr-user@lucene.apache.org by Chamil Jeewantha <kd...@gmail.com> on 2016/09/06 09:28:24 UTC

Re: Solr for Multi Tenant architecture

Dear all,

Thank you for all your advice.

This comment says:

"SolrCloud starts to have serious problems when you create a lot of
collections.
We are aware of the scalability issues, but they are not easy to fix."

http://lucene.472066.n3.nabble.com/Fwd-Solr-Cloud-6-0-0-hangs-when-creating-large-amount-of-collections-and-node-fails-to-recover-aftert-tp4276364p4276404.html

So I am worried that this will affect us once our system grows beyond
thousands of tenants.

One approach I can think of is adding a custom load-balancing mechanism that
routes tenants to different Solr clusters. Is there an easier way of dealing
with this situation?
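
A minimal sketch of such a tenant-to-cluster router, in Python, might look like the following. The cluster URLs, the pinned-tenant table, the collection name "tenants" and the function names are all hypothetical and only illustrate the idea; nothing here is a built-in Solr feature.

```python
import hashlib

# Hypothetical SolrCloud base URLs; replace with real endpoints.
CLUSTERS = [
    "http://solr-cluster-a.example.com:8983/solr",
    "http://solr-cluster-b.example.com:8983/solr",
    "http://solr-cluster-c.example.com:8983/solr",
]

# Hypothetical explicit placements, e.g. a large tenant on its own cluster.
PINNED = {
    "big-tenant": "http://solr-cluster-big.example.com:8983/solr",
}


def cluster_for(tenant_id: str) -> str:
    """Return the base URL of the cluster assumed to hold this tenant."""
    if tenant_id in PINNED:
        return PINNED[tenant_id]
    # Use a stable hash: Python's built-in hash() is randomized per process.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(CLUSTERS)
    return CLUSTERS[index]


def select_url(tenant_id: str, collection: str = "tenants") -> str:
    """Build the search URL for a tenant (collection name is an assumption)."""
    return f"{cluster_for(tenant_id)}/{collection}/select"
```

Note that plain modulo hashing moves most tenants to a different cluster whenever the cluster list changes, so in practice the tenant-to-cluster assignment would more likely be kept in a lookup table that is written once per tenant.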

Best Regards,
Chamil

On Wed, Aug 31, 2016 at 1:42 PM, Emir Arnautovic <
emir.arnautovic@sematext.com> wrote:

> Hi Chamil,
>
> One thing to consider is relevancy, especially if the tenants' domains
> are different (e.g. one is tech and the other pharmacy). If you go with one
> collection and use the same field (e.g. desc) for all tenants, you will get a
> single set of field stats, which could skew result ordering if you order by
> score (e.g. the word 'cream' might be infrequent in the tech tenant but could
> become frequent overall because of a large pharmacy tenant).
>
> On the other hand, having a large number of collections could also be
> problematic. You can address that issue by splitting tenants across multiple
> clusters, or by having dedicated collections for large tenants and grouping
> smaller tenants by domain.
>
> Make sure that you use routing by tenant ID in the case of a multi-tenant
> collection.
>
> HTH,
> Emir
>
>
>
> On 28.08.2016 07:02, Chamil Jeewantha wrote:
>
>> Thank you everyone for your great support.
>>
>> I will update you with our final approach.
>>
>> Best regards,
>> Chamil
>>
>> On Aug 28, 2016 01:34, "John Bickerstaff" <jo...@johnbickerstaff.com>
>> wrote:
>>
>>> In my own work, the risk to the business if every single client cannot
>>> access search is so great, we would never consider putting everything in
>>> one.  You should certainly ask that question of the business stakeholders
>>> before you decide.
>>>
>>> For that reason, I might recommend that each of the multiple collections
>>> suggested above by Erick could also be on a separate SolrCloud (or single
>>> Solr instance) so that no single failure can ever take down every tenant's
>>> ability to search -- only those on that particular SolrCloud...
>>>
>>> On Sat, Aug 27, 2016 at 10:36 AM, Erick Erickson <
>>> erickerickson@gmail.com>
>>> wrote:
>>>
>>>> There's no one right answer here. I've also seen a hybrid approach
>>>> where there are multiple collections each of which has some
>>>> number of tenants resident. Eventually, you need to think of some
>>>> kind of partitioning, my rough number of documents for a single core
>>>> is 50M (NOTE: I've seen between 10M and 300M docs fit in a core).
>>>>
>>>> All that said, you may also be interested in the "transient cores"
>>>> option, see:
>>>> https://cwiki.apache.org/confluence/display/solr/Defining+core.properties
>>>> and the transient and transientCacheSize (this latter in solr.xml). Note
>>>> that this is stand-alone only so you can't move that concept to
>>>> SolrCloud if you eventually go there.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Fri, Aug 26, 2016 at 12:13 PM, Chamil Jeewantha <kd...@gmail.com>
>>>> wrote:
>>>>
>>>>> Dear Solr Members,
>>>>>
>>>>> We are using SolrCloud as the search provider of a multi-tenant,
>>>>> cloud-based application. We have one schema for all the tenants. The
>>>>> indexes will have a large number (millions) of documents.
>>>>>
>>>>> Based on our research, we have two options:
>>>>>
>>>>>     - One large collection for all the tenants, using Composite-ID routing
>>>>>     - Collection per tenant
>>>>>
>>>>> The below mail says,
>>>>>
>>>>> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3C5324CD4B.2020309@protulae.com%3E
>>>>>
>>>>> SolrCloud is *more scalable in terms of index size*. Plus you get
>>>>> redundancy which can't be underestimated in a hosted solution.
>>>>>
>>>>>
>>>>> AND
>>>>>
>>>>> The issue is management. 1000s of cores/collections require a level of
>>>>> automation. On the other hand, having a single core/collection means if
>>>>> you make one change to the schema or solrconfig, it affects everyone.
>>>>>
>>>>>
>>>>> Based on the above facts we think one large collection will be the way
>>>>> to go.
>>>>>
>>>>> Questions:
>>>>>
>>>>>     1. Is that the right way to go?
>>>>>     2. Will it be a hassle when we need to do reindexing?
>>>>>     3. What is the chance of the entire collection crashing? (In that
>>>>>     case all tenants will be affected and reindexing will be painful.)
>>>>>
>>>>> Thank you in advance for your kind opinion.
>>>>>
>>>>> Best Regards,
>>>>> Chamil
>>>>>
>>>>> --
>>>>> http://kavimalla.blgospot.com
>>>>> http://kdchamil.blogspot.com
>>>>>
>>>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


-- 
http://kavimalla.blgospot.com
http://kdchamil.blogspot.com