Posted to solr-user@lucene.apache.org by "Zisis T." <zi...@runbox.com> on 2017/06/13 16:35:09 UTC

Multi tenant setup

I'm trying to set up a multi-tenant Solr cluster (v6.5.1) which must meet the
following requirements. The tenants are different customers with similar
types of data.

* Ability to query per client but also across all clients
* Don't want to hit all shards for every type of request (per client,
across clients)
* Don't want to have everything under a single multi-sharded collection,
to avoid a SPOF and maintenance headaches (e.g. a schema change would
force a reindex of all clients, and backup/restore would be a single huge
operation)
* Ability to semi-support different schemas.

Based on the above I ruled out the following setups:
* Single multi-sharded collection for all clients, in all its variations
(e.g. multiple clients in a single shard)
* One collection per client

My preference is a setup like the following:
* Create a limited # of collections
* Split the clients across those collections based on some criteria
(size, content-type)
* Client-specific requests will be limited to a single collection
* Cross-client requests will target a limited # of collections (using
&collection=col_1,col_2,col_3)
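
As a rough illustration of the two request types in the list above, here is a
minimal Python sketch that only builds the request URLs. The host, the
collection names, and the tenant_id filter field are hypothetical placeholders
and not part of any actual setup; only the collection parameter comes from the
description above.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with a real Solr node.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(entry_collection, query, collections=None, tenant_id=None):
    """Build a /select URL for a per-client or a cross-client search.

    entry_collection: collection that receives the request.
    collections: optional list of collections to search across; sent as the
        comma-separated `collection` parameter.
    tenant_id: optional client filter; `tenant_id` is an assumed field name.
    """
    params = [("q", query), ("wt", "json")]
    if collections:
        params.append(("collection", ",".join(collections)))
    if tenant_id is not None:
        params.append(("fq", f"tenant_id:{tenant_id}"))
    return f"{SOLR_BASE}/{entry_collection}/select?{urlencode(params)}"


# Client-specific request: stays within one collection.
print(build_select_url("col_1", "laptop", tenant_id="acme"))

# Cross-client request: fans out to a limited set of collections.
print(build_select_url("col_1", "laptop", collections=["col_1", "col_2", "col_3"]))
```

The comma in the collection list is percent-encoded by urlencode, which Solr
decodes like any other request parameter.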

This approach meets the requirements listed above, but the issue that is
blocking me is that Distributed IDF does not work properly across collections.
(Check comment#3, bullet#2 of
http://lucene.472066.n3.nabble.com/Distributed-IDF-in-inter-collections-distributed-queries-td4317519.html)

-> Do you see anything wrong with my assumptions/approach above? Are there
any alternatives besides having separate clusters for the search across
clients and the individual clients?
-> Is it safe to go with a single collection? If it is, I still need to
handle the possible different schemas per client somehow.
-> Is there a way to enforce local stats when querying a single collection
and use global stats only when querying across collections? (see link above)

Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/Multi-tenant-setup-tp4340377.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multi tenant setup

Posted by Susheel Kumar <su...@gmail.com>.

I'd suggest raising a JIRA and linking it to
https://issues.apache.org/jira/browse/SOLR-7759, but before that see if
updating the statsCache settings in solrconfig.xml, as described in
https://issues.apache.org/jira/browse/SOLR-1632, works here.
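
For reference, a small Python sketch that checks which statsCache
implementation a solrconfig.xml declares. The embedded fragment is a made-up
minimal example, not a complete config, and the helper name is hypothetical.
ExactStatsCache is one of the implementations shipped in
org.apache.solr.search.stats; as far as I know, when no statsCache element is
present Solr uses local (per-shard) statistics.

```python
import xml.etree.ElementTree as ET

# Made-up minimal fragment for illustration; a real solrconfig.xml holds
# many more elements under <config>.
SOLRCONFIG_FRAGMENT = """
<config>
  <statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
</config>
"""


def configured_stats_cache(solrconfig_xml):
    """Return the class attribute of <statsCache>, or None if it is absent."""
    root = ET.fromstring(solrconfig_xml)
    element = root.find("statsCache")
    return element.get("class") if element is not None else None


print(configured_stats_cache(SOLRCONFIG_FRAGMENT))
```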

Thanks,
Susheel

On Tue, Jun 13, 2017 at 5:16 PM, Zisis T. <zi...@runbox.com> wrote:

> We are talking about fewer collections, so that won't be an issue.
>
> The problem comes when - using the proposed setup - I want to send a query
> across all those collections and get properly ranked results. Each
> collection has its own IDF etc, so the scores are not comparable. This
> means
> that most probably results from one collection will dominate the results.
>
> This led me to try the /DistributedIDF/ configuration but this did not work
> either due to the issues described in the link of the original post.

Re: Multi tenant setup

Posted by "Zisis T." <zi...@runbox.com>.

We are talking about fewer collections, so that won't be an issue.

The problem comes when - using the proposed setup - I want to send a query
across all those collections and get properly ranked results. Each
collection has its own IDF etc, so the scores are not comparable. This means
that most probably results from one collection will dominate the results. 
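
To make the point about non-comparable scores concrete, here is a small
Python sketch using the IDF formula of Lucene's BM25Similarity (the default
similarity in Solr 6.x). The document counts and document frequencies are
invented purely for illustration.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """IDF as computed by Lucene's BM25Similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Invented stats for one query term: (docFreq, docCount) per collection.
collections = {
    "col_1": (10, 1_000_000),   # term is rare here
    "col_2": (50_000, 100_000), # term is common here
}

# Local stats: each collection scores the term with its own IDF.
local_idf = {name: bm25_idf(df, n) for name, (df, n) in collections.items()}

# Global stats: one IDF computed from the summed counts.
total_df = sum(df for df, _ in collections.values())
total_n = sum(n for _, n in collections.values())
global_idf = bm25_idf(total_df, total_n)

for name, idf in local_idf.items():
    print(f"{name}: local idf = {idf:.2f}")
print(f"global idf = {global_idf:.2f}")
```

With local stats the same term gets a much higher weight in col_1 than in
col_2, so matches from col_1 tend to outrank matches from col_2 when the
result lists are merged.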

This led me to try the /DistributedIDF/ configuration but this did not work
either due to the issues described in the link of the original post. 




Re: Multi tenant setup

Posted by Susheel Kumar <su...@gmail.com>.

Going with a single cluster having multiple collections (one for each client)
is what I would try. How many clients do you have? If it is 10K, that means
10K collections, and you will then need to work out how many documents there
are, their sizes, etc. to nail down the # of machines and their memory/CPU
requirements. Going with a single collection is not really a multi-tenant
setup, especially when you have different schemas.

Thanks,
Susheel

