Posted to solr-user@lucene.apache.org by Georg Sorst <ge...@gmail.com> on 2016/09/20 18:48:25 UTC

How to limit resources in multi-tenant systems

Hi list!

I am running a multi-tenant system where the tenants can upload and import
their own data into their respective cores. Fortunately, Solr makes it easy
to make sure that the search indices don't mix and that clients can only
access their "cores".

However, isolating the resource consumption seems a little trickier. Of
course it's fairly easy to limit the number of documents and queries per
second for each tenant, but what if they add a few GBs of text to their
documents? What if they use millions of different filter values? This may
quickly fill up the JVM heap and negatively impact the other tenants (I'm
totally fine if the search for that one tenant goes down).

Of course I can check their input data and apply a seemingly endless number
of limits for all kinds of cases but that smells. Is there a more general
solution to limit resource consumption per core? Something along the lines
of "each core may use up to 5% of the heap".

One suggestion I found on the mailing list was to run a separate Solr
instance for each tenant. While this is certainly possible, it carries a
significant administrative and resource overhead.

Another way may be to go full-on SolrCloud and add shards and replicas as
required, but I have to limit the resources I can use.

Thanks!
Georg

Re: How to limit resources in multi-tenant systems

Posted by Erick Erickson <er...@gmail.com>.
There's really no out-of-the-box (OOB) way that I know of to do what you're
asking about. I'm not even sure what the "right thing to do" would be if such
a limit were encountered. Fail the query? Try to execute it really slowly
within the constraints? (Actually, I doubt the latter is possible.)

Take the way Lucene sorts, for instance: simply sorting requires an int array
that is maxDoc entries long.

The transient core stuff can help by limiting the total number of open
cores (NOTE: only stand-alone Solr, not SolrCloud). That doesn't
really address the question of one of the active cores firing a horribly
expensive query though.

What I've usually seen in this situation is that the cumulative number of
docs across the cores is monitored, and tenants are moved around when
the total number of docs per JVM approaches some limit (that you have
to determine empirically). Usually this follows a "long tail" pattern,
ranging from a few clients having their own dedicated JVMs down to 100s
of clients sharing the same JVM.
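
The monitoring idea above could be sketched roughly as follows. This is only
an illustration, not an established tool: it assumes you have already fetched
and parsed the JSON from a CoreAdmin STATUS request (where each core reports
index.numDocs), and the function names, the doc_limit value and the
"move the largest cores first" policy are all hypothetical choices.

```python
from typing import List


def total_docs(status_response: dict) -> int:
    """Sum numDocs over every core in a parsed CoreAdmin STATUS response."""
    cores = status_response.get("status", {})
    return sum(core.get("index", {}).get("numDocs", 0) for core in cores.values())


def cores_to_move(status_response: dict, doc_limit: int) -> List[str]:
    """Pick cores to relocate, largest first, until the JVM is under doc_limit.

    doc_limit is the empirically determined per-JVM ceiling (an assumption here).
    """
    cores = status_response.get("status", {})
    sizes = sorted(
        ((core.get("index", {}).get("numDocs", 0), name) for name, core in cores.items()),
        reverse=True,
    )
    excess = sum(size for size, _ in sizes) - doc_limit
    chosen: List[str] = []
    for size, name in sizes:
        if excess <= 0:
            break
        chosen.append(name)  # this tenant is a candidate for another JVM
        excess -= size
    return chosen
```

Moving the largest tenants first tends to produce the "long tail" layout, with
big tenants ending up on their own JVMs, but any other placement policy could
be substituted.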

But if you allow free-form queries to come in, then there's no effective way
to limit their resource usage. There is some work being done on estimating
query costs and doing something reasonable, but I don't have the JIRAs at hand
and don't know the current progress there. So often people will restrict the
kinds of queries that _can_ be performed at the app layer. After all, if you
allow me unrestricted access to Solr, I can delete everything.
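
A minimal sketch of that kind of app-layer restriction, assuming the
application builds the Solr request itself and tenants only supply a
dictionary of parameters. The whitelist, the row cap and the query-length cap
are made-up values, and restrict_params is a hypothetical helper, not part of
Solr:

```python
# Parameters a tenant is allowed to pass through (illustrative choice).
ALLOWED_PARAMS = {"q", "fq", "sort", "start", "rows"}
MAX_ROWS = 100           # assumed cap on page size
MAX_QUERY_LENGTH = 500   # assumed cap on the length of q


def restrict_params(params: dict) -> dict:
    """Return a copy of params that is safe to forward, or raise ValueError."""
    unknown = set(params) - ALLOWED_PARAMS
    if unknown:
        raise ValueError(f"parameters not allowed: {sorted(unknown)}")

    q = str(params.get("q", "*:*"))
    if len(q) > MAX_QUERY_LENGTH:
        raise ValueError("query too long")

    safe = dict(params)
    safe["q"] = q
    # Clamp rows into [0, MAX_ROWS] so a tenant cannot request huge pages.
    safe["rows"] = max(0, min(int(params.get("rows", 10)), MAX_ROWS))
    return safe
```

This only narrows what can reach Solr; it does not estimate how expensive an
allowed query will be.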

Not much help I know,
Erick
