Posted to solr-user@lucene.apache.org by Sofiya Strochyk <ss...@interlogic.com.ua> on 2019/05/16 15:27:18 UTC

Re: SolrCloud scaling/optimization for high request rate

Thanks to everyone for the suggestions. We managed to get the 
performance to a bearable level by splitting the index into ~20 separate 
collections (one collection per country) and spreading them between 
existing servers as evenly as possible. The largest country is also 
split into 2 shards. This means that:

1. QPS is lower for each instance, since it only receives requests for the 
corresponding country.

2. Index size is smaller for each instance as it only contains documents 
for the corresponding country.

3. If one instance fails, most of the other instances keep running 
(except possibly those colocated with the failed one).
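For anyone wanting to try the same split: per-country collections like these would typically be created through the Solr Collections API. A minimal sketch follows; the host, collection naming scheme, country codes, and shard/replica counts are illustrative assumptions, not the actual values used here.

```python
# Sketch: build the Collections API CREATE call for each per-country
# collection, giving the largest country an extra shard. Host, names,
# and counts are hypothetical.

SOLR = "http://localhost:8983/solr"

def create_collection_url(country, num_shards=1, replication_factor=2):
    """Return the Collections API URL that creates one per-country collection."""
    return (f"{SOLR}/admin/collections?action=CREATE"
            f"&name=products_{country}"
            f"&numShards={num_shards}"
            f"&replicationFactor={replication_factor}")

countries = ["us", "de", "ua"]          # hypothetical country codes
for c in countries:
    shards = 2 if c == "us" else 1      # largest country gets 2 shards
    print(create_collection_url(c, num_shards=shards))
```

In practice one would issue these URLs with curl or an HTTP client, and let Solr's placement rules (or explicit createNodeSet parameters) spread the collections evenly across the existing servers.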

We didn't make any changes to the main query, but we added a few fields 
to facet on. This had a small negative impact on performance, but 
overall the setup kept working nicely.
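For reference, extra facet fields like the ones mentioned above are usually just additional request parameters on the select query. The sketch below builds such a query string; the query and field names are made up, not the actual schema.

```python
# Sketch: assemble a Solr select query string with field faceting
# enabled. Field names here are hypothetical.
from urllib.parse import urlencode

def build_facet_query(q, facet_fields, rows=10):
    """Return the URL-encoded parameter string for a faceted Solr query."""
    params = [("q", q), ("rows", rows), ("facet", "true")]
    # Each extra facet field adds one facet.field parameter.
    params += [("facet.field", f) for f in facet_fields]
    return urlencode(params)

qs = build_facet_query("title:shoes", ["brand", "color"])
print(qs)
```

Each added facet.field makes the request somewhat more expensive, which matches the small performance hit observed after adding the fields.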


On 14.11.18 12:18, Toke Eskildsen wrote:
> On Mon, 2018-11-12 at 14:19 +0200, Sofiya Strochyk wrote:
>> I'll check if the filter queries or the main query tokenizers/filters
>> might have anything to do with this, but I'm afraid query
>> optimization can only get us so far.
> Why do you think that? As you tried eliminating sorting and retrieval
> previously, the queries are all that's left. There are multiple
> performance traps when querying and a lot of them can be bypassed by
> changing the index or querying in a different way.
>
>> Since we will have to add facets later, the queries will only become
>> heavier, and there has to be a way to scale this setup and deal with
>> both higher load and more complex queries.
> There is of course a way. It is more a question of what you are willing
> to pay.
>
> If you have money, just buy more hardware: We know (with very high
> probability) that it will work as your problem is search throughput,
> which can be solved by adding more replicas on extra machines.
>
> If you have more engineering hours, you can use them on some of the
> things discussed previously:
>
> * Pinpoint query bottlenecks
> * Use less/more shards
> * Apply https://issues.apache.org/jira/browse/LUCENE-8374
> * Experiment with different amounts of concurrent requests to see what
> gives the optimum throughput. This also tells you how much extra
> hardware you need, if you decide to expand.
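The last suggestion, sweeping concurrency levels to find the throughput optimum, can be sketched roughly as below. The query function here is a deliberate stand-in stub (a sleep), not a real Solr call; in a real experiment it would issue an HTTP request against the cluster.

```python
# Sketch: measure achieved QPS at different concurrency levels.
# fake_query is a stub standing in for a real Solr HTTP request.
import time
from concurrent.futures import ThreadPoolExecutor

def measure_qps(query_fn, concurrency, n_requests=50):
    """Fire n_requests through a thread pool; return achieved queries/sec."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Drain the iterator so all requests actually complete.
        list(pool.map(lambda _: query_fn(), range(n_requests)))
    elapsed = time.perf_counter() - start
    return n_requests / elapsed

def fake_query():
    time.sleep(0.01)  # stand-in for network + query latency

for c in (1, 4, 16):
    print(f"concurrency={c:2d}  ~{measure_qps(fake_query, c):.0f} QPS")
```

Plotting QPS against concurrency typically shows throughput rising and then flattening or dropping; the knee of that curve is the point Toke refers to, and the gap between it and the target load indicates how much extra hardware is needed.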
>
>
> - Toke Eskildsen, Royal Danish Library
>
>

-- 
Sofiia Strochyk
ss@interlogic.com.ua
InterLogic
www.interlogic.com.ua