Posted to solr-user@lucene.apache.org by Yannick <ya...@yahoo.com.INVALID> on 2014/10/09 15:06:04 UTC

Solr Cloud has lower performance with more servers

Hello good Solr people,

I have the following surprising situation. 

I created a group of 2 Solr servers with a load balancer (HAProxy) in front. I have a batch client that continuously sends read-only requests to the load balancer. The problem is that performance is worse with 2 servers than with a single server (still via the load balancer, with the second server down, so the load balancer itself is not causing the slowdown). My batch execution time is about 5 minutes with a single server and more than 6 minutes with two servers.

Both servers are VMs, hosted on two different physical computers, with no resource sharing. I tried adding a third server, and the performance was even lower.

I'm looking for ideas on what could cause this and what else I could try. My goal is to decrease the execution times of these batches (which may last for many hours), but this clearly seems to be going the wrong way.

Thanks in advance,

Yann

Re: Solr Cloud has lower performance with more servers

Posted by Erick Erickson <er...@gmail.com>.
Just to check: your index is NOT sharded, correct?

Assuming not sharded, is it SolrCloud? If not SolrCloud, how are the
indexes kept in synch? Master/slave? Manual copy?

But for an unchanging index, this is definitely odd.

Best,
Erick

On Thu, Oct 9, 2014 at 7:40 AM, Walter Underwood <wu...@wunderwood.org> wrote:
> Is this a production log of queries, with lots of repeats? If so, you may be seeing the normal effect of lower cache hit rates.
>
> Check the hit rate for the query result cache in the two setups. With a single machine, the second occurrence of a query will be a cache hit. With two machines, it will not be if the two queries are routed to different machines.
>
> I was running some benchmarks here. With one machine, the query cache had a 50% hit rate. With eight machines, it was 20%.
>
> You can address this with a reverse proxy HTTP cache in front of the cluster, something like Varnish.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/
>
>
> On Oct 9, 2014, at 7:21 AM, Yannick <ya...@yahoo.com.INVALID> wrote:
>
>> Hi Toke,
>>
>> thanks for your suggestion - definitely an interesting idea. But unfortunately no, no indexing job is running; those are static indexes being queried. The execution time is also very consistent in each condition, I did quite a few tests.
>>
>> Yann
>>
>>
>> On Thursday, October 9, 2014 3:56 PM, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:
>>
>>
>>
>> On Thu, 2014-10-09 at 15:06 +0200, Yannick wrote:
>>
>>
>>> I created a group of 2 Solr servers with a load-balancer in front
>>> (Haproxy). I have a batch client that sends requests (read-only)
>>> continuously to the load-balancer. The problem is: the performance is
>>> slower with 2 servers than it is with a single server (still via the
>>> load-balancer, with the second server down, so it's not the
>>> load-balancer itself causing the slowdown).
>>
>> (speculating a lot here:)
>>
>> Is another job updating the indexes while you are batch-searching?
>> If so, the slowdown could be explained by the servers' disk caches being
>> flushed by the indexing job. When a request arrives, some cache is
>> reclaimed, but it will be a battle between the update and the search
>> jobs. With more machines, there will be fewer requests per machine, so the
>> search-cache has a lower chance of being used again before it is
>> reclaimed by the updater.
>>
>> Still, worse performance for 2 machines sounds pretty bad.
>>
>> - Toke Eskildsen, State and University Library, Denmark
>

Re: Solr Cloud has lower performance with more servers

Posted by Walter Underwood <wu...@wunderwood.org>.
Is this a production log of queries, with lots of repeats? If so, you may be seeing the normal effect of lower cache hit rates.

Check the hit rate for the query result cache in the two setups. With a single machine, the second occurrence of a query will be a cache hit. With two machines, it will not be if the two queries are routed to different machines.

I was running some benchmarks here. With one machine, the query cache had a 50% hit rate. With eight machines, it was 20%.

You can address this with a reverse proxy HTTP cache in front of the cluster, something like Varnish.
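
To make the effect concrete, here is a minimal simulation sketch, not a model of Solr itself. It assumes every unique query occurs exactly twice, requests are routed to a random server, and caches are large enough that nothing is evicted. The names `LRUCache`, `make_workload`, and `cached_fraction` are hypothetical, and the optional front cache stands in for something like Varnish.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache standing in for a per-server query result cache (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, key):
        """Return True on a hit; on a miss, store the key and evict the oldest entry if over capacity."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
        return False


def make_workload(n_unique, repeats=2, seed=0):
    """Build a shuffled list in which each unique query appears `repeats` times (an assumption)."""
    queries = [f"q{i}" for i in range(n_unique) for _ in range(repeats)]
    random.Random(seed).shuffle(queries)
    return queries


def cached_fraction(queries, n_servers, cache_size, front_cache_size=0, seed=1):
    """Return the fraction of queries answered from a cache (front cache or a server cache)."""
    rng = random.Random(seed)
    servers = [LRUCache(cache_size) for _ in range(n_servers)]
    front = LRUCache(front_cache_size) if front_cache_size > 0 else None
    cached = 0
    for query in queries:
        if front is not None and front.lookup(query):
            cached += 1  # served by the shared cache in front of the cluster
        elif rng.choice(servers).lookup(query):
            cached += 1  # served by the cache of whichever server got the request
    return cached / len(queries)


if __name__ == "__main__":
    workload = make_workload(n_unique=20_000)
    big = len(workload)  # large enough that nothing is ever evicted
    print("1 server:               ", cached_fraction(workload, 1, big))
    print("2 servers:              ", cached_fraction(workload, 2, big))
    print("2 servers + front cache:", cached_fraction(workload, 2, big, front_cache_size=big))
```

Under these assumptions the single-server run caches exactly half of the requests, two servers cache roughly a quarter, and the shared front cache brings the figure back to one half. Real caches are bounded and real query logs are skewed, so actual numbers will differ.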

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/


On Oct 9, 2014, at 7:21 AM, Yannick <ya...@yahoo.com.INVALID> wrote:

> Hi Toke,
> 
> thanks for your suggestion - definitely an interesting idea. But unfortunately no, no indexing job is running; those are static indexes being queried. The execution time is also very consistent in each condition, I did quite a few tests.
> 
> Yann 
> 
> 
> On Thursday, October 9, 2014 3:56 PM, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:
> 
> 
> 
> On Thu, 2014-10-09 at 15:06 +0200, Yannick wrote:
> 
> 
>> I created a group of 2 Solr servers with a load-balancer in front
>> (Haproxy). I have a batch client that sends requests (read-only)
>> continuously to the load-balancer. The problem is: the performance is
>> slower with 2 servers than it is with a single server (still via the
>> load-balancer, with the second server down, so it's not the
>> load-balancer itself causing the slowdown).
> 
> (speculating a lot here:)
> 
> Is another job updating the indexes while you are batch-searching?
> If so, the slowdown could be explained by the servers' disk caches being
> flushed by the indexing job. When a request arrives, some cache is
> reclaimed, but it will be a battle between the update and the search
> jobs. With more machines, there will be fewer requests per machine, so the
> search-cache has a lower chance of being used again before it is
> reclaimed by the updater.
> 
> Still, worse performance for 2 machines sounds pretty bad.
> 
> - Toke Eskildsen, State and University Library, Denmark


Re: Solr Cloud has lower performance with more servers

Posted by Erick Erickson <er...@gmail.com>.
Perhaps a really silly question, but... is your batch job sending
queries serially? If so, this is understandable: with one request in
flight at a time, a second server adds no throughput, and the run is
sensitive to the lower hit ratios in your caches.

Even with parallel requests, you still won't get 2x the performance with
two servers. I'm guessing your total time to run the batch job with 2
threads would be closer to 3 minutes than 2.5....

Which brings up another question: if you fired 4 threads at your two
servers (or 8 or...), I think you'd see something of a throughput
increase, but it wouldn't be linear once you get past the
number of CPU cores available.

Which suggests another set of tests....
1> single node with 2 (or 4 or 8 or whatever) threads
2> same thing with a load balancer.
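
A minimal sketch of how such a threaded test client might look, assuming the batch is just a list of query strings. `run_batch` and `fake_send_query` are hypothetical names; `fake_send_query` only sleeps to imitate a request and would be replaced by a real HTTP call to the node or the load balancer.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_batch(queries, send_query, n_threads):
    """Send every query through send_query using n_threads workers.

    Returns (results in input order, elapsed wall-clock seconds).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(send_query, queries))
    return results, time.perf_counter() - start


def fake_send_query(query):
    """Placeholder for a real request: sleeps 5 ms and echoes the query back."""
    time.sleep(0.005)
    return query


if __name__ == "__main__":
    batch = [f"q{i}" for i in range(200)]
    for n_threads in (1, 2, 4, 8):
        _, elapsed = run_batch(batch, fake_send_query, n_threads)
        print(f"{n_threads} thread(s): {elapsed:.2f} s")
```

Running the same loop once against a single node and once through the load balancer gives the two test conditions listed above.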

FWIW,
Erick

On Fri, Oct 10, 2014 at 2:30 AM, Yannick <ya...@yahoo.com.invalid> wrote:
> Hi guys,
>
> Thanks a lot for the cogent feedback, much appreciated. I'll group my answers in one message.
>
> 1- (Charlie): regarding the type of queries I make: here is a typical one; it's fairly vanilla, no faceting or anything fancy.
>
> q=+((titleStart:"was sixteen")(subTitleStart:"was sixteen"))+(participantStart:"fairport convention")&fq=+(type:COCV)
>
> 2- (Erick): correct, my index is not sharded, and there is no SolrCloud for the test. In a production env, I will set up SolrCloud, but I think I need to understand what's going on here first. It's a medium-size index (30 million docs). I did a sharded test, and the performance was worse on 2 servers / 2 shards than on 2 servers with the same single shard on each.
>
> 3- (Walter): regarding the cache situation : you may be on to something; this is what I observed:
>
> document cache: hitrate 0.38 (1 server condition) / 0.26 (on each of the 2 servers)
> filter cache : hitrate 1  in both conditions
> query result cache : hitrate 0.24 (1 server) / 0.14 (on each of the 2 servers)
>
> Is it possible this is the issue? My batch issues ~300,000 queries, corresponding to ~150,000 unique queries.
>
> Will look into Varnish, thanks for the pointer.
>
> Yannick

Re: Solr Cloud has lower performance with more servers

Posted by Yannick <ya...@yahoo.com.INVALID>.
Hi guys,

Thanks a lot for the cogent feedback, much appreciated. I'll group my answers in one message.

1- (Charlie): regarding the type of queries I make: here is a typical one; it's fairly vanilla, no faceting or anything fancy.

q=+((titleStart:"was sixteen")(subTitleStart:"was sixteen"))+(participantStart:"fairport convention")&fq=+(type:COCV)

2- (Erick): correct, my index is not sharded, and there is no SolrCloud for the test. In a production env, I will set up SolrCloud, but I think I need to understand what's going on here first. It's a medium-size index (30 million docs). I did a sharded test, and the performance was worse on 2 servers / 2 shards than on 2 servers with the same single shard on each.

3- (Walter): regarding the cache situation: you may be on to something; this is what I observed:

document cache:     hit rate 0.38 (1 server) / 0.26 (on each of the 2 servers)
filter cache:       hit rate 1 in both conditions
query result cache: hit rate 0.24 (1 server) / 0.14 (on each of the 2 servers)

Is it possible this is the issue? My batch issues ~300,000 queries, corresponding to ~150,000 unique queries.
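
As a back-of-the-envelope check (a sketch under stated assumptions, not a measurement): suppose each unique query occurs exactly twice, which matches ~300,000 queries over ~150,000 unique ones only on average, that each request goes to a uniformly random server, and that caches never evict. A query then misses once on every distinct server it reaches. The function name `expected_hit_rate` is hypothetical.

```python
def expected_hit_rate(n_servers, occurrences):
    """Expected cache hit rate for a query seen `occurrences` times.

    Assumes each occurrence is routed uniformly at random to one of
    n_servers and that per-server caches are unbounded.
    """
    # The first time a given server sees the query is a miss, so the expected
    # miss count equals the expected number of distinct servers visited.
    expected_misses = n_servers * (1 - (1 - 1 / n_servers) ** occurrences)
    return 1 - expected_misses / occurrences


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} server(s): {expected_hit_rate(n, 2):.3f}")
```

This idealized model gives 0.5 for one server and 0.25 for two. The measured query result cache rates (0.24 and 0.14) are lower, as expected with bounded caches, but they fall in the same direction.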

Will look into Varnish, thanks for the pointer.

Yannick

Re: Solr Cloud has lower performance with more servers

Posted by Yannick <ya...@yahoo.com.INVALID>.
Hi Toke,

Thanks for your suggestion - definitely an interesting idea. But unfortunately no, no indexing job is running; those are static indexes being queried. The execution time is also very consistent in each condition; I ran quite a few tests.

Yann 


On Thursday, October 9, 2014 3:56 PM, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:
 


On Thu, 2014-10-09 at 15:06 +0200, Yannick wrote:


> I created a group of 2 Solr servers with a load-balancer in front
> (Haproxy). I have a batch client that sends requests (read-only)
> continuously to the load-balancer. The problem is: the performance is
> slower with 2 servers than it is with a single server (still via the
> load-balancer, with the second server down, so it's not the
> load-balancer itself causing the slowdown).

(speculating a lot here:)

Is another job updating the indexes while you are batch-searching?
If so, the slowdown could be explained by the servers' disk caches being
flushed by the indexing job. When a request arrives, some cache is
reclaimed, but it will be a battle between the update and the search
jobs. With more machines, there will be fewer requests per machine, so the
search-cache has a lower chance of being used again before it is
reclaimed by the updater.

Still, worse performance for 2 machines sounds pretty bad.

- Toke Eskildsen, State and University Library, Denmark

Re: Solr Cloud has lower performance with more servers

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Thu, 2014-10-09 at 15:06 +0200, Yannick wrote:

> I created a group of 2 Solr servers with a load-balancer in front
> (Haproxy). I have a batch client that sends requests (read-only)
> continuously to the load-balancer. The problem is: the performance is
> slower with 2 servers than it is with a single server (still via the
> load-balancer, with the second server down, so it's not the
> load-balancer itself causing the slowdown).

(speculating a lot here:)

Is another job updating the indexes while you are batch-searching?
If so, the slowdown could be explained by the servers' disk caches being
flushed by the indexing job. When a request arrives, some cache is
reclaimed, but it will be a battle between the update and the search
jobs. With more machines, there will be fewer requests per machine, so the
search-cache has a lower chance of being used again before it is
reclaimed by the updater.

Still, worse performance for 2 machines sounds pretty bad.

- Toke Eskildsen, State and University Library, Denmark



Re: Solr Cloud has lower performance with more servers

Posted by Charlie Hull <ch...@flax.co.uk>.
On 09/10/2014 14:06, Yannick wrote:
> Hello good Solr people,
>
> I have the following surprising situation.
>
> I created a group of 2 Solr servers with a load-balancer in front
> (Haproxy). I have a batch client that sends requests (read-only)
> continuously to the load-balancer. The problem is: the performance is
> slower with 2 servers than it is with a single server (still via the
> load-balancer, with the second server down, so it's not the
> load-balancer itself causing the slowdown). My batch execution time
> is about 5 minutes with a single server, and more than 6 minutes with
> two servers.

What sort of queries are you doing? We're seeing some interesting 
effects of distributed facet queries with a current client - it seems 
there are some unexpected uses of caches.

Charlie
>
> Both servers are VMs, hosted on two different physical computers,
> with no resource sharing. I tried throwing in a third server, the
> performance was even lower.
>
> I'm trying to find ideas on what could cause this and/or what else I
> could try? My goal is to decrease the execution times of these
> batches (which may last for many hours), but this clearly seems to be
> going the wrong way.
>
> Thanks in advance,
>
> Yann
>


-- 
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk