Posted to solr-user@lucene.apache.org by Anshul Sharma <an...@gmail.com> on 2015/12/21 08:18:20 UTC

TPS with Solr Cloud

Hi,
I am evaluating Solr for one of my projects, and I need to check how it
scales in terms of TPS (transactions per second) for my application.
I have configured Solr on one AWS server as a standalone application, which
gives me a TPS of ~8000 for my query.
To test scalability, I sharded the same data across two AWS servers with
2.5 million records each. When I query the cluster with the same query as
before, it gives me a TPS of ~2500.
My understanding is that the TPS should have increased in a cluster, since
these are two different machines that perform separate I/O operations.
I have not configured a separate load balancer, as the documentation says
that SolrCloud performs load balancing in a round-robin fashion by default.
Can you please help me understand the issue?

Re: TPS with Solr Cloud

Posted by Walter Underwood <wu...@wunderwood.org>.

How many documents do you have? How big is the index?

You can increase total throughput with replicas. Shards will make it slower, but allow more documents.

At 8000 queries/s, I assume you are using the same query over and over. If so, that is a terrible benchmark. Everything is served out of cache.

Test with production logs. Choose logs where the number of distinct queries is much larger than your cache sizes. If your caches hold 1024 entries, it would be good to have 100K distinct queries. That might mean a total log size of a few million queries.
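
As a rough illustration of that sizing rule, here is a minimal Python sketch that compares the number of distinct queries in a log with the cache size. The function name, the returned keys, and the idea of passing the queries in as plain strings are assumptions made for this example; nothing here is a Solr API.

```python
def summarize_query_log(queries, cache_size=1024):
    """Summarize how well a query log will exercise the caches.

    queries: iterable of query strings (assumed format, one per request).
    cache_size: number of entries the cache holds (1024 in the example above).
    """
    cleaned = [q.strip() for q in queries if q.strip()]
    distinct = len(set(cleaned))
    return {
        "total": len(cleaned),
        "distinct": distinct,
        # Well below 1 means everything fits in cache; the advice above
        # is to aim for a value much larger than 1.
        "distinct_per_cache_entry": distinct / cache_size,
    }


if __name__ == "__main__":
    # A benchmark that repeats one query, as in the original post.
    print(summarize_query_log(["q=*:*"] * 10_000))
    # A hypothetical log with 100K distinct queries.
    print(summarize_query_log(f"q=term{i}" for i in range(100_000)))
```

A log that repeats one query gives a ratio far below 1, while 100K distinct queries against a 1024-entry cache gives a ratio just under 100.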

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Dec 21, 2015, at 9:47 AM, Upayavira <uv...@odoko.co.uk> wrote:
> 
> 
> [...]

Re: TPS with Solr Cloud

Posted by Upayavira <uv...@odoko.co.uk>.

You add shards to reduce response times. If your responses are too slow
for 1 shard, try it with three. Skip two for reasons stated above.

Upayavira

On Mon, Dec 21, 2015, at 04:27 PM, Erick Erickson wrote:
> [...]

Re: TPS with Solr Cloud

Posted by Erick Erickson <er...@gmail.com>.

8,000 TPS almost certainly means you're firing the same (or
same few) requests over and over and hitting the queryResultCache.
Look in the admin UI >> core >> plugins/stats >> cache >> queryResultCache.
I bet you're seeing a hit ratio near 100%. This is what Toke means
when he says your tests are too lightweight.
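
To see why a repeated query drives the hit ratio toward 100%, here is a minimal sketch that simulates an LRU cache keyed by query string. It is only a toy model and not Solr's queryResultCache implementation; the class name, the helper function, and the 1024-entry default are assumptions for illustration.

```python
from collections import OrderedDict


class ToyLRUCache:
    """Toy LRU cache that only counts hits and lookups (not Solr's code)."""

    def __init__(self, size=1024):
        self.size = size
        self.entries = OrderedDict()
        self.hits = 0
        self.lookups = 0

    def lookup(self, query):
        self.lookups += 1
        if query in self.entries:
            self.hits += 1
            self.entries.move_to_end(query)  # mark as most recently used
            return
        self.entries[query] = True
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)  # evict least recently used

    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0


def hit_ratio_for(queries, size=1024):
    """Replay an iterable of query strings and return the hit ratio."""
    cache = ToyLRUCache(size)
    for q in queries:
        cache.lookup(q)
    return cache.hit_ratio()


if __name__ == "__main__":
    # One query in a loop: only the very first lookup misses.
    print(hit_ratio_for(["q=*:*"] * 10_000))
    # Cycling through far more distinct queries than the cache holds.
    print(hit_ratio_for([f"q=term{i}" for i in range(5_000)] * 2))
```

In this model the single-query loop misses only once, while cycling through more distinct queries than the cache can hold never hits at all.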


As others have outlined, to increase TPS (after you straighten out
your test harness) you add _replicas_ rather than add _shards_.
Only add shards when your collections are too big to fit on a single
Solr instance.
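
For illustration, here is a small sketch that builds a Collections API CREATE request with one shard and two replicas. It only constructs the URL and does not send it. The host, port, and collection name are hypothetical, and depending on your Solr version and setup you may need further parameters (for example a config set), so check the Collections API documentation before using it.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build (but do not send) a Collections API CREATE request URL."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical node and collection name: one shard, two copies of it.
    print(create_collection_url("http://localhost:8983/solr", "mycollection", 1, 2))
```

With numShards=1 every replica holds the whole index, so each query is answered by a single node and no cross-shard merge is needed.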

Best,
Erick

On Mon, Dec 21, 2015 at 1:56 AM, Emir Arnautovic
<em...@sematext.com> wrote:
> [...]

Re: TPS with Solr Cloud

Posted by Emir Arnautovic <em...@sematext.com>.

Hi Anshul,
TPS depends on the number of concurrent requests you can run and on the
request processing time. Sharding reduces processing time by reducing the
amount of data a single node processes, but it adds the overhead of
inter-shard communication and of merging results from the different shards.
If that overhead is smaller than the time you save by processing only half
of the index, you will see an increase in TPS. If you are running the same
query in a loop, the first request will be processed and the others will
likely be returned from cache, so response time will not vary with index
size; hence the sharding overhead will cause TPS to go down.
If you are happy with your response time and want more TPS, go with
replication: that will increase the number of concurrent requests you can
run.
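
The reasoning above can be put into a back-of-the-envelope model: throughput is roughly the number of requests in flight divided by the time each request takes. The sketch below is only that model. Every number in it (concurrency, overhead, response times) is a made-up input chosen for illustration, not a measurement from this thread.

```python
def tps(concurrency, response_time_ms):
    """Little's law: throughput = requests in flight / time per request."""
    return concurrency * 1000 / response_time_ms


def sharded_response_time_ms(per_shard_ms, overhead_ms):
    """Shards work in parallel; communication and merging come on top."""
    return per_shard_ms + overhead_ms


if __name__ == "__main__":
    concurrency = 8   # hypothetical number of requests in flight
    overhead_ms = 3   # hypothetical inter-shard + merge cost

    # Cached query: the answer takes ~1 ms regardless of index size,
    # so halving the index per shard saves nothing.
    print(tps(concurrency, 1))
    print(tps(concurrency, sharded_response_time_ms(1, overhead_ms)))

    # Uncached, expensive query: 100 ms on the full index, and assume
    # 50 ms on half of it. Now the overhead is small next to the saving.
    print(tps(concurrency, 100))
    print(tps(concurrency, sharded_response_time_ms(50, overhead_ms)))
```

With these made-up inputs the cached case drops from 8000 to 2000 TPS once sharded, while the expensive case rises from 80 to about 151 TPS.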

Also, make sure your tests are realistic, to avoid false estimates and
surprises when you start running real load.

Regards,
Emir

-- 
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On 21.12.2015 08:18, Anshul Sharma wrote:
> [...]

Re: TPS with Solr Cloud

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.

Anshul Sharma <an...@gmail.com> wrote:
> I have configured Solr on 1 AWS server as standalone application which is
> giving me a TPS of ~8000 for my query.

[...]

> In order to test the scalability, I have done sharding of the same data
> across two AWS servers with 2.5 million records each. When I try to query
> the cluster with the same query as before it gives me a TPS of ~2500.

Sharding means two-phase processing and a merge of the shard results. For your setup, the overhead of sharding was larger than the gains. I am afraid your test is too lightweight for estimating performance at scale.
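
To make the two phases concrete, here is a toy sketch over in-memory "shards". It is not Solr code: the data layout, the function name, and the scores are invented for illustration. It only shows the general shape of the work, which is to collect each shard's top hits, merge them into a global top k, and then go back to the owning shards for the stored fields.

```python
import heapq


def distributed_search(shards, k):
    """Toy two-phase search over in-memory 'shards' (not Solr code).

    Each shard is a dict with:
      "hits":   list of (score, doc_id), best first
      "stored": dict mapping doc_id -> stored fields
    """
    # Phase 1: ask every shard for its own top-k ids and scores.
    candidates = []
    for shard_no, shard in enumerate(shards):
        for score, doc_id in shard["hits"][:k]:
            candidates.append((score, doc_id, shard_no))

    # Merge: keep the global top k across all shards.
    winners = heapq.nlargest(k, candidates)

    # Phase 2: go back to the owning shards for the stored fields.
    return [
        (doc_id, score, shards[shard_no]["stored"][doc_id])
        for score, doc_id, shard_no in winners
    ]


if __name__ == "__main__":
    shards = [
        {"hits": [(0.9, "a"), (0.4, "b")],
         "stored": {"a": {"title": "A"}, "b": {"title": "B"}}},
        {"hits": [(0.7, "c"), (0.5, "d")],
         "stored": {"c": {"title": "C"}, "d": {"title": "D"}}},
    ]
    print(distributed_search(shards, 2))
```

In a real cluster each phase is a network round trip, which is the overhead a single-node setup never pays.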

- Toke Eskildsen