Posted to solr-user@lucene.apache.org by kasimjinwala <ji...@gmail.com> on 2016/07/18 06:08:27 UTC

Re: SolrCloud - Query performance degrades with multiple servers(Shards)

I am currently using SolrCloud 5.0 and am facing a query performance issue
with 3 implicit shards, each containing around 10K records.
When I specify the shards parameter (*shards=shard1*) in the query, it gives
30K-35K qps, but when I remove the shards parameter from the query it gives
only *1000-1500 qps*. Performance decreases drastically.

Please provide comments or suggestions to solve the above issue.



--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Query-performance-degrades-with-multiple-servers-tp4024660p4287600.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud - Query performance degrades with multiple servers(Shards)

Posted by Erick Erickson <er...@gmail.com>.
15M docs may still comfortably fit in a single shard!
I've seen up to 300M docs fit on a shard. Then
again I've seen 10M docs make things unacceptably
slow.

You simply cannot extrapolate from 10K to
5M reliably. Put all 5M docs on the stand-alone
servers and test _that_. Whenever I see numbers
like 30K qps (assuming this is queries, not number
of docs indexed) I wonder if you're using the
same query over and over and hitting the query
result cache rather than doing any actual
searches.
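
One way to check whether a load test is mostly measuring the query result cache is to vary the query text between requests. Below is a minimal sketch, assuming a hypothetical endpoint (`SELECT_URL`), a hypothetical field name (`FIELD`), and a term list you supply yourself; only the standard `q` and `rows` parameters are used.

```python
import random
from urllib.parse import urlencode

# Hypothetical endpoint and field name; substitute your own collection and schema.
SELECT_URL = "http://localhost:8983/solr/mycollection/select"
FIELD = "text"

def build_test_queries(terms, count, rows=10, seed=42):
    """Return `count` select URLs whose q parameter is drawn at random from `terms`."""
    rng = random.Random(seed)
    urls = []
    for _ in range(count):
        params = {"q": f"{FIELD}:{rng.choice(terms)}", "rows": rows}
        urls.append(SELECT_URL + "?" + urlencode(params))
    return urls

def distinct_ratio(urls):
    """Fraction of requests that are unique; a low value means many repeated queries."""
    return len(set(urls)) / len(urls)
```

If `distinct_ratio` comes out low for a test plan, most requests are repeats and are likely to be answered from cache, so the measured qps says little about real search cost.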

But to answer your question (again). Sharding adds
overhead. There's no way to make that overhead
magically disappear. What you measure is what
you can expect, and you must measure.

Best,
Erick

On Tue, Jul 19, 2016 at 8:32 AM, Susheel Kumar <su...@gmail.com> wrote:
> You may want to use the document routing (_route_) option to have your
> queries served faster, but above you are comparing apples with oranges:
> your performance test numbers have to be based either on your actual
> numbers, like 3-5 million docs per shard, or on enough data to see the
> advantage of sharding. 10K is nothing for performance tests and
> will not tell you anything.
>
> Otherwise, as Erick mentioned, don't shard; add replicas if there is no
> need to distribute/divide data into shards.
>
>
> See
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
>
> https://cwiki.apache.org/confluence/display/solr/Advanced+Distributed+Request+Options
>
>
> Thanks,
> Susheel

Re: SolrCloud - Query performance degrades with multiple servers(Shards)

Posted by Susheel Kumar <su...@gmail.com>.
You may want to use the document routing (_route_) option to have your
queries served faster, but above you are comparing apples with oranges:
your performance test numbers have to be based either on your actual
numbers, like 3-5 million docs per shard, or on enough data to see the
advantage of sharding. 10K is nothing for performance tests and
will not tell you anything.

Otherwise, as Erick mentioned, don't shard; add replicas if there is no
need to distribute/divide data into shards.
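
For comparison, the three request shapes discussed in this thread can be sketched as URLs. This is only an illustration: the host, port, and collection name in `BASE` are hypothetical, and passing the shard name as the `_route_` value is an assumption that applies to collections using the implicit router.

```python
from urllib.parse import urlencode

# Hypothetical host and collection name.
BASE = "http://localhost:8983/solr/mycollection/select"

def select_url(query, rows=10, **extra):
    """Build a select URL from q, rows, and any extra request parameters."""
    params = {"q": query, "rows": rows}
    params.update(extra)
    return BASE + "?" + urlencode(params)

# No restriction: the request fans out to every shard of the collection.
all_shards = select_url("*:*")
# Explicit shards parameter, as in the original post.
one_shard = select_url("*:*", shards="shard1")
# Document routing (assumed: implicit router, route value = shard name).
routed = select_url("*:*", **{"_route_": "shard1"})
```

Only the first form pays the full distributed-search overhead, so it is the one to benchmark with realistic data volumes.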


See
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

https://cwiki.apache.org/confluence/display/solr/Advanced+Distributed+Request+Options


Thanks,
Susheel

On Tue, Jul 19, 2016 at 1:41 AM, kasimjinwala <ji...@gmail.com>
wrote:

> This is just for performance testing; we have taken 10K records per shard.
> In the live scenario it would be 30L-50L (3-5 million) per shard. I want to
> search documents across all shards, but that slows down and takes too long.
>
> I know that in SolrCloud a query goes to all shard nodes before the result
> is returned. Is there any way to search documents in all shards with the
> best performance (qps)?

Re: SolrCloud - Query performance degrades with multiple servers(Shards)

Posted by kasimjinwala <ji...@gmail.com>.
This is just for performance testing; we have taken 10K records per shard.
In the live scenario it would be 30L-50L (3-5 million) per shard. I want to
search documents across all shards, but that slows down and takes too long.

I know that in SolrCloud a query goes to all shard nodes before the result
is returned. Is there any way to search documents in all shards with the
best performance (qps)?




Re: SolrCloud - Query performance degrades with multiple servers(Shards)

Posted by Erick Erickson <er...@gmail.com>.
+1 to Susheel's question. Sharding inevitably adds
overhead. Roughly each shard is queried
for its top N docs (10 if, say, rows=10). The
doc ID and sort criteria (score by default) are returned
to the node that originally got the request. That node
then sorts the lists into the real top 10 to return to
the user. Then the node handling the request re-queries
the shards for the contents of those docs.
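
The two-phase flow described above can be sketched as a small simulation. This is not Solr code: the shard contents are plain dictionaries of made-up doc IDs and scores, and the function names are hypothetical. It only illustrates why every extra shard adds work for the node that received the request.

```python
import heapq

def shard_top_n(shard_docs, n):
    """Phase 1 on one shard: return (score, doc_id) for its n best docs only."""
    return heapq.nlargest(n, ((score, doc_id) for doc_id, score in shard_docs.items()))

def distributed_query(shards, rows=10):
    """Simulate the coordinating node for a query with rows=`rows`."""
    # Phase 1: every shard reports its own top `rows` (doc ID + sort value).
    candidates = []
    for shard_name, docs in shards.items():
        for score, doc_id in shard_top_n(docs, rows):
            candidates.append((score, doc_id, shard_name))
    # Merge the per-shard lists into the real top `rows`.
    winners = heapq.nlargest(rows, candidates)
    # Phase 2 would now re-query the owning shards for the contents of these docs.
    return [(doc_id, shard_name, score) for score, doc_id, shard_name in winners]

shards = {
    "shard1": {"a": 0.9, "b": 0.1},
    "shard2": {"c": 0.5},
    "shard3": {"d": 0.7},
}
top_two = distributed_query(shards, rows=2)
```

With a single shard the merge and the second round trip disappear, which is consistent with the gap between the shards=shard1 numbers and the all-shards numbers reported in this thread.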

Sharding is a way to handle very large data sets; the
general recommendation is to shard _only_ when you
have too many documents to get good query perf
from a single shard.

If you need to increase QPS, add _replicas_, not shards.
Only go to sharding when you have too many documents
to fit on your hardware.

Best,
Erick

On Mon, Jul 18, 2016 at 6:31 AM, Susheel Kumar <su...@gmail.com> wrote:
> Hello,
>
> Question: Do you really need sharding, or can you live without it, since
> you mentioned only 10K records in one shard? What's your index/document
> size?
>
> Thanks,
> Susheel

Re: SolrCloud - Query performance degrades with multiple servers(Shards)

Posted by Susheel Kumar <su...@gmail.com>.
Hello,

Question: Do you really need sharding, or can you live without it, since
you mentioned only 10K records in one shard? What's your index/document
size?

Thanks,
Susheel

On Mon, Jul 18, 2016 at 2:08 AM, kasimjinwala <ji...@gmail.com>
wrote:

> I am currently using SolrCloud 5.0 and am facing a query performance issue
> with 3 implicit shards, each containing around 10K records.
> When I specify the shards parameter (*shards=shard1*) in the query, it gives
> 30K-35K qps, but when I remove the shards parameter from the query it gives
> only *1000-1500 qps*. Performance decreases drastically.
>
> Please provide comments or suggestions to solve the above issue.