Posted to solr-user@lucene.apache.org by Stanislav Sandalnikov <s....@gmail.com> on 2013/05/07 13:09:41 UTC

Search performance: shards or replications?

Hi,

We are moving to a SolrCloud architecture, and I have a question about
search performance and how it relates to shards and replicas. Which is more
efficient: splitting the whole index into several shards, or creating
several replicas of the index? Does parallel search work with both shards
and replicas?

Please share your experience regarding this matter.

Thanks in advance.

Regards,
Stanislav

Re: Search performance: shards or replications?

Posted by Stanislav Sandalnikov <s....@gmail.com>.
P.S. Sorry for misspelling your name, Jan



Re: Search performance: shards or replications?

Posted by Stanislav Sandalnikov <s....@gmail.com>.
Thank you, everything seems clear.

Re: Search performance: shards or replications?

Posted by Andre Bois-Crettez <an...@kelkoo.com>.
Some clarifications:

1) *lots of docs, few queries*: If you have a large number of documents
(a dozen million or more) and a fairly low query rate (say, fewer than
10 queries per second), replicas will not help to reduce the QTime. For
this kind of workload it is better to shard the index, as each query is
then processed in parallel by N shards, each searching a smaller index,
thus reducing QTime.

2) *few docs, lots of queries*: With fewer than 10M docs and 30+ qps, on
the contrary, you want more replicas to handle more traffic and avoid
overloaded servers (which would increase the QTime).

3) *lots of docs, lots of queries*: Do both sharding and replication.
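
The three cases above can be written down as a small rule of thumb. This is only a sketch: the function name, the default cut-offs (10M docs, 30 qps) and the fourth "small and quiet" branch are assumptions made for illustration, and the ranges the cases leave open (for example 10 to 30 qps) are simply folded into the nearest case.

```python
def suggest_layout(num_docs: int, qps: float,
                   many_docs: int = 10_000_000,
                   high_qps: float = 30.0) -> str:
    """Map index size and query rate to a rough SolrCloud layout hint.

    The thresholds are illustrative defaults, not measured limits;
    real values depend on hardware, documents and queries.
    """
    big_index = num_docs >= many_docs
    busy = qps >= high_qps
    if big_index and busy:
        return "shard and replicate"   # case 3
    if big_index:
        return "shard"                 # case 1
    if busy:
        return "replicate"             # case 2
    # Not covered by the three cases; assumed here for completeness.
    return "single node is probably enough"


if __name__ == "__main__":
    print(suggest_layout(num_docs=50_000_000, qps=5))
    print(suggest_layout(num_docs=2_000_000, qps=100))
```

Treat the result as a starting point for a benchmark, not as a sizing decision.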

Actual numbers depend on the hardware, the type of docs and queries, etc.
The best approach is to benchmark your setup under varying load so that you
can trace a hockey-stick graph of QTime versus qps.
Feel free to ask for details if needed.
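
Once such a benchmark has produced (qps, QTime) pairs, the bend of the hockey stick can be located with a few lines of code. The sketch below assumes a simple definition of the bend (the first load level at which QTime exceeds a chosen multiple of the QTime at the lightest load); the function name, the factor of 2 and the sample numbers are all made up for illustration and are not measurements from a real cluster.

```python
from typing import List, Optional, Tuple


def find_knee(samples: List[Tuple[float, float]],
              factor: float = 2.0) -> Optional[float]:
    """Return the lowest qps at which QTime exceeds factor * baseline.

    samples holds (qps, qtime_ms) pairs from a load test. The baseline
    is the QTime measured at the lowest qps. Returns None if QTime
    never crosses the threshold.
    """
    ordered = sorted(samples)          # sort by qps, lightest load first
    if not ordered:
        return None
    baseline = ordered[0][1]
    for qps, qtime_ms in ordered:
        if qtime_ms > factor * baseline:
            return qps
    return None


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the curve.
    measurements = [(5, 40), (10, 42), (20, 45), (40, 60), (80, 400)]
    print(find_knee(measurements))
```

The qps value just below the returned one is a reasonable first estimate of what a single replica can sustain.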



André


Re: Search performance: shards or replications?

Posted by Stanislav Sandalnikov <s....@gmail.com>.
Hi Yan,

Thanks for the quick reply.

So replication seems to be the preferable solution. Does QTime decrease in
proportion to the number of replicas, or are there drawbacks?

Just to clarify, how many documents count as "tons of documents" in your
opinion? :)



Re: Search performance: shards or replications?

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

It depends(TM) on what kind of search performance problems you are seeing.
If you simply have such a high query load that the server starts to kneel,
sharding will definitely not help, since ALL the shards will still be hit
with ALL the queries, and sharding adds some extra overhead as well.

But if your QPS is moderate and you have tons of documents, you may gain
better performance, both for indexing latency and search latency, by
sharding.
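
A toy model makes the difference concrete. It is a sketch under simplifying assumptions, not a description of Solr internals: queries are spread evenly over the replicas of a shard, documents are spread evenly over shards, and the cost of merging per-shard results is ignored. The names Layout and per_node_load are invented for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Layout:
    shards: int    # number of index partitions
    replicas: int  # copies of each shard


def per_node_load(total_qps: float, total_docs: int,
                  layout: Layout) -> Tuple[float, float]:
    """Return (queries per second per node, documents searched per node).

    A distributed query goes to one replica of EVERY shard, so adding
    shards does not reduce the number of queries a node sees; it only
    reduces how many documents each node searches per query.
    """
    qps_per_node = total_qps / layout.replicas
    docs_per_node = total_docs / layout.shards
    return qps_per_node, docs_per_node


if __name__ == "__main__":
    # 100 qps over 40M docs: four shards versus four replicas.
    print(per_node_load(100, 40_000_000, Layout(shards=4, replicas=1)))
    print(per_node_load(100, 40_000_000, Layout(shards=1, replicas=4)))
```

With four shards every node still handles all 100 queries per second but over a quarter of the documents; with four replicas every node searches the full index but sees only a quarter of the queries.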

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
