Posted to solr-user@lucene.apache.org by Amey Patil <am...@germin8.com> on 2014/05/28 15:49:51 UTC

How sorting works with multiple shards?

Hi,

I wanted to know how sorting works with multiple shards.

Suppose I query with 4 shards specified, 100 records per page, and
creationDate as the sort field. Will Solr sort and fetch 100 documents
from each shard, aggregate them, sort again, and return the top 100,
discarding the remaining 300?

My use case is:

I want to fetch documents with doc-id A (or B or C etc.) and category W,
X, Y or Z. The Solr shards are partitioned on the field "category", so all
documents with category W are in shard-W, all documents with category X are
in shard-X, and so on.

1st approach - query will be
(doc-id:A AND category:(W OR X)) OR (doc-id:B AND category:(W OR Y)) OR
  (doc-id:C AND category:(W OR X OR Y OR Z)).... sorted on creationDate
Send this query to all the shards.

2nd approach - there will be multiple queries, each sorted on creationDate
and sent only to the matching shard:
category:W AND (doc-id:(A OR B OR C))... sent to shard-W
category:X AND (doc-id:(A OR C))... sent to shard-X
category:Y AND (doc-id:(B OR C))... sent to shard-Y
category:Z AND (doc-id:(C))... sent to shard-Z
That makes 4 queries, but avoids the sort during aggregation.
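
To make the 2nd approach concrete, here is a minimal Python sketch of how
the per-shard queries could be derived from the doc-id/category pairs used
in the 1st approach. The function name per_shard_queries and the input
mapping are hypothetical, chosen only for illustration; the sketch builds
query strings and does not talk to Solr or escape special characters.

```python
from collections import defaultdict


def per_shard_queries(wanted):
    """Build one query string per category (and therefore per shard).

    `wanted` is a hypothetical mapping of doc-id -> list of categories,
    mirroring the (doc-id AND category:(...)) clauses of the 1st approach.
    """
    ids_by_category = defaultdict(list)
    for doc_id, categories in wanted.items():
        for category in categories:
            ids_by_category[category].append(doc_id)

    # Same shape as the queries listed for the 2nd approach.
    return {
        category: "category:{} AND (doc-id:({}))".format(category, " OR ".join(ids))
        for category, ids in ids_by_category.items()
    }
```

Calling it with {"A": ["W", "X"], "B": ["W", "Y"], "C": ["W", "X", "Y", "Z"]}
yields the four queries listed above, one per shard.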

I am using Solr 3.4.

Which approach will be more efficient? And is my assumption about how
sorting works across Solr shards correct?

Thanks,
Amey

Re: How sorting works with multiple shards?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Your understanding of the sorting mechanism with many shards is almost
right. In reality, Solr doesn't fetch the entire document from each shard.
Instead, it first fetches just the uniqueKey and the sort field's value
for each shard's top N, merges them to get the overall top N, and then
fetches the actual doc content for those docs from the respective shards.
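
The two-phase flow described above can be sketched in plain Python. This
is an illustrative simulation only, not Solr's code: the shards are
in-memory lists, the field names "id" and "creationDate" stand in for the
uniqueKey and the sort field, and a descending sort is assumed.

```python
import heapq
from itertools import islice


def shard_top_n(docs, n):
    """Phase 1 on one shard: return (sort value, id) for its own top n only."""
    ranked = sorted(docs, key=lambda d: d["creationDate"], reverse=True)
    return [(d["creationDate"], d["id"]) for d in ranked[:n]]


def merge_top_n(per_shard, n):
    """Coordinator: merge the pre-sorted per-shard lists and keep the top n."""
    tagged = [
        [(date, doc_id, shard) for date, doc_id in entries]
        for shard, entries in per_shard.items()
    ]
    # Each input list is already sorted newest-first, so a k-way merge suffices.
    merged = heapq.merge(*tagged, key=lambda t: t[0], reverse=True)
    return list(islice(merged, n))


def distributed_top_n(shards, n):
    """Simulate both phases; `shards` maps a shard name to its list of docs."""
    # Phase 1: only ids and sort values leave each shard.
    per_shard = {name: shard_top_n(docs, n) for name, docs in shards.items()}
    winners = merge_top_n(per_shard, n)
    # Phase 2: fetch full documents for the winners from their own shard.
    by_id = {name: {d["id"]: d for d in docs} for name, docs in shards.items()}
    return [by_id[shard][doc_id] for _, doc_id, shard in winners]
```

With 4 shards and n = 100, the merge step in this sketch sees at most 400
small (sort value, id) entries, and only the 100 winning documents are
fetched in full.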

If you never need to merge results across shards, the 2nd approach may be
faster. But if you do want merged results from all shards, Solr can do
this faster than you can, and you should use approach #1.

As always, it is best to benchmark both approaches yourself.



-- 
Regards,
Shalin Shekhar Mangar.