You are viewing a plain text version of this content. The canonical link for it is here.

Posted to solr-user@lucene.apache.org by shlomi java <sh...@gmail.com> on 2012/01/01 10:34:46 UTC

Merging results from Shards - relevancy and performance

hola,

1) When distributing search across several Shards, is the merged result
reflects the overall ranking, cross-shards?
I'm talking about stuff like "document frequency".
I guess it does, otherwise distributed search wouldn't have overhead.

talking about overhead,
2) is there a known ratio of the overhead of using shards against
single core, and the impact on performance for adding the N+1 shard to the
distributed index?

thanks for any knowledge/thought.
ShlomiJ

Re: Merging results from Shards - relevancy and performance

Posted by Erick Erickson <er...@gmail.com>.

1> Yes. Note that the distributed tf/idf is an issue, although it's changing.
     That is, if your documents are statistically very different across
     shards, the scores aren't really comparable. This is changing, but
    I don't think it's committed yet.
2> Well, you're mixing apples and oranges I think. The general
     recommendation is to use a single core and *replicate* it
     across as many machines as necessary until you index gets
     too big to fit on your machines (i.e. you cannot get decent
     query times at all). This is NOT distributed searching as
     each request is wholly serviced by a single slave searcher.

     Once you cross the threshold of what fits on your hardware,
     you really have no choice except to shard and use distributed
     searching.

     There is certainly some overhead, but since you have no
     choice but to pay it, you just cope <G>.

     At very large scale (i.e. lots of shards on lots of machines),
     you run into the "laggard problem". That is, as the number
     of shards increases, so does the chance that at least one
     of them will, for whatever reason, take an anomalously long
     time to complete which will slow your final results.

FWIW
Erick

On Sun, Jan 1, 2012 at 4:34 AM, shlomi java <sh...@gmail.com> wrote:
> hola,
>
> 1) When distributing search across several Shards, is the merged result
> reflects the overall ranking, cross-shards?
> I'm talking about stuff like "document frequency".
> I guess it does, otherwise distributed search wouldn't have overhead.
>
> talking about overhead,
> 2) is there a known ratio of the overhead of using shards against
> single core, and the impact on performance for adding the N+1 shard to the
> distributed index?
>
> thanks for any knowledge/thought.
> ShlomiJ