You are viewing a plain text version of this content. The canonical link for it is here.
Posted to solr-user@lucene.apache.org by shlomi java <sh...@gmail.com> on 2012/01/01 10:34:46 UTC
Merging results from Shards - relevancy and performance
hola,
1) When distributing search across several Shards, is the merged result
reflects the overall ranking, cross-shards?
I'm talking about stuff like "document frequency".
I guess it does, otherwise distributed search wouldn't have overhead.
talking about overhead,
2) is there a known ratio of the overhead of using shards against
single core, and the impact on performance for adding the N+1 shard to the
distributed index?
thanks for any knowledge/thought.
ShlomiJ
Re: Merging results from Shards - relevancy and performance
Posted by Erick Erickson <er...@gmail.com>.
1> Yes. Note that the distributed tf/idf is an issue, although it's changing.
That is, if your documents are statistically very different across
shards, the scores aren't really comparable. This is changing, but
I don't think it's committed yet.
2> Well, you're mixing apples and oranges I think. The general
recommendation is to use a single core and *replicate* it
across as many machines as necessary until you index gets
too big to fit on your machines (i.e. you cannot get decent
query times at all). This is NOT distributed searching as
each request is wholly serviced by a single slave searcher.
Once you cross the threshold of what fits on your hardware,
you really have no choice except to shard and use distributed
searching.
There is certainly some overhead, but since you have no
choice but to pay it, you just cope <G>.
At very large scale (i.e. lots of shards on lots of machines),
you run into the "laggard problem". That is, as the number
of shards increases, so does the chance that at least one
of them will, for whatever reason, take an anomalously long
time to complete which will slow your final results.
FWIW
Erick
On Sun, Jan 1, 2012 at 4:34 AM, shlomi java <sh...@gmail.com> wrote:
> hola,
>
> 1) When distributing search across several Shards, is the merged result
> reflects the overall ranking, cross-shards?
> I'm talking about stuff like "document frequency".
> I guess it does, otherwise distributed search wouldn't have overhead.
>
> talking about overhead,
> 2) is there a known ratio of the overhead of using shards against
> single core, and the impact on performance for adding the N+1 shard to the
> distributed index?
>
> thanks for any knowledge/thought.
> ShlomiJ