Posted to solr-user@lucene.apache.org by johnson hong <ho...@goodhope.net> on 2009/12/31 08:29:29 UTC

numFound is changing when querying across distributed search with the same query.

Hi, all.
    I found a problem with distributed search.
    When I use "?q=keyword&start=0&rows=20" to query across
    distributed search, it returns numFound="181". When I then
    change the start param from 0 to 100, it returns numFound="131".
    Why does the same query return a different numFound?

-- 
View this message in context: http://old.nabble.com/numFound-is-changing-when-query-across-distributed-seach-with-the-same-query.-tp26976128p26976128.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: numFound is changing when querying across distributed search with the same query.

Posted by Lance Norskog <go...@gmail.com>.
The current distributed search design assumes that all document ids
are unique across the set of cores. If you have duplicates, you're on
your own.

On Fri, Jan 1, 2010 at 7:10 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> On Thu, Dec 31, 2009 at 10:26 PM, Chris Hostetter
> <ho...@fucit.org> wrote:
>> why do we bother detecting/removing the duplicates?
>>
>> strictly speaking, docs with duplicate IDs on multiple shards are a "garbage
>> in" situation. I can understand Solr taking a little extra effort to
>> not fail hard if this situation is encountered, but why update the
>> numFound at all, or remove the duplicates from the list? ... why not leave
>> them in as is?  (then numFound would never change)
>
> Distrib search keys some things off of the unique id, so when we
> encountered duplicates in the past it failed hard.  IIRC only keeping
> one doc with the same id was actually the easiest way to not fail
> hard.
>
> -Yonik
> http://www.lucidimagination.com
>



-- 
Lance Norskog
goksron@gmail.com
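
A quick way to test the uniqueness assumption Lance describes is to export
the ids from each core and look for ids that appear on more than one. The
sketch below is only an illustration: find_duplicate_ids and the shard names
are hypothetical, and how the ids are exported from each core is left out.

```python
from collections import defaultdict


def find_duplicate_ids(ids_by_shard):
    """Return {doc_id: [shard names]} for ids present on more than one shard.

    ids_by_shard is a hypothetical mapping of shard name -> iterable of
    document ids exported from that shard.
    """
    shards_for_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_for_id[doc_id].add(shard)
    # Keep only the ids that live on two or more shards.
    return {
        doc_id: sorted(shards)
        for doc_id, shards in shards_for_id.items()
        if len(shards) > 1
    }


if __name__ == "__main__":
    sample = {"shard1": ["a", "b"], "shard2": ["b", "c"]}
    print(find_duplicate_ids(sample))
```

An empty result means the ids are unique across the shards that were checked.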

Re: numFound is changing when querying across distributed search with the same query.

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Thu, Dec 31, 2009 at 10:26 PM, Chris Hostetter
<ho...@fucit.org> wrote:
> why do we bother detecting/removing the duplicates?
>
> strictly speaking, docs with duplicate IDs on multiple shards are a "garbage
> in" situation. I can understand Solr taking a little extra effort to
> not fail hard if this situation is encountered, but why update the
> numFound at all, or remove the duplicates from the list? ... why not leave
> them in as is?  (then numFound would never change)

Distrib search keys some things off of the unique id, so when we
encountered duplicates in the past it failed hard.  IIRC only keeping
one doc with the same id was actually the easiest way to not fail
hard.

-Yonik
http://www.lucidimagination.com

Re: numFound is changing when querying across distributed search with the same query.

Posted by Chris Hostetter <ho...@fucit.org>.
: You probably have duplicates (docs on different shards with the same id).
: Deeper paging will detect more of them.
: It does raise the question of whether we should be changing numFound, or
: indicating a separate duplicate count.  Duplicates aren't eliminated

random thought (from someone who's never really considered distributed
searching in much depth) ...

why do we bother detecting/removing the duplicates?

strictly speaking, docs with duplicate IDs on multiple shards are a "garbage
in" situation. I can understand Solr taking a little extra effort to
not fail hard if this situation is encountered, but why update the
numFound at all, or remove the duplicates from the list? ... why not leave
them in as is?  (then numFound would never change)


-Hoss


Re: numFound is changing when querying across distributed search with the same query.

Posted by johnson hong <ho...@goodhope.net>.


Yonik Seeley-2 wrote:
> 
> On Thu, Dec 31, 2009 at 2:29 AM, johnson hong
> <ho...@goodhope.net> wrote:
>>
>> Hi, all.
>>    I found a problem with distributed search.
>>    When I use "?q=keyword&start=0&rows=20" to query across
>>    distributed search, it returns numFound="181". When I then
>>    change the start param from 0 to 100, it returns numFound="131".
> 
> You probably have duplicates (docs on different shards with the same id).
> Deeper paging will detect more of them.
> It does raise the question of whether we should be changing numFound, or
> indicating a separate duplicate count.  Duplicates aren't eliminated
> from things like faceting or statistics, so it might be nice to have a
> number that was consistent with those numbers.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> 
Thank you, Yonik. Happy New Year, all.
I will check the index soon after the festival.
-- 
View this message in context: http://old.nabble.com/numFound-is-changing-when-query-across-distributed-seach-with-the-same-query.-tp26976128p26984236.html


Re: numFound is changing when querying across distributed search with the same query.

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Thu, Dec 31, 2009 at 2:29 AM, johnson hong
<ho...@goodhope.net> wrote:
>
> Hi, all.
>    I found a problem with distributed search.
>    When I use "?q=keyword&start=0&rows=20" to query across
>    distributed search, it returns numFound="181". When I then
>    change the start param from 0 to 100, it returns numFound="131".

You probably have duplicates (docs on different shards with the same id).
Deeper paging will detect more of them.
It does raise the question of whether we should be changing numFound, or
indicating a separate duplicate count.  Duplicates aren't eliminated
from things like faceting or statistics, so it might be nice to have a
number that was consistent with those numbers.

-Yonik
http://www.lucidimagination.com
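
Yonik's explanation can be illustrated with a toy model of the merge step.
This is a simplified sketch, not Solr's actual code: it assumes the
coordinator asks every shard for its top start+rows hits, sums the per-shard
hit counts, and subtracts one for each duplicate id it sees in the fetched
window. The function merge_shard_results and the sample data are made up for
illustration.

```python
def merge_shard_results(shards, start, rows):
    """Toy merge of per-shard hits, where each hit is (doc_id, score).

    Returns (num_found, page). num_found starts as the sum of the per-shard
    hit counts and drops by one for every duplicate id seen in the window.
    """
    window = start + rows  # each shard contributes at most this many hits
    num_found = sum(len(hits) for hits in shards)
    seen = set()
    merged = []
    for hits in shards:
        top = sorted(hits, key=lambda hit: -hit[1])[:window]
        for doc_id, score in top:
            if doc_id in seen:
                num_found -= 1  # duplicate detected: keep only the first copy
                continue
            seen.add(doc_id)
            merged.append((doc_id, score))
    merged.sort(key=lambda hit: -hit[1])
    return num_found, merged[start:start + rows]


if __name__ == "__main__":
    # Two shards of 100 docs each; doc50..doc99 exist on both.
    shard_a = [(f"doc{i}", 1000 - i) for i in range(100)]
    shard_b = [(f"doc{i}", 1000 - i) for i in range(50, 150)]
    print(merge_shard_results([shard_a, shard_b], 0, 20)[0])    # 200
    print(merge_shard_results([shard_a, shard_b], 100, 20)[0])  # 150
```

With start=0 the two windows do not overlap, so no duplicate is noticed and
the count stays at the raw sum. With start=100 each shard returns all of its
hits, every shared id is detected, and the reported count drops.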