Posted to java-user@lucene.apache.org by ba...@oracle.com on 2021/06/09 17:07:56 UTC

Potential bug

Hi,

I think this is a potential bug.

This time I set totalHitsThreshold to 10, and totalHits is reported as
1655, but I get 10 results in total.

I think this suggests that there might be a bug in the
TopScoreDocCollector algorithm.


Best regards



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Potential bug

Posted by ba...@oracle.com.
Thanks Adrien, but the difference between the two numbers is too large.

I think the algorithm needs to be revised.

What if the user needs to limit the search process? As it stands, that
leaves no control.

If this is not possible, there should be some other way to speed up
Lucene, since some simple queries take half a second, which is too long.

Best regards


On 6/9/21 1:13 PM, Adrien Grand wrote:
> Hi Baris,
>
> totalHitsThreshold is actually a minimum threshold, not a maximum threshold.
>
> The problem is that Lucene cannot directly identify the top matching
> documents for a given query. The strategy it adopts is to start collecting
> hits naively in doc ID order and to progressively raise the bar on the
> minimum score that is required for a hit to be competitive, in order to skip
> non-competitive documents. So it's expected that Lucene still collects
> hundreds or thousands of hits, even though the collector is configured to
> only compute the top 10 hits.
>
> On Wed, Jun 9, 2021 at 7:07 PM <ba...@oracle.com> wrote:
>
>> Hi,
>>
>> I think this is a potential bug.
>>
>> This time I set totalHitsThreshold to 10, and totalHits is reported as
>> 1655, but I get 10 results in total.
>>
>> I think this suggests that there might be a bug in the
>> TopScoreDocCollector algorithm.
>>
>>
>> Best regards
>>
>>


Re: Potential bug

Posted by Adrien Grand <jp...@gmail.com>.
Hi Baris,

totalHitsThreshold is actually a minimum threshold, not a maximum threshold.

The problem is that Lucene cannot directly identify the top matching
documents for a given query. The strategy it adopts is to start collecting
hits naively in doc ID order and to progressively raise the bar on the
minimum score that is required for a hit to be competitive, in order to skip
non-competitive documents. So it's expected that Lucene still collects
hundreds or thousands of hits, even though the collector is configured to
only compute the top 10 hits.
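
To make the strategy above concrete, here is a minimal sketch in plain
Java. It is a toy model, not Lucene code: the names TopKSketch, collect
and Result are hypothetical, scores are passed in as a plain array in doc
ID order, and the sketch skips each non-competitive document individually,
which is an idealization. It only illustrates why the counted hits can
exceed totalHitsThreshold while the number of returned results stays at
numHits.

```java
import java.util.PriorityQueue;

/** Toy model of top-k collection with a hit-count threshold. Not Lucene code. */
final class TopKSketch {

    /** Outcome of one simulated search (hypothetical type, for illustration). */
    static final class Result {
        final int hitCount;      // hits the collector actually visited
        final boolean exact;     // false once the bar was raised and hits may be skipped
        final float[] topScores; // kept scores, highest first

        Result(int hitCount, boolean exact, float[] topScores) {
            this.hitCount = hitCount;
            this.exact = exact;
            this.topScores = topScores;
        }
    }

    static Result collect(float[] scoresInDocIdOrder, int numHits, int totalHitsThreshold) {
        // Min-heap: the head is the weakest hit currently kept.
        PriorityQueue<Float> queue = new PriorityQueue<>();
        int hitCount = 0;
        boolean exact = true;
        float minCompetitive = Float.NEGATIVE_INFINITY;

        for (float score : scoresInDocIdOrder) {
            // Once the bar is raised, non-competitive documents are skipped
            // and never counted. Ties lose to the earlier doc ID.
            if (score <= minCompetitive) {
                continue;
            }
            hitCount++;
            if (queue.size() < numHits) {
                queue.add(score);
            } else if (score > queue.peek()) {
                queue.poll();
                queue.add(score);
            }
            // The bar is only raised after more than totalHitsThreshold hits
            // have been counted: the threshold is a minimum, not a cap.
            if (hitCount > totalHitsThreshold && queue.size() == numHits) {
                minCompetitive = queue.peek();
                exact = false;
            }
        }

        // Drain the min-heap from weakest to strongest, filling from the end.
        float[] top = new float[queue.size()];
        for (int i = top.length - 1; i >= 0; i--) {
            top[i] = queue.poll();
        }
        return new Result(hitCount, exact, top);
    }
}
```

With the eight scores {5, 1, 4, 3, 2, 6, 0.5, 4.5}, numHits = 2 and
totalHitsThreshold = 3, this sketch counts 5 hits, marks the count as
inexact, and returns the two scores 6 and 5. With a threshold larger than
the number of matches it counts all 8 and the count is exact. In Lucene
itself the skipping is coarser than in this sketch, so more hits are
collected, and TopDocs.totalHits carries a relation field that tells you
whether the value is exact or only a lower bound.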

On Wed, Jun 9, 2021 at 7:07 PM <ba...@oracle.com> wrote:

> Hi,
>
> I think this is a potential bug.
>
> This time I set totalHitsThreshold to 10, and totalHits is reported as
> 1655, but I get 10 results in total.
>
> I think this suggests that there might be a bug in the
> TopScoreDocCollector algorithm.
>
>
> Best regards
>
>

-- 
Adrien