Posted to dev@lucene.apache.org by LuXugang <xu...@icloud.com.INVALID> on 2021/07/01 09:56:39 UTC

[Lucene] Selection of threshold

Hi,

While reading the Lucene source code, I have a small question about how thresholds of the form threshold = value >>> 3 are chosen.

e.g. in NumericComparator#updateCompetitiveIterator(), 'threshold = iteratorCost >>> 3' is used as the condition for whether to update the iterator

e.g. in IndexOrDocValuesQuery, 'threshold = cost() >>> 3' is used as the condition for choosing between indexScorerSupplier and dvScorerSupplier

Is the choice of this threshold based on some theory, a tradeoff, or some other reason?

Could I get some suggestions?


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: [Lucene] Selection of threshold

Posted by LuXugang <xu...@icloud.com.INVALID>.
Thanks for sharing your ideas, Adrien.

> On Jul 2, 2021, at 1:26 AM, Adrien Grand <jp...@gmail.com> wrote:
> 
> Hi,
> 
> This is just a number that proved to work well in practice.
> 
> The general idea is that we want to narrow down the set of candidates periodically in order to speed up query execution. If we do it too often, then we might spend more time narrowing down the set of candidates than actually evaluating candidates, and if we don't do it often enough, then we're still evaluating lots of candidates that have no chance of being competitive and the query is slow too. What the code samples you shared mean is that Lucene would only re-evaluate the set of candidates whenever it seems that we could reduce the number of candidates by 8x.
> 
> -- 
> Adrien


Re: [Lucene] Selection of threshold

Posted by Adrien Grand <jp...@gmail.com>.
Hi,

This is just a number that proved to work well in practice.

The general idea is that we want to narrow down the set of candidates
periodically in order to speed up query execution. If we do it too often,
then we might spend more time narrowing down the set of candidates than
actually evaluating candidates, and if we don't do it often enough, then
we're still evaluating lots of candidates that have no chance of being
competitive and the query is slow too. What the code samples you shared
mean is that Lucene would only re-evaluate the set of candidates whenever
it seems that we could reduce the number of candidates by 8x.
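
To make the arithmetic concrete, here is a minimal sketch (not Lucene's
actual code) of the check described above. An unsigned right shift by 3
divides a non-negative cost by 8, so the candidate set is only re-narrowed
when the new estimate falls below one eighth of the current cost. The class
name ThresholdSketch, the method names, and the strict "<" comparison are
assumptions made for illustration.

```java
// Illustrative sketch only: all names here are hypothetical, not Lucene APIs.
final class ThresholdSketch {

    // For a non-negative long, an unsigned right shift by 3 is integer
    // division by 8 (rounding down).
    static long threshold(long currentCost) {
        return currentCost >>> 3;
    }

    // Re-narrow the candidate set only when the new estimate is below one
    // eighth of the current cost, i.e. a reduction of roughly 8x or more.
    // (Assumption: strict comparison; the real code may differ.)
    static boolean shouldNarrow(long currentCost, long newEstimatedCost) {
        return newEstimatedCost < threshold(currentCost);
    }

    public static void main(String[] args) {
        System.out.println(threshold(1000));          // 1000 / 8 = 125
        System.out.println(shouldNarrow(1000, 100));  // 100 < 125: worth narrowing
        System.out.println(shouldNarrow(1000, 500));  // 500 >= 125: not worth it
    }
}
```

A smaller shift (say >>> 1) would trigger narrowing more often and spend
more time rebuilding the candidate set; a larger shift would rebuild less
often but keep evaluating more non-competitive candidates.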

-- 
Adrien