Posted to dev@lucene.apache.org by "Atri Sharma (JIRA)" <ji...@apache.org> on 2019/07/19 10:20:00 UTC

[jira] [Commented] (LUCENE-8727) IndexSearcher#search(Query,int) should operate on a shared priority queue when configured with an executor

    [ https://issues.apache.org/jira/browse/LUCENE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888766#comment-16888766 ] 

Atri Sharma commented on LUCENE-8727:
-------------------------------------

[~jpountz] Here are two thoughts on how this could be implemented:

1) Shared priority queue: A single priority queue, held by the parent CollectorManager, is used by all Collectors. This works naturally: once the top N hits have been collected globally, the minimum competitive score can be raised without the Collectors having to coordinate among themselves, and further hits are ranked against it. The downside is that the priority queue implementation has to be synchronized, so there may be a performance hit, because the critical path of segment collection is affected.
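
To make option 1 concrete, here is a minimal sketch of what such a shared queue could look like. It is not Lucene code: the class name SharedTopNQueue and its methods are hypothetical, and it tracks only scores (a real collector would also keep doc IDs and handle tie-breaking). Every offer() takes the lock, which is the contention cost mentioned above.

```java
import java.util.PriorityQueue;

/** Hypothetical sketch (not Lucene API): a bounded top-N score queue shared by all slices. */
final class SharedTopNQueue {
  private final int topN;
  // Min-heap: the head is the weakest score among the hits currently retained.
  private final PriorityQueue<Float> heap = new PriorityQueue<>();

  SharedTopNQueue(int topN) {
    if (topN <= 0) {
      throw new IllegalArgumentException("topN must be positive");
    }
    this.topN = topN;
  }

  /** Offers one hit's score and returns the global minimum competitive score afterwards. */
  synchronized float offer(float score) {
    if (heap.size() < topN) {
      heap.add(score);
    } else if (score > heap.peek()) {
      heap.poll();      // evict the weakest retained hit
      heap.add(score);
    }
    return minCompetitiveScore();
  }

  /** Returns negative infinity until N hits are held, then the weakest retained score. */
  synchronized float minCompetitiveScore() {
    return heap.size() < topN ? Float.NEGATIVE_INFINITY : heap.peek();
  }

  public static void main(String[] args) {
    SharedTopNQueue queue = new SharedTopNQueue(2);
    for (float score : new float[] {1.0f, 5.0f, 3.0f, 2.0f}) {
      System.out.println("offered " + score + ", min competitive = " + queue.offer(score));
    }
  }
}
```

Each per-slice Collector would hold a reference to the same instance and could skip any hit whose score is below the value returned by the last offer().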

2) Alternatively, for N hits, each slice starts with an equal prorated share of the hits (with M Collectors, N/M hits each). Each Collector is given a callback, which it calls with the number of hits it has collected so far and the score of its highest-scoring local hit. The callback returns the score of the minimum competitive hit seen globally so far, and the Collector uses that score to filter out remaining hits. The point at which a Collector invokes the callback is flexible; the simplest choice is after every N/M hits. The callback is provided by the CollectorManager. The downside of this approach is that it requires communication between the Collectors and the CollectorManager, and some redundant hits can be collected because the callback is only invoked periodically. In contrast, the shared priority queue allows accurate filtering.
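
A rough sketch of the callback side of option 2 follows. It is not Lucene code, and it fills in details the description above leaves open, so treat all of it as assumption: the class name MinScoreCoordinator and the report() signature are hypothetical, and each Collector reports the weakest score it currently retains rather than its highest one. That substitution is made here because a safe global threshold needs N hits known to score at least that value. The threshold stays at negative infinity until the slices together retain N hits.

```java
import java.util.Arrays;

/** Hypothetical sketch (not Lucene API): the callback a CollectorManager could hand to Collectors. */
final class MinScoreCoordinator {
  private final int topN;
  private final int[] retainedHits;     // hits each slice currently retains
  private final float[] weakestScores;  // weakest retained score per slice

  MinScoreCoordinator(int topN, int numSlices) {
    this.topN = topN;
    this.retainedHits = new int[numSlices];
    this.weakestScores = new float[numSlices];
    Arrays.fill(weakestScores, Float.NEGATIVE_INFINITY);
  }

  /**
   * Called periodically by a slice's Collector (for example every N/M hits).
   * Returns a score below which hits cannot make the global top N.
   */
  synchronized float report(int slice, int hits, float weakestRetainedScore) {
    retainedHits[slice] = hits;
    weakestScores[slice] = weakestRetainedScore;
    int total = 0;
    float min = Float.POSITIVE_INFINITY;
    for (int i = 0; i < retainedHits.length; i++) {
      if (retainedHits[i] > 0) {
        total += retainedHits[i];
        min = Math.min(min, weakestScores[i]);
      }
    }
    // With at least topN retained hits, all scoring >= min, anything below min is non-competitive.
    return total >= topN ? min : Float.NEGATIVE_INFINITY;
  }

  public static void main(String[] args) {
    MinScoreCoordinator coordinator = new MinScoreCoordinator(4, 2);
    System.out.println(coordinator.report(0, 2, 3.0f));
    System.out.println(coordinator.report(1, 2, 1.5f));
  }
}
```

Because the threshold is only refreshed when a Collector reports, hits collected between two reports may later turn out to be non-competitive, which is the redundant collection described above.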

WDYT?

> IndexSearcher#search(Query,int) should operate on a shared priority queue when configured with an executor
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8727
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8727
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> If IndexSearcher is configured with an executor, then the top docs for each slice are computed separately before being merged once the top docs for all slices are computed. With block-max WAND this is a bit of a waste of resources: it would be better if an increase of the min competitive score could help skip non-competitive hits on every slice and not just the current one.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
