Posted to issues@lucene.apache.org by "Zach Chen (Jira)" <ji...@apache.org> on 2021/05/01 06:35:00 UTC

[jira] [Commented] (LUCENE-9335) Add a bulk scorer for disjunctions that does dynamic pruning

    [ https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337742#comment-17337742 ] 

Zach Chen commented on LUCENE-9335:
-----------------------------------

Hi [~jpountz], I've done another pass and fixed a few issues in [https://github.com/apache/lucene/pull/101]. I also tried some other optimizations (such as moving a scorer from the essential to the non-essential list every time minCompetitiveScore gets updated), but they didn't seem to improve the benchmark results much for pure disjunction queries in either implementation. Assuming there's no major miss or bug in the two implementations so far, I also feel that, compared with BMW, the main bottleneck in BMM for the 2-clause OR queries run by the benchmark is indeed the additional frequent work performed to check and align on the max score boundary.
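
For readers following along, the essential / non-essential repartitioning mentioned above can be sketched roughly as follows. This is a simplified illustration of the general MaxScore idea, not the code from the PR: the class MaxScorePartition and the method firstEssential are hypothetical names, and plain float max scores stand in for real Lucene Scorer objects. It assumes a hit is competitive only if its score is at least minCompetitiveScore.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Lucene) illustrating MaxScore-style partitioning. */
final class MaxScorePartition {

  /**
   * Given per-scorer max scores sorted in ascending order, returns the index of the
   * first essential scorer. Scorers before that index are non-essential: even if a
   * document matched all of them, the sum of their max scores would stay below
   * minCompetitiveScore, so such a document could not be competitive on its own.
   */
  static int firstEssential(float[] sortedMaxScores, float minCompetitiveScore) {
    double sum = 0;
    int i = 0;
    for (; i < sortedMaxScores.length; i++) {
      // Stop as soon as adding this scorer could reach the competitive threshold.
      if (sum + sortedMaxScores[i] >= minCompetitiveScore) {
        break;
      }
      sum += sortedMaxScores[i];
    }
    return i;
  }

  public static void main(String[] args) {
    float[] maxScores = {2.5f, 0.5f, 1.0f};
    Arrays.sort(maxScores); // ascending: 0.5, 1.0, 2.5

    // Re-run the partition each time the (assumed) threshold increases.
    for (float min : new float[] {0f, 1.0f, 2.0f, 5.0f}) {
      int first = firstEssential(maxScores, min);
      System.out.println(
          "minCompetitiveScore=" + min
              + ": " + first + " non-essential, "
              + (maxScores.length - first) + " essential");
    }
  }
}
```

In this sketch the partition is recomputed from scratch on every threshold update, which is cheap for a handful of clauses; whether doing so on each minCompetitiveScore change pays off in practice is exactly what the benchmark results above put in question.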

 

What do you think? Do you have any suggestions on where I should look next?

> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
>                 Key: LUCENE-9335
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9335
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and PISA at [https://tantivy-search.github.io/bench/] or against research prototypes in Table 1 of [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf]. Given that top-level disjunctions of term queries are commonly used for benchmarking, it would be nice to optimize this case a bit more. I suspect that we could make fewer per-document decisions by implementing a BulkScorer instead of a Scorer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
