Posted to issues@lucene.apache.org by "Adrien Grand (Jira)" <ji...@apache.org> on 2021/09/23 18:19:00 UTC
[jira] [Updated] (LUCENE-10121) WANDScorer could skip more

     [ https://issues.apache.org/jira/browse/LUCENE-10121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-10121:
----------------------------------
    Description: 
I was looking at the NYC Taxis benchmark recently and got puzzled by the fact that the query (cab_color:y OR cab_color:g) ran so slowly: http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#search_bq_qps. This is supposed to be a best-case scenario for WAND: there are only two possible scores for documents, so this query should return instantly in the sorted case.

After digging, I noticed that this is due to the scaling that we do in WANDScorer to avoid floating-point rounding errors: documents can be considered possible matches according to the scaled scores (which are rounded) even though they cannot possibly match according to the actual scores. This is especially visible when many blocks contain a document that has the maximum score across the entire postings list, which is the case for any field indexed with indexOptions=DOCS or for constant-scoring queries, for instance.
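
To make the rounding effect concrete, here is a minimal sketch. It is not the actual WANDScorer code: the class name, the method names, and the fixed SCALE_SHIFT are assumptions chosen for illustration. It only assumes that maximum scores are rounded up and the minimum competitive score is rounded down when converted to scaled integers, and that the minimum competitive score is the next float above the best score collected so far.

```java
// Illustrative sketch only; names and the fixed scale are hypothetical,
// not the real WANDScorer implementation.
class ScaledScoreSketch {

    // Hypothetical fixed scale: scores are multiplied by 2^16 before rounding.
    static final int SCALE_SHIFT = 16;

    // Upper bounds are rounded up so that scaling never under-estimates them.
    static long scaleMaxScore(float maxScore) {
        return (long) Math.ceil(Math.scalb((double) maxScore, SCALE_SHIFT));
    }

    // The competitive threshold is rounded down so that scaling never
    // over-estimates it.
    static long scaleMinScore(float minScore) {
        return (long) Math.floor(Math.scalb((double) minScore, SCALE_SHIFT));
    }

    // Decision based on the rounded, scaled scores.
    static boolean candidateByScaledScores(float blockMaxScore, float minCompetitiveScore) {
        return scaleMaxScore(blockMaxScore) >= scaleMinScore(minCompetitiveScore);
    }

    // Decision based on the actual float scores.
    static boolean candidateByActualScores(float blockMaxScore, float minCompetitiveScore) {
        return blockMaxScore >= minCompetitiveScore;
    }

    public static void main(String[] args) {
        // A block whose best document has the maximum score of the postings list.
        float blockMaxScore = 0.7f;
        // Assumed threshold: only scores strictly greater than 0.7f are competitive.
        float minCompetitiveScore = Math.nextUp(blockMaxScore);

        System.out.println("candidate by scaled scores: "
            + candidateByScaledScores(blockMaxScore, minCompetitiveScore));
        System.out.println("candidate by actual scores: "
            + candidateByActualScores(blockMaxScore, minCompetitiveScore));
    }
}
```

With these inputs the scaled comparison keeps the block as a candidate while the comparison on actual scores rules it out, which is the kind of missed skipping described above.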

  was:
I was looking at the NYC Taxis benchmark recently and got puzzled by the fact that the query (cab_color:y OR cab_color:g) ran so slowly: http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#search_bq_qps. This is supposed to be a best-case scenario for WAND: there are only two possible scores for documents, so this query should return instantly in all scenarios (dense, sparse, sparse and sorted).

After digging, I noticed that this is due to the scaling that we do in WANDScorer to avoid floating-point rounding errors: documents can be considered possible matches according to the scaled scores (which are rounded) even though they cannot possibly match according to the actual scores. This is especially visible when many blocks contain a document that has the maximum score across the entire postings list, which is the case for any field indexed with indexOptions=DOCS or for constant-scoring queries, for instance.


> WANDScorer could skip more
> --------------------------
>
>                 Key: LUCENE-10121
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10121
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> I was looking at the NYC Taxis benchmark recently and got puzzled by the fact that the query (cab_color:y OR cab_color:g) ran so slowly: http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#search_bq_qps. This is supposed to be a best-case scenario for WAND: there are only two possible scores for documents, so this query should return instantly in the sorted case.
> After digging, I noticed that this is due to the scaling that we do in WANDScorer to avoid floating-point rounding errors: documents can be considered possible matches according to the scaled scores (which are rounded) even though they cannot possibly match according to the actual scores. This is especially visible when many blocks contain a document that has the maximum score across the entire postings list, which is the case for any field indexed with indexOptions=DOCS or for constant-scoring queries, for instance.


