Posted to issues@lucene.apache.org by "Michele Palmia (Jira)" <ji...@apache.org> on 2020/03/10 12:58:00 UTC

[jira] [Comment Edited] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores

    [ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055891#comment-17055891 ] 

Michele Palmia edited comment on LUCENE-9269 at 3/10/20, 12:57 PM:
-------------------------------------------------------------------

I added a very simple test (with my very limited Lucene testing skills) that emulates example c) above and checks for the score of the top document. As there is no "right" score, I just check for one of the two possible scores and have the test fail on the other.

I'm having a hard time wrapping my head around what the right behavior should be in this case (and thus coming up with a more sensible test and fix).

In case it's useful, I should add that the randomness in the scoring behavior comes from the HashMap underlying MultiSet: when should clauses are processed for deduplication, they are served in an arbitrary order (see [BooleanQuery.java:370|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java#L370]).
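
The effect of the iteration order can be sketched in isolation. The following is a minimal model, not Lucene code: Clause, dedup and docFreqUsedFor are hypothetical names, and the sketch assumes that two clauses on the same term compare equal regardless of the docFreq they carry, so that whichever instance is seen first stays in the map. Only the overlapping term f:a from example c) is modelled.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupOrderSketch {

    /** Hypothetical stand-in for a term clause; equality deliberately ignores docFreq and boost. */
    public static final class Clause {
        final String term;
        final int docFreq;
        final double boost;

        public Clause(String term, int docFreq, double boost) {
            this.term = term;
            this.docFreq = docFreq;
            this.boost = boost;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Clause && ((Clause) o).term.equals(term);
        }

        @Override
        public int hashCode() {
            return term.hashCode();
        }
    }

    /** Sums the boosts of equal clauses; the first instance seen remains the map key. */
    public static Map<Clause, Double> dedup(List<Clause> clausesInIterationOrder) {
        Map<Clause, Double> merged = new LinkedHashMap<>();
        for (Clause c : clausesInIterationOrder) {
            // merge() updates the value of an existing entry but keeps its original key
            merged.merge(c, c.boost, Double::sum);
        }
        return merged;
    }

    /** Returns the docFreq carried by the surviving key for the given term. */
    public static int docFreqUsedFor(String term, Map<Clause, Double> merged) {
        for (Clause c : merged.keySet()) {
            if (c.term.equals(term)) {
                return c.docFreq;
            }
        }
        throw new IllegalArgumentException("no clause for term " + term);
    }

    public static void main(String[] args) {
        Clause fromFirstBlended = new Clause("f:a", 3, 1.0);   // df taken from Blended(f:a f:b^0.66)
        Clause fromSecondBlended = new Clause("f:a", 2, 0.75); // df taken from Blended(f:a^0.75)

        Map<Clause, Double> oneOrder = dedup(Arrays.asList(fromFirstBlended, fromSecondBlended));
        Map<Clause, Double> otherOrder = dedup(Arrays.asList(fromSecondBlended, fromFirstBlended));

        // Same summed boost in both cases, but a different docFreq survives.
        System.out.println(docFreqUsedFor("f:a", oneOrder) + " / " + docFreqUsedFor("f:a", otherOrder));
    }
}
```

In this model the summed boost is 1.75 either way, while the surviving df is whichever one the iteration happened to serve first, which matches the "?" in example c).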


> Blended queries with boolean rewrite can result in inconsistent scores
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-9269
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9269
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 8.4
>            Reporter: Michele Palmia
>            Priority: Minor
>         Attachments: LUCENE-9269-test.patch
>
>
> If two blended queries are should clauses of a boolean query and are built so that
>  * some of their terms are the same
>  * their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE
> the docFreq used for scoring the overlapping terms is picked as follows:
>  # if the overlapping terms are not boosted, the df of the term in the first blended query is used
>  # if any of the overlapping terms is boosted, the df is picked at (what looks like) random.
> A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
> {code:java}
> a)
> Blended(f:a f:b) Blended (f:a)
>         df: 3             df: 2
> gets rewritten to:
> (f:a)^2.0 (f:b)
> df: 3      df:2
> b)
> Blended(f:a) Blended(f:a f:b)
>         df: 2        df: 3
> gets rewritten to:
> (f:a)^2.0 (f:b)
>  df: 2     df:2
> c)
> Blended(f:a f:b^0.66) Blended (f:a^0.75)
>         df: 3                  df: 2
> gets rewritten to:
> (f:a)^1.75 (f:b)^0.66
>  df:?       df:2
> {code}
> with ? either 2 or 3, depending on the run.
>  
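
To see why a df of 2 versus 3 changes the score at all, here is a rough sketch of a BM25-style idf computation. It assumes BM25Similarity is the similarity in use, and the docCount of 10 is a made-up value (the issue does not state the index size); IdfSketch and idf are hypothetical names for illustration only.

```java
public class IdfSketch {

    /** BM25-style idf: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). */
    public static double idf(long docFreq, long docCount) {
        return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    public static void main(String[] args) {
        long docCount = 10; // assumed index size, not taken from the issue

        // The same term scored with the two candidate docFreq values from example c)
        System.out.println("idf with df=2: " + idf(2, docCount));
        System.out.println("idf with df=3: " + idf(3, docCount));
    }
}
```

Under these assumptions the lower df yields the higher idf, so the top document's score shifts depending on which df the rewrite ends up with.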


