Posted to issues@lucene.apache.org by "Michele Palmia (Jira)" <ji...@apache.org> on 2020/03/10 12:15:00 UTC
[jira] [Updated] (LUCENE-9269) Blended queries with boolean rewrite can result in inconsistent scores
[ https://issues.apache.org/jira/browse/LUCENE-9269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michele Palmia updated LUCENE-9269:
-----------------------------------
Description:
If two blended queries are SHOULD clauses of a boolean query and are built so that
* some of their terms are the same
* their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE
the docFreq used for scoring the overlapping terms is picked as follows:
* if the overlapping terms are not boosted, the df of the term in the first blended query is used
* if any of the overlapping terms is boosted, the df is picked seemingly at random.
A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
{code:java}
1.
Blended(f:a f:b) Blended(f:a)
df: 3 df: 2
gets rewritten to:
(f:a)^2.0 (f:b)
df: 3 df: 2

2.
Blended(f:a) Blended(f:a f:b)
df: 2 df: 3
gets rewritten to:
(f:a)^2.0 (f:b)
df: 2 df: 2

3.
Blended(f:a f:b^0.66) Blended(f:a^0.75)
df: 3 df: 2
gets rewritten to:
(f:a)^1.75 (f:b)^0.66
df: ? df: 2
{code}
with ? either 2 or 3, depending on the run.
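The numbers in the examples above can be reproduced with a small arithmetic sketch. It does not use Lucene; it only assumes (this is an assumption, not something stated in the report) that a blended query assigns the maximum df among its terms to each of them, and that the boosts of duplicate clauses are summed when they are merged. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BlendedDfSketch {

    // Hypothetical helper: the df a blended query assigns to each of its terms,
    // ASSUMED here to be the maximum index df among those terms.
    public static int blendedDf(Map<String, Integer> indexDf, String... terms) {
        int df = 0;
        for (String term : terms) {
            df = Math.max(df, indexDf.get(term));
        }
        return df;
    }

    // Hypothetical helper: the boost of a term after duplicate clauses are merged,
    // ASSUMED to be the sum of the boosts of the individual clauses.
    public static float mergedBoost(float... boosts) {
        float sum = 0f;
        for (float boost : boosts) {
            sum += boost;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Index statistics from the report: f:a (df: 2), f:b (df: 3).
        Map<String, Integer> indexDf = new LinkedHashMap<>();
        indexDf.put("f:a", 2);
        indexDf.put("f:b", 3);

        // f:a appears in both blended queries, with a different blended df in each.
        int dfViaBoth = blendedDf(indexDf, "f:a", "f:b"); // Blended(f:a f:b)
        int dfViaSingle = blendedDf(indexDf, "f:a");      // Blended(f:a)

        System.out.println("df of f:a via Blended(f:a f:b): " + dfViaBoth);
        System.out.println("df of f:a via Blended(f:a):     " + dfViaSingle);

        // Merged boosts for f:a in the unboosted and boosted examples.
        System.out.println("merged boost, unboosted: " + mergedBoost(1.0f, 1.0f));
        System.out.println("merged boost, boosted:   " + mergedBoost(1.0f, 0.75f));
    }
}
```

Under these assumptions, the merged f:a clause has two candidate df values (2 and 3) but can carry only one of them, which is where the order dependence and the run-to-run variation described above come in.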
was:
If two blended queries are built so that
* some of their terms are the same
* their rewrite method is BlendedTermQuery.BOOLEAN_REWRITE
the docFreq used for scoring the overlapping terms is picked as follows:
* if the overlapping terms are not boosted, the df of the term in the first blended query is used
* if any of the overlapping terms is boosted, the df is picked seemingly at random.
A few examples using a field with 2 terms: f:a (df: 2), and f:b (df: 3).
{code:java}
1.
Blended(f:a f:b) Blended(f:a)
df: 3 df: 2
gets rewritten to:
(f:a)^2.0 (f:b)
df: 3 df: 2

2.
Blended(f:a) Blended(f:a f:b)
df: 2 df: 3
gets rewritten to:
(f:a)^2.0 (f:b)
df: 2 df: 2

3.
Blended(f:a f:b^0.66) Blended(f:a^0.75)
df: 3 df: 2
gets rewritten to:
(f:a)^1.75 (f:b)^0.66
df: ? df: 2
{code}
with ? either 2 or 3, depending on the run.
> Blended queries with boolean rewrite can result in inconsistent scores
> ----------------------------------------------------------------------
>
> Key: LUCENE-9269
> URL: https://issues.apache.org/jira/browse/LUCENE-9269
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 8.4
> Reporter: Michele Palmia
> Priority: Minor
>