Posted to issues@lucene.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/02/02 04:07:00 UTC
[jira] [Commented] (LUCENE-10236) CombinedFieldsQuery to use fieldAndWeights.values() when constructing MultiNormsLeafSimScorer for scoring
[ https://issues.apache.org/jira/browse/LUCENE-10236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17485557#comment-17485557 ]
ASF subversion and git services commented on LUCENE-10236:
----------------------------------------------------------
Commit a17d2ebcd5af4f3d51e0265370931f9ad397dd81 in lucene's branch refs/heads/branch_9x from zacharymorn
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=a17d2eb ]
LUCENE-10236: Update field-weight used in CombinedFieldQuery scoring calculation (9.1.0 Backporting) (#588)
> CombinedFieldsQuery to use fieldAndWeights.values() when constructing MultiNormsLeafSimScorer for scoring
> ---------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-10236
> URL: https://issues.apache.org/jira/browse/LUCENE-10236
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/sandbox
> Reporter: Zach Chen
> Assignee: Zach Chen
> Priority: Minor
> Time Spent: 6h 50m
> Remaining Estimate: 0h
>
> This is a spin-off issue from the discussion in [https://github.com/apache/lucene/pull/418#issuecomment-967790816], for a quick fix in CombinedFieldsQuery scoring.
> Currently, CombinedFieldsQuery uses a constructed [fields|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L420-L421] object to create a MultiNormsLeafSimScorer for scoring, but the fields object may contain duplicated field-weight pairs because it is [built from looping over fieldTerms|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L404-L414], resulting in duplicated norms being added during the scoring calculation in MultiNormsLeafSimScorer.
> E.g. for CombinedFieldsQuery with two fields and two values matching a particular doc:
> {code:java}
> CombinedFieldQuery query =
>     new CombinedFieldQuery.Builder()
>         .addField("field1", 1.0f)
>         .addField("field2", 1.0f)
>         .addTerm(new BytesRef("foo"))
>         .addTerm(new BytesRef("zoo"))
>         .build();
> {code}
> I would imagine the scoring to be based on the following:
> # Sum of freqs on doc = freq(field1:foo) + freq(field2:foo) + freq(field1:zoo) + freq(field2:zoo)
> # Sum of norms on doc = norm(field1) + norm(field2)
> but the current logic would use the following for scoring:
> # Sum of freqs on doc = freq(field1:foo) + freq(field2:foo) + freq(field1:zoo) + freq(field2:zoo)
> # Sum of norms on doc = norm(field1) + norm(field2) + norm(field1) + norm(field2)
>
> In addition, this differs from how MultiNormsLeafSimScorer is constructed in CombinedFieldsQuery's explain function, which [uses fieldAndWeights.values()|https://github.com/apache/lucene/blob/3b914a4d73eea8923f823cbdb869de39213411dd/lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java#L387-L389] and therefore does not contain duplicated field-weight pairs.
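The double counting described above can be illustrated with a small standalone sketch. It does not use Lucene: the class NormSumSketch, the helper FieldWeight, the method names, and the per-field lengths (10 and 20) are all hypothetical stand-ins chosen for illustration, and the weighted sum is only a simplified assumption about how per-field norms are combined.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Standalone illustration only; none of these names are Lucene classes.
class NormSumSketch {

    /** Hypothetical stand-in for a field-weight pair. */
    static final class FieldWeight {
        final String field;
        final float weight;

        FieldWeight(String field, float weight) {
            this.field = field;
            this.weight = weight;
        }
    }

    /** One entry per (term, field) combination, so each field repeats once per term. */
    static List<FieldWeight> fromFieldTerms(Map<String, Float> fieldAndWeights, List<String> terms) {
        List<FieldWeight> fields = new ArrayList<>();
        for (int i = 0; i < terms.size(); i++) {
            for (Map.Entry<String, Float> e : fieldAndWeights.entrySet()) {
                fields.add(new FieldWeight(e.getKey(), e.getValue()));
            }
        }
        return fields;
    }

    /** One entry per field, analogous to iterating over fieldAndWeights.values(). */
    static List<FieldWeight> fromFieldAndWeights(Map<String, Float> fieldAndWeights) {
        List<FieldWeight> fields = new ArrayList<>();
        for (Map.Entry<String, Float> e : fieldAndWeights.entrySet()) {
            fields.add(new FieldWeight(e.getKey(), e.getValue()));
        }
        return fields;
    }

    /** Simplified assumption: the combined norm is the weighted sum of per-field lengths. */
    static float sumLengths(Collection<FieldWeight> fields, Map<String, Integer> lengthByField) {
        float sum = 0f;
        for (FieldWeight f : fields) {
            sum += f.weight * lengthByField.get(f.field);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Float> fieldAndWeights = new LinkedHashMap<>();
        fieldAndWeights.put("field1", 1.0f);
        fieldAndWeights.put("field2", 1.0f);

        // Made-up field lengths for a single document.
        Map<String, Integer> lengthByField = new LinkedHashMap<>();
        lengthByField.put("field1", 10);
        lengthByField.put("field2", 20);

        List<String> terms = List.of("foo", "zoo");

        // Each field is counted once per term: 10 + 20 + 10 + 20.
        System.out.println(sumLengths(fromFieldTerms(fieldAndWeights, terms), lengthByField));
        // Each field is counted once: 10 + 20.
        System.out.println(sumLengths(fromFieldAndWeights(fieldAndWeights), lengthByField));
    }
}
```

With two terms, the list built per (term, field) combination yields a sum twice as large as the list built once per field, which matches the norm(field1) + norm(field2) + norm(field1) + norm(field2) case in the description.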
--
This message was sent by Atlassian Jira
(v8.20.1#820001)