Posted to dev@lucene.apache.org by "Doron Cohen (JIRA)" <ji...@apache.org> on 2011/01/23 14:34:43 UTC

[jira] Commented: (LUCENE-2879) MultiPhraseQuery sums its own idf instead of Similarity.

    [ https://issues.apache.org/jira/browse/LUCENE-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985318#action_12985318 ] 

Doron Cohen commented on LUCENE-2879:
-------------------------------------

+1 for fixing this inconsistent behavior.
BTW, SpanWeight also calls idfExplain(), for the same reason.
Patch looks good, new test case passes with the fix and fails without it.

One small thing bothered me: an explanation is created even though the user did not call explain(), and explain() is generally considered slow. But it is called only once per query, so it should not be a perf issue, and two other queries already do the same. So in any case this one (MFQ) should first be made consistent with them, which is what this patch does.

It is interesting that the implementation of similar logic in SpanWeight is more compact:
{code:title=SpanWeight: calls extractTerms()}
terms=new HashSet<Term>();
query.extractTerms(terms);
idfExp = similarity.idfExplain(terms, searcher);
{code}

But doing the same in MFQ would change its logic, as it would consider each term only once. 
Not saying that the patch should change, just pointing out the difference in sum-of-square-weights computation between SpanWeight and MFQ.
BooleanQuery, for example, iterates over its sub-queries and sums their values, so if the same term happens to appear in two descendant queries, that term contributes twice to the sum. In this sense MFQ and BQ behave similarly, and both differ from SpanQuery... well, I guess this falls into the "black magic" area :)
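
To make the difference concrete, here is a minimal sketch in plain Java, without any Lucene classes. Everything in it is an assumption for illustration: the class and method names are hypothetical, terms are plain strings, and idf() is only modelled on the classic log(numDocs / (docFreq + 1)) + 1 form. It compares summing idf over every term occurrence (the BQ/MFQ-like behavior) with summing over a set of distinct terms (the SpanWeight-like behavior).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class IdfSumSketch {

    // Assumed idf formula, for illustration only.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Per-occurrence sum: a term that appears twice contributes twice.
    static float sumPerOccurrence(List<String> terms, Map<String, Integer> docFreqs, int numDocs) {
        float sum = 0f;
        for (String t : terms) {
            sum += idf(docFreqs.get(t), numDocs);
        }
        return sum;
    }

    // Distinct sum: terms are collected into a set first, so each counts once.
    static float sumDistinct(List<String> terms, Map<String, Integer> docFreqs, int numDocs) {
        float sum = 0f;
        for (String t : new HashSet<String>(terms)) {
            sum += idf(docFreqs.get(t), numDocs);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up document frequencies; "a" occurs in two places in the query.
        Map<String, Integer> docFreqs = new HashMap<String, Integer>();
        docFreqs.put("a", 9);
        docFreqs.put("b", 4);
        List<String> terms = Arrays.asList("a", "b", "a");

        System.out.println("per occurrence: " + sumPerOccurrence(terms, docFreqs, 100));
        System.out.println("distinct terms: " + sumDistinct(terms, docFreqs, 100));
    }
}
```

The two sums differ by exactly one idf contribution of the repeated term, which is the behavioral change that switching MFQ to an extractTerms()-style set would introduce.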

> MultiPhraseQuery sums its own idf instead of Similarity.
> --------------------------------------------------------
>
>                 Key: LUCENE-2879
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2879
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 2.9.5, 3.0.4, 3.1, 4.0
>
>         Attachments: LUCENE-2879.patch
>
>
> MultiPhraseQuery is a generalized version of PhraseQuery, and computes IDF the same way by default (by summing across the terms).
> The problem is it doesn't let the Similarity do this: PhraseQuery calls Similarity.idfExplain(Collection<Term> terms, IndexSearcher searcher),
> but MultiPhraseQuery just sums itself, calling Similarity.idf(int, int) for each term.
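
The practical effect described in the issue can be sketched as follows. This is not Lucene code: SketchSimilarity, idfOfTerms() and MaxIdfSimilarity are hypothetical stand-ins, and terms are reduced to their document frequencies. The sketch only shows why a query that sums idf itself bypasses a Similarity that customizes the collection-level computation, while a query that delegates picks the customization up.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class IdfDelegationSketch {

    // Hypothetical stand-in for a Similarity; not Lucene's class.
    static class SketchSimilarity {
        // Per-term idf (assumed formula, for illustration only).
        float idf(int docFreq, int numDocs) {
            return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
        }

        // Collection-level hook: by default, the sum of the per-term values.
        float idfOfTerms(Collection<Integer> docFreqs, int numDocs) {
            float sum = 0f;
            for (int df : docFreqs) {
                sum += idf(df, numDocs);
            }
            return sum;
        }
    }

    // A customization that combines several terms by taking the maximum.
    static class MaxIdfSimilarity extends SketchSimilarity {
        @Override
        float idfOfTerms(Collection<Integer> docFreqs, int numDocs) {
            float max = 0f;
            for (int df : docFreqs) {
                max = Math.max(max, idf(df, numDocs));
            }
            return max;
        }
    }

    // Query sums per-term idf itself: the collection-level override is never consulted.
    static float sumItself(SketchSimilarity sim, List<Integer> docFreqs, int numDocs) {
        float sum = 0f;
        for (int df : docFreqs) {
            sum += sim.idf(df, numDocs);
        }
        return sum;
    }

    // Query delegates to the Similarity: the override takes effect.
    static float delegate(SketchSimilarity sim, List<Integer> docFreqs, int numDocs) {
        return sim.idfOfTerms(docFreqs, numDocs);
    }

    public static void main(String[] args) {
        List<Integer> docFreqs = Arrays.asList(9, 4); // made-up document frequencies
        SketchSimilarity custom = new MaxIdfSimilarity();
        System.out.println("sums itself: " + sumItself(custom, docFreqs, 100));
        System.out.println("delegates:   " + delegate(custom, docFreqs, 100));
    }
}
```

With the default SketchSimilarity both paths return the same value, which is why the inconsistency only shows up once a custom Similarity is plugged in.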
