Posted to dev@lucene.apache.org by "Paul Elschot (JIRA)" <ji...@apache.org> on 2016/12/11 21:44:58 UTC

[jira] [Comment Edited] (LUCENE-7580) Spans tree scoring

    [ https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15740402#comment-15740402 ] 

Paul Elschot edited comment on LUCENE-7580 at 12/11/16 9:44 PM:
----------------------------------------------------------------

Compared to the previous patch, this adds a nonMatchSlop attribute to SpanNearQuery,
and drops the nonMatchSlopFactor argument from SpansTreeQuery.

nonMatchSlop is the distance used to determine the slop factor for non-matching occurrences of a SpanNearQuery.
Smaller values for this distance increase the score contribution of non-matching occurrences via
SimScorer.computeSlopFactor().
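
To illustrate how the distance affects the factor, here is a minimal sketch. It assumes the slop factor is 1 / (distance + 1), which is what the built-in similarities return from computeSlopFactor(); a custom Similarity may use something else. The class and method names are made up for this illustration and are not part of the patch.

```java
public class SlopFactorSketch {
    // Assumed formula: 1 / (distance + 1), as in the built-in similarities.
    // A custom Similarity may compute this differently.
    static float slopFactor(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        // A smaller nonMatchSlop gives a larger factor, so non-matching
        // occurrences contribute more to the score.
        for (int nonMatchSlop : new int[] {5, 10, 100}) {
            System.out.println("nonMatchSlop=" + nonMatchSlop
                + " -> slop factor " + slopFactor(nonMatchSlop));
        }
    }
}
```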

However, smaller values for this distance, i.e. a higher score contribution of non-matching occurrences,
may lead to a scoring inconsistency between two span near queries that differ only in the allowed slop.
For example, consider query A with a smaller allowed slop and query B with a larger one.
Query B can have more matches, and these should increase the score of B
compared to the score of A.
So for each extra match of B, the non-matching score for query A should be lower than
the matching score for query B.
This may not be the case when the non-matching score contribution is too high.

To have consistent scoring between two such queries,
choose a non-matching slop that is larger than the largest allowed match slop,
and provide that non-matching slop to both queries.
If this consistency is not needed, nonMatchSlop can be chosen to be somewhat
larger than the maximum allowed match slop.
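
As a worked example of the consistency condition, the sketch below compares the factor for a non-matching occurrence with the weakest possible matching factor. It again assumes a slop factor of 1 / (distance + 1); the slops 2 and 5 and the helper isConsistent() are hypothetical and chosen only for illustration.

```java
public class NonMatchSlopConsistencySketch {
    // Assumed slop factor formula: 1 / (distance + 1).
    static float slopFactor(int distance) {
        return 1.0f / (distance + 1);
    }

    // True when a non-matching occurrence gets a smaller factor than the
    // weakest possible match of a query with the given allowed slop.
    static boolean isConsistent(int nonMatchSlop, int largestAllowedSlop) {
        return slopFactor(nonMatchSlop) < slopFactor(largestAllowedSlop);
    }

    public static void main(String[] args) {
        int slopA = 2; // query A: smaller allowed slop
        int slopB = 5; // query B: larger allowed slop
        int largest = Math.max(slopA, slopB);

        // nonMatchSlop 3 is below B's allowed slop: an occurrence at distance 4
        // matches B with factor 1/5, but scores 1/4 as a non-match for A.
        System.out.println(isConsistent(3, largest)); // false

        // nonMatchSlop 6 is above the largest allowed slop: factor 1/7 is
        // below every matching factor of B (at least 1/6).
        System.out.println(isConsistent(6, largest)); // true
    }
}
```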

This nonMatchSlop is used in SpansTreeWeight to compute a minimal nested slop factor
from the maximum possible slops that can occur
in a SpanQuery for the nested SpanNearQueries and for nested SpanOrQueries with distance.
Finally, this minimal nested slop factor is used as the weight for scoring non-matching terms.

The default nonMatchSlop for SpanNearQuery is large: Integer.MAX_VALUE/2.
Therefore, by default, non-matching occurrences have no real score contribution.
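
To get a feel for how small the default contribution is, the following sketch evaluates the factor for the default distance, again assuming the 1 / (distance + 1) formula. The class and method names are hypothetical.

```java
public class DefaultNonMatchSlopSketch {
    // Default distance as stated above: Integer.MAX_VALUE / 2.
    static final int DEFAULT_NON_MATCH_SLOP = Integer.MAX_VALUE / 2;

    // Assumed slop factor formula: 1 / (distance + 1).
    static float defaultSlopFactor() {
        // DEFAULT_NON_MATCH_SLOP + 1 equals 2^30, so there is no int overflow.
        return 1.0f / (DEFAULT_NON_MATCH_SLOP + 1);
    }

    public static void main(String[] args) {
        // Prints a value below 1e-9, negligible next to matching factors.
        System.out.println(defaultSlopFactor());
    }
}
```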





> Spans tree scoring
> ------------------
>
>                 Key: LUCENE-7580
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7580
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: master (7.0)
>            Reporter: Paul Elschot
>            Priority: Minor
>             Fix For: 6.x
>
>         Attachments: LUCENE-7580.patch, LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and what matched


