Posted to dev@lucene.apache.org by "David Wendt (JIRA)" <ji...@apache.org> on 2016/03/30 14:54:25 UTC

[jira] [Updated] (LUCENE-7151) Nested spanNear scoring error when inner clauses overlap positions

     [ https://issues.apache.org/jira/browse/LUCENE-7151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Wendt updated LUCENE-7151:
--------------------------------
    Description: 
For spanNear([spanNear([contents:word1, contents:word3], 2, true), spanNear([contents:word2, contents:word3], 2, true)], 2, false)

Scores for the following two documents should be the same but are not.
doc1: [----- word1 word2 ----- word2 word3 ----- word1 word2 word3 -----]
doc2: [----- word2 word3 ----- word1 word3 ----- word1 word2 word3 -----]

The positions of the inner clauses affect the scoring of the final 3-term phrase. This appears to be a side effect of the span-scoring rewrite in 5.2(?).

NearSpansUnordered's SpansCell.adjustMax() uses only end-position values to decide maxEndPositionCell, while the SpanPositionQueue uses both start-position and end-position values to sort the SpansCell instances. This means that maxEndPositionCell will be incorrectly set or not set depending on previous positions.

I can provide example code illustrating the score error.
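
The ordering mismatch described above can be sketched in isolation. The following is a toy model only, not the Lucene source and not the attached SpanScore5Bug.java: the Cell class, the two comparators, and the positions are hypothetical stand-ins chosen for illustration. It shows that the cell which sorts last by (start, end) need not be the cell with the largest end position, so logic that assumes the two agree can pick the wrong cell.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy model only: these are NOT the Lucene classes, just stand-ins for illustration.
final class SpanCellOrderingSketch {

    /** Hypothetical stand-in for a span cell: a name plus start/end positions. */
    static final class Cell {
        final String name;
        final int start;
        final int end;

        Cell(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    /** Queue-style ordering: by start position, ties broken by end position. */
    static final Comparator<Cell> START_THEN_END =
            Comparator.<Cell>comparingInt(c -> c.start).thenComparingInt(c -> c.end);

    /** Max-tracking-style ordering: by end position only, ignoring start. */
    static final Comparator<Cell> END_ONLY =
            Comparator.<Cell>comparingInt(c -> c.end);

    /** Returns the cell that sorts last under the (start, end) ordering. */
    static Cell lastInQueueOrder(List<Cell> cells) {
        return Collections.max(cells, START_THEN_END);
    }

    /** Returns the cell with the largest end position. */
    static Cell maxByEndOnly(List<Cell> cells) {
        return Collections.max(cells, END_ONLY);
    }

    public static void main(String[] args) {
        // Assumed positions: one inner match spans [5, 8), the other [6, 7).
        List<Cell> cells = Arrays.asList(
                new Cell("inner1", 5, 8),
                new Cell("inner2", 6, 7));

        System.out.println("last in queue order: " + lastInQueueOrder(cells).name);
        System.out.println("max by end only:     " + maxByEndOnly(cells).name);
    }
}
```

With these assumed positions the two selections disagree: the later-starting cell sorts last in queue order, while the earlier-starting cell has the larger end position. Overlapping inner matches of this kind are what the two example documents produce.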

  was:
For spanNear([spanNear([contents:word1, contents:word3], 2, true), spanNear([contents:word2, contents:word3], 2, true)], 2, false)

Scores for the following two documents should be the same but are not.
doc1: [----- word1 word2 ----- word2 word3 ----- word1 word2 word3 -----]
doc2: [----- word2 word3 ----- word1 word3 ----- word1 word2 word3 -----]

The positions of the inner clauses affect the scoring of the final 3-term phrase. This appears to be a side effect of the span-scoring rewrite in 5.2(?).

The SpansCell.adjustMax() uses only end-position values to decide maxEndPositionCell, while the SpanPositionQueue uses both start-position and end-position values to sort the SpansCell instances. This means that maxEndPositionCell will be incorrectly set or not set depending on previous positions.

I can provide example code illustrating the score error.


> Nested spanNear scoring error when inner clauses overlap positions
> ------------------------------------------------------------------
>
>                 Key: LUCENE-7151
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7151
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/query/scoring
>    Affects Versions: 5.3.1, 5.5
>         Environment: Windows, Linux
>            Reporter: David Wendt
>              Labels: newbie
>         Attachments: SpanScore5Bug.java
>
>
> For spanNear([spanNear([contents:word1, contents:word3], 2, true), spanNear([contents:word2, contents:word3], 2, true)], 2, false)
> Scores for the following two documents should be the same but are not.
> doc1: [----- word1 word2 ----- word2 word3 ----- word1 word2 word3 -----]
> doc2: [----- word2 word3 ----- word1 word3 ----- word1 word2 word3 -----]
> The positions of the inner clauses affect the scoring of the final 3-term phrase. This appears to be a side effect of the span-scoring rewrite in 5.2(?).
> NearSpansUnordered's SpansCell.adjustMax() uses only end-position values to decide maxEndPositionCell, while the SpanPositionQueue uses both start-position and end-position values to sort the SpansCell instances. This means that maxEndPositionCell will be incorrectly set or not set depending on previous positions.
> I can provide example code illustrating the score error.


