Posted to dev@lucene.apache.org by "Paul Elschot (JIRA)" <ji...@apache.org> on 2017/03/02 20:55:45 UTC

[jira] [Comment Edited] (LUCENE-7398) Nested Span Queries are buggy

    [ https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892972#comment-15892972 ] 

Paul Elschot edited comment on LUCENE-7398 at 3/2/17 8:54 PM:
--------------------------------------------------------------

One way to view the problem is that when span end positions are used to determine the slop, it becomes impossible to determine an order for moving the subspans to a next position.

So one direction out of this could be: use a NearSpans that determines the slop only from the start positions of the subspans. That leaves only the cases in which the subspans can start (and maybe also end) at the same position.
To make sure that all the subspans move forward after a match, we could move them all forward until after the current match and, while doing that, also count/collect them for scoring/highlighting as long as they are within the match. That should solve the bug reported here, which is about a matching occurrence being missed for scoring.

The limitation is that the required slop would then be determined only by the starting positions of the subspans. Could this work?
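
To make the idea concrete, here is a rough, self-contained Java sketch. It does not use the Lucene API: `StartSlopSketch`, `Span`, `findMatches` and `demo` are hypothetical names, and the slop measure (last start minus first start, minus the number of subspans less one) is only one assumed way to define slop from start positions. Positions are given relative to "coordinate", and each subspan's occurrences are assumed to be sorted by start position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StartSlopSketch {

    /** One occurrence of a subspan: start inclusive, end exclusive. */
    static final class Span {
        final int start, end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /**
     * Ordered near matching where the slop depends on start positions only.
     * Assumed slop measure: (last start - first start) - (number of subspans - 1).
     * Each result is {matchStart, matchEnd, collectedOccurrences}.
     * Assumes at least one subspan, each sorted by start position.
     */
    static List<int[]> findMatches(List<List<Span>> subSpans, int maxSlop) {
        int n = subSpans.size();
        int[] cur = new int[n]; // one forward-only cursor per subspan
        List<int[]> matches = new ArrayList<>();
        while (true) {
            // Put the subspans in order: each must start after the previous one starts.
            int prevStart = -1;
            for (int i = 0; i < n; i++) {
                List<Span> s = subSpans.get(i);
                while (cur[i] < s.size() && s.get(cur[i]).start <= prevStart) cur[i]++;
                if (cur[i] == s.size()) return matches; // a subspan is exhausted
                prevStart = s.get(cur[i]).start;
            }
            int matchStart = subSpans.get(0).get(cur[0]).start;
            if (prevStart - matchStart - (n - 1) > maxSlop) {
                cur[0]++; // starts too far apart: advance the first subspan and retry
                continue;
            }
            int matchEnd = 0;
            for (int i = 0; i < n; i++) {
                matchEnd = Math.max(matchEnd, subSpans.get(i).get(cur[i]).end);
            }
            // Move every subspan past the match, counting occurrences that lie within it.
            int collected = 0;
            for (int i = 0; i < n; i++) {
                List<Span> s = subSpans.get(i);
                while (cur[i] < s.size() && s.get(cur[i]).start < matchEnd) {
                    if (s.get(cur[i]).end <= matchEnd) collected++;
                    cur[i]++;
                }
            }
            matches.add(new int[] {matchStart, matchEnd, collected});
        }
    }

    /** The example from the issue, with positions relative to "coordinate". */
    static List<int[]> demo(int maxSlop) {
        List<Span> coordinate = Arrays.asList(new Span(0, 1));
        // spanOr of "gene" [1,2) and "gene mapping" [1,3): both start at position 1
        List<Span> geneOrGeneMapping = Arrays.asList(new Span(1, 2), new Span(1, 3));
        List<Span> research = Arrays.asList(new Span(3, 4));
        return findMatches(Arrays.asList(coordinate, geneOrGeneMapping, research), maxSlop);
    }

    public static void main(String[] args) {
        for (int[] m : demo(1)) System.out.println(Arrays.toString(m));
        System.out.println("matches with slop 0: " + demo(0).size());
    }
}
```

Under this assumed measure, "coordinate gene mapping research" needs a slop of 1 rather than 0, because "research" starts two positions after "gene". With slop 1 the sketch reports one match covering positions 0 to 4 and collects all four occurrences, including both alternatives of the spanOr that start at the same position.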




> Nested Span Queries are buggy
> -----------------------------
>
>                 Key: LUCENE-7398
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7398
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 5.5, 6.x
>            Reporter: Christoph Goller
>            Assignee: Alan Woodward
>            Priority: Critical
>         Attachments: LUCENE-7398-20160814.patch, LUCENE-7398-20160924.patch, LUCENE-7398-20160925.patch, LUCENE-7398.patch, LUCENE-7398.patch, LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as "coordinate gene research". It does not match "coordinate gene mapping research" with Lucene 5.5 or 6.1; it did, however, match with Lucene 4.10.4. It probably stopped working with the changes to SpanQueries in 5.3. I will attach a unit test that shows the problem.


