Posted to issues@lucene.apache.org by "Alan Woodward (Jira)" <ji...@apache.org> on 2019/12/18 10:22:00 UTC

[jira] [Created] (LUCENE-9099) Correctly handle repeats in ordered and unordered intervals

Alan Woodward created LUCENE-9099:
-------------------------------------

             Summary: Correctly handle repeats in ordered and unordered intervals
                 Key: LUCENE-9099
                 URL: https://issues.apache.org/jira/browse/LUCENE-9099
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Alan Woodward
            Assignee: Alan Woodward


If you have repeating intervals in an ordered or unordered interval source, you currently get somewhat confusing behaviour:

* ORDERED(a, a, b) will return an extra interval over just `a b` after it first matches `a a b`. This can give incorrect results when used in a CONTAINING filter: CONTAINING(ORDERED(x, y), ORDERED(a, a, b)) will match the document `a x a b y`.
* UNORDERED(a, a) will match documents that contain only a single `a`.
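
To make the two cases above concrete, here is a brute-force toy model over token lists. It is only a sketch: it does not use Lucene's interval iterators, all function names are hypothetical, and it assumes that each repeated clause must consume its own position and that only minimal intervals are reported.

```python
from itertools import product


def _positions(doc, terms):
    # For each query term, list the positions where it occurs in the document.
    return [[i for i, tok in enumerate(doc) if tok == term] for term in terms]


def _minimal(spans):
    # Keep only spans that do not contain another, different span.
    return sorted(
        s for s in spans
        if not any(o != s and s[0] <= o[0] and o[1] <= s[1] for o in spans)
    )


def ordered_intervals(doc, terms):
    # Spans where the terms occur at strictly increasing positions.
    spans = set()
    for combo in product(*_positions(doc, terms)):
        if all(x < y for x, y in zip(combo, combo[1:])):
            spans.add((combo[0], combo[-1]))
    return _minimal(spans)


def unordered_intervals(doc, terms, allow_shared_positions=False):
    # Spans covering one position per term, in any order.
    # allow_shared_positions=True imitates the reported problem, where two
    # identical clauses can both land on the same position.
    spans = set()
    for combo in product(*_positions(doc, terms)):
        if allow_shared_positions or len(set(combo)) == len(combo):
            spans.add((min(combo), max(combo)))
    return _minimal(spans)


def containing(big_spans, small_spans):
    # Keep each "big" span that fully contains at least one "small" span.
    return [
        b for b in big_spans
        if any(b[0] <= s[0] and s[1] <= b[1] for s in small_spans)
    ]


doc = "a x a b y".split()
aab = ordered_intervals(doc, ["a", "a", "b"])   # [(0, 3)]
xy = ordered_intervals(doc, ["x", "y"])         # [(1, 4)]
print(containing(xy, aab))                      # [] : no match expected
print(containing(xy, aab + [(2, 3)]))           # [(1, 4)] : the extra `a b` interval causes a match
print(unordered_intervals(["a"], ["a", "a"]))   # [] : a single `a` should not match
print(unordered_intervals(["a"], ["a", "a"], allow_shared_positions=True))  # [(0, 0)]
```

In this model the only interval for ORDERED(a, a, b) on `a x a b y` is positions 0 to 3, which ORDERED(x, y) at positions 1 to 4 does not contain. Adding the extra `a b` interval at positions 2 to 3 is what lets the CONTAINING filter match.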

It is possible to deal with the unordered case when building sources by rewriting duplicates to nested ORDERED clauses, so that UNORDERED(a, b, c, a, b) becomes UNORDERED(ORDERED(a, a), ORDERED(b, b), c), but this then breaks MAXGAPS filtering.
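
The rewrite described above can be sketched as a simple grouping of duplicate clauses. This is an illustration on plain term names that renders the result as a string; `rewrite_unordered` is a hypothetical helper, not a Lucene API, and it does nothing about the MAXGAPS problem mentioned above.

```python
from collections import Counter


def rewrite_unordered(terms):
    # Count each clause; Counter keeps first-seen order (Python 3.7+).
    counts = Counter(terms)
    clauses = []
    for term, n in counts.items():
        if n == 1:
            # Unique clauses are passed through unchanged.
            clauses.append(term)
        else:
            # Repeated clauses become a nested ORDERED over n copies.
            clauses.append("ORDERED(" + ", ".join([term] * n) + ")")
    return "UNORDERED(" + ", ".join(clauses) + ")"


print(rewrite_unordered(["a", "b", "c", "a", "b"]))
# UNORDERED(ORDERED(a, a), ORDERED(b, b), c)
```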

We should try to fix this within the intervals themselves.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
