Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2019/12/18 10:30:16 UTC

[GitHub] [lucene-solr] romseygeek opened a new pull request #1097: LUCENE-9099: Correctly handle repeats in ORDERED and UNORDERED intervals

romseygeek opened a new pull request #1097: LUCENE-9099: Correctly handle repeats in ORDERED and UNORDERED intervals
URL: https://github.com/apache/lucene-solr/pull/1097
 
 
   If you have repeating intervals in an ordered or unordered interval source, you currently get somewhat confusing behaviour:
   
   * `ORDERED(a, a, b)` can return an extra interval covering just `a b` after it has matched `a a b`. This can give incorrect results inside a `CONTAINING` filter: `CONTAINING(ORDERED(x, y), ORDERED(a, a, b))` matches the document `a x a b y`, because the spurious `a b` interval lies within `x a b y`.
   * `UNORDERED(a, a)` matches documents that contain only a single `a`.
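
   The semantics described above can be made concrete with a toy Java model. It is not Lucene code: the class `RepeatSemantics`, its `Interval` record and its methods are all hypothetical, and the model works on a tokenized document rather than on real postings. It makes three assumptions. Each term in `ORDERED` must match at a strictly later position than the previous term. `UNORDERED` needs one distinct occurrence per listed term. Only minimal intervals are reported. Under these assumptions, `ORDERED(a, a, b)` on `a x a b y` yields only the interval over positions 0 to 3, so the `CONTAINING` example does not match. `UNORDERED(a, a)` does not match a document that contains a single `a`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RepeatSemantics {
    // Toy interval over token positions (inclusive); not a Lucene class.
    record Interval(int start, int end) {
        boolean contains(Interval o) { return start <= o.start && o.end <= end; }
    }

    // ORDERED: each term must match at a strictly later position than the
    // previous one, so a repeated term always consumes a distinct occurrence.
    static List<Interval> ordered(String[] doc, String... terms) {
        List<Interval> all = new ArrayList<>();
        for (int s = 0; s < doc.length; s++) {
            if (!doc[s].equals(terms[0])) continue;
            int pos = s;
            boolean ok = true;
            for (int t = 1; t < terms.length && ok; t++) {
                int next = pos + 1;
                while (next < doc.length && !doc[next].equals(terms[t])) next++;
                ok = next < doc.length;
                pos = next;
            }
            if (ok) all.add(new Interval(s, pos));
        }
        return minimal(all);
    }

    // UNORDERED: a window matches if it holds at least as many occurrences of
    // each term as the query lists (so UNORDERED(a, a) needs two a's).
    static List<Interval> unordered(String[] doc, String... terms) {
        Map<String, Integer> need = new HashMap<>();
        for (String t : terms) need.merge(t, 1, Integer::sum);
        List<Interval> all = new ArrayList<>();
        for (int s = 0; s < doc.length; s++) {
            Map<String, Integer> seen = new HashMap<>();
            for (int e = s; e < doc.length; e++) {
                seen.merge(doc[e], 1, Integer::sum);
                boolean covered = need.entrySet().stream()
                        .allMatch(n -> seen.getOrDefault(n.getKey(), 0) >= n.getValue());
                if (covered) { all.add(new Interval(s, e)); break; }
            }
        }
        return minimal(all);
    }

    // Keep only candidate intervals that do not wrap another candidate.
    static List<Interval> minimal(List<Interval> all) {
        List<Interval> out = new ArrayList<>();
        for (Interval iv : all) {
            if (all.stream().noneMatch(o -> !o.equals(iv) && iv.contains(o))) out.add(iv);
        }
        return out;
    }

    // CONTAINING(big, small): intervals of big that wrap some interval of small.
    static List<Interval> containing(List<Interval> big, List<Interval> small) {
        return big.stream().filter(b -> small.stream().anyMatch(b::contains)).toList();
    }

    public static void main(String[] args) {
        String[] doc = "a x a b y".split(" ");
        // A single interval covering positions 0..3; no extra "a b" interval.
        System.out.println(ordered(doc, "a", "a", "b"));
        // Empty: [0, 3] is not inside the x..y interval [1, 4].
        System.out.println(containing(ordered(doc, "x", "y"), ordered(doc, "a", "a", "b")));
        // Empty: a single "a" cannot satisfy UNORDERED(a, a).
        System.out.println(unordered(new String[] {"a"}, "a", "a"));
    }
}
```

   The model needs Java 16 or later because it uses records and `Stream.toList()`. It uses brute-force loops instead of the iterator-based approach a real index would require, so that the position rules stay easy to read.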
   
   This commit adds a `RepeatingIntervalsSource` that correctly handles repeats within ordered and unordered sources.  It also changes the way that gaps are calculated within ordered and unordered sources, by using a new `width()` method on `IntervalIterator`.  The default implementation just returns `end() - start() + 1`, but `RepeatingIntervalsSource` instead returns the sum of the widths of its child iterators.  This preserves `maxgaps` filtering on ordered and unordered sources that contain repeats.
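
   The following plain-Java sketch shows why `width()` matters for `maxgaps`. It does not use Lucene's classes. The `Node`, `Term`, `Span` and `Repeat` types are hypothetical. The gap formula (span length minus the summed widths of the subintervals) is an assumption about how the new calculation works, not a quote from the commit. Take the document `a x a b`, and suppose the two `a`s are grouped into one repeating node that matches positions 0 and 2. If that node reported the default width of its `[0, 2]` span, `ORDERED(a, a, b)` would appear to have no gaps and would pass a `maxgaps` filter of 0. Summing the child widths instead exposes the single gap at `x`.

```java
import java.util.List;

public class WidthSketch {
    // Hypothetical toy interval, loosely modelled on the PR description; not Lucene's API.
    interface Node {
        int start();
        int end();

        // Default width: every position covered by the interval counts.
        default int width() {
            return end() - start() + 1;
        }
    }

    // A single term at one position.
    record Term(int pos) implements Node {
        public int start() { return pos; }
        public int end() { return pos; }
    }

    // A plain span that only knows its endpoints, so it uses the default width.
    record Span(int start, int end) implements Node {}

    // Hypothetical repeating node: it spans all of its children, but its width is
    // the sum of the children's widths, so positions between them count as gaps.
    record Repeat(List<Node> children) implements Node {
        public int start() { return children.get(0).start(); }
        public int end() { return children.get(children.size() - 1).end(); }

        @Override
        public int width() {
            return children.stream().mapToInt(Node::width).sum();
        }
    }

    // Assumed gap formula for an ordered sequence of subintervals:
    // total span length minus the summed widths of the subintervals.
    static int gaps(List<Node> subs) {
        int start = subs.get(0).start();
        int end = subs.get(subs.size() - 1).end();
        int covered = subs.stream().mapToInt(Node::width).sum();
        return (end - start + 1) - covered;
    }

    public static void main(String[] args) {
        // Document "a x a b": a=0, x=1, a=2, b=3.
        Node b = new Term(3);
        Node repeat = new Repeat(List.<Node>of(new Term(0), new Term(2)));
        Node naive = new Span(0, 2);
        System.out.println("summed child widths: gaps = " + gaps(List.<Node>of(repeat, b)));
        System.out.println("default span width:  gaps = " + gaps(List.<Node>of(naive, b)));
    }
}
```

   With summed child widths, the sketch reports one gap. With the default span width, it reports none, which is the case that would incorrectly slip through a `maxgaps` filter of 0.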
   
   To handle matches correctly in this scenario, `IntervalsSource#matches` now always returns an `IntervalsMatchesIterator` rather than a plain `MatchesIterator`. `IntervalsMatchesIterator` adds `gaps()` and `width()` methods, so submatches can be combined in the same way as subiterators. Extra checks in `checkIntervals()` ensure that the iterator and the matches return the same intervals. The commit also fixes `DisjunctionIntervalIterator#matches()`: `DisjunctionIntervalIterator` minimizes its intervals, while `MatchesUtils.disjunction` does not, so the two methods could disagree.
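
   The disjunction discrepancy can be illustrated with another toy Java model. The `Interval` record, `union` and `minimize` are hypothetical and are not Lucene's implementation. The model assumes that "minimizing" means two things: dropping any interval that wraps another interval produced by the disjunction, and collapsing duplicates. Consider the document `a b c` matched against a disjunction of `ORDERED(a, c)` and `b`. The sub-sources produce `[0, 2]` and `[1, 1]`. A minimizing disjunction keeps only `[1, 1]`, while a plain union of the submatches also reports `[0, 2]`. An iterator and a matches implementation built these two ways would therefore disagree.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DisjunctionSketch {
    // Toy interval over token positions (inclusive); not a Lucene class.
    record Interval(int start, int end) {
        boolean contains(Interval o) {
            return start <= o.start && o.end <= end;
        }
    }

    // Plain union of sub-source intervals, sorted by start then end.
    static List<Interval> union(List<List<Interval>> subs) {
        List<Interval> out = new ArrayList<>();
        for (List<Interval> s : subs) out.addAll(s);
        out.sort(Comparator.comparingInt(Interval::start).thenComparingInt(Interval::end));
        return out;
    }

    // Assumed minimization: drop duplicates and any interval that wraps another one.
    static List<Interval> minimize(List<List<Interval>> subs) {
        List<Interval> all = union(subs);
        List<Interval> out = new ArrayList<>();
        for (Interval iv : all) {
            boolean wrapsAnother = all.stream().anyMatch(o -> !o.equals(iv) && iv.contains(o));
            if (!wrapsAnother && !out.contains(iv)) out.add(iv);
        }
        return out;
    }

    public static void main(String[] args) {
        // Document "a b c": ORDERED(a, c) matches [0, 2]; the term b matches [1, 1].
        List<List<Interval>> subs = List.of(
                List.of(new Interval(0, 2)),
                List.of(new Interval(1, 1)));
        System.out.println("union:     " + union(subs));
        System.out.println("minimized: " + minimize(subs));
    }
}
```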

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] romseygeek merged pull request #1097: LUCENE-9099: Correctly handle repeats in ORDERED and UNORDERED intervals

Posted by GitBox <gi...@apache.org>.
romseygeek merged pull request #1097: LUCENE-9099: Correctly handle repeats in ORDERED and UNORDERED intervals
URL: https://github.com/apache/lucene-solr/pull/1097
 
 
   
