Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2018/03/08 16:23:00 UTC

[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields

    [ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391478#comment-16391478 ] 

Adrien Grand commented on LUCENE-8196:
--------------------------------------

Thanks Alan. I agree that growing a separate hierarchy of objects might help land this feature. We might even want to put first iterations of this work in sandbox to give time for the API to stabilize before we move it to core or misc.

I have some questions/comments:
 - Do we need {{IntervalIterator.score()}}? It seems to be the same value on all implementations.
 - Do we need {{advanceTo}}? It seems to me that things would be simpler and just as efficient if you documented that nextPosition() may only be called when the approximation is positioned; {{advanceTo}} would then be equivalent to checking the return value of {{nextInterval}}.
 - Let's make the {{IntervalFunction}} API an implementation detail?
 - The documentation of {{cost()}} says it is the cost of finding the next interval but given how you use it in the query it looks like it is actually more about the average cost of iterating over _all_ intervals.
 - In terms of testing I would like some form of AssertingIntervalsSource to make sure that intervals are always consumed in legal ways and behave correctly.
 - More docs would help read the code. For instance IntervalsSource.intervals has no docs. By the way we might want to mention there that the same instance might be reused across calls.
 - TermIntervalsSource should check whether positions were indexed.
 - I was a bit annoyed to see the field masking hack, but those interval sources do not actually need term statistics, which makes the hack less horrible. Could you still document it, to make sure users are aware it is a hack, and explain in which circumstances it might be ok?
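
To illustrate the AssertingIntervalsSource point, here is a minimal sketch of the kind of wrapper that could enforce the iteration contract. Everything in it is an assumption made for illustration: {{SimpleIntervalIterator}}, {{ArrayIntervalIterator}} and {{AssertingIntervalIterator}} are hypothetical names, the interface is a stripped-down stand-in rather than the API in the patch, and the ordering rule (strictly increasing interval starts within a document) is an assumed contract.

```java
// Hypothetical, simplified stand-in for an interval iterator; NOT the patch's API.
interface SimpleIntervalIterator {
    int NO_MORE_INTERVALS = Integer.MAX_VALUE;

    /** Positions the iterator on a document; must be called before nextInterval(). */
    void reset(int doc);

    /** Moves to the next interval and returns its start, or NO_MORE_INTERVALS. */
    int nextInterval();

    int start();

    int end();
}

// Trivial in-memory implementation that replays the same intervals for every document.
final class ArrayIntervalIterator implements SimpleIntervalIterator {
    private final int[][] intervals; // each entry is {start, end}
    private int idx = -1;

    ArrayIntervalIterator(int[][] intervals) {
        this.intervals = intervals;
    }

    public void reset(int doc) {
        idx = -1;
    }

    public int nextInterval() {
        if (idx + 1 >= intervals.length) {
            idx = intervals.length;
            return NO_MORE_INTERVALS;
        }
        idx++;
        return intervals[idx][0];
    }

    public int start() {
        return intervals[idx][0];
    }

    public int end() {
        return intervals[idx][1];
    }
}

// Wrapper that rejects illegal call sequences and checks basic invariants.
final class AssertingIntervalIterator implements SimpleIntervalIterator {
    private enum State { UNPOSITIONED, POSITIONED, ITERATING, EXHAUSTED }

    private final SimpleIntervalIterator in;
    private State state = State.UNPOSITIONED;
    private int lastDoc = -1;
    private int lastStart = -1;

    AssertingIntervalIterator(SimpleIntervalIterator in) {
        this.in = in;
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new IllegalStateException(message);
        }
    }

    public void reset(int doc) {
        check(doc > lastDoc, "documents must be visited in increasing order");
        in.reset(doc);
        lastDoc = doc;
        lastStart = -1;
        state = State.POSITIONED;
    }

    public int nextInterval() {
        check(state == State.POSITIONED || state == State.ITERATING,
            "nextInterval() called while unpositioned or exhausted");
        int start = in.nextInterval();
        if (start == NO_MORE_INTERVALS) {
            state = State.EXHAUSTED;
            return start;
        }
        check(start == in.start(), "nextInterval() must return start()");
        check(start <= in.end(), "interval start must not exceed its end");
        // Assumed contract: starts strictly increase within a document.
        check(start > lastStart, "intervals must be returned in increasing start order");
        lastStart = start;
        state = State.ITERATING;
        return start;
    }

    public int start() {
        check(state == State.ITERATING, "start() called without a current interval");
        return in.start();
    }

    public int end() {
        check(state == State.ITERATING, "end() called without a current interval");
        return in.end();
    }
}
```

A test would wrap the iterator under test in the asserting one and consume it normally. A misuse such as calling {{nextInterval()}} before positioning then fails immediately instead of silently returning garbage.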

> Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
> ---------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8196
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8196
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8196.patch
>
>
> This ticket proposes an alternative implementation of the SpanQuery family that uses minimum-interval semantics from [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf] to implement positional queries across term-based fields.  Rather than using TermQueries to construct the interval operators, as in LUCENE-2878 or the current Spans implementation, we instead use a new IntervalsSource object, which will produce IntervalIterators over a particular segment and field.  These are constructed using various static helper methods, and can then be passed to a new IntervalQuery which will return documents that contain one or more intervals so defined.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
