Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2015/02/11 15:46:13 UTC

[jira] [Reopened] (LUCENE-6198) two phase intersection

     [ https://issues.apache.org/jira/browse/LUCENE-6198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand reopened LUCENE-6198:
----------------------------------
    Lucene Fields:   (was: New)

I'll try to summarize API challenges that have been mentioned or that I can think of:

 - should match confirmation be built into DocIdSetIterator (i.e. adding a matches() method and requiring callers to always verify matches)? While it would work, one issue I have is that it would also make simple cases such as TermScorer more complicated. So I like having an optional method or marker interface better.

 - ideally this would not be intrusive and just an incremental improvement over what we currently have today

 - this thing cannot be a marker interface, otherwise wrappers like ConstantScoreQuery could not work properly

 - we need to somehow reuse the DocIdSetIterator abstraction for code reuse (approximations cannot be a totally different object)

 - one concern was that it should work well for both queries and filters, but since we are slowly merging the two, it would probably be ok to make it work for queries only (which potentially means that we could expose methods only on Scorer instead of DISI, at least as a start).

 - should we extend DocIdSetIterator and add a 'matches' method, or have another class that exposes a DocIdSetIterator 'approximation' and a 'matches' method? While the patch on LUCENE-6198 uses option 1, I like the fact that with option 2 we do not extend DocIdSetIterator and more clearly separate the approximation from the confirmation (like the API proposal on SOLR-7044).

 - in a conjunction DISI, should there be a way to configure the order in which confirmations are performed (somewhat like the cost API, by trying to confirm the cheapest instances first)? I think so, but we can probably defer this problem until later.
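
As a rough illustration of the "confirm the cheapest instances first" idea in the last point, here is a minimal Java sketch. It is not part of any patch: Confirmation, its cost field and CheapestFirst.confirmAll are hypothetical names chosen for illustration, and the cost values are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical wrapper around one clause's confirmation step for the current doc. */
final class Confirmation {
    final String name;
    final long cost;                 // assumed estimate of how expensive confirming is
    final BooleanSupplier matches;   // the (possibly expensive) confirmation itself
    int calls = 0;                   // how often the confirmation actually ran

    Confirmation(String name, long cost, BooleanSupplier matches) {
        this.name = name;
        this.cost = cost;
        this.matches = matches;
    }

    boolean confirm() {
        calls++;
        return matches.getAsBoolean();
    }
}

final class CheapestFirst {
    /** Confirms all clauses for one candidate doc, cheapest first, stopping at the first failure. */
    static boolean confirmAll(List<Confirmation> clauses) {
        List<Confirmation> ordered = new ArrayList<>(clauses);
        ordered.sort(Comparator.comparingLong((Confirmation c) -> c.cost));
        for (Confirmation c : ordered) {
            if (!c.confirm()) {
                return false; // more expensive clauses are never consulted
            }
        }
        return true;
    }
}
```

With a cheap clause that rejects the document and an expensive one that would accept it, only the cheap confirmation runs. In a real conjunction the ordering would presumably be computed once per segment rather than once per document.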

Here is a new patch which is very similar to the current one, but with two main differences:
 - the approximation DISI has been replaced with a TwoPhaseDocIdSetIterator class which exposes an iterator called 'approximation' and a 'boolean matches()' method
 - approximation is only exposed on Scorer
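
To make the shape of this API more concrete, here is a self-contained sketch; it is not the patch itself. SketchDocIterator is a simplified stand-in for DocIdSetIterator, and TwoPhaseSketch only mirrors the idea described above (an 'approximation' iterator plus a 'boolean matches()' method), so the real signatures may differ. The doc ids are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified, hypothetical stand-in for DocIdSetIterator. */
abstract class SketchDocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    abstract int docID();
    /** Moves to the first doc >= target beyond the current one, or NO_MORE_DOCS. */
    abstract int advance(int target);
}

/** Iterates over a sorted array of doc ids. */
final class ArrayDocIterator extends SketchDocIterator {
    private final int[] docs;
    private int idx = -1;

    ArrayDocIterator(int... docs) { this.docs = docs; }

    @Override int docID() {
        if (idx < 0) return -1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    @Override int advance(int target) {
        if (idx < docs.length) idx++;
        while (idx < docs.length && docs[idx] < target) idx++;
        return docID();
    }
}

/** Cheap approximation plus an expensive confirmation for the approximation's current doc. */
abstract class TwoPhaseSketch {
    private final SketchDocIterator approximation;
    TwoPhaseSketch(SketchDocIterator approximation) { this.approximation = approximation; }
    SketchDocIterator approximation() { return approximation; }
    abstract boolean matches();
}

final class TwoPhaseDemo {
    static int confirmations; // counts calls to matches()

    /** Intersects on the approximation first; confirms only docs both sides agree on. */
    static List<Integer> intersect(SketchDocIterator cheap, TwoPhaseSketch costly) {
        List<Integer> hits = new ArrayList<>();
        SketchDocIterator approx = costly.approximation();
        int doc = cheap.advance(0);
        while (doc != SketchDocIterator.NO_MORE_DOCS) {
            int other = approx.docID() < doc ? approx.advance(doc) : approx.docID();
            if (other == doc) {
                if (costly.matches()) hits.add(doc);
                doc = cheap.advance(doc + 1);
            } else if (other == SketchDocIterator.NO_MORE_DOCS) {
                break;
            } else {
                doc = cheap.advance(other);
            }
        }
        return hits;
    }

    static List<Integer> run() {
        confirmations = 0;
        Set<Integer> confirmed = new HashSet<>(Arrays.asList(3, 8)); // docs that truly match
        TwoPhaseSketch phrase = new TwoPhaseSketch(new ArrayDocIterator(2, 3, 4, 5, 8, 10)) {
            @Override boolean matches() {
                confirmations++;
                return confirmed.contains(approximation().docID());
            }
        };
        return intersect(new ArrayDocIterator(1, 3, 5, 8, 9), phrase);
    }

    public static void main(String[] args) {
        System.out.println(run() + " after " + confirmations + " confirmations");
    }
}
```

In this toy run the approximation contains six candidates, but matches() is only called for the three docs (3, 5 and 8) that the cheap iterator also contains, and the result is [3, 8]. Keeping the confirmation in a separate object, rather than on the iterator, is what lets a caller that does not care about two-phase behaviour ignore it entirely.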

> two phase intersection
> ----------------------
>
>                 Key: LUCENE-6198
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6198
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-6198.patch
>
>
> Currently some scorers have to do a lot of per-document work to determine if a document is a match. The simplest example is a phrase scorer, but there are others (spans, sloppy phrase, geospatial, etc).
> Imagine a conjunction with two MUST clauses, one that is a term that matches all odd documents, another that is a phrase matching all even documents. Today this conjunction will be very expensive, because the zig-zag intersection is reading a ton of useless positions.
> The same problem happens with FilteredQuery and anything else that acts like a conjunction.


