Posted to dev@lucene.apache.org by "Alan Woodward (JIRA)" <ji...@apache.org> on 2019/03/15 17:29:00 UTC

[jira] [Commented] (LUCENE-8477) Improve handling of inner disjunctions in intervals

    [ https://issues.apache.org/jira/browse/LUCENE-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793794#comment-16793794 ] 

Alan Woodward commented on LUCENE-8477:
---------------------------------------

Here is a proposal to fix this, using the new QueryVisitor API to work out whether a disjunction has any sub-clauses with common first terms.  Given an interval {{BLOCK(a,or(BLOCK(b,c),b),d)}}, we can ensure that all matches are collected by rewriting it so that the final clause {{d}} is moved inside the disjunction, yielding {{BLOCK(a,or(BLOCK(b,c,d),BLOCK(b,d)))}}.  Checking for common prefixes means that intervals of the form {{BLOCK(a,or(BLOCK(b,c),d),e)}} don't need to be rewritten, which is more efficient when the query is run because we only need to iterate positions for the final term once.
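
The rewrite described above can be sketched with a toy model. This is not the attached patch and does not use QueryVisitor or any Lucene API; the tuple representation and the helper names (first_terms, has_common_first_term, rewrite_block, show) are assumptions made purely for illustration. It only handles the first qualifying disjunction in a BLOCK and does not recurse.

```python
# Toy model (hypothetical, not Lucene code): a term is a str,
# compound sources are ("BLOCK", [subs]) or ("OR", [subs]).

def block(*subs):
    return ("BLOCK", list(subs))

def or_(*subs):
    return ("OR", list(subs))

def first_terms(src):
    """Terms that can start a match of src."""
    if isinstance(src, str):
        return {src}
    kind, subs = src
    if kind == "BLOCK":
        return first_terms(subs[0])
    out = set()
    for sub in subs:  # OR: any clause may start the match
        out |= first_terms(sub)
    return out

def has_common_first_term(disjunction):
    """True if two clauses of the OR can start with the same term."""
    seen = set()
    for sub in disjunction[1]:
        starts = first_terms(sub)
        if seen & starts:
            return True
        seen |= starts
    return False

def flatten(subs):
    """Inline nested BLOCKs so BLOCK(BLOCK(b,c),d) becomes BLOCK(b,c,d)."""
    out = []
    for sub in subs:
        if not isinstance(sub, str) and sub[0] == "BLOCK":
            out.extend(sub[1])
        else:
            out.append(sub)
    return out

def rewrite_block(src):
    """Move trailing clauses into the first inner OR that needs it."""
    kind, subs = src
    if kind != "BLOCK":
        return src
    for i, sub in enumerate(subs[:-1]):
        if not isinstance(sub, str) and sub[0] == "OR" and has_common_first_term(sub):
            tail = subs[i + 1:]
            pulled = or_(*[("BLOCK", flatten([clause] + tail)) for clause in sub[1]])
            return ("BLOCK", subs[:i] + [pulled])
    return src  # no shared first terms: leave the interval untouched

def show(src):
    if isinstance(src, str):
        return src
    kind, subs = src
    name = "or" if kind == "OR" else "BLOCK"
    return name + "(" + ",".join(show(s) for s in subs) + ")"

print(show(rewrite_block(block("a", or_(block("b", "c"), "b"), "d"))))
print(show(rewrite_block(block("a", or_(block("b", "c"), "d"), "e"))))
```

The first call prints the rewritten form from the comment, with {{d}} pulled inside the disjunction; the second prints its input unchanged because the clauses {{BLOCK(b,c)}} and {{d}} start with different terms.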

> Improve handling of inner disjunctions in intervals
> ---------------------------------------------------
>
>                 Key: LUCENE-8477
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8477
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8477.patch
>
>
> The current implementation of the disjunction interval produced by {{Intervals.or}} is a direct implementation of the OR operator from the Vigna paper.  This produces minimal intervals, meaning that (a) is preferred over (a b), and (b) is likewise preferred over (a b).  This has advantages when it comes to counting intervals for scoring, but also has drawbacks when it comes to matching.  For example, a phrase query for ((a OR (a b)) BLOCK (c)) will not match the document (a b c), because (a) will be preferred over (a b), and (a c) does not match.
> This ticket is to discuss the best way of dealing with disjunctions.
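
The matching problem in the description can be reproduced with a small toy model. This is only a sketch: it materialises intervals as (start, end) position pairs instead of using Lucene's iterators, and term, block, or_ and minimize are hypothetical helpers, not Lucene methods.

```python
# Toy model (hypothetical, not Lucene code): a source is a function
# from a list of tokens to a list of (start, end) intervals.

def term(t):
    return lambda doc: [(i, i) for i, tok in enumerate(doc) if tok == t]

def block(*subs):
    """Sub-sources must match in order at adjacent positions."""
    def match(doc):
        results = subs[0](doc)
        for sub in subs[1:]:
            nxt = sub(doc)
            results = [(s, e2) for (s, e) in results
                       for (s2, e2) in nxt if s2 == e + 1]
        return sorted(set(results))
    return match

def minimize(intervals):
    """Drop every interval that strictly contains another one."""
    def contains(outer, inner):
        return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]
    return [iv for iv in intervals
            if not any(contains(iv, other) for other in intervals)]

def or_(*subs, minimal=True):
    def match(doc):
        found = sorted({iv for sub in subs for iv in sub(doc)})
        return minimize(found) if minimal else found
    return match

doc = ["a", "b", "c"]

def inner(minimal):
    # a OR (a b)
    return or_(term("a"), block(term("a"), term("b")), minimal=minimal)

# Minimal OR keeps only (a), so nothing is adjacent to (c).
print(block(inner(True), term("c"))(doc))   # []
# Keeping (a b) as well lets the block match the whole document.
print(block(inner(False), term("c"))(doc))  # [(0, 2)]
```

With minimisation the disjunction yields only the interval for (a), which is not adjacent to (c), so the block finds nothing; without it, the longer (a b) interval survives and the block matches.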


