Posted to dev@lucene.apache.org by "Adrien Grand (JIRA)" <ji...@apache.org> on 2015/08/06 16:52:05 UTC

[jira] [Comment Edited] (LUCENE-6717) TermAutomatonQuery should be two-phased

    [ https://issues.apache.org/jira/browse/LUCENE-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660104#comment-14660104 ] 

Adrien Grand edited comment on LUCENE-6717 at 8/6/15 2:51 PM:
--------------------------------------------------------------

I like it too, and I'm curious how it compares to PhraseQuery now. :-)

I think we should set cost=required.cost() when there are required terms. Otherwise this query will still report the same cost as a disjunction, even though it can do much better.
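
To illustrate the cost point, here is a rough sketch, not taken from the patch. It assumes the usual convention that a disjunction's cost is the sum of its clauses' costs, while a conjunction's cost is bounded by its cheapest clause. The class CostSketch and its methods are hypothetical names used only for illustration; costs are plain longs standing in for what DocIdSetIterator.cost() would return.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; this is not Lucene code. */
final class CostSketch {

  /** Disjunction-style estimate: any term may match, so the costs add up. */
  static long disjunctionCost(List<Long> termCosts) {
    long sum = 0;
    for (long c : termCosts) {
      sum += c;
    }
    return sum;
  }

  /** Conjunction-style estimate: all required terms must match, so the cheapest one bounds the result. */
  static long conjunctionCost(List<Long> requiredCosts) {
    long min = Long.MAX_VALUE;
    for (long c : requiredCosts) {
      min = Math.min(min, c);
    }
    return min;
  }

  /** Uses the tighter conjunction estimate whenever at least one required term exists. */
  static long cost(List<Long> allTermCosts, List<Long> requiredTermCosts) {
    return requiredTermCosts.isEmpty()
        ? disjunctionCost(allTermCosts)
        : conjunctionCost(requiredTermCosts);
  }

  public static void main(String[] args) {
    List<Long> all = Arrays.asList(100L, 5000L, 20L);
    // Two of the three terms are assumed to be required here.
    System.out.println(cost(all, Arrays.asList(100L, 5000L)));
    // With no required terms, the estimate falls back to the disjunction sum.
    System.out.println(cost(all, Collections.<Long>emptyList()));
  }
}
```

With these made-up numbers the estimate drops from 5120 to 100 once the required terms are taken into account, which is the kind of gap a caller would otherwise not see.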

I'm curious about the assertion at the beginning of the doNext() method: it has been both changed and commented out. Should we just remove it if the invariants are hard to verify?

TermAutomatonQuery.termIsRequired is documented as public for testing, but it looks to me like pkg-private would be enough?


> TermAutomatonQuery should be two-phased
> ---------------------------------------
>
>                 Key: LUCENE-6717
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6717
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-6717.patch
>
>
> {{TermAutomatonQuery}} (still in sandbox) is a simple way to get accurate query-time multi-token synonyms using the new {{SynonymGraphFilter}} from LUCENE-6664.  It already has a utility class to directly translate an incoming {{TokenStream}} into a corresponding query.
> However, the query is likely quite slow because it always iterates positions for all terms in the automaton.
> I think one simple approach is to walk the automaton and find the subset of terms (if any) that are common to all paths, and then approximate with {{ConjunctionDISI}} like {{PhraseQuery}} does.  Such a subset doesn't always exist for an automaton (i.e. it could be empty), so the logic would have to be conditional...
> And I think there are more complex approximations we could make, but using {{ConjunctionDISI}} seems like a simple start.
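
The "terms common to all paths" idea from the description can be sketched roughly as follows. This is not the code from the attached patch: RequiredTerms, Transition and acceptsWithout are hypothetical names, and the automaton is modeled as a plain list of labeled edges rather than Lucene's Automaton class. The sketch treats a term as required if removing every transition labeled with it leaves no path from the start state to an accept state.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper, not part of Lucene: finds terms that occur on every accepting path. */
final class RequiredTerms {

  /** One labeled edge of a toy term automaton; states are plain ints. */
  static final class Transition {
    final int from;
    final String term;
    final int to;

    Transition(int from, String term, int to) {
      this.from = from;
      this.term = term;
      this.to = to;
    }
  }

  /** True if an accept state is reachable from start without any edge labeled banned (null bans nothing). */
  static boolean acceptsWithout(List<Transition> edges, int start, Set<Integer> accepts, String banned) {
    Set<Integer> seen = new HashSet<>();
    Deque<Integer> queue = new ArrayDeque<>();
    seen.add(start);
    queue.add(start);
    while (!queue.isEmpty()) {
      int state = queue.poll();
      if (accepts.contains(state)) {
        return true;
      }
      for (Transition t : edges) {
        if (t.from == state && !t.term.equals(banned) && seen.add(t.to)) {
          queue.add(t.to);
        }
      }
    }
    return false;
  }

  /** Returns the terms common to all accepting paths; empty if there are none or nothing is accepted. */
  static Set<String> find(List<Transition> edges, int start, Set<Integer> accepts) {
    Set<String> required = new TreeSet<>();
    if (!acceptsWithout(edges, start, accepts, null)) {
      return required; // the automaton accepts nothing, so no term is meaningfully required
    }
    Set<String> terms = new TreeSet<>();
    for (Transition t : edges) {
      terms.add(t.term);
    }
    for (String term : terms) {
      if (!acceptsWithout(edges, start, accepts, term)) {
        required.add(term);
      }
    }
    return required;
  }

  public static void main(String[] args) {
    // Made-up synonym graph: "wifi network" or "wi fi network".
    List<Transition> edges = Arrays.asList(
        new Transition(0, "wifi", 2),
        new Transition(0, "wi", 1),
        new Transition(1, "fi", 2),
        new Transition(2, "network", 3));
    System.out.println(find(edges, 0, Collections.singleton(3))); // prints [network]
  }
}
```

This runs one reachability search per distinct term, which is simple rather than fast. When the returned set is empty, as for two single-term alternatives between the same pair of states, there is nothing to build a conjunction from, which is the conditional case the description mentions.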


