Posted to dev@lucene.apache.org by Dawid Weiss <da...@gmail.com> on 2020/09/17 13:20:02 UTC

Fuzzy-phrase query with "holes" using intervals?

Hmm... Is there any way to express a query for a phrase-like sequence of tokens:

a b c d

but with potential "holes" (one or more terms missing):

- b c d
a - c d
a b - d
...

I've experimented with ordered(term("a"), term("b"), ...), gaps and
atLeast but I can't get it to work. I could expand the terms into several
queries manually but the number of potential subsets is quite large,
hence the question. Thanks for any tips.
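
To give a sense of how fast the manual expansion mentioned above grows, here is a minimal sketch in plain Java. It does not use Lucene at all: the class name HoleExpansion, the method expand and the minKept parameter are made up for illustration, and each hole is simply printed as "-".

```java
import java.util.ArrayList;
import java.util.List;

public class HoleExpansion {
    /**
     * Returns every variant of the phrase that keeps at least minKept terms.
     * Dropped terms are rendered as "-" (a hole). Hypothetical helper, not a Lucene API.
     */
    static List<String> expand(String[] terms, int minKept) {
        List<String> variants = new ArrayList<>();
        int n = terms.length;
        // Bit i of mask says whether the term at position i is kept.
        for (int mask = 0; mask < (1 << n); mask++) {
            if (Integer.bitCount(mask) < minKept) {
                continue; // too many holes in this variant
            }
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if (i > 0) {
                    sb.append(' ');
                }
                sb.append((mask & (1 << i)) != 0 ? terms[i] : "-");
            }
            variants.add(sb.toString());
        }
        return variants;
    }

    public static void main(String[] args) {
        // Four terms with at most one hole.
        for (String variant : expand(new String[] {"a", "b", "c", "d"}, 3)) {
            System.out.println(variant);
        }
    }
}
```

For four terms and at most one hole this yields 5 variants; allowing two holes already gives 11, and the count keeps growing combinatorially with longer phrases. Each variant would still have to be turned into its own ordered(...) clause and the clauses combined in a disjunction, which is the cost the question is trying to avoid.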

Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: Fuzzy-phrase query with "holes" using intervals?

Posted by Dawid Weiss <da...@gmail.com>.
Thanks Alan. I don't think my foo is strong enough to dive deep into
implementing intervals... yet. :) I'll try to clean up what's active
on my plate and maybe later I'll return to this.

Dawid

On Thu, Sep 17, 2020 at 3:53 PM Alan Woodward <ro...@gmail.com> wrote:
>
> I think you need a sort of ‘ordered atLeast’ here. Currently atLeast() is a mixture of a disjunction and an unordered interval; it should be possible to add something that applies additional constraints to the sets that it finds. I think you’d need to write some code though; I can’t see a way of doing it with the current set of interval operators.



Re: Fuzzy-phrase query with "holes" using intervals?

Posted by Alan Woodward <ro...@gmail.com>.
I think you need a sort of ‘ordered atLeast’ here. Currently atLeast() is a mixture of a disjunction and an unordered interval; it should be possible to add something that applies additional constraints to the sets that it finds. I think you’d need to write some code though; I can’t see a way of doing it with the current set of interval operators.
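
As a rough illustration of what an 'ordered atLeast' constraint could mean, here is a minimal sketch in plain Java over an in-memory token list. It is not Lucene code and not how an interval operator would be implemented: the class OrderedAtLeastSketch, the method matches and the minMatch parameter are hypothetical, and the sketch assumes one particular reading of the question, namely that each phrase term has to sit at its expected offset and a hole is a slot holding some other token (or nothing).

```java
import java.util.Arrays;
import java.util.List;

public class OrderedAtLeastSketch {
    /**
     * Returns true if some alignment of the phrase against the token stream
     * has at least minMatch phrase terms at their expected offsets.
     * Hypothetical helper for illustration only, not a Lucene API.
     */
    static boolean matches(List<String> tokens, List<String> phrase, int minMatch) {
        int n = phrase.size();
        // start may be negative so that a hole at the beginning of the phrase
        // can fall before the first token; slots past the end count as holes too.
        for (int start = -(n - 1); start < tokens.size(); start++) {
            int hits = 0;
            for (int i = 0; i < n; i++) {
                int pos = start + i;
                boolean inBounds = pos >= 0 && pos < tokens.size();
                if (inBounds && tokens.get(pos).equals(phrase.get(i))) {
                    hits++;
                }
            }
            if (hits >= minMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> phrase = Arrays.asList("a", "b", "c", "d");
        // "a - c d" embedded in a longer stream: three terms at the right offsets.
        System.out.println(matches(Arrays.asList("x", "a", "z", "c", "d"), phrase, 3));
        // All four terms present but in reverse order: no alignment has three hits.
        System.out.println(matches(Arrays.asList("d", "c", "b", "a"), phrase, 3));
    }
}
```

The second call is the case where an unordered "at least 3 of these terms" check would accept the text even though the phrase order is lost, which is the extra constraint being asked for.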


