Posted to solr-user@lucene.apache.org by Saurabh Sethi <sa...@sendgrid.com> on 2017/07/21 18:34:14 UTC

Wildcard query difference

I have a question about how Solr/Lucene looks up terms from the
postings list for the two queries below:

1. a*
2. a*gh

My understanding is that for the first, it will get all terms starting with
'a' and issue a query on those terms.
For the second, it will again get all terms starting with 'a', then remove
those that do not end with 'gh' and issue a query on the remaining terms.

Please let me know if my understanding is correct. And if not, what am I
missing?

I am trying to do some optimization based on the above assumption, namely
that these two queries will behave differently.

Thanks,
Saurabh

Re: Wildcard query difference

Posted by Erick Erickson <er...@gmail.com>.

The term enumeration is the same in both cases:
every term that starts with "a" is enumerated, and the ones that match
are collected into (conceptually) a huge OR query, which is then
executed. There's been some work lately to avoid the
TooManyBooleanClauses exception, but it's still the case that every
term starting with "a" has to be examined and either added to the
terms to be searched or not.
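
To make the point concrete, here is a minimal conceptual sketch in Python. It is not Lucene's actual implementation: the sorted list standing in for the term dictionary and the helper name expand_wildcard are assumptions made for illustration, and only '*' wildcards after a literal prefix are handled. It shows that both patterns walk the same run of terms starting with "a", and differ only in how many of those terms end up in the OR query.

```python
from bisect import bisect_left
from fnmatch import fnmatchcase


def expand_wildcard(sorted_terms, pattern):
    """Hypothetical helper: return (examined, matched) terms for a pattern.

    The literal text before the first '*' is used as a prefix to seek into
    the sorted term list; every term sharing that prefix is examined.
    """
    prefix = pattern.split("*", 1)[0]
    start = bisect_left(sorted_terms, prefix)  # seek to the first candidate
    examined, matched = [], []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # left the prefix range, nothing further can match
        examined.append(term)
        if fnmatchcase(term, pattern):
            matched.append(term)  # would become one clause of the OR query
    return examined, matched


# Toy term dictionary (illustrative data only).
terms = sorted(["apple", "aargh", "ab", "augh", "banana", "cough"])

for pattern in ("a*", "a*gh"):
    examined, matched = expand_wildcard(terms, pattern)
    print(pattern, "examined:", examined, "matched:", matched)
```

With this toy data, both patterns examine the same four terms starting with "a"; "a*gh" simply keeps fewer of them, so the saving is in the size of the resulting OR query, not in the enumeration.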

Best,
Erick