Posted to solr-user@lucene.apache.org by Modassar Ather <mo...@gmail.com> on 2016/04/18 09:18:53 UTC

Wildcard query behavior.

Hi,

Please help me understand the following.

I have an analysis chain that uses KStemFilterFactory for a field. The Solr
version is 5.4.0.

When I search for f:validator I get 80K+ documents whereas if I search for
f:validator* I get only around 150 results.

When I checked on the analysis page, I saw that validator is changed to
validate. As I understand it, f:validator* should return at least the same
80K+ documents that f:validator does.

I understand in some cases wildcards can result in sub-optimal results for
stemmed content. Please correct me if I am wrong.

Thanks,
Modassar

Re: Wildcard query behavior.

Posted by Modassar Ather <mo...@gmail.com>.
Thanks Reth for your response.

When validator is changed to validate, both at query time and index time,
should validator* not return at least the same results as validator?

E.g. 5 documents contain validator. At index time validator got changed to
validate.
Now when validator* is searched, I would expect it to also change to validate
and match all 5 documents. I am not sure how the wildcard is handled
internally, that is, what the query is transformed to.

Please help me understand the internals of wildcard with stemming or point
me to some documents as I could not find any details on it.

Best,
Modassar


Re: Wildcard query behavior.

Posted by Reth RM <re...@gmail.com>.
If you search for f:validat*, I believe you will get the same number of
results. Please check.

f:validator* searches for records containing terms with the prefix
"validator", whereas f:validator on a field whose stemmer stems "validator"
to "validate" (if this stemming was applied at index time as well as query
time) looks for records that have "validate" or "validator", so numFound can
be expected to differ.
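
To make the difference concrete, here is a toy Python model of a stemmed field. It is only a sketch, not Solr or Lucene code: the STEMS table is a hypothetical stand-in for whatever KStemFilterFactory actually produces, and the documents are invented. The term query passes through the analyzer, while the prefix is compared to the indexed terms as typed.

```python
# Toy model of a stemmed field; this is not Solr/Lucene code.
# STEMS is a hypothetical stand-in for the stemmer's real output.
STEMS = {"validator": "validate", "validators": "validate", "validated": "validate"}


def analyze(token):
    """Lowercase and stem a token, as the index-time chain is assumed to do."""
    token = token.lower()
    return STEMS.get(token, token)


def build_index(docs):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(analyze(token), set()).add(doc_id)
    return index


def term_query(index, word):
    """Plain term query: the query text goes through the analyzer."""
    return index.get(analyze(word), set())


def prefix_query(index, prefix):
    """Wildcard query: the prefix is matched against indexed terms unstemmed."""
    hits = set()
    for term, doc_ids in index.items():
        if term.startswith(prefix):
            hits |= doc_ids
    return hits


# Invented sample documents.
docs = {
    1: "the validator failed",
    2: "validators are strict",
    3: "validated input",
    4: "validatorfactory bean",
}
index = build_index(docs)

print(sorted(term_query(index, "validator")))    # documents indexed under "validate"
print(sorted(prefix_query(index, "validator")))  # only terms literally starting with "validator"
print(sorted(prefix_query(index, "validat")))    # shorter prefix also reaches "validate"
```

In this model, the small result set for validator* comes only from terms the stemmer left starting with "validator", which is one plausible reading of the ~150 hits reported above.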




Re: Wildcard query behavior.

Posted by Modassar Ather <mo...@gmail.com>.
Yes! Wildcards are not analyzed. Thanks Shawn for reminding me.
Thanks Erick for your response.

Best,
Modassar


Re: Wildcard query behavior.

Posted by Erick Erickson <er...@gmail.com>.
Here's a blog on the subject:
https://lucidworks.com/blog/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/

bq: When validator is changed to validate, both at query time and index time,
then should not validator*/validator return the same results at-least?

This is one of those problems that's easy to state, but hard to solve. And
there are so many variations that any attempt to solve it will _always_
have lots of surprises. Simple example (and remember that
stemming is usually algorithmic): "validator" probably stems to "validat".
However, "validato" (note the 'o') may not stem
the same way at all, so searching for "validato*" wouldn't produce the
expected response.
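
A small Python sketch of that surprise, assuming a hypothetical design in which the wildcard prefix is stemmed before matching. The suffix rules below are invented for illustration and are not KStem's rules (they rewrite to "validate", matching what the analysis page showed earlier in this thread).

```python
# Toy suffix-rewriting stemmer; the rules are hypothetical, NOT KStem's.
RULES = [("ators", "ate"), ("ator", "ate"), ("ation", "ate"), ("ated", "ate")]


def toy_stem(word):
    """Apply the first matching suffix rule, or return the word unchanged."""
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word


# Terms as they would sit in the index after stemming the original words.
INDEXED_TERMS = sorted(
    {toy_stem(w) for w in ["validator", "validators", "validation", "validated"]}
)


def stemmed_prefix_matches(prefix):
    """Indexed terms a prefix query would match if the prefix were stemmed first."""
    stemmed = toy_stem(prefix)
    return [term for term in INDEXED_TERMS if term.startswith(stemmed)]


# Typing one more character at a time gives inconsistent matches.
for typed in ["validat", "validato", "validator"]:
    print(typed, "->", toy_stem(typed), stemmed_prefix_matches(typed))
```

With these made-up rules, "validat" and "validator" both reach the indexed term while "validato" matches nothing, so matches vanish and reappear as the user types.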

Best,
Erick


Re: Wildcard query behavior.

Posted by Shawn Heisey <ap...@elyograg.org>.
On 4/18/2016 1:18 AM, Modassar Ather wrote:
> When I search for f:validator I get 80K+ documents whereas if I search for
> f:validator* I get only around 150 results.
>
> When I checked on analysis page I see that validator is changed to
> validate. Per my understanding in both the above cases it should at-least
> give the exact same result of around 80K+ documents.

What Reth was trying to tell you, but did not state clearly, is that
when you use wildcards, your query is NOT analyzed -- none of your
filters, including the stemmer, are used.
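
One way to check this on a live index is to compare numFound for the plain, wildcard, and shortened-wildcard queries. The Python sketch below only builds the request URLs; the host and core name (localhost:8983, mycore) are placeholders, and rows=0 is used so that only the count comes back.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; replace with a real host and core/collection.
BASE_URL = "http://localhost:8983/solr/mycore/select"


def count_url(query):
    """Build a select URL that asks only for the hit count (rows=0)."""
    params = {"q": query, "rows": 0, "wt": "json"}
    return BASE_URL + "?" + urlencode(params)


# The field name "f" is the one used in the question above.
for q in ["f:validator", "f:validator*", "f:validat*"]:
    print(count_url(q))
```

Fetching each URL and reading response.numFound shows whether the shortened prefix recovers the documents that the full-word wildcard misses.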

Thanks,
Shawn