Posted to solr-user@lucene.apache.org by Mhd Wrk <mh...@gmail.com> on 2013/12/04 10:46:08 UTC

Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent

I'm using the following query to do a fuzzy search on Solr 4.5.1 and am
getting an empty result.

qt=standard&q=+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2)
+(field1|en_CA|:Goggle~1 field1|en|:Goggle~1) +(+startDate:[* TO
2013-12-04T00:23:00Z] -endDate:[* TO
2013-12-04T00:23:00Z])&start=0&rows=10&fl=id

If I change it to a non-fuzzy query by simply dropping the tildes from the
terms (see below), it returns the expected result! Is this a bug?
Shouldn't the fuzzy version of a query always return a superset of its
non-fuzzy equivalent?

qt=standard&q=+(field1|en_CA|:Swimming field1|en|:Swimming)
+(field1|en_CA|:Goggle field1|en|:Goggle) +(+startDate:[* TO
2013-12-04T00:23:00Z] -endDate:[* TO
2013-12-04T00:23:00Z])&start=0&rows=10&fl=id

Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent

Posted by Mhd Wrk <mh...@gmail.com>.
I'm using the Snowball stemmer and, you are correct, "swimming" has been
stored as "swim".

Should I wrap the Snowball filter in a multiterm analyzer?

Thanks
 On Dec 4, 2013 2:02 PM, "Jack Krupansky" <ja...@basetechnology.com> wrote:

> Ah... although the lower case filtering does get applied properly in a
> "multiterm" analysis scenario, stemming does not. What stemmer are you
> using? I suspect that "swimming" normally becomes "swim". Compare the debug
> output of the two queries.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Mhd Wrk
> Sent: Wednesday, December 04, 2013 2:08 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Shouldn't fuzzy version of a solr query always return a super
> set of its not-fuzzy equivalent
>
> Debug shows that all terms are lowercased properly.
>
> Thanks
> On Dec 4, 2013 3:18 AM, "Erik Hatcher" <er...@gmail.com> wrote:
>
>> Chances are you're not getting those fuzzy terms analyzed as you'd like.
>>  See debug (&debug=true) output to be sure.  Most likely the fuzzy terms
>> are not being lowercased.  See
>> http://wiki.apache.org/solr/MultitermQueryAnalysis for more details (this
>> applies to fuzzy, not just wildcard) terms too.
>>
>>         Erik
>>
>>
>> On Dec 4, 2013, at 4:46 AM, Mhd Wrk <mh...@gmail.com> wrote:
>>
>> > I'm using the following query to do a fuzzy search on Solr 4.5.1 and am
>> > getting empty result.
>> >
>> > qt=standard&q=+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2)
>> > +(field1|en_CA|:Goggle~1 field1|en|:Goggle~1) +(+startDate:[* TO
>> > 2013-12-04T00:23:00Z] -endDate:[* TO
>> > 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id
>> >
>> > If I change it to a not fuzzy query by simply dropping tildes from the
>> > terms (see below) then it returns the expected result! Is this a bug?
>> > Shouldn't fuzzy version of a query always return a super set of its
>> > not-fuzzy equivalent?
>> >
>> > qt=standard&q=+(field1|en_CA|:Swimming field1|en|:Swimming)
>> > +(field1|en_CA|:Goggle field1|en|:Goggle) +(+startDate:[* TO
>> > 2013-12-04T00:23:00Z] -endDate:[* TO
>> > 2013-12-04T00:23:00Z])&start=0&rows=10&fl=id
>>
>>
>>
>

Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent

Posted by Jack Krupansky <ja...@basetechnology.com>.
Ah... although the lowercase filtering does get applied properly in a
"multiterm" analysis scenario, stemming does not. What stemmer are you
using? I suspect that "swimming" normally becomes "swim". Compare the debug
output of the two queries.

-- Jack Krupansky
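
To make the point above concrete: if the fuzzy term is only lowercased
("swimming") while the index holds the stemmed form ("swim"), the two are
four edits apart, which is more than the ~2 in the query allows. The sketch
below is only an illustration under stated assumptions: it uses plain
Levenshtein distance as a simplification of the matching Solr does, and
indexed_terms, toy_stem, plain_match and fuzzy_match are hypothetical
stand-ins, not Solr or Lucene APIs.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# Hypothetical one-term index: the stemmer stored "swimming" as "swim".
indexed_terms = {"swim"}


def toy_stem(term: str) -> str:
    # Stand-in for the real stemmer; it only knows this one example.
    return "swim" if term == "swimming" else term


def plain_match(word: str) -> bool:
    # Non-fuzzy path: the term gets the full analysis (lowercase + stem).
    return toy_stem(word.lower()) in indexed_terms


def fuzzy_match(word: str, max_edits: int) -> bool:
    # Fuzzy path as described above: lowercased but NOT stemmed, then
    # compared to the indexed terms by edit distance.
    term = word.lower()
    return any(levenshtein(term, t) <= max_edits for t in indexed_terms)


print(levenshtein("swimming", "swim"))  # 4
print(plain_match("Swimming"))          # True
print(fuzzy_match("Swimming", 2))       # False
```

So in this toy setup the fuzzy query is not a superset of the plain one,
because the two queries are not comparing the same term against the index.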

Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent

Posted by Mhd Wrk <mh...@gmail.com>.
Debug shows that all terms are lowercased properly.

Thanks

Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent

Posted by Erik Hatcher <er...@gmail.com>.
Chances are you're not getting those fuzzy terms analyzed as you'd like. See the debug (&debug=true) output to be sure. Most likely the fuzzy terms are not being lowercased. See http://wiki.apache.org/solr/MultitermQueryAnalysis for more details (this applies to fuzzy terms too, not just wildcard terms).

	Erik
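
One way to follow the debug suggestion is to build both requests with
debug=true and compare the parsed query that Solr reports for each. The
sketch below only constructs the two URLs and sends nothing. The endpoint
in SOLR_SELECT (host, port, core name) and the helper build_debug_url are
assumptions for illustration; the query text is taken from the original
post. Note that urlencode escapes the leading "+" operators as %2B so they
are not read as spaces.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port and core name to your setup.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"

# Date restriction copied from the original query.
DATE_FILTER = ("+(+startDate:[* TO 2013-12-04T00:23:00Z] "
               "-endDate:[* TO 2013-12-04T00:23:00Z])")


def build_debug_url(fuzzy: bool) -> str:
    """Return the select URL for the fuzzy or non-fuzzy query, with debug on."""
    swim = "Swimming~2" if fuzzy else "Swimming"
    goggle = "Goggle~1" if fuzzy else "Goggle"
    q = (f"+(field1|en_CA|:{swim} field1|en|:{swim}) "
         f"+(field1|en_CA|:{goggle} field1|en|:{goggle}) "
         + DATE_FILTER)
    params = {
        "qt": "standard",
        "q": q,
        "start": 0,
        "rows": 10,
        "fl": "id",
        "debug": "true",  # asks Solr to include debug output in the response
    }
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_debug_url(fuzzy=True))
    print(build_debug_url(fuzzy=False))
```

Fetching both URLs and comparing the parsed query in the debug section
should show whether the fuzzy terms went through the same analysis as the
plain ones.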

