Posted to solr-user@lucene.apache.org by Mhd Wrk <mh...@gmail.com> on 2013/12/04 10:46:08 UTC
Shouldn't fuzzy version of a solr query always return a super set of
its not-fuzzy equivalent
I'm using the following query to do a fuzzy search on Solr 4.5.1 and am
getting an empty result.
qt=standard&q=+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2)
+(field1|en_CA|:Goggle~1 field1|en|:Goggle~1) +(+startDate:[* TO
2013-12-04T00:23:00Z] -endDate:[* TO
2013-12-04T00:23:00Z])&start=0&rows=10&fl=id
If I change it to a non-fuzzy query by simply dropping the tildes from the
terms (see below), it returns the expected result! Is this a bug?
Shouldn't the fuzzy version of a query always return a superset of its
non-fuzzy equivalent?
qt=standard&q=+(field1|en_CA|:Swimming field1|en|:Swimming)
+(field1|en_CA|:Goggle field1|en|:Goggle) +(+startDate:[* TO
2013-12-04T00:23:00Z] -endDate:[* TO
2013-12-04T00:23:00Z])&start=0&rows=10&fl=id
Re: Shouldn't fuzzy version of a solr query always return a super set
of its not-fuzzy equivalent
Posted by Mhd Wrk <mh...@gmail.com>.
I'm using the Snowball stemmer and, you are correct, "swimming" has been
stored as "swim".
Should I wrap the Snowball filter in a multiterm analyzer?
Thanks
Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent
Posted by Jack Krupansky <ja...@basetechnology.com>.
Ah... although the lower case filtering does get applied properly in a
"multiterm" analysis scenario, stemming does not. What stemmer are you
using? I suspect that "swimming" normally becomes "swim". Compare the debug
output of the two queries.
-- Jack Krupansky
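
To illustrate the point above: if the index holds the stemmed term "swim" while the fuzzy query term stays as the unstemmed "swimming", the two are four edits apart, which is more than the ~2 allowed in the query. The following is only a sketch of that arithmetic, not Lucene's implementation. It uses plain Levenshtein distance (Lucene's fuzzy matching also counts transpositions by default, which makes no difference for this pair), and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute (or keep)
                )
            )
        previous = current
    return previous[-1]


def fuzzy_matches(query_term: str, indexed_term: str, max_edits: int) -> bool:
    """True if indexed_term is within max_edits of query_term."""
    return edit_distance(query_term, indexed_term) <= max_edits


# Fuzzy query: the term is lowercased but NOT stemmed, so "swimming"
# is compared against the indexed (stemmed) term "swim".
print(edit_distance("swimming", "swim"))     # 4
print(fuzzy_matches("swimming", "swim", 2))  # False

# Non-fuzzy query: the term goes through the full analyzer, so the
# stemmed "swim" is looked up and matches the indexed term exactly.
print(fuzzy_matches("swim", "swim", 0))      # True
```

So the fuzzy query is not a relaxed form of the same lookup: it starts from a different term than the non-fuzzy query does, which is why it can return fewer results.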
Re: Shouldn't fuzzy version of a solr query always return a super set
of its not-fuzzy equivalent
Posted by Mhd Wrk <mh...@gmail.com>.
Debug shows that all terms are lowercased properly.
Thanks
Re: Shouldn't fuzzy version of a solr query always return a super set of its not-fuzzy equivalent
Posted by Erik Hatcher <er...@gmail.com>.
Chances are you're not getting those fuzzy terms analyzed as you'd like. See the debug (&debug=true) output to be sure. Most likely the fuzzy terms are not being lowercased. See http://wiki.apache.org/solr/MultitermQueryAnalysis for more details (this applies to fuzzy terms too, not just wildcard terms).
Erik
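
One way to compare the two queries, as a rough sketch: build both request URLs with debug=true added and look at how each query was parsed in the debug section of the responses. The host, port, and core name below are assumptions (a default local install), and build_debug_url is a hypothetical helper; the query strings are the ones from the original post, shortened to the first clause.

```python
from urllib.parse import urlencode

# Assumed endpoint: adjust host, port, and core name for your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_debug_url(query: str, base_url: str = SOLR_SELECT_URL) -> str:
    """Return a select URL for `query` with debug output switched on."""
    params = {
        "q": query,
        "fl": "id",
        "start": 0,
        "rows": 10,
        "debug": "true",  # asks Solr to include debug info in the response
    }
    # urlencode escapes '+', '|', ':' and '~' so the query survives transport.
    return base_url + "?" + urlencode(params)


fuzzy_query = "+(field1|en_CA|:Swimming~2 field1|en|:Swimming~2)"
plain_query = "+(field1|en_CA|:Swimming field1|en|:Swimming)"

print(build_debug_url(fuzzy_query))
print(build_debug_url(plain_query))
```

Fetching both URLs and comparing the parsed form of each query should show whether the fuzzy terms went through the same analysis (lowercasing, stemming) as the plain terms.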