Posted to solr-user@lucene.apache.org by Prasi S <pr...@gmail.com> on 2013/08/28 13:43:25 UTC

Solr 4.0 -> Fuzzy query and Proximity query

Hi,
With Solr 4.0, the fuzzy query syntax is <keyword>~1 (or ~2).
Proximity search syntax is "value"~20.

How does Solr differentiate between the two searches? My understanding is
that proximity applies to phrases and fuzzy to individual words. Is that
correct?
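The distinction follows from where the `~N` is attached in the standard query parser syntax. After a quoted phrase it is a proximity (slop) operator. After a single term it is a fuzzy (edit distance) operator. The Python sketch below uses a hypothetical helper, `classify_tildes`, that is not part of Solr. It classifies each `~N` in a query string by that rule. It handles only integer values and ignores escaped quotes and a bare `~` with no number.

```python
import re

# Hypothetical helper (not a Solr API). Two alternatives:
#   "quoted phrase"~N  -> proximity: phrase slop of N positions
#   bareterm~N         -> fuzzy: up to N edits (Solr 4.0 allows 1 or 2)
TILDE = re.compile(
    r'"(?P<phrase>[^"]+)"~(?P<slop>\d+)'      # phrase followed by ~N
    r'|(?P<term>[^\s"~]+)~(?P<edits>\d+)'     # single term followed by ~N
)


def classify_tildes(query):
    """Return (kind, text, number) for every ~N operator found in query."""
    found = []
    for m in TILDE.finditer(query):
        if m.group("phrase") is not None:
            found.append(("proximity", m.group("phrase"), int(m.group("slop"))))
        else:
            found.append(("fuzzy", m.group("term"), int(m.group("edits"))))
    return found


print(classify_tildes('"trinity service"~50'))
# [('proximity', 'trinity service', 50)]
print(classify_tildes('trinity~1 service~2'))
# [('fuzzy', 'trinity', 1), ('fuzzy', 'service', 2)]
```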

I wanted to do a proximity search on a text field and issued the following
query:
<ip>:<port>/collection1/select?q="trinity%20service"~50&debugQuery=yes
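For reference, the same request can be built programmatically. The minimal Python sketch below uses a placeholder base URL. The `localhost:8983` host and the `/solr` context path stand in for the elided `<ip>:<port>` and are not taken from the thread. `urllib.parse.urlencode` escapes the double quotes and the space, so they do not need to be hand-encoded.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port and context path to your deployment.
BASE = "http://localhost:8983/solr/collection1/select"

params = {
    # Quoted phrase + ~50: both terms must occur within a slop of 50 positions.
    "q": '"trinity service"~50',
    "debugQuery": "true",
}

url = BASE + "?" + urlencode(params)
print(url)
```

Because `q` holds a quoted phrase followed by `~50`, the 50 is read as phrase slop, not as a fuzzy edit distance.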

It gives me these results:

<result name="response" numFound="111" start="0" maxScore="4.1237307">
<doc>
<str name="business_name">*Trinidad* Services</str>
</doc>
<doc>
<str name="business_name">Trinity Services</str>
</doc>
<doc>
<str name="business_name">Trinity Services</str>
</doc>
<doc>
<str name="business_name">*Trinitee* Service</str>
</doc>
...
</result>

How do I differentiate between fuzzy and proximity searches?


Thanks,
Prasi

Re: Solr 4.0 -> Fuzzy query and Proximity query

Posted by Walter Underwood <wu...@wunderwood.org>.
Mixing fuzzy with phonetic can give bizarre matches. I worked on a search engine that did that.

You really don't want to mix stemming, phonetic, and fuzzy. They are distinct transformations of the surface word that do different things.

Stemming: conflate different inflections of the same word, like car and cars.
Phonetic: conflate words that sound similar, like moody and mudie.
Fuzzy: conflate words with different spellings or misspellings, like smith, smyth, and smit.
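Fuzzy matching can be made concrete with edit distance. The minimal Python sketch below computes classic Levenshtein distance (insertions, deletions, substitutions). It approximates rather than reproduces Lucene's fuzzy matching, which can also count a transposition as a single edit. Solr 4.0 caps fuzzy queries at 2 edits. Under this measure, trinity~2 could reach "trinitee" but not "trinidad", which is 3 edits away.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                # delete ca
                cur[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


print(edit_distance("smith", "smyth"))       # 1
print(edit_distance("smith", "smit"))        # 1
print(edit_distance("trinity", "trinitee"))  # 2
print(edit_distance("trinity", "trinidad"))  # 3
```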

If you want all of these, make three fields with separate transformations.

wunder

On Aug 28, 2013, at 5:46 AM, Erick Erickson wrote:

> No. ComplexPhraseQuery has been around for quite a while but has
> never been incorporated into the code base; it's pretty much what you
> need to do both fuzzy and phrase at once.
> 
> But doesn't phonetic really incorporate at least a flavor of fuzzy?
> Is it close enough for your needs to just do phonetic matches?
> 
> Best
> Erick

--
Walter Underwood
wunder@wunderwood.org




Re: Solr 4.0 -> Fuzzy query and Proximity query

Posted by Erick Erickson <er...@gmail.com>.
No. ComplexPhraseQuery has been around for quite a while but has
never been incorporated into the code base; it's pretty much what you
need to do both fuzzy and phrase at once.
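The actual ComplexPhraseQuery code is not reproduced here. As a conceptual illustration only, the minimal Python sketch below shows what "fuzzy and phrase at once" means. Every phrase term may match a document token fuzzily, and the matched tokens must still be close together. The `fuzzy_phrase_match` function and its in-order, gap-counting notion of slop are simplifying assumptions. Lucene's sloppy phrase matching also allows out-of-order matches.

```python
def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_phrase_match(doc_tokens, query_terms, max_edits=1, slop=0):
    """Toy fuzzy phrase query (hypothetical, not Lucene's algorithm):
    each query term must match a document token within max_edits, the
    matches must appear in query order, and the total number of tokens
    skipped between consecutive matches must not exceed slop."""

    def search(term_idx, start_pos, gaps_used):
        if term_idx == len(query_terms):
            return True                       # every term placed
        for pos in range(start_pos, len(doc_tokens)):
            # The first term may start anywhere; later terms pay for skipped tokens.
            gap = 0 if term_idx == 0 else pos - start_pos
            if gaps_used + gap > slop:
                break                         # further positions only skip more
            if edit_distance(query_terms[term_idx], doc_tokens[pos]) <= max_edits:
                if search(term_idx + 1, pos + 1, gaps_used + gap):
                    return True
        return False

    return search(0, 0, 0)


print(fuzzy_phrase_match("trinitee service center".split(), ["trinity", "service"], max_edits=2))  # True
print(fuzzy_phrase_match("trinity home services".split(), ["trinity", "services"], slop=0))        # False
print(fuzzy_phrase_match("trinity home services".split(), ["trinity", "services"], slop=1))        # True
```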

But doesn't phonetic really incorporate at least a flavor of fuzzy?
Is it close enough for your needs to just do phonetic matches?

Best
Erick


On Wed, Aug 28, 2013 at 8:31 AM, Prasi S <pr...@gmail.com> wrote:

> Apart from this, fuzzy would be for individual words and proximity would be
> for phrases. Is this correct?
> Also, can we have fuzzy on phrases?

Re: Solr 4.0 -> Fuzzy query and Proximity query

Posted by Prasi S <pr...@gmail.com>.
Sorry, I copied it wrong. Below is the correct analysis.

Index time

Stage  Term 1        Term 2
ST     trinity       services
SF     trinity       services
LCF    trinity       services
SF     trinity       services
SF     trinity       services
WDF    trinity       services
SF     triniti       servic
PF     TRNT triniti  SRFK servic
HWF    TRNT triniti  SRFK servic
PSF    TRNT triniti  SRFK servic

Query time

Stage  Term 1        Term 2
ST     trinity       services
SF     trinity       services
LCF    trinity       services
WDF    trinity       services
SF     triniti       servic
PSF    triniti       servic
PF     TRNT triniti  SRFK servic
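The PF stage turns triniti into TRNT and servic into SRFK. The field definition is not shown, so the exact encoder is unknown. The short codes look like output from a Metaphone-family algorithm, but that is an assumption. To show how Trinidad and Trinitee can then collide with trinity, here is a deliberately simplified, hypothetical Python encoder. It is NOT Metaphone, Double Metaphone, or anything Solr ships. It only imitates a few behaviors visible above: it drops vowels after the first letter, treats a few consonants (d/t, c/k, v/f) as equivalent, and truncates to four characters.

```python
def toy_phonetic(word, max_len=4):
    """Hypothetical toy code: keep the first letter, drop later vowels
    (and y), map some similar-sounding consonants to one symbol, collapse
    adjacent repeated symbols (checked after vowels are dropped), and
    truncate the result to max_len characters."""
    mapping = {"d": "t", "c": "k", "q": "k", "v": "f", "z": "s"}
    letters = [ch for ch in word.lower() if ch.isalpha()]
    code = []
    for i, ch in enumerate(letters):
        if i > 0 and ch in "aeiouy":
            continue                      # later vowels are ignored
        sym = mapping.get(ch, ch).upper()
        if code and code[-1] == sym:
            continue                      # collapse repeats, e.g. "dad" -> one T
        code.append(sym)
    return "".join(code[:max_len])


for name in ["trinity", "Trinidad", "Trinitee", "services", "Service"]:
    print(name, toy_phonetic(name))
# trinity TRNT
# Trinidad TRNT
# Trinitee TRNT
# services SRFK
# Service SRFK
```

Truncation does much of the work here: "trinidad" differs from "trinity" only after the fourth code character, so the two become indistinguishable.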

Apart from this, fuzzy would be for individual words and proximity would be
for phrases. Is this correct?
Also, can we have fuzzy on phrases?


On Wed, Aug 28, 2013 at 5:58 PM, Prasi S <pr...@gmail.com> wrote:

> Hi Erick,
> Yes, it is correct. These results are because of stemming + phonetic
> matching. Below is the analysis.

Re: Solr 4.0 -> Fuzzy query and Proximity query

Posted by Prasi S <pr...@gmail.com>.
Hi Erick,
Yes, it is correct. These results are because of stemming + phonetic
matching. Below is the analysis.

Index time

Stage  Term 1        Term 2
ST     trinity       services
SF     trinity       services
LCF    trinity       services
SF     trinity       services
SF     trinity       services
WDF    trinity       services

Query time

Stage  Term 1        Term 2
SF     triniti       servic
PF     TRNT triniti  SRFK servic
HWF    TRNT triniti  SRFK servic
PSF    TRNT triniti  SRFK servic
Apart from this, fuzzy would be for individual words and proximity would be
for phrases. Is this correct?
Also, can we have fuzzy on phrases?



On Wed, Aug 28, 2013 at 5:36 PM, Erick Erickson <er...@gmail.com> wrote:

> The first thing I'd recommend is to look at the admin/analysis
> page. I suspect you aren't seeing fuzzy query results
> at all, what you're seeing is the result of stemming.
>
> Stemming is algorithmic, so it sometimes produces very
> surprising results, e.g. Trinidad and Trinitee may stem
> to something like triniti.
>
> But you didn't provide the field definition so it's just a guess.
>
> Best
> Erick

Re: Solr 4.0 -> Fuzzy query and Proximity query

Posted by Erick Erickson <er...@gmail.com>.
The first thing I'd recommend is to look at the admin/analysis
page. I suspect you aren't seeing fuzzy query results
at all; what you're seeing is the result of stemming.
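The analysis page can also be queried directly, which is handy for comparing several inputs. The minimal Python sketch below only builds the request URL. Several parts are assumptions: the host and port are placeholders; the field analysis handler is assumed to be registered at `/analysis/field`, as in the Solr 4 example `solrconfig.xml` (check your own config); and `business_name` is assumed to be the searched field, since the query in the thread names no field.

```python
from urllib.parse import urlencode

# Hypothetical host/port/core and an assumed handler path; verify against solrconfig.xml.
BASE = "http://localhost:8983/solr/collection1/analysis/field"

params = {
    "analysis.fieldname": "business_name",        # assumed field under test
    "analysis.fieldvalue": "Trinidad Services",   # text run through the index-time chain
    "analysis.query": "trinity service",          # text run through the query-time chain
    "analysis.showmatch": "true",                 # mark index tokens matching query tokens
    "wt": "json",
}

url = BASE + "?" + urlencode(params)
print(url)
```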

Stemming is algorithmic, so it sometimes produces very
surprising results, e.g. Trinidad and Trinitee may stem
to something like triniti.
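To make "stemming is algorithmic" concrete, here is a hypothetical Python fragment. It implements only steps 1a, 1c and 5a of Martin Porter's original algorithm and is not the full stemmer or Solr's implementation. The thread does not say which stemmer the field uses. On the words from the results, this fragment reproduces triniti and servic. It leaves trinidad unchanged and turns trinitee into trinite. At least for this fragment, stemming alone would not conflate those names with trinity, which points to the phonetic stage as another candidate.

```python
VOWELS = "aeiou"


def is_consonant(word, i):
    """Porter's rule: a, e, i, o, u are vowels; y is a vowel only when
    it follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Porter's m: number of vowel-to-consonant transitions in stem."""
    m, prev_vowel = 0, False
    for i in range(len(stem)):
        vowel = not is_consonant(stem, i)
        if prev_vowel and not vowel:
            m += 1
        prev_vowel = vowel
    return m


def has_vowel(stem):
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_cvc(stem):
    """*o condition: ends consonant-vowel-consonant, last letter not w, x, y."""
    n = len(stem)
    return (n >= 3 and is_consonant(stem, n - 3) and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1) and stem[-1] not in "wxy")


def porter_fragment(word):
    """Hypothetical fragment: Porter steps 1a, 1c and 5a only."""
    w = word.lower()
    # Step 1a: plural endings (sses -> ss, ies -> i, ss kept, s dropped).
    if w.endswith("sses") or w.endswith("ies"):
        w = w[:-2]
    elif not w.endswith("ss") and w.endswith("s"):
        w = w[:-1]
    # Step 1c: terminal y -> i when the rest of the word contains a vowel.
    if w.endswith("y") and has_vowel(w[:-1]):
        w = w[:-1] + "i"
    # Step 5a: drop a final e if m > 1, or m == 1 without the *o condition.
    if w.endswith("e"):
        stem = w[:-1]
        m = measure(stem)
        if m > 1 or (m == 1 and not ends_cvc(stem)):
            w = stem
    return w


for word in ["trinity", "services", "Trinidad", "Trinitee"]:
    print(word, porter_fragment(word))
# trinity triniti
# services servic
# Trinidad trinidad
# Trinitee trinite
```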

But you didn't provide the field definition so it's just a guess.

Best
Erick


On Wed, Aug 28, 2013 at 7:43 AM, Prasi S <pr...@gmail.com> wrote:
