Posted to solr-user@lucene.apache.org by Warren Bell <wa...@clarksnutrition.com> on 2014/07/22 22:29:35 UTC

How to get Lacuma to match Lucuma

What field type or filters do I use to get something like the word “Lacuma” to return results with “Lucuma” in it? The word “Lucuma” has been indexed in a field with field type text_en_splitting that came with the original Solr examples.

Thanks,

Warren


    <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>


-- 
This email was Virus checked by Clark's Nutrition's Astaro Security Gateway.

The information contained in this e-mail is intended only for use of
the individual or entity named above. This e-mail, and any documents,
files, previous e-mails or other information attached to it, may contain
confidential information that is legally privileged. If you are not the
intended recipient of this e-mail, or the employee or agent responsible
for delivering it to the intended recipient, you are hereby notified
that any disclosure, dissemination, distribution, copying or other use
of this e-mail or any of the information contained in or attached to it
is strictly prohibited. If you have received this e-mail in error,
please immediately notify us by return e-mail or by telephone at
(951)321-1960, and destroy the original e-mail and its attachments
without reading or saving it in any manner. Thank you.

Clark’s Nutrition is a registered trademark of Clark's Nutritional Centers, Inc.

Re: How to get Lacuma to match Lucuma

Posted by Warren Bell <wa...@clarksnutrition.com>.
Is there a way to make Solr do fuzzy searches automatically, without having to add the tilde character? And are there disadvantages to doing fuzzy searches?

Warren

On Jul 22, 2014, at 1:54 PM, Anshum Gupta <an...@anshumgupta.net> wrote:

> Hi Warren,
> 
> Check out the section about fuzzy search here
> https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser.


Re: How to get Lacuma to match Lucuma

Posted by Jack Krupansky <ja...@basetechnology.com>.
Or possibly use the synonym filter at query or index time for common 
misspellings or misunderstandings about the spelling. That would be 
automatic, without the user needing to add the explicit fuzzy query 
operator.
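
As a rough illustration of the synonym approach, the sketch below generates explicit-mapping lines in the `misspelling => correct` form that Solr synonym files accept, which could then be added to the synonyms.txt already referenced by the query analyzer. The helper name `synonym_lines` and the list of misspellings are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def synonym_lines(misspellings):
    """Build explicit synonym mappings from {wrong spelling: correct spelling}.

    Hypothetical helper: groups all misspellings of the same word onto one
    line of the form "wrong1, wrong2 => correct".
    """
    by_correct = defaultdict(list)
    for wrong, right in misspellings.items():
        # Lowercase everything; the filter in the schema uses ignoreCase="true".
        by_correct[right.lower()].append(wrong.lower())
    return [
        f"{', '.join(sorted(wrongs))} => {right}"
        for right, wrongs in sorted(by_correct.items())
    ]


# Assumed example data: two misspellings of "lucuma".
lines = synonym_lines({"Lacuma": "lucuma", "lukuma": "lucuma"})
print("\n".join(lines))
```

The trade-off is that every misspelling has to be listed by hand, so this only covers errors that are already known.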

-- Jack Krupansky

-----Original Message----- 
From: Anshum Gupta
Sent: Tuesday, July 22, 2014 4:54 PM
To: solr-user@lucene.apache.org
Subject: Re: How to get Lacuma to match Lucuma

Hi Warren,

Check out the section about fuzzy search here
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser.




Re: How to get Lacuma to match Lucuma

Posted by Anshum Gupta <an...@anshumgupta.net>.
Hi Warren,

Check out the section about fuzzy search here
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser.
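
To make the fuzzy-search idea concrete, here is a minimal sketch. It computes a plain Levenshtein distance to show that "lacuma" and "lucuma" are one edit apart, and appends the `~` fuzzy operator with an explicit edit distance to a term on the client side. Both function names are hypothetical, and Solr's own distance measure may differ in details such as how transpositions are counted.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def fuzzy_term(term: str, max_edits: int = 1) -> str:
    """Append the fuzzy operator to one query term, e.g. 'Lacuma' -> 'Lacuma~1'.

    Assumes the application rewrites the user's input before sending it to Solr.
    """
    if max_edits not in (1, 2):
        raise ValueError("max_edits must be 1 or 2")
    return f"{term}~{max_edits}"


print(edit_distance("lacuma", "lucuma"))
print(fuzzy_term("Lacuma"))
```

Rewriting terms this way in the application is one way to get fuzzy matching without asking users to type the operator themselves; fuzzy queries generally match more terms than exact ones, so they can cost more and return less precise results.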





-- 

Anshum Gupta
http://www.anshumgupta.net