Posted to solr-user@lucene.apache.org by Shyamsunder Reddy <sj...@yahoo.com> on 2009/03/17 20:40:24 UTC

Solr SpellChecker configuration for multiple fields at the same time

My advanced search option lets users search three different fields at the same time.
The fields are first name, last name and org name. Now I have to add spell checking for these fields.

When a misspelling is entered for each of these fields, e.g. first name: jahn, last name: smath and org name: bpple,

the search result should return a suggestion (collation) like firstname:john AND lastname:smith AND orgname:apple


What is the best approach to implement spell checking for these three different fields:

1. Build a single dictionary for all fields by copying them into a 'spell' field:
    schema.xml configuration
    <!-- Set up simple analysis for spell checking -->
    <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="FIRST_NAME" type="text" indexed="true" stored="true"/>
    <field name="LAST_NAME" type="text" indexed="true" stored="true"/>
    <field name="ORG_NAME" type="text" indexed="true" stored="true" required="true"/>
    <field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>

    <copyField source="FIRST_NAME" dest="spell"/>
    <copyField source="LAST_NAME" dest="spell"/>
    <copyField source="ORG_NAME" dest="spell"/>
  
    solrconfig.xml configuration
    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">textSpell</str>
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spell</str>
        <str name="spellcheckIndexDir">./spellchecker</str>
      </lst>
    </searchComponent>

Now the queries:
1a. <URL>/select?q=FIRST_NAME:jahn+AND+LAST_NAME:smath+AND+ORG_NAME:bpple&spellcheck=true

The spell check searches the dictionary './spellchecker' and returns the suggestions
FIRST_NAME:john, LAST_NAME:smith and ORG_NAME:apple. Works as expected.

1b. <URL>/select?q=LAST_NAME:jahn&spellcheck=true
The spell check searches the dictionary './spellchecker' and returns 'john' as the suggestion for LAST_NAME.
But no document has 'john' in the LAST_NAME field, so the subsequent search returns NO results, which is not acceptable.

So this approach does not seem right for my case.
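
The problem in 1b can be reproduced outside Solr with a toy sketch. The sample document below is made up for illustration, and Python's difflib is only a stand-in for Solr's spellchecker; the point is just that a dictionary built from all fields can suggest a word that never occurs in the field being searched.

```python
from difflib import get_close_matches

# Hypothetical sample data: a single document with the three fields.
docs = [{"FIRST_NAME": "john", "LAST_NAME": "smith", "ORG_NAME": "apple"}]

# Approach 1: one shared dictionary built from every field,
# analogous to copying all three fields into 'spell'.
shared = {value for doc in docs for value in doc.values()}

# Alternative: one vocabulary per source field.
per_field = {}
for doc in docs:
    for field, value in doc.items():
        per_field.setdefault(field, set()).add(value)


def suggest(term, vocabulary):
    """Return the closest word in the vocabulary, or None if nothing is close.

    difflib is a stand-in here; it is not how Solr scores suggestions.
    """
    matches = get_close_matches(term, sorted(vocabulary), n=1, cutoff=0.6)
    return matches[0] if matches else None


# The shared dictionary offers a first name as a correction for a last-name query.
print(suggest("jahn", shared))
# The last-name-only dictionary has nothing close to 'jahn'.
print(suggest("jahn", per_field["LAST_NAME"]))
```

With the shared vocabulary the misspelled last name is "corrected" to a first name, which is exactly the case that then returns no results.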

2. Build a separate dictionary for each field:
    schema.xml configuration
    <!-- Set up simple analysis for spell checking -->
    <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="FIRST_NAME" type="text" indexed="true" stored="true"/>
    <field name="LAST_NAME" type="text" indexed="true" stored="true"/>
    <field name="ORG_NAME" type="text" indexed="true" stored="true" required="true"/>
    <field name="spell_fname" type="textSpell" indexed="true" stored="true" multiValued="true"/>
    <field name="spell_lname" type="textSpell" indexed="true" stored="true" multiValued="true"/>
    <field name="spell_org_name" type="textSpell" indexed="true" stored="true" multiValued="true"/>

    <copyField source="FIRST_NAME" dest="spell_fname"/>
    <copyField source="LAST_NAME" dest="spell_lname"/>
    <copyField source="ORG_NAME" dest="spell_org_name"/>
  
    solrconfig.xml configuration
    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">textSpell</str>
      <lst name="spellchecker">
        <str name="name">firstname</str>
        <str name="field">spell_fname</str>
        <str name="spellcheckIndexDir">./fname_spellchecker</str>
      </lst>
      <lst name="spellchecker">
        <str name="name">lastname</str>
        <str name="field">spell_lname</str>
        <str name="spellcheckIndexDir">./lname_spellchecker</str>
      </lst>
      <lst name="spellchecker">
        <str name="name">oname</str>
        <str name="field">spell_org_name</str>
        <str name="spellcheckIndexDir">./orgname_spellchecker</str>
      </lst>
    </searchComponent>
    
Now the queries:
2a. <URL>/select?q=FIRST_NAME:jahn+AND+LAST_NAME:smath+AND+ORG_NAME:bpple&spellcheck=true

How can I tell the query to check a different dictionary for each field, e.g.
FIRST_NAME in fname_spellchecker, LAST_NAME in lname_spellchecker and ORG_NAME in orgname_spellchecker?

Or can I make the spell checker store the field names along with their values?

Please comment on my approaches and suggest a solution.



      

Re: Solr SpellChecker configuration for multiple fields at the same time

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Yes,  approach #2 will certainly be useful. I'll open an issue.

On Wed, Mar 18, 2009 at 6:20 PM, Grant Ingersoll <gs...@apache.org> wrote:

> Hmm, I don't think there is currently a solution for this.  #1 is not
> viable for the reasons you mentioned and #2 is not supported by the current
> code.  That being said, I think it wouldn't be too hard for someone to
> work up a patch for this.  Essentially, we need the ability to add in per
> dictionary queries and then be able to route them to each dictionary.  If
> you feel up to this, I can take a look at your patch, if not at least open a
> JIRA issue for it so someone else might take it up.  Unfortunately, I don't
> have the time right at the moment, but maybe in a few weeks.
>
> -Grant


-- 
Regards,
Shalin Shekhar Mangar.

Re: Solr SpellChecker configuration for multiple fields at the same time

Posted by Grant Ingersoll <gs...@apache.org>.
Hmm, I don't think there is currently a solution for this.  #1 is not  
viable for the reasons you mentioned and #2 is not supported by the  
current code.  That being said, I think it wouldn't be too hard for
someone to work up a patch for this.  Essentially, we need the ability  
to add in per dictionary queries and then be able to route them to  
each dictionary.  If you feel up to this, I can take a look at your  
patch, if not at least open a JIRA issue for it so someone else might  
take it up.  Unfortunately, I don't have the time right at the moment,  
but maybe in a few weeks.
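
Until something like that exists in the component, the routing could perhaps be done on the client side. The sketch below is only an illustration under assumptions: it splits the query into field:term clauses and builds one request per clause, selecting the dictionary with the spellcheck.dictionary parameter and passing the bare term in spellcheck.q. The base URL, the helper names and the corrections mapping are hypothetical, the dictionary names are the ones from the approach #2 configuration, and parsing the Solr responses is left out.

```python
import re
from urllib.parse import urlencode

# Search field -> spellchecker name, taken from the approach #2 solrconfig.xml.
DICTIONARY_FOR_FIELD = {
    "FIRST_NAME": "firstname",
    "LAST_NAME": "lastname",
    "ORG_NAME": "oname",
}

# Matches simple field:term clauses; anything more complex is not handled.
CLAUSE = re.compile(r"(\w+):(\w+)")


def spellcheck_requests(query, base_url="http://localhost:8983/solr/select"):
    """Build one spellcheck request URL per field:term clause.

    base_url is a placeholder; substitute the real select handler URL.
    """
    urls = []
    for field, term in CLAUSE.findall(query):
        params = {
            "q": f"{field}:{term}",
            "spellcheck": "true",
            "spellcheck.q": term,  # check only the bare term
            "spellcheck.dictionary": DICTIONARY_FOR_FIELD[field],
        }
        urls.append(f"{base_url}?{urlencode(params)}")
    return urls


def collate(query, corrections):
    """Rebuild the query, swapping in the per-field correction where one exists.

    corrections maps (field, misspelled term) -> suggested term and would be
    filled from the individual spellcheck responses.
    """
    def fix(match):
        field, term = match.group(1), match.group(2)
        return f"{field}:{corrections.get((field, term), term)}"

    return CLAUSE.sub(fix, query)


query = "FIRST_NAME:jahn AND LAST_NAME:smath AND ORG_NAME:bpple"
for url in spellcheck_requests(query):
    print(url)

# Hypothetical suggestions, as if read from the three responses.
corrections = {
    ("FIRST_NAME", "jahn"): "john",
    ("LAST_NAME", "smath"): "smith",
    ("ORG_NAME", "bpple"): "apple",
}
print(collate(query, corrections))
```

This costs one request per field, so it is a stopgap rather than a replacement for per-dictionary routing inside the component.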

-Grant
