Posted to solr-user@lucene.apache.org by Jeff Newburn <jn...@zappos.com> on 2008/11/11 01:00:24 UTC

SpellChecker Component

I am still relatively new to Solr. I have gotten the
SpellCheckerRequestHandler working the way I would like, and now I am
diving into the search component version of the spell checker. I was
hoping someone could help explain two things: 1. what specifically the
SearchComponent offers, and 2. how I would go about applying it to all
searches that use the dismax query type.

-Jeff

Re: SpellChecker Component

Posted by Grant Ingersoll <gs...@apache.org>.
See https://issues.apache.org/jira/browse/LUCENE-1417 and http://lucene.markmail.org/message/sktohlgqxcpmpf7z?q=list:org%2Eapache%2Elucene%2Esolr-user+spellchecker+Rennie

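In short, frequency is only the second-order sort key: suggestions are
ranked by their distance score first, and frequency only breaks ties.
I think the ordering should be made pluggable. A patch would be most
welcome; I don't have time to produce one at the moment, but I can
shepherd it through.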
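A pluggable ordering of this kind appears to have landed later: as far as I know, Solr 1.4 spellchecker configurations accept a comparatorClass option, where "freq" sorts suggestions by frequency first and by distance second. Treat the option name and its values as an assumption to verify against the SpellCheckComponent wiki page for your build. A minimal sketch, applied to the "default" dictionary from Jeff's configuration:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">word</str>
    <str name="spellcheckIndexDir">./spellchecker1</str>
    <str name="accuracy">0.5</str>
    <!-- Assumption: "freq" ranks by frequency first, then distance;
         "score" (the default) ranks by distance first, then frequency. -->
    <str name="comparatorClass">freq</str>
  </lst>
</searchComponent>
```

With a frequency-first ordering, a high-frequency candidate such as Nike (4099) would be expected to sort ahead of closer but rarer words like Mice or Vice.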

FWIW, you might also try the Jaro-Winkler (JW) distance as the
default. Edit distance is not as good, since it treats differences
the same no matter where in the word they occur, whereas most people
tend to make spelling mistakes later on in a word, which I believe JW
takes into account when scoring (it boosts words that share a common
prefix).
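To make Jaro-Winkler the default, the distanceMeasure setting from Jeff's "jarowinkler" dictionary can be moved onto the dictionary named "default", which is the one used when a request does not set spellcheck.dictionary. A minimal sketch, reusing the field, index directory and accuracy from Jeff's configuration (the accuracy value may need retuning, since JW scores are on a different scale than edit distance); rebuild the spelling index afterwards, e.g. with spellcheck.build=true:

```xml
<lst name="spellchecker">
  <!-- The dictionary named "default" is used when no spellcheck.dictionary is given -->
  <str name="name">default</str>
  <str name="classname">solr.IndexBasedSpellChecker</str>
  <str name="field">word</str>
  <!-- Score candidates with Jaro-Winkler instead of the default edit distance -->
  <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
  <str name="spellcheckIndexDir">./spellchecker1</str>
  <str name="accuracy">0.5</str>
</lst>
```

Alternatively, the existing "jarowinkler" dictionary can be selected per request with spellcheck.dictionary=jarowinkler, which leaves the default untouched while you compare results.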

On Nov 11, 2008, at 11:52 AM, Jeff Newburn wrote:

> Nike is by far the most frequent term; how do I get it to the top?

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

Re: SpellChecker Component

Posted by Jeff Newburn <jn...@zappos.com>.
OK, I have managed to get the search component added (you rock, Grant).
I am now having some interesting issues with the suggestions. We sell
shoes online, so I am trying to get it to spellcheck brand names.

When I search for "konverse" with spellchecking on, it correctly
suggests "converse". However, when I search for "nice" (instead of
"nike"), I get back all sorts of suggestions that are not sorted by
frequency. I have even turned on onlyMorePopular, but it still returns
all of the different words in no particular order. Nike is by far the
most frequent term; how do I get it to the top?

I am currently using an SVN build of Solr 1.4. I have included the
configuration as well as the result set returned for the spelling
suggestions.


Below is the configuration:
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

    <!--<str name="queryAnalyzerFieldType">textSpell</str>-->
    <str name="buildOnCommit">true</str>

    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="field">word</str>
      <str name="spellcheckIndexDir">./spellchecker1</str>
      <str name="accuracy">0.5</str>
    </lst>
    <lst name="spellchecker">
      <str name="name">jarowinkler</str>
      <str name="field">word</str>
      <!-- Use a different Distance Measure -->
      <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
      <str name="spellcheckIndexDir">./spellchecker2</str>

    </lst>

    <lst name="spellchecker">
      <str name="classname">solr.FileBasedSpellChecker</str>
      <str name="name">file</str>
      <str name="sourceLocation">spellings.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="indexDir">./spellcheckerFile</str>
    </lst>
  </searchComponent>

Returned results:
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="nice">
      <int name="numFound">20</int>
      <int name="startOffset">0</int>
      <int name="endOffset">4</int>
      <int name="origFreq">0</int>
      <lst name="suggestion">
        <int name="frequency">47</int>
        <str name="word">Mice</str>
      </lst>
      <lst name="suggestion">
        <int name="frequency">26</int>
        <str name="word">Vice</str>
      </lst>
      <lst name="suggestion">
        <int name="frequency">14</int>
        <str name="word">Nice</str>
      </lst>
      <lst name="suggestion">
        <int name="frequency">4</int>
        <str name="word">Bice</str>
      </lst>
      <lst name="suggestion">
        <int name="frequency">1</int>
        <str name="word">Dice</str>
      </lst>
      <lst name="suggestion">
        <int name="frequency">4099</int>
        <str name="word">Nike</str>
      </lst>


On 11/11/08 4:39 AM, "Grant Ingersoll" <gs...@apache.org> wrote:

> It works independently of the query parser (e.g. dismax).


Re: SpellChecker Component

Posted by Grant Ingersoll <gs...@apache.org>.
Hi Jeff,

A SearchComponent lets you plug functionality into any request
handler, so you can inline spelling requests (or other things like
MoreLikeThis) with your queries and avoid making an extra request.

I walk through a lot of this in my article on Solr 1.3 for IBM
developerWorks: http://www.ibm.com/developerworks/java/library/j-solr-update/?S_TACT=105AGX01&S_CMP=HP

You can also refer to the Wiki at:
http://wiki.apache.org/solr/SearchComponent
and specifically:
http://wiki.apache.org/solr/SpellCheckComponent

It works independently of the query parser (e.g. dismax).
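Because the component is independent of the query parser, one way to add spellchecking to every dismax search is to list it in the handler's last-components. A minimal sketch, assuming a handler named "dismax" and the component name "spellcheck" from Jeff's configuration; the defaults are ordinary spellcheck request parameters and can still be overridden per request:

```xml
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Parse the main query with dismax -->
    <str name="defType">dismax</str>
    <!-- Turn spellchecking on for every request to this handler -->
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <!-- Run the spellcheck component after the standard search components -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```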

-Grant


On Nov 10, 2008, at 7:00 PM, Jeff Newburn wrote:

> ... how would I go about putting it into all search terms with the
> dismax type.
