Posted to solr-user@lucene.apache.org by Yao Ge <ya...@gmail.com> on 2009/06/02 17:50:08 UTC

spell checking

Can someone help by providing a tutorial-like introduction on how to get
spell checking to work in Solr? It appears many steps are required before the
spell-checking functions can be used. It also appears that a dictionary (a
list of correctly spelled words) is required to set up the spell checker. Can
anyone validate my impression?

Thanks.
-- 
View this message in context: http://www.nabble.com/spell-checking-tp23835427p23835427.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: spell checking

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello,

This is how you build the SC (spell check) index:
http://wiki.apache.org/solr/SpellCheckComponent#head-78f5afcf43df544832809abc68dd36b98152670c

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





Re: spell checking

Posted by Jeff Newburn <jn...@zappos.com>.
The spell checking dictionary should be built on startup when spellchecking
is enabled in the system.

First, we defined the component in solrconfig.xml.  Notice how it has
buildOnCommit to tell it to rebuild the dictionary on each commit.

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <!-- Default checker: built from the terms of an indexed field -->
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="classname">solr.IndexBasedSpellChecker</str>
      <str name="field">field</str>
      <str name="spellcheckIndexDir">./spellchecker1</str>
      <str name="accuracy">0.5</str>
      <!-- Rebuild the spell check index after every commit -->
      <str name="buildOnCommit">true</str>
    </lst>
    <lst name="spellchecker">
      <str name="name">jarowinkler</str>
      <str name="field">field</str>
      <!-- Use a different Distance Measure -->
      <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
      <str name="spellcheckIndexDir">./spellchecker2</str>
      <str name="accuracy">0.5</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

Second, we added the component to the dismax handler:
         <arr name="last-components">
               <str>spellcheck</str>
         </arr>

This seems to work for us.  Hope it helps.
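
For readers wiring this up, here is a minimal sketch of what a complete
handler definition might look like. The handler name, the defType line, and
the spellcheck.* default values are assumptions for illustration; only the
last-components entry comes from the configuration above.

```xml
<!-- Sketch only: handler name and default values are assumptions -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- Ask for suggestions by default, using the checker named "default" -->
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <!-- Run the spellcheck component after the standard components -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

With something like this in place, a request along the lines of
http://localhost:8983/solr/select?qt=dismax&q=helo&spellcheck=true&spellcheck.build=true
(host, port, and query are placeholders) should build the dictionary
explicitly; later requests can drop spellcheck.build.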

-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewburn@zappos.com - 702-943-7562




Re: spell checking

Posted by Yao Ge <ya...@gmail.com>.
Yes, I did. I was not able to grasp the concept of making spell checking
work.
For example, the wiki page says a spell check index needs to be built, but
does not say how to do it. Does Solr build the index out of thin air? Is the
index built from the main index? Or is it built from a dictionary or
word list?

Please help.




Re: spell checking

Posted by Grant Ingersoll <gs...@apache.org>.
Have you gone through the wiki page? http://wiki.apache.org/solr/SpellCheckComponent



--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


Re: spell checking

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Thu, Jun 4, 2009 at 7:26 PM, Walter Underwood <wu...@netflix.com> wrote:

> "query suggest" --wunder
>
>
How about DidYouMeanComponent?

-- 
Regards,
Shalin Shekhar Mangar.

Re: spell checking

Posted by Michael Ludwig <ml...@as-guides.com>.
Walter Underwood wrote:
> "query suggest" --wunder

That's very good.

On the other hand, I noticed how the term "spellcheck" is spread
all over the place, and that would be a massive renaming orgy.
An explanation at the appropriate place in the documentation is
less invasive. I added two sentences to the "Introduction" of:

http://wiki.apache.org/solr/SpellCheckComponent

Michael Ludwig

Re: spell checking

Posted by Walter Underwood <wu...@netflix.com>.
"query suggest" --wunder



Re: spell checking

Posted by Michael Ludwig <ml...@as-guides.com>.
Yao Ge wrote:

> Maybe we should call this "alternative search terms" or
> "suggested search terms" instead of spell checking. It is
> misleading as there is no right or wrong in spelling, there
> are only popular (term frequency?) alternatives.

I had exactly the same difficulty in understanding the concept
because of the name given to the feature, which usually denotes
just what it says, i.e. a spellchecker, which is driven by an
authoritative dictionary and a set of rules, as integrated in
word processors, in order to ensure orthography.

What we have here is quite different from a spellchecker.

IMHO, a name conveying the actual meaning, along the lines of
"suggest", would make more sense.

Michael Ludwig

Re: spell checking

Posted by Otis Gospodnetic <ot...@yahoo.com>.
I'm glad my late-night explanation helped.
You may be right about there being a better name for this functionality.
Note that we do have support for a file-based (dictionary-like) spellchecker, too.
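
As a rough sketch, a file-based checker can be declared as one more
spellchecker entry inside the SpellCheckComponent. The name "file", the word
list spellings.txt, and the index directory are placeholder assumptions.

```xml
<lst name="spellchecker">
  <str name="name">file</str>
  <str name="classname">solr.FileBasedSpellChecker</str>
  <!-- Plain-text word list, one correctly spelled word per line (hypothetical file) -->
  <str name="sourceLocation">spellings.txt</str>
  <str name="characterEncoding">UTF-8</str>
  <str name="spellcheckIndexDir">./spellcheckerFile</str>
</lst>
```

It would then be selected at query time with spellcheck.dictionary=file.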

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





Re: spell checking

Posted by Yao Ge <ya...@gmail.com>.
Excellent. Now everything makes sense to me. :-)

The spell checking suggestion is the closest variant of the user input that
actually exists in the main index. The so-called "correction" is relative to
the text that has been indexed, so there is no need for a brute-force list of
all correctly spelled words. Maybe we should call this "alternative search
terms" or "suggested search terms" instead of spell checking. The name is
misleading, as there is no right or wrong in spelling; there are only popular
(term frequency?) alternatives.

Thanks for the insight.




Re: spell checking

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hello,

In short, the assumption behind this type of SC (spell checker) is that the
text in the main index is (mostly) correctly spelled.  When the SC finds query
terms that are close in spelling to words indexed in the SC index, it offers
spelling suggestions/corrections using those presumably correctly spelled
terms (there are other parameters that control the exact behaviour, but this
is the idea).

Solr (actually Lucene's spellchecker, which Solr uses under the hood) turns
the input text (values from those fields you copy to the spell field) into
so-called n-grams.  You can see that if you open up the SC index with
something like Luke.  Please see
http://wiki.apache.org/jakarta-lucene/SpellChecker .
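
To make the n-gram idea concrete, here is roughly what the SC index might hold
for the single word "search". It is written in Solr's XML document notation
purely for readability; the SC index is a separate Lucene index that the
spellchecker writes itself, not something you post as XML. The field names and
the gram sizes (3 and 4 for a six-letter word) are assumptions based on
Lucene's SpellChecker, so verify them with Luke.

```xml
<!-- Illustration only: one SC index entry for the word "search" -->
<doc>
  <field name="word">search</field>
  <!-- every 3-letter slice, plus the first and last slice on their own -->
  <field name="gram3">sea</field>
  <field name="gram3">ear</field>
  <field name="gram3">arc</field>
  <field name="gram3">rch</field>
  <field name="start3">sea</field>
  <field name="end3">rch</field>
  <!-- every 4-letter slice, plus the first and last slice on their own -->
  <field name="gram4">sear</field>
  <field name="gram4">earc</field>
  <field name="gram4">arch</field>
  <field name="start4">sear</field>
  <field name="end4">arch</field>
</doc>
```

A query term is split the same way, and indexed words that share grams with it
become candidates, which are then ranked by string distance.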

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





Re: spell checking

Posted by Yao Ge <ya...@gmail.com>.
Sorry for not being able to get my point across.

I know the syntax that leads to an index build for spell checking. I actually
ran the command and saw some additional files created in the data\spellchecker1
directory. What I don't understand is what is in there, as I cannot get
Solr to make spell suggestions based on the documented query structure in the
wiki.

Can anyone tell me what happens when the default spell check index is
built? In my case, I used copyField to copy a couple of text fields into a
field called "spell". These fields are the original text; they are the ones
with typos that I need to run spell check on. But how can this original
data be used as a base for spell checking? How does Solr know which words are
correctly spelled?

   <field name="tech_comment" type="text" indexed="true" stored="true" multiValued="true"/>
   <field name="cust_comment" type="text" indexed="true" stored="true" multiValued="true"/>
   ...
   <field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
   ...
   <copyField source="tech_comment" dest="spell"/>
   <copyField source="cust_comment" dest="spell"/>
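
The textSpell type referenced above is not shown in this message. A minimal
sketch of such a type, assuming a simple analyzer that tokenizes and
lowercases without stemming (so that suggestions are whole words rather than
stems), might be:

```xml
<!-- Sketch only: the analyzer chain is an assumption, not the poster's schema -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

The spellchecker's field setting in solrconfig.xml would then point at the
spell field.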


