Posted to solr-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2009/02/05 20:50:46 UTC
[Solr Wiki] Update of "SpellCheckingAnalysis" by GrantIngersoll
The following page has been changed by GrantIngersoll:
http://wiki.apache.org/solr/SpellCheckingAnalysis
New page:
= Introduction =
Analysis is a very important factor in spell checking. Stemming and other techniques that change tokens are not recommended, since they cause stems rather than real words to be offered as suggestions. Instead, use a very minimal tokenization/analysis process, such as the !StandardAnalyzer, or even the !WhitespaceTokenizer plus a simple lowercasing filter and a filter that removes apostrophes and similar punctuation. As with most things in search, there are always tradeoffs, and you should evaluate the results in your own application.
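For illustration, a minimal analysis chain of the kind described above might look like the following sketch. The field type name "spell_minimal" is hypothetical, and the pattern filter that strips apostrophes is only one possible way to do that cleanup; adjust it to the punctuation that actually appears in your data.
{{{
<fieldType name="spell_minimal" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split on whitespace only, so tokens stay close to their surface form -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- assumption: drop apostrophes, so that "don't" is indexed as "dont" -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="'" replacement="" replace="all"/>
  </analyzer>
</fieldType>
}}}
Because no stemming is applied, the terms in the spelling index remain whole words that can be shown to users directly.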
That being said, a common configuration for spell checking is:
{{{
<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
}}}
Use a <copyField> to copy your main text fields into a field of this type (called "spell" here), and then configure your spell checker to build its spelling index from that "spell" field.
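The wiring could be sketched roughly as follows. The source field names "title" and "body" are placeholders for your own fields, and the spellchecker name and index directory are illustrative values rather than requirements; check them against the Solr version you are running.
{{{
<!-- schema.xml: a field that uses the "spell" field type defined above -->
<field name="spell" type="spell" indexed="true" stored="false" multiValued="true"/>

<!-- "title" and "body" are hypothetical source fields -->
<copyField source="title" dest="spell"/>
<copyField source="body" dest="spell"/>

<!-- solrconfig.xml: point the spell checker at the "spell" field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- field type whose query analyzer is applied to the user's input -->
  <str name="queryAnalyzerFieldType">spell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <!-- illustrative location for the spelling index -->
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>
}}}
The "spell" field is declared multiValued because more than one source field is copied into it; storing it is usually unnecessary, since only its indexed terms feed the spelling index.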