Posted to solr-user@lucene.apache.org by Mark Miller <ma...@gmail.com> on 2009/07/14 02:39:05 UTC

Re: Spell checking: Is there a way to exclude words known to be wrong?

I don't think there is a way to do this currently, but it would make a nice
patch. Alternatively, you could implement a custom SolrSpellChecker: both
FileBasedSpellChecker and IndexBasedSpellChecker are roughly 50 lines of
code or less each, so it would be fairly quick to write a custom version
and plug it in.
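
As a rough sketch of the plugin route, the configuration might end up looking
something like the fragment below. The "classname" parameter is the existing
way to select a spell checker implementation; the class name
com.example.ExcludingIndexBasedSpellChecker and the "excludeWordsFile"
parameter are made up for illustration and do not exist in Solr. The custom
class would have to read that parameter itself and skip the listed words when
it builds the spelling index.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- Hypothetical custom class, not shipped with Solr -->
    <str name="classname">com.example.ExcludingIndexBasedSpellChecker</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
    <!-- Hypothetical parameter; the custom class must load this file itself -->
    <str name="excludeWordsFile">spellexclude.txt</str>
  </lst>
</searchComponent>
```

The exclusion file would then work much like protwords.txt: one word per line,
with each listed word left out whenever the spelling index is rebuilt.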

-- 
- Mark

http://www.lucidimagination.com

On Mon, Jul 13, 2009 at 8:27 PM, Jay Hill <ja...@gmail.com> wrote:

> We're building a spell index from a field in our main index with the
> following configuration:
>  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>    <str name="queryAnalyzerFieldType">textSpell</str>
>    <lst name="spellchecker">
>      <str name="name">default</str>
>      <str name="field">spell</str>
>      <str name="spellcheckIndexDir">./spellchecker</str>
>      <str name="buildOnCommit">true</str>
>    </lst>
>  </searchComponent>
>
> This works great and rebuilds the spelling index on commits as expected.
> However, we know there are misspellings in the "spell" field of our main
> index. We could remove these from the spelling index using Luke, but they
> would be added again on the next commit. What we need is something similar
> to how the protwords.txt file is used, so that when we notice misspelled
> words such as "beginnning" being pulled from our main index, we can add
> them to an exclusion file and keep them out of the spelling index.
>
> Any tricks to make this possible?
>
> -Jay
>