Posted to commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2013/04/18 18:24:25 UTC

[Solr Wiki] Update of "FileBasedSpellChecker" by MarkBennett

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "FileBasedSpellChecker" page has been changed by MarkBennett:
http://wiki.apache.org/solr/FileBasedSpellChecker?action=diff&rev1=2&rev2=3

Comment:
Adding hints about using the file based spellchecker

  junk
  }}}
  
+ = Hints =
+ 
+ The !FileBasedSpellChecker is very similar in operation to the other v3 spellchecker, !IndexBasedSpellChecker; both require a separate Lucene index to be built. This can be confusing to those who've only ever used v4's !DirectSolrSpellChecker, which doesn't have that requirement.  In particular, you still need to index the dictionary file by issuing a search with '''&spellcheck.build=true''' on the end of the URL; if your system doesn't update that dictionary file, then this only needs to be done once.  This manual step may be required even if your configuration sets build=true and reload=true.
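+ 
+ As a rough illustration of the one-time build request, the sketch below assembles such a URL in Python. The host, port, handler path ("/solr/spell") and the query term are assumptions for illustration only; substitute whatever your own installation uses. The dictionary name "file" matches the sample configuration name.

```python
from urllib.parse import urlencode


def build_spellcheck_url(base_url, dictionary="file", query="test"):
    """Return a search URL that also asks Solr to (re)build the named dictionary."""
    params = {
        "q": query,                           # any query will do; the build is a side effect
        "spellcheck": "true",                 # turn the spellcheck component on
        "spellcheck.dictionary": dictionary,  # name of the configured spellchecker
        "spellcheck.build": "true",           # build the spellcheck index on this request
    }
    return base_url + "?" + urlencode(params)


# Hypothetical handler URL; adjust to your own Solr installation.
url = build_spellcheck_url("http://localhost:8983/solr/spell")
print(url)
```

Requesting that URL once (from a browser or any HTTP client) should be enough as long as the dictionary file does not change.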
+ 
+ In the default solrconfig.xml there's a sample commented-out configuration, '''<str name="name">file</str>''', that can be used as a template.  The name "file" is arbitrary, and you can have several file-based spellcheckers with different names, each pointing to a different flat file.  Elsewhere in solrconfig.xml you'd reference the named '''file''' configuration with '''<str name="spellcheck.dictionary">file</str>'''.
+ 
+ Make sure the "field" parameter points to a valid field name in your schema.xml file.
+ 
+ Using a full English dictionary is possible and may seem like a good idea at first, but a full dictionary may contain words that don't appear in your index, so suggesting them to users would typically be a bad idea because they'd get zero hits.  If you're using the collator, it should filter out corrections that have zero hits, which might avoid the problem, although it calls into question the value of having brought in those extra suggestions in the first place.
+ 
+ If you do have some use case that would benefit from full-language spelling suggestions (perhaps as an education tool, vs. creating clickable links), there are numerous open source dictionaries available for different languages.  A Google search for '''open source dictionaries''' should help; both '''GNU ASpell''' and '''!OpenOffice''' have open source dictionaries. However, they often come in various nested and compressed formats, so some preprocessing will be needed to get them into Solr's plain text format.  Indexing the entire !OpenOffice English dictionary took less than a minute on a 2012 !MacBook Pro.
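+ 
+ The preprocessing mentioned above might look something like the following sketch. It assumes the dictionary has already been unpacked into a Hunspell-style .dic file, where the first line is a word count and each entry looks like "word/FLAGS"; the function name and the sample data are made up for illustration. It keeps only the stem of each entry and does not expand affix rules, so inflected forms will be missing from the output.

```python
def hunspell_to_wordlist(lines):
    """Reduce Hunspell-style .dic lines to unique plain words, one per entry."""
    words = []
    seen = set()
    for index, raw in enumerate(lines):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if index == 0 and line.isdigit():
            continue  # the first line of a .dic file is an entry count
        # Drop affix flags ("work/DGS" -> "work") and any tab-separated extras.
        word = line.split("/", 1)[0].split("\t", 1)[0].strip()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


# Made-up sample input standing in for the contents of a .dic file.
sample = ["4", "hello", "work/DGS", "try/B", "hello"]
wordlist = hunspell_to_wordlist(sample)
print("\n".join(wordlist))
```

Writing the resulting words to a text file, one word per line, gives a file that can be used as the spellchecker's source dictionary.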
+