Posted to solr-user@lucene.apache.org by ilay raja <il...@gmail.com> on 2013/04/08 15:02:12 UTC

FileBasedSpellchecker with Frequency comparator

Hi

  I want to configure a file-based spellchecker for my application. I am
taking the words from the spellcheck.txt file and building the
spellcheckerFile directory index, and that works fine. However, the
spellchecker does not take word frequency into account when giving
suggestions. Since FileBasedSpellChecker cannot take a numeric frequency, I
duplicated the important terms in spellcheck.txt, repeating each one as many
times as needed, but this still is not reflected in the scoring. Is this the
right way to go about it? Can someone explain clearly how Solr supports
building a FileBasedSpellChecker index from a file along with word
frequencies? Is it doable by configuring solrconfig.xml, or do we need to
write a spellcheck client explicitly?
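To make the desired behavior concrete, here is a minimal client-side sketch in Python. It assumes spellcheck.txt holds one word per line, treats each duplicated line as one unit of frequency, and uses difflib's string similarity as a stand-in for the spellchecker's own distance measure. The function names, the 0.6 similarity cutoff, and the frequency_first switch are all hypothetical, not part of Solr.

```python
from collections import Counter
from difflib import SequenceMatcher


def load_weighted_dictionary(lines):
    """Count each word in a one-word-per-line file; duplicated lines raise the count."""
    return Counter(line.strip().lower() for line in lines if line.strip())


def rank_suggestions(misspelled, dictionary, max_results=5,
                     min_similarity=0.6, frequency_first=False):
    """Return dictionary words similar to `misspelled`, using frequency in the ordering.

    By default similarity is the primary sort key and frequency breaks ties;
    frequency_first=True makes frequency the primary key instead.
    """
    target = misspelled.lower()
    scored = []
    for word, freq in dictionary.items():
        # difflib's ratio is an assumed stand-in for the spellchecker's distance score.
        similarity = SequenceMatcher(None, target, word).ratio()
        if similarity >= min_similarity:
            scored.append((similarity, freq, word))
    if frequency_first:
        scored.sort(key=lambda t: (-t[1], -t[0], t[2]))
    else:
        scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return [word for _, _, word in scored[:max_results]]


if __name__ == "__main__":
    # "apple" is listed three times to mark it as more important than "apply".
    spellcheck_txt = ["apple", "apple", "apple", "apply", "ample"]
    weights = load_weighted_dictionary(spellcheck_txt)
    print(rank_suggestions("appel", weights))
```

Whether frequency should be the primary key or only a tie-breaker is a design choice: with frequency first, a common word can outrank a rarer but closer match.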

RE: FileBasedSpellchecker with Frequency comparator

Posted by "Dyer, James" <Ja...@ingramcontent.com>.
I do not believe that FileBasedSpellChecker takes frequency into account at all.  That would be a nice enhancement, though.

To get what you want, you could index one or more documents containing the words in your file, then create a spellchecker using IndexBasedSpellChecker or DirectSolrSpellChecker.  I don't remember offhand how the spellcheckers count document frequency, specifically whether multiple occurrences in the same document count (I think they do).  If they do, you could accomplish this with one dummy spellcheck-building document and one big indexed field.  You could even create an IndexBasedSpellChecker dictionary and then delete the dummy document(s).  (But be sure to lock down "spellcheck.build", possibly by putting it in the "invariants" section of all your request handlers, so that you don't accidentally rebuild over it.)
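The dummy-document approach could be scripted roughly as below. This is a minimal Python sketch that only builds Solr JSON update bodies; it assumes a hypothetical indexed field named spell_terms (which the IndexBasedSpellChecker or DirectSolrSpellChecker would be configured to read) and hypothetical ids prefixed spellcheck_dummy_. Because it is uncertain whether repeated occurrences within one document raise the frequency, it can emit either one document with repeated words or one document per occurrence.

```python
import json
from collections import Counter

# Hypothetical names: adjust to match your schema.xml / solrconfig.xml.
SPELL_FIELD = "spell_terms"
ID_PREFIX = "spellcheck_dummy_"


def build_spellcheck_docs(word_counts, one_doc_per_occurrence=False):
    """Turn {word: weight} into a list of Solr JSON update documents.

    one_doc_per_occurrence=False: a single document whose field repeats each
    word `weight` times (helps only if repeats within a document are counted).
    one_doc_per_occurrence=True: one document per occurrence, so a word's
    document frequency equals its weight either way.
    """
    if not one_doc_per_occurrence:
        text = " ".join(
            " ".join([word] * count) for word, count in sorted(word_counts.items())
        )
        return [{"id": ID_PREFIX + "0", SPELL_FIELD: text}]
    docs = []
    for word, count in sorted(word_counts.items()):
        for _ in range(count):
            docs.append({"id": f"{ID_PREFIX}{len(docs)}", SPELL_FIELD: word})
    return docs


def delete_dummy_docs_payload():
    """Delete-by-query body for removing the dummy documents after the build.

    Assumes `id` is a string field, so the trailing wildcard acts as a prefix match.
    """
    return json.dumps({"delete": {"query": f"id:{ID_PREFIX}*"}})


if __name__ == "__main__":
    # Weights taken from a spellcheck.txt where important words were duplicated.
    counts = Counter(["apple", "apple", "apple", "apply"])
    print(json.dumps(build_spellcheck_docs(counts, one_doc_per_occurrence=True)))
    print(delete_dummy_docs_payload())
```

The generated JSON would then be posted to the core's JSON update handler (the exact path depends on the Solr version and solrconfig.xml), followed by a commit and a spellcheck.build request.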

James Dyer
Ingram Content Group
(615) 213-4311

