Posted to java-commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2007/05/15 21:48:20 UTC
[Lucene-java Wiki] Update of "SpellChecker" by DanielNaber

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The following page has been changed by DanielNaber:
http://wiki.apache.org/jakarta-lucene/SpellChecker

The comment on the change is:
typo and grammar fixes

------------------------------------------------------------------------------
  === SpellChecker ===
  
- A Spell Checker allows to suggest a list of words closed from a misspelled word. This implementation is based on the David Spencer's code using the n-gram method and the Levensthein distance.
+ A Spell Checker suggests a list of words similar to a misspelled word. This implementation is based on David Spencer's code, which uses the n-gram method and the Levenshtein distance.
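The Levenshtein distance mentioned here is the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. As an illustration, here is a minimal stand-alone Java sketch of the textbook dynamic-programming computation. It is not the SpellChecker's own code, and the class name `Levenshtein` is hypothetical.

```java
// Hypothetical helper, not part of the SpellChecker API: textbook
// dynamic-programming Levenshtein (edit) distance between two words.
final class Levenshtein {
    private Levenshtein() {}

    static int distance(String a, String b) {
        // At the start of iteration i, prev[j] is the distance between
        // the first i-1 chars of a and the first j chars of b.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // "" -> first j chars of b: j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // first i chars of a -> "": i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, // insertion
                                            prev[j] + 1),    // deletion
                                   prev[j - 1] + cost);      // substitution or match
            }
            int[] tmp = prev; // swap rows: the row just filled becomes prev
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, `Levenshtein.distance("kitten", "sitting")` is 3: two substitutions and one insertion.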
  
  == Structure of a dictionary index ==
- A  Index (the dictionary) with all the possible words (a lucene index) must be  created. The structure of this index is (for a 3-4 gram).
+ An index (the dictionary) containing all the possible words (a Lucene index) must be created. For 3-grams and 4-grams, this index has the following structure:
  || Index Structure || Example ||
  || word || kings ||
  ||gram3|| kin, ing, ngs ||
@@ -15, +15 @@

  ||end3|| ngs||
  ||end4|| ings||
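To make the table concrete, the following stand-alone Java sketch derives the word, gramN, startN and endN values for one word and a range of n-gram sizes. The field names follow the table above; the helper `NGramFields` is hypothetical and builds a plain map rather than a Lucene document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds the field -> values layout from the table
// (word, gramN, startN, endN) for one word, as a plain map instead of a
// Lucene document.
final class NGramFields {
    private NGramFields() {}

    static Map<String, List<String>> fieldsFor(String word, int minGram, int maxGram) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("word", Collections.singletonList(word));
        for (int n = minGram; n <= maxGram; n++) {
            if (word.length() < n) {
                continue; // word too short to have an n-gram of this size
            }
            List<String> grams = new ArrayList<>();
            for (int i = 0; i + n <= word.length(); i++) {
                grams.add(word.substring(i, i + n)); // every overlapping n-gram
            }
            fields.put("gram" + n, grams);
            fields.put("start" + n, Collections.singletonList(grams.get(0)));                // first n-gram
            fields.put("end" + n, Collections.singletonList(grams.get(grams.size() - 1)));   // last n-gram
        }
        return fields;
    }
}
```

`NGramFields.fieldsFor("kings", 3, 4)` yields gram3 = [kin, ing, ngs], end3 = [ngs] and end4 = [ings], matching the rows shown above.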
  
- == Importation: add words to the dictionary ==
+ == Import: Adding Words to the Dictionary ==
- we can add the words coming from a Lucene Index (more precisely a set of Lucene fields), why not, from a file with a list of words.
+ We can add words coming from a Lucene index (more precisely, from a set of Lucene fields) and from a text file containing a list of words.
  
   * Example: we can add all the terms of a given field of our Lucene index.
   {{{
@@ -24, +24 @@

  spell.indexDictionary(new LuceneDictionary(my_luceneReader,my_fieldname));
   }}}
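Importing from a text file only requires turning the file into a sequence of words. A minimal sketch, assuming one word per line with blank lines ignored (the exact file format expected by the dictionary classes is an assumption here). `WordListFile` is a hypothetical helper, not the library's text-file dictionary class.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for a plain-text word list.
// Assumption: one word per line; surrounding whitespace and blank lines are ignored.
final class WordListFile {
    private WordListFile() {}

    // Reads all words and closes the given reader when done.
    static List<String> readWords(Reader in) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) { // skip blank lines
                    words.add(word);
                }
            }
        }
        return words;
    }
}
```

Passing a `java.io.FileReader` for the word-list file gives the words in file order.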
  
- == get a list of suggested words ==
+ == Getting a List of Suggested Words ==
- The suggestSimilar method return a list of suggested words sorted by:
+ The suggestSimilar method returns a list of suggested words sorted by:
-   1.   the Levenshtein distance (the closest words of the misspelled word is the first of the list).
+   1.   the Levenshtein distance (the word most similar to the misspelled word comes first in the list).
-   2.   (optionaly) the popularity of the word in a given Lucene Field.
+   2.   (optionally) the popularity of the word in a given Lucene field.
  
- furthermore, that list can be restricted only to the words present in a given Lucene Field.
+ Furthermore, the list can be restricted to the words present in a given Lucene field.
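The two sort criteria amount to a composite ordering: by edit distance first, then by popularity. A minimal sketch, assuming each candidate already carries its edit distance and its frequency in the chosen field; `Candidate` and `SuggestionOrder` are hypothetical types. For simplicity it orders by raw edit distance, whereas the change list below mentions a FuzzyQuery-style score normalized by word length.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical candidate: a dictionary word, its edit distance to the
// misspelled word, and its frequency (popularity) in some Lucene field.
final class Candidate {
    final String word;
    final int distance;
    final int frequency;

    Candidate(String word, int distance, int frequency) {
        this.word = word;
        this.distance = distance;
        this.frequency = frequency;
    }
}

final class SuggestionOrder {
    private SuggestionOrder() {}

    // Smallest edit distance first; among equally close words, most frequent first.
    static final Comparator<Candidate> ORDER =
            Comparator.comparingInt((Candidate c) -> c.distance)
                      .thenComparing(Comparator.comparingInt((Candidate c) -> c.frequency).reversed());

    // Returns at most maxCount candidates in suggestion order.
    static List<Candidate> rank(List<Candidate> candidates, int maxCount) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(ORDER);
        return new ArrayList<>(sorted.subList(0, Math.min(maxCount, sorted.size())));
    }
}
```

Without popularity information, giving every candidate the same frequency reduces this to ordering by edit distance alone.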
  
   * First example: the suggestSimilar(misspelled_word, num_list) method.
    The ''num_list'' is the maximum number of words returned.
@@ -39, +39 @@

     //l[0] = "seventy"
   }}}
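One way to picture the n-gram method is as a candidate filter: dictionary words that share n-grams with the misspelled word are collected, and only then ranked by Levenshtein distance and cut to ''num_list'' entries. The sketch below shows just that first stage over an in-memory word list. It assumes a single n-gram size and plain overlap counting (no Lucene index, no separate use of the start/end fields), and `NGramCandidates` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical first stage of a suggester: find dictionary words that share
// n-grams with the misspelled word, most shared grams first, at most maxCandidates.
final class NGramCandidates {
    private NGramCandidates() {}

    // All distinct overlapping n-grams of a word.
    static Set<String> grams(String word, int n) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            result.add(word.substring(i, i + n));
        }
        return result;
    }

    static List<String> candidates(String misspelled, List<String> dictionary, int n, int maxCandidates) {
        Set<String> query = grams(misspelled, n);
        Map<String, Integer> shared = new LinkedHashMap<>();
        for (String word : dictionary) {
            Set<String> common = grams(word, n);
            common.retainAll(query); // n-grams the two words have in common
            if (!common.isEmpty()) {
                shared.put(word, common.size());
            }
        }
        List<String> result = new ArrayList<>(shared.keySet());
        // Most shared n-grams first; the sort is stable, so ties keep dictionary order.
        result.sort((x, y) -> Integer.compare(shared.get(y), shared.get(x)));
        return new ArrayList<>(result.subList(0, Math.min(maxCandidates, result.size())));
    }
}
```

With the word list [seventy, seven, kings, twenty] and n = 3, `candidates("sevnty", list, 3, 2)` returns [seventy, seven]: seventy shares two 3-grams (sev, nty), seven and twenty share one each, and kings shares none.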
  
-  * Second example: the suggestSimilar(misspelled_word, num_list, myIndex_Redear,myField, morePopular)
+  * Second example: the suggestSimilar(misspelled_word, num_list, myIndexReader, myField, morePopular) method.
-  ''''Note'''': if myIndex_reader and myField are null this method is the same as the first method
+  '''Note''': if myIndexReader and myField are null, this method behaves the same as the first method.
  
-   1.   The returned words are restricted only to the words presents in the field ''myField'' of the Lucene Index "myIndex_Reader"
+   1.   The returned words are restricted to the words present in the field ''myField'' of the Lucene index ''myIndexReader''.
-   2.   the list is also sorted with a second criterium: the popularity (the frequence) of the word in the user field
+   2.   The list is also sorted by a second criterion: the popularity (the frequency) of the word in the user field.
-   3.   If ''morePopular'' is true and the mispelled word exist in the user field , return only the words more frequent than this.
+   3.   If ''morePopular'' is true and the misspelled word exists in the user field, only words more frequent than the misspelled word are returned.
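Rules 1 and 3 can be read as a filter applied to the candidate list. A minimal sketch, assuming the term frequencies of the user field are already available as a map (for example, collected from the index beforehand); `FieldRestriction` is hypothetical, and ordering (rule 2) is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical filter for rules 1 and 3. Assumption: fieldFreq maps each
// term of the user field to its frequency in that field.
final class FieldRestriction {
    private FieldRestriction() {}

    static List<String> filter(List<String> candidates, String misspelled,
                               Map<String, Integer> fieldFreq, boolean morePopular) {
        Integer ownFreq = fieldFreq.get(misspelled); // null if the word is not in the field
        List<String> kept = new ArrayList<>();
        for (String word : candidates) {
            Integer freq = fieldFreq.get(word);
            if (freq == null) {
                continue; // rule 1: keep only words present in the field
            }
            if (morePopular && ownFreq != null && freq <= ownFreq) {
                continue; // rule 3: keep only words more frequent than the misspelled one
            }
            kept.add(word);
        }
        return kept;
    }
}
```

If the misspelled word is absent from the field, ''morePopular'' has no effect and only rule 1 applies.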
  
-  See the test case code for example
+  See the test case code for an example.
- 
  
  == Changes ==
  Version 1.1:
   * sort fixed (the sort order was reversed!)
-  * set gram dynamicaly (depending of the length of the word)
+  * set gram size dynamically (depending on the length of the word)
   * use the FuzzyQuery score: ((edit distance)/(length of word))
-  * new Dictionary interface + LuceneDictionary  and PlaintextDictionary implementation
+  * new Dictionary interface + LuceneDictionary and PlaintextDictionary implementations
   * replace addWords method with indexDictionary(Dictionary dic)
-  * add  a new public method: boolean exist(word)
+  * add a new public method: boolean exist(word)
   * add a build.xml
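Two items of this change list can be sketched in a few lines: choosing n-gram sizes from the word length, and normalizing the edit distance by word length. The thresholds in `minGram`/`maxGram` are illustrative assumptions, and the similarity assumes 1 - distance / (length of the shorter word), in the spirit of FuzzyQuery. `SpellHeuristics` is a hypothetical class.

```java
// Hypothetical helpers illustrating two items from the change list.
final class SpellHeuristics {
    private SpellHeuristics() {}

    // Smallest n-gram size for a word of the given length (illustrative thresholds).
    static int minGram(int wordLength) {
        if (wordLength > 5) return 3;
        if (wordLength == 5) return 2;
        return 1;
    }

    // Largest n-gram size for a word of the given length (illustrative thresholds).
    static int maxGram(int wordLength) {
        if (wordLength > 5) return 4;
        if (wordLength == 5) return 3;
        return 2;
    }

    // Length-normalized similarity (assumed formula): 1.0 for identical words,
    // lower (possibly negative) as the edit distance grows relative to the shorter word.
    static float similarity(int editDistance, String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.equals(b) ? 1.0f : 0.0f; // avoid dividing by zero for empty words
        }
        return 1.0f - (float) editDistance / shorter;
    }
}
```

Short words thus get short grams, so that a three-letter word is never looked up only by 3- and 4-grams it barely has.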
  
  == Credits ==