Posted to dev@lucene.apache.org by bu...@apache.org on 2004/10/10 13:47:35 UTC

DO NOT REPLY [Bug 31617] New: - David Spencer Spell Checker improved

http://issues.apache.org/bugzilla/show_bug.cgi?id=31617

David Spencer Spell Checker improved

           Summary: David Spencer Spell Checker improved
           Product: Lucene
           Version: unspecified
          Platform: All
        OS/Version: Other
            Status: NEW
          Severity: Enhancement
          Priority: Other
         Component: Search
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: nicoo_@hotmail.com


Hi,
I developed a SpellChecker based on David Spencer's code (DSc), but more flexible.
The structure of the index is inspired by DSc (shown here for 3- and 4-grams):
word:
gram3:
gram4:
 
3start:
4start:
..
3end:
4end:
..
transposition:
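
As a rough illustration of the field layout above, here is a minimal sketch in plain Java (no Lucene calls) that derives the field values for one dictionary word. The class and method names (NGramFields, fieldsFor) are hypothetical and not taken from the submitted code. The "transposition" field is left out because this message does not describe what it contains.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: NOT the submitted SpellChecker code, only an
// illustration of the word / gramN / Nstart / Nend layout listed above.
final class NGramFields {

    /** Returns field name -> values for one word, for gram sizes min..max. */
    static Map<String, List<String>> fieldsFor(String word, int min, int max) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("word", List.of(word));
        for (int n = min; n <= max; n++) {
            if (word.length() < n) {
                continue; // word too short to contain any n-gram of this size
            }
            List<String> grams = new ArrayList<>();
            for (int i = 0; i + n <= word.length(); i++) {
                grams.add(word.substring(i, i + n));
            }
            fields.put("gram" + n, grams);                               // all n-grams
            fields.put(n + "start", List.of(grams.get(0)));              // first n-gram
            fields.put(n + "end", List.of(grams.get(grams.size() - 1))); // last n-gram
        }
        return fields;
    }
}
```

For example, NGramFields.fieldsFor("lucene", 3, 4) yields gram3 = [luc, uce, cen, ene], 3start = [luc], 3end = [ene], gram4 = [luce, ucen, cene], 4start = [luce] and 4end = [cene].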
 
This index is a dictionary, so there is no "freq" field as in the DSc version.
It is independent of the user index, so we can add words belonging to several
fields of several indexes, for example, or, why not, from a file containing a list of words.
The suggestSimilar method returns a list of suggested words sorted by
Levenshtein distance and, optionally, by the popularity of the word in a specific
field of a user index. In addition, this list can be restricted to words
present in a specific field of a user index.
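
The ordering described above could look roughly like the following sketch in plain Java. It is not the submitted suggestSimilar implementation: the signature, the candidate list, the popularity map (standing in for the frequency of a word in a field of the user index) and the allowed set (standing in for the words present in that field) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: names and signature are assumptions, not the real API.
final class SuggestSketch {

    /** Classic dynamic-programming Levenshtein (edit) distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Sorts candidates by edit distance to word; if popularity is non-null,
     * ties are broken by higher popularity first. If allowed is non-null,
     * only candidates contained in it are kept.
     */
    static List<String> suggestSimilar(String word, List<String> candidates,
                                       Map<String, Integer> popularity,
                                       Set<String> allowed, int max) {
        List<String> result = new ArrayList<>();
        for (String c : candidates) {
            if (allowed == null || allowed.contains(c)) {
                result.add(c);
            }
        }
        Comparator<String> order =
                Comparator.comparingInt((String c) -> levenshtein(word, c));
        if (popularity != null) {
            order = order.thenComparingInt(c -> -popularity.getOrDefault(c, 0));
        }
        result.sort(order);
        return new ArrayList<>(result.subList(0, Math.min(max, result.size())));
    }
}
```

In this sketch, passing null for popularity or allowed turns off the corresponding option, which mirrors the optional behaviour described above. How the real code gathers its candidates from the n-gram index is not shown here.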
 
See the test case.
 
I hope this code will be put in the Lucene sandbox.
 
Nicolas Maisonneuve
