You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by bu...@apache.org on 2004/10/10 13:47:35 UTC
DO NOT REPLY [Bug 31617] New: -
David Spencer Spell Checker improved
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=31617>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=31617
David Spencer Spell Checker improved
Summary: David Spencer Spell Checker improved
Product: Lucene
Version: unspecified
Platform: All
OS/Version: Other
Status: NEW
Severity: Enhancement
Priority: Other
Component: Search
AssignedTo: lucene-dev@jakarta.apache.org
ReportedBy: nicoo_@hotmail.com
hy,
i developed a SpellChecker based on the David Spencer code (DSc) but more flexible.
the structure of the index is inspired of the DSc (for a 3-4 gram):
word:
gram3:
gram4:
3start:
4start:
..
3end:
4end:
..
transposition:
This index is a dictonary so there isn't the "freq" field like with DSc version.
it's independant of the user index. So we can add words becoming to several
fields of several index for example or, why not, to a file with a list of words.
The suggestSimilar method return a list of suggests word sorted by the
Levenshtein distance and optionaly to the popularity of the word for a specific
field in a user index. More of that, this list can be restricted only to words
present in a specific field of a user index.
See the test case.
i hope this code will be put in the lucene sandbox.
Nicolas Maisonneuve
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org