Posted to dev@lucene.apache.org by Daniel Naber <da...@t-online.de> on 2004/08/08 20:16:25 UTC
The fuzziness of FuzzyQuery
Hi,
I think FuzzyQuery is not as useful as it could be, because it's too fuzzy.
For a word with 10 characters it allows an edit distance of 4, i.e. almost
half of the word can be different. I suggest adding an option so that the
fuzziness can be configured, as in the attached patch. If nobody objects,
I will commit it (plus test cases). Later I'll also try to modify
QueryParser to support this, but I cannot promise to get that working.
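
To illustrate the numbers above, here is a minimal sketch. It assumes that
similarity is computed as 1 - editDistance / wordLength and that a term
matches only when its similarity is strictly greater than the threshold;
that formula is an assumption chosen to be consistent with "edit distance
of 4 for 10 characters", not a quote from the patch. The class and method
names are hypothetical.

```java
// Sketch only: assumes similarity = 1 - editDistance / wordLength and that a
// term matches when its similarity is strictly greater than the threshold.
final class FuzzinessSketch {

    /** Largest edit distance that still matches a word of the given length. */
    static int maxEdits(int wordLength, float minimumSimilarity) {
        int max = 0;
        for (int edits = 0; edits <= wordLength; edits++) {
            float similarity = 1.0f - (float) edits / wordLength;
            if (similarity > minimumSimilarity) {
                max = edits;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // With a threshold of 0.5, a 10-character word tolerates 4 edits.
        System.out.println(maxEdits(10, 0.5f));  // prints 4
        // A stricter threshold allows fewer edits.
        System.out.println(maxEdits(10, 0.75f)); // prints 2
    }
}
```

Under these assumptions, raising the threshold from 0.5 to 0.75 cuts the
tolerated edits for a 10-character word from 4 to 2, which is the kind of
control the proposed option would give.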
One thing I don't quite understand is the meaning of scale_factor. Does it
make sense to configure that from outside, too?
Regards
Daniel
--
http://www.danielnaber.de
Re: The fuzziness of FuzzyQuery
Posted by Christoph Goller <go...@detego-software.de>.
Daniel Naber wrote:
> Hi,
>
> I think FuzzyQuery is not as useful as it could be, because it's too fuzzy.
> For a word with 10 characters it allows an edit distance of 4, i.e. almost
> half of the word can be different. I suggest adding an option so that the
> fuzziness can be configured, as in the attached patch. If nobody objects,
> I will commit it (plus test cases). Later I'll also try to modify
> QueryParser to support this, but I cannot promise to get that working.
>
> One thing I don't quite understand is the meaning of scale_factor. Does it
> make sense to configure that from outside, too?
+1 for these changes.
I don't think it makes sense to change scale_factor from outside.
It has to be computed from minimumSimilarity (or FUZZY_THRESHOLD) so that
the difference for exact matches remains 1.0 (used as boost later).
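
A rough sketch of why the scale factor has to follow the threshold. It
assumes the difference is a linear rescaling of the similarity,
(similarity - minimumSimilarity) * scaleFactor, with
scaleFactor = 1 / (1 - minimumSimilarity). Both formulas and all names
here are assumptions for illustration, not taken from the patch.

```java
// Sketch only: assumes difference = (similarity - minimumSimilarity) * scaleFactor
// and scaleFactor = 1 / (1 - minimumSimilarity). Names are hypothetical.
final class ScaleFactorSketch {

    /** Scale factor derived from the threshold, never set independently. */
    static float scaleFactor(float minimumSimilarity) {
        return 1.0f / (1.0f - minimumSimilarity);
    }

    /** Maps similarities in (minimumSimilarity, 1.0] onto (0.0, 1.0]. */
    static float difference(float similarity, float minimumSimilarity) {
        return (similarity - minimumSimilarity) * scaleFactor(minimumSimilarity);
    }

    public static void main(String[] args) {
        // An exact match (similarity 1.0) yields 1.0 for either threshold.
        System.out.println(difference(1.0f, 0.5f));  // prints 1.0
        System.out.println(difference(1.0f, 0.75f)); // prints 1.0
        // A term sitting exactly on the threshold yields 0.0.
        System.out.println(difference(0.5f, 0.5f));  // prints 0.0
    }
}
```

If the scale factor were set separately from the threshold, an exact match
would no longer map to 1.0 under this formula, and the value used as boost
would be skewed.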
Christoph