Posted to dev@lucene.apache.org by Daniel Naber <da...@t-online.de> on 2004/08/08 20:16:25 UTC

The fuzziness of FuzzyQuery

Hi,

I think FuzzyQuery is not as useful as it could be, because it's too fuzzy. 
For a word with 10 characters it allows an edit distance of 4, i.e. almost 
half of the word can be different. I suggest adding an option so that the 
fuzziness can be configured, as in the attached patch. If nobody objects, 
I will commit it (plus test cases). I'll later also try to modify 
QueryParser to support this, but I cannot promise to get that working.
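
To make the arithmetic above concrete, here is a minimal sketch. It is not 
the Lucene source: the class and method names are invented, and it assumes 
that a term matches when 1 - (edit distance / word length) is strictly 
greater than a minimum similarity, with 0.5 standing in for the current 
fixed threshold.

```java
// Hypothetical sketch, not Lucene code: all names here are invented.
final class FuzzinessSketch {

    // Assumed rule: a term matches when
    // 1 - edits / wordLength > minimumSimilarity.
    static boolean matches(int edits, int wordLength, double minimumSimilarity) {
        double similarity = 1.0 - (double) edits / wordLength;
        return similarity > minimumSimilarity;
    }

    // Largest edit distance that still matches for a word of this length.
    // Assumes 0 <= minimumSimilarity < 1.
    static int maxEdits(int wordLength, double minimumSimilarity) {
        int edits = 0;
        while (edits < wordLength
                && matches(edits + 1, wordLength, minimumSimilarity)) {
            edits++;
        }
        return edits;
    }

    public static void main(String[] args) {
        System.out.println(maxEdits(10, 0.5));   // prints 4
        System.out.println(maxEdits(10, 0.75));  // prints 2
    }
}
```

Under that assumption, raising the minimum similarity from 0.5 to 0.75 cuts 
the allowed edit distance for a 10-character word from 4 to 2.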

One thing I don't quite understand is the meaning of scale_factor. Does it 
make sense to configure that from outside, too?

Regards
 Daniel

-- 
http://www.danielnaber.de

Re: The fuzziness of FuzzyQuery

Posted by Christoph Goller <go...@detego-software.de>.
Daniel Naber wrote:
> Hi,
> 
> I think FuzzyQuery is not as useful as it could be, because it's too fuzzy. 
> For a word with 10 characters it allows an edit distance of 4, i.e. almost 
> half of the word can be different. I suggest adding an option so that the 
> fuzziness can be configured, as in the attached patch. If nobody objects, 
> I will commit it (plus test cases). I'll later also try to modify 
> QueryParser to support this, but I cannot promise to get that working.
> 
> One thing I don't quite understand is the meaning of scale_factor. Does it 
> make sense to configure that from outside, too?

+1 for these changes.
I don't think it makes sense to make scale_factor configurable from outside.
It has to be computed from minimumSimilarity (or FUZZY_THRESHOLD) so that
the difference for exact matches remains 1.0 (it is used as a boost later).
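
As a rough sketch of that dependency (invented names, and assuming the
scale factor is 1 / (1 - minimumSimilarity) and the difference is the
similarity above the threshold multiplied by that factor):

```java
// Hypothetical sketch, not Lucene code: all names here are invented.
final class ScaleFactorSketch {

    // Assumed derivation: the factor depends only on the threshold.
    // Requires minimumSimilarity < 1.
    static double scaleFactor(double minimumSimilarity) {
        return 1.0 / (1.0 - minimumSimilarity);
    }

    // Maps a similarity in (minimumSimilarity, 1.0] onto (0.0, 1.0].
    static double difference(double similarity, double minimumSimilarity) {
        return (similarity - minimumSimilarity) * scaleFactor(minimumSimilarity);
    }

    public static void main(String[] args) {
        // An exact match (similarity 1.0) yields 1.0 for either threshold.
        System.out.println(difference(1.0, 0.5));   // prints 1.0
        System.out.println(difference(1.0, 0.75));  // prints 1.0
    }
}
```

If the factor were set independently of the threshold, an exact match
would no longer come out at 1.0.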

Christoph

