Posted to dev@lucene.apache.org by Fabian Vigna <fv...@oceansys.com> on 2013/11/27 15:55:20 UTC

Similarity - No Match

Hello everybody,

I have a pending case involving BANK REFAH. If you enter BANK REFHA, with the
last two letters transposed, the query finds no match at a similarity of 0.6.
It does find a match at a similarity of 0.5.

(REFHA~0.6 AND BANK~0.6)

My question is: why does transposing just the last two letters prevent a
match?

Thanks!

Fabian


Re: Similarity - No Match

Posted by Jack Krupansky <ja...@basetechnology.com>.
The decimal similarity gets translated into a number of characters, based on the term length. For a four-character term it will be 1, 2, 3, or 4, which correspond to 0.25, 0.50, 0.75, or 1.00. Your 0.6 is getting rounded up to 0.75, which means three out of four characters must match. With 0.5, only two out of four characters must match.

(Note: This is not a precise description of fuzzy matching, but close enough to explain the issue here.)
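To make the boundary case concrete, here is a minimal sketch in Python. It assumes the scoring used by the classic (pre-4.0) fuzzy query: similarity = 1 - distance / min(len(query), len(term)), with a term accepted only when its similarity is strictly greater than the requested minimum. That formula and the helper names (levenshtein, legacy_similarity, matches) are assumptions made for illustration, not something stated in this thread, and prefix-length handling is ignored.

```python
from fractions import Fraction


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insert, delete, substitute (no transpositions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def legacy_similarity(query: str, term: str) -> Fraction:
    """Assumed legacy formula; both strings must be non-empty.

    Exact fractions avoid float rounding right at the 0.6 boundary.
    """
    return 1 - Fraction(levenshtein(query, term), min(len(query), len(term)))


def matches(query: str, term: str, minimum: str) -> bool:
    """Assumed acceptance rule: similarity strictly greater than the minimum."""
    return legacy_similarity(query, term) > Fraction(minimum)
```

Under these assumptions, matches("REFHA", "REFAH", "0.6") is false and matches("REFHA", "REFAH", "0.5") is true. Swapping two adjacent letters costs two plain Levenshtein edits, so the similarity of the five-letter terms is exactly 0.6, which is not strictly greater than the minimum.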

Also, decimal similarity for fuzzy queries is deprecated in favor of specifying the edit distance, so you should be using ~1 or ~2; only 0, 1, and 2 are supported.
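The edit-distance form can be illustrated in the same way. The sketch below assumes that a swap of two adjacent characters counts as a single edit (the optimal string alignment variant of Damerau-Levenshtein). Whether a given Lucene version counts transpositions this way is an assumption here and should be checked against its FuzzyQuery documentation. The function name osa_distance is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where an adjacent transposition costs one edit (assumed rule)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or keep
            # Adjacent transposition: "HA" <-> "AH"
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

Under that assumption REFHA is one edit away from REFAH, so a query such as (REFHA~1 AND BANK~1) would be expected to match.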

-- Jack Krupansky
