Posted to issues@commons.apache.org by "Bruno P. Kinoshita (JIRA)" <ji...@apache.org> on 2017/04/05 10:09:41 UTC

[jira] [Resolved] (TEXT-76) Jaro Winkler implementation introduced in 3.5 is not correct

     [ https://issues.apache.org/jira/browse/TEXT-76?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bruno P. Kinoshita resolved TEXT-76.
------------------------------------
    Resolution: Fixed

Fixed by removing the Math.round call and returning the original, unrounded Jaro-Winkler distance.

Jaro-Winkler values often differ only in the later decimal digits. So even with the rounding issue fixed (e.g. by using BigDecimal and rounding with DOWN or FLOOR), we would still have several pairs returning 0.99, whereas the original values let you tell which pairs are closer to each other.

So now we return the original value, as other libraries do (e.g. Python Jellyfish, java-string-similarity).
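
To illustrate the rounding problem, here is a minimal sketch in Java. The class name, method names, and the two raw scores are made up for the example; they are not taken from the library or from its output.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper class, for illustration only.
final class RoundingSketch {

    // Round half-up to two decimals, as a Math.round-based version would.
    static double roundToTwoDecimals(double score) {
        return Math.round(score * 100.0) / 100.0;
    }

    // Truncate to two decimals using BigDecimal with RoundingMode.DOWN.
    static double truncateToTwoDecimals(double score) {
        return BigDecimal.valueOf(score)
                .setScale(2, RoundingMode.DOWN)
                .doubleValue();
    }

    public static void main(String[] args) {
        // Assumed raw scores for two different pairs of unequal strings.
        double[] rawScores = {0.9966, 0.9912};
        for (double raw : rawScores) {
            System.out.println(raw
                    + " -> rounded " + roundToTwoDecimals(raw)
                    + ", truncated " + truncateToTwoDecimals(raw));
        }
    }
}
```

With these inputs, rounding turns the first score into 1.0 even though the strings are not equal, and truncating maps both scores to 0.99, so the two pairs can no longer be told apart. Only the raw values keep that ordering.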

Cheers
Bruno

> Jaro Winkler implementation introduced in 3.5 is not correct
> ------------------------------------------------------------
>
>                 Key: TEXT-76
>                 URL: https://issues.apache.org/jira/browse/TEXT-76
>             Project: Commons Text
>          Issue Type: Bug
>    Affects Versions: 1.0
>            Reporter: Luc Boutier
>            Assignee: Bruno P. Kinoshita
>
> Using commons-lang 3.5, the following call returns a distance of 1:
> StringUtils.getJaroWinklerDistance("/opt/software1", "/opt/software2")
> Under Jaro-Winkler, a distance of 1 means the strings are equal, which is not the case here.


