Posted to issues@commons.apache.org by "M. Steiger (JIRA)" <ji...@apache.org> on 2016/01/13 16:54:39 UTC

[jira] [Created] (LANG-1199) Incorrect implementation of StringUtils.getJaroWinklerDistance()

M. Steiger created LANG-1199:
--------------------------------

             Summary: Incorrect implementation of StringUtils.getJaroWinklerDistance()
                 Key: LANG-1199
                 URL: https://issues.apache.org/jira/browse/LANG-1199
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.*
    Affects Versions: 3.4
            Reporter: M. Steiger


The current implementation of StringUtils.getJaroWinklerDistance() does not compute the correct result in some cases. See #LANG-944 for the initial code contribution.

StringUtils.getJaroWinklerDistance("Haus Ingeborg", "Ingeborg Esser") == 0.0

This is due to the incorrect computation of common characters, which causes the algorithm to exit prematurely.

In contrast, the implementation in Lucene gives ~0.63, which is about right.

    import org.apache.lucene.search.spell.JaroWinklerDistance;

    JaroWinklerDistance d = new JaroWinklerDistance();
    float score = d.getDistance("Haus Ingeborg", "Ingeborg Esser"); // ~0.63

See https://lucene.apache.org/core/3_0_3/api/contrib-spellchecker/org/apache/lucene/search/spell/JaroWinklerDistance.html
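For comparison, here is a minimal sketch of the textbook Jaro-Winkler similarity. It is neither the Commons Lang code nor the Lucene code; the class and method names are hypothetical, and it assumes the commonly used parameters (match window of max(len)/2 - 1, prefix scale 0.1, prefix capped at 4 characters). The point is only that common characters are collected within the match window before any early exit, so the two strings above should not score 0.0. Exact values may differ from Lucene's, which may apply the prefix boost differently.

```java
// Hypothetical reference sketch, not the Commons Lang or Lucene implementation.
final class JaroWinklerSketch {

    // Plain Jaro similarity in [0, 1].
    static double jaro(String s1, String s2) {
        int len1 = s1.length();
        int len2 = s2.length();
        if (len1 == 0 && len2 == 0) {
            return 1.0;
        }
        if (len1 == 0 || len2 == 0) {
            return 0.0;
        }
        // Characters count as common only if they are at most 'window' positions apart.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        // Count common characters that appear in a different order.
        int k = 0;
        int outOfOrder = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) {
                continue;
            }
            while (!matched2[k]) {
                k++;
            }
            if (s1.charAt(i) != s2.charAt(k)) {
                outOfOrder++;
            }
            k++;
        }
        double m = matches;
        double transpositions = outOfOrder / 2.0;
        return (m / len1 + m / len2 + (m - transpositions) / m) / 3.0;
    }

    // Jaro similarity boosted by the length of the common prefix (at most 4 characters).
    static double jaroWinkler(String s1, String s2) {
        double j = jaro(s1, s2);
        int maxPrefix = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < maxPrefix && s1.charAt(prefix) == s2.charAt(prefix)) {
            prefix++;
        }
        return j + prefix * 0.1 * (1.0 - j);
    }

    public static void main(String[] args) {
        System.out.println(jaroWinkler("MARTHA", "MARHTA"));
        System.out.println(jaroWinkler("Haus Ingeborg", "Ingeborg Esser"));
    }
}
```

With this sketch, the blank and the "I" of "Haus Ingeborg" already fall inside the match window of "Ingeborg Esser", so the result is strictly greater than zero.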



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)