Posted to issues@commons.apache.org by "Bruno P. Kinoshita (JIRA)" <ji...@apache.org> on 2014/10/25 03:23:33 UTC

[jira] [Commented] (LANG-591) A more complex Levenshtein distance would be useful

    [ https://issues.apache.org/jira/browse/LANG-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183841#comment-14183841 ] 

Bruno P. Kinoshita commented on LANG-591:
-----------------------------------------

Hello, 

I need to do some data matching for a project, and started out with the Levenshtein distance from StringUtils. I ended up using a mix of code from other projects (SimMetrics, LingPipe, Talend, etc.), and realized there are several other string distance and similarity algorithms (Jaccard, Jaro-Winkler, Damerau-Levenshtein, Bitap, q-gram, etc.).

Are there plans to include these other algorithms in [lang]? IIRC, someone mentioned a commons-text component somewhere. I'm not aware of such a component in the sandbox or attic, but maybe these algorithms would fit there?
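
To give an idea of how small some of these algorithms are, here is a minimal sketch of one of them, Damerau-Levenshtein, in its restricted "optimal string alignment" form (adjacent transpositions count as one edit, but no substring is edited twice). It is not taken from [lang] or from any of the projects above; the class and method names (OsaDistance, distance) are made up for illustration.

```java
/**
 * Hypothetical illustration only: restricted Damerau-Levenshtein
 * (optimal string alignment) distance. Not part of Commons Lang.
 */
final class OsaDistance {

    private OsaDistance() {
    }

    /** Returns the number of insertions, deletions, substitutions and adjacent transpositions needed. */
    static int distance(String a, String b) {
        int n = a.length();
        int m = b.length();
        // d[i][j] = distance between the first i chars of a and the first j chars of b
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i; // delete all i characters
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j; // insert all j characters
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1,   // deletion
                                 d[i][j - 1] + 1),  // insertion
                        d[i - 1][j - 1] + cost);    // match or substitution
                // adjacent transposition, e.g. "ca" -> "ac"
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }
}
```

With this variant, OsaDistance.distance("ca", "ac") counts the swap as a single edit, whereas plain Levenshtein would count two.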

> A more complex Levenshtein distance would be useful
> ---------------------------------------------------
>
>                 Key: LANG-591
>                 URL: https://issues.apache.org/jira/browse/LANG-591
>             Project: Commons Lang
>          Issue Type: New Feature
>          Components: lang.*
>    Affects Versions: 3.0
>            Reporter: Benson Margulies
>             Fix For: Review Patch
>
>         Attachments: LANG-591.patch
>
>
> For some applications, it is necessary to get insert/delete/substitution counts from the distance algorithm. I am attaching a patch that provides this.
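
For readers without the attachment at hand, the request above can be sketched roughly as follows. This is not the code from LANG-591.patch, only a minimal illustration under assumptions: the class name EditCounts and its fields are hypothetical, and when several optimal alignments exist the backtrace simply prefers a match or substitution, then a deletion, then an insertion, so the reported breakdown is one valid choice among possibly several.

```java
/**
 * Hypothetical illustration only: Levenshtein distance broken down into
 * insert/delete/substitute counts. Not the LANG-591 patch.
 */
final class EditCounts {
    final int insertions;
    final int deletions;
    final int substitutions;

    private EditCounts(int insertions, int deletions, int substitutions) {
        this.insertions = insertions;
        this.deletions = deletions;
        this.substitutions = substitutions;
    }

    /** Total edit distance is the sum of the three counts. */
    int distance() {
        return insertions + deletions + substitutions;
    }

    /** Counts the edits that turn source into target. */
    static EditCounts compute(String source, String target) {
        int n = source.length();
        int m = target.length();
        // Standard dynamic-programming table for Levenshtein distance.
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = source.charAt(i - 1) == target.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        // Walk back from the bottom-right corner, tallying each operation.
        int i = n;
        int j = m;
        int ins = 0;
        int del = 0;
        int sub = 0;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0) {
                boolean same = source.charAt(i - 1) == target.charAt(j - 1);
                if (d[i][j] == d[i - 1][j - 1] + (same ? 0 : 1)) {
                    if (!same) {
                        sub++;
                    }
                    i--;
                    j--;
                    continue;
                }
            }
            if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
                del++; // character of source dropped
                i--;
            } else {
                ins++; // character of target added
                j--;
            }
        }
        return new EditCounts(ins, del, sub);
    }
}
```

A caller would use it as EditCounts c = EditCounts.compute("kitten", "sitting"); and then read c.insertions, c.deletions, c.substitutions, or c.distance() for the plain total.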



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)