Posted to issues@commons.apache.org by "Bruno P. Kinoshita (JIRA)" <ji...@apache.org> on 2014/10/25 03:23:33 UTC
[jira] [Commented] (LANG-591) A more complex Levenshtein distance would be useful
[ https://issues.apache.org/jira/browse/LANG-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183841#comment-14183841 ]
Bruno P. Kinoshita commented on LANG-591:
-----------------------------------------
Hello,
I need to do some data matching for a project, and started with the Levenshtein distance from StringUtils. I ended up using a mix of code from other projects (SimMetrics, LingPipe, Talend, etc.), and realized there are several other edit distance and string similarity algorithms (Jaccard, Jaro-Winkler, Damerau-Levenshtein, Bitap, q-gram, etc.).
Are there plans to include these other algorithms in [lang]? IIRC, someone mentioned a commons-text component somewhere. I'm not aware of such a component in the sandbox or attic, but maybe these algorithms could fit there?
> A more complex Levenshtein distance would be useful
> ---------------------------------------------------
>
> Key: LANG-591
> URL: https://issues.apache.org/jira/browse/LANG-591
> Project: Commons Lang
> Issue Type: New Feature
> Components: lang.*
> Affects Versions: 3.0
> Reporter: Benson Margulies
> Fix For: Review Patch
>
> Attachments: LANG-591.patch
>
>
> For some applications, it is necessary to get insert/delete/substitution counts from the distance algorithm. I am attaching a patch that provides this.
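As a rough illustration of what the description above asks for, here is a minimal sketch that fills the usual Levenshtein matrix and then walks it backwards to count each kind of edit. It is not the attached LANG-591.patch, and the names EditCounts, Result and levenshteinCounts are hypothetical, not part of Commons Lang.

```java
// Hypothetical sketch: none of these names exist in Commons Lang.
final class EditCounts {

    /** Simple holder for the per-operation counts. */
    static final class Result {
        final int inserts;
        final int deletes;
        final int substitutions;

        Result(int inserts, int deletes, int substitutions) {
            this.inserts = inserts;
            this.deletes = deletes;
            this.substitutions = substitutions;
        }

        /** The plain Levenshtein distance is the sum of the three counts. */
        int distance() {
            return inserts + deletes + substitutions;
        }
    }

    /** Counts the edits on one optimal path that turns s into t. */
    static Result levenshteinCounts(CharSequence s, CharSequence t) {
        int n = s.length();
        int m = t.length();

        // d[i][j] = distance between the first i chars of s and the first j chars of t
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }

        // Walk back from the bottom-right corner, classifying each step.
        int i = n;
        int j = m;
        int ins = 0;
        int del = 0;
        int sub = 0;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0 && s.charAt(i - 1) == t.charAt(j - 1)
                    && d[i][j] == d[i - 1][j - 1]) {
                i--;            // characters match, no edit
                j--;
            } else if (i > 0 && j > 0 && d[i][j] == d[i - 1][j - 1] + 1) {
                sub++;          // substitution
                i--;
                j--;
            } else if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
                del++;          // delete a character of s
                i--;
            } else {
                ins++;          // insert a character of t
                j--;
            }
        }
        return new Result(ins, del, sub);
    }
}
```

For example, levenshteinCounts("kitten", "sitting") reports 1 insert, 0 deletes and 2 substitutions. When several optimal edit paths exist, the breakdown depends on the tie-breaking order in the walk back (here: substitution, then delete, then insert), and keeping the full matrix costs O(n*m) memory.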
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)