You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by yufcuy <gi...@git.apache.org> on 2016/09/13 09:42:06 UTC

[GitHub] commons-lang pull request #189: new impl of LevenshteinDistance

GitHub user yufcuy opened a pull request:

    https://github.com/apache/commons-lang/pull/189

    new impl of LevenshteinDistance

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yufcuy/commons-lang master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/commons-lang/pull/189.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #189
    
----
commit aa5b88ec1012a738bfd24d21ed91e88b3edd094d
Author: yufcuy <yu...@gmail.com>
Date:   2016-09-13T09:38:00Z

    new impl of LevenshteinDistance

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

Posted by kinow <gi...@git.apache.org>.
Github user kinow commented on the issue:

    https://github.com/apache/commons-lang/pull/189
  
    Though I also agree with @britter that some better description would help reviewing it :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

Posted by kinow <gi...@git.apache.org>.
Github user kinow commented on the issue:

    https://github.com/apache/commons-lang/pull/189
  
    I think string distances were the one of the first things we thought about moving to [text]. Might be interesting to discuss implementing - if it makes sense - this change there.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] commons-lang pull request #189: new impl of LevenshteinDistance

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/commons-lang/pull/189


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

Posted by PascalSchumacher <gi...@git.apache.org>.
Github user PascalSchumacher commented on the issue:

    https://github.com/apache/commons-lang/pull/189
  
    @kinow I think this has to stay in lang. At least until lang 4.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

Posted by britter <gi...@git.apache.org>.
Github user britter commented on the issue:

    https://github.com/apache/commons-lang/pull/189
  
    @kinow can you take the lead in reviewing this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

Posted by yufcuy <gi...@git.apache.org>.
Github user yufcuy commented on the issue:

    https://github.com/apache/commons-lang/pull/189
  
    Hello, @britter @kinow   
    The details of Levenshtein distance can be find at https://en.wikipedia.org/wiki/Levenshtein_distance, as it describe, the algorithm is compute a matrix to hold the Levenshtein distances between all prefixes of the first string and all prefixes of the second. For all i and j, d[i][j] will hold the Levenshtein distance between the first i characters of s and the first j characters of t. 
    The previous impl use two matrix rows for the construction. When we calculate the value of d[i][j] we find only the value of d[i-1][j-1], d[i-1][j] and d[i][j-1] are used according to the algorithm. And we use a matrix row p[] for the construction, when calculate p[j] at the ith iteration, it actually calculate the value of d[i][j], and only d[i-1][j-1] is covered by p[j-1], so we use variable 'upper_left' to hold d[i-1][j-1], and use variable 'upper' to hold next 'upper_left'.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

Posted by britter <gi...@git.apache.org>.
Github user britter commented on the issue:

    https://github.com/apache/commons-lang/pull/189
  
    Hello @yufcuy,
    
    can you provide more information as to why you think this change is necessary? Does it improve performance properties? Do you have benchmarked the new implementation against the old implementation?
    
    Regards,
    Benedikt


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

Posted by PascalSchumacher <gi...@git.apache.org>.
Github user PascalSchumacher commented on the issue:

    https://github.com/apache/commons-lang/pull/189
  
    @yufcuy: Thanks for the pull request. :+1: 
    
    @kinow: Thanks for the review. I have added the link and replaced tabs with spaces after merging. Feel free to change/improve anything.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

Posted by coveralls <gi...@git.apache.org>.
Github user coveralls commented on the issue:

    https://github.com/apache/commons-lang/pull/189
  
    
    [![Coverage Status](https://coveralls.io/builds/7859475/badge)](https://coveralls.io/builds/7859475)
    
    Coverage increased (+0.02%) to 93.592% when pulling **aa5b88ec1012a738bfd24d21ed91e88b3edd094d on yufcuy:master** into **dad86bc0a29689fd29bf03b382a39621718e8b05 on apache:master**.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

Posted by kinow <gi...@git.apache.org>.
Github user kinow commented on the issue:

    https://github.com/apache/commons-lang/pull/189
  
    Ack @PascalSchumacher after @yufcuy 's feedback we can merge it and include in 3.x releases, and then think where, and if, we should move text-related code :-)
    
    Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] commons-lang issue #189: new impl of LevenshteinDistance

Posted by kinow <gi...@git.apache.org>.
Github user kinow commented on the issue:

    https://github.com/apache/commons-lang/pull/189
  
    Hi @yufcuy,
    
    Sorry for the delay to look into this.
    
    I looked at the first two implements this morning to refresh my memory. The first one creating the whole comparison table, and the second one with just the current and previous row. Both described in the Wikipedia page you linked (thanks for that).
    
    Then I started reviewing your pull request, and I believe your implementation is correct :-) though I couldn't find the exact algorithm implementation description on Wikipedia. The best I could find was this page: http://blog.softwx.net/2014/12/optimizing-levenshtein-algorithm-in-c.html
    
    The page mentioned above mentions the single-array approach. We could add a link to it in the Javadoc, as for the previous two implementations. What do you think?
    
    There are other trivial changes regarding spelling, typos, tabs vs. spaces, etc. So I will add a few more comments, but all tests are passing, I see no regression (feature-wise, or in performance), and if you agree with the minor adjustments we may have to do, then I believe we are ready to merge it.
    
    @britter should we keep it in [lang], or in [text]? Either way, I will replicate the change in the Levenshtein implementation in [text] :-)
    
    Cheers
    Bruno


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---