You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by bu...@apache.org on 2006/03/09 19:20:46 UTC

DO NOT REPLY [Bug 38911] New: - Commons Lang StringUtils#getLevenshteinDistance() performance is sub-optimal

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG�
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=38911>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND�
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=38911

           Summary: Commons Lang StringUtils#getLevenshteinDistance()
                    performance is sub-optimal
           Product: Commons
           Version: unspecified
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Lang
        AssignedTo: commons-dev@jakarta.apache.org
        ReportedBy: cedrik.lime@gmail.com


The implementation of Commons Lang StringUtils#getLevenshteinDistance(String,
String) is based on work from <http://www.merriampark.com/ld.htm>. While this
implementation works, it is *very* memory hungry and can thus slow down heavy
computations (GC has much more to collect in memory-constrained environment).
Actual implementation needs x*y byte of memory.

An improved implementation can be found at
<http://www.merriampark.com/ldjava.htm>, which can lead to performance
improvements of up to 3 times (my own internal benchmarks in low-memory
situation). This new implementation needs x+y bytes of memory.

Please change the getLevenshteinDistance() implementation to use the one at
<http://www.merriampark.com/ldjava.htm>.

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


DO NOT REPLY [Bug 38911] - [lang] StringUtils#getLevenshteinDistance() performance is sub-optimal

Posted by bu...@apache.org.
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG�
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=38911>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND�
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=38911


ggregory@seagullsw.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Commons Lang                |[lang]
                   |StringUtils#getLevenshteinDi|StringUtils#getLevenshteinDi
                   |stance() performance is sub-|stance() performance is sub-
                   |optimal                     |optimal




------- Additional Comments From ggregory@seagullsw.com  2006-03-09 18:32 -------
C�drik: Please feel free to provide a diff patch.

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


DO NOT REPLY [Bug 38911] - [lang] StringUtils#getLevenshteinDistance() performance is sub-optimal

Posted by bu...@apache.org.
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG�
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=38911>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND�
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=38911


bayard@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |2.2




-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


DO NOT REPLY [Bug 38911] - [lang] StringUtils#getLevenshteinDistance() performance is sub-optimal

Posted by bu...@apache.org.
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG�
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=38911>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND�
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=38911


bayard@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From bayard@apache.org  2006-03-14 06:08 -------
Code applied. Unit tests pass. 

I'll drop an email to both Chas Emerick and Michael Gilleland to let them know
the code has finally gone in - my fault two and a half years ago.

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


DO NOT REPLY [Bug 38911] - [lang] StringUtils#getLevenshteinDistance() performance is sub-optimal

Posted by bu...@apache.org.
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG�
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=38911>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND�
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=38911





------- Additional Comments From bayard@apache.org  2006-03-14 05:59 -------

Here's the history:
http://mail-archives.apache.org/mod_mbox/jakarta-commons-dev/200310.mbox/%3CA34BA4886ADBBD4A804452178366659B160B14@solidus-fs1.solidusnetworks.com%3E

Not sure why it didn't get applied - either I dropped the ball, or I was waiting
for a second reply.

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


DO NOT REPLY [Bug 38911] - [lang] StringUtils#getLevenshteinDistance() performance is sub-optimal

Posted by bu...@apache.org.
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG�
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=38911>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND�
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=38911





------- Additional Comments From bayard@apache.org  2006-03-14 06:24 -------
Sending        src/java/org/apache/commons/lang/StringUtils.java
Transmitting file data .
Committed revision 385745. 

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


DO NOT REPLY [Bug 38911] - [lang] StringUtils#getLevenshteinDistance() performance is sub-optimal

Posted by bu...@apache.org.
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG�
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=38911>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND�
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=38911





------- Additional Comments From cedrik.lime@gmail.com  2006-03-09 21:02 -------
Created an attachment (id=17859)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=17859&action=view)
Source code for new implementation

In case I didn't get the pach file right... :)

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


DO NOT REPLY [Bug 38911] - [lang] StringUtils#getLevenshteinDistance() performance is sub-optimal

Posted by bu...@apache.org.
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG�
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=38911>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND�
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=38911





------- Additional Comments From cedrik.lime@gmail.com  2006-03-09 21:01 -------
Created an attachment (id=17858)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=17858&action=view)
patch file

Diff between original StringUtils.java and my patched one.

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org