Posted to issues@commons.apache.org by "Andrea Vacondio (JIRA)" <ji...@apache.org> on 2011/03/02 11:57:36 UTC

[jira] Created: (LANG-680) StringUtils - Longest Common Substring / Longest common subsequence

StringUtils - Longest Common Substring / Longest common subsequence
-------------------------------------------------------------------

                 Key: LANG-680
                 URL: https://issues.apache.org/jira/browse/LANG-680
             Project: Commons Lang
          Issue Type: New Feature
          Components: lang.*
            Reporter: Andrea Vacondio


I recently needed to compute the longest common substring of a collection of filenames, and I think it would be useful to have this in StringUtils (I couldn't find any earlier discussion about it).
Some details here:
http://en.wikipedia.org/wiki/Longest_common_substring
and here:
http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (LANG-680) StringUtils - Longest Common Substring / Longest common subsequence

Posted by "Thomas Neidhart (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LANG-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229742#comment-13229742 ] 

Thomas Neidhart commented on LANG-680:
--------------------------------------

Work on this issue has started; please see https://github.com/netomi/suffixtree for more information.

The code is already functional, though some things are still missing:

 * better test coverage
 * API review
 * javadoc
                
> StringUtils - Longest Common Substring / Longest common subsequence
> -------------------------------------------------------------------
>
>                 Key: LANG-680
>                 URL: https://issues.apache.org/jira/browse/LANG-680
>             Project: Commons Lang
>          Issue Type: New Feature
>          Components: lang.*
>            Reporter: Andrea Vacondio
>              Labels: LCS, Longest, common, substring
>             Fix For: 3.x
>
>
> I recently needed to compute the longest common substring of a collection of filenames, and I think it would be useful to have this in StringUtils (I couldn't find any earlier discussion about it).
> Some details here:
> http://en.wikipedia.org/wiki/Longest_common_substring
> and here:
> http://en.wikipedia.org/wiki/Longest_common_subsequence_problem


        

[jira] [Commented] (LANG-680) StringUtils - Longest Common Substring / Longest common subsequence

Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LANG-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068858#comment-13068858 ] 

Henri Yandell commented on LANG-680:
------------------------------------

Digging into this, the easy API would be:

  public static CharSequence lcs(CharSequence, CharSequence)

This would return one of the longest common substrings, either the first or the last one found.

Variants would be to return CharSequence[], i.e. all of them, and to implement this for N strings, leading us to:

  public static CharSequence[] lcs(CharSequence...)
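
To make the contract of the N-string variant concrete, here is a minimal brute-force sketch. It is not taken from any patch on this issue: the class name LcsVarargsSketch, the choice to return an empty array when nothing is shared, and the assumption that no input element is null are all hypothetical. It only illustrates what "all longest common substrings of N strings" means; an efficient implementation would likely look quite different.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class LcsVarargsSketch {

    /**
     * Hypothetical brute-force take on lcs(CharSequence...): returns every
     * distinct longest substring shared by all inputs, or an empty array
     * if the inputs share no non-empty substring. Assumes no element is null.
     */
    public static CharSequence[] lcs(CharSequence... inputs) {
        if (inputs == null || inputs.length == 0) {
            return new CharSequence[0];
        }
        // Candidates come from the shortest input: nothing longer can be common to all.
        String shortest = inputs[0].toString();
        for (CharSequence cs : inputs) {
            if (cs.length() < shortest.length()) {
                shortest = cs.toString();
            }
        }
        // Try candidate lengths from longest to shortest and stop at the first length with a hit.
        for (int len = shortest.length(); len > 0; len--) {
            Set<String> found = new LinkedHashSet<>();
            for (int start = 0; start + len <= shortest.length(); start++) {
                String candidate = shortest.substring(start, start + len);
                if (containedInAll(candidate, inputs)) {
                    found.add(candidate);
                }
            }
            if (!found.isEmpty()) {
                return found.toArray(new CharSequence[0]);
            }
        }
        return new CharSequence[0];
    }

    /** True if every input contains the candidate as a contiguous substring. */
    private static boolean containedInAll(String candidate, CharSequence[] inputs) {
        for (CharSequence cs : inputs) {
            if (!cs.toString().contains(candidate)) {
                return false;
            }
        }
        return true;
    }
}
```

For the filename use case from the issue description, a call such as LcsVarargsSketch.lcs("report_2011.txt", "report_2012.txt", "report_2010.csv") yields a single element, "report_201".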



        

[jira] [Updated] (LANG-680) StringUtils - Longest Common Substring / Longest common subsequence

Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LANG-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart updated LANG-680:
---------------------------------

    Attachment: LANG-680.patch

The attached patch implements the longest common substring methods using the dynamic programming algorithm.

The interface is the same as outlined by Henri. The variant taking an array of strings is omitted so far, as it would require a suffix tree data structure to compute the lcs.

For now, the lcsAll method returns *all* longest common substrings found, and returns null if there are none (i.e. the lcs would be the empty string). This could be changed to return an empty array, but I found null more suitable.
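
The behaviour described above can be sketched with the textbook dynamic programming approach for two strings. This is not the code from LANG-680.patch: the class name LcsDpSketch is hypothetical, the method names follow the comments in this thread, and returning null from the single-result lcs method (and for null arguments) is an assumption made for the sketch.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class LcsDpSketch {

    /**
     * Returns all distinct longest common substrings of a and b, or null
     * if the two share no non-empty substring.
     */
    public static CharSequence[] lcsAll(CharSequence a, CharSequence b) {
        if (a == null || b == null) {
            return null; // assumption: null input is treated like "nothing in common"
        }
        // prev[j] and curr[j] hold the length of the longest common suffix of
        // a[0..i) and b[0..j) for the previous and the current row i.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        int best = 0;
        Set<String> results = new LinkedHashSet<>();

        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    curr[j] = prev[j - 1] + 1;
                    if (curr[j] > best) {
                        // A strictly longer match invalidates everything collected so far.
                        best = curr[j];
                        results.clear();
                    }
                    if (curr[j] == best) {
                        results.add(a.subSequence(i - best, i).toString());
                    }
                } else {
                    curr[j] = 0;
                }
            }
            // Reuse the two rows instead of allocating a full table.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return results.isEmpty() ? null : results.toArray(new CharSequence[0]);
    }

    /** Returns the first longest common substring found, or null if there is none. */
    public static CharSequence lcs(CharSequence a, CharSequence b) {
        CharSequence[] all = lcsAll(a, b);
        return all == null ? null : all[0];
    }
}
```

With this sketch, lcsAll("ABAB", "BABA") returns "ABA" and "BAB", while lcsAll("abc", "xyz") returns null. The running time is proportional to the product of the two input lengths, and only two rows of the table are kept in memory.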
                


[jira] Updated: (LANG-680) StringUtils - Longest Common Substring / Longest common subsequence

Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LANG-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell updated LANG-680:
-------------------------------

    Fix Version/s: 3.1

Seems like a fair feature. Setting to 3.1; it needs someone to contribute a patch with an implementation and unit tests.

