Posted to dev@commons.apache.org by "Michael Dang (JIRA)" <ji...@apache.org> on 2006/07/04 02:18:29 UTC

[jira] Created: (LANG-269) Shouldn't Commons Lang's StringUtils have a "common" string method?

Shouldn't Commons Lang's StringUtils have a "common" string method?
-------------------------------------------------------------------

         Key: LANG-269
         URL: http://issues.apache.org/jira/browse/LANG-269
     Project: Commons Lang
        Type: New Feature

 Environment: generic
    Reporter: Michael Dang


A method which accepts a string array and returns the portion common to all of the strings, starting from the left, or a more general form of that.

This is a very common operation.  For example, one may want to find the common directory of a set of path strings.

passing in:

"/foo/f1.txt"
"/foo/moo/f2.txt"
"/foo/moo/f3.txt"

should return "/foo/"

It is tedious to implement this in every project, and I think Commons Lang StringUtils should have some methods to help with this.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


[jira] Commented: (LANG-269) Shouldn't Commons Lang's StringUtils have a "common" string method?

Posted by "Michael Dang (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/LANG-269?page=comments#action_12420180 ] 

Michael Dang commented on LANG-269:
-----------------------------------

This looks great!  Thanks Scott!

BTW, should we add getCommonSuffix as well, or is that too much?

--Michael


[jira] Updated: (LANG-269) Shouldn't Commons Lang's StringUtils have a "common" string method?

Posted by "Scott Johnson (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LANG-269?page=all ]

Scott Johnson updated LANG-269:
-------------------------------

    Attachment: StringUtils.patch

Interesting problem...

Here's a patch that adds two new methods to StringUtils. 

The first is a new method indexOfDifference(String[]).  It's very similar to indexOfDifference(String,String) except it operates on an array of Strings.  (It actually uses indexOfDifference(String, String) internally.)  

The second method is called getCommonPrefix (String[]) which does what Michael requested.  It calls indexOfDifference(String[]) to find the position of the first difference, then returns the beginning character sequence that is shared between all of the strings.

Both methods handle null arrays, null array entries and empty Strings.

Let me know what you think.
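
For readers without the attachment, here is a minimal sketch of the approach described above. It is not the attached patch: the class name is made up, and the null handling (a null array yields -1 or "", a null entry is treated like an empty String) is only an assumption about what "handle" means here. It picks the shortest entry as the reference and compares every entry against it with the two-argument method.

```java
// Hypothetical sketch, not the attached patch. Names and null handling are assumptions.
class PairwisePrefixSketch {

    // Index of the first differing character, or -1 if the two strings are equal.
    // A null argument is treated like the empty string (assumption).
    static int indexOfDifference(String a, String b) {
        String x = (a == null) ? "" : a;
        String y = (b == null) ? "" : b;
        int limit = Math.min(x.length(), y.length());
        for (int i = 0; i < limit; i++) {
            if (x.charAt(i) != y.charAt(i)) {
                return i;
            }
        }
        // One string is a prefix of the other: they differ where the shorter one ends.
        return (x.length() == y.length()) ? -1 : limit;
    }

    // Index of the first position at which the strings are not all equal, or -1.
    static int indexOfDifference(String[] strs) {
        if (strs == null || strs.length <= 1) {
            return -1;
        }
        // Use the shortest entry as the reference string.
        String shortest = (strs[0] == null) ? "" : strs[0];
        for (String s : strs) {
            String t = (s == null) ? "" : s;
            if (t.length() < shortest.length()) {
                shortest = t;
            }
        }
        // The answer is the smallest pairwise difference index against the reference.
        int result = -1;
        for (String s : strs) {
            int d = indexOfDifference(shortest, s);
            if (d >= 0 && (result < 0 || d < result)) {
                result = d;
            }
        }
        return result;
    }

    // Leading character sequence shared by all strings ("" for a null or empty array).
    static String getCommonPrefix(String[] strs) {
        if (strs == null || strs.length == 0) {
            return "";
        }
        int d = indexOfDifference(strs);
        String first = (strs[0] == null) ? "" : strs[0];
        return (d < 0) ? first : first.substring(0, d);
    }

    public static void main(String[] args) {
        String[] paths = {"/foo/f1.txt", "/foo/moo/f2.txt", "/foo/moo/f3.txt"};
        System.out.println(getCommonPrefix(paths)); // prints "/foo/"
    }
}
```

With the three paths from the issue description this returns "/foo/", because the strings first differ at index 5.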


[jira] Commented: (LANG-269) Shouldn't Commons Lang's StringUtils have a "common" string method?

Posted by "Michael Dang (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/LANG-269?page=comments#action_12419905 ] 

Michael Dang commented on LANG-269:
-----------------------------------

Scott, I think this is a very elegant approach.  I like the method name getCommonPrefix and the idea of adding the indexOfDifference(String[]) method.  However, I would like to add one comment about indexOfDifference(String[]) calling indexOfDifference(String, String).  From a performance angle, this may not be an efficient implementation.

For example, if most of the strings are long and nearly identical but the last string is very short (or null), all of the earlier comparisons are wasted.  Although it is very natural to call indexOfDifference(String, String) from within indexOfDifference(String[]), I think it would perform better if we scanned through the strings vertically, char by char, and short-circuited as soon as the first differing char is found.

Feel free to disagree; I may be worrying about performance too much.  :)

--Michael
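
A rough sketch of the vertical scan suggested above, for illustration only. The class name and the null handling (a null entry counts as length 0) are assumptions, and this is not code from any attached patch. It walks the strings column by column and returns at the first column where any string disagrees with the first one.

```java
// Hypothetical sketch of the column-by-column scan. Names and null handling are assumptions.
class VerticalScanSketch {

    // Index of the first position at which the strings are not all equal, or -1.
    static int indexOfDifference(String[] strs) {
        if (strs == null || strs.length <= 1) {
            return -1;
        }
        // Find the shortest and longest lengths; a null entry counts as length 0.
        int shortest = Integer.MAX_VALUE;
        int longest = 0;
        for (String s : strs) {
            int len = (s == null) ? 0 : s.length();
            shortest = Math.min(shortest, len);
            longest = Math.max(longest, len);
        }
        // Scan vertically: column 0 of every string, then column 1, and so on.
        // If any entry is null, shortest is 0 and this loop never runs.
        for (int i = 0; i < shortest; i++) {
            char c = strs[0].charAt(i);
            for (int j = 1; j < strs.length; j++) {
                if (strs[j].charAt(i) != c) {
                    return i; // short-circuit at the first differing char
                }
            }
        }
        // All strings agree up to the shortest length; they differ only if lengths differ.
        return (shortest == longest) ? -1 : shortest;
    }

    // Leading character sequence shared by all strings ("" for a null or empty array).
    static String getCommonPrefix(String[] strs) {
        if (strs == null || strs.length == 0) {
            return "";
        }
        int d = indexOfDifference(strs);
        String first = (strs[0] == null) ? "" : strs[0];
        return (d < 0) ? first : first.substring(0, d);
    }

    public static void main(String[] args) {
        String[] paths = {"/foo/f1.txt", "/foo/moo/f2.txt", "/foo/moo/f3.txt"};
        System.out.println(getCommonPrefix(paths)); // prints "/foo/"
    }
}
```

The work done is bounded by the position of the first difference times the number of strings, regardless of how long the individual strings are.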


[jira] Commented: (LANG-269) Shouldn't Commons Lang's StringUtils have a "common" string method?

Posted by "Scott Johnson (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/LANG-269?page=comments#action_12419937 ] 

Scott Johnson commented on LANG-269:
------------------------------------

I've attached an updated patch reflecting your idea, Michael.  It will definitely improve performance.

I was also able to eliminate the String.equals() calls in the first loop, which will improve performance further.

Thanks again,
Scott



[jira] Commented: (LANG-269) Shouldn't Commons Lang's StringUtils have a "common" string method?

Posted by "Scott Johnson (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/LANG-269?page=comments#action_12420473 ] 

Scott Johnson commented on LANG-269:
------------------------------------

I thought about that when I was naming the methods.  But I wasn't sure a suffix method would be useful.  

If it will provide value, I can create one.  

Scott




[jira] Updated: (LANG-269) Shouldn't Commons Lang's StringUtils have a "common" string method?

Posted by "Scott Johnson (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LANG-269?page=all ]

Scott Johnson updated LANG-269:
-------------------------------

    Attachment: v2_StringUtilsTest.java.patch


[jira] Commented: (LANG-269) Shouldn't Commons Lang's StringUtils have a "common" string method?

Posted by "Scott Johnson (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/LANG-269?page=comments#action_12419930 ] 

Scott Johnson commented on LANG-269:
------------------------------------

Thank you, Michael.

The indexOfDifference(String[]) method initially scans the String array for the shortest String and uses that in the comparisons.  So I don't think there is a performance issue in the scenario where there are a bunch of long strings with a final short string.  In fact, the algorithm should be pretty efficient because it will compare only a few characters of the long strings.

But your idea of comparing the first character in each string, then the second, third, etc., is more efficient in some scenarios, for example when there are a large number of long, similar strings with a few very different strings mixed in.  Your algorithm will reduce the number of comparisons in that case.

When I get some time this afternoon, I'll give that a try.

Thanks for the suggestion.

Scott
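
To make the trade-off above concrete, here is a small sketch that counts character comparisons for both strategies. Everything in it is illustrative: the class and method names are made up, the input is assumed to be a non-empty array of non-null strings, and the counting loops are simplified stand-ins rather than the code from either patch.

```java
// Hypothetical sketch for counting comparisons. Assumes a non-empty array of non-null strings.
class ComparisonCountSketch {

    // Comparisons made when every entry is checked against the shortest entry.
    static long referenceStringComparisons(String[] strs) {
        String ref = strs[0];
        for (String s : strs) {
            if (s.length() < ref.length()) {
                ref = s;
            }
        }
        long count = 0;
        for (String s : strs) {
            // ref is the shortest entry, so i is always a valid index into s.
            for (int i = 0; i < ref.length(); i++) {
                count++;
                if (ref.charAt(i) != s.charAt(i)) {
                    break; // this pair differs here; move on to the next entry
                }
            }
        }
        return count;
    }

    // Comparisons made when scanning column by column and stopping at the first mismatch.
    static long verticalComparisons(String[] strs) {
        int shortest = strs[0].length();
        for (String s : strs) {
            shortest = Math.min(shortest, s.length());
        }
        long count = 0;
        for (int i = 0; i < shortest; i++) {
            char c = strs[0].charAt(i);
            for (int j = 1; j < strs.length; j++) {
                count++;
                if (strs[j].charAt(i) != c) {
                    return count; // first differing column found; stop everything
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Two long, identical strings with one very different string mixed in.
        String[] strs = {"aaaaaaaaaa", "aaaaaaaaaa", "zzzzzzzzzz"};
        System.out.println("reference-string: " + referenceStringComparisons(strs));
        System.out.println("vertical scan:    " + verticalComparisons(strs));
    }
}
```

On inputs like the one in main, the vertical scan stops in the first column, while the reference-string strategy still walks the full length of every similar string.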



[jira] Updated: (LANG-269) Shouldn't Commons Lang's StringUtils have a "common" string method?

Posted by "Henri Yandell (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LANG-269?page=all ]

Henri Yandell updated LANG-269:
-------------------------------

    Fix Version: 3.0

Sounds attractive. It still needs to be coded, so I've put it in 3.0 for the moment. Patches with unit tests are welcome so that it can go into an earlier release.


[jira] Updated: (LANG-269) Shouldn't Commons Lang's StringUtils have a "common" string method?

Posted by "Scott Johnson (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/LANG-269?page=all ]

Scott Johnson updated LANG-269:
-------------------------------

    Attachment: v2_StringUtils.java.patch
