Posted to dev@commons.apache.org by Al Chou <ho...@yahoo.com> on 2003/11/20 07:21:33 UTC

[lang] unexpected StringUtils.split behavior (was RE: suggestion for new StringUtils.method)

I guess my previous post got lost in the noise, so I'm reposting.  I have two
new StringUtils.split methods that can split a string at occurrences of a
substring rather than splitting at the individual characters in the specified
delimiter string.
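
For illustration only, here is a minimal sketch of the distinction described
above. It is not the proposed patch: the class and method names
(WholeSeparatorSplitSketch, splitOnSubstring, splitOnAnyChar) are hypothetical,
and only JDK classes are used. The assumption that empty pieces are dropped is
also mine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class WholeSeparatorSplitSketch {

    // Hypothetical helper: splits at each occurrence of the whole separator
    // string. Empty pieces are dropped (an assumption made for this sketch).
    public static String[] splitOnSubstring(String str, String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<String> pieces = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = str.indexOf(separator, start)) >= 0) {
            if (hit > start) {
                pieces.add(str.substring(start, hit));
            }
            start = hit + separator.length();
        }
        if (start < str.length()) {
            pieces.add(str.substring(start));
        }
        return pieces.toArray(new String[0]);
    }

    // Character-set splitting, as StringTokenizer does it: every character in
    // the delimiter string acts as a delimiter on its own.
    public static String[] splitOnAnyChar(String str, String delimiterChars) {
        StringTokenizer tokenizer = new StringTokenizer(str, delimiterChars);
        List<String> pieces = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            pieces.add(tokenizer.nextToken());
        }
        return pieces.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = "a-b--c-d";
        // Splits only where the two-character string "--" occurs: [a-b, c-d]
        System.out.println(Arrays.toString(splitOnSubstring(input, "--")));
        // Splits at every '-' character: [a, b, c, d]
        System.out.println(Arrays.toString(splitOnAnyChar(input, "--")));
    }
}
```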

While testing, I discovered that my expectations for the behavior of the
split( *, ..., int max ) methods didn't match their actual behavior.  I
expected to get a maximum of "max" substrings, all of which were delimited in
the parent string by the specified delimiters.  Instead, what you get is
"max - 1" such substrings, plus the rest of the parent string as the final
result substring.  This behavior seems counter to what StringTokenizer would
do, which is surprising, given the Javadoc comments about using the split
methods as alternatives to StringTokenizer.
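
The two readings of "max" can be sketched with JDK classes alone. This is a
hedged illustration, not the StringUtils code: the class and method names are
hypothetical, and String.split( regex, limit ) stands in for the observed
"max - 1 plus remainder" behavior only because it happens to follow the same
rule for a positive limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class SplitMaxSemanticsSketch {

    // Expected reading: return at most max tokens and discard whatever
    // follows them. Hypothetical helper built on StringTokenizer.
    public static String[] firstTokens(String str, int max) {
        // The single-argument constructor splits on whitespace characters.
        StringTokenizer tokenizer = new StringTokenizer(str);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens() && tokens.size() < max) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens.toArray(new String[0]);
    }

    // Observed reading: max - 1 tokens, then the unsplit remainder as the
    // last element. Assumes max > 0 and no leading whitespace in str.
    public static String[] tokensPlusRemainder(String str, int max) {
        return str.split("\\s+", max);
    }

    public static void main(String[] args) {
        String input = "ab   de fg";
        System.out.println(Arrays.toString(firstTokens(input, 2)));         // [ab, de]
        System.out.println(Arrays.toString(tokensPlusRemainder(input, 2))); // [ab, de fg]
    }
}
```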

Currently, my tests reflect my expectations for the behavior, and I modified
the existing split( String, String, int ) method to match my expectations.  I
didn't want to submit such a change as a proposed patch without first getting
feedback from the community about whether my expectations are wrong.  I am
happy to submit only code that does not change the behavior of the existing
methods, if need be.


Al


--- Al Chou <ho...@yahoo.com> wrote:
> This thread is a good entree for my question.  I was adding a new
> StringUtils.split method that can split a string using a whole string as the
> delimiter, rather than the characters within that string.  In running my
> JUnit tests, I discovered unexpected behavior in the existing method:
> 
> String stringToSplitOnNulls = "ab   de fg" ;
> String[] splitOnNullExpectedResults = { "ab", "de" } ;
> 
> // A null separator makes StringUtils.split split on whitespace.
> String[] splitOnNullResults =
>     StringUtils.split( stringToSplitOnNulls, null, 2 ) ;
> assertEquals( splitOnNullExpectedResults.length, splitOnNullResults.length ) ;
> for ( int i = 0 ; i < splitOnNullExpectedResults.length ; i += 1 )
> {
>     assertEquals( splitOnNullExpectedResults[i], splitOnNullResults[i] ) ;
> }
> 
> 
> The result of the split call is
> 
> "ab", "de fg"
> 
> and it doesn't look to me like StringTokenizer's documentation implies this
> behavior....
> 
> 
> Al
> 
> =====
> Albert Davidson Chou
> 
>     Get answers to Mac questions at http://www.Mac-Mgrs.org/ .

=====
Albert Davidson Chou

    Get answers to Mac questions at http://www.Mac-Mgrs.org/ .

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [lang] unexpected StringUtils.split behavior (was RE: suggestion for new StringUtils.method)

Posted by Al Chou <ho...@yahoo.com>.
Hi, Phil,

I thought no one would ever ask, and I was sitting here modifying my code to
conform to the existing tests so that I could at least submit my two new
methods.  I'll open a ticket with a patch for those, then open a ticket for my
proposed change in behavior, plus patches to split( String, String, int ) and
my split( String, String, boolean, int ) to implement the change.

Thanks,
Al


--- Phil Steitz <ph...@steitz.com> wrote:
> Al Chou wrote:
> 
> > 
> > While testing, I discovered that my expectations for the behavior of the
> > split( *, ..., int max ) methods didn't match their actual behavior.  I
> > expected to get a maximum of "max" substrings, all of which were delimited
> > in the parent string by the specified delimiters.  Instead, what you get is
> > "max - 1" such substrings, plus the rest of the parent string as the final
> > result substring.  This behavior seems counter to what StringTokenizer
> > would do, which is surprising, given the Javadoc comments about using the
> > split methods as alternatives to StringTokenizer.
> > 
> 
> Can you open a Bugzilla ticket and attach a test case that shows the 
> problem and a patch that shows how you think it should be fixed?
> 
> Phil



Re: [lang] unexpected StringUtils.split behavior (was RE: suggestion for new StringUtils.method)

Posted by Phil Steitz <ph...@steitz.com>.
Al Chou wrote:

> 
> While testing, I discovered that my expectations for the behavior of the split(
> *, ..., int max ) methods didn't match their actual behavior.  I expected to
> get a maximum of "max" substrings, all of which were delimited in the parent
> string by the specified delimiters.  Instead, what you get is "max - 1" such
> substrings, plus the rest of the parent string as the final result substring.
> This behavior seems counter to what StringTokenizer would do, which is
> surprising, given the Javadoc comments about using the split methods as
> alternatives to StringTokenizer.
> 

Can you open a Bugzilla ticket and attach a test case that shows the 
problem and a patch that shows how you think it should be fixed?

Phil

