Posted to dev@commons.apache.org by Alex Chaffee / Purple Technology <gu...@stinky.com> on 2003/03/05 19:25:58 UTC

[lang] StringUtils Questions and Suggestions

In reviewing StringUtils in preparing to integrate my Purpletech code,
I discovered some inconsistencies and came up with the following
questions and suggestions:

* Rename overlayString to overlay (to be consistent with other method
  names, and more concise)

* It would be great if many of the methods could be written to work on a
  PrintWriter as well as just creating and returning a string

* Um, what's the difference between mid() and substring()?

* And why isn't strip*() called trim*()?

* chomp() and chop() have slightly (but significantly!) different
  semantics than in Perl.  It would be great if StringUtils behaved in
  line with expectations.

Later -

 - Alex


-- 
Alex Chaffee                               mailto:alex@jguru.com
Purple Technology - Code and Consulting    http://www.purpletech.com/
jGuru - Java News and FAQs                 http://www.jguru.com/alex/
Gamelan - the Original Java site           http://www.gamelan.com/
Stinky - Art and Angst                     http://www.stinky.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [lang] Mid - StringUtils Questions and Suggestions

Posted by Stephen Colebourne <sc...@btopenworld.com>.
> * Um, what's the difference between mid() and substring()?

mid (and left & right) are BASIC inspired methods. They differ from
substring in that
- they use length
- they do not throw index out of bounds exceptions
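
A minimal Java sketch of the two differences listed above. The class
MidSketch and its clamping rules are assumptions for illustration, not
the actual Commons Lang source:

```java
// Hypothetical sketch: length-based, non-throwing variants built on substring().
final class MidSketch {
    private MidSketch() {}

    /** Returns up to len characters of str starting at pos, clamping instead of throwing. */
    static String mid(String str, int pos, int len) {
        if (str == null) {
            return null;
        }
        if (len <= 0 || pos >= str.length()) {
            return "";
        }
        int start = Math.max(pos, 0);
        // Use long arithmetic so a huge len cannot overflow before clamping.
        int end = (int) Math.min((long) start + len, str.length());
        return str.substring(start, end);
    }

    /** Leftmost len characters, or the whole string if it is shorter. */
    static String left(String str, int len) {
        return mid(str, 0, len);
    }

    /** Rightmost len characters, or the whole string if it is shorter. */
    static String right(String str, int len) {
        if (str == null) {
            return null;
        }
        if (len <= 0) {
            return "";
        }
        return str.substring(Math.max(str.length() - len, 0));
    }

    public static void main(String[] args) {
        System.out.println(mid("abcdef", 2, 3));  // cde
        System.out.println(mid("abcdef", 4, 10)); // ef
        // By contrast, "abcdef".substring(4, 10) throws StringIndexOutOfBoundsException.
    }
}
```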

Stephen




Re: [lang] chop and chomp

Posted by Morgan Delagrange <md...@yahoo.com>.
--- Alex Chaffee / Purple Technology <gu...@stinky.com> wrote:
> On Wed, Mar 05, 2003 at 12:59:28PM -0800, Morgan Delagrange wrote:
> > 
> > --- Alex Chaffee / Purple Technology <gu...@stinky.com> wrote:
> > > 
> > > Perl:
> > > 
> > > chop removes the final character, no matter what it is
> > > 
> > > chomp removes the final character if and only if it's a newline
> > > (or, technically, the $INPUT_RECORD_SEPARATOR).
> > > 
> > 
> > Technically, that's incorrect.  Perl's chomp command
> > deletes all consecutive substrings matching the
> > $INPUT_RECORD_SEPARATOR from the end of the string. 
> 
> I admit I'm always confused about these details, so let's ask Perl:
> 
> [alex@meat jakarta-commons]$ perl -e '$x = "foo"; chomp($x); print $x;'
> foo[alex@meat jakarta-commons]$ perl -e '$x = "foo\n"; chomp($x); print $x;'
> foo[alex@meat jakarta-commons]$ perl -e '$x = "foo\n\n\n\n"; chomp($x); print $x
> foo
> 
> 
> [alex@meat jakarta-commons]$
> 
> So it looks like it only chomps one separator, not all.

Oops, you're right.  recollection + weird blue book
wording = error.  :)  I still have a recollection of a
greedier version of chomp, but I may just be smoking
crack.

> Perl also seems to glom \r\n; furthermore, I think that's the natural
> expectation in the platform-independent world of Java.

Yup.

- Morgan



=====
Morgan Delagrange
http://jakarta.apache.org/taglibs
http://jakarta.apache.org/commons
http://axion.tigris.org
http://jakarta.apache.org/watchdog



Re: [lang] chop and chomp

Posted by Alex Chaffee / Purple Technology <gu...@stinky.com>.
On Wed, Mar 05, 2003 at 12:59:28PM -0800, Morgan Delagrange wrote:
> 
> --- Alex Chaffee / Purple Technology <gu...@stinky.com> wrote:
> > 
> > Perl:
> > 
> > chop removes the final character, no matter what it is
> > 
> > chomp removes the final character if and only if it's a newline
> > (or, technically, the $INPUT_RECORD_SEPARATOR).
> > 
> 
> Technically, that's incorrect.  Perl's chomp command
> deletes all consecutive substrings matching the
> $INPUT_RECORD_SEPARATOR from the end of the string. 

I admit I'm always confused about these details, so let's ask Perl:

[alex@meat jakarta-commons]$ perl -e '$x = "foo"; chomp($x); print $x;'
foo[alex@meat jakarta-commons]$ perl -e '$x = "foo\n"; chomp($x); print $x;'
foo[alex@meat jakarta-commons]$ perl -e '$x = "foo\n\n\n\n"; chomp($x); print $x
foo


[alex@meat jakarta-commons]$

So it looks like it only chomps one separator, not all.

Perl also seems to glom \r\n; furthermore, I think that's the natural
expectation in the platform-independent world of Java.
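
The behaviour seen in the transcript (at most one trailing separator is
removed) could be sketched in Java roughly as follows. ChompSketch and
chompOnce are hypothetical names, and treating "\r\n" as a single
separator is the Java-side expectation proposed above, taken here as an
assumption rather than as a statement about Perl:

```java
// Hypothetical sketch of a "remove at most one trailing line separator" chomp.
final class ChompSketch {
    private ChompSketch() {}

    /** Removes one trailing "\r\n" or "\n" if present; otherwise returns s unchanged. */
    static String chompOnce(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        if (s.endsWith("\r\n")) {
            // Assumption: CRLF counts as a single separator.
            return s.substring(0, s.length() - 2);
        }
        if (s.endsWith("\n")) {
            return s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        // Mirrors the three Perl one-liners: only the last separator is removed.
        System.out.println("[" + chompOnce("foo") + "]");
        System.out.println("[" + chompOnce("foo\n") + "]");
        System.out.println("[" + chompOnce("foo\n\n\n\n") + "]");
    }
}
```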



Re: [lang] chop and chomp

Posted by Henri Yandell <ba...@generationjava.com>.

On Wed, 5 Mar 2003, Alex Chaffee / Purple Technology wrote:

> On Wed, Mar 05, 2003 at 02:02:17PM -0500, Henri Yandell wrote:
> >
> > > * chomp() and chop() have slightly (but significantly!) different
> > >   semantics than in Perl.  It would be great if StringUtils behaved in
> > >   line with expectations.
> >
> > Yeah, they evolved after I copied them from the php description [having
> > liked them previously in perl].
> >
> > I'm a bit too close to the StringUtils versions now though, care to
> > highlight the differences?
>
> Perl:
>
> chop removes the final character, no matter what it is
>
> chomp removes the final character if and only if it's a newline
> (or, technically, the $INPUT_RECORD_SEPARATOR).
>
>
> Current StringUtils:
>
> chop removes the final character, no matter what it is, and glomming
> \r\n as if it were a single character

This is akin to Ruby's chop and shouldn't be changed I think.

> chopNewline removes the final character if and only if it's a newline
> (glomming \r\n) -- behaving like Perl chomp
>
> chomp removes the last newline *and all succeeding characters*
> (i.e. the last unterminated line)

Bad evolution. My fault. The all-succeeding characters feature is one I've
been using a lot. Given:

From: Fred

I can do:

String head = StringUtils.chomp(hdr, ": ");
String name = StringUtils.prechomp(hdr, ": ");

However, this could also be done by a split into a two-element array,
which would look nice in Perl but not in Java:

String[] elements = StringUtils.split(hdr, ": ", 2);
String head = elements[0];
String name = elements[1];
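
For comparison, the same header split can be sketched with nothing but
the JDK. HeaderSplitSketch and splitOnce are hypothetical names, and
this version cuts at the first occurrence of the separator, whereas the
chomp described in this thread cuts at the last one; for "From: Fred"
the two give the same result:

```java
// Hypothetical sketch: split a "Name: value" header once, using only java.lang.String.
final class HeaderSplitSketch {
    private HeaderSplitSketch() {}

    /** Returns {head, rest} split at the first occurrence of sep, or {s, ""} if sep is absent. */
    static String[] splitOnce(String s, String sep) {
        int i = s.indexOf(sep);
        if (i < 0) {
            return new String[] { s, "" };
        }
        return new String[] { s.substring(0, i), s.substring(i + sep.length()) };
    }

    public static void main(String[] args) {
        String[] parts = splitOnce("From: Fred", ": ");
        System.out.println(parts[0]); // From
        System.out.println(parts[1]); // Fred
    }
}
```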

> chompLast removes the last character if and only if it's a newline
> (glomming \r\n) -- This is redundant with chopNewline, and matches
> Perl chomp

chompLast(String) is akin to chopNewline, whereas chompLast(String,
String) is more powerful.

> getChomp - since Henri's chomp might delete more than just the
> separator, this returns the portion that got deleted

It also returns the separator, so that chomp()+getChomp() gives the
initial text, i.e. getChomp() and chomp() are exact complements of each
other. I often find this a pain tbh; I want getChomp() to leave the
separator out.
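
The relationship between the two calls can be sketched like this,
assuming both cut at the last occurrence of the separator. The names
chompAt, getChompAt and tailAfter are hypothetical stand-ins, not the
StringUtils methods themselves:

```java
// Hypothetical sketch of the chomp/getChomp pairing described above.
final class ChompPairSketch {
    private ChompPairSketch() {}

    /** Everything before the last occurrence of sep; the whole string if sep is absent. */
    static String chompAt(String s, String sep) {
        int i = s.lastIndexOf(sep);
        return i < 0 ? s : s.substring(0, i);
    }

    /** The last occurrence of sep plus everything after it; "" if sep is absent. */
    static String getChompAt(String s, String sep) {
        int i = s.lastIndexOf(sep);
        return i < 0 ? "" : s.substring(i);
    }

    /** Variant without the separator: only what follows the last occurrence of sep. */
    static String tailAfter(String s, String sep) {
        int i = s.lastIndexOf(sep);
        return i < 0 ? "" : s.substring(i + sep.length());
    }

    public static void main(String[] args) {
        String hdr = "From: Fred";
        // The two halves always concatenate back to the original text.
        System.out.println((chompAt(hdr, ": ") + getChompAt(hdr, ": ")).equals(hdr)); // true
        System.out.println(tailAfter(hdr, ": ")); // Fred
    }
}
```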

> My advice is that Henri's chomp is dangerous, since in Perl chomp is
> quite safe, and just prepares a string by removing potential spurious
> newlines without changing the content at all.
>
> I'd suggest that you consider removing or renaming chomp, chompLast,
> and getChomp, and renaming chopNewline to chomp for Perl
> compatibility.  Maybe we can do a poll and find out if anyone's
> actually using them?

I use prechomp/chomp tons :) I'm not averse to different names though;
as I said, they've evolved a fair bit and I ought to have renamed them
before they ended up in Lang.

Hen




Re: [lang] chop and chomp

Posted by Morgan Delagrange <md...@yahoo.com>.
--- Alex Chaffee / Purple Technology <gu...@stinky.com> wrote:
> 
> Perl:
> 
> chop removes the final character, no matter what it is
> 
> chomp removes the final character if and only if it's a newline
> (or, technically, the $INPUT_RECORD_SEPARATOR).
> 

Technically, that's incorrect.  Perl's chomp command
deletes all consecutive substrings matching the
$INPUT_RECORD_SEPARATOR from the end of the string. 
Chomp returns the number of substrings deleted, while
chop returns the character deleted.  I believe that
Perl chop magic considers \r\n to be a single
character, but I'm not positive.

- Morgan



[lang] chop and chomp

Posted by Alex Chaffee / Purple Technology <gu...@stinky.com>.
On Wed, Mar 05, 2003 at 02:02:17PM -0500, Henri Yandell wrote:
> 
> > * chomp() and chop() have slightly (but significantly!) different
> >   semantics than in Perl.  It would be great if StringUtils behaved in
> >   line with expectations.
> 
> Yeah, they evolved after I copied them from the php description [having
> liked them previously in perl].
> 
> I'm a bit too close to the StringUtils versions now though, care to
> highlight the differences?

Perl:

chop removes the final character, no matter what it is

chomp removes the final character if and only if it's a newline
(or, technically, the $INPUT_RECORD_SEPARATOR).


Current StringUtils:

chop removes the final character, no matter what it is, and glomming
\r\n as if it were a single character

chopNewline removes the final character if and only if it's a newline
(glomming \r\n) -- behaving like Perl chomp

chomp removes the last newline *and all succeeding characters*
(i.e. the last unterminated line)

chompLast removes the last character if and only if it's a newline
(glomming \r\n) -- This is redundant with chopNewline, and matches
Perl chomp

getChomp - since Henri's chomp might delete more than just the
separator, this returns the portion that got deleted
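
As an illustration of the first entry in that list, here is a rough
Java sketch of a chop that removes the final character unconditionally
but treats a trailing "\r\n" as one character. ChopSketch is a
hypothetical class written from the description above, not copied from
StringUtils:

```java
// Hypothetical sketch of chop as described: drop the last character, glomming "\r\n".
final class ChopSketch {
    private ChopSketch() {}

    /** Removes the last character of s, or the last two if s ends with "\r\n". */
    static String chop(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        int cut = s.endsWith("\r\n") ? 2 : 1;
        return s.substring(0, s.length() - cut);
    }

    public static void main(String[] args) {
        System.out.println(chop("foo"));      // fo
        System.out.println(chop("foo\r\n"));  // foo
    }
}
```

Unlike a Perl-style chomp, this always shortens a non-empty string,
which is why the two names should not be interchangeable.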


My advice is that Henri's chomp is dangerous, since in Perl chomp is
quite safe, and just prepares a string by removing potential spurious
newlines without changing the content at all.  

I'd suggest that you consider removing or renaming chomp, chompLast,
and getChomp, and renaming chopNewline to chomp for Perl
compatibility.  Maybe we can do a poll and find out if anyone's
actually using them?




Re: [lang] StringUtils Questions and Suggestions

Posted by Alex Chaffee / Purple Technology <gu...@stinky.com>.
On Wed, Mar 05, 2003 at 02:02:17PM -0500, Henri Yandell wrote:
>
> > * It would be great if many of the methods could be written to work on a
> >   PrintWriter as well as just creating and returning a string
> 
> Ditto for char[] and StringBuffer. Problem is, how to implement this.
> We either end up with one implementation that performs well and three
> that perform badly, or we have lots of redundant code.


My gut says that the real performance loss is in copying strings
around, so anything we do should endeavor to write the characters once
only.  With that in mind, I suggest we unify implementations to use a
PrintWriter.  Then the clients that want a String can use
StringPrintWriter; the ones that want a char[] can use
CharArrayWriter. 

If there's a client who wants to append to an existing StringBuffer,
then maybe we can write a simple subclass of PrintWriter that accepts
an existing StringBuffer and appends onto it.  Ditto for writing to an
existing char[] (though I've never seen this case in the wild).
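
One way this might look, as a sketch only: instead of subclassing
PrintWriter directly, a small Writer appends to the caller's
StringBuffer and is then wrapped in an ordinary PrintWriter. The class
name StringBufferWriter is made up for this example:

```java
import java.io.PrintWriter;
import java.io.Writer;

// Hypothetical sketch: a Writer that appends to a StringBuffer supplied by the caller.
final class StringBufferWriter extends Writer {
    private final StringBuffer target;

    StringBufferWriter(StringBuffer target) {
        if (target == null) {
            throw new NullPointerException("target");
        }
        this.target = target;
    }

    @Override
    public void write(char[] cbuf, int off, int len) {
        // Characters go straight into the caller's buffer; no intermediate String is built.
        target.append(cbuf, off, len);
    }

    @Override
    public void flush() {
        // Nothing is buffered here, so there is nothing to flush.
    }

    @Override
    public void close() {
        // The StringBuffer belongs to the caller; closing is a no-op.
    }

    public static void main(String[] args) {
        StringBuffer buf = new StringBuffer("x=");
        PrintWriter out = new PrintWriter(new StringBufferWriter(buf));
        out.print(42);
        out.flush();
        System.out.println(buf); // x=42
    }
}
```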

That way there's no implementation bifurcation: We would provide
helper methods that wrap these core PrintWriter methods by making a
fresh new StringPrintWriter() and returning its output.
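
A rough sketch of that layering, using the JDK's StringWriter in place
of StringPrintWriter so the example stands alone. WriterCoreSketch and
its repeat methods are hypothetical and only illustrate the shape of
"core method writes to a PrintWriter, helper returns a String":

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch: one core implementation per operation, plus a thin String helper.
final class WriterCoreSketch {
    private WriterCoreSketch() {}

    /** Core implementation: writes s to out count times. */
    static void repeat(PrintWriter out, String s, int count) {
        for (int i = 0; i < count; i++) {
            out.print(s);
        }
    }

    /** Convenience helper: runs the core method against a fresh in-memory writer. */
    static String repeat(String s, int count) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        repeat(out, s, count);
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(repeat("ab", 3)); // ababab
    }
}
```

Callers that already hold a PrintWriter (a servlet response, say) would
call the core overload and skip the intermediate String entirely.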

(Oh yeah, I want to submit StringPrintWriter too.  It turns this:

	StringWriter sout = new StringWriter();
	PrintWriter pout = new PrintWriter(sout);
	pout.print("foo");
	pout.flush();             // must flush one...
	return sout.toString();   // and return the other

into this:

	StringPrintWriter out = new StringPrintWriter();
	out.print("foo");
	return out.getString();

Where would that one go?  Note that it's a trivial class:
http://www.purpletech.com/code/src/com/purpletech/util/StringPrintWriterTest.java
)
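
For readers without the Purpletech source to hand, a class with that
shape might look roughly like this. It is a guess reconstructed from
the usage shown above (a no-argument constructor and getString()), not
the actual Purpletech implementation:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical reconstruction: a PrintWriter that owns its StringWriter.
class StringPrintWriter extends PrintWriter {
    private final StringWriter buffer;

    StringPrintWriter() {
        this(new StringWriter());
    }

    // Private helper constructor so the StringWriter can be kept in a field.
    private StringPrintWriter(StringWriter buffer) {
        super(buffer);
        this.buffer = buffer;
    }

    /** Flushes pending output and returns everything written so far. */
    String getString() {
        flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        StringPrintWriter out = new StringPrintWriter();
        out.print("foo");
        System.out.println(out.getString()); // foo
    }
}
```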


> > * Um, what's the difference between mid() and substring()?
> 
> It seems to just be that you specify length in mid. Bit too much of
> overkill I agree.

At least mid() should call substring() to avoid code duplication.


> > * And why isn't strip*() called trim*()?
> 
> trim implies whitespace, whereas strip is any character. That was my only
> reason for not naming it the same originally.

It's a bit confusing.  Not a huge problem.  My first impression of the
difference was that strip would work on the whole string, where trim
works on the edges; that turned out not to be so.
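
To make the distinction concrete, here is a small sketch of a strip
that takes an arbitrary set of characters but, like trim, touches only
the two ends of the string. StripSketch is a hypothetical class, and
its handling of a null or empty character set (return the input
unchanged) is an assumption:

```java
// Hypothetical sketch: strip any of the given characters from both ends only.
final class StripSketch {
    private StripSketch() {}

    /** Removes leading and trailing characters that appear in stripChars. */
    static String strip(String s, String stripChars) {
        if (s == null || s.isEmpty() || stripChars == null || stripChars.isEmpty()) {
            return s;
        }
        int start = 0;
        int end = s.length();
        while (start < end && stripChars.indexOf(s.charAt(start)) >= 0) {
            start++;
        }
        while (end > start && stripChars.indexOf(s.charAt(end - 1)) >= 0) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        // Interior '-' characters survive; only the edges are stripped.
        System.out.println(strip("--a-b--", "-")); // a-b
    }
}
```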




Re: [lang] StringUtils Questions and Suggestions

Posted by Henri Yandell <ba...@generationjava.com>.

On Wed, 5 Mar 2003, Alex Chaffee / Purple Technology wrote:

>
> In reviewing StringUtils in preparing to integrate my Purpletech code,
> I discovered some inconsistencies and came up with the following
> questions and suggestions:
>
> * Rename overlayString to overlay (to be consistent with other method
>   names, and more concise)

+1 [the Apache yea].

Will deprecate this in the next release.

> * It would be great if many of the methods could be written to work on a
>   PrintWriter as well as just creating and returning a string

Ditto for char[] and StringBuffer. Problem is, how to implement this.
We either end up with one implementation that performs well and three
that perform badly, or we have lots of redundant code.

> * Um, what's the difference between mid() and substring()?

It seems to just be that you specify length in mid. Bit too much of
overkill I agree.

> * And why isn't strip*() called trim*()?

trim implies whitespace, whereas strip is any character. That was my only
reason for not naming it the same originally.

> * chomp() and chop() have slightly (but significantly!) different
>   semantics than in Perl.  It would be great if StringUtils behaved in
>   line with expectations.

Yeah, they evolved after I copied them from the php description [having
liked them previously in perl].

I'm a bit too close to the StringUtils versions now though, care to
highlight the differences?

Thanks,

Hen

