Posted to dev@commons.apache.org by Alex Chaffee / Purple Technology <gu...@stinky.com> on 2003/03/05 19:24:27 UTC

[lang] Adding Purple to StringUtils

For many years, I've published my personal source code libraries as
open source.  By far the most heavily-downloaded class was Utils.java,
containing my string processing routines.  I'm psyched that Jakarta
Commons now exists, and I'd like to donate my code to
Lang.StringUtils.

You can see the code at http://www.purpletech.com/code, specifically
http://www.purpletech.com/code/src/com/purpletech/util/Utils.java and
http://www.purpletech.com/code/src/com/purpletech/util/UtilsTest.java
(a set of unit tests that may help clarify the usage of the API).

I'll list these independently, so we can start haggling over 

1. yea or nay
2. naming
3. API / method signature
4. appropriate package (if lang.StringUtils is not the right place)

for each in turn.  Naturally, I'm open to any negotiation; the methods
I'd really like to lobby for are htmlescape and strdiffVerbose.

 - Alex


* isWhitespace(String str)

Returns true if the str contains only whitespace characters.

Fills a gap in the isAlphanumeric family.  (Funny that isWhitespace is
the only one of the bunch that I had occasion to write!)
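
A minimal sketch of the check described above, not the Utils.java
source.  It assumes Character.isWhitespace as the definition of
whitespace, and the handling of null and "" is a guess; the class name
is hypothetical.

```java
// Hypothetical holder class, for illustration only.
final class WhitespaceSketch {
    /** Returns true if every character of str is whitespace. */
    static boolean isWhitespace(String str) {
        if (str == null) {
            return false; // assumption: null is not "only whitespace"
        }
        for (int i = 0; i < str.length(); i++) {
            if (!Character.isWhitespace(str.charAt(i))) {
                return false;
            }
        }
        return true; // assumption: the empty string passes vacuously
    }
}
```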


* abbreviate(String str, int max)

Turn "Now is the time for all good men" into "Now is the time for..."

Specifically:

If str is less than max characters long, return it.
Else abbreviate it to (substring(str, 0, max-3) + "...").
If max is less than 3, throw an IllegalArgumentException.
In no case will it return a string of length greater than max.
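
The rules above can be sketched roughly as follows.  This is an
illustration under stated assumptions, not the Utils.java code: a
string whose length equals max is returned unchanged, and the
argument check happens before anything else.  The class name is
hypothetical.

```java
// Hypothetical holder class, for illustration only.
final class AbbreviateSketch {
    /** Shortens str to at most max characters, ending in "..." when cut. */
    static String abbreviate(String str, int max) {
        if (max < 3) {
            throw new IllegalArgumentException("max must be at least 3, was " + max);
        }
        if (str.length() <= max) {
            return str; // assumption: length == max needs no abbreviation
        }
        // Keep max-3 characters so the result is exactly max long.
        return str.substring(0, max - 3) + "...";
    }
}
```

With max = 22, "Now is the time for all good men" comes back as
"Now is the time for...".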


* String strdiff(String s1, String s2)

Compare two strings, and return the portion where they differ.  (More
precisely, return the remainder of the second string, starting from
where it's different from the first.)  
e.g. strdiff("i am a machine", "i am a robot") -> "robot"


* int strdiffat(String s1, String s2)

Compare two strings, and return the index at which the strings begin
to diverge.
E.g. strdiffat("i am a machine", "i am a robot") -> 7
Returns -1 if they are the same.
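
A rough sketch of the pair, assuming that when one string is a prefix
of the other the divergence index is the shorter length, and that
identical strings give an empty difference.  Neither case is spelled
out above, and the class name is hypothetical.

```java
// Hypothetical holder class, for illustration only.
final class StrDiffSketch {
    /** Index at which s1 and s2 begin to diverge, or -1 if they are equal. */
    static int strdiffat(String s1, String s2) {
        int limit = Math.min(s1.length(), s2.length());
        for (int i = 0; i < limit; i++) {
            if (s1.charAt(i) != s2.charAt(i)) {
                return i;
            }
        }
        // No mismatch in the shared part: equal, or one is a prefix of the other.
        return s1.length() == s2.length() ? -1 : limit;
    }

    /** Remainder of s2, starting where it first differs from s1. */
    static String strdiff(String s1, String s2) {
        int at = strdiffat(s1, s2);
        return at == -1 ? "" : s2.substring(at); // assumption: equal strings yield ""
    }
}
```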


* String strdiffVerbose(String expected, String actual)

Compare two strings, and return a verbose description of how
they differ. Shows a window around the location to provide
context.  

E.g. strdiffVerbose("i am a robot", "i am a machine") might return a
string containing

strings differ at character 7
Expected: ...am a robot
  Actual: ...am a machine

This was developed in order to provide some sanity to JUnit's
assertEquals display.


* String rtrim(String orig)
* String ltrim(String orig)

Trim the whitespace off the right or left side of a String
only. (Probably rename to trimRight and trimLeft.)
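
A minimal sketch of the two one-sided trims.  It assumes
Character.isWhitespace decides what gets trimmed (String.trim() uses a
different rule, stripping everything up to and including the space
character), and the class name is hypothetical.

```java
// Hypothetical holder class, for illustration only.
final class SideTrimSketch {
    /** Removes leading whitespace only. */
    static String ltrim(String orig) {
        int start = 0;
        while (start < orig.length() && Character.isWhitespace(orig.charAt(start))) {
            start++;
        }
        return orig.substring(start);
    }

    /** Removes trailing whitespace only. */
    static String rtrim(String orig) {
        int end = orig.length();
        while (end > 0 && Character.isWhitespace(orig.charAt(end - 1))) {
            end--;
        }
        return orig.substring(0, end);
    }
}
```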


* int maxLength(Iterator i)

Returns the length of the longest string in i.  If i contains objects
other than strings, their toString() values are used.
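
Roughly, under the assumption that null elements are skipped and an
empty iterator gives 0 (the description does not say either way).  The
class name is hypothetical.

```java
import java.util.Iterator;

// Hypothetical holder class, for illustration only.
final class MaxLengthSketch {
    /** Length of the longest toString() value produced by the iterator. */
    static int maxLength(Iterator<?> i) {
        int max = 0; // assumption: an empty iterator yields 0
        while (i.hasNext()) {
            Object o = i.next();
            if (o == null) {
                continue; // assumption: nulls are ignored
            }
            max = Math.max(max, o.toString().length());
        }
        return max;
    }
}
```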


* htmlescape(String)     (should rename: escapeHtml)
* htmlunescape(String)   (should rename: unescapeHtml)

Turns funky characters into HTML entity equivalents (and vice versa).
This is far and away the most-used function I have.  There is
definitely a need for it to be in Commons.  Apparently there's some
debate as to whether it's appropriate for StringUtils.

My code supports all the named entities I know about. See 
http://hotwired.lycos.com/webmonkey/reference/special_characters/
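
To make the idea concrete, here is a deliberately tiny sketch of the
escaping direction.  It covers only four characters, whereas the code
described above handles the full set of named entities; the class name
is hypothetical.

```java
// Hypothetical holder class; a four-entity subset, for illustration only.
final class HtmlEscapeSketch {
    /** Replaces &, <, > and " with their named HTML entities. */
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }
}
```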


* String jsEscape    (should rename: escapeJavaScript)

JavaScript's string escaping is nearly identical to Java's, and it's
sort of a no-brainer to support it as well as Java.  (In JS, a single
quote is escaped; in Java it's not.)


* uncurlQuotes(String)

Turn Windows and Mac curly-quotes into their non-curly ASCII equivalents.


* capitalize

Properly-spelled version of capitalise :-)


* lowerize

Symmetrical with capitalize. Turn the first character into a
lower-case letter.


* pluralize

Turns String s into a plural English noun (doing the right thing with
"story" -> "stories" and "mess" -> "messes").


* toUnderscore

Converts camelCaseVersusC to camel_case_versus_c 

I'd love to get some ideas on a better name for this.  Also, it'd be
nice to have a full, symmetrical set of camelCase vs. under_score
vs. CONSTANT_NAMING vs. "separate words" naming style converters.
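
A minimal sketch of the conversion, assuming every upper-case letter
starts a new word.  That means a run of capitals such as "XML" would
come out as "x_m_l", which may or may not be what is wanted.  The
class name is hypothetical.

```java
// Hypothetical holder class, for illustration only.
final class UnderscoreSketch {
    /** camelCaseVersusC -> camel_case_versus_c */
    static String toUnderscore(String camel) {
        StringBuilder out = new StringBuilder(camel.length() + 8);
        for (int i = 0; i < camel.length(); i++) {
            char c = camel.charAt(i);
            if (Character.isUpperCase(c)) {
                if (i > 0) {
                    out.append('_'); // no leading underscore for a capitalised first letter
                }
                out.append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```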


* getStackTrace(Throwable t)

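Convenience method to get a stack trace as a string (uses
java.io.StringWriter and java.io.PrintWriter):

        public static String getStackTrace(Throwable t) {
            StringWriter s = new StringWriter();
            PrintWriter p = new PrintWriter(s);
            t.printStackTrace(p);
            p.close();
            return s.toString();
        }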


-- 
Alex Chaffee                               mailto:alex@jguru.com
Purple Technology - Code and Consulting    http://www.purpletech.com/
jGuru - Java News and FAQs                 http://www.jguru.com/alex/
Gamelan - the Original Java site           http://www.gamelan.com/
Stinky - Art and Angst                     http://www.stinky.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [lang] Adding Purple to StringUtils

Posted by Alex Chaffee / Purple Technology <gu...@stinky.com>.
On Wed, Mar 05, 2003 at 01:52:42PM -0500, Henri Yandell wrote:
> 
> > * String strdiffVerbose(String expected, String actual)
> >
> > Compare two strings, and return a verbose description of how
> > they differ. Shows a window around the location to provide
> > context.
> >
> > E.g. strdiffVerbose("i am a robot", "i am a machine") might return a
> > string containing
> >
> > strings differ at character 7
> > Expected: ...am a robot
> >   Actual: ...am a machine
> >
> > This was developed in order to provide some sanity to JUnit's
> > assertEquals display.
> 
> Nay. It's not a 'util' but an application on top of the utils.

I concur.  

I was thinking about how to separate model from view, and it occurred
to me that maybe it should return a StringDiff object, containing
accessors for

	String expected
	String actual
	int differenceStartsAtCharacter
	String windowedExpected
	String windowedActual
	String verboseString

In any case, we can table it for now.
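
For the record, a rough sketch of what such an object might look
like.  Everything beyond the accessor names listed above is an
assumption: the class name, the "context" constructor parameter (how
many characters to show before the difference), and the wording for
identical strings.

```java
// Hypothetical value object; only the accessor names come from the list above.
final class StringDiffSketch {
    private final String expected;
    private final String actual;
    private final int at;      // index of first difference, or -1 if identical
    private final int context; // characters kept before the difference (assumed knob)

    StringDiffSketch(String expected, String actual, int context) {
        if (context < 0) {
            throw new IllegalArgumentException("context must not be negative");
        }
        this.expected = expected;
        this.actual = actual;
        this.context = context;
        this.at = firstDifference(expected, actual);
    }

    private static int firstDifference(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        for (int i = 0; i < limit; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return a.length() == b.length() ? -1 : limit;
    }

    String getExpected() { return expected; }
    String getActual() { return actual; }
    int getDifferenceStartsAtCharacter() { return at; }
    String getWindowedExpected() { return window(expected); }
    String getWindowedActual() { return window(actual); }

    /** The "view": a human-readable report built from the fields above. */
    String getVerboseString() {
        if (at < 0) {
            return "strings are identical"; // assumed wording
        }
        return "strings differ at character " + at + "\n"
                + "Expected: " + getWindowedExpected() + "\n"
                + "  Actual: " + getWindowedActual();
    }

    // Keeps `context` characters before the difference, marking any cut with "...".
    private String window(String s) {
        if (at < 0) {
            return s;
        }
        int start = Math.max(0, at - context);
        return (start > 0 ? "..." : "") + s.substring(start);
    }
}
```

With a context of 5, comparing "i am a robot" and "i am a machine"
gives the windows "...am a robot" and "...am a machine".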

Ooh, just had another thought: maybe abbreviate() should have a
variant where it makes a window -- give it the start and the desired
width, and it puts ellipses at the start and/or at the end as
appropriate.  And/or one where it puts the ellipses at the start
only...
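
Something like the following, perhaps.  This is only a sketch of the
windowed idea: the parameter names, the minimum width of 7 (room for
both ellipses plus one character) and the clamping near the end of the
string are all assumptions, and the class name is hypothetical.

```java
// Hypothetical holder class, for illustration only.
final class AbbreviateWindowSketch {
    /** Shows about `width` characters of str beginning near `start`. */
    static String abbreviate(String str, int start, int width) {
        if (width < 7) {
            throw new IllegalArgumentException("width must be at least 7, was " + width);
        }
        if (str.length() <= width) {
            return str; // fits as-is, no ellipses needed
        }
        int body = width - 3; // room left next to a single ellipsis
        if (start <= 0) {
            return str.substring(0, body) + "...";             // ellipsis at the end only
        }
        if (start + body >= str.length()) {
            return "..." + str.substring(str.length() - body); // ellipsis at the start only
        }
        return "..." + str.substring(start, start + width - 6) + "..."; // both ends
    }
}
```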


> > * String rtrim(String orig)
> > * String ltrim(String orig)
> >
> > Trim the whitespace off the right or left side of a String
> > only. (Probably rename trimRight and trimLeft.)
> 
> stripStart and stripEnd I believe.

Again, I don't yet concur with your strip/trim distinction, but it's
not that big a deal.


> map( StringUtils.strip, array )
> 
> though the above has a touch of max logic to it. I think it is silly for
> us to add these. [quick note to implement 'map' in Java].

Quick note to add function pointers to Java :-)


> > * htmlescape(String)     (should rename: escapeHtml)
> > * htmlunescape(String)   (should rename: unescapeHtml)
> 
> Definitely not going to get in StringUtils. It is a very useful function
> though.

Can you suggest where in Commons it might belong?  This is the one I
really want to get into a wider distribution.


> > * pluralize
> >
> > turn String s into a plural English noun (doing the right thing with
> > "story" -> "stories" and "mess" -> "messes")
> 
> Does it handle all english though? ie) It will fail on pluralising abacus
> etc.

No, it doesn't do proper stemming.  I put it in when writing a code
generator that needed a way to get names for functions dealing with
collections.  I don't think I've used it since.  Not a high priority,
though it's just sooo cute! :-)



> > * toUnderscore
> >
> > Converts camelCaseVersusC to camel_case_versus_c
> >
> > I'd love to get some ideas on a better name for this.  Also, it'd be
> > nice to have a full, symmetrical set of camelCase vs. under_score
> > vs. CONSTANT_NAMING vs. "separate words" naming style converters.
> 
> Hmm. Probably a bit specific for StringUtils. Do people often need this?

This was another code generator utility.  Not a high priority, but I
think it fits in with the capitali[zs]e family.


-- 
Alex Chaffee                               mailto:alex@jguru.com
Purple Technology - Code and Consulting    http://www.purpletech.com/
jGuru - Java News and FAQs                 http://www.jguru.com/alex/
Gamelan - the Original Java site           http://www.gamelan.com/
Stinky - Art and Angst                     http://www.stinky.com/



RE: [lang] Summarising Purple Was: [lang] Adding Purple to StringUtils

Posted by Steven Caswell <st...@caswell.name>.



> -----Original Message-----
> From: Alex Chaffee / Purple Technology [mailto:guru@stinky.com] 
> Sent: Sunday, March 09, 2003 4:25 PM
> To: 'Jakarta Commons Developers List'
> Subject: Re: [lang] Summarising Purple Was: [lang] Adding 
> Purple to StringUtils
> 
> 
> On Sun, Mar 09, 2003 at 04:05:09PM -0500, Steven Caswell wrote:
> > > I think "bisect" is good since it explicitly means "two
> > > parts" rather than "split" which returns many parts.
> > 
> > Wouldn't "removeFromLast" describe the action more succinctly than 
> > "bisect" or "divide"?
> 
> "from" is ambiguous... Which is clearer:
> 
>   removeAfterLast("my dog has dog fleas", "dog") -> "my dog has "
>   removeBeforeLast("my dog has dog fleas", "dog") -> " fleas"
> or
>   bisectBeforeLast("my dog has dog fleas", "dog") -> "my dog has "
>   bisectAfterLast("my dog has dog fleas", "dog") -> " fleas"
> 
> Or is "split" really the right root after all?

Or even
truncateAfterLast(String)
truncateBeforeLast(String)

since isn't that what is really happening?

For some reason I'm uneasy about bisect. But your point about being in
naming hell is appropriate.
So yeah, we should pick something close to the purpose and reserve the right
to change.

> 
> (This is the sort of tough naming decision that becomes clear 
> only after one has used the API in the wild for a while, and 
> (these days) renamed it a few times with a refactoring 
> tool...  So maybe we should just pick one and reserve the 
> right to change our mind before the first release.)
> 
> > > * toUnderscoreName, toCamelCaseName
> > > 
> > I still think these are more functionality than is intended in 
> > StringUtils. Would it make sense to put them into a 
> > StringConvertUtils?
> 
> But there are many "conversion" routines already in 
> StringUtils, so this would create an arbitrary and confusing 
> disjunction (confusing to those trying to figure out in which 
> class a certain method lies).

A good point, and since I don't have much more of an argument, I'll let it
go. I'm fuzzy on where the line is anyway. I just don't want to see the API
get too out of control.


Steven Caswell
steven@caswell.name
a.k.a Mungo Knotwise of Michel Delving
"One ring to rule them all, one ring to find them..."





Re: [lang] Summarising Purple Was: [lang] Adding Purple to StringUtils

Posted by Alex Chaffee / Purple Technology <gu...@stinky.com>.
On Sun, Mar 09, 2003 at 04:05:09PM -0500, Steven Caswell wrote:
> > I think "bisect" is good since it explicitly means "two 
> > parts" rather than "split" which returns many parts.
> 
> Wouldn't "removeFromLast" describe the action more succinctly than
> "bisect" or "divide"?

"from" is ambiguous... Which is clearer:

  removeAfterLast("my dog has dog fleas", "dog") -> "my dog has "
  removeBeforeLast("my dog has dog fleas", "dog") -> " fleas"
or
  bisectBeforeLast("my dog has dog fleas", "dog") -> "my dog has "
  bisectAfterLast("my dog has dog fleas", "dog") -> " fleas"

Or is "split" really the right root after all?

(This is the sort of tough naming decision that becomes clear only
after one has used the API in the wild for a while, and (these days)
renamed it a few times with a refactoring tool...  So maybe we should
just pick one and reserve the right to change our mind before the
first release.)
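
Whatever the name ends up being, the behaviour in the "bisect"
examples can be sketched as below.  The names are just the candidates
floated above, the class name is hypothetical, and the results when
the separator is missing are assumptions.

```java
// Hypothetical holder class, for illustration only.
final class BisectSketch {
    /** Everything before the last occurrence of sep. */
    static String bisectBeforeLast(String s, String sep) {
        int i = s.lastIndexOf(sep);
        return i < 0 ? s : s.substring(0, i); // assumption: no separator -> whole string
    }

    /** Everything after the last occurrence of sep. */
    static String bisectAfterLast(String s, String sep) {
        int i = s.lastIndexOf(sep);
        return i < 0 ? "" : s.substring(i + sep.length()); // assumption: no separator -> ""
    }
}
```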

> > * toUnderscoreName, toCamelCaseName
> > 
> I still think these are more functionality than is intended in StringUtils.
> Would it make sense to put them into a StringConvertUtils?

But there are many "conversion" routines already in StringUtils, so
this would create an arbitrary and confusing disjunction (confusing to
those trying to figure out in which class a certain method lies).


-- 
Alex Chaffee                               mailto:alex@jguru.com
Purple Technology - Code and Consulting    http://www.purpletech.com/
jGuru - Java News and FAQs                 http://www.jguru.com/alex/
Gamelan - the Original Java site           http://www.gamelan.com/
Stinky - Art and Angst                     http://www.stinky.com/



RE: [lang] Summarising Purple Was: [lang] Adding Purple to StringUtils

Posted by Steven Caswell <st...@caswell.name>.
See below...

> -----Original Message-----
> From: Alex Chaffee / Purple Technology [mailto:guru@stinky.com] 
> Sent: Sunday, March 09, 2003 3:09 PM
> To: Jakarta Commons Developers List
> Subject: Re: [lang] Summarising Purple Was: [lang] Adding 
> Purple to StringUtils
> 
> 
> * abbreviate
>  
> OK!  How about I add this one myself?  I'm a Tomcat 
> committer; whom do I ask to add me to the karma list for 
> commons?  (I think I heard that Tomcat committers don't need 
> a confirmation vote.)
> 
> 
> * integrate truncateNicely and abbreviate
> 
> This I'd have to do research on; probably should wait until
> abbreviate() is done.
> 
> 
> * differentAt, differentText
> 
> I like these names.  Another suggestion is to use 
> "difference" since that describes the return value (and 
> "different" implies it's returning a boolean, like "is 
> different" vs. "get the difference").
> 
> So can I get a +1 on:
> 	public String difference(String s1, String s2)
> 	public int differenceAt(String s1, String s2)
> 

+1, for the reasons enumerated

> 
> * Change chomp to match perl
> 
> I can do this too.
> 
> 
> * Rename current chomp
> 
> I've been getting in the habit of restricting the use of 
> "get" in method names (properties only).  So I prefer a verb 
> like "bisect" or "divide" rather than "getAfterFirst" and such.  
> 
> I think "bisect" is good since it explicitly means "two 
> parts" rather than "split" which returns many parts.

Wouldn't "removeFromLast" describe the action more succinctly than "bisect"
or "divide"?

> 
> 
> * Change getPrechomp [aka getAfterFirst or bisectAfter] to 
> not return the separator.
> 
> That sounds fine; how do we handle deprecation warnings for 
> this and for chomp?  (In case people rely on the old behavior.)
> 
> 
> * escapeHtml, unescapeHtml 
> * escapeJava, escapeJavaScript
> * future: escapeSql, escapeXml, unescape*
> 
> Henri suggested to make StringEscapeUtils.java to hold all 
> these.  I like that a lot.  Any more +1s?

+1. I like the separation.

> 
> As for deprecation, the only method that would need to be 
> deprecated is StringUtils.escape.  Since Java escaping is the 
> natural thing you'd expect StringUtils to do, it makes sense 
> to leave it in there as StringUtils.escape.  So I propose 
> leaving that as it is, and instead just making it call 
> StringEscapeUtils.escapeJava.
> 

+1, because it avoids deprecation

> 
> * uncurlQuotes
> 
> There was no vote on this.  It's not escaping, but 
> converting, so I think it belongs in the base StringUtils.  
> Any objections?
> 
> 
> * toUnderscoreName, toCamelCaseName
> 
> This is not escaping, so it doesn't belong in 
> StringEscapeUtils.  I'd be happy to put them in StringUtils...
> 
> ... especially since I just realized that these two methods, 
> in combination with upperCase and lowerCase and 
> replace(s, "_", " ") and uncapitalise and capitaliseAllWords, 
> can be used to convert between all of
> 
> 	yo mama
> 	Yo Mama
> 	Yo_Mama
> 	yo_mama
> 	YO_MAMA
> 	YoMama
> 	yoMama
> 
> The only question now is how many convenience methods we want 
> to clutter the API with.

I still think these are more functionality than is intended in StringUtils.
Would it make sense to put them into a StringConvertUtils?


Steven Caswell
steven@caswell.name
a.k.a Mungo Knotwise of Michel Delving
"One ring to rule them all, one ring to find them..."






Re: [lang] Summarising Purple Was: [lang] Adding Purple to StringUtils

Posted by Alex Chaffee / Purple Technology <gu...@stinky.com>.
* abbreviate
 
OK!  How about I add this one myself?  I'm a Tomcat committer; whom do
I ask to add me to the karma list for commons?  (I think I heard that
Tomcat committers don't need a confirmation vote.)


* integrate truncateNicely and abbreviate

This I'd have to do research on; probably should wait until
abbreviate() is done.


* differentAt, differentText

I like these names.  Another suggestion is to use "difference" since
that describes the return value (and "different" implies it's
returning a boolean, like "is different" vs. "get the difference").

So can I get a +1 on:
	public String difference(String s1, String s2)
	public int differenceAt(String s1, String s2)


* Change chomp to match perl

I can do this too.


* Rename current chomp

I've been getting in the habit of restricting the use of "get" in
method names (properties only).  So I prefer a verb like "bisect" or
"divide" rather than "getAfterFirst" and such.  

I think "bisect" is good since it explicitly means "two parts" rather
than "split" which returns many parts.


* Change getPrechomp [aka getAfterFirst or bisectAfter] to not return
the separator.

That sounds fine; how do we handle deprecation warnings for this and
for chomp?  (In case people rely on the old behavior.)


* escapeHtml, unescapeHtml 
* escapeJava, escapeJavaScript
* future: escapeSql, escapeXml, unescape*

Henri suggested to make StringEscapeUtils.java to hold all these.  I
like that a lot.  Any more +1s?

As for deprecation, the only method that would need to be deprecated
is StringUtils.escape.  Since Java escaping is the natural thing you'd
expect StringUtils to do, it makes sense to leave it in there as
StringUtils.escape.  So I propose leaving that as it is, and instead
just making it call StringEscapeUtils.escapeJava.
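
In other words, something along these lines.  The two nested classes
are stand-ins named after the real ones purely for illustration, and
the escapeJava body handles only a handful of characters; it is not
the Commons Lang implementation.

```java
// Hypothetical wrapper; the nested classes merely mimic the proposed layout.
final class EscapeDelegationSketch {

    /** Stand-in for the proposed StringEscapeUtils. */
    static final class StringEscapeUtils {
        static String escapeJava(String s) {
            StringBuilder out = new StringBuilder(s.length() + 8);
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                switch (c) {
                    case '\\': out.append("\\\\"); break;
                    case '"':  out.append("\\\""); break;
                    case '\n': out.append("\\n"); break;
                    case '\t': out.append("\\t"); break;
                    case '\r': out.append("\\r"); break;
                    default:   out.append(c); // simplified: no unicode escapes
                }
            }
            return out.toString();
        }
    }

    /** Stand-in for StringUtils: the old entry point stays and just delegates. */
    static final class StringUtils {
        static String escape(String s) {
            return StringEscapeUtils.escapeJava(s);
        }
    }
}
```

Existing callers of escape() keep compiling and get the same result as
callers of the new method.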


* uncurlQuotes

There was no vote on this.  It's not escaping, but converting, so I
think it belongs in the base StringUtils.  Any objections?


* toUnderscoreName, toCamelCaseName

This is not escaping, so it doesn't belong in StringEscapeUtils.  I'd
be happy to put them in StringUtils...

... especially since I just realized that these two methods, in
combination with upperCase and lowerCase and replace(s, "_", " ")
and uncapitalise and capitaliseAllWords, can be used to convert
between all of

	yo mama
	Yo Mama
	Yo_Mama
	yo_mama
	YO_MAMA
	YoMama
	yoMama

The only question now is how many convenience methods we want to
clutter the API with.
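
As a sketch of how a few of those forms chain together: the helper
names below follow the ones mentioned above, but the bodies are guesses
rather than the StringUtils code, and the class name is hypothetical.

```java
// Hypothetical holder class, for illustration only.
final class NameStyleSketch {
    /** yo_mama or YO_MAMA -> yoMama */
    static String toCamelCaseName(String underscored) {
        StringBuilder out = new StringBuilder(underscored.length());
        boolean upperNext = false;
        for (int i = 0; i < underscored.length(); i++) {
            char c = underscored.charAt(i);
            if (c == '_') {
                upperNext = true; // drop the underscore, capitalise what follows
            } else if (upperNext) {
                out.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                out.append(Character.toLowerCase(c));
            }
        }
        return out.toString();
    }

    /** yoMama -> YoMama */
    static String capitalise(String s) {
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }
}
```

So "yo mama".replace(' ', '_') gives "yo_mama", toCamelCaseName turns
that into "yoMama", and capitalise turns that into "YoMama".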



-- 
Alex Chaffee                               mailto:alex@jguru.com
Purple Technology - Code and Consulting    http://www.purpletech.com/
jGuru - Java News and FAQs                 http://www.jguru.com/alex/
Gamelan - the Original Java site           http://www.gamelan.com/
Stinky - Art and Angst                     http://www.stinky.com/



Re: [lang] Summarising Purple Was: [lang] Adding Purple to StringUtils

Posted by Henri Yandell <ba...@generationjava.com>.

On Thu, 6 Mar 2003, Stephen Colebourne wrote:

> From: "Henri Yandell" <ba...@generationjava.com>
> > 1) Add abbreviate(), merging StringTaglib truncateNicely functionality in
> > if different.
> +1
>
> > 2) strdiff/strdiffat functionality added, under the names:
> > differentText and differentAt.
> +1
> were these the suggested names?

I may have adjusted a touch :) differsAt and remainingDifference. I
modified the names to have more of a similarity.

> > 3) camelCaseToUnderscore. Consider addition of this method. I'm tempted to
> > think this could go in:  StringEscapeUtils, along with javascript, sql-99,
> > html, xml, java escape methods.
> Is a StringCaseUtils called for? It would involve quite a bit of deprecation
> though.

Possibly. But is this really a case method? It seems more of a conversion
one.


Hen




Re: [lang] Summarising Purple Was: [lang] Adding Purple to StringUtils

Posted by Stephen Colebourne <sc...@btopenworld.com>.
From: "Henri Yandell" <ba...@generationjava.com>
> 1) Add abbreviate(), merging StringTaglib truncateNicely functionality in
> if different.
+1

> 2) strdiff/strdiffat functionality added, under the names:
> differentText and differentAt.
+1
were these the suggested names?

> 3) camelCaseToUnderscore. Consider addition of this method. I'm tempted to
> think this could go in:  StringEscapeUtils, along with javascript, sql-99,
> html, xml, java escape methods.
Is a StringCaseUtils called for? It would involve quite a bit of deprecation
though.

Stephen




[lang] Summarising Purple Was: [lang] Adding Purple to StringUtils

Posted by Henri Yandell <ba...@generationjava.com>.
So, summarising the purple thread:


1) Add abbreviate(), merging StringTaglib truncateNicely functionality in
if different.

2) strdiff/strdiffat functionality added, under the names:
differentText and differentAt.

3) camelCaseToUnderscore. Consider addition of this method. I'm tempted to
think this could go in:  StringEscapeUtils, along with javascript, sql-99,
html, xml, java escape methods.

4) Changes to chomp. Basically write a new chomp with correct behaviour.
Change prechomp->getBeforeFirst, chomp->getAfterLast, and other methods to
fit. Change the getPrechomp [aka getAfterFirst] to not return the
separator.


Have I missed anything Alex et al?

Hen




RE: [lang] Adding Purple to StringUtils

Posted by Steven Caswell <st...@caswell.name>.
Yeah, I can see the use in code generation. I was +1ing mostly to Henri's
comment that it seems a bit specific for StringUtils.


Steven Caswell
steven@caswell.name
a.k.a Mungo Knotwise of Michel Delving
"One ring to rule them all, one ring to find them..."


> -----Original Message-----
> From: Steve Downey [mailto:steve.downey@geowealth.com] 
> Sent: Wednesday, March 05, 2003 9:06 PM
> To: Jakarta Commons Developers List; steven@caswell.name
> Subject: Re: [lang] Adding Purple to StringUtils
> 
> 
> From: "Steven Caswell" <st...@caswell.name>
> To: "'Jakarta Commons Developers List'" 
> <co...@jakarta.apache.org>; <al...@jguru.com>
> Sent: Wednesday, March 05, 2003 4:54 PM
> Subject: RE: [lang] Adding Purple to StringUtils
> 
> 
> > I mostly agree with Henri's comments. Just a couple of additional 
> > comments thrown in...
> >
> >
> 
> <snip/>
> 
> > > > * toUnderscore
> > > >
> > > > Converts camelCaseVersusC to camel_case_versus_c
> > > >
> > > > I'd love to get some ideas on a better name for this.  Also, it'd be
> > > > nice to have a full, symmetrical set of camelCase vs. under_score
> > > > vs. CONSTANT_NAMING vs. "separate words" naming style converters.
> > >
> > > Hmm. Probably a bit specific for StringUtils. Do people often need
> > > this?
> >
> > +1 to Henri's comment. Seems beyond StringUtils scope.
> >
> 
> Well, if you do any kind of code generation, you do this kind 
> of thing all the time. SQL developers seem to avoid 
> CamelCase, relying on underscores instead. CONSTANT_NAMING 
> seems to be endemic in SQL, in particular. It's at least as 
> useful as Character.isJavaIdentifierStart() and 
> Character.isJavaIdentifierPart().





Re: [lang] Adding Purple to StringUtils

Posted by Steve Downey <st...@geowealth.com>.
From: "Steven Caswell" <st...@caswell.name>
To: "'Jakarta Commons Developers List'" <co...@jakarta.apache.org>;
<al...@jguru.com>
Sent: Wednesday, March 05, 2003 4:54 PM
Subject: RE: [lang] Adding Purple to StringUtils


> I mostly agree with Henri's comments. Just a couple of additional comments
> thrown in...
>
>

<snip/>

> > > * toUnderscore
> > >
> > > Converts camelCaseVersusC to camel_case_versus_c
> > >
> > > I'd love to get some ideas on a better name for this.  Also, it'd be
> > > nice to have a full, symmetrical set of camelCase vs. under_score
> > > vs. CONSTANT_NAMING vs. "separate words" naming style converters.
> >
> > Hmm. Probably a bit specific for StringUtils. Do people often
> > need this?
>
> +1 to Henri's comment. Seems beyond StringUtils scope.
>

Well, if you do any kind of code generation, you do this kind of thing all
the time. SQL developers seem to avoid CamelCase, relying on underscores
instead. CONSTANT_NAMING seems to be endemic in SQL, in particular. It's at
least as useful as Character.isJavaIdentifierStart() and
Character.isJavaIdentifierPart().




RE: [lang] Adding Purple to StringUtils

Posted by Steven Caswell <st...@caswell.name>.
I mostly agree with Henri's comments. Just a couple of additional comments
thrown in...


Steven Caswell
steven@caswell.name
a.k.a Mungo Knotwise of Michel Delving
"One ring to rule them all, one ring to find them..."


> -----Original Message-----
> From: Henri Yandell [mailto:bayard@generationjava.com] 
> Sent: Wednesday, March 05, 2003 1:53 PM
> To: Jakarta Commons Developers List; alex@jguru.com
> Subject: Re: [lang] Adding Purple to StringUtils
> 
> 
> 
> 
> On Wed, 5 Mar 2003, Alex Chaffee / Purple Technology wrote:
> 
> > I'll list these independently, so we can start haggling over
> >
> > 1. yea or nay
> > 2. naming
> > 3. API / method signature
> > 4. appropriate package (if lang.StringUtils is not the right place)
> 
> Seems good.
> 
> > for each in turn.  Naturally, I'm open to any negotiation; the methods
> > I'd really like to lobby for are htmlescape and strdiffVerbose.
> >
> >  - Alex
> >
> > * isWhitespace(String str)
> >
> > Returns true if the str contains only whitespace characters.
> >
> > Fills a gap in the isAlphanumeric family.  (Funny that isWhitespace is
> > the only one of the bunch that I had occasion to write!)
> 
> Already there. I suspect it's just not in the 1.0 release.
> 
> > * abbreviate(String str, int max)
> >
> > Turn "Now is the time for all good men" into "Now is the time for..."
> >
> > Specifically:
> >
> > If str is less than max characters long, return it.
> > Else abbreviate it to (substring(str, 0, max-3) + "...").
> > If max is less than 3, throw an IllegalArgumentException.
> > In no case will it return a string of length greater than max.
> 
> String Taglib has a truncateNicely method which is similar to 
> this. I like the abbreviate name and am yea on this, as long 
> as it incorporates trunc-nicely [STATUS.html has a note to 
> add truncateNicely]
> 
> > * String strdiff(String s1, String s2)
> >
> > Compare two strings, and return the portion where they differ.  (More
> > precisely, return the remainder of the second string, starting from
> > where it's different from the first.)
> > e.g. strdiff("i am a machine", "i am a robot") -> "robot"
> >
> > * int strdiffat(String s1, String s2)
> >
> > Compare two strings, and return the index at which the strings begin
> > to diverge.
> > E.g. strdiffat("i am a machine", "i am a robot") -> 7
> > Returns -1 if they are the same.
> 
> yea on both of these.

I'd rename them so the names better indicate the functionality. How about
remainingDifference and differsAt


> 
> >
> > * String strdiffVerbose(String expected, String actual)
> >
> > Compare two strings, and return a verbose description of how they 
> > differ. Shows a window around the location to provide context.
> >
> > E.g. strdiffVerbose("i am a robot", "i am a machine") might return a
> > string containing
> >
> > strings differ at character 7
> > Expected: ...am a robot
> >   Actual: ...am a machine
> >
> > This was developed in order to provide some sanity to JUnit's 
> > assertEquals display.
> 
> Nay. It's not a 'util' but an application on top of the utils.
> 
> > * String rtrim(String orig)
> > * String ltrim(String orig)
> >
> > Trim the whitespace off the right or left side of a String only. 
> > (Probably rename trimRight and trimLeft.)
> 
> stripStart and stripEnd I believe.
> 
> > * int maxLength(Iterator i)
> >
> > return the length of the longest string in i.  If i contains other 
> > than strings, uses toString() value.
> 
> Not convinced. StringUtils has a stripAll(String[]), which is 
> also a bit daft. Really we're just duplicating every method 
> to get a map() function.
> 
> map( StringUtils.strip, array )
> 
> though the above has a touch of max logic to it. I think it 
> is silly for us to add these. [quick note to implement 'map' in Java].
> 
> > * htmlescape(String)     (should rename: escapeHtml)
> > * htmlunescape(String)   (should rename: unescapeHtml)
> >
> > Turns funky characters into HTML entity equivalents (and vice versa).
> > This is far and away the most-used function I have.  There is 
> > definitely a need for it to be in Commons.  Apparently there's some 
> > debate as to whether it's appropriate for StringUtils.
> 
> Definitely not going to get in StringUtils. It is a very 
> useful function though.
> 
> > My code supports all the named entities I know about. See 
> > http://hotwired.lycos.com/webmonkey/reference/special_characters/
> >
> >
> > * String jsEscape    (should rename: escapeJavaScript)
> >
> > JavaScript's string escaping is nearly identical to Java's, and it's
> > sort of a no-brainer to support it as well as Java.  (In JS, a single
> > quote is escaped; in Java it's not.)
> >
> > * uncurlQuotes(String)
> >
> > Turn Windows and Mac curly-quotes into their non-curly ASCII 
> > equivalents.
> >
> >
> > * capitalize
> >
> > Properly-spelled version of capitalise :-)
> 
> Hey, I offered to rename it :) The general consensus was that 
> there was no reason to enforce such spelling.
> 
> > * lowerize
> >
> > Symmetrical with capitalize. Turn the first character into a 
> > lower-case letter.
> 
> uncapitalize :) Not as amusing as lowerize, but same feature.
> 
> > * pluralize
> >
> > turn String s into a plural English noun (doing the right thing with
> > "story" -> "stories" and "mess" -> "messes")
> 
> Does it handle all English though? i.e., it will fail on
> pluralising abacus etc.

Seems to me to be beyond the scope of StringUtils

> 
> > * toUnderscore
> >
> > Converts camelCaseVersusC to camel_case_versus_c
> >
> > I'd love to get some ideas on a better name for this.  Also, it'd be
> > nice to have a full, symmetrical set of camelCase vs. under_score
> > vs. CONSTANT_NAMING vs. "separate words" naming style converters.
> 
> Hmm. Probably a bit specific for StringUtils. Do people often 
> need this?

+1 to Henri's comment. Seems beyond StringUtils scope.

> 
> > * getStackTrace(Throwable t)
> >
> > Convenience method to get a stack trace as a string:
> >         StringWriter s = new StringWriter();
> >         PrintWriter p = new PrintWriter(s);
> >         t.printStackTrace(p);
> >         p.close();
> >         return s.toString();
> 
> This ought to already exist in ExceptionUtils. It made the 
> journey out of StringUtils early on.
> 
> Hen
> 
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [lang] Adding Purple to StringUtils

Posted by Henri Yandell <ba...@generationjava.com>.

On Wed, 5 Mar 2003, Alex Chaffee / Purple Technology wrote:

> I'll list these independently, so we can start haggling over
>
> 1. yea or nay
> 2. naming
> 3. API / method signature
> 4. appropriate package (if lang.StringUtils is not the right place)

Seems good.

> for each in turn.  Naturally, I'm open to any negotiation; the methods
> I'd really like to lobby for are htmlescape and strdiffVerbose.
>
>  - Alex
>
> * isWhitespace(String str)
>
> Returns true if the str contains only whitespace characters.
>
> Fills a gap in the isAlphanumeric family.  (Funny that isWhitespace is
> the only one of the bunch that I had occasion to write!)

Already there. I suspect it's just not in the 1.0 release.

> * abbreviate(String str, int max)
>
> Turn "Now is the time for all good men" into "Now is the time for..."
>
> Specifically:
>
> If str is less than max characters long, return it.
> Else abbreviate it to (substring(str, 0, max-3) + "...").
> If max is less than 3, throw an IllegalArgumentException.
> In no case will it return a string of length greater than max.

String Taglib has a truncateNicely method which is similar to this. I like
the abbreviate name and am yea on this, as long as it incorporates
trunc-nicely [STATUS.html has a note to add truncateNicely]
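
The four rules quoted above can be sketched directly. The class name AbbreviateSketch is hypothetical, and this only follows the rules as stated in the proposal; it does not attempt the word-boundary behaviour of truncateNicely. One assumption: a string of exactly max characters is returned unchanged, since it already fits.

```java
final class AbbreviateSketch {

    /** Shortens str to at most max characters, ending in "..." when it had to be cut. */
    static String abbreviate(String str, int max) {
        if (max < 3) {
            throw new IllegalArgumentException("max must be at least 3, got " + max);
        }
        if (str.length() <= max) {
            return str; // already fits, nothing to do
        }
        return str.substring(0, max - 3) + "...";
    }

    public static void main(String[] args) {
        // Prints: Now is the time for...
        System.out.println(abbreviate("Now is the time for all good men", 22));
    }
}
```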

> * String strdiff(String s1, String s2)
>
> Compare two strings, and return the portion where they differ.  (More
> precisely, return the remainder of the second string, starting from
> where it's different from the first.)
> e.g. strdiff("i am a machine", "i am a robot") -> "robot"
>
> * int strdiffat(String s1, String s2)
>
> Compare two strings, and return the index at which the strings begin
> to diverge.
> E.g. strdiffat("i am a machine", "i am a robot") -> 7
> return -1 if they are the same

yea on both of these.

>
> * String strdiffVerbose(String expected, String actual)
>
> Compare two strings, and return a verbose description of how
> they differ. Shows a window around the location to provide
> context.
>
> E.g. strdiffVerbose("i am a robot", "i am a machine") might return a
> string containing
>
> strings differ at character 7
> Expected: ...am a robot
>   Actual: ...am a machine
>
> This was developed in order to provide some sanity to JUnit's
> assertEquals display.

Nay. It's not a 'util' but an application on top of the utils.
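
For reference, the verbose form really can be written as a thin layer over an index-of-difference helper, which is the point made above. The sketch below is an assumption-laden illustration: the class name, the 5-character context window (chosen only because it reproduces the sample output), and the message for equal strings are all hypothetical.

```java
final class VerboseDiffSketch {

    private static final int CONTEXT = 5; // characters shown before the difference (assumed)

    /** Index of the first differing character, or -1 if the strings are equal. */
    static int differsAt(String s1, String s2) {
        int limit = Math.min(s1.length(), s2.length());
        for (int i = 0; i < limit; i++) {
            if (s1.charAt(i) != s2.charAt(i)) {
                return i;
            }
        }
        return s1.length() == s2.length() ? -1 : limit;
    }

    /** Describes where expected and actual diverge, with a little leading context. */
    static String diffVerbose(String expected, String actual) {
        int at = differsAt(expected, actual);
        if (at < 0) {
            return "strings are identical";
        }
        int start = Math.max(0, at - CONTEXT);
        String lead = start > 0 ? "..." : "";
        return "strings differ at character " + at + "\n"
                + "Expected: " + lead + expected.substring(start) + "\n"
                + "  Actual: " + lead + actual.substring(start);
    }

    public static void main(String[] args) {
        System.out.println(diffVerbose("i am a robot", "i am a machine"));
    }
}
```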

> * String rtrim(String orig)
> * String ltrim(String orig)
>
> Trim the whitespace off the right or left side of a String
> only. (Probably rename trimRight and trimLeft.)

stripStart and stripEnd I believe.
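
For anyone comparing behaviour, one-sided trimming can be sketched in plain Java as below. The names trimLeft and trimRight come from the proposal and SideTrimSketch is hypothetical; using Character.isWhitespace as the definition of whitespace is an assumption, and this is not the StringUtils implementation.

```java
final class SideTrimSketch {

    /** Removes leading whitespace only. */
    static String trimLeft(String s) {
        int start = 0;
        while (start < s.length() && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        return s.substring(start);
    }

    /** Removes trailing whitespace only. */
    static String trimRight(String s) {
        int end = s.length();
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + trimLeft("  abc  ") + "]");  // [abc  ]
        System.out.println("[" + trimRight("  abc  ") + "]"); // [  abc]
    }
}
```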

> * int maxLength(Iterator i)
>
> return the length of the longest string in i.  If i contains other
> than strings, uses toString() value.

Not convinced. StringUtils has a stripAll(String[]), which is also a bit
daft. Really we're just duplicating every method to get a map() function.

map( StringUtils.strip, array )

though the above has a touch of max logic to it. I think it is silly for
us to add these. [quick note to implement 'map' in Java].
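
As a rough illustration of the map() idea, here is a sketch that assumes a much later Java (8 or newer, for java.util.function.Function and method references), which was not available when this thread was written. MapSketch and both method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

final class MapSketch {

    /** Applies f to every element and returns the results in a new array. */
    static String[] map(Function<String, String> f, String[] in) {
        String[] out = new String[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = f.apply(in[i]);
        }
        return out;
    }

    /** Length of the longest element, using each element's string form (null counts as "null"). */
    static int maxLength(Iterator<?> it) {
        int max = 0;
        while (it.hasNext()) {
            Object next = it.next();
            max = Math.max(max, String.valueOf(next).length());
        }
        return max;
    }

    public static void main(String[] args) {
        String[] stripped = map(String::trim, new String[] {" a ", "b "});
        System.out.println(Arrays.toString(stripped)); // [a, b]
        System.out.println(maxLength(Arrays.asList("a", "abc", 12345).iterator())); // 5
    }
}
```

With a generic map in place, a per-array variant of each string method is no longer needed, although maxLength is a reduction rather than a map, which matches the "touch of max logic" remark.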

> * htmlescape(String)     (should rename: escapeHtml)
> * htmlunescape(String)   (should rename: unescapeHtml)
>
> Turns funky characters into HTML entity equivalents (and vice versa).
> This is far and away the most-used function I have.  There is
> definitely a need for it to be in Commons.  Apparently there's some
> debate as to whether it's appropriate for StringUtils.

Definitely not going to get in StringUtils. It is a very useful function
though.
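
Wherever it ends up living, the core of the function is small. The sketch below covers only four characters (&, <, >, and the double quote) as an illustration; it is not the full named-entity table the proposal describes, and the class and method bodies are assumptions rather than the code from Utils.java.

```java
final class HtmlEscapeSketch {

    /** Replaces &, <, > and " with their HTML entities. */
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Reverses escapeHtml for the same four entities. */
    static String unescapeHtml(String s) {
        // &amp; is handled last so that "&amp;lt;" becomes "&lt;" and not "<".
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        // Prints: &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
        System.out.println(escapeHtml("<b>Tom & Jerry</b>"));
    }
}
```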

> My code supports all the named entities I know about. See
> http://hotwired.lycos.com/webmonkey/reference/special_characters/
>
>
> * String jsEscape    (should rename: escapeJavaScript)
>
> JavaScript's string escaping is nearly identical to Java's, and it's
> sort of a no-brainer to support it as well as Java.  (In JS, a single
> quote is escaped; in Java it's not.)
>
> * uncurlQuotes(String)
>
> Turn Windows and Mac curly-quotes into their non-curly ASCII equivalents.
>
>
> * capitalize
>
> Properly-spelled version of capitalise :-)

Hey, I offered to rename it :) The general consensus was that there was no
reason to enforce such spelling.

> * lowerize
>
> Symmetrical with capitalize. Turn the first character into a
> lower-case letter.

uncapitalize :) Not as amusing as lowerize, but same feature.
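
The behaviour in question is just lowering the first character. A minimal sketch, with a hypothetical class name and no null handling:

```java
final class UncapitalizeSketch {

    /** Returns s with its first character converted to lower case. */
    static String uncapitalize(String s) {
        if (s.isEmpty()) {
            return s;
        }
        return Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(uncapitalize("StringUtils")); // stringUtils
    }
}
```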

> * pluralize
>
> turn String s into a plural English noun (doing the right thing with
> "story" -> "stories" and "mess" -> "messes")

Does it handle all English though? i.e., it will fail on pluralising abacus
etc.
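
To show where suffix rules stop working, here is a sketch of a rule-based pluralizer. The rules and the class name are assumptions for illustration, not the rules used in Utils.java. It gets the two quoted examples right, but irregular nouns fall straight through to the default rule.

```java
final class PluralizeSketch {

    /** Applies three simple English suffix rules; irregular nouns are not handled. */
    static String pluralize(String noun) {
        int n = noun.length();
        if (n == 0) {
            return noun;
        }
        char last = noun.charAt(n - 1);
        // consonant + y  ->  ies   (story -> stories, but day -> days)
        if (last == 'y' && n > 1 && !isVowel(noun.charAt(n - 2))) {
            return noun.substring(0, n - 1) + "ies";
        }
        // sibilant endings  ->  es   (mess -> messes, box -> boxes)
        if (last == 's' || last == 'x' || last == 'z'
                || noun.endsWith("ch") || noun.endsWith("sh")) {
            return noun + "es";
        }
        return noun + "s";
    }

    private static boolean isVowel(char c) {
        return "aeiou".indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(pluralize("story"));  // stories
        System.out.println(pluralize("mess"));   // messes
        System.out.println(pluralize("abacus")); // abacuses
        System.out.println(pluralize("child"));  // childs
    }
}
```

The last line is plainly wrong English, and whether "abacuses" is acceptable next to "abaci" is exactly the kind of question raised above.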

> * toUnderscore
>
> Converts camelCaseVersusC to camel_case_versus_c
>
> I'd love to get some ideas on a better name for this.  Also, it'd be
> nice to have a full, symmetrical set of camelCase vs. under_score
> vs. CONSTANT_NAMING vs. "separate words" naming style converters.

Hmm. Probably a bit specific for StringUtils. Do people often need this?
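
For the sake of discussion, the conversion itself is short. In this sketch the class name and toConstant are hypothetical, and every upper-case letter starts a new word, so an acronym such as "parseURL" would come out as "parse_u_r_l".

```java
import java.util.Locale;

final class CaseStyleSketch {

    /** Converts camelCase to under_score by splitting before each upper-case letter. */
    static String toUnderscore(String camel) {
        StringBuilder out = new StringBuilder(camel.length() + 8);
        for (int i = 0; i < camel.length(); i++) {
            char c = camel.charAt(i);
            if (Character.isUpperCase(c)) {
                if (i > 0) {
                    out.append('_');
                }
                out.append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Converts camelCase to CONSTANT_NAMING by upper-casing the under_score form. */
    static String toConstant(String camel) {
        return toUnderscore(camel).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toUnderscore("camelCaseVersusC")); // camel_case_versus_c
        System.out.println(toConstant("camelCaseVersusC"));   // CAMEL_CASE_VERSUS_C
    }
}
```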

> * getStackTrace(Throwable t)
>
> Convenience method to get a stack trace as a string:
>         StringWriter s = new StringWriter();
>         PrintWriter p = new PrintWriter(s);
>         t.printStackTrace(p);
>         p.close();
>         return s.toString();

This ought to already exist in ExceptionUtils. It made the journey out of
StringUtils early on.
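
For completeness, the quoted snippet wrapped into a compilable method looks like the sketch below. StackTraceSketch is a hypothetical name used only so the example stands alone; the body is the five quoted lines with imports added.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class StackTraceSketch {

    /** Returns the stack trace of t as a String, as printStackTrace would print it. */
    static String getStackTrace(Throwable t) {
        StringWriter s = new StringWriter();
        PrintWriter p = new PrintWriter(s);
        t.printStackTrace(p);
        p.close(); // flushes into the StringWriter; closing a StringWriter has no effect
        return s.toString();
    }

    public static void main(String[] args) {
        System.out.println(getStackTrace(new IllegalStateException("boom")));
    }
}
```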

Hen

