Posted to dev@tika.apache.org by kbennett <kb...@bbsinc.biz> on 2007/09/26 21:32:16 UTC

Apache Commons Lang

Jakarta Commons Lang

I'd like to explain why I like the Apache Commons Lang library so much, and
why I recommend using it more in Tika.  (We already include this library in
our dependencies.)

Commons Lang is for me the most useful external library I use.  It is
extremely helpful in making my source code clearer, more concise, and more
robust.  Commons Lang is so useful and general that it is hard for me to
imagine a project on which I would not want to use it.  In my opinion, it is
a gold mine worthy of being included in Java itself.  That's how highly I
regard it.

Following are samples of the features that have been the most helpful to me.

----

StringUtils:

The StringUtils class fills in a lot of string functionality that is built
directly into other languages.  StringUtils makes Java string handling
slightly less inferior to that of languages such as Python and Ruby.  As per
StringUtils' javadoc, this is the functionality it provides:

IsEmpty/IsBlank - checks if a String contains text
Trim/Strip - removes leading and trailing whitespace
Equals - compares two strings null-safe
IndexOf/LastIndexOf/Contains - null-safe index-of checks
IndexOfAny/LastIndexOfAny/IndexOfAnyBut/LastIndexOfAnyBut - index-of any of
a set of Strings
ContainsOnly/ContainsNone - does String contain only/none of these
characters
Substring/Left/Right/Mid - null-safe substring extractions
SubstringBefore/SubstringAfter/SubstringBetween - substring extraction
relative to other strings
Split/Join - splits a String into an array of substrings and vice versa
Remove/Delete - removes part of a String
Replace/Overlay - searches a String and replaces one String with another
Chomp/Chop - removes the last part of a String
LeftPad/RightPad/Center/Repeat - pads a String
UpperCase/LowerCase/SwapCase/Capitalize/Uncapitalize - changes the case of a
String
CountMatches - counts the number of occurrences of one String in another
IsAlpha/IsNumeric/IsWhitespace/IsAsciiPrintable - checks the characters in a
String
DefaultString - protects against a null input String
Reverse/ReverseDelimited - reverses a String
Abbreviate - abbreviates a string using ellipsis
Difference - compares two Strings and reports on their differences
LevenshteinDistance - the number of changes needed to change one String into
another

It also resolves some of the awkwardness of null handling.  For example,
StringUtils.equals() and ObjectUtils.equals() compare two objects for
equality, where either can be null.  Using these, we don't need to remember
to use "aString".equals(s) instead of s.equals("aString").
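
To make the null-handling point concrete, here is a minimal plain-Java sketch
of what a null-safe equality check does.  The class and method names are
hypothetical and hand-written for illustration; this is not Commons Lang's
own code, only the behavior described above.

```java
// Plain-Java sketch of null-safe equality; names are hypothetical.
class NullSafeEqualsSketch {

    // Returns true when both arguments are null, or when a.equals(b) holds.
    static boolean nullSafeEquals(Object a, Object b) {
        if (a == null) {
            return b == null;
        }
        return a.equals(b);
    }

    public static void main(String[] args) {
        String s = null;

        // s.equals("aString") would throw a NullPointerException here,
        // which is why the literal-first idiom exists.
        System.out.println("aString".equals(s));

        // With a null-safe helper, argument order no longer matters.
        System.out.println(nullSafeEquals(s, "aString"));
        System.out.println(nullSafeEquals("aString", s));
        System.out.println(nullSafeEquals(null, null));
    }
}
```

With a helper like this, neither operand needs a null check at the call site.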

----

StringUtils.isNotBlank():

StringUtils.isNotBlank() returns true if and only if a string is non-null,
not empty, and does not contain only whitespace.  Compare this call to:

s != null && s.trim().length() > 0

Because this kind of operation is done so frequently, the minimal amount of
learning and dependency required to use isNotBlank() is (IMO) far outweighed
by the greater code clarity, conciseness, and robustness.

An even worse case is:

s.trim().length() > 0

One might be tempted to use this, thinking that s will never be null.
However, if a null ever does arrive, a NullPointerException is thrown.
Assuming that in this case null just means {no value} and not {serious
error}, we just caused a runtime exception for no good reason.
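
The difference between the guarded and unguarded checks can be sketched in
plain Java as follows.  The isNotBlank() helper here is a hand-written
stand-in based on the description above (non-null, not empty, not only
whitespace), not the library's source, and the class and method names are
hypothetical.

```java
// Plain-Java sketch contrasting a null-safe blank check with the
// unguarded trim() form; names are hypothetical.
class BlankCheckSketch {

    // True only for a non-null string with at least one
    // non-whitespace character.
    static boolean isNotBlank(String s) {
        if (s == null) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // The unguarded form: throws NullPointerException when s is null.
    static boolean unguarded(String s) {
        return s.trim().length() > 0;
    }

    public static void main(String[] args) {
        String[] inputs = { null, "", "   ", " tika " };
        for (String s : inputs) {
            System.out.println("[" + s + "] -> " + isNotBlank(s));
        }

        try {
            unguarded(null);
        } catch (NullPointerException e) {
            System.out.println("unguarded check threw NullPointerException");
        }
    }
}
```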

----

How many times have we not implemented a toString() method (or an equals()
method, or ...) because it was not worth the time?

For example, sometimes I'm writing a new class, and I'd like it to output
its state during debugging.  Commons Lang reduces the cost of doing so.  The
simplest form is:

public String toString() {
  return ReflectionToStringBuilder.toString(this);
}

Not very efficient for production, but perfectly acceptable for development. 
There are other forms that take a little longer to write but are more
efficient.
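
For anyone curious what a reflection-based toString() involves, here is a
rough plain-Java sketch of the idea.  It is not ReflectionToStringBuilder's
implementation or output format; the helper, the Document class, and the
bracketed format are all made up for illustration, and it only looks at
fields declared directly on the object's class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Plain-Java sketch of a reflection-based toString(); names and output
// format are hypothetical.
class ReflectiveToStringSketch {

    // Builds "ClassName[field=value,...]" from the instance fields
    // declared on o's own class (superclass fields are ignored).
    static String reflectToString(Object o) {
        StringBuilder sb = new StringBuilder(o.getClass().getSimpleName());
        sb.append('[');
        boolean first = true;
        for (Field f : o.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // skip constants and compiler-generated fields
            }
            f.setAccessible(true); // allow reading private fields
            if (!first) {
                sb.append(',');
            }
            first = false;
            try {
                sb.append(f.getName()).append('=').append(f.get(o));
            } catch (IllegalAccessException e) {
                sb.append(f.getName()).append("=<inaccessible>");
            }
        }
        return sb.append(']').toString();
    }

    // Hypothetical class whose state we want to see while debugging.
    static class Document {
        private final String title;
        private final int pages;

        Document(String title, int pages) {
            this.title = title;
            this.pages = pages;
        }

        @Override
        public String toString() {
            return reflectToString(this);
        }
    }

    public static void main(String[] args) {
        System.out.println(new Document("report.pdf", 12));
    }
}
```

The reflective lookup on every call is the reason this approach is slower
than a hand-written toString().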

----

System properties such as operating system and Java version are made more
convenient and less error prone through static constants such as
SystemUtils.LINE_SEPARATOR (which holds the value itself, not the property
key).

If you've ever had to determine OS type or Java version by analyzing the
system properties, you'll appreciate that they've done this for you. 
Examples are IS_OS_WINDOWS, JAVA_VERSION_FLOAT, IS_JAVA_1_5. 
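
For comparison, this is roughly what doing it by hand looks like in plain
Java.  The property keys are the standard JVM ones; the isWindows() helper
and its prefix check are my own assumption about how such a test might be
written, not the library's code.

```java
// Plain-Java sketch of reading system properties by hand; the class and
// helper names are hypothetical.
class SystemPropertiesSketch {

    // Assumed rule for illustration: treat any os.name value starting
    // with "Windows" as Windows.
    static boolean isWindows(String osName) {
        return osName != null && osName.startsWith("Windows");
    }

    public static void main(String[] args) {
        // Each lookup depends on typing the property key correctly.
        String lineSeparator = System.getProperty("line.separator");
        String osName = System.getProperty("os.name");
        String javaVersion = System.getProperty("java.version");

        System.out.println("line separator length: " + lineSeparator.length());
        System.out.println("os.name: " + osName);
        System.out.println("java.version: " + javaVersion);
        System.out.println("is Windows: " + isWindows(osName));
    }
}
```

A mistyped key silently returns null, which is the kind of error the
predefined constants avoid.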

----

StringEscapeUtils contains methods that escape and unescape HTML, XML, SQL,
Java, and JavaScript.

----

DateUtils contains helper methods for date arithmetic, etc., and includes
constants such as MILLIS_PER_DAY.

----

WordUtils has methods that do capitalization tasks, word wrap, and
generation of initials.

----

- Keith
