You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by Sean Owen <sr...@gmail.com> on 2008/08/22 16:20:16 UTC

More thoughts on Java

OK here's a lot more of my notes as I'm run analysis on the code:



new Integer(foo).toString() is better written as Integer.toString(foo)


Integer.parseInt() returns and int; Integer.valueOf() returns an
Integer. Likewise for other primitives. Use the one you need to avoid
an unneeded box or unbox.


Strings should always be compared with equals(), never ==


Never call System.exit(); it's nearly as harsh as "kill -9". Throw an
exception out of main.


StringBuilder is good but I've seen this:
StringBuilder foo = ...;
foo.append("something " + bar + " something else");
.. which rather misses the point of StringBuilder. Append each string
with append()
foo.append("something ").append(bar).append(" something else");


You can append a character like '\t' instead of a one-character String
like "\t". It's faster.


Don't use Vector or Hashtable or Enumeration anymore -- use ArrayList
and HashMap and Iterator, and wrap in Collections.synchronizedList()
et al. if you need the same synchronization semantics.


Javadoc comments start with /** and end with */ or else they aren't
recognized as such. Also I personally feel we should not write empty
javadoc or purely perfunctory javadoc that says nothing beyond what
the method already indicates -- e.g. "Gets foo" for getFoo()


I personally avoid imports using a wildcard (e.g. java.util.*). Though
I think this is a common convention -- not using wildcards -- I won't
change it here myself.


I also like to declare everything final that can be, except method
params and locals, where it makes for too much noise. But I won't
foist that on the whole codebase. Likewise I prefer to make elements
as private as possible.


I think instance fields should never be anything but private. Expose
them with getters.


I also think that System.out/err should never appear in the code, nor
printStackTrace(), with the possible exception of main() methods. Log
exceptions and messages with an SLF4J logger.


We should standardize on SLF4J for logging. I see direct use of
commons logging creeping back in.


toArray() should ideally be called with a right-sized array:
List<String> stuff = ...;
String[] stuffArray = stuff.toArray(new String[stuff.size()]);
Passing a 0-length array works in that it will be discarded and a new
array allocated, but that means the allocation was pointless.


'transient' is rarely needed, and never does anything in a
non-Serializable class.


These are redundancies that can be simplified:

if (someBoolean == true)

to

if (someBoolean)

int value = foo();
return value;

to

return foo();

int foo = 0;
foo = somethingElse();

to

int foo = somethingElse();

Re: More thoughts on Java

Posted by Ted Dunning <te...@gmail.com>.
On Fri, Aug 22, 2008 at 9:59 AM, Sean Owen <sr...@gmail.com> wrote:

>
> One never needs to use this String constructor -- actually I am not
> sure why it exists:
> new String("foo")
> All it does is make a String that != "foo" (though it .equals("foo"))


This can be very important because strings can share structure.  I once had
a program that read a large file as a string and then used a construct like

     map.put(key, new String(s.substring(...)))

because otherwise the map would have a reference to s via the substring.

I agree about the string constructor applied to a constant.

main() can just "throws Exception" -- while normally bad practice to
> expose an API that throws so broad an exception signature, nobody
> should invoke main() in a program.


No.  Don't declare exceptions until they are thrown.  At the least, it
throws off the IDE's.

And mains shouldn't normally exist in this container-ized world.  If they do
exist, then they should be tested.  IF they are tested, then it helps to
know precisely what they might throw.  Besides, this saves you nothing.


> We should use Java-style array syntax instead of C-style: String[] foo
> instead of String foo[]. I always thought the latter was weird and not
> sure why Java preserved it.


this is a religious question without benefit.  You see it as weird.  That is
about all you can say about it.

Leave these alone.  The original author might have the opposite religion.

Another point of syntax: always use { and } even when the body is one
> line? I tend to favor it as it avoids this bug:
> if (...)
>  statement1;
>  statement 2 added sometime later but is not actually in the if statement;
> ... and also because it naturally adds a nearly-blank line after the
> body of the block


This is one of the few religious style issues I feel strong enough about to
change, at least if I have to work on that bit of code.

I would worry about the social consequences of changing all of them.

Where possible I think we should declare variables and arguments by
> interface, like:
> Map map = new HashMap();
> instead of
> HashMap map = new HashMap();


Strong +1.

Re: More thoughts on Java

Posted by Sean Owen <sr...@gmail.com>.
OK while I'm on a roll... how about this stuff? Shout if you object at
all to making changes based on these. I would only proceed with stuff
that seems unanimously acceptable.


One never needs to use this String constructor -- actually I am not
sure why it exists:
new String("foo")
All it does is make a String that != "foo" (though it .equals("foo"))


main() can just "throws Exception" -- while normally bad practice to
expose an API that throws so broad an exception signature, nobody
should invoke main() in a program.


We should use Java-style array syntax instead of C-style: String[] foo
instead of String foo[]. I always thought the latter was weird and not
sure why Java preserved it.


In JUnit, you can use
assertEquals(foo, bar);
rather than
assertTrue(foo.equals(bar))

The upside is it will automatically write an intelligent error message
that highlights the difference between expected and actual.


This:
public void foo() {
  super.foo();
}
doesn't do anything -- can just be removed since the overriding method
only delegates to the superclass anyway.


Also should the copyright statement always be the very very first
thing in the file? I had thought so.


Another point of syntax: always use { and } even when the body is one
line? I tend to favor it as it avoids this bug:
if (...)
  statement1;
  statement 2 added sometime later but is not actually in the if statement;
... and also because it naturally adds a nearly-blank line after the
body of the block


Where possible I think we should declare variables and arguments by
interface, like:
Map map = new HashMap();
instead of
HashMap map = new HashMap();

Re: More thoughts on Java

Posted by Sean Owen <sr...@gmail.com>.
Cool, I will not touch anything that was mentioned in this thread.

On Fri, Aug 22, 2008 at 4:58 PM, Grant Ingersoll <gs...@apache.org> wrote:
> I disagree.  I am more than likely the guilty party here and I'll own up to
> it and even take credit for it.  I don't know about you, but at 2 in the
> morning after staring at the same code for hours trying to find that stupid

I definitely agree about == false, even though I typically still use
the bang syntax myself.

The compiler can't change uses of a one-character String to a
character since it might change the semantics of the program -- would
at least change the bytecode. I think javac can do virtually nothing
interesting.

Re: More thoughts on Java

Posted by Grant Ingersoll <gs...@apache.org>.
Ah, we can all put in our coding pet peeves...  :-)

On Aug 22, 2008, at 10:20 AM, Sean Owen wrote:

> OK here's a lot more of my notes as I'm run analysis on the code:
>
> StringBuilder is good but I've seen this:
> StringBuilder foo = ...;
> foo.append("something " + bar + " something else");
> .. which rather misses the point of StringBuilder. Append each string
> with append()
> foo.append("something ").append(bar).append(" something else");
>
>

Definitely.
>
> Javadoc comments start with /** and end with */ or else they aren't
> recognized as such. Also I personally feel we should not write empty
> javadoc or purely perfunctory javadoc that says nothing beyond what
> the method already indicates -- e.g. "Gets foo" for getFoo()
>

+1

>
> I personally avoid imports using a wildcard (e.g. java.util.*). Though
> I think this is a common convention -- not using wildcards -- I won't
> change it here myself.
>

+1
>
>
> I also think that System.out/err should never appear in the code, nor
> printStackTrace(), with the possible exception of main() methods. Log
> exceptions and messages with an SLF4J logger.
>

+1

>
> We should standardize on SLF4J for logging. I see direct use of
> commons logging creeping back in.
>

+1.  Didn't catch that

>
> toArray() should ideally be called with a right-sized array:
> List<String> stuff = ...;
> String[] stuffArray = stuff.toArray(new String[stuff.size()]);
> Passing a 0-length array works in that it will be discarded and a new
> array allocated, but that means the allocation was pointless.
>
>
> 'transient' is rarely needed, and never does anything in a
> non-Serializable class.
>
>
> These are redundancies that can be simplified:
>
> if (someBoolean == true)
>
> to
>
> if (someBoolean)
>

I disagree.  I am more than likely the guilty party here and I'll own  
up to it and even take credit for it.  I don't know about you, but at  
2 in the morning after staring at the same code for hours trying to  
find that stupid bug that I just can't track down, I find

boolean lNamedVariable = ...
if (!lNamedVariable)

to be much harder to read than:
if (lNamedVariable == false)

which then makes it easier for me to understand exactly what the  
condition is and better decide if that is truly the intent.

and the same goes for == true.  I know _A LOT_ of people disagree with  
that, but for me, I've seen that one bite many people b/c they gloss  
over reading it, especially when the name of the variable starts with  
an "l" or some other character that has a look similar to the  
exclamation point, or their font size is small or their eyes are tired  
or whatever.

Just my two cents,
Grant


Re: More thoughts on Java

Posted by Dean Wampler <de...@aspectprogramming.com>.
On Aug 22, 2008, at 9:47 AM, Karl Wettin wrote:

> ...
>
>> I also like to declare everything final that can be, except method
>> params and locals, where it makes for too much noise. But I won't
>
> I think final only makes sense in superclass methods that are not  
> ment to be extended. There is lots of class, method and field  
> finalization in Lucene that I can't stand. (This is especially true  
> for Document and Term.) It forces me to do stuff like aggregating  
> code when I really want to extend it. It can be rather expensive if  
> you got enough instances to keep track of.
>

Another argument against excessive use of final is that they make  
testing harder. Sometimes it's convenient to write a test-only  
subclass to override a particular method so that it returns stubbed  
behavior, for example.

I'm much more likely to use final for data values, rather than  
methods, to promote more of a functional style of programming, i.e.,  
preferring immutable value objects.

> ...
>>


Dean Wampler, Ph.D.
dean at objectmentor.com
http://www.objectmentor.com
See also:
http://www.aspectprogramming.com  AOP advocacy site
http://aquarium.rubyforge.org     AOP for Ruby
http://www.contract4j.org         Design by Contract for Java5

I want my tombstone to say:
   Unknown Application Error in Dean Wampler.exe.
   Application Terminated.
       [Okay]        [Cancel]







Re: More thoughts on Java

Posted by Sean Owen <sr...@gmail.com>.
Cool, I hear you loud and clear on not marking classes final. Glad we
can come to some clear conclusions about conventions as I am sure the
project will benefit as a result. I would not change anything myself
that wasn't unanimously agreed upon, regardless of my own conclusions.

My only follow up is about array syntax... I don't necessarily see it
as an arbitrary style preference. A String array is a "String[]"; to
write it starting with just "String" seems possibly confusing as you
don't immediately parse that as an array reference. It seems strongly
conventional in Java to not use C-style syntax. That said, I don't see
any instances of this in the code anyway. I mentioned it only because
I saw some arrays declared as "String []" (extra space) and I suppose
that's tiny but seems worth standardizing to "String[]".

On Fri, Aug 22, 2008 at 8:46 PM, Shalin Shekhar Mangar
<sh...@gmail.com> wrote:
> +101. I've had some experiences with final classes where I had to extend
> their functionality. I had to resort to clever class loader tricks to get
> around them e.g. use same package+class name and put them in the war where
> Tomcat would look first before looking into the jar in the lib directory.
>
> There are other pains where a public class depends on a package private
> class for a large amount of functionality. It is very hard to rewrite the
> functionality of the package private class when extending the public class.
> The only easy solution is to either copy/paste the original code or  use the
> above class loader trick.

Re: More thoughts on Java

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Fri, Aug 22, 2008 at 11:45 PM, Ted Dunning <te...@gmail.com> wrote:

> On Fri, Aug 22, 2008 at 10:34 AM, Sean Owen <sr...@gmail.com> wrote:
>
> > You are right, I am not sure it makes any significant difference in
> > speed. The only situation where I know it can help is in ProGuard
> > optimization as it can perform inlining as a result. The better
> > argument is just for correctness -- if you aren't sure you set up the
> > class for subclassing, prohibit it.
>
>
> I would come to *exactly* the opposite conclusion.  Until you have *PROVED*
> that the performance is need from exactly that piece of code, don't
> prohibit
> subclassing.


+101. I've had some experiences with final classes where I had to extend
their functionality. I had to resort to clever class loader tricks to get
around them e.g. use same package+class name and put them in the war where
Tomcat would look first before looking into the jar in the lib directory.

There are other pains where a public class depends on a package private
class for a large amount of functionality. It is very hard to rewrite the
functionality of the package private class when extending the public class.
The only easy solution is to either copy/paste the original code or  use the
above class loader trick.

>
>
>
> > But if there are helpful reasons to make classes extendable, leave it so,
> > yes.
>
>
> The strongest reason is that other people are clever and are likely to come
> up with something good.
>
> Interfaces (or abstracts as Karl would prefer) are fine and dandy, but lots
> of stuff isn't worth building the whole framework of classes around, but is
> still worth testing with mocks.
>

+ 101 again. *Please* do not make classes final until you have a very very
compelling reason.

-- 
Regards,
Shalin Shekhar Mangar.

Re: More thoughts on Java

Posted by Ted Dunning <te...@gmail.com>.
On Fri, Aug 22, 2008 at 10:34 AM, Sean Owen <sr...@gmail.com> wrote:

> You are right, I am not sure it makes any significant difference in
> speed. The only situation where I know it can help is in ProGuard
> optimization as it can perform inlining as a result. The better
> argument is just for correctness -- if you aren't sure you set up the
> class for subclassing, prohibit it.


I would come to *exactly* the opposite conclusion.  Until you have *PROVED*
that the performance is need from exactly that piece of code, don't prohibit
subclassing.


> But if there are helpful reasons to make classes extendable, leave it so,
> yes.


The strongest reason is that other people are clever and are likely to come
up with something good.

Interfaces (or abstracts as Karl would prefer) are fine and dandy, but lots
of stuff isn't worth building the whole framework of classes around, but is
still worth testing with mocks.

Re: More thoughts on Java

Posted by Sean Owen <sr...@gmail.com>.
You are right, I am not sure it makes any significant difference in
speed. The only situation where I know it can help is in ProGuard
optimization as it can perform inlining as a result. The better
argument is just for correctness -- if you aren't sure you set up the
class for subclassing, prohibit it.

But if there are helpful reasons to make classes extendable, leave it so, yes.

One alternate answer is to use interfaces more heavily so that it's
not necessary to subclass concrete classses for mocking purposes. This
is the road I went myself.

On Fri, Aug 22, 2008 at 6:29 PM, Ted Dunning <te...@gmail.com> wrote:
>
> +10^10
>
> The final declarations in Lucene make it nearly impossible to mock up tests
> that use Lucene objects.  TRULY a pain.
>
> I have almost never seen a final declaration make any difference in speed on
> modern Java platforms.
>

Re: More thoughts on Java

Posted by Ted Dunning <te...@gmail.com>.
On Fri, Aug 22, 2008 at 7:47 AM, Karl Wettin <ka...@gmail.com> wrote:

>
> I also like to declare everything final that can be, except method
>> params and locals, where it makes for too much noise. But I won't
>>
>
> I think final only makes sense in superclass methods that are not ment to
> be extended. There is lots of class, method and field finalization in Lucene
> that I can't stand. (This is especially true for Document and Term.) It
> forces me to do stuff like aggregating code when I really want to extend it.
> It can be rather expensive if you got enough instances to keep track of.


+10^10

The final declarations in Lucene make it nearly impossible to mock up tests
that use Lucene objects.  TRULY a pain.

I have almost never seen a final declaration make any difference in speed on
modern Java platforms.

Re: More thoughts on Java

Posted by Karl Wettin <ka...@gmail.com>.
22 aug 2008 kl. 16.20 skrev Sean Owen:

> StringBuilder is good but I've seen this:
> StringBuilder foo = ...;
> foo.append("something " + bar + " something else");
> .. which rather misses the point of StringBuilder. Append each string
> with append()
> foo.append("something ").append(bar).append(" something else");


> You can append a character like '\t' instead of a one-character String
> like "\t". It's faster.

Most modern compilers takes care of that sort of stuff.


> I also like to declare everything final that can be, except method
> params and locals, where it makes for too much noise. But I won't

I think final only makes sense in superclass methods that are not ment  
to be extended. There is lots of class, method and field finalization  
in Lucene that I can't stand. (This is especially true for Document  
and Term.) It forces me to do stuff like aggregating code when I  
really want to extend it. It can be rather expensive if you got enough  
instances to keep track of.


> I think instance fields should never be anything but private. Expose
> them with getters.

I think there is something to win from code that is easier to read.  
Where do you draw the line? For instance, do you add accessor methods  
for the fields of a private inner class?



           karl