You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by "Mark R. Diggory" <md...@latte.harvard.edu> on 2003/07/13 00:31:23 UTC
[math][collections] PrimitiveCollections and UnivariateStatistics: Iterators vs. for loops

I started to consider changing the evaluate() methods in 
UnivariateStatistic to work with DoubleIterators from the 
collections.primitives API. But I found some pretty basic shortcomings 
to working with DoubleCollections and DoubleIterators on double[]'s. 
Much of this came from the similar warning signs about performance that 
Phil raised with my initial implementations of UnivariateStatistics. So 
I went and used his strategy to attempt to come up with efficiency 
measures for various iteration strategies.


I've done some analysis and I'm pretty surprised to find out how 
inefficient the "Iterators" hasNext() methods are in for/while loops. I 
wrote a simple iterator for double[]'s and find that it always performs 
poorly (~ 2-3 times slower) in relation to the "for(int i = 0; i < 
length;i++)" loop approach. Phil pointed out that the slowdown in my 
initial implementation was due to having "method calls" for 
increment(double d) embedded in an iterative approach. While a 
significant percentage of time spent processing all the calculations in 
the "instantaneous increment" method call, there is also a good portion 
of time lost to poorly designed passes over the double[], 
DoubleArrayIterator performance, and the use of hasNext(). The attached 
example code shows these effects fairly clearly.

I'd recommend the following from these results for both [math] and 
[collections]:

*1* [math] we can get about a 95% efficiency in relation to regular 
for-loops (or evaluate2(...) in the code) by doing tricks like this.

public double evaluate(double[] values, int begin, int length) {
         for (int i = begin + length; --i >= 0;) {
          ...

*2* [math][collections]if we want to use Iterators they work the fastest 
when "hasNext()" can be avoided like the following, this provides about 
a 110% efficiency in relation to regular for-loops (or evaluate2(...) in 
the code) this is as opposed to a ~200% efficiency for the use of 
iter.hasNext() in the conditional test of the loop (or evaluate4(...) in 
the code).

public double evaluate3(double[] values, int begin, int length) {
	MyDoubleIterator iter
             = new MyDoubleIterator(values, begin, length);
         for (int i = length; --i >= 0;) {
               ... iter.next();
               ...

*3* [collections] in primitive collections the toArray() and 
toArray(primitive[]) methods can benefit from the above "Iterator" for 
loop in strategy for the Abstract Cases, I recommend however, 
considering the use of System.arrayCopy in each <Primitive>Collection 
where ever possible because this will always be faster that copying it 
by hand. In cases where [math] needs a copy of an array, we would really 
like it in the most efficient means possible. Our current DoubleArray 
implementations approach copies using System.arrayCopy(...).

Hope this is helpful to both groups,
Mark