Posted to dev@commons.apache.org by "Mark R. Diggory" <md...@latte.harvard.edu> on 2003/07/07 23:28:45 UTC

[math] Reducing replicated calculations across incremental statistics (was: Re: [math] Recent commits to stat, util packages)

brent@worden.org wrote:

>>*Constructor approach to reusing moments.*
>>Mean mean = new Mean();
>>SecondMoment m2 = new SecondMoment(mean);
>>ThirdMoment m3 = new ThirdMoment(mean, m2);
>>FourthMoment m4 = new FourthMoment(mean, m2, m3);
>>Variance var = new Variance(m2);
>>Skew skew = new Skew(var, m3);
>>Kurt kurt = new Kurt(var, m4);
>>    
>>
>One problem with this approach is that the order of computation for
>each statistic object is now a big concern.  The responsibility for
>correct ordering would have to be placed in univariate with Mark's
>statistics objects.  It would be better to place that responsibility
>in the statistic objects themselves.
>
>Maybe we could make composite statistic objects that can compute more
>than one metric.  The composite would conform to the same statistic
>interface and would be adaptable into individual metrics.  Also, the
>responsibility of computation ordering would be hidden in the
>composite, removed from univariate.
>
>Brent Worden
>http://www.brent.worden.org/
>  
>
I did some experimentation in relation to this and came up with the 
following test case.

        // One shared FourthMoment backs all four statistics.
        FourthMoment m4 = new FourthMoment();
        Mean m = new Mean(m4);
        Variance v = new Variance(m4);
        Skewness s = new Skewness(m4);
        Kurtosis k = new Kurtosis(m4);

        for (int i = 0; i < testArray.length; i++) {
            // The shared moment is incremented externally, once per value.
            m4.increment(testArray[i]);
            // These calls do not increment the shared moment again.
            m.increment(testArray[i]);
            v.increment(testArray[i]);
            s.increment(testArray[i]);
            k.increment(testArray[i]);
        }

        assertEquals(mean, m.getValue(), tolerance);
        assertEquals(var, v.getValue(), tolerance);
        assertEquals(skew, s.getValue(), tolerance);
        assertEquals(kurt, k.getValue(), tolerance);


Previously, the class hierarchy looked like this
(--> = extends)

           mean (m1)
               ^
               |
var    -->    m2
               ^
               |
skew   -->    m3
               ^
               |
kurt   -->    m4


I adjusted the mean, var, skew and kurt classes to use the following 
hierarchy instead:

mean   >delegates to>   m1
  ^                      ^
  |                      |
var    >delegates to>   m2
  ^                      ^
  |                      |
skew   >delegates to>   m3
  ^                      ^
  |                      |
kurt   >delegates to>   m4

For incrementals I then made each of the kurt, skew, var and mean 
classes delegate to an internal copy of its corresponding moment. 
Because a FourthMoment encapsulates all the functionality of the lower 
moments, I can use one instance of FourthMoment and hand it to the 
statistics' constructors. When this happens, the statistic no longer 
increments the moment that was handed in through the constructor. I 
added code to make sure this moment is not incremented internally, so 
it must be incremented externally as in the above example (this also 
means that the increment only needs to be done once per iteration 
instead of once per statistic, a large savings in duplicated 
computation). This works at any level in the moment hierarchy, for example

FirstMoment m1 = new FirstMoment();
Mean m = new Mean(m1);

or

SecondMoment m2 = new SecondMoment();
Mean m = new Mean(m2);
Variance v = new Variance(m2);

or

ThirdMoment m3 = new ThirdMoment();
Mean m = new Mean(m3);
Variance v = new Variance(m3);
Skewness s = new Skewness(m3);

or

FourthMoment m4 = new FourthMoment();
Mean m = new Mean(m4);
Variance v = new Variance(m4);
Skewness s = new Skewness(m4);
Kurtosis k = new Kurtosis(m4);
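
To make the delegation concrete, here is a minimal sketch of how a 
statistic could either own its moment or borrow one through its 
constructor. It is an illustration under assumptions, not the committed 
code: only a SecondMoment is modeled, the incrementInternally flag and 
the getN/getMean/getM2 accessors are hypothetical names, the update is a 
standard Welford-style recurrence, and Variance is assumed to be the 
sample variance (divide by n - 1).

```java
// Hypothetical sketch; names and structure are illustrative only and are
// not taken from the actual commons-math sources.
class SecondMoment {
    private long n = 0;
    private double m1 = 0.0; // running mean
    private double m2 = 0.0; // running sum of squared deviations from the mean

    public void increment(double d) {
        n++;
        double dev = d - m1;
        m1 += dev / n;
        m2 += dev * (d - m1); // Welford-style update
    }

    public long getN() { return n; }
    public double getMean() { return m1; }
    public double getM2() { return m2; }
}

class Mean {
    private final SecondMoment moment;
    private final boolean incrementInternally; // assumed flag name

    // Stand-alone use: the statistic owns and updates its own moment.
    public Mean() {
        this.moment = new SecondMoment();
        this.incrementInternally = true;
    }

    // Shared use: the caller owns the moment and must increment it.
    public Mean(SecondMoment shared) {
        this.moment = shared;
        this.incrementInternally = false;
    }

    public void increment(double d) {
        if (incrementInternally) {
            moment.increment(d);
        }
    }

    public double getValue() {
        return moment.getN() == 0 ? Double.NaN : moment.getMean();
    }
}

class Variance {
    private final SecondMoment moment;
    private final boolean incrementInternally; // assumed flag name

    public Variance() {
        this.moment = new SecondMoment();
        this.incrementInternally = true;
    }

    public Variance(SecondMoment shared) {
        this.moment = shared;
        this.incrementInternally = false;
    }

    public void increment(double d) {
        if (incrementInternally) {
            moment.increment(d);
        }
    }

    // Assumption: sample variance; undefined (NaN) for fewer than two values.
    public double getValue() {
        long n = moment.getN();
        return n < 2 ? Double.NaN : moment.getM2() / (n - 1);
    }
}
```

With a shared SecondMoment, calling m.increment(x) and v.increment(x) is 
harmless but does no work; only m2.increment(x) updates the state, so 
each value costs one update instead of one per statistic.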

Finally, I can always skip this approach and fall back to just:

Mean m = new Mean();
Variance v = new Variance();
Skewness s = new Skewness();
Kurtosis k = new Kurtosis();

which is less efficient, but imposes no ordering restrictions. It's a 
start towards both flexibility and efficiency by providing various 
combinations of the stats for different use cases.
-Mark

--
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu


