Posted to dev@commons.apache.org by Ken Geis <kg...@speakeasy.org> on 2004/05/13 12:13:59 UTC

[math] statistics performance boost

As I explained, I am using commons-math to enable data mining algorithms 
I am writing.  I am using a lot of SummaryStatistics and TTest.  Through 
some profiling, I was able to find places to optimize code and I ended 
up getting a 15x performance boost within my application.  This was from 
three changes:

1. Add clone() to SummaryStatisticsImpl.  This implies adding clone() to 
SecondMoment, Sum, SumOfSquares, Min, Max, SumOfLogs, GeometricMean, 
Mean, and Variance.  To Mark, I think that the behavior of clone() is 
well implied by the Javadoc for java.lang.Object.  I was surprised that 
I obviously had not read that before yesterday.  To Phil, your suggested 
getSummary() method/bean would indeed solve my problem and give me even 
better performance.  (clone() was ~20x faster than the 
serialize/deserialize hack I was using.  This probably accounts for 2x 
of my overall 15x.)

2. Change TTestImpl; the commons-discovery DiscoverClass.newInstance() 
was being called for every call to tTest.  This is not a cheap method. 
After #1, this method was taking up something like 17% of the runtime of 
my synthetic benchmark.  I created a method to lazily get the 
DistributionFactory and store it (transient) as a class attribute.

3. Make ContinuedFraction.evaluate(...) iterative instead of recursive. 
  This gave me a 125% (2.25x) improvement in performance of this method. 
  I think I can optimize it further, hopefully not at the cost of 
readability.
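
For change 1, a minimal sketch of the idea, using hypothetical stand-in classes (RunningMean and RunningSummary are invented for illustration and are not the commons-math classes): each statistic copies its own primitive state through Object.clone(), and the aggregate then clones its mutable member so the copy does not share state with the original.

```java
// Hypothetical stand-ins for illustration; these are NOT the commons-math classes.
class RunningMean implements Cloneable {
    private long n;
    private double mean;

    void increment(double x) {
        n++;
        mean += (x - mean) / n; // incremental mean update
    }

    double getResult() {
        return n == 0 ? Double.NaN : mean;
    }

    @Override
    public RunningMean clone() {
        try {
            // Only primitive fields, so Object.clone()'s field-by-field copy suffices.
            return (RunningMean) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: this class is Cloneable
        }
    }
}

class RunningSummary implements Cloneable {
    private RunningMean mean = new RunningMean();
    private double max = Double.NEGATIVE_INFINITY;

    void addValue(double x) {
        mean.increment(x);
        max = Math.max(max, x);
    }

    double getMean() { return mean.getResult(); }

    double getMax() { return max; }

    @Override
    public RunningSummary clone() {
        try {
            RunningSummary copy = (RunningSummary) super.clone();
            // Clone the mutable member so the copy evolves independently.
            copy.mean = mean.clone();
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: this class is Cloneable
        }
    }
}
```

A copy made this way can keep accumulating values without disturbing the original, which is what the serialize/deserialize round trip was standing in for.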

Patches available on request.  Should I just start posting them when I 
have patches like this?



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [math] statistics performance boost

Posted by Phil Steitz <ph...@steitz.com>.
Ken Geis wrote:
> As I explained, I am using commons-math to enable data mining algorithms 
> I am writing.  I am using a lot of SummaryStatistics and TTest.  Through 
> some profiling, I was able to find places to optimize code and I ended 
> up getting a 15x performance boost within my application.  This was from 
> three changes:
> 
> 1. Add clone() to SummaryStatisticsImpl.  This implies adding clone() to 
> SecondMoment, Sum, SumOfSquares, Min, Max, SumOfLogs, GeometricMean, 
> Mean, and Variance.  To Mark, I think that the behavior of clone() is 
> well implied by the Javadoc for java.lang.Object.  I was surprised that 
> I obviously had not read that before yesterday.  To Phil, your suggested 
> getSummary() method/bean would indeed solve my problem and give me even 
> better performance.  (clone() was ~20x faster than the 
> serialize/deserialize hack I was using.  This probably accounts for 2x 
> of my overall 15x.)

As noted in my previous response, getSummary() and StatisticalSummaryValues 
have been added.
> 
> 2. Change TTestImpl; the commons-discovery DiscoverClass.newInstance() 
> was being called for every call to tTest.  This is not a cheap method. 
> After #1, this method was taking up something like 17% of the runtime of 
> my synthetic benchmark.  I created a method to lazily get the 
> DistributionFactory and store it (transient) as a class attribute.

TTestImpl now caches the factory (as an instance variable, not a class 
variable).
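
For illustration only, the caching pattern looks roughly like the sketch below. LazyFactoryHolder is a hypothetical name and this is not the actual TTestImpl code; the Supplier stands in for the expensive discovery lookup.

```java
import java.util.function.Supplier;

// Hypothetical holder illustrating lazy, per-instance caching of an
// expensive-to-create object. Not the actual TTestImpl implementation.
class LazyFactoryHolder<T> {
    private final Supplier<T> expensiveLookup;

    // transient: the cached object is rebuilt on demand rather than serialized.
    private transient T cached;

    LazyFactoryHolder(Supplier<T> expensiveLookup) {
        this.expensiveLookup = expensiveLookup;
    }

    // synchronized so concurrent callers trigger at most one lookup.
    synchronized T get() {
        if (cached == null) {
            cached = expensiveLookup.get();
        }
        return cached;
    }
}
```

The lookup runs on the first call to get() only; later calls return the stored instance.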

> 
> 3. Make ContinuedFraction.evaluate(...) iterative instead of recursive. 
>  This gave me a 125% (2.25x) improvement in performance of this method. 
>  I think I can optimize it further, hopefully not at the cost of 
> readability.

We could really use this, as it would also prevent stack overflows (which 
could be the cause of BZ #29414).  A patch would be most welcome :-)
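
To make the idea concrete, here is a rough sketch of an iterative evaluation. It is not Ken's patch and not the existing ContinuedFraction code: the class name SimpleContinuedFraction, the rescaling threshold, and the stopping rule are all assumptions. It evaluates a0 + b1/(a1 + b2/(a2 + ...)) with the standard forward recurrence for the convergents, so the stack depth stays constant however many terms are needed.

```java
// Hypothetical sketch; the names and the stopping rule are assumptions.
abstract class SimpleContinuedFraction {
    // Partial denominators a_n (n >= 0) and partial numerators b_n (n >= 1).
    protected abstract double getA(int n, double x);
    protected abstract double getB(int n, double x);

    double evaluate(double x, double epsilon, int maxIterations) {
        // Convergent n is p_n / q_n, where
        //   p_n = a_n * p_(n-1) + b_n * p_(n-2)
        //   q_n = a_n * q_(n-1) + b_n * q_(n-2)
        double pPrev = 1.0, p = getA(0, x);
        double qPrev = 0.0, q = 1.0;
        double value = p / q;
        for (int n = 1; n <= maxIterations; n++) {
            double a = getA(n, x);
            double b = getB(n, x);
            double pNext = a * p + b * pPrev;
            double qNext = a * q + b * qPrev;
            // Rescale all four terms together to avoid overflow; every ratio is unchanged.
            double scale = Math.max(Math.abs(pNext), Math.abs(qNext));
            if (scale > 1e100) {
                pNext /= scale; qNext /= scale;
                p /= scale; q /= scale;
            }
            pPrev = p; p = pNext;
            qPrev = q; q = qNext;
            if (q != 0.0) {
                double next = p / q;
                // Stop when successive convergents agree to the requested relative tolerance.
                if (Math.abs(next - value) <= epsilon * Math.abs(next)) {
                    return next;
                }
                value = next;
            }
        }
        throw new IllegalStateException(
            "Continued fraction did not converge in " + maxIterations + " iterations");
    }
}
```

With a_n = b_n = 1 for every n, this converges to the golden ratio (1 + sqrt(5)) / 2, which makes a convenient sanity check.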

> 
> Patches available on request.  Should I just start posting them when I 
> have patches like this?





Re: [math] statistics performance boost

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Most definitely post any patches. It would be good to post them on our 
bugzilla so that they can be properly tracked.

http://jakarta.apache.org/commons/math/developers.html

Ken Geis wrote:
> As I explained, I am using commons-math to enable data mining algorithms 
> I am writing.  I am using a lot of SummaryStatistics and TTest.  Through 
> some profiling, I was able to find places to optimize code and I ended 
> up getting a 15x performance boost within my application.  This was from 
> three changes:
> 
> 1. Add clone() to SummaryStatisticsImpl.  This implies adding clone() to 
> SecondMoment, Sum, SumOfSquares, Min, Max, SumOfLogs, GeometricMean, 
> Mean, and Variance.  To Mark, I think that the behavior of clone() is 
> well implied by the Javadoc for java.lang.Object.  I was surprised that 
> I obviously had not read that before yesterday.  To Phil, your suggested 
> getSummary() method/bean would indeed solve my problem and give me even 
> better performance.  (clone() was ~20x faster than the 
> serialize/deserialize hack I was using.  This probably accounts for 2x 
> of my overall 15x.)
> 

I think we should work on improvements to both clone and getSummary() 
methods.

> 2. Change TTestImpl; the commons-discovery DiscoverClass.newInstance() 
> was being called for every call to tTest.  This is not a cheap method. 
> After #1, this method was taking up something like 17% of the runtime of 
> my synthetic benchmark.  I created a method to lazily get the 
> DistributionFactory and store it (transient) as a class attribute.
> 
> 3. Make ContinuedFraction.evaluate(...) iterative instead of recursive. 
>  This gave me a 125% (2.25x) improvement in performance of this method. 
>  I think I can optimize it further, hopefully not at the cost of 
> readability.
> 
> Patches available on request.  Should I just start posting them when I 
> have patches like this?
> 
> 

All of your efforts are greatly appreciated; we will gladly acknowledge 
you as a contributor in the project documentation.

-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu
