Posted to dev@commons.apache.org by Ken Geis <kg...@speakeasy.org> on 2004/05/13 12:13:59 UTC
[math] statistics performance boost
As I explained, I am using commons-math to enable data mining algorithms
I am writing. I am using a lot of SummaryStatistics and TTest. Through
some profiling, I was able to find places to optimize code and I ended
up getting a 15x performance boost within my application. This was from
three changes:
1. Add clone() to SummaryStatisticsImpl. This implies adding clone() to
SecondMoment, Sum, SumOfSquares, Min, Max, SumOfLogs, GeometricMean,
Mean, and Variance. To Mark, I think that the behavior of clone() is
well implied by the Javadoc for java.lang.Object. I was surprised that
I obviously had not read that before yesterday. To Phil, your suggested
getSummary() method/bean would indeed solve my problem and give me even
better performance. (clone() was ~20x faster than the
serialize/deserialize hack I was using. This probably accounts for 2x
of my overall 15x.)
2. Change TTestImpl; the commons-discovery DiscoverClass.newInstance()
was being called for every call to tTest. This is not a cheap method.
After #1, this method was taking up something like 17% of the runtime of
my synthetic benchmark. I created a method to lazily get the
DistributionFactory and store it (transient) as a class attribute.
3. Make ContinuedFraction.evaluate(...) iterative instead of recursive.
This gave me a 125% (2.25x) improvement in performance of this method.
I think I can optimize it further, hopefully not at the cost of
readability.
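
As a rough illustration of change 1, here is a minimal sketch of clone() on a composite statistic. SumStat and Summary are hypothetical stand-ins, not the commons-math classes, and the code assumes a Java version with covariant return types. It shows why cloning the summary implies cloning each component statistic: the shallow copy from Object.clone() would otherwise share the mutable components with the original.

```java
// Hypothetical stand-ins for illustration only; not the commons-math source.
final class SumStat implements Cloneable {
    private double value;

    void increment(double x) { value += x; }

    double getResult() { return value; }

    @Override
    public SumStat clone() {
        try {
            // Only a primitive field, so the shallow copy is already independent.
            return (SumStat) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: we implement Cloneable
        }
    }
}

final class Summary implements Cloneable {
    private long n;
    private SumStat sum = new SumStat();
    private SumStat sumSq = new SumStat();

    void addValue(double x) {
        n++;
        sum.increment(x);
        sumSq.increment(x * x);
    }

    long getN() { return n; }

    double getMean() { return n == 0 ? Double.NaN : sum.getResult() / n; }

    @Override
    public Summary clone() {
        try {
            Summary copy = (Summary) super.clone();
            // Deep-copy the mutable components so the copy can diverge.
            copy.sum = sum.clone();
            copy.sumSq = sumSq.clone();
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}
```

With this shape, a caller can clone a partially filled Summary, keep adding values to the copy, and leave the original untouched, without a serialize/deserialize round trip.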
Patches available on request. Should I just start posting them when I
have patches like this?
---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org
Re: [math] statistics performance boost
Posted by Phil Steitz <ph...@steitz.com>.
Ken Geis wrote:
> As I explained, I am using commons-math to enable data mining algorithms
> I am writing. I am using a lot of SummaryStatistics and TTest. Through
> some profiling, I was able to find places to optimize code and I ended
> up getting a 15x performance boost within my application. This was from
> three changes:
>
> 1. Add clone() to SummaryStatisticsImpl. This implies adding clone() to
> SecondMoment, Sum, SumOfSquares, Min, Max, SumOfLogs, GeometricMean,
> Mean, and Variance. To Mark, I think that the behavior of clone() is
> well implied by the Javadoc for java.lang.Object. I was surprised that
> I obviously had not read that before yesterday. To Phil, your suggested
> getSummary() method/bean would indeed solve my problem and give me even
> better performance. (clone() was ~20x faster than the
> serialize/deserialize hack I was using. This probably accounts for 2x
> of my overall 15x.)
As noted in my previous response, getSummary() and StatisticalSummaryValues
have been added.
>
> 2. Change TTestImpl; the commons-discovery DiscoverClass.newInstance()
> was being called for every call to tTest. This is not a cheap method.
> After #1, this method was taking up something like 17% of the runtime of
> my synthetic benchmark. I created a method to lazily get the
> DistributionFactory and store it (transient) as a class attribute.
TTestImpl now caches the factory (as instance, not class variable).
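
The per-instance caching described above might be sketched as follows. ExpensiveFactory and CachingTTest are hypothetical names used only for illustration; the real TTestImpl and DistributionFactory code is not shown in this thread. The sketch is not thread-safe, which is an assumption about what is acceptable here.

```java
import java.io.Serializable;

// Hypothetical stand-in for a factory that is costly to discover.
final class ExpensiveFactory {
    static int lookups = 0; // counts how often the costly discovery runs

    static ExpensiveFactory discover() {
        lookups++;
        return new ExpensiveFactory();
    }

    double scale(double x) { return 2.0 * x; }
}

// Hypothetical stand-in for a test class that needs the factory on every call.
final class CachingTTest implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: skipped by serialization and re-discovered lazily afterwards.
    private transient ExpensiveFactory factory;

    private ExpensiveFactory getFactory() {
        if (factory == null) {
            factory = ExpensiveFactory.discover(); // paid once per instance
        }
        return factory;
    }

    double compute(double x) {
        return getFactory().scale(x);
    }
}
```

Repeated calls to compute() on one instance then trigger a single discovery instead of one per call.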
>
> 3. Make ContinuedFraction.evaluate(...) iterative instead of recursive.
> This gave me a 125% (2.25x) improvement in performance of this method.
> I think I can optimize it further, hopefully not at the cost of
> readability.
We could really use this, as it would also prevent stack overflows (which
could be the cause of BZ #29414). A patch would be most welcome :-)
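
For reference, one standard way to evaluate a continued fraction iteratively is the modified Lentz algorithm, sketched below. The thread does not say which iterative scheme Ken's patch uses, so this is a swapped-in technique rather than his method. The class name, the signature, and the choice of java.util.function.IntToDoubleFunction for the coefficients are assumptions made for the example. It evaluates b(0) + a(1)/(b(1) + a(2)/(b(2) + ...)) in a loop, so stack depth stays constant no matter how many terms are needed.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch; not the commons-math ContinuedFraction class.
final class ContinuedFractionSketch {
    private static final double TINY = 1e-30; // replaces exact zeros to avoid division by zero

    static double evaluate(IntToDoubleFunction a, IntToDoubleFunction b,
                           double epsilon, int maxIterations) {
        double f = b.applyAsDouble(0);
        if (f == 0.0) {
            f = TINY;
        }
        double c = f;
        double d = 0.0;
        for (int n = 1; n <= maxIterations; n++) {
            double an = a.applyAsDouble(n);
            double bn = b.applyAsDouble(n);
            d = bn + an * d;
            if (d == 0.0) {
                d = TINY;
            }
            c = bn + an / c;
            if (c == 0.0) {
                c = TINY;
            }
            d = 1.0 / d;
            double delta = c * d; // ratio of successive convergents
            f *= delta;
            if (Math.abs(delta - 1.0) < epsilon) {
                return f; // successive convergents agree to within epsilon
            }
        }
        throw new ArithmeticException(
            "no convergence after " + maxIterations + " iterations");
    }
}
```

As a usage example, passing a(n) = 1 and b(n) = 1 for all n evaluates the continued fraction of the golden ratio, and b(0) = 1 with b(n) = 2 for n >= 1 gives the square root of 2.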
>
> Patches available on request. Should I just start posting them when I
> have patches like this?
>
>
>
Re: [math] statistics performance boost
Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Most definitely post any patches. It would be good to post them to our
Bugzilla so that they can be tracked properly.
http://jakarta.apache.org/commons/math/developers.html
Ken Geis wrote:
> As I explained, I am using commons-math to enable data mining algorithms
> I am writing. I am using a lot of SummaryStatistics and TTest. Through
> some profiling, I was able to find places to optimize code and I ended
> up getting a 15x performance boost within my application. This was from
> three changes:
>
> 1. Add clone() to SummaryStatisticsImpl. This implies adding clone() to
> SecondMoment, Sum, SumOfSquares, Min, Max, SumOfLogs, GeometricMean,
> Mean, and Variance. To Mark, I think that the behavior of clone() is
> well implied by the Javadoc for java.lang.Object. I was surprised that
> I obviously had not read that before yesterday. To Phil, your suggested
> getSummary() method/bean would indeed solve my problem and give me even
> better performance. (clone() was ~20x faster than the
> serialize/deserialize hack I was using. This probably accounts for 2x
> of my overall 15x.)
>
I think we should work on improvements to both the clone() and getSummary()
methods.
> 2. Change TTestImpl; the commons-discovery DiscoverClass.newInstance()
> was being called for every call to tTest. This is not a cheap method.
> After #1, this method was taking up something like 17% of the runtime of
> my synthetic benchmark. I created a method to lazily get the
> DistributionFactory and store it (transient) as a class attribute.
>
> 3. Make ContinuedFraction.evaluate(...) iterative instead of recursive.
> This gave me a 125% (2.25x) improvement in performance of this method.
> I think I can optimize it further, hopefully not at the cost of
> readability.
>
> Patches available on request. Should I just start posting them when I
> have patches like this?
>
>
All of your efforts are greatly appreciated. We will gladly acknowledge
you as a contributor in the project documentation.
--
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu