Posted to dev@commons.apache.org by Eric Barnhill <er...@gmail.com> on 2018/01/31 09:17:48 UTC

[statistics] Java 8 and summary statistics

If we are going to target Java 8 with commons-statistics, then we should
make use of the built-in DoubleSummaryStatistics class
(https://docs.oracle.com/javase/8/docs/api/java/util/DoubleSummaryStatistics.html),
the related classes for other numeric types, and the related methods in
the Collectors class.
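
For reference, a minimal sketch of the two built-in routes mentioned above:
summaryStatistics() on a primitive stream, and Collectors.summarizingDouble
on an object stream. The class and method names (BuiltInSummaryDemo,
fromArray, fromList) are made up for illustration and are not part of any
existing commons API.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; only the JDK calls inside are real API.
final class BuiltInSummaryDemo {
    private BuiltInSummaryDemo() {}

    /** Summarizes a primitive array via DoubleStream.summaryStatistics(). */
    static DoubleSummaryStatistics fromArray(double[] values) {
        return Arrays.stream(values).summaryStatistics();
    }

    /** Summarizes boxed values via the Collectors.summarizingDouble collector. */
    static DoubleSummaryStatistics fromList(List<Double> values) {
        return values.stream()
                     .collect(Collectors.summarizingDouble(Double::doubleValue));
    }
}
```

Either call yields count, sum, min, max and average in a single pass over
the data.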

I see a clear niche within commons for a convenience class that allows the
user to simply pass an array and a requested statistic, rather than build a
stream. Overloaded methods (for example getMean(double[] d) and
getMean(long[] l)) would decide whether, for example,
DoubleSummaryStatistics or LongSummaryStatistics needs to be used, and the
return value could simply be the desired statistic.
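
A rough sketch of what such overloads might look like. The class name
ArrayStats is hypothetical; getMean is the name used above, and getMax is
added only as a second illustration.

```java
import java.util.Arrays;

// Hypothetical convenience class: array in, single statistic out.
final class ArrayStats {
    private ArrayStats() {}

    /** Mean of a double[]; delegates to DoubleSummaryStatistics. */
    static double getMean(double[] d) {
        return Arrays.stream(d).summaryStatistics().getAverage();
    }

    /** Mean of a long[]; delegates to LongSummaryStatistics. */
    static double getMean(long[] l) {
        return Arrays.stream(l).summaryStatistics().getAverage();
    }

    /** Maximum of a double[]. */
    static double getMax(double[] d) {
        return Arrays.stream(d).summaryStatistics().getMax();
    }

    /** Maximum of a long[]; the return type follows the input type. */
    static long getMax(long[] l) {
        return Arrays.stream(l).summaryStatistics().getMax();
    }
}
```

Note that the JDK classes return an average of 0 for an empty input, so a
real implementation would have to decide whether to keep that behaviour.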

That would take care of Mean, Min, Max, Sum, and Count.

I think the rest of the summary stats can be gathered by mapping
intermediate operations onto the same streams. For example, mapping each
value onto its square before calling summaryStatistics(), then dividing the
sum of the mapped stream by its count, would return the mean of the squares;
subtracting the squared mean and taking the square root then gives the
standard deviation. That also seems to me a nice niche for commons,
delivering additional summary statistics beyond the built-ins.
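
A sketch of the squared-stream idea, for the population standard deviation.
Dividing the sum of squares by the count gives the mean of the squares; the
squared mean still has to be subtracted before taking the square root. The
names StreamStdDev and populationStdDev are hypothetical. This textbook
formula can lose precision when the mean is large relative to the spread,
so treat it as an illustration rather than a recommended algorithm.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

// Hypothetical helper illustrating sqrt(E[x^2] - E[x]^2).
final class StreamStdDev {
    private StreamStdDev() {}

    /** Population standard deviation; returns NaN for an empty array. */
    static double populationStdDev(double[] values) {
        DoubleSummaryStatistics plain =
            Arrays.stream(values).summaryStatistics();
        DoubleSummaryStatistics squared =
            Arrays.stream(values).map(x -> x * x).summaryStatistics();

        double mean = plain.getAverage();
        // Sum of the mapped stream divided by its count: mean of squares.
        double meanOfSquares = squared.getSum() / squared.getCount();
        // Clamp tiny negative values caused by rounding.
        double variance = Math.max(0.0, meanOfSquares - mean * mean);
        return Math.sqrt(variance);
    }
}
```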

Given that a lot of the key functionality is built in, the scope of this
project seems better suited to a SummaryStatistics class with static
methods.

If this sounds good I'll start a branch to develop it.

Eric

Re: [statistics] Java 8 and summary statistics

Posted by Gilles <gi...@harfang.homelinux.org>.
Hi Eric.

On Wed, 31 Jan 2018 10:17:48 +0100, Eric Barnhill wrote:
> If we are going to target Java 8 with commons-statistics, then we should
> make use of the built-in DoubleSummaryStatistics class
> (https://docs.oracle.com/javase/8/docs/api/java/util/DoubleSummaryStatistics.html),
> the related classes for other numeric types, and the related methods in
> the Collectors class.

Thanks for taking this on.

> I see a clear niche within commons for a convenience class that allows the
> user to simply pass an array and a requested statistic, rather than build a
> stream. Overloaded methods (for example getMean(double[] d) and
> getMean(long[] l)) would decide whether, for example,
> DoubleSummaryStatistics or LongSummaryStatistics needs to be used, and the
> return value could simply be the desired statistic.

Do you also intend to look into generics (e.g. allowing statistics
on objects like "Duration", as suggested by Gary some time ago)?
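
One possible shape for this, sketched under the assumption that the caller
supplies a function mapping each object to a double. GenericStats,
summarize and meanDuration are hypothetical names; only the JDK calls are
real API.

```java
import java.time.Duration;
import java.util.Collection;
import java.util.DoubleSummaryStatistics;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

// Hypothetical generic front end over Collectors.summarizingDouble.
final class GenericStats {
    private GenericStats() {}

    /** Summarizes any objects, given a mapping to double. */
    static <T> DoubleSummaryStatistics summarize(
            Collection<T> items, ToDoubleFunction<? super T> toDouble) {
        return items.stream().collect(Collectors.summarizingDouble(toDouble));
    }

    /**
     * Mean of some durations, computed on nanoseconds and rounded back.
     * Very long durations lose precision in the conversion to double.
     */
    static Duration meanDuration(Collection<Duration> durations) {
        DoubleSummaryStatistics stats =
            summarize(durations, d -> (double) d.toNanos());
        return Duration.ofNanos(Math.round(stats.getAverage()));
    }
}
```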

> That would take care of Mean, Min, Max, Sum, and Count.
>
> I think the rest of the summary stats can be gathered by mapping
> intermediate operations onto the same streams. For example, mapping each
> value onto its square before calling summaryStatistics(), then dividing the
> sum of the mapped stream by its count, would return the mean of the squares;
> subtracting the squared mean and taking the square root then gives the
> standard deviation. That also seems to me a nice niche for commons,
> delivering additional summary statistics beyond the built-ins.
>
> Given that a lot of the key functionality is built in, the scope of this
> project seems better suited to a SummaryStatistics class with static
> methods.

As long as it does not block fluent usage of Java 8 interfaces and
syntactic constructs...
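
To illustrate what fluent usage could mean here: an extra statistic can be
offered as a java.util.stream.Collector, so that it plugs into an existing
stream pipeline alongside any static helpers. The sketch below is an
assumption about one possible design, not an agreed API; VarianceCollector
and populationVariance are hypothetical names. It uses Welford's update
rule, with the usual pairwise merge for parallel streams.

```java
import java.util.stream.Collector;

// Hypothetical collector; accumulator layout is {count, mean, M2}.
final class VarianceCollector {
    private VarianceCollector() {}

    /** Population variance of a stream of Double; NaN if the stream is empty. */
    static Collector<Double, double[], Double> populationVariance() {
        return Collector.of(
            () -> new double[3],
            (acc, x) -> {
                // Welford's single-pass update.
                acc[0] += 1.0;
                double delta = x - acc[1];
                acc[1] += delta / acc[0];
                acc[2] += delta * (x - acc[1]);
            },
            (left, right) -> {
                // Merge two partial results (needed for parallel streams).
                if (right[0] == 0.0) return left;
                if (left[0] == 0.0) return right;
                double n = left[0] + right[0];
                double delta = right[1] - left[1];
                left[2] += right[2] + delta * delta * left[0] * right[0] / n;
                left[1] += delta * right[0] / n;
                left[0] = n;
                return left;
            },
            acc -> acc[0] == 0.0 ? Double.NaN : acc[2] / acc[0]);
    }
}
```

A caller would then write something like
values.stream().collect(VarianceCollector.populationVariance()).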

> If this sounds good I'll start a branch to develop it.

I'm still not a very knowledgeable user of JDK8 (a hidden cost of years
working on CM).
I hope that others will comment (and help with) the revamping of
the statistical utilities.

Regards,
Gilles

>
> Eric

