Posted to issues@commons.apache.org by "Anirudh Joshi (Jira)" <ji...@apache.org> on 2023/07/02 16:19:00 UTC
[jira] [Comment Edited] (STATISTICS-71) Implementation of Univariate Statistics

    [ https://issues.apache.org/jira/browse/STATISTICS-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739388#comment-17739388 ] 

Anirudh Joshi edited comment on STATISTICS-71 at 7/2/23 4:18 PM:
-----------------------------------------------------------------

{quote}[...] What about {{Count}} being a {{DoubleStorelessStatistics}}, on which other(s) could depend?
{quote}
I was thinking the same. Could we implement {{Count}} as a standalone statistic and use composition, so that computing multiple statistics does not repeat the counting work?
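A minimal sketch of the composition idea, assuming the {{DoubleStorelessUnivariateStatistic}} interface from the issue description. {{Count}} and {{MeanVariance}} are hypothetical names. The aggregate updates a single {{Count}} once per value, and both the mean and the sample variance (computed with Welford's one-pass update) read {{n}} from it instead of each keeping its own counter.

```java
import java.util.function.DoubleSupplier;

class CompositionDemo {
    public static void main(String[] args) {
        MeanVariance mv = new MeanVariance().add(1).add(2).add(3);
        System.out.println(mv.getCount() + " " + mv.mean() + " " + mv.variance()); // 3 2.0 1.0
    }
}

// Interface as defined in the issue description.
interface DoubleStorelessUnivariateStatistic extends DoubleSupplier {
    DoubleStorelessUnivariateStatistic add(double v);
    long getCount();
    void combine(DoubleStorelessUnivariateStatistic other);
}

// Hypothetical standalone Count statistic.
final class Count implements DoubleStorelessUnivariateStatistic {
    private long n;

    @Override
    public Count add(double v) {
        n++;
        return this;
    }

    @Override
    public long getCount() {
        return n;
    }

    @Override
    public void combine(DoubleStorelessUnivariateStatistic other) {
        n += other.getCount();
    }

    @Override
    public double getAsDouble() {
        return n;
    }
}

// Hypothetical aggregate: one Count is updated once per value and shared by the
// mean and variance computations, so neither maintains its own n.
final class MeanVariance {
    private final Count count = new Count();
    private double mean;
    private double m2; // sum of squared deviations from the running mean

    MeanVariance add(double v) {
        count.add(v);
        long n = count.getCount();
        double delta = v - mean;
        mean += delta / n;          // Welford: update the running mean
        m2 += delta * (v - mean);   // Welford: uses old and new mean
        return this;
    }

    double mean() {
        return count.getCount() == 0 ? Double.NaN : mean;
    }

    double variance() {
        long n = count.getCount();
        return n < 2 ? Double.NaN : m2 / (n - 1); // sample variance
    }

    long getCount() {
        return count.getCount();
    }
}
```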

{{DoubleStorelessUnivariateStatistic add(double d);}}
{quote}[...] What's the intended usage?
{quote}
I included this signature to support chaining of {{add}} calls, for example:
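{code:java}
import java.util.stream.DoubleStream;
// add(double) returns the statistic, so calls can be chained
Mean m = new Mean();
double mean1 = m.add(1).add(2).add(3).getAsDouble();

// Stream usage: Mean::add accumulates values, Mean::combine merges partial results
double mean2 = DoubleStream.of(1.0, 2.0, 3.0)
        .collect(Mean::new, Mean::add, Mean::combine)
        .getAsDouble();{code}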
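A minimal sketch of a {{Mean}} that supports both chained {{add}} calls and a stream reduction via {{DoubleStream.collect}}, assuming the interface from the issue description. The class body is hypothetical: it keeps only a count and a running sum, {{add}} returns {{this}} (with a covariant {{Mean}} return type) to allow chaining, and {{combine}} simply assumes the other statistic is also a {{Mean}}.

```java
import java.util.function.DoubleSupplier;
import java.util.stream.DoubleStream;

class ChainingDemo {
    public static void main(String[] args) {
        double chained = new Mean().add(1).add(2).add(3).getAsDouble();
        double streamed = DoubleStream.of(1.0, 2.0, 3.0)
                .collect(Mean::new, Mean::add, Mean::combine)
                .getAsDouble();
        System.out.println(chained + " " + streamed); // 2.0 2.0
    }
}

// Interface as defined in the issue description.
interface DoubleStorelessUnivariateStatistic extends DoubleSupplier {
    DoubleStorelessUnivariateStatistic add(double v);
    long getCount();
    void combine(DoubleStorelessUnivariateStatistic other);
}

// Hypothetical storeless mean: keeps only a count and a running sum.
final class Mean implements DoubleStorelessUnivariateStatistic {
    private long n;
    private double sum;

    @Override
    public Mean add(double v) {
        n++;
        sum += v;
        return this; // returning this is what makes chaining possible
    }

    @Override
    public long getCount() {
        return n;
    }

    @Override
    public void combine(DoubleStorelessUnivariateStatistic other) {
        // Assumption: other is also a Mean; the type system does not enforce this
        Mean m = (Mean) other;
        n += m.n;
        sum += m.sum;
    }

    @Override
    public double getAsDouble() {
        return n == 0 ? Double.NaN : sum / n;
    }
}
```

Note that {{collect}} only uses {{add}} for its side effect; the returned value is what enables the chained form.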
{quote}[...] E.g. is "Storeless" a required part of the name? Or is it an "implementation detail"?
{quote}
I changed the interface name to DoubleStorelessUnivariateStatistic because we might actually need three interfaces, one per primitive type, similar to the JDK's IntSummaryStatistics, LongSummaryStatistics and DoubleSummaryStatistics: this one for double data, IntStorelessSummaryStatistics for int data and LongStorelessSummaryStatistics for long data. I think it's better to keep "Storeless" in the interface name to make it clear that the implementation is storeless, so that users are aware that they cannot do certain things, such as computing rolling statistics. That said, I don't have a strong opinion on this and am curious to hear arguments against the naming (e.g. if the name is too verbose).
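To illustrate the rolling-statistics limitation: a storeless mean needs only a count and a sum, whereas a moving-window mean has to retain the last {{window}} values so that it can evict the oldest one. This is a minimal sketch; {{RollingMean}} and its {{window}} parameter are hypothetical and not part of the proposed API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class RollingMeanDemo {
    public static void main(String[] args) {
        RollingMean r = new RollingMean(2).add(1).add(2).add(3);
        System.out.println(r.getAsDouble()); // 2.5: only the last two values (2, 3) count
    }
}

// Hypothetical moving-window mean. Unlike a storeless statistic it must keep
// the last `window` values, so its memory grows with the window size.
final class RollingMean {
    private final int window;
    private final Deque<Double> values = new ArrayDeque<>();
    private double sum;

    RollingMean(int window) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        this.window = window;
    }

    RollingMean add(double v) {
        values.addLast(v);
        sum += v;
        if (values.size() > window) {
            // Evicting requires the stored value; a storeless design cannot do this
            sum -= values.removeFirst();
        }
        return this;
    }

    double getAsDouble() {
        return values.isEmpty() ? Double.NaN : sum / values.size();
    }
}
```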
{quote}[...] be more restrictive in order to forbid meaningless combinations?
{quote}
I'm not sure I understand the requirement correctly. Which kinds of combinations do we want to restrict here?
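One possible reading of the restriction, offered as an assumption: {{combine(DoubleStorelessUnivariateStatistic other)}} accepts any statistic, so merging a mean with, say, a maximum type-checks and can only fail (or silently misbehave) at runtime. A self-bounded type parameter would turn that into a compile error. The names {{StorelessStatistic}}, {{Mean}} and {{Max}} below are hypothetical.

```java
import java.util.function.DoubleSupplier;

class CombineDemo {
    public static void main(String[] args) {
        Mean a = new Mean().add(1).add(2);
        a.combine(new Mean().add(3));        // compiles: same statistic type
        // a.combine(new Max().add(3));      // would not compile: Max is not a Mean
        System.out.println(a.getAsDouble()); // 2.0
    }
}

// Hypothetical self-bounded variant: combine() only accepts the same concrete type.
interface StorelessStatistic<T extends StorelessStatistic<T>> extends DoubleSupplier {
    T add(double v);
    long getCount();
    void combine(T other);
}

final class Mean implements StorelessStatistic<Mean> {
    private long n;
    private double sum;

    @Override
    public Mean add(double v) {
        n++;
        sum += v;
        return this;
    }

    @Override
    public long getCount() {
        return n;
    }

    @Override
    public void combine(Mean other) {
        n += other.n;
        sum += other.sum;
    }

    @Override
    public double getAsDouble() {
        return n == 0 ? Double.NaN : sum / n;
    }
}

final class Max implements StorelessStatistic<Max> {
    private long n;
    private double max = Double.NEGATIVE_INFINITY;

    @Override
    public Max add(double v) {
        n++;
        max = Math.max(max, v);
        return this;
    }

    @Override
    public long getCount() {
        return n;
    }

    @Override
    public void combine(Max other) {
        n += other.n;
        max = Math.max(max, other.max);
    }

    @Override
    public double getAsDouble() {
        return n == 0 ? Double.NaN : max;
    }
}
```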



> Implementation of Univariate Statistics
> ---------------------------------------
>
>                 Key: STATISTICS-71
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-71
>             Project: Commons Statistics
>          Issue Type: Task
>          Components: descriptive
>            Reporter: Anirudh Joshi
>            Priority: Minor
>              Labels: gsoc, gsoc2023
>
> Jira ticket to track the implementation of the Univariate statistics required for the updated SummaryStatistics API. 
> The implementation would be "storeless". It should be used for calculating statistics that can be computed in one pass through the data without storing the sample values.
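> Currently I have defined the API as follows (this might evolve as I continue working):
> {code:java}
> import java.util.function.DoubleSupplier;
>
> public interface DoubleStorelessUnivariateStatistic extends DoubleSupplier {
>     // Adds a value; returning the statistic allows chained calls
>     DoubleStorelessUnivariateStatistic add(double v);
>     // Number of values added so far
>     long getCount();
>     // Merges the state of another statistic into this one
>     void combine(DoubleStorelessUnivariateStatistic other);
> } {code}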
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)