Posted to issues@commons.apache.org by "Anirudh Joshi (Jira)" <ji...@apache.org> on 2023/07/02 16:19:00 UTC
[jira] [Comment Edited] (STATISTICS-71) Implementation of Univariate Statistics
[ https://issues.apache.org/jira/browse/STATISTICS-71?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739388#comment-17739388 ]
Anirudh Joshi edited comment on STATISTICS-71 at 7/2/23 4:18 PM:
-----------------------------------------------------------------
{quote}[...] What about {{Count}} being a {{DoubleStorelessStatistics}}, on which other(s) could depend?
{quote}
I was thinking the same. Perhaps we could implement count as a standalone statistic and use composition, so that redundant computations are avoided when several statistics are computed together.
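A minimal sketch of the composition idea, for illustration only. The class names {{Count}}, {{Sum}}, {{Mean}} and {{SharedStatistics}} are hypothetical and are not an existing Commons Statistics API. The mean is derived from a naive running sum purely to keep the example short; it is not meant as a numerically robust algorithm.

```java
import java.util.function.DoubleSupplier;

// Hypothetical standalone statistic: counts the values seen so far.
final class Count implements DoubleSupplier {
    private long n;

    Count add(double v) {
        n++;
        return this;
    }

    long getCount() {
        return n;
    }

    @Override
    public double getAsDouble() {
        return n;
    }
}

// Hypothetical standalone statistic: naive running sum (illustration only).
final class Sum implements DoubleSupplier {
    private double sum;

    Sum add(double v) {
        sum += v;
        return this;
    }

    @Override
    public double getAsDouble() {
        return sum;
    }
}

// Hypothetical derived statistic: reads the shared Count and Sum
// instead of maintaining its own copies of them.
final class Mean implements DoubleSupplier {
    private final Count count;
    private final Sum sum;

    Mean(Count count, Sum sum) {
        this.count = count;
        this.sum = sum;
    }

    @Override
    public double getAsDouble() {
        // 0 / 0 yields NaN when no values have been added.
        return sum.getAsDouble() / count.getCount();
    }
}

// Hypothetical composite: each value updates Count and Sum exactly once.
final class SharedStatistics {
    private final Count count = new Count();
    private final Sum sum = new Sum();
    private final Mean mean = new Mean(count, sum);

    SharedStatistics add(double v) {
        count.add(v);
        sum.add(v);
        return this;
    }

    long getCount() {
        return count.getCount();
    }

    double getSum() {
        return sum.getAsDouble();
    }

    double getMean() {
        return mean.getAsDouble();
    }
}
```

With this layout, adding a further statistic that depends on the count would only need a reference to the same {{Count}} instance rather than its own counter.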
DoubleStorelessUnivariateStatistic add(double d);
{quote}[...] What's the intended usage?
{quote}
The reason I included this signature is to possibly support chaining of the add calls, for example:
{code:java}
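Mean m = new Mean();
// Chained add calls: each call returns the statistic itself.
double mean1 = m.add(1).add(2).add(3).getAsDouble();
// Stream usage: add acts as the accumulator, combine merges partial results.
double mean2 = DoubleStream.of(1.0, 2.0, 3.0).collect(Mean::new, Mean::add, Mean::combine).getAsDouble();{code}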
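To make the chaining idea concrete, here is a rough sketch of how a chainable {{add}} could look. The {{Mean}} class below is hypothetical (it is not an existing Commons Statistics class), the incremental update is just one possible way to compute a running mean, and {{combine}} is narrowed to take a {{Mean}} as an assumption for the example.

```java
import java.util.function.DoubleSupplier;
import java.util.stream.DoubleStream;

// Hypothetical storeless mean; not an existing Commons Statistics class.
final class Mean implements DoubleSupplier {
    private long n;
    private double mean;

    // Returns this so that calls can be chained.
    Mean add(double v) {
        n++;
        mean += (v - mean) / n;
        return this;
    }

    // Merges another partial result (assumed signature, narrowed to Mean).
    void combine(Mean other) {
        long total = n + other.n;
        if (total == 0) {
            return;
        }
        mean += (other.mean - mean) * other.n / total;
        n = total;
    }

    long getCount() {
        return n;
    }

    @Override
    public double getAsDouble() {
        return n == 0 ? Double.NaN : mean;
    }
}

final class ChainingDemo {
    // Chained calls on a single instance.
    static double chained() {
        return new Mean().add(1).add(2).add(3).getAsDouble();
    }

    // Mutable reduction over a stream: add is the accumulator,
    // combine merges partial results from parallel execution.
    static double streamed() {
        return DoubleStream.of(1.0, 2.0, 3.0)
                .collect(Mean::new, Mean::add, Mean::combine)
                .getAsDouble();
    }
}
```

Because {{add}} returns the instance, the same method can be used both for fluent chaining and as the accumulator in {{DoubleStream.collect}}, where the returned value is simply ignored.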
{quote}[...] E.g. is "Storeless" a required part of the name? Or is it an "implementation detail"?
{quote}
I changed the interface name to DoubleStorelessUnivariateStatistic since I feel we might actually need 3 interfaces, the other two being IntStorelessSummaryStatistics for integer data and LongStorelessSummaryStatistics for long data, similar to the JDK SummaryStatistics classes. I feel it's better to have Storeless as part of the interface name to make it clear that it is a storeless implementation, so that users are aware that they cannot do certain things, for instance compute rolling statistics. But I do not have a strong opinion on this and am curious to hear arguments against the naming (e.g. if the name is too verbose).
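For comparison, the JDK's {{DoubleSummaryStatistics}} already behaves in a storeless way: it keeps only running aggregates (count, sum, min, max), so a value cannot be removed once it has been added, which is what rules out rolling statistics. A short usage sketch (the class name {{JdkSummaryExample}} is only for illustration):

```java
import java.util.DoubleSummaryStatistics;

final class JdkSummaryExample {
    // Feeds the values one at a time; none of them is stored.
    static DoubleSummaryStatistics summarize(double... values) {
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        for (double v : values) {
            stats.accept(v);
        }
        return stats;
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics first = summarize(1.0, 2.0);
        DoubleSummaryStatistics second = summarize(3.0);
        // Partial results can be merged, but there is no way to
        // "un-add" a value that has already been accepted.
        first.combine(second);
        System.out.println(first.getCount() + " values, average " + first.getAverage());
    }
}
```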
{quote}[...] be more restrictive in order to forbid meaningless combinations?
{quote}
I am not sure I understand the requirement correctly. What kinds of combinations do we want to restrict here?
> Implementation of Univariate Statistics
> ---------------------------------------
>
> Key: STATISTICS-71
> URL: https://issues.apache.org/jira/browse/STATISTICS-71
> Project: Commons Statistics
> Issue Type: Task
> Components: descriptive
> Reporter: Anirudh Joshi
> Priority: Minor
> Labels: gsoc, gsoc2023
>
> Jira ticket to track the implementation of the Univariate statistics required for the updated SummaryStatistics API.
> The implementation would be "storeless". It should be used for calculating statistics that can be computed in one pass through the data without storing the sample values.
> Currently the API is defined as follows (this might evolve as I continue working):
> {code:java}
> public interface DoubleStorelessUnivariateStatistic extends DoubleSupplier {
>     DoubleStorelessUnivariateStatistic add(double v);
>     long getCount();
>     void combine(DoubleStorelessUnivariateStatistic other);
> } {code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)