Posted to dev@commons.apache.org by Phil Steitz <ph...@gmail.com> on 2014/10/10 21:40:48 UTC

[math] Include quartiles estimated using PSquarePercentile in SummaryStatistics

Now that we have a "storeless" percentile estimator, we can add
quartile computation to SummaryStatistics.  Any objections to my
adding this?  I could optionally add a boolean constructor argument
to avoid the overhead of maintaining these stats.  Or more
generally, add a bitfield encoding the exact set of stats the user
wants to maintain.  If there are no objections to the addition, I
will open a JIRA.

Phil
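
A minimal sketch of the "exact set of stats" option mentioned above, using an
EnumSet in place of a raw bitfield. Everything here is hypothetical and only
for illustration: Stat and SelectiveSummary are not Commons Math classes, and
a quartile estimator would simply be one more constant guarded the same way.

```java
import java.util.EnumSet;

/** Hypothetical identifiers for the statistics a caller may opt into. */
enum Stat { MEAN, MIN, MAX }

/** Toy streaming aggregate that only does the bookkeeping it was asked for. */
class SelectiveSummary {
    private final EnumSet<Stat> enabled;
    private long n;
    private double sum;
    private double min = Double.NaN;
    private double max = Double.NaN;

    SelectiveSummary(EnumSet<Stat> enabled) {
        // Defensive copy so later changes to the caller's set have no effect.
        this.enabled = EnumSet.copyOf(enabled);
    }

    void addValue(double x) {
        n++;
        if (enabled.contains(Stat.MEAN)) {
            sum += x;
        }
        // The negated comparisons also replace the initial NaN.
        if (enabled.contains(Stat.MIN) && !(x >= min)) {
            min = x;
        }
        if (enabled.contains(Stat.MAX) && !(x <= max)) {
            max = x;
        }
    }

    long getN() { return n; }

    /** NaN when the mean is not maintained or no values were added. */
    double getMean() {
        return enabled.contains(Stat.MEAN) && n > 0 ? sum / n : Double.NaN;
    }

    /** NaN when the minimum is not maintained or no values were added. */
    double getMin() { return min; }

    /** NaN when the maximum is not maintained or no values were added. */
    double getMax() { return max; }
}
```

A caller that only wants the mean and maximum would construct it as
new SelectiveSummary(EnumSet.of(Stat.MEAN, Stat.MAX)); getMin() then stays NaN.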


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] Include quartiles estimated using PSquarePercentile in SummaryStatistics

Posted by venkatesha murthy <ve...@gmail.com>.
Many Thanks Phil, for answering all my questions.

On Tue, Oct 14, 2014 at 10:19 PM, Phil Steitz <ph...@gmail.com> wrote:

> On 10/14/14 6:59 AM, venkatesha murthy wrote:
> > ok.
> >
> > Wanted to understand the advantage of having a container class for all
> > storeless stats (just as DescriptiveStats is for Univariate). I could
> > open another email thread.
>
> SummaryStatistics is a container for storeless stats;
> DescriptiveStatistics is for stats computed over a stored dataset,
> possibly with a rolling window.  The rationale here is that
> SummaryStatistics aggregates StorelessUnivariateStatistic instances
> while DescriptiveStatistics aggregates statistics that implement
> only UnivariateStatistic, which requires that the full set of data
> be provided as an input array (so the aggregate has to maintain a
> dataset in memory).  The advantage of having a container for
> storeless stats is that a stream of data can be fed into the
> container's addValue method and the constituent stats will all get
> updated with the values as they come in.
>
> > Also wanted to understand what the abstract interface problem is that
> > you were referring to.
>
> We moved to favoring abstract classes (where needed / useful) over
> interfaces because it is easier to add to / modify abstract classes
> than interfaces in a backward compatible way.

Re: [math] Include quartiles estimated using PSquarePercentile in SummaryStatistics

Posted by Phil Steitz <ph...@gmail.com>.
On 10/14/14 6:59 AM, venkatesha murthy wrote:
> ok.
>
> Wanted to understand the advantage of having a container class for all
> storeless stats (just as DescriptiveStats is for Univariate). I could open
> another email thread.

SummaryStatistics is a container for storeless stats;
DescriptiveStatistics is for stats computed over a stored dataset,
possibly with a rolling window.  The rationale here is that
SummaryStatistics aggregates StorelessUnivariateStatistic instances
while DescriptiveStatistics aggregates statistics that implement
only UnivariateStatistic, which requires that the full set of data
be provided as an input array (so the aggregate has to maintain a
dataset in memory).  The advantage of having a container for
storeless stats is that a stream of data can be fed into the
container's addValue method and the constituent stats will all get
updated with the values as they come in.
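
To make the streaming contract concrete, here is a small self-contained
sketch. The types are hypothetical stand-ins, not the Commons Math classes;
only the method names increment, getResult and addValue are chosen to echo
the library. Each constituent keeps constant-size state, and the container
fans every incoming value out to all of them without storing it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a storeless statistic: updated one value at a time. */
interface StorelessStat {
    void increment(double x);
    double getResult();
}

/** Running mean using the usual incremental update; no values are retained. */
class RunningMean implements StorelessStat {
    private long n;
    private double mean;

    public void increment(double x) {
        n++;
        mean += (x - mean) / n;
    }

    public double getResult() {
        return n == 0 ? Double.NaN : mean;
    }
}

/** Running maximum; NaN until the first value arrives. */
class RunningMax implements StorelessStat {
    private double max = Double.NaN;

    public void increment(double x) {
        // The negated comparison also replaces the initial NaN.
        if (!(x <= max)) {
            max = x;
        }
    }

    public double getResult() {
        return max;
    }
}

/** Container in the spirit of SummaryStatistics: forwards each value to its members. */
class StreamingSummary {
    private final List<StorelessStat> stats = new ArrayList<>();

    StreamingSummary add(StorelessStat stat) {
        stats.add(stat);
        return this;
    }

    void addValue(double x) {
        for (StorelessStat stat : stats) {
            stat.increment(x);
        }
    }
}
```

A statistic that needs the whole dataset (an exact percentile, say) cannot
implement the interface above without buffering, which is why it belongs in
the stored-data aggregate instead.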

> Also wanted to understand what the abstract interface problem is that you
> were referring to.

We moved to favoring abstract classes (where needed / useful) over
interfaces because it is easier to add to / modify abstract classes
than interfaces in a backward compatible way.  
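
A short illustration of the compatibility point, with hypothetical names.
Without default methods (which only arrived in Java 8), adding a method to a
published interface breaks every existing implementor at compile time, while
an abstract class can ship the new method with a fallback body.

```java
/**
 * Published interface, release 1. Adding "double getMedian();" here in a
 * later release would make every existing implementor fail to compile.
 */
interface SummaryV1 {
    double getMean();
}

/** Abstract-class alternative: later additions can carry a default body. */
abstract class AbstractSummary {
    abstract double getMean();

    /** Added in a later release; existing subclasses inherit this fallback. */
    double getMedian() {
        return Double.NaN;
    }
}

/** "User" code written against release 1; it never mentions getMedian. */
class UserSummary extends AbstractSummary {
    @Override
    double getMean() {
        return 1.5;
    }
}
```

UserSummary keeps compiling after getMedian is added and simply reports NaN
until its author chooses to override the new method.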





Re: [math] Include quartiles estimated using PSquarePercentile in SummaryStatistics

Posted by venkatesha murthy <ve...@gmail.com>.
ok.

Wanted to understand the advantage of having a container class for all
storeless stats (just as DescriptiveStats is for Univariate). I could open
another email thread.
Also wanted to understand what the abstract interface problem is that you
were referring to.

thanks
murthy


Re: [math] Include quartiles estimated using PSquarePercentile in SummaryStatistics

Posted by Phil Steitz <ph...@gmail.com>.
On 10/13/14 8:55 PM, venkatesha murthy wrote:
> On Tue, Oct 14, 2014 at 6:05 AM, Phil Steitz <ph...@gmail.com> wrote:
>
>> On 10/13/14 1:04 PM, venkatesha murthy wrote:
>>> Adding a bit more on this:
>>> a) The DescriptiveStatisticalSummary actually handles the rest of the
>>> functions such as addValue, getPercentile etc.
>>> b) I have added addValue() as it is important to see either storeless or
>>> store variants as interfaces.
>>> c) A case in point (for b): I was actually trying out lock-based and
>>> lock-free variants of the descriptive statistical summary, and it was
>>> very concise/consistent to use an interface that has all the common
>>> functions across all variants.
>>> d) The lock-based and lock-free variants are not part of this patch, as
>>> I am still working through them.
>>>
>>> However, I feel getPercentile can definitely add value. Please let me
>>> know if I could move all the relevant methods of
>>> DescriptiveStorelessStatistics into the statistical summary (such as
>>> kurtosis, skewness etc.) and then we could just use SummaryStatistics.
>> I am not sure I understand what you are proposing.  Currently, we
>> have two statistical "aggregates" for descriptive univariate stats:
>> SummaryStatistics - aggregates "storeless" statistics over a stream
>> of data that is not stored in memory
>> DescriptiveStatistics - provides an extended set of statistics, some
>> of which require that the full set of data be stored in memory
>
> OK. I am sorry for the confusion here. I understand the intent now.
> However, what I wanted to convey was that all the statistics
> supported in the current DescriptiveStatistics can be supported in a
> storeless variant as well (e.g. skewness, kurtosis, percentile).

No.  For example, exact percentiles, or even estimates of arbitrary
percentiles where the quantile (e.g. a quartile) is not specified in
advance, can't be computed without storing the data.  Also,
DescriptiveStatistics supports a rolling window, and the stats it
implements can make use of multi-pass algorithms.
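
For readers unfamiliar with why the quantile has to be fixed up front, here
is a compact sketch of the P-square idea (Jain and Chlamtac's five-marker
scheme). It is an illustration written for this thread, not the Commons Math
PSquarePercentile source, and it leaves samples of fewer than five values
unhandled. The estimator keeps only five marker heights and positions, all of
which are tied to the single quantile p chosen at construction.

```java
import java.util.Arrays;

/** Illustrative P-square estimator for one quantile p fixed at construction. */
class PSquareSketch {
    private final double p;
    private final double[] q = new double[5];   // marker heights
    private final int[] n = new int[5];         // actual marker positions (1-based)
    private final double[] np = new double[5];  // desired marker positions
    private final double[] dn = new double[5];  // per-observation increments of np
    private int count;

    PSquareSketch(double p) {
        if (!(p > 0 && p < 1)) {
            throw new IllegalArgumentException("p must be in (0, 1)");
        }
        this.p = p;
    }

    void increment(double x) {
        if (count < 5) {                        // buffer the first five values
            q[count++] = x;
            if (count == 5) {
                init();
            }
            return;
        }
        count++;
        int k;                                  // cell index with q[k] <= x < q[k+1]
        if (x < q[0]) {
            q[0] = x;
            k = 0;
        } else if (x >= q[4]) {
            q[4] = x;
            k = 3;
        } else {
            k = 0;
            while (x >= q[k + 1]) {
                k++;
            }
        }
        for (int i = k + 1; i < 5; i++) {
            n[i]++;                             // markers above x shift right
        }
        for (int i = 0; i < 5; i++) {
            np[i] += dn[i];
        }
        for (int i = 1; i <= 3; i++) {          // nudge the three interior markers
            double d = np[i] - n[i];
            if ((d >= 1 && n[i + 1] - n[i] > 1) || (d <= -1 && n[i - 1] - n[i] < -1)) {
                int s = d > 0 ? 1 : -1;
                double candidate = parabolic(i, s);
                q[i] = (q[i - 1] < candidate && candidate < q[i + 1])
                        ? candidate : linear(i, s);
                n[i] += s;
            }
        }
    }

    /** Current estimate of the p-quantile; NaN until five values have been seen. */
    double getResult() {
        return count < 5 ? Double.NaN : q[2];
    }

    private void init() {
        Arrays.sort(q);
        for (int i = 0; i < 5; i++) {
            n[i] = i + 1;
        }
        double[] start = {1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5};
        double[] step = {0, p / 2, p, (1 + p) / 2, 1};
        System.arraycopy(start, 0, np, 0, 5);
        System.arraycopy(step, 0, dn, 0, 5);
    }

    /** Piecewise-parabolic height adjustment for marker i moved by s. */
    private double parabolic(int i, int s) {
        return q[i] + (double) s / (n[i + 1] - n[i - 1])
                * ((n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
                 + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1]));
    }

    /** Linear fallback used when the parabolic step would break marker ordering. */
    private double linear(int i, int s) {
        return q[i] + s * (q[i + s] - q[i]) / (n[i + s] - n[i]);
    }
}
```

Because the desired positions np and their increments dn are derived from p,
asking the same instance for a different quantile afterwards is not possible;
one estimator per quartile would be needed.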

>
> Therefore, what I was proposing is a common interface that can have
> all these methods too, e.g. (we can change the name if needed):
>
> interface DescriptiveStatisticalSummary<S extends UnivariateStatistic>
>         extends StatisticalSummary {
>      double getKurtosis();
>      double getPercentile();
>      double getSkewness();
>      // Add mutation methods as well
>      void addValue(double d);
>      // Provide additional builder methods for injecting custom
>      // percentile, kurtosis, skewness, variance etc.
>      DescriptiveStatisticalSummary<S> withPercentile(S percentile);
>      DescriptiveStatisticalSummary<S> withKurtosis(S kurtosis);
> }

Per comments above, the contracts of these aggregates are
different.  We have also moved away from defining abstract
interfaces as these end up creating problems when we want to add
things (as in the subject of this thread).

Phil





Re: [math] Include quartiles estimated using PSquarePercentile in SummaryStatistics

Posted by venkatesha murthy <ve...@gmail.com>.
On Tue, Oct 14, 2014 at 6:05 AM, Phil Steitz <ph...@gmail.com> wrote:

> On 10/13/14 1:04 PM, venkatesha murthy wrote:
> > Adding a bit more on this:
> > a) The DescriptiveStatisticalSummary actually handles the rest of the
> > functions such as addValue, getPercentile etc.
> > b) I have added addValue() as it is important to see either storeless or
> > store variants as interfaces.
> > c) A case in point (for b): I was actually trying out lock-based and
> > lock-free variants of the descriptive statistical summary, and it was
> > very concise/consistent to use an interface that has all the common
> > functions across all variants.
> > d) The lock-based and lock-free variants are not part of this patch, as
> > I am still working through them.
> >
> > However, I feel getPercentile can definitely add value. Please let me
> > know if I could move all the relevant methods of
> > DescriptiveStorelessStatistics into the statistical summary (such as
> > kurtosis, skewness etc.) and then we could just use SummaryStatistics.
>
> I am not sure I understand what you are proposing.  Currently, we
> have two statistical "aggregates" for descriptive univariate stats:
> SummaryStatistics - aggregates "storeless" statistics over a stream
> of data that is not stored in memory
> DescriptiveStatistics - provides an extended set of statistics, some
> of which require that the full set of data be stored in memory
>
OK. I am sorry for the confusion here. I understand the intent now.
However, what I wanted to convey was that all the statistics
supported in the current DescriptiveStatistics can be supported in a
storeless variant as well (e.g. skewness, kurtosis, percentile).

Therefore, what I was proposing is a common interface that can have
all these methods too, e.g. (we can change the name if needed):

interface DescriptiveStatisticalSummary<S extends UnivariateStatistic>
        extends StatisticalSummary {
     double getKurtosis();
     double getPercentile();
     double getSkewness();
     // Add mutation methods as well
     void addValue(double d);
     // Provide additional builder methods for injecting custom
     // percentile, kurtosis, skewness, variance etc.
     DescriptiveStatisticalSummary<S> withPercentile(S percentile);
     DescriptiveStatisticalSummary<S> withKurtosis(S kurtosis);
}

> The subject of this thread was a proposal to add quartiles to
> SummaryStatistics, as the new(ish) PSquarePercentile allows those
> statistics to be computed without storing the data.

Agreed. I was just adding points on how we can bring both
DescriptiveStatistics and SummaryStatistics under a common interface for
all the stats.

> Phil
> >
> > On Tue, Oct 14, 2014 at 1:15 AM, venkatesha murthy <
> > venkateshamurthyts@gmail.com> wrote:
> >
> >> Hi Phil,
> >>
> >> Though I did not add to StatisticalSummary, I was actually working on a
> >> DescriptiveStatisticalSummary for all the storeless variants, including
> >> PSquarePercentile. Would it help if you implemented
> >> SummaryStatistics with an extended interface such as
> >> DescriptiveStatisticalSummary (below)?
> >>
> >> That said, I actually wanted to discuss the new storeless variant of
> >> descriptive statistics:
> >> a) DescriptiveStatisticalSummary - an extended interface for
> >> StatisticalSummary (adds a generic type that can cater for both stored
> >> and storeless variants)
> >> b) DescriptiveStorelessStatistics - storeless variant of
> >> DescriptiveStatistics
> >> c) SynchronizedDescriptiveStorelessStatistics - a synchronized wrapper.
> >>
> >> Test case classes have been added for these.
> >>
> >> Please let me know about this; I could also accommodate the changes to
> >> summary stats based on this change.
> >> Also, please let me know if this could be raised as a JIRA ticket to
> >> pursue.
> >>
> >> Thanks
> >> Murthy
> >>
> >> On Sat, Oct 11, 2014 at 1:10 AM, Phil Steitz <ph...@gmail.com>
> >> wrote:
> >>
> >>> Now that we have a "storeless" percentile estimator, we can add
> >>> quartile computation to SummaryStatistics.  Any objections to my
> >>> adding this?  I could optionally add a boolean constructor argument
> >>> to avoid the overhead of maintaining these stats.  Or more
> >>> generally, add a bitfield encoding the exact set of stats the user
> >>> wants to maintain.  If there are no objections to the addition, I
> >>> will open a JIRA.
> >>>
> >>> Phil

Re: [math] Include quartiles estimated using PSquarePercentile in SummaryStatistics

Posted by Phil Steitz <ph...@gmail.com>.
On 10/13/14 1:04 PM, venkatesha murthy wrote:
> Adding a bit more on this:
> a) DescriptiveStatisticalSummary handles the rest of the functions,
> such as addValue, getPercentile, etc.
> b) I have added addValue() because it is important to be able to treat
> both the storeless and stored variants through one interface.
> c) A case in point (for b): I was trying out lock-based and lock-free
> variants of the descriptive statistical summary, and the code was very
> concise and consistent with a single interface that has all the
> functions common to every variant.
> d) The lock-based and lock-free variants are not part of this patch, as
> I am still working through them.
>
> However, I feel that getPercentile can definitely add value. Please let
> me know if I could move all the relevant methods of
> DescriptiveStorelessStatistics (such as kurtosis, skewness, etc.) into
> StatisticalSummary, and then we could just use SummaryStatistics.

I am not sure I understand what you are proposing.  Currently, we
have two statistical "aggregates" for descriptive univariate stats:
SummaryStatistics - aggregates "storeless" statistics over a stream
of data that is not stored in memory
DescriptiveStatistics - provides an extended set of statistics, some
of which require that the full set of data be stored in memory

The subject of this thread was a proposal to add quartiles to
SummaryStatistics, as the new(ish) PSquarePercentile allows those
statistics to be computed without storing the data.
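
The storeless/stored distinction can be illustrated with a short, self-contained sketch. It deliberately avoids Commons Math; RunningVariance and StoredMedian are hypothetical names. Mean and variance can be updated one value at a time (Welford's online update), whereas an exact median needs the whole sample so it can be sorted. A streaming estimator such as PSquarePercentile is what lets quartiles move into the first category, at the cost of returning an estimate rather than an exact order statistic.

```java
import java.util.Arrays;

/** Storeless: Welford's online update for mean and sample variance. */
final class RunningVariance {
    private long n;
    private double mean;
    private double m2;   // running sum of squared deviations from the mean

    void addValue(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    long getN() { return n; }

    double getMean() { return n == 0 ? Double.NaN : mean; }

    /** Sample variance; this sketch returns NaN for fewer than two values. */
    double getVariance() { return n < 2 ? Double.NaN : m2 / (n - 1); }
}

/** Stored: an exact median needs the full data set in memory. */
final class StoredMedian {
    static double median(double[] data) {
        if (data.length == 0) {
            return Double.NaN;
        }
        double[] sorted = data.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```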

Phil
>
> On Tue, Oct 14, 2014 at 1:15 AM, venkatesha murthy <
> venkateshamurthyts@gmail.com> wrote:
>
>> Hi Phil,
>>
>> Though I did not add to StatisticalSummary, I was actually working on a
>> DescriptiveStatisticalSummary for all the storeless variants, including
>> PSquarePercentile. Would it help if SummaryStatistics implemented an
>> extended interface such as DescriptiveStatisticalSummary (outlined
>> below)?
>>
>> That said, I actually wanted to discuss the new storeless variant of
>> descriptive statistics:
>> a) DescriptiveStatisticalSummary - an extended interface for
>> StatisticalSummary (adds a generic type that can cater for both the
>> stored and storeless variants)
>> b) DescriptiveStorelessStatistics - storeless variant of
>> DescriptiveStatistics
>> c) SynchronizedDescriptiveStorelessStatistics - a synchronized wrapper.
>>
>> Test case classes are included as well.
>>
>> Please let me know your thoughts on this; I could also accommodate the
>> changes to SummaryStatistics based on this change.
>> Also, please let me know if this could be raised as a JIRA ticket to pursue.
>>
>> Thanks
>> Murthy
>>
>> On Sat, Oct 11, 2014 at 1:10 AM, Phil Steitz <ph...@gmail.com>
>> wrote:
>>
>>> Now that we have a "storeless" percentile estimator, we can add
>>> quartile computation to SummaryStatistics.  Any objections to my
>>> adding this?  I could optionally add a boolean constructor argument
>>> to avoid the overhead of maintaining these stats.  Or more
>>> generally, add a bitfield encoding the exact set of stats the user
>>> wants to maintain.  If there are no objections to the addition, I
>>> will open a JIRA.
>>>
>>> Phil

Re: [math] Include quartiles estimated using PSquarePercentile in SummaryStatistics

Posted by venkatesha murthy <ve...@gmail.com>.
Adding a bit more on this:
a) DescriptiveStatisticalSummary handles the rest of the functions,
such as addValue, getPercentile, etc.
b) I have added addValue() because it is important to be able to treat
both the storeless and stored variants through one interface.
c) A case in point (for b): I was trying out lock-based and lock-free
variants of the descriptive statistical summary, and the code was very
concise and consistent with a single interface that has all the
functions common to every variant.
d) The lock-based and lock-free variants are not part of this patch, as
I am still working through them.

However, I feel that getPercentile can definitely add value. Please let
me know if I could move all the relevant methods of
DescriptiveStorelessStatistics (such as kurtosis, skewness, etc.) into
StatisticalSummary, and then we could just use SummaryStatistics.
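
For readers wondering what point c) could look like, here is a rough, self-contained sketch of a lock-based and a lock-free variant behind one interface. This is not the patch under discussion: MeanAccumulator, SimpleMean, SynchronizedMean and LockFreeMean are hypothetical names, only the mean is tracked, and the lock-free variant assumes Java 8 (LongAdder and DoubleAdder).

```java
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical common interface shared by every variant. */
interface MeanAccumulator {
    void addValue(double d);
    long getN();
    double getMean();
}

/** Plain variant: not safe for concurrent use. */
final class SimpleMean implements MeanAccumulator {
    private long n;
    private double sum;

    @Override public void addValue(double d) { n++; sum += d; }
    @Override public long getN() { return n; }
    @Override public double getMean() { return n == 0 ? Double.NaN : sum / n; }
}

/** Lock-based variant: a synchronized wrapper around any delegate. */
final class SynchronizedMean implements MeanAccumulator {
    private final MeanAccumulator delegate;

    SynchronizedMean(MeanAccumulator delegate) { this.delegate = delegate; }

    @Override public synchronized void addValue(double d) { delegate.addValue(d); }
    @Override public synchronized long getN() { return delegate.getN(); }
    @Override public synchronized double getMean() { return delegate.getMean(); }
}

/**
 * Lock-free variant built on striped adders. Reads taken while writers
 * are active are not an atomic snapshot of (count, sum).
 */
final class LockFreeMean implements MeanAccumulator {
    private final LongAdder n = new LongAdder();
    private final DoubleAdder sum = new DoubleAdder();

    @Override public void addValue(double d) { sum.add(d); n.increment(); }
    @Override public long getN() { return n.sum(); }
    @Override public double getMean() {
        long count = n.sum();
        return count == 0 ? Double.NaN : sum.sum() / count;
    }
}
```

Callers depend only on MeanAccumulator, so swapping the locking strategy does not touch client code. Extending the lock-free approach to statistics with coupled state (variance, percentiles) is considerably harder than this mean-only example suggests.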

On Tue, Oct 14, 2014 at 1:15 AM, venkatesha murthy <
venkateshamurthyts@gmail.com> wrote:

> Hi Phil,
>
> Though I did not add to StatisticalSummary, I was actually working on a
> DescriptiveStatisticalSummary for all the storeless variants, including
> PSquarePercentile. Would it help if SummaryStatistics implemented an
> extended interface such as DescriptiveStatisticalSummary (outlined
> below)?
>
> That said, I actually wanted to discuss the new storeless variant of
> descriptive statistics:
> a) DescriptiveStatisticalSummary - an extended interface for
> StatisticalSummary (adds a generic type that can cater for both the
> stored and storeless variants)
> b) DescriptiveStorelessStatistics - storeless variant of
> DescriptiveStatistics
> c) SynchronizedDescriptiveStorelessStatistics - a synchronized wrapper.
>
> Test case classes are included as well.
>
> Please let me know your thoughts on this; I could also accommodate the
> changes to SummaryStatistics based on this change.
> Also, please let me know if this could be raised as a JIRA ticket to pursue.
>
> Thanks
> Murthy
>
> On Sat, Oct 11, 2014 at 1:10 AM, Phil Steitz <ph...@gmail.com>
> wrote:
>
>> Now that we have a "storeless" percentile estimator, we can add
>> quartile computation to SummaryStatistics.  Any objections to my
>> adding this?  I could optionally add a boolean constructor argument
>> to avoid the overhead of maintaining these stats.  Or more
>> generally, add a bitfield encoding the exact set of stats the user
>> wants to maintain.  If there are no objections to the addition, I
>> will open a JIRA.
>>
>> Phil

Re: [math] Include quartiles estimated using PSquarePercentile in SummaryStatistics

Posted by venkatesha murthy <ve...@gmail.com>.
Hi Phil,

Though I did not add to StatisticalSummary, I was actually working on a
DescriptiveStatisticalSummary for all the storeless variants, including
PSquarePercentile. Would it help if SummaryStatistics implemented an
extended interface such as DescriptiveStatisticalSummary (outlined
below)?

That said, I actually wanted to discuss the new storeless variant of
descriptive statistics:
a) DescriptiveStatisticalSummary - an extended interface for
StatisticalSummary (adds a generic type that can cater for both the
stored and storeless variants)
b) DescriptiveStorelessStatistics - storeless variant of
DescriptiveStatistics
c) SynchronizedDescriptiveStorelessStatistics - a synchronized wrapper.

Test case classes are included as well.

Please let me know your thoughts on this; I could also accommodate the
changes to SummaryStatistics based on this change.
Also, please let me know if this could be raised as a JIRA ticket to pursue.

Thanks
Murthy

On Sat, Oct 11, 2014 at 1:10 AM, Phil Steitz <ph...@gmail.com> wrote:

> Now that we have a "storeless" percentile estimator, we can add
> quartile computation to SummaryStatistics.  Any objections to my
> adding this?  I could optionally add a boolean constructor argument
> to avoid the overhead of maintaining these stats.  Or more
> generally, add a bitfield encoding the exact set of stats the user
> wants to maintain.  If there are no objections to the addition, I
> will open a JIRA.
>
> Phil
>
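
The "bitfield encoding the exact set of stats" option quoted above could be sketched in Java with an EnumSet, which is the usual type-safe stand-in for a bitfield. This is only an illustration under assumed names (Stat, SelectableSummary); it is not how SummaryStatistics is implemented, and it tracks just three simple statistics to keep the idea visible.

```java
import java.util.EnumSet;

/** Hypothetical set of statistics a caller may ask to have maintained. */
enum Stat { MEAN, MIN, MAX }

/** Storeless aggregate that updates only the statistics that were requested. */
final class SelectableSummary {
    private final EnumSet<Stat> enabled;
    private long n;
    private double mean;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    SelectableSummary(EnumSet<Stat> enabled) {
        this.enabled = EnumSet.copyOf(enabled);   // defensive copy
    }

    void addValue(double d) {
        n++;
        if (enabled.contains(Stat.MEAN)) {
            mean += (d - mean) / n;               // skipped entirely when disabled
        }
        if (enabled.contains(Stat.MIN)) {
            min = Math.min(min, d);
        }
        if (enabled.contains(Stat.MAX)) {
            max = Math.max(max, d);
        }
    }

    long getN() { return n; }

    double getMean() { require(Stat.MEAN); return n == 0 ? Double.NaN : mean; }

    double getMin() { require(Stat.MIN); return n == 0 ? Double.NaN : min; }

    double getMax() { require(Stat.MAX); return n == 0 ? Double.NaN : max; }

    /** Fails fast if a statistic is read that was never maintained. */
    private void require(Stat s) {
        if (!enabled.contains(s)) {
            throw new IllegalStateException(s + " is not being maintained");
        }
    }
}
```

A caller who only wants the mean and maximum would construct it with EnumSet.of(Stat.MEAN, Stat.MAX) and pay nothing for the rest. The simpler boolean-constructor option mentioned in the same proposal is the one-flag special case of this.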