You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Phil Steitz <ph...@gmail.com> on 2008/02/10 01:49:07 UTC

[math] Multivariate (vector) stats WAS: Re: [math] sum of logs in summary statistics

On Feb 9, 2008 11:14 AM, Luc Maisonobe <Lu...@free.fr> wrote:
> Phil Steitz a écrit :
>
> > On Feb 8, 2008 7:00 AM,  <lu...@free.fr> wrote:
> >> luc.maisonobe@free.fr wrote:
> >>
> >>> In addition to the statistics required by the StatisticalSummary interface it
> >>> implements, the SummaryStatistics class computes the sum of squares and the
> >>> sum
> >>> of logs. It also has setters and getters for the underlying statistics
> >>> implementations. However, it does not provide a getSumlg method.
> >> The sum of logs is also not used in the equals, hash and toString methods.
> >>
> >> Luc
> >>
> >>
> >>> Should the sum of logs computation be deprecated or a getSumlg method added ?
> >>>
> >
> > Interesting.  This is likely a result of refactoring several years
> > back when the geometric mean computation used the sum of logs
> > instance.  Now it does not, so it is either wasted computation or
> > something of value not exposed to the user.  Makes sense to me to add
> > getSumLog to SummaryStatistics.  It doesn't need to be included in
> > equals or hashcode since geo mean + N equivalence implies log sum
> > equivalence.
> >
> > Looking again at the code, I now see it as stupid that geometricMean
> > in SummaryStatistics does not use the sumOfLogs instance.  If
> > geometricMean exposed a setter for its internally wrapped sumOfLogs
> > instance, we could just set that in SummaryStatistics and only
> > increment the sumOfLogs instance.  It would probably also be an
> > improvment for geometricMean to expose a setter for this.
> >
> > If there are no objections, I will go ahead and make these changes.
> > Thanks for pointing this out, luc.
>
> Go ahead with that.
>
> If you could also have a glimpse at the multivariate summary statistics
> I added yesterday, I would be happy. During my paid work day (in the
> space industry), I am in the process of switching several projects from
> Mantissa to [math] and needed this feature.
>
> I am aware this was really done in a hurry. I have tried to be as
> compliant with univariate statistics as possible, but may have
> completely missed something. I have reused the VectorialCovariance class
> I commited one year ago, but it does not follow the general architecture
> of other statistics. I would really like to have your thoughts about this.
>

The code looks fine to me.  I have a couple of comments on the API.

1) I edited the javadoc for MultivariateSummaryStatistics to make it
clearer what was actually being computed / reported.  If you are OK
with these changes, we can replicate to the interface definition.
2) It might be more convenient for users to have the implementation
setters just take a single instance, rather than an array.  I can't
think of a use case where one would want to use different
implementations of the same statistic here. (so, e.g.
setMeanImpl(StorelessUnivariateStatistic[] meanImpl) becomes just
setMeanImpl(StorelessUnivariateStatistic meanImpl) and we take care of
the cloning).
3) I am not sure StatisticalMultivariateSummaryValues is necessary /
valuable enough to be included.  The analog for univariate was added
to be input into tests, etc, but I am not sure this one is that
useful.  Do you need / use this or can you see use cases for it?  If
not, I would say omit it and we can always add later if someone wants
it.

Might be good also to add a reference in the user guide just to let
people know its there.

Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] Multivariate (vector) stats WAS: Re: [math] sum of logs in summary statistics

Posted by Luc Maisonobe <Lu...@free.fr>.
Phil Steitz a écrit :
> On Feb 9, 2008 5:49 PM, Phil Steitz <ph...@gmail.com> wrote:
>> On Feb 9, 2008 11:14 AM, Luc Maisonobe <Lu...@free.fr> wrote:
>>> Phil Steitz a écrit :
>>>
>>>> On Feb 8, 2008 7:00 AM,  <lu...@free.fr> wrote:
>>>>> luc.maisonobe@free.fr wrote:
>>>>>
>>>>>> In addition to the statistics required by the StatisticalSummary interface it
>>>>>> implements, the SummaryStatistics class computes the sum of squares and the
>>>>>> sum
>>>>>> of logs. It also has setters and getters for the underlying statistics
>>>>>> implementations. However, it does not provide a getSumlg method.
>>>>> The sum of logs is also not used in the equals, hash and toString methods.
>>>>>
>>>>> Luc
>>>>>
>>>>>
>>>>>> Should the sum of logs computation be deprecated or a getSumlg method added ?
>>>>>>
>>>> Interesting.  This is likely a result of refactoring several years
>>>> back when the geometric mean computation used the sum of logs
>>>> instance.  Now it does not, so it is either wasted computation or
>>>> something of value not exposed to the user.  Makes sense to me to add
>>>> getSumLog to SummaryStatistics.  It doesn't need to be included in
>>>> equals or hashcode since geo mean + N equivalence implies log sum
>>>> equivalence.
>>>>
>>>> Looking again at the code, I now see it as stupid that geometricMean
>>>> in SummaryStatistics does not use the sumOfLogs instance.  If
>>>> geometricMean exposed a setter for its internally wrapped sumOfLogs
>>>> instance, we could just set that in SummaryStatistics and only
>>>> increment the sumOfLogs instance.  It would probably also be an
>>>> improvment for geometricMean to expose a setter for this.
>>>>
>>>> If there are no objections, I will go ahead and make these changes.
>>>> Thanks for pointing this out, luc.
>>> Go ahead with that.
>>>
>>> If you could also have a glimpse at the multivariate summary statistics
>>> I added yesterday, I would be happy. During my paid work day (in the
>>> space industry), I am in the process of switching several projects from
>>> Mantissa to [math] and needed this feature.
>>>
>>> I am aware this was really done in a hurry. I have tried to be as
>>> compliant with univariate statistics as possible, but may have
>>> completely missed something. I have reused the VectorialCovariance class
>>> I commited one year ago, but it does not follow the general architecture
>>> of other statistics. I would really like to have your thoughts about this.
>>>
>> The code looks fine to me.  I have a couple of comments on the API.
>>
>> 1) I edited the javadoc for MultivariateSummaryStatistics to make it
>> clearer what was actually being computed / reported.  If you are OK
>> with these changes, we can replicate to the interface definition.

Yes, these changes are fine. I will copy the new javadoc to interface.

>> 2) It might be more convenient for users to have the implementation
>> setters just take a single instance, rather than an array.  I can't
>> think of a use case where one would want to use different
>> implementations of the same statistic here. (so, e.g.
>> setMeanImpl(StorelessUnivariateStatistic[] meanImpl) becomes just
>> setMeanImpl(StorelessUnivariateStatistic meanImpl) and we take care of
>> the cloning).
> 
> Ugh.  Just realized that there is no reason to believe we can clone arbitrary
> StorelessUnivariateStatistics, so this may not be possible.

I agree. So we should rather keep the arrays.

> 
>> 3) I am not sure StatisticalMultivariateSummaryValues is necessary /
>> valuable enough to be included.  The analog for univariate was added
>> to be input into tests, etc, but I am not sure this one is that
>> useful.  Do you need / use this or can you see use cases for it?  If
>> not, I would say omit it and we can always add later if someone wants
>> it.

I included it only for consistency. I don't need it and it does a lot of 
copying, which bother me. I'll remove it.

>>
>> Might be good also to add a reference in the user guide just to let
>> people know its there.

Yes. I forgot to do this. I'll write something.

Thanks
Luc

>>
>> Phil
>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] Multivariate (vector) stats WAS: Re: [math] sum of logs in summary statistics

Posted by Phil Steitz <ph...@gmail.com>.
On Feb 9, 2008 5:49 PM, Phil Steitz <ph...@gmail.com> wrote:
> On Feb 9, 2008 11:14 AM, Luc Maisonobe <Lu...@free.fr> wrote:
> > Phil Steitz a écrit :
> >
> > > On Feb 8, 2008 7:00 AM,  <lu...@free.fr> wrote:
> > >> luc.maisonobe@free.fr wrote:
> > >>
> > >>> In addition to the statistics required by the StatisticalSummary interface it
> > >>> implements, the SummaryStatistics class computes the sum of squares and the
> > >>> sum
> > >>> of logs. It also has setters and getters for the underlying statistics
> > >>> implementations. However, it does not provide a getSumlg method.
> > >> The sum of logs is also not used in the equals, hash and toString methods.
> > >>
> > >> Luc
> > >>
> > >>
> > >>> Should the sum of logs computation be deprecated or a getSumlg method added ?
> > >>>
> > >
> > > Interesting.  This is likely a result of refactoring several years
> > > back when the geometric mean computation used the sum of logs
> > > instance.  Now it does not, so it is either wasted computation or
> > > something of value not exposed to the user.  Makes sense to me to add
> > > getSumLog to SummaryStatistics.  It doesn't need to be included in
> > > equals or hashcode since geo mean + N equivalence implies log sum
> > > equivalence.
> > >
> > > Looking again at the code, I now see it as stupid that geometricMean
> > > in SummaryStatistics does not use the sumOfLogs instance.  If
> > > geometricMean exposed a setter for its internally wrapped sumOfLogs
> > > instance, we could just set that in SummaryStatistics and only
> > > increment the sumOfLogs instance.  It would probably also be an
> > > improvment for geometricMean to expose a setter for this.
> > >
> > > If there are no objections, I will go ahead and make these changes.
> > > Thanks for pointing this out, luc.
> >
> > Go ahead with that.
> >
> > If you could also have a glimpse at the multivariate summary statistics
> > I added yesterday, I would be happy. During my paid work day (in the
> > space industry), I am in the process of switching several projects from
> > Mantissa to [math] and needed this feature.
> >
> > I am aware this was really done in a hurry. I have tried to be as
> > compliant with univariate statistics as possible, but may have
> > completely missed something. I have reused the VectorialCovariance class
> > I commited one year ago, but it does not follow the general architecture
> > of other statistics. I would really like to have your thoughts about this.
> >
>
> The code looks fine to me.  I have a couple of comments on the API.
>
> 1) I edited the javadoc for MultivariateSummaryStatistics to make it
> clearer what was actually being computed / reported.  If you are OK
> with these changes, we can replicate to the interface definition.
> 2) It might be more convenient for users to have the implementation
> setters just take a single instance, rather than an array.  I can't
> think of a use case where one would want to use different
> implementations of the same statistic here. (so, e.g.
> setMeanImpl(StorelessUnivariateStatistic[] meanImpl) becomes just
> setMeanImpl(StorelessUnivariateStatistic meanImpl) and we take care of
> the cloning).

Ugh.  Just realized that there is no reason to believe we can clone arbitrary
StorelessUnivariateStatistics, so this may not be possible.

> 3) I am not sure StatisticalMultivariateSummaryValues is necessary /
> valuable enough to be included.  The analog for univariate was added
> to be input into tests, etc, but I am not sure this one is that
> useful.  Do you need / use this or can you see use cases for it?  If
> not, I would say omit it and we can always add later if someone wants
> it.
>
> Might be good also to add a reference in the user guide just to let
> people know its there.
>
> Phil
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org