Posted to dev@commons.apache.org by Phil Steitz <ph...@steitz.com> on 2004/08/15 20:05:45 UTC

Re: [math] Only sample variances?

Kim van der Linde wrote:
> Hi,
> 
> I just looked through the variance class, and saw that
> it only supports sample variance (N-1) but not
> population variances (N). Any reason for this, and can
> it be added?

It would probably be better to add this to one of the aggregates or 
StatUtils or to add a new statistic altogether. This is because a 
UnivariateStatistic can only report one value. The most commonly used (at 
least in statistical applications) value for Variance and Standard 
Deviation is the sample statistic, which is why that is what the 
statistics named "StandardDeviation" and "Variance" report. From a user's 
perspective, it's not too hard to convert; but I agree that it would be 
convenient to add these. Probably best to make new statistics in the 
moment subpackage called "PopulationXxx".  Patches welcome ;-)
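
A minimal sketch of the user-side conversion mentioned above, assuming the
caller already has the sample variance and the number of values N. The class
and method names (VarianceConversion, sampleVariance, toPopulationVariance)
are hypothetical and not part of [math]; the sample variance is computed by
hand here only so that the example is self-contained.

```java
// Hypothetical helper, not part of [math]: shows that a sample variance
// (divisor N-1) can be rescaled to a population variance (divisor N).
final class VarianceConversion {

    /** Sample variance: sum of squared deviations from the mean, divided by (n - 1). */
    public static double sampleVariance(double[] values) {
        int n = values.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= n;
        double sumSq = 0.0;
        for (double v : values) {
            double d = v - mean;
            sumSq += d * d;
        }
        return sumSq / (n - 1);
    }

    /** Rescales a sample variance to the population variance: multiply by (n - 1) / n. */
    public static double toPopulationVariance(double sampleVariance, int n) {
        return sampleVariance * (n - 1) / n;
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        double s2 = sampleVariance(data);
        System.out.println("sample variance     = " + s2);
        System.out.println("population variance = " + toPopulationVariance(s2, data.length));
    }
}
```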

Phil


> 
> Cheers,
> 
> Kim


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [math] Only sample variances?

Posted by Kim van der Linde <ki...@yahoo.com>.
--- "Mark R. Diggory" <md...@latte.harvard.edu>
wrote:

> Well, I think we should step back and ask a few design
> questions concerning the objects that will use these
> Sample/Population variances and that will assist us 
> in their own design.
> 
> 1.) Is it the case that a covariance matrix could be
> built off of "either" Sample or Population
> Variances?

Yes. With the remark that the whole matrix is filled
with either sample (co-)variances or population
(co-)variances.

> 2.) Are there other applications of Sample/Pop
> Variances which we want to implement, if so what are
> they? Are they interchangeable in these cases?
>
> 3.) Do we want to add methods to the
> Descriptive/Summary/StatUtils stats 
> to capture both cases?

I cannot answer these two questions. However, I do
know that you can calculate any method that uses
(co-)variances with either population or sample
estimates. So, my suggestion would be to incorporate
it in such a way that it uses a default (my preference
would be sample) but leaves the option open to use
population versions instead, without calling a
completely new class. Essentially, as soon as you go
with the population variance, all derived methods have
to go with that too, including correlations,
regressions, PCA, GLM etc.
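
One way the suggestion above could look, as a rough sketch only: a single
variance class that defaults to the sample (N-1) form and can be switched to
the population (N) form. The class name ConfigurableVariance and its
biasCorrected flag are assumptions made for illustration, not an existing
[math] API.

```java
// Hypothetical class for illustration only; not the [math] Variance statistic.
final class ConfigurableVariance {

    // Default is the sample variance (divide by n - 1).
    private boolean biasCorrected = true;

    public void setBiasCorrected(boolean biasCorrected) {
        this.biasCorrected = biasCorrected;
    }

    public boolean isBiasCorrected() {
        return biasCorrected;
    }

    /** Returns the sample or population variance, depending on the flag. */
    public double evaluate(double[] values) {
        int n = values.length;
        if (n == 0) {
            return Double.NaN;   // no data
        }
        if (n == 1) {
            return 0.0;          // a single value has no spread
        }
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= n;
        double sumSq = 0.0;
        for (double v : values) {
            double d = v - mean;
            sumSq += d * d;
        }
        // The only difference between the two forms is the divisor.
        return sumSq / (biasCorrected ? n - 1 : n);
    }
}
```

Any derived statistic (covariance, correlation, regression) would have to
read the same flag so that the whole calculation stays consistent.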
 
> What this and the Remedian case are somewhat
> convincing me of is that, in the SummaryStatistics 
> case, you need to know what you want before you 
> start adding values to the Statistic, which
> constitutes a sort of configuration environment, 
> while in the "DescriptiveStatistics" case, one can 
> choose these aspects afterward, as the statistic is 
> calculated after all the values are known.
>
> This means that you either have to calculate both
> the PopulationVariance and SampleVariance in the 
> SummaryStatistics case, or configure it to use one
> or the other. While in the DescriptiveStatistics case, 
> you can just call the appropriate method to return 
> that statistic.

To put it very bluntly, the only difference
is the N/(N-1) (or reciprocal of that) factor, which
can always be added.

That is also why I think that incorporating it is the
best way.

With the median, this might be different.

Cheers,

Kim


		


Re: [math] Only sample variances?

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Well, I think we should step back and ask a few design questions concerning 
the objects that will use these Sample/Population variances and that 
will assist us in their own design.

1.) Is it the case that a covariance matrix could be built off of 
"either" Sample or Population Variances?

2.) Are there other applications of Sample/Pop Variances which we want 
to implement, if so what are they? Are they interchangeable in these cases?

3.) Do we want to add methods to the Descriptive/Summary/StatUtils stats 
to capture both cases?

What this and the Remedian case are somewhat convincing me of is that, 
in the SummaryStatistics case, you need to know what you want before 
you start adding values to the Statistic, which constitutes a sort of 
configuration environment, while in the "DescriptiveStatistics" case, 
one can choose these aspects afterward, as the statistic is calculated 
after all the values are known.

This means that you either have to calculate both the PopulationVariance 
and SampleVariance in the SummaryStatistics case, or configure it to use 
one or the other. While in the DescriptiveStatistics case, you can just 
call the appropriate method to return that statistic.
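
As a rough illustration of the "calculate both" option for the streaming
case, here is a sketch of a storeless accumulator that keeps a running count,
mean and sum of squared deviations (Welford's update) and can report either
variance once the values are in. The class RunningVariance and its method
names are hypothetical; this is not how SummaryStatistics is implemented.

```java
// Hypothetical storeless accumulator; values are not kept in memory.
final class RunningVariance {

    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0;   // running sum of squared deviations from the mean

    /** Welford's update: adjusts the mean and m2 for one new value. */
    public void addValue(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    public long getN() {
        return n;
    }

    /** Divides by (n - 1); NaN when fewer than two values have been added. */
    public double getSampleVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }

    /** Divides by n; NaN when no values have been added. */
    public double getPopulationVariance() {
        return n < 1 ? Double.NaN : m2 / n;
    }
}
```

Because both results come from the same three running quantities, nothing
has to be configured before the first value is added in this sketch.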

-Mark


-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu



Re: [math] Only sample variances?

Posted by Phil Steitz <ph...@steitz.com>.
Mark R. Diggory wrote:
> Yes, at the UnivariateStatistic level, these would need to be new 
> classes. My question as well is "Does it apply as well to higher order 
> moments?"

In theory, yes, though I have never seen non-bias-corrected versions of 
Skewness and Kurtosis used.  The current formulas are all defined for the 
most common use case where the data represent a sample from a population 
whose true distribution and associated parameters are unknown.  
The formulas that we use provide unbiased estimators for population 
parameters in this case.  This is explained fairly well for the Variance here:
http://mathworld.wolfram.com/Variance.html
and for Skewness and Kurtosis here:
http://mathworld.wolfram.com/k-Statistic.html

The "Population Variance" is useful when the data *are* the population 
(i.e. the distribution is discrete and there is no sampling going on).  I 
am not aware of use cases where Skewness and Kurtosis are useful in 
analyzing full population data or other uses for the non-bias-corrected 
versions of these.  These could exist, I am just not aware of them.
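
To make the distinction concrete for a higher moment, here is a sketch that
computes skewness both ways, assuming the usual textbook definitions: the
non-corrected form m3 / m2^(3/2) built from divisor-N central moments, and
the commonly used corrected form obtained by multiplying it by
sqrt(N(N-1)) / (N-2). The class SkewnessSketch and its methods are
hypothetical and are not claimed to match the [math] Skewness code.

```java
// Hypothetical illustration of corrected vs. non-corrected skewness.
final class SkewnessSketch {

    /** Central moment of order k with divisor n: (1/n) * sum((x - mean)^k). */
    static double centralMoment(double[] values, int k) {
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;
        double sum = 0.0;
        for (double v : values) {
            sum += Math.pow(v - mean, k);
        }
        return sum / values.length;
    }

    /** Non-corrected ("population") skewness: m3 / m2^(3/2). NaN if all values are equal. */
    public static double populationSkewness(double[] values) {
        double m2 = centralMoment(values, 2);
        double m3 = centralMoment(values, 3);
        return m3 / Math.pow(m2, 1.5);
    }

    /** Corrected ("sample") skewness: sqrt(n(n-1)) / (n-2) times the population form. */
    public static double sampleSkewness(double[] values) {
        int n = values.length;
        if (n < 3) {
            throw new IllegalArgumentException("need at least three values");
        }
        return Math.sqrt((double) n * (n - 1)) / (n - 2) * populationSkewness(values);
    }
}
```

Unlike the variance case, the correction factor here depends on N in a less
obvious way, but it is still a pure rescaling of the non-corrected value.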

> 
> Maybe we should place everything into the following packages:

I don't think we need yet another subpackage.



Re: [math] Only sample variances?

Posted by Kim van der Linde <ki...@yahoo.com>.
Hi Mark,

I think we have to think very carefully about this.
Especially when we start including covariances. My old
textbooks give the formula as population estimates,
just like Excel (no choice, only population).
However, covariance matrices include the sample
covariances....

Cheers,

Kim



		


Re: [math] Only sample variances?

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Yes, at the UnivariateStatistic level, these would need to be new 
classes. My question as well is "Does it apply as well to higher order 
moments?"

Maybe we should place everything into the following packages:

o.a.c.m.stat.univariate.moment.sample
o.a.c.m.stat.univariate.moment.population

-Mark


-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu



Re: [math] Only sample variances?

Posted by Phil Steitz <ph...@steitz.com>.
Brent Worden wrote:
> Any objection to tracking these and future feature requests using the wiki?
> A wish list, if you will.

+1  Up to now, I have been using task.xml, which is relatively up to date.
I would be fine with moving this stuff to the wiki, starting with the
stuff marked "Post 1.0" on task.xml.




Re: [math] Only sample variances?

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Wiki or Bug tracking sounds good to me.


-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu



RE: [math] Only sample variances?

Posted by Brent Worden <br...@worden.org>.
That's not a problem.  I can add it quite quickly.

Also, don't think the wiki will be the only avenue for you to give us
suggestions.  Any method you choose to express your desires to the [math]
team is acceptable.  I just want a place to collect these requests to help
guide development for future releases.

Lastly, if you choose to contribute to the wiki, deep development experience
is definitely not needed.  Only the ability to mark up text with wiki tags is
required.

Brent Worden



RE: [math] Only sample variances?

Posted by Kim van der Linde <ki...@yahoo.com>.
Besides that, there is no maths part there (yet?)


		


RE: [math] Only sample variances?

Posted by Al Chou <ho...@yahoo.com>.
--- Brent Worden <br...@worden.org> wrote:
> Any objection to tracking these and future feature requests using the wiki?
> A wish list, if you will.

+1

We should formally note somewhere that that's the procedure we follow.


Al





RE: [math] Only sample variances?

Posted by Kim van der Linde <ki...@yahoo.com>.
--- Brent Worden <br...@worden.org> wrote:

> Any objection to tracking these and future feature
> requests using the wiki?

Yes, because I am not one of those hard core
developers.

Kim


		


RE: [math] Only sample variances?

Posted by Brent Worden <br...@worden.org>.
Any objection to tracking these and future feature requests using the wiki?
A wish list, if you will.

