Posted to dev@commons.apache.org by Phil Steitz <ph...@steitz.com> on 2009/02/08 21:52:35 UTC

[math] Correlation and Covariance

MATH-114 and MATH-138 propose support for correlation matrices.  I have 
been working on these and would like to propose the following:

Create a new package o.a.c.m.stat.correlation to house initially
    a) Covariance - creates variance-covariance matrix from a matrix 
whose columns represent covariates.  Also includes convenience methods 
that work pairwise on double[] arrays (similar to VectorialCovariance, 
but requiring that the arrays be stored)
    b) PearsonCorrelation - creates Pearson's product-moment correlation 
matrix from either a covariance matrix or a matrix of covariates. Also 
includes methods to return matrices of correlation standard errors and 
p-values (aka significances, i.e. p-value for null hypothesis that the 
coefficient is 0).
    c) SpearmanRankCorrelation - like PearsonCorrelation, but with no 
constructor taking a covariance matrix, and using rank correlation.
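
For readers unfamiliar with the standard errors and p-values mentioned in 
b), here is a minimal sketch of the usual textbook formulas for a single 
coefficient r computed from n observations.  It is an illustration only, 
not the code proposed in this thread: the class and method names are 
hypothetical, and the p-value itself is left out because it needs a 
t-distribution CDF with n - 2 degrees of freedom, which the Java standard 
library does not provide.

```java
/** Hypothetical helper for illustration; not part of Commons Math. */
final class CorrelationSignificanceSketch {

    private CorrelationSignificanceSketch() {
    }

    /** Standard error of a correlation coefficient: sqrt((1 - r^2) / (n - 2)). */
    static double standardError(double r, int n) {
        if (n < 3) {
            throw new IllegalArgumentException("need at least 3 observations");
        }
        return Math.sqrt((1.0 - r * r) / (n - 2));
    }

    /**
     * t statistic for the null hypothesis that the true coefficient is 0.
     * Under that hypothesis it follows a t distribution with n - 2 degrees
     * of freedom; the two-sided p-value is 2 * P(T > |t|).
     * For |r| = 1 the standard error is 0 and the result is infinite.
     */
    static double tStatistic(double r, int n) {
        return r / standardError(r, n);
    }
}
```

Applied entry by entry to a correlation matrix, these give the matrices of 
standard errors and test statistics.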

To implement c), we need a place for the RankingAlgorithm interface and 
its implementations (see MATH-138).  Any suggestions on where to put 
these?  Leaving them in correlation may be awkward later on as we do 
more with rank transformations.
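
To make the dependency concrete, here is a minimal sketch of rank 
correlation built on a ranking step.  Everything in it is an assumption 
made for illustration: the class and method names are hypothetical, the 
signature of the RankingAlgorithm interface in MATH-138 is not reproduced 
here, ties are resolved by averaging ranks (only one of several possible 
strategies), and NaN values are not handled.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch; not the RankingAlgorithm API from MATH-138. */
final class RankCorrelationSketch {

    private RankCorrelationSketch() {
    }

    /** Returns ranks 1..n; tied values share the average of their positions. */
    static double[] averageRanks(final double[] data) {
        int n = data.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort indices by the values they point to.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> data[i]));
        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            int end = start;
            // Extend the run while the next value ties with the current one.
            while (end + 1 < n && data[order[end + 1]] == data[order[start]]) {
                end++;
            }
            double average = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                ranks[order[k]] = average;
            }
            start = end + 1;
        }
        return ranks;
    }

    /** Pearson's r for two equal-length arrays, using deviations from the means. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Spearman's rank correlation: Pearson's r applied to the ranks. */
    static double spearman(double[] x, double[] y) {
        return pearson(averageRanks(x), averageRanks(y));
    }
}
```

The ranking step is the only part that is specific to rank statistics, 
which is why it may deserve a home outside the correlation package.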

I have a) implemented using a fairly stable two-pass algorithm.  I tried 
just using VectorialCovariance, but could not get the accuracy I wanted 
using the one-pass algorithm there.  We should probably at some point 
look at improving the updating formula used there along the lines of 
what we do for Variance, but it is a nice feature of that class that it 
does not require the input vectors to be stored and I would not want to 
see that changed.   For b), similar to the patch in JIRA, I would use 
the R computation from SimpleRegression if working from a matrix, or 
just compute column sigmas and scale directly if working from a 
covariance matrix.
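
As a rough illustration of the two ideas in the paragraph above, here is a 
minimal sketch of a two-pass covariance computation (column means first, 
then sums of products of deviations) and of scaling a covariance matrix 
by the column sigmas to get correlations.  It is not the code to be 
committed: the class and method names are hypothetical, the n - 1 
(bias-corrected) divisor is an assumption, and plain double[][] arrays 
stand in for whatever matrix type the real classes use.

```java
/** Hypothetical sketch; not the proposed Covariance or PearsonCorrelation class. */
final class CovarianceSketch {

    private CovarianceSketch() {
    }

    /** Covariance matrix of the columns of data (rows are observations). */
    static double[][] covarianceMatrix(double[][] data) {
        int n = data.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least 2 observations");
        }
        int p = data[0].length;

        // Pass 1: column means.
        double[] mean = new double[p];
        for (double[] row : data) {
            for (int j = 0; j < p; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < p; j++) {
            mean[j] /= n;
        }

        // Pass 2: sums of products of deviations (upper triangle only).
        double[][] cov = new double[p][p];
        for (double[] row : data) {
            for (int i = 0; i < p; i++) {
                double di = row[i] - mean[i];
                for (int j = i; j < p; j++) {
                    cov[i][j] += di * (row[j] - mean[j]);
                }
            }
        }

        // Divide by n - 1 and mirror into the lower triangle.
        for (int i = 0; i < p; i++) {
            for (int j = i; j < p; j++) {
                cov[i][j] /= (n - 1);
                cov[j][i] = cov[i][j];
            }
        }
        return cov;
    }

    /** Scales a covariance matrix by the column sigmas: r_ij = cov_ij / (s_i * s_j). */
    static double[][] correlationFromCovariance(double[][] cov) {
        int p = cov.length;
        double[] sigma = new double[p];
        for (int i = 0; i < p; i++) {
            sigma[i] = Math.sqrt(cov[i][i]);
        }
        double[][] corr = new double[p][p];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                corr[i][j] = cov[i][j] / (sigma[i] * sigma[j]);
            }
        }
        return corr;
    }
}
```

Unlike a one-pass updating formula, this requires the full data to be 
stored, which is the trade-off against VectorialCovariance noted above.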

Does this sound good?

If I don't hear any objections, I will commit some code along the lines 
above for us to look at.

Phil


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] Correlation and Covariance

Posted by Luc Maisonobe <Lu...@free.fr>.
Phil Steitz wrote:
> MATH-114 and MATH-138 propose support for correlation matrices.  I have
> been working on these and would like to propose the following:
> 
> Create a new package o.a.c.m.stat.correlation to house initially
>    a) Covariance - creates variance-covariance matrix from a matrix
> whose columns represent covariates.  Also includes convenience methods
> that work pairwise on double[] arrays (similar to VectorialCovariance,
> but requiring that the arrays be stored)
>    b) PearsonCorrelation - creates Pearson's product-moment correlation
> matrix from either a covariance matrix or a matrix of covariates. Also
> includes methods to return matrices of correlation standard errors and
> p-values (aka significances, i.e. p-value for null hypothesis that the
> coefficient is 0).
>    c) SpearmanRankCorrelation - like Pearson's but no covariance matrix
> constructor and using rank correlation.
> To implement c), we need a place for the RankingAlgorithm interface and
> implementations (see MATH-138).   Any suggestions on where to put
> these?  Leaving in correlation may be awkward later on as we do more
> with rank transformations.
> 
> I have a) implemented using a fairly stable two-pass algorithm.  I tried
> just using VectorialCovariance, but could not get the accuracy I wanted
> using the one-pass algorithm there.  We should probably at some point
> look at improving the updating formula used there along the lines of
> what we do for Variance, but it is a nice feature of that class that it
> does not require the input vectors to be stored and I would not want to
> see that changed.   For b), similar to the patch in JIRA, I would use
> the R computation from SimpleRegression if working from a matrix, or
> just compute column sigmas and scale directly if working from a
> covariance matrix.

This seems good to me. Perhaps a dedicated "ranking" package under stat
would be fine.

Luc

> 
> Does this sound good?
> 
> If I don't hear any objections, I will commit some code along the lines
> above for us to look at.
> 
> Phil