Posted to issues@commons.apache.org by "Patrick Meyer (JIRA)" <ji...@apache.org> on 2010/12/01 17:13:12 UTC

[jira] Created: (MATH-449) Storeless covariance

Storeless covariance
--------------------

                 Key: MATH-449
                 URL: https://issues.apache.org/jira/browse/MATH-449
             Project: Commons Math
          Issue Type: Improvement
            Reporter: Patrick Meyer


Commons Math currently has no storeless version of the covariance computation. However, Pebay (2008) describes algorithms for on-line covariance computations, [http://infoserve.sandia.gov/sand_doc/2008/086212.pdf]. I have provided a simple class that implements this algorithm. It would be nice to have this integrated into org.apache.commons.math.stat.correlation.Covariance.

{code}
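// Streaming (storeless) covariance of paired observations, using the
// one-pass update described by Pebay (2008).
public class StorelessCovariance {

    // Running means of x and y over the pairs seen so far.
    private double meanX = 0.0;
    private double meanY = 0.0;
    // Number of pairs seen so far.
    private long n = 0;
    // Running sum of co-deviations: sum of (x_i - meanX) * (y_i - meanY).
    private double covarianceNumerator = 0.0;
    // true: divide by n - 1; false: divide by n.
    private final boolean unbiased;

    // Creates an empty accumulator; unbiased selects the n - 1 divisor.
    public StorelessCovariance(boolean unbiased) {
        this.unbiased = unbiased;
    }

    // Adds one (x, y) pair; a pair with a null member is skipped.
    public void increment(Double x, Double y) {
        if (x != null && y != null) {
            n++;
            // Deviations from the means of the previous n - 1 pairs.
            double deltaX = x - meanX;
            double deltaY = y - meanY;
            meanX += deltaX / n;
            meanY += deltaY / n;
            covarianceNumerator += ((n - 1.0) / n) * deltaX * deltaY;
        }
    }

    // Returns the covariance of the pairs seen so far.
    public Double getResult() {
        if (unbiased) {
            return covarianceNumerator / (n - 1.0);
        } else {
            return covarianceNumerator / n;
        }
    }
}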
{code}
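
For reference, the update that increment() performs can be written out as follows. This is a sketch in my own notation (the symbols C_n, \delta_x and \delta_y are not taken from the report), where C_n stands for covarianceNumerator after n pairs and \bar{x}_n, \bar{y}_n for the running means:

```latex
% Requires amsmath. C_n is the sum of co-deviations after n pairs:
%   C_n = \sum_{i=1}^{n} (x_i - \bar{x}_n)(y_i - \bar{y}_n)
\begin{align*}
\delta_x &= x_n - \bar{x}_{n-1}, & \delta_y &= y_n - \bar{y}_{n-1},\\
\bar{x}_n &= \bar{x}_{n-1} + \frac{\delta_x}{n}, & \bar{y}_n &= \bar{y}_{n-1} + \frac{\delta_y}{n},\\
C_n &= C_{n-1} + \frac{n-1}{n}\,\delta_x\,\delta_y, & C_0 &= 0.
\end{align*}
% getResult() then divides by n - 1 (unbiased) or by n (biased):
\[
\operatorname{cov}_{\text{unbiased}} = \frac{C_n}{n-1},
\qquad
\operatorname{cov}_{\text{biased}} = \frac{C_n}{n}.
\]
```

The factor (n-1)/n appears because the deltas are taken against the means of the previous n-1 pairs rather than the updated means.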

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Updated] (MATH-449) Storeless covariance

Posted by "Patrick Meyer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Meyer updated MATH-449:
-------------------------------

    Description: 
Commons Math currently has no storeless version of the covariance computation. However, Pebay (2008) describes algorithms for on-line covariance computations, [http://infoserve.sandia.gov/sand_doc/2008/086212.pdf]. I have provided a simple class that implements this algorithm. It would be nice to have this integrated into org.apache.commons.math.stat.correlation.Covariance.

{code}
//This code is granted for inclusion in the Apache Commons under the terms of the ASL.

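// Streaming (storeless) covariance of paired observations, using the
// one-pass update described by Pebay (2008).
public class StorelessCovariance {

    // Running means of x and y over the pairs seen so far.
    private double meanX = 0.0;
    private double meanY = 0.0;
    // Number of pairs seen so far.
    private long n = 0;
    // Running sum of co-deviations: sum of (x_i - meanX) * (y_i - meanY).
    private double covarianceNumerator = 0.0;
    // true: divide by n - 1; false: divide by n.
    private final boolean unbiased;

    // Creates an empty accumulator; unbiased selects the n - 1 divisor.
    public StorelessCovariance(boolean unbiased) {
        this.unbiased = unbiased;
    }

    // Adds one (x, y) pair; a pair with a null member is skipped.
    public void increment(Double x, Double y) {
        if (x != null && y != null) {
            n++;
            // Deviations from the means of the previous n - 1 pairs.
            double deltaX = x - meanX;
            double deltaY = y - meanY;
            meanX += deltaX / n;
            meanY += deltaY / n;
            covarianceNumerator += ((n - 1.0) / n) * deltaX * deltaY;
        }
    }

    // Returns the covariance of the pairs seen so far.
    public Double getResult() {
        if (unbiased) {
            return covarianceNumerator / (n - 1.0);
        } else {
            return covarianceNumerator / n;
        }
    }
}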
{code}



> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>             Fix For: 3.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Updated] (MATH-449) Storeless covariance

Posted by "Patrick Meyer (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Meyer updated MATH-449:
-------------------------------

    Attachment: MATH-449.patch

This patch includes three new classes: StorelessCovariance.java, StorelessCovarianceMatrix.java, and StorelessCovarianceTest.java. For the test cases, I used the same data as in CovarianceTest.java. However, I reduced the accuracy to 10E-7 because the tests on the Longley data failed when using 10E-9.
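
To illustrate the kind of check described here, the following is a minimal sketch that compares a one-pass result with a two-pass ("stored data") result under a tolerance. It is not the test from the patch: the class name CovarianceComparison, its methods, the relative-tolerance rule, and the small data set are all assumptions made for illustration, and the Longley data is not reproduced.

```java
/** Hypothetical comparison harness; not StorelessCovarianceTest from the patch. */
final class CovarianceComparison {

    /** Two-pass unbiased sample covariance computed from stored arrays. */
    static double twoPass(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / (n - 1);
    }

    /** One-pass unbiased sample covariance using the update from the issue. */
    static double onePass(double[] x, double[] y) {
        double meanX = 0.0;
        double meanY = 0.0;
        double numerator = 0.0;
        long n = 0;
        for (int i = 0; i < x.length; i++) {
            n++;
            double deltaX = x[i] - meanX;
            double deltaY = y[i] - meanY;
            meanX += deltaX / n;
            meanY += deltaY / n;
            numerator += ((n - 1.0) / n) * deltaX * deltaY;
        }
        return numerator / (n - 1.0);
    }

    /** True if both results agree within tol, scaled by the expected magnitude (an assumed rule). */
    static boolean agree(double[] x, double[] y, double tol) {
        double expected = twoPass(x, y);
        double actual = onePass(x, y);
        return Math.abs(expected - actual) <= tol * Math.max(1.0, Math.abs(expected));
    }

    public static void main(String[] args) {
        // Made-up data for illustration only.
        double[] x = {1.0, 2.0, 3.0, 4.0, 5.0};
        double[] y = {2.0, 4.0, 6.0, 8.0, 10.0};
        System.out.println("two-pass: " + twoPass(x, y));
        System.out.println("one-pass: " + onePass(x, y));
        System.out.println("agree at 10E-7: " + agree(x, y, 10E-7));
    }
}
```

The two methods accumulate rounding error differently, which is why a comparison like this needs a tolerance rather than exact equality.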

> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>             Fix For: 3.1
>
>         Attachments: MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Updated] (MATH-449) Storeless covariance

Posted by "Gilles (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilles updated MATH-449:
------------------------

    Assignee: Thomas Neidhart
    
> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>            Assignee: Thomas Neidhart
>             Fix For: 3.0
>
>         Attachments: MATH-449.patch, MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Updated] (MATH-449) Storeless covariance

Posted by "Phil Steitz (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phil Steitz updated MATH-449:
-----------------------------

    Assignee:     (was: Phil Steitz)
    
> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>             Fix For: 3.1
>
>         Attachments: MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Resolved] (MATH-449) Storeless covariance

Posted by "Thomas Neidhart (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart resolved MATH-449.
----------------------------------

    Resolution: Fixed

I was just waiting for feedback from Patrick, but I think we can resolve this issue for now. The open question was whether internal details should be accessible via a public interface, but I think it is safer and cleaner to hide such details.

If there is a need to get access to this in some way, we should open a new issue for it, IMHO.
                
> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>            Assignee: Thomas Neidhart
>             Fix For: 3.0
>
>         Attachments: MATH-449.patch, MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Patrick Meyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050013#comment-13050013 ] 

Patrick Meyer commented on MATH-449:
------------------------------------

I agree. A new class would be best. Now that I am more familiar with Commons Math, I think my code should be changed to use primitive double types instead of Double objects. That seems more consistent with the other descriptive statistics in Commons Math.
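
As a rough sketch of what the primitive-based variant could look like: the class name PrimitiveStorelessCovariance, the getN() accessor, and the NaN return for too few observations are assumptions of this sketch, not the code that was eventually committed to Commons Math.

```java
/** Hypothetical primitive-only variant of the class from the issue description. */
final class PrimitiveStorelessCovariance {

    private double meanX;
    private double meanY;
    private long n;
    private double covarianceNumerator;
    private final boolean biasCorrected;

    /** biasCorrected = true divides by n - 1, false divides by n. */
    PrimitiveStorelessCovariance(boolean biasCorrected) {
        this.biasCorrected = biasCorrected;
    }

    /** Adds one (x, y) pair; with primitives there is no null check to make. */
    void increment(double x, double y) {
        n++;
        double deltaX = x - meanX;
        double deltaY = y - meanY;
        meanX += deltaX / n;
        meanY += deltaY / n;
        covarianceNumerator += ((n - 1.0) / n) * deltaX * deltaY;
    }

    /** Number of pairs added so far. */
    long getN() {
        return n;
    }

    /** Covariance so far; NaN when there are too few pairs (an assumed convention). */
    double getResult() {
        if (n == 0 || (biasCorrected && n < 2)) {
            return Double.NaN;
        }
        return biasCorrected ? covarianceNumerator / (n - 1.0) : covarianceNumerator / n;
    }

    public static void main(String[] args) {
        PrimitiveStorelessCovariance cov = new PrimitiveStorelessCovariance(true);
        cov.increment(1.0, 2.0);
        cov.increment(2.0, 4.0);
        cov.increment(3.0, 6.0);
        System.out.println(cov.getN() + " pairs, covariance " + cov.getResult());
    }
}
```

One consequence of dropping the Double parameters is that missing values can no longer be passed as null and skipped; a NaN argument would instead propagate into the result.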

> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>             Fix For: 3.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Patrick Meyer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211321#comment-13211321 ] 

Patrick Meyer commented on MATH-449:
------------------------------------

I think everything looks fine and hiding the details is fine for now. Thanks!
                
> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>            Assignee: Thomas Neidhart
>             Fix For: 3.0
>
>         Attachments: MATH-449.patch, MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Patrick Meyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088390#comment-13088390 ] 

Patrick Meyer commented on MATH-449:
------------------------------------

These changes sound fine to me. I'd be happy to add the Javadoc once these changes are made. Do I just add the Javadoc comments to the class files? Will Subversion pick up the changes to comments?

Thanks,
Patrick

> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>            Assignee: Phil Steitz
>             Fix For: 3.1
>
>         Attachments: MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049850#comment-13049850 ] 

Phil Steitz commented on MATH-449:
----------------------------------

Can you please either add a comment indicating the code pasted to the ticket is granted for inclusion under terms of the ASL or add it as an attachment and check the "for inclusion" box?  Also, some unit tests would be great, including validation tests against the stored data version.



> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>             Fix For: 3.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051765#comment-13051765 ] 

Phil Steitz commented on MATH-449:
----------------------------------

Right, we tend to use primitives.  Don't hesitate to ask if you have any questions about making patches, etc.  Thanks!

> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>             Fix For: 3.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Updated] (MATH-449) Storeless covariance

Posted by "Gilles (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilles updated MATH-449:
------------------------

    Fix Version/s:     (was: 3.1)
                   3.0

Waiting for an imminent patch by Patrick Meyer.
                
> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>             Fix For: 3.0
>
>         Attachments: MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Thomas Neidhart (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208702#comment-13208702 ] 

Thomas Neidhart commented on MATH-449:
--------------------------------------

Patch applied together with additional changes in r1244667.

Thanks for it, Patrick!
                
> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>             Fix For: 3.0
>
>         Attachments: MATH-449.patch, MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Patrick Meyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088719#comment-13088719 ] 

Patrick Meyer commented on MATH-449:
------------------------------------

I like all of these ideas. When I wrote the patch, I didn't know if forcing a square matrix was preferred, so I wrote it more generally. A square matrix is fine with me. 

Incrementing the full vector of new values is definitely the safest way to do it. However, it forces the user into listwise deletion if a case has any missing data. The more granular version allows a user to implement pairwise deletion. Neither option is a great way to handle missing data, but do we want to force one approach on the user? Is there a way to increment the full vector of values and account for missing data on one or more variables?

Thanks,
Patrick

> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>            Assignee: Phil Steitz
>             Fix For: 3.1
>
>         Attachments: MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Thomas Neidhart (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209670#comment-13209670 ] 

Thomas Neidhart commented on MATH-449:
--------------------------------------

I had missed some suggestions from Phil at first, and have committed them in r1245133.

The changes include:

* Drop setEntry and incrementCovariance. Rename incrementRow to increment and make it the only mutator. (/)
* Replace colDimension and rowDimension with just dimension, forcing the matrix to be square. (/)
* Store only upper triangular BivariateCovariances. (/)
** Add a transpose method to StorelessBivariateCovariance so getEntry returns something that can be further incremented properly. (x)
* Add symmetry tests. (/)
* Change getCovariance to return the actual covariance double value instead of the Storelessxxxx object. (/)
* Make StorelessBivariateCovariance package private, as it is not used outside StorelessCovariance. (/)

Returning the inner StorelessBivariateCovariance elements is dangerous, as incrementing them individually could break the symmetry due to the way they are now stored internally (as an upper triangular matrix). How to achieve this by adding a transpose method, as Phil described, is not obvious to me.

As there seems to be no actual use of the inner elements anyway, this has been dropped so far.
Do you agree with the changes made so far?
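
For readers following the thread, here is a minimal sketch of the storage scheme discussed above: one accumulator per (i, j) pair with i <= j, kept in a flat array, with a single increment method as the only mutator. All names (UpperTriangularCovarianceSketch, Cell, indexOf) are hypothetical and chosen for illustration; this is not the code committed in r1245133.

```java
// Illustrative sketch only; class and method names are assumptions.
final class UpperTriangularCovarianceSketch {

    /** One-pass accumulator for a single pair of variables. */
    private static final class Cell {
        long n;
        double meanX;
        double meanY;
        double numerator;

        void increment(double x, double y) {
            n++;
            double deltaX = x - meanX;
            double deltaY = y - meanY;
            meanX += deltaX / n;
            meanY += deltaY / n;
            numerator += ((n - 1.0) / n) * deltaX * deltaY;
        }

        double getResult() {
            // Bias-corrected estimate; undefined for fewer than two observations.
            return n < 2 ? Double.NaN : numerator / (n - 1.0);
        }
    }

    private final int dimension;
    private final Cell[] cells;

    UpperTriangularCovarianceSketch(int dimension) {
        this.dimension = dimension;
        this.cells = new Cell[dimension * (dimension + 1) / 2];
        for (int k = 0; k < cells.length; k++) {
            cells[k] = new Cell();
        }
    }

    /** Maps (i, j) and (j, i) to the same slot of the upper triangle. */
    private int indexOf(int i, int j) {
        int r = Math.min(i, j);
        int c = Math.max(i, j);
        return r * dimension - r * (r - 1) / 2 + (c - r);
    }

    /** The only mutator: updates every stored pair from one full row. */
    void increment(double[] row) {
        if (row.length != dimension) {
            throw new IllegalArgumentException("expected " + dimension + " values");
        }
        for (int i = 0; i < dimension; i++) {
            for (int j = i; j < dimension; j++) {
                cells[indexOf(i, j)].increment(row[i], row[j]);
            }
        }
    }

    /** Returns the covariance value itself, never the inner accumulator. */
    double getCovariance(int i, int j) {
        return cells[indexOf(i, j)].getResult();
    }
}
```

Because callers only ever receive a double, there is no way to increment a single cell and break the symmetry, which is the concern raised above.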
                
> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>            Assignee: Thomas Neidhart
>             Fix For: 3.0
>
>         Attachments: MATH-449.patch, MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Patrick Meyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088913#comment-13088913 ] 

Patrick Meyer commented on MATH-449:
------------------------------------

I like that idea. It's probably the best way to handle it. However, in looking back at the regular Covariance class, it only provides for listwise deletion. Should we reconsider treatment of missing data in Covariance and StorelessCovariance so that the implementations are similar? We should probably give the user an option for treatment of missing data. The cov() function in R has an option for casewise or pairwise deletion but it looks like only casewise is available for the Pearson correlation. Missing data for Spearman's correlation is handled through the ranking procedure.

> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>            Assignee: Phil Steitz
>             Fix For: 3.1
>
>         Attachments: MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088317#comment-13088317 ] 

Phil Steitz commented on MATH-449:
----------------------------------

Thanks for the patch!

Definitely a useful addition.  Looking carefully at the code, I think the following would be good:

1. StorelessCovarianceMatrix really corresponds to Covariance.  The current implementations of Covariance and PearsonsCorrelation are really matrix-valued.  So I think StorelessCovarianceMatrix should be called StorelessCovariance and what is now StorelessCovariance should be BivariateStorelessCovariance.  (Of course, one could argue that it is the current classes that are misnamed.  If people feel strongly that is the case, we can discuss changing those names and creating bivariate versions.  In any case, we should be consistent.) 

2. I think StorelessCovariance (the matrix version) should extend Covariance.  This should work, just omitting array/matrix constructors and overriding getMatrix as it implements it now.  The advantage of this is that it can then be used, for example, to create a correlation matrix using the method exposed by PearsonsCorrelation.

3. We need to fill in the missing javadoc.

Thanks again for the patch.  I will take care of the items above if there are no objections and no one beats me to it.
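
To illustrate why item 2 matters, here is a rough sketch of the kind of conversion a covariance matrix makes possible: each correlation is the covariance divided by the product of the two standard deviations. The class name and the use of plain double[][] arrays are assumptions made for this example; it does not reproduce the PearsonsCorrelation implementation.

```java
// Illustrative sketch only; not the Commons Math implementation.
final class CovarianceToCorrelationSketch {

    /**
     * Converts a symmetric covariance matrix into a correlation matrix
     * using r_ij = cov_ij / (sigma_i * sigma_j).
     */
    static double[][] covarianceToCorrelation(double[][] cov) {
        int n = cov.length;
        double[][] corr = new double[n][n];
        for (int i = 0; i < n; i++) {
            double sigmaI = Math.sqrt(cov[i][i]);
            corr[i][i] = 1.0;
            for (int j = 0; j < i; j++) {
                double sigmaJ = Math.sqrt(cov[j][j]);
                double r = cov[i][j] / (sigmaI * sigmaJ);
                // Fill both halves so the result is symmetric.
                corr[i][j] = r;
                corr[j][i] = r;
            }
        }
        return corr;
    }
}
```

For example, variances 4 and 9 with covariance 2 give a correlation of 2 / (2 * 3) = 1/3.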


> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>            Assignee: Phil Steitz
>             Fix For: 3.1
>
>         Attachments: MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088843#comment-13088843 ] 

Phil Steitz commented on MATH-449:
----------------------------------

We could allow NaNs in the input vectors and skip updating the bivariate covariances for pairs including a NaN.
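
A rough sketch of this idea, assuming NaN marks a missing value and each pair keeps its own count and means. The class and method names are hypothetical; this is not the committed StorelessCovariance.

```java
// Illustrative sketch only; names and layout are assumptions.
final class NaNSkippingCovarianceSketch {
    private final int dim;
    private final long[][] n;          // per-pair observation counts
    private final double[][] meanI;    // running mean of variable i within pair (i, j)
    private final double[][] meanJ;    // running mean of variable j within pair (i, j)
    private final double[][] numerator;

    NaNSkippingCovarianceSketch(int dim) {
        this.dim = dim;
        this.n = new long[dim][dim];
        this.meanI = new double[dim][dim];
        this.meanJ = new double[dim][dim];
        this.numerator = new double[dim][dim];
    }

    /** Updates only those pairs for which both values are present (pairwise deletion). */
    void increment(double[] row) {
        if (row.length != dim) {
            throw new IllegalArgumentException("expected " + dim + " values");
        }
        for (int i = 0; i < dim; i++) {
            for (int j = i; j < dim; j++) {
                if (Double.isNaN(row[i]) || Double.isNaN(row[j])) {
                    continue; // skip pairs that include a missing value
                }
                long count = ++n[i][j];
                double di = row[i] - meanI[i][j];
                double dj = row[j] - meanJ[i][j];
                meanI[i][j] += di / count;
                meanJ[i][j] += dj / count;
                numerator[i][j] += ((count - 1.0) / count) * di * dj;
            }
        }
    }

    /** Number of rows that contributed to pair (i, j). */
    long getN(int i, int j) {
        return n[Math.min(i, j)][Math.max(i, j)];
    }

    /** Bias-corrected covariance for pair (i, j); NaN with fewer than two rows. */
    double getCovariance(int i, int j) {
        int r = Math.min(i, j);
        int c = Math.max(i, j);
        return n[r][c] < 2 ? Double.NaN : numerator[r][c] / (n[r][c] - 1.0);
    }
}
```

Note that each entry can end up based on a different number of rows, so callers may want access to the per-pair counts as well as the covariances.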

> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>            Assignee: Phil Steitz
>             Fix For: 3.1
>
>         Attachments: MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Gilles (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211163#comment-13211163 ] 

Gilles commented on MATH-449:
-----------------------------

Thomas,

It is not necessary to resolve this issue before 3.0. Thus you could postpone to 3.1 if more discussion is needed to clarify the missing parts.

                
> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>            Assignee: Thomas Neidhart
>             Fix For: 3.0
>
>         Attachments: MATH-449.patch, MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Updated] (MATH-449) Storeless covariance

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phil Steitz updated MATH-449:
-----------------------------

    Fix Version/s:     (was: 3.0)
                   3.1

Pushing out to 3.1, awaiting patch

> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>             Fix For: 3.1
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Patrick Meyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049943#comment-13049943 ] 

Patrick Meyer commented on MATH-449:
------------------------------------

I've added the comment to the code. If you have better language for the comment, please send it to me and I will include it.

Do you have any suggestions for how to best integrate this code into the Covariance class? It's not so easy given that the class allows for computation of a covariance matrix.

> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>             Fix For: 3.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049956#comment-13049956 ] 

Phil Steitz commented on MATH-449:
----------------------------------

I am leaning toward just adding a new class called StorelessCovariance, and similarly a StorelessPearsonsCorrelation, that just provide bivariate statistics.  You will notice that in fact the PearsonsCorrelation class delegates to SimpleRegression, which does the bivariate computation "storelessly" already.

Thanks for adding the comment and for the contribution!
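
As a reference point for the bivariate case, here is a cleaned-up sketch of the update from the issue description. It uses primitive doubles, a long counter, and a constructor whose name matches the class. The class name BivariateCovarianceSketch and the NaN return for too few observations are choices made for this example, not necessarily what was committed.

```java
// Illustrative sketch only; based on the update shown in the issue description.
final class BivariateCovarianceSketch {
    private final boolean biasCorrected;
    private long n;
    private double meanX;
    private double meanY;
    private double covarianceNumerator;

    BivariateCovarianceSketch(boolean biasCorrected) {
        this.biasCorrected = biasCorrected;
    }

    /** Adds one (x, y) observation without storing it. */
    void increment(double x, double y) {
        n++;
        double deltaX = x - meanX;
        double deltaY = y - meanY;
        meanX += deltaX / n;
        meanY += deltaY / n;
        covarianceNumerator += ((n - 1.0) / n) * deltaX * deltaY;
    }

    long getN() {
        return n;
    }

    /** Returns the covariance of the data seen so far, or NaN if there is too little data. */
    double getResult() {
        if (biasCorrected) {
            return n < 2 ? Double.NaN : covarianceNumerator / (n - 1.0);
        }
        return n < 1 ? Double.NaN : covarianceNumerator / n;
    }
}
```

Feeding the pairs (1, 2), (2, 4), (3, 6), (4, 9) accumulates a numerator of 11.5, so the bias-corrected result is 11.5 / 3 and the uncorrected result is 11.5 / 4.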

> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>             Fix For: 3.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088996#comment-13088996 ] 

Phil Steitz commented on MATH-449:
----------------------------------

Good point on the stored data version.  This is really our first foray into meaningful management of missing data, and now is a great time to start dealing with it.  In the correlation package, at this point, we can fairly easily support either or both of casewise and pairwise "deletion", so it is probably best to make it configurable. Also, we need to agree on and advertise the fact that NaNs should be used to signal missing data.  Let's start by implementing things this way in the new storeless covariance classes and then open new tickets to add support for missing data, first in the rest of the correlation package and then in regression.

One thing that is bugging me a little is convincing myself that if we allow pairwise deletion, the covariance matrix will be legitimate (i.e., have all of the analytical properties associated with a cov matrix).  Also, are there negative implications of using NaNs to signal missing data that I have not thought about?
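
On the first concern, a small constructed example suggests the answer is "not always": with pairwise deletion each entry can be computed from a different subset of rows, and the result need not be positive semidefinite. The sketch below uses made-up toy data and hypothetical names (PairwiseDeletionPsdCheck, pairwiseCov, exampleData), with NaN standing in for missing values.

```java
// Illustrative sketch only; the data set is invented to make the effect visible.
final class PairwiseDeletionPsdCheck {

    /** Two-pass sample covariance over the rows where both columns are present. */
    static double pairwiseCov(double[][] data, int i, int j) {
        double sumI = 0.0;
        double sumJ = 0.0;
        int n = 0;
        for (double[] row : data) {
            if (!Double.isNaN(row[i]) && !Double.isNaN(row[j])) {
                sumI += row[i];
                sumJ += row[j];
                n++;
            }
        }
        if (n < 2) {
            return Double.NaN;
        }
        double meanI = sumI / n;
        double meanJ = sumJ / n;
        double s = 0.0;
        for (double[] row : data) {
            if (!Double.isNaN(row[i]) && !Double.isNaN(row[j])) {
                s += (row[i] - meanI) * (row[j] - meanJ);
            }
        }
        return s / (n - 1);
    }

    /** Six cases on three variables; each case is missing exactly one variable. */
    static double[][] exampleData() {
        double m = Double.NaN;
        return new double[][] {
            {1, 1, m}, {2, 2, m},
            {1, m, 1}, {2, m, 2},
            {m, 1, 2}, {m, 2, 1}
        };
    }

    /** Leading 2x2 principal minor; it must be non-negative for a valid covariance matrix. */
    static double leadingMinor(double[][] data) {
        double varX = pairwiseCov(data, 0, 0);
        double varY = pairwiseCov(data, 1, 1);
        double covXY = pairwiseCov(data, 0, 1);
        return varX * varY - covXY * covXY;
    }
}
```

With this data both variances come out as 1/3 (four rows each) while the covariance of the first two variables is 1/2 (two rows), so the minor is 1/9 - 1/4 < 0 and the implied correlation would be 1.5. Listwise deletion avoids this because every entry uses the same rows, although here it would leave no complete rows at all.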

> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>            Assignee: Phil Steitz
>             Fix For: 3.1
>
>         Attachments: MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>

        

[jira] [Updated] (MATH-449) Storeless covariance

Posted by "Patrick Meyer (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Meyer updated MATH-449:
-------------------------------

    Attachment: MATH-449.patch

This patch adds comments to the latest version of the StorelessCovariance and BivariateStorelessCovariance classes.
                
> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>             Fix For: 3.0
>
>         Attachments: MATH-449.patch, MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently there is no storeless version for computing the covariance. However, Pebay (2008) describes algorithms for on-line covariance computations, [http://infoserve.sandia.gov/sand_doc/2008/086212.pdf]. I have provided a simple class for implementing this algorithm. It would be nice to have this integrated into org.apache.commons.math.stat.correlation.Covariance.
> {code}
> //This code is granted for inclusion in the Apache Commons under the terms of the ASL.
> public class StorelessCovariance{
>     private double deltaX = 0.0;
>     private double deltaY = 0.0;
>     private double meanX = 0.0;
>     private double meanY = 0.0;
>     private double N=0;
>     private Double covarianceNumerator=0.0;
>     private boolean unbiased=true;
>     public Covariance(boolean unbiased){
> 	this.unbiased = unbiased;
>     }
>     public void increment(Double x, Double y){
>         if(x!=null & y!=null){
>             N++;
>             deltaX = x - meanX;
>             deltaY = y - meanY;
>             meanX += deltaX/N;
>             meanY += deltaY/N;
>             covarianceNumerator += ((N-1.0)/N)*deltaX*deltaY;
>         }
>         
>     }
>     public Double getResult(){
>         if(unbiased){
>             return covarianceNumerator/(N-1.0);
>         }else{
>             return covarianceNumerator/N;
>         }
>     }   
> }
> {code}


[jira] Updated: (MATH-449) Storeless covariance

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phil Steitz updated MATH-449:
-----------------------------

    Fix Version/s: 3.0

> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>             Fix For: 3.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h


[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088421#comment-13088421 ] 

Phil Steitz commented on MATH-449:
----------------------------------

Code has been committed in r1160026 with the following additional changes (beyond refactoring above):

1. Eliminated use of deprecated MathRuntimeException (I know, this usage was copied from Covariance, which also needs to be fixed.  I will do that.)
2. Changed N to n as field name and other minor formatting
3. Changed error message to INSUFFICIENT_OBSERVED_POINTS_IN_SAMPLE from INSUFFICIENT_DIMENSIONS for insufficient data in the bivariate case.
4. Eliminated the try-catch blocks in getData, getCovarianceMatrix in StorelessCovariance (renamed).  These both advertise and throw IllegalArgumentException and I saw no reason to wrap what was being caught and rethrown.
5. Renamed getEntry, setEntry, incrementEntry to xXCovariance in StorelessCovariance.

Some more notes: 

In making StorelessCovariance extend Covariance, I had to override getN to throw UnsupportedOperationException, since there is no global N defined in the storeless implementation. I guess alternatively, we could return the min among the bivariate covariances.  More on this below.

I did not improve or merge the tests, which we should also eventually do - i.e., make StorelessCovarianceTest extend CovarianceTest, refactoring the base class tests so they can be applied to the subclass.  To do this, we need to feed the data incrementally to the storeless version, separating the data provisioning from validation in the tests.
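
To illustrate the kind of separation meant here, a rough sketch in plain Java (no JUnit, no Commons Math). All names are hypothetical and this is not the committed test code: validate() holds the shared checks, while batch() and incremental() differ only in how the data is fed.

```java
/** Hypothetical sketch: one validation routine shared by batch and incremental providers. */
class CovarianceTestSketch {

    /** Batch provisioning: two-pass, bias-corrected covariance over stored arrays. */
    static double batch(double[] x, double[] y) {
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= x.length;
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / (x.length - 1);
    }

    /** Incremental provisioning: one (x, y) pair at a time, using the update from the issue description. */
    static double incremental(double[] x, double[] y) {
        double meanX = 0.0;
        double meanY = 0.0;
        double numerator = 0.0;
        for (int i = 0; i < x.length; i++) {
            double n = i + 1;
            double deltaX = x[i] - meanX;
            double deltaY = y[i] - meanY;
            meanX += deltaX / n;
            meanY += deltaY / n;
            numerator += ((n - 1.0) / n) * deltaX * deltaY;
        }
        return numerator / (x.length - 1);
    }

    /** Validation only: knows the expected value, not how the data was fed. */
    static void validate(java.util.function.ToDoubleBiFunction<double[], double[]> impl) {
        double[] x = {1, 2, 3, 4};
        double[] y = {2, 4, 6, 9};
        double expected = 11.5 / 3.0; // sum of cross-deviations is 11.5, divided by n - 1
        double actual = impl.applyAsDouble(x, y);
        if (Math.abs(actual - expected) > 1e-12) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }
}
```

Both providers then go through the same check, e.g. validate(CovarianceTestSketch::batch) and validate(CovarianceTestSketch::incremental).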

Sorry I did not notice this before, but we need to do something about the potential lack of integrity of the (virtual) covariance matrix exposed.  Currently, incrementing (i,j) does nothing to (j,i), so the virtual matrix is not even guaranteed to be symmetric.  I thought about dispatching calls to ensure this, but given all of the constraints involved in ensuring that what we expose is a valid covariance matrix, I am leaning toward recommending that we not allow granular, pairwise incrementing or individual bivariate covariance setters, and instead keep only the increment method that requires a full vector of new values.  This will guarantee data integrity (and, as a bonus, restore the meaningfulness of getN).  What do you think about this?  If we don't do it, we need to at least ensure that the matrix is symmetric, and we should probably remove the setEntry method.  My recommendation is that we:

a) Drop setEntry and incrementCovariance.  Rename incrementRow to increment and make that the only mutator.
b) Replace colDimension and rowDimension with just dimension, forcing the matrix to be square.
c) Store only upper triangular BivariateCovariances.  Add a transpose method to StorelessBivariateCovariance so getEntry returns something that can be further incremented properly.
d) Add symmetry tests.
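
A rough sketch of what (a) through (c) could look like, for discussion only. The class and method names below are hypothetical and this is not the committed code. It keeps a single square dimension, stores only the upper triangle in a packed array, and exposes one mutator that takes a full row, so symmetry holds by construction and getN is well defined. Because the sketch returns plain doubles rather than bivariate accumulators, it does not need the transpose method mentioned in (c).

```java
/** Hypothetical sketch; not the committed Commons Math classes. */
class UpperTriangularCovarianceSketch {

    /** Running state for one (i, j) pair, using the update from the issue description. */
    static final class Pair {
        private double meanX;
        private double meanY;
        private double numerator;
        private long n;

        void increment(double x, double y) {
            n++;
            double deltaX = x - meanX;
            double deltaY = y - meanY;
            meanX += deltaX / n;
            meanY += deltaY / n;
            numerator += ((n - 1.0) / n) * deltaX * deltaY;
        }

        double getResult() {
            return numerator / (n - 1.0); // bias-corrected
        }
    }

    private final int dimension;
    private final Pair[] pairs; // packed upper triangle, row by row
    private long n;

    UpperTriangularCovarianceSketch(int dimension) {
        this.dimension = dimension;
        this.pairs = new Pair[dimension * (dimension + 1) / 2];
        for (int k = 0; k < pairs.length; k++) {
            pairs[k] = new Pair();
        }
    }

    /** Maps (i, j) and (j, i) to the same packed slot. */
    private int index(int i, int j) {
        int r = Math.min(i, j);
        int c = Math.max(i, j);
        return r * dimension - r * (r - 1) / 2 + (c - r);
    }

    /** The only mutator: every pair sees every observation. */
    void increment(double[] row) {
        if (row.length != dimension) {
            throw new IllegalArgumentException("expected " + dimension + " values");
        }
        for (int i = 0; i < dimension; i++) {
            for (int j = i; j < dimension; j++) {
                pairs[index(i, j)].increment(row[i], row[j]);
            }
        }
        n++;
    }

    long getN() {
        return n;
    }

    double getCovariance(int i, int j) {
        if (n < 2) {
            throw new IllegalStateException("need at least two observations");
        }
        return pairs[index(i, j)].getResult();
    }
}
```

With this layout, a symmetry test for (d) reduces to checking that getCovariance(i, j) and getCovariance(j, i) agree after any sequence of increment calls.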




> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>            Assignee: Phil Steitz
>             Fix For: 3.1
>
>         Attachments: MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
        

[jira] [Commented] (MATH-449) Storeless covariance

Posted by "Phil Steitz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088398#comment-13088398 ] 

Phil Steitz commented on MATH-449:
----------------------------------

I will commit the new classes, refactored per last comments, and then you can submit a new patch against the changed sources including the missing javadoc.  Just remember to svn update before you add the javadoc.  Yes, svn diff picks up every change you make to your local checkout.

> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>            Assignee: Phil Steitz
>             Fix For: 3.1
>
>         Attachments: MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h