Posted to commits@commons.apache.org by ps...@apache.org on 2010/09/20 03:57:03 UTC
svn commit: r998761 - in /commons/proper/math:
branches/MATH_2_X/src/site/xdoc/userguide/stat.xml
trunk/src/site/xdoc/userguide/stat.xml
Author: psteitz
Date: Mon Sep 20 01:57:03 2010
New Revision: 998761
URL: http://svn.apache.org/viewvc?rev=998761&view=rev
Log:
Fixed errors in multiple regression section. JIRA: MATH-407.
Modified:
commons/proper/math/branches/MATH_2_X/src/site/xdoc/userguide/stat.xml
commons/proper/math/trunk/src/site/xdoc/userguide/stat.xml
Modified: commons/proper/math/branches/MATH_2_X/src/site/xdoc/userguide/stat.xml
URL: http://svn.apache.org/viewvc/commons/proper/math/branches/MATH_2_X/src/site/xdoc/userguide/stat.xml?rev=998761&r1=998760&r2=998761&view=diff
==============================================================================
--- commons/proper/math/branches/MATH_2_X/src/site/xdoc/userguide/stat.xml (original)
+++ commons/proper/math/branches/MATH_2_X/src/site/xdoc/userguide/stat.xml Mon Sep 20 01:57:03 2010
@@ -473,37 +473,47 @@ System.out.println(regression.getSlopeSt
</subsection>
<subsection name="1.5 Multiple linear regression">
<p>
- <a href="../apidocs/org/apache/commons/math/stat/regression/MultipleLinearRegression.html">
- MultipleLinearRegression</a> provides ordinary least squares regression
- with a generic multiple variable linear model, which in matrix notation
- can be expressed as:
+ <a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html">
+ OLSMultipleLinearRegression</a> and
+ <a href="../apidocs/org/apache/commons/math/stat/regression/GLSMultipleLinearRegression.html">
+ GLSMultipleLinearRegression</a> provide least squares regression to fit the linear model:
</p>
<p>
- <code> y=X*b+u </code>
+ <code> Y=X*b+u </code>
</p>
<p>
- where y is an <code>n-vector</code> <b>regressand</b>, X is a <code>[n,k]</code> matrix whose <code>k</code> columns are called
- <b>regressors</b>, b is <code>k-vector</code> of <b>regression parameters</b> and <code>u</code> is an <code>n-vector</code>
- of <b>error terms</b> or <b>residuals</b>. The notation is quite standard in literature,
- cf eg <a href="http://www.econ.queensu.ca/ETM">Davidson and MacKinnon, Econometrics Theory and Methods, 2004</a>.
+ where Y is an n-vector <b>regressand</b>, X is a [n,k] matrix whose k columns are called
+ <b>regressors</b>, b is a k-vector of <b>regression parameters</b> and u is an n-vector
+ of <b>error terms</b> or <b>residuals</b>.
</p>
<p>
- Two implementations are provided: <a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html">
- OLSMultipleLinearRegression</a> and
+ <a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html">
+ OLSMultipleLinearRegression</a> provides Ordinary Least Squares Regression, and
<a href="../apidocs/org/apache/commons/math/stat/regression/GLSMultipleLinearRegression.html">
- GLSMultipleLinearRegression</a>
+ GLSMultipleLinearRegression</a> implements Generalized Least Squares. See the javadoc for these
+ classes for details on the algorithms and formulas used.
</p>
<p>
- Observations (x,y and covariance data matrices) can be added to the model via the <code>addData(double[] y, double[][] x, double[][] covariance)</code> method.
- The observations are stored in memory until the next time the addData method is invoked.
+ Data for OLS models can be loaded in a single double[] array, consisting of concatenated rows of data, each containing
+ the regressand (Y) value, followed by regressor values; or using a double[][] array with rows corresponding to
+ observations. GLS models also require a double[][] array representing the covariance matrix of the error terms. See
+ <a href="../apidocs/org/apache/commons/math/stat/regression/AbstractMultipleLinearRegression.html#newSampleData(double[], int, int)">
+ AbstractMultipleLinearRegression#newSampleData(double[],int,int)</a>,
+ <a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html#newSampleData(double[], double[][])">
+ OLSMultipleLinearRegression#newSampleData(double[], double[][])</a> and
+ <a href="../apidocs/org/apache/commons/math/stat/regression/GLSMultipleLinearRegression.html#newSampleData(double[], double[][], double[][])">
+ GLSMultipleLinearRegression#newSampleData(double[],double[][],double[][])</a> for details.
</p>
<p>
<strong>Usage Notes</strong>: <ul>
- <li> Data is validated when invoking the <code>addData(double[] y, double[][] x, double[][] covariance)</code> method and
- <code>IllegalArgumentException</code> is thrown when inappropriate.
+ <li> Data are validated when invoking any of the newSample, newX, newY or newCovariance methods and
+ <code>IllegalArgumentException</code> is thrown when input data arrays do not have matching dimensions
+ or do not contain sufficient data to estimate the model.
</li>
- <li> Only the GLS regressions require the covariance matrix, so in the OLS regression it is ignored and can be safely
- inputted as <code>null</code>.</li>
+ <li> By default, regression models are estimated with intercept terms. In the notation above, this implies that the
+ X matrix contains an initial column identically equal to 1. X data supplied to the newX or newSample methods should not
+ include this column - the data loading methods will create it automatically. To estimate a model without an intercept
+ term, set the <code>noIntercept</code> property to <code>true</code>.</li>
</ul>
</p>
<p>
@@ -511,44 +521,48 @@ System.out.println(regression.getSlopeSt
<dl>
<dt>OLS regression</dt>
<br></br>
- <dd>Instantiate an OLS regression object and load dataset
+ <dd>Instantiate an OLS regression object and load a dataset:
<source>
-MultipleLinearRegression regression = new OLSMultipleLinearRegression();
+OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
double[] y = new double[]{11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
 double[][] x = new double[6][];
-x[0] = new double[]{1.0, 0, 0, 0, 0, 0};
-x[1] = new double[]{1.0, 2.0, 0, 0, 0, 0};
-x[2] = new double[]{1.0, 0, 3.0, 0, 0, 0};
-x[3] = new double[]{1.0, 0, 0, 4.0, 0, 0};
-x[4] = new double[]{1.0, 0, 0, 0, 5.0, 0};
-x[5] = new double[]{1.0, 0, 0, 0, 0, 6.0};
-regression.addData(y, x, null); // we don't need covariance
+x[0] = new double[]{0, 0, 0, 0, 0};
+x[1] = new double[]{2.0, 0, 0, 0, 0};
+x[2] = new double[]{0, 3.0, 0, 0, 0};
+x[3] = new double[]{0, 0, 4.0, 0, 0};
+x[4] = new double[]{0, 0, 0, 5.0, 0};
+x[5] = new double[]{0, 0, 0, 0, 6.0};
+regression.newSampleData(y, x);
</source>
</dd>
- <dd>Estimate of regression values honours the <code>MultipleLinearRegression</code> interface:
+ <dd>Get regression parameters and diagnostics:
<source>
-double[] beta = regression.estimateRegressionParameters();
+double[] beta = regression.estimateRegressionParameters();
double[] residuals = regression.estimateResiduals();
double[][] parametersVariance = regression.estimateRegressionParametersVariance();
double regressandVariance = regression.estimateRegressandVariance();
+
+double rSquared = regression.calculateRSquared();
+
+double sigma = regression.estimateRegressionStandardError();
</source>
</dd>
<dt>GLS regression</dt>
<br></br>
- <dd>Instantiate an GLS regression object and load dataset
+ <dd>Instantiate a GLS regression object and load a dataset:
<source>
-MultipleLinearRegression regression = new GLSMultipleLinearRegression();
+GLSMultipleLinearRegression regression = new GLSMultipleLinearRegression();
double[] y = new double[]{11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
 double[][] x = new double[6][];
-x[0] = new double[]{1.0, 0, 0, 0, 0, 0};
-x[1] = new double[]{1.0, 2.0, 0, 0, 0, 0};
-x[2] = new double[]{1.0, 0, 3.0, 0, 0, 0};
-x[3] = new double[]{1.0, 0, 0, 4.0, 0, 0};
-x[4] = new double[]{1.0, 0, 0, 0, 5.0, 0};
-x[5] = new double[]{1.0, 0, 0, 0, 0, 6.0};
+x[0] = new double[]{0, 0, 0, 0, 0};
+x[1] = new double[]{2.0, 0, 0, 0, 0};
+x[2] = new double[]{0, 3.0, 0, 0, 0};
+x[3] = new double[]{0, 0, 4.0, 0, 0};
+x[4] = new double[]{0, 0, 0, 5.0, 0};
+x[5] = new double[]{0, 0, 0, 0, 6.0};
double[][] omega = new double[6][];
omega[0] = new double[]{1.1, 0, 0, 0, 0, 0};
omega[1] = new double[]{0, 2.2, 0, 0, 0, 0};
@@ -556,12 +570,9 @@ omega[2] = new double[]{0, 0, 3.3, 0, 0,
omega[3] = new double[]{0, 0, 0, 4.4, 0, 0};
omega[4] = new double[]{0, 0, 0, 0, 5.5, 0};
omega[5] = new double[]{0, 0, 0, 0, 0, 6.6};
-regression.addData(y, x, omega); // we do need covariance
+regression.newSampleData(y, x, omega);
</source>
</dd>
- <dd>Estimate of regression values honours the same <code>MultipleLinearRegression</code> interface as
- the OLS regression.
- </dd>
</dl>
</p>
</subsection>
Modified: commons/proper/math/trunk/src/site/xdoc/userguide/stat.xml
URL: http://svn.apache.org/viewvc/commons/proper/math/trunk/src/site/xdoc/userguide/stat.xml?rev=998761&r1=998760&r2=998761&view=diff
==============================================================================
--- commons/proper/math/trunk/src/site/xdoc/userguide/stat.xml (original)
+++ commons/proper/math/trunk/src/site/xdoc/userguide/stat.xml Mon Sep 20 01:57:03 2010
@@ -473,37 +473,47 @@ System.out.println(regression.getSlopeSt
</subsection>
<subsection name="1.5 Multiple linear regression">
<p>
- <a href="../apidocs/org/apache/commons/math/stat/regression/MultipleLinearRegression.html">
- MultipleLinearRegression</a> provides ordinary least squares regression
- with a generic multiple variable linear model, which in matrix notation
- can be expressed as:
+ <a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html">
+ OLSMultipleLinearRegression</a> and
+ <a href="../apidocs/org/apache/commons/math/stat/regression/GLSMultipleLinearRegression.html">
+ GLSMultipleLinearRegression</a> provide least squares regression to fit the linear model:
</p>
<p>
- <code> y=X*b+u </code>
+ <code> Y=X*b+u </code>
</p>
<p>
- where y is an <code>n-vector</code> <b>regressand</b>, X is a <code>[n,k]</code> matrix whose <code>k</code> columns are called
- <b>regressors</b>, b is <code>k-vector</code> of <b>regression parameters</b> and <code>u</code> is an <code>n-vector</code>
- of <b>error terms</b> or <b>residuals</b>. The notation is quite standard in literature,
- cf eg <a href="http://www.econ.queensu.ca/ETM">Davidson and MacKinnon, Econometrics Theory and Methods, 2004</a>.
+ where Y is an n-vector <b>regressand</b>, X is a [n,k] matrix whose k columns are called
+ <b>regressors</b>, b is a k-vector of <b>regression parameters</b> and u is an n-vector
+ of <b>error terms</b> or <b>residuals</b>.
</p>
<p>
- Two implementations are provided: <a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html">
- OLSMultipleLinearRegression</a> and
+ <a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html">
+ OLSMultipleLinearRegression</a> provides Ordinary Least Squares Regression, and
<a href="../apidocs/org/apache/commons/math/stat/regression/GLSMultipleLinearRegression.html">
- GLSMultipleLinearRegression</a>
+ GLSMultipleLinearRegression</a> implements Generalized Least Squares. See the javadoc for these
+ classes for details on the algorithms and formulas used.
</p>
<p>
- Observations (x,y and covariance data matrices) can be added to the model via the <code>addData(double[] y, double[][] x, double[][] covariance)</code> method.
- The observations are stored in memory until the next time the addData method is invoked.
+ Data for OLS models can be loaded in a single double[] array, consisting of concatenated rows of data, each containing
+ the regressand (Y) value, followed by regressor values; or using a double[][] array with rows corresponding to
+ observations. GLS models also require a double[][] array representing the covariance matrix of the error terms. See
+ <a href="../apidocs/org/apache/commons/math/stat/regression/AbstractMultipleLinearRegression.html#newSampleData(double[], int, int)">
+ AbstractMultipleLinearRegression#newSampleData(double[],int,int)</a>,
+ <a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html#newSampleData(double[], double[][])">
+ OLSMultipleLinearRegression#newSampleData(double[], double[][])</a> and
+ <a href="../apidocs/org/apache/commons/math/stat/regression/GLSMultipleLinearRegression.html#newSampleData(double[], double[][], double[][])">
+ GLSMultipleLinearRegression#newSampleData(double[],double[][],double[][])</a> for details.
</p>
<p>
<strong>Usage Notes</strong>: <ul>
- <li> Data is validated when invoking the <code>addData(double[] y, double[][] x, double[][] covariance)</code> method and
- <code>IllegalArgumentException</code> is thrown when inappropriate.
+ <li> Data are validated when invoking any of the newSample, newX, newY or newCovariance methods and
+ <code>IllegalArgumentException</code> is thrown when input data arrays do not have matching dimensions
+ or do not contain sufficient data to estimate the model.
</li>
- <li> Only the GLS regressions require the covariance matrix, so in the OLS regression it is ignored and can be safely
- inputted as <code>null</code>.</li>
+ <li> By default, regression models are estimated with intercept terms. In the notation above, this implies that the
+ X matrix contains an initial column identically equal to 1. X data supplied to the newX or newSample methods should not
+ include this column - the data loading methods will create it automatically. To estimate a model without an intercept
+ term, set the <code>noIntercept</code> property to <code>true</code>.</li>
</ul>
</p>
<p>
@@ -511,44 +521,48 @@ System.out.println(regression.getSlopeSt
<dl>
<dt>OLS regression</dt>
<br></br>
- <dd>Instantiate an OLS regression object and load dataset
+ <dd>Instantiate an OLS regression object and load a dataset:
<source>
-MultipleLinearRegression regression = new OLSMultipleLinearRegression();
+OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
double[] y = new double[]{11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
 double[][] x = new double[6][];
-x[0] = new double[]{1.0, 0, 0, 0, 0, 0};
-x[1] = new double[]{1.0, 2.0, 0, 0, 0, 0};
-x[2] = new double[]{1.0, 0, 3.0, 0, 0, 0};
-x[3] = new double[]{1.0, 0, 0, 4.0, 0, 0};
-x[4] = new double[]{1.0, 0, 0, 0, 5.0, 0};
-x[5] = new double[]{1.0, 0, 0, 0, 0, 6.0};
-regression.addData(y, x, null); // we don't need covariance
+x[0] = new double[]{0, 0, 0, 0, 0};
+x[1] = new double[]{2.0, 0, 0, 0, 0};
+x[2] = new double[]{0, 3.0, 0, 0, 0};
+x[3] = new double[]{0, 0, 4.0, 0, 0};
+x[4] = new double[]{0, 0, 0, 5.0, 0};
+x[5] = new double[]{0, 0, 0, 0, 6.0};
+regression.newSampleData(y, x);
</source>
</dd>
- <dd>Estimate of regression values honours the <code>MultipleLinearRegression</code> interface:
+ <dd>Get regression parameters and diagnostics:
<source>
-double[] beta = regression.estimateRegressionParameters();
+double[] beta = regression.estimateRegressionParameters();
double[] residuals = regression.estimateResiduals();
double[][] parametersVariance = regression.estimateRegressionParametersVariance();
double regressandVariance = regression.estimateRegressandVariance();
+
+double rSquared = regression.calculateRSquared();
+
+double sigma = regression.estimateRegressionStandardError();
</source>
</dd>
<dt>GLS regression</dt>
<br></br>
- <dd>Instantiate an GLS regression object and load dataset
+ <dd>Instantiate a GLS regression object and load a dataset:
<source>
-MultipleLinearRegression regression = new GLSMultipleLinearRegression();
+GLSMultipleLinearRegression regression = new GLSMultipleLinearRegression();
double[] y = new double[]{11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
 double[][] x = new double[6][];
-x[0] = new double[]{1.0, 0, 0, 0, 0, 0};
-x[1] = new double[]{1.0, 2.0, 0, 0, 0, 0};
-x[2] = new double[]{1.0, 0, 3.0, 0, 0, 0};
-x[3] = new double[]{1.0, 0, 0, 4.0, 0, 0};
-x[4] = new double[]{1.0, 0, 0, 0, 5.0, 0};
-x[5] = new double[]{1.0, 0, 0, 0, 0, 6.0};
+x[0] = new double[]{0, 0, 0, 0, 0};
+x[1] = new double[]{2.0, 0, 0, 0, 0};
+x[2] = new double[]{0, 3.0, 0, 0, 0};
+x[3] = new double[]{0, 0, 4.0, 0, 0};
+x[4] = new double[]{0, 0, 0, 5.0, 0};
+x[5] = new double[]{0, 0, 0, 0, 6.0};
double[][] omega = new double[6][];
omega[0] = new double[]{1.1, 0, 0, 0, 0, 0};
omega[1] = new double[]{0, 2.2, 0, 0, 0, 0};
@@ -556,12 +570,9 @@ omega[2] = new double[]{0, 0, 3.3, 0, 0,
omega[3] = new double[]{0, 0, 0, 4.4, 0, 0};
omega[4] = new double[]{0, 0, 0, 0, 5.5, 0};
omega[5] = new double[]{0, 0, 0, 0, 0, 6.6};
-regression.addData(y, x, omega); // we do need covariance
+regression.newSampleData(y, x, omega);
</source>
</dd>
- <dd>Estimate of regression values honours the same <code>MultipleLinearRegression</code> interface as
- the OLS regression.
- </dd>
</dl>
</p>
</subsection>
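
Note on the data layout described in the revised text: the single double[] form holds concatenated rows, each with the regressand (Y) value first and the regressor values after it, and the loading methods add the intercept column themselves. The following is a minimal sketch of that unpacking in plain Java, assuming nothing beyond the description above. The class FlatSampleLayout and its methods extractY and extractX are hypothetical names used for illustration only; this is not the Commons Math implementation.

```java
import java.util.Arrays;

/**
 * Illustrative sketch only: unpacks a flat sample array laid out as
 * [y0, x00, x01, ..., y1, x10, x11, ...] into a Y vector and an X matrix.
 * All names here are hypothetical and not part of Commons Math.
 */
public class FlatSampleLayout {

    /** Returns the regressand values: the first entry of each row of (nvars + 1) values. */
    public static double[] extractY(double[] data, int nobs, int nvars) {
        checkLength(data, nobs, nvars);
        double[] y = new double[nobs];
        for (int i = 0; i < nobs; i++) {
            y[i] = data[i * (nvars + 1)];
        }
        return y;
    }

    /**
     * Returns the design matrix. Unless noIntercept is true, a leading
     * column of ones is prepended, so callers never supply it themselves.
     */
    public static double[][] extractX(double[] data, int nobs, int nvars, boolean noIntercept) {
        checkLength(data, nobs, nvars);
        int offset = noIntercept ? 0 : 1;
        double[][] x = new double[nobs][nvars + offset];
        for (int i = 0; i < nobs; i++) {
            if (!noIntercept) {
                x[i][0] = 1.0; // intercept column
            }
            for (int j = 0; j < nvars; j++) {
                x[i][j + offset] = data[i * (nvars + 1) + 1 + j];
            }
        }
        return x;
    }

    /** Rejects arrays whose length does not match nobs rows of (nvars + 1) values. */
    private static void checkLength(double[] data, int nobs, int nvars) {
        if (data.length != nobs * (nvars + 1)) {
            throw new IllegalArgumentException(
                "expected " + (nobs * (nvars + 1)) + " values, got " + data.length);
        }
    }

    public static void main(String[] args) {
        // Three observations, two regressors each: y first, then the x values.
        double[] data = {11.0, 0.0, 0.0, 12.0, 2.0, 0.0, 13.0, 0.0, 3.0};
        System.out.println(Arrays.toString(extractY(data, 3, 2)));
        System.out.println(Arrays.deepToString(extractX(data, 3, 2, false)));
    }
}
```

Passing true for noIntercept in this sketch leaves X with exactly the supplied regressor columns, which mirrors the effect the text attributes to the noIntercept property.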