Posted to dev@commons.apache.org by ps...@apache.org on 2004/04/25 21:16:14 UTC

cvs commit: jakarta-commons/math/xdocs/userguide stat.xml

psteitz     2004/04/25 12:16:13

  Modified:    math/xdocs/userguide stat.xml
  Log:
  Added BivariateRegression section.
  
  Revision  Changes    Path
  1.14      +101 -3    jakarta-commons/math/xdocs/userguide/stat.xml
  
  Index: stat.xml
  ===================================================================
  RCS file: /home/cvs/jakarta-commons/math/xdocs/userguide/stat.xml,v
  retrieving revision 1.13
  retrieving revision 1.14
  diff -u -r1.13 -r1.14
  --- stat.xml	21 Mar 2004 20:32:50 -0000	1.13
  +++ stat.xml	25 Apr 2004 19:16:13 -0000	1.14
  @@ -240,8 +240,106 @@
         </p>                  
         </subsection>
         <subsection name="1.4 Bivariate regression" href="regression">
  -        <p>This is yet to be written. Any contributions will be gratefully
  -          accepted!</p>
  +        <p>
  +         <a href="../apidocs/org/apache/commons/math/stat/multivariate/BivariateRegression.html">
  +          org.apache.commons.math.stat.multivariate.BivariateRegression</a>
  +          provides ordinary least squares regression with one independent variable, estimating
  +          the linear model:
  +         </p>
  +         <p>
  +           <code> y = intercept + slope * x  </code>
  +         </p>
  +         <p>
  +           Standard errors for <code>intercept</code> and <code>slope</code> are 
  +           available as well as ANOVA, r-square and Pearson's r statistics.
  +         </p>
  +         <p>
  +           Observations (x,y pairs) can be added to the model one at a time or they 
  +           can be provided in a 2-dimensional array.  The observations are not stored
  +           in memory, so there is no limit to the number of observations that can be
  +           added to the model. 
  +         </p>
  +         <p>
  +           <strong>Usage Notes</strong>: <ul>
  +           <li> When there are fewer than two observations in the model, or when
  +            there is no variation in the x values (i.e. all x values are the same) 
  +            all statistics return <code>NaN</code>.  At least two observations with
  +            different x coordinates are required to estimate a bivariate regression 
  +            model.</li>
  +           <li> Getters for the statistics always compute values based on the current
  +           set of observations -- i.e., you can get statistics, then add more data
  +           and get updated statistics without using a new instance.  There is no 
  +           "compute" method that updates all statistics.  Each of the getters performs
  +           the necessary computations to return the requested statistic.</li>
  +          </ul>
  +        </p>
  +        <p>
  +           <strong>Implementation Notes</strong>: <ul>
  +           <li> As observations are added to the model, the sum of x values, y values,
  +           cross products (x times y), and squared deviations of x and y from their 
  +           respective means are updated using updating formulas defined in 
  +           "Algorithms for Computing the Sample Variance: Analysis and
  +           Recommendations", Chan, T.F., Golub, G.H., and LeVeque, R.J. 
  +           1983, American Statistician, vol. 37, pp. 242-247, referenced in
  +           Weisberg, S. "Applied Linear Regression". 2nd Ed. 1985.  All regression
  +           statistics are computed from these sums.</li>
  +           <li> Inference statistics (confidence intervals, parameter significance levels)
  +           are based on the assumption that the observations included in the model are 
  +           drawn from a <a href="http://mathworld.wolfram.com/BivariateNormalDistribution.html">
  +           Bivariate Normal Distribution</a>.</li>
  +          </ul>
  +        </p>
  +        <p>
  +        Here are some examples.
  +        <dl>
  +          <dt>Estimate a model based on observations added one at a time</dt>
  +          <br></br>
  +          <dd>Instantiate a regression instance and add data points
  +          <source>
  + BivariateRegression regression = new BivariateRegression();
  + regression.addData(1d, 2d);
  + // At this point, with only one observation, all regression statistics will return NaN
  + regression.addData(3d, 3d);
  + // With only two observations, slope and intercept can be computed
  + // but inference statistics will return NaN
  + regression.addData(3d, 3d);
  + // Now all statistics are defined.
  +         </source>
  +         </dd>
  +         <dd>Compute some statistics based on observations added so far
  +         <source>
  +System.out.println(regression.getIntercept());   // displays intercept of regression line
  +System.out.println(regression.getSlope());       // displays slope of regression line
  +System.out.println(regression.getSlopeStdErr()); // displays slope standard error
  +         </source>
  +         </dd>
  +         <dd>Use the regression model to predict the y value for a new x value
  +         <source>
  +System.out.println(regression.predict(1.5d));    // displays predicted y value for x = 1.5
  +         </source>
  +         More data points can be added and subsequent getXXX calls will incorporate
  +         additional data in statistics.
  +         </dd>
  +         <dt>Estimate a model from a double[][] array of data points</dt>
  +          <br></br>
  +          <dd>Instantiate a regression object and load dataset
  +          <source>
  +          double[][] data = { { 1, 3 }, {2, 5 }, {3, 7 }, {4, 14 }, {5, 11 }};
  +          BivariateRegression regression = new BivariateRegression();
  +          regression.addData(data);
  +          </source>
  +          </dd>
  +          <dd>Estimate regression model based on data
  +         <source>
  +System.out.println(regression.getIntercept());   // displays intercept of regression line
  +System.out.println(regression.getSlope());       // displays slope of regression line
  +System.out.println(regression.getSlopeStdErr()); // displays slope standard error
  +         </source>
  +         More data points -- even another double[][] array -- can be added and subsequent 
  +         getXXX calls will incorporate additional data in statistics.
  +         </dd>
  +         </dl>
  +        </p>
         </subsection>
         <subsection name="1.5 Statistical tests" href="tests">
           <p>This is yet to be written. Any contributions will be gratefully
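
Note on the Implementation Notes in the new section: the updating approach
described there (running sums of squared deviations and cross products, with
no observations kept in memory) can be sketched in plain Java roughly as
follows. This is only an illustrative sketch, not the BivariateRegression
source. The class name StreamingRegressionSketch and its field names are
hypothetical, and only slope, intercept and predict are covered; the NaN
behaviour follows the Usage Notes.

```java
/**
 * Hypothetical sketch of one-pass simple regression using updating formulas.
 * Not the BivariateRegression implementation; names are illustrative only.
 */
final class StreamingRegressionSketch {
    private long n = 0;        // number of observations seen so far
    private double xBar = 0;   // running mean of x
    private double yBar = 0;   // running mean of y
    private double sumXX = 0;  // sum of squared deviations of x from its mean
    private double sumXY = 0;  // sum of cross products of x and y deviations

    /** Adds one (x, y) observation without storing it. */
    void addData(double x, double y) {
        n++;
        double dx = x - xBar;      // deviation from the previous mean of x
        double dy = y - yBar;      // deviation from the previous mean of y
        xBar += dx / n;
        yBar += dy / n;
        sumXX += dx * (x - xBar);  // uses old and new mean of x
        sumXY += dx * (y - yBar);  // old-mean x deviation times new-mean y deviation
    }

    /** Returns NaN with fewer than two observations or no variation in x. */
    double getSlope() {
        if (n < 2 || sumXX == 0.0) {
            return Double.NaN;
        }
        return sumXY / sumXX;
    }

    double getIntercept() {
        double slope = getSlope();
        return Double.isNaN(slope) ? Double.NaN : yBar - slope * xBar;
    }

    double predict(double x) {
        return getIntercept() + getSlope() * x;
    }

    public static void main(String[] args) {
        // Same data set as the double[][] example in the commit
        double[][] data = { { 1, 3 }, { 2, 5 }, { 3, 7 }, { 4, 14 }, { 5, 11 } };
        StreamingRegressionSketch regression = new StreamingRegressionSketch();
        for (double[] point : data) {
            regression.addData(point[0], point[1]);
        }
        System.out.println(regression.getIntercept());
        System.out.println(regression.getSlope());
        System.out.println(regression.predict(1.5d));
    }
}
```

Because each getter works only from the running sums, more observations can
be added at any time and the next call reflects them, which matches the
behaviour the Usage Notes describe for the getters.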
  
  
  
