Posted to commits@commons.apache.org by tn...@apache.org on 2013/11/02 22:02:13 UTC

svn commit: r1538282 - /commons/proper/math/trunk/src/site/xdoc/userguide/stat.xml

Author: tn
Date: Sat Nov  2 21:02:13 2013
New Revision: 1538282

URL: http://svn.apache.org/r1538282
Log:
Add recently added features to the userguide.

Modified:
    commons/proper/math/trunk/src/site/xdoc/userguide/stat.xml

Modified: commons/proper/math/trunk/src/site/xdoc/userguide/stat.xml
URL: http://svn.apache.org/viewvc/commons/proper/math/trunk/src/site/xdoc/userguide/stat.xml?rev=1538282&r1=1538281&r2=1538282&view=diff
==============================================================================
--- commons/proper/math/trunk/src/site/xdoc/userguide/stat.xml (original)
+++ commons/proper/math/trunk/src/site/xdoc/userguide/stat.xml Sat Nov  2 21:02:13 2013
@@ -32,13 +32,13 @@
           and t-, chi-square and ANOVA test statistics.
         </p>
         <p>
-         <a href="#a1.2_Descriptive_statistics">Descriptive statistics</a><br></br>
-         <a href="#a1.3_Frequency_distributions">Frequency distributions</a><br></br>
-         <a href="#a1.4_Simple_regression">Simple Regression</a><br></br>
-         <a href="#a1.5_Multiple_linear_regression">Multiple Regression</a><br></br>
-         <a href="#a1.6_Rank_transformations">Rank transformations</a><br></br>
-         <a href="#a1.7_Covariance_and_correlation">Covariance and correlation</a><br></br>
-         <a href="#a1.8_Statistical_tests">Statistical Tests</a><br></br>
+         <a href="#a1.2_Descriptive_statistics">Descriptive statistics</a><br/>
+         <a href="#a1.3_Frequency_distributions">Frequency distributions</a><br/>
+         <a href="#a1.4_Simple_regression">Simple Regression</a><br/>
+         <a href="#a1.5_Multiple_linear_regression">Multiple Regression</a><br/>
+         <a href="#a1.6_Rank_transformations">Rank transformations</a><br/>
+         <a href="#a1.7_Covariance_and_correlation">Covariance and correlation</a><br/>
+         <a href="#a1.8_Statistical_tests">Statistical Tests</a><br/>
         </p>
       </subsection>
       <subsection name="1.2 Descriptive statistics">
@@ -154,7 +154,7 @@
           Here are some examples showing how to compute Descriptive statistics.
           <dl>
           <dt>Compute summary statistics for a list of double values</dt>
-          <br></br>
+          <br/>
           <dd>Using the <code>DescriptiveStatistics</code> aggregate
           (values are stored in memory):
         <source>
@@ -206,7 +206,7 @@ mean = StatUtils.mean(values, 0, 3);
         </dd>
         <dt>Maintain a "rolling mean" of the most recent 100 values from
         an input stream</dt>
-        <br></br>
+        <br/>
         <dd>Use a <code>DescriptiveStatistics</code> instance with
         window size set to 100
         <source>
@@ -311,7 +311,7 @@ double totalSampleSum = aggregatedStats.
           Here are some examples.
           <dl>
           <dt>Compute a frequency distribution based on integer values</dt>
-          <br></br>
+          <br/>
           <dd>Mixing integers, longs, Integers and Longs:
           <source>
  Frequency f = new Frequency();
@@ -328,7 +328,7 @@ double totalSampleSum = aggregatedStats.
           </source>
           </dd>
           <dt>Count string frequencies</dt>
-          <br></br>
+          <br/>
           <dd>Using case-sensitive comparison, alpha sort order (natural comparator):
           <source>
 Frequency f = new Frequency();
@@ -455,7 +455,7 @@ System.out.println(regression.predict(1.
          More data points can be added and subsequent getXxx calls will incorporate
          additional data in statistics.
          </dd>
-         <br></br>
+         <br/>
          <dt>Estimate a model from a double[][] array of data points</dt>
           <dd>Instantiate a regression object and load dataset
           <source>
@@ -478,7 +478,7 @@ System.out.println(regression.getSlopeSt
          More data points -- even another double[][] array -- can be added and subsequent
          getXxx calls will incorporate additional data in statistics.
          </dd>
-<br></br>
+<br/>
          <dt>Estimate a model from a double[][] array of data points, <em>excluding</em> the intercept</dt>
           <dd>Instantiate a regression object and load dataset
           <source>
@@ -558,7 +558,7 @@ System.out.println(regression.getInterce
         Here are some examples.
         <dl>
          <dt>OLS regression</dt>
-          <br></br>
+          <br/>
           <dd>Instantiate an OLS regression object and load a dataset:
           <source>
 OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
@@ -589,7 +589,7 @@ double sigma = regression.estimateRegres
          </source>
          </dd>
          <dt>GLS regression</dt>
-          <br></br>
+          <br/>
           <dd>Instantiate a GLS regression object and load a dataset:
           <source>
 GLSMultipleLinearRegression regression = new GLSMultipleLinearRegression();
@@ -664,17 +664,19 @@ new NaturalRanking(NaNStrategy.REMOVED,T
           <a href="../apidocs/org/apache/commons/math3/stat/correlation/Covariance.html">
           Covariance</a> computes covariances, 
           <a href="../apidocs/org/apache/commons/math3/stat/correlation/PearsonsCorrelation.html">
-          PearsonsCorrelation</a> provides Pearson's Product-Moment correlation coefficients and
+          PearsonsCorrelation</a> provides Pearson's Product-Moment correlation coefficients,
           <a href="../apidocs/org/apache/commons/math3/stat/correlation/SpearmansCorrelation.html">
-          SpearmansCorrelation</a> computes Spearman's rank correlation.
+          SpearmansCorrelation</a> computes Spearman's rank correlation and
+          <a href="../apidocs/org/apache/commons/math3/stat/correlation/KendallsCorrelation.html">
+          KendallsCorrelation</a> computes Kendall's tau rank correlation.
         </p>
         <p>
           <strong>Implementation Notes</strong>
           <ul>
           <li>
-            Unbiased covariances are given by the formula <br></br>
-            <code>cov(X, Y) = sum [(x<sub>i</sub> - E(X))(y<sub>i</sub> - E(Y))] / (n - 1)</code>
-            where <code>E(X)</code> is the mean of <code>X</code> and <code>E(Y)</code>
+           Unbiased covariances are given by the formula <br/>
+           <code>cov(X, Y) = sum [(x<sub>i</sub> - E(X))(y<sub>i</sub> - E(Y))] / (n - 1)</code>
+           where <code>E(X)</code> is the mean of <code>X</code> and <code>E(Y)</code>
            is the mean of the <code>Y</code> values. Non-bias-corrected estimates use 
            <code>n</code> in place of <code>n - 1.</code>  Whether or not covariances are
            bias-corrected is determined by the optional parameter, "biasCorrected," which
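The unbiased covariance formula in the hunk above can be checked with a short stand-alone sketch. This is plain Java written for illustration only, not the commons-math `Covariance` class; the class name `CovarianceSketch` is invented here, and the `biasCorrected` flag merely mirrors the optional parameter the notes describe:

```java
// Illustrative implementation of
//   cov(X, Y) = sum [(x_i - E(X))(y_i - E(Y))] / (n - 1)
// where E(X) and E(Y) are the sample means. NOT the commons-math
// Covariance class; a stand-alone sketch of the formula only.
public class CovarianceSketch {

    static double mean(double[] v) {
        double s = 0;
        for (double d : v) s += d;
        return s / v.length;
    }

    /** biasCorrected = true divides by (n - 1); false divides by n. */
    static double covariance(double[] x, double[] y, boolean biasCorrected) {
        double mx = mean(x), my = mean(y), sum = 0;
        for (int i = 0; i < x.length; i++) {
            sum += (x[i] - mx) * (y[i] - my);
        }
        return sum / (biasCorrected ? x.length - 1 : x.length);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {2, 4, 6, 8};
        // Unbiased estimate: sum of cross-products is 10, divided by n - 1 = 3
        System.out.println(covariance(x, y, true));
    }
}
```

Passing `false` gives the non-bias-corrected estimate with `n` in the denominator, matching the note above.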
@@ -682,7 +684,7 @@ new NaturalRanking(NaNStrategy.REMOVED,T
           </li>
           <li>
           <a href="../apidocs/org/apache/commons/math3/stat/correlation/PearsonsCorrelation.html">
-          PearsonsCorrelation</a> computes correlations defined by the formula <br></br>
+          PearsonsCorrelation</a> computes correlations defined by the formula <br/>
           <code>cor(X, Y) = sum[(x<sub>i</sub> - E(X))(y<sub>i</sub> - E(Y))] / [(n - 1)s(X)s(Y)]</code><br/>
           where <code>E(X)</code> and <code>E(Y)</code> are means of <code>X</code> and <code>Y</code>
           and <code>s(X)</code>, <code>s(Y)</code> are standard deviations.
@@ -693,6 +695,11 @@ new NaturalRanking(NaNStrategy.REMOVED,T
           correlation on the ranked data.  The ranking algorithm is configurable. By default, 
           <a href="../apidocs/org/apache/commons/math3/stat/ranking/NaturalRanking.html">
           NaturalRanking</a> with default strategies for handling ties and NaN values is used.
+          </li>
+          <li>
+          <a href="../apidocs/org/apache/commons/math3/stat/correlation/KendallsCorrelation.html">
+          KendallsCorrelation</a> computes the association between two measured quantities. A tau test
+          is a non-parametric hypothesis test for statistical dependence based on the tau coefficient.
           </li> 
           </ul>
         </p>
@@ -700,7 +707,7 @@ new NaturalRanking(NaNStrategy.REMOVED,T
         <strong>Examples:</strong>
         <dl>
           <dt><strong>Covariance of 2 arrays</strong></dt>
-          <br></br>
+          <br/>
           <dd>To compute the unbiased covariance between 2 double arrays,
           <code>x</code> and <code>y</code>, use:
           <source>
@@ -711,9 +718,9 @@ new Covariance().covariance(x, y)
 covariance(x, y, false)
           </source>
           </dd>
-          <br></br>
+          <br/>
           <dt><strong>Covariance matrix</strong></dt>
-          <br></br>
+          <br/>
           <dd> A covariance matrix over the columns of a source matrix <code>data</code>
           can be computed using
           <source>
@@ -726,18 +733,18 @@ new Covariance().computeCovarianceMatrix
 computeCovarianceMatrix(data, false)
          </source>
           </dd>
-           <br></br>
+           <br/>
           <dt><strong>Pearson's correlation of 2 arrays</strong></dt>
-          <br></br>
+          <br/>
           <dd>To compute the Pearson's product-moment correlation between two double arrays
           <code>x</code> and <code>y</code>, use:
           <source>
 new PearsonsCorrelation().correlation(x, y)
           </source>
           </dd>
-          <br></br>
+          <br/>
           <dt><strong>Pearson's correlation matrix</strong></dt>
-          <br></br>
+          <br/>
           <dd> A (Pearson's) correlation matrix over the columns of a source matrix <code>data</code>
           can be computed using
           <source>
@@ -746,9 +753,9 @@ new PearsonsCorrelation().computeCorrela
           The i-jth entry of the returned matrix is the Pearson's product-moment correlation between the
           ith and jth columns of <code>data.</code> 
           </dd>
-           <br></br>
+          <br/>
           <dt><strong>Pearson's correlation significance and standard errors</strong></dt>
-          <br></br>
+          <br/>
           <dd> To compute standard errors and/or significances of correlation coefficients
           associated with Pearson's correlation coefficients, start by creating a
           <code>PearsonsCorrelation</code> instance
@@ -771,22 +778,22 @@ correlation.getCorrelationPValues()
           </source>
           <code>getCorrelationPValues().getEntry(i,j)</code> is the
           probability that a random variable distributed as <code>t<sub>n-2</sub></code> takes
-           a value with absolute value greater than or equal to <br></br>
-           <code>|r<sub>ij</sub>|((n - 2) / (1 - r<sub>ij</sub><sup>2</sup>))<sup>1/2</sup></code>,
-           where <code>r<sub>ij</sub></code> is the estimated correlation between the ith and jth
-           columns of the source array or RealMatrix. This is sometimes referred to as the 
-           <i>significance</i> of the coefficient.<br/><br/>
-           For example, if <code>data</code> is a RealMatrix with 2 columns and 10 rows, then 
-           <source>
+          a value with absolute value greater than or equal to <br/>
+          <code>|r<sub>ij</sub>|((n - 2) / (1 - r<sub>ij</sub><sup>2</sup>))<sup>1/2</sup></code>,
+          where <code>r<sub>ij</sub></code> is the estimated correlation between the ith and jth
+          columns of the source array or RealMatrix. This is sometimes referred to as the 
+          <i>significance</i> of the coefficient.<br/><br/>
+          For example, if <code>data</code> is a RealMatrix with 2 columns and 10 rows, then 
+          <source>
 new PearsonsCorrelation(data).getCorrelationPValues().getEntry(0,1)
-           </source>
-           is the significance of the Pearson's correlation coefficient between the two columns
-           of <code>data</code>.  If this value is less than .01, we can say that the correlation
-           between the two columns of data is significant at the 99% level.
+          </source>
+          is the significance of the Pearson's correlation coefficient between the two columns
+          of <code>data</code>.  If this value is less than .01, we can say that the correlation
+          between the two columns of data is significant at the 99% level.
           </dd>
-           <br></br>
+          <br/>
           <dt><strong>Spearman's rank correlation coefficient</strong></dt>
-          <br></br>
+          <br/>
           <dd>To compute the Spearman's rank-moment correlation between two double arrays
           <code>x</code> and <code>y</code>:
           <source>
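The significance statistic described above can also be sketched directly. The sketch below computes Pearson's r and the value |r|((n - 2) / (1 - r^2))^(1/2) that is compared against a t distribution with n - 2 degrees of freedom; obtaining the p-value itself needs the t CDF (which commons-math provides), so it is omitted here. `CorrelationT` and its methods are names invented for this illustration:

```java
// Stand-alone sketch of the Pearson correlation and the t statistic
//   |r| * sqrt((n - 2) / (1 - r^2))
// used for the significance test described in the userguide. Not the
// commons-math PearsonsCorrelation class.
public class CorrelationT {

    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** The value compared against t with n - 2 degrees of freedom. */
    static double tStatistic(double r, int n) {
        return Math.abs(r) * Math.sqrt((n - 2) / (1 - r * r));
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2, 1, 4, 3, 5};
        double r = pearson(x, y);
        System.out.println("r = " + r + ", t = " + tStatistic(r, x.length));
    }
}
```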
@@ -798,7 +805,15 @@ RankingAlgorithm ranking = new NaturalRa
 new PearsonsCorrelation().correlation(ranking.rank(x), ranking.rank(y))
           </source>
           </dd>
-           <br></br>
+          <br/>     
+          <dt><strong>Kendall's tau rank correlation coefficient</strong></dt>
+          <br/>
+          <dd>To compute the Kendall's tau rank correlation between two double arrays
+          <code>x</code> and <code>y</code>:
+          <source>
+new KendallsCorrelation().correlation(x, y)
+          </source>
+          </dd>
         </dl>
         </p>
       </subsection>
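The Kendall's tau coefficient introduced in this commit can be sketched naively for tie-free data: tau = (concordant - discordant) / (n(n - 1)/2), counting each pair of observations as concordant when the two rankings order it the same way. The real `KendallsCorrelation` class handles ties and is faster than this O(n^2) illustration; `KendallSketch` is a name invented here:

```java
// Naive O(n^2) sketch of Kendall's tau for data without ties:
//   tau = (concordant - discordant) / (n * (n - 1) / 2)
// Illustration only; commons-math's KendallsCorrelation handles ties
// and uses an O(n log n) algorithm.
public class KendallSketch {

    static double tau(double[] x, double[] y) {
        int n = x.length, concordant = 0, discordant = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                // Positive product: the pair is ordered the same way in x and y.
                double s = (x[i] - x[j]) * (y[i] - y[j]);
                if (s > 0) concordant++;
                else if (s < 0) discordant++;
            }
        }
        return (concordant - discordant) / (n * (n - 1) / 2.0);
    }

    public static void main(String[] args) {
        // 5 concordant pairs, 1 discordant, 6 pairs total: tau = 4/6
        System.out.println(tau(new double[]{1, 2, 3, 4},
                               new double[]{1, 3, 2, 4}));
    }
}
```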
@@ -814,9 +829,11 @@ new PearsonsCorrelation().correlation(ra
           <a href="http://www.itl.nist.gov/div898/handbook/prc/section4/prc43.htm">
           One-Way ANOVA</a>,
           <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc35.htm">
-          Mann-Whitney U</a> and
+          Mann-Whitney U</a>,
           <a href="http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test">
-          Wilcoxon signed rank</a> test statistics as well as
+          Wilcoxon signed rank</a> and
+          <a href="http://en.wikipedia.org/wiki/Binomial_test">
+          Binomial</a> test statistics as well as
           <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
           p-values</a> associated with <code>t-</code>,
           <code>Chi-Square</code>, <code>G</code>, <code>One-Way ANOVA</code>, <code>Mann-Whitney U</code>
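The Binomial test added to the list above can be sketched with an exact one-sided p-value, P(X >= successes) for X ~ Binomial(trials, p0). The commons-math `BinomialTest` class also supports two-sided and less-than alternatives; this stand-alone version, with the invented name `BinomialSketch`, only illustrates the computation and assumes at least one success when the alternative is "greater":

```java
// Stand-alone sketch of a one-sided exact binomial test:
// p-value = P(X >= successes) where X ~ Binomial(trials, p0).
// Illustration only, not the commons-math BinomialTest class.
public class BinomialSketch {

    /** Binomial(n, p) probability mass at k, computed in log space. */
    static double binomialPmf(int n, int k, double p) {
        double logCoeff = 0;  // log of the binomial coefficient C(n, k)
        for (int i = 0; i < k; i++) {
            logCoeff += Math.log(n - i) - Math.log(i + 1);
        }
        return Math.exp(logCoeff + k * Math.log(p) + (n - k) * Math.log(1 - p));
    }

    /** P(X >= successes) under the null that the success probability is p0. */
    static double pValueGreater(int trials, int successes, double p0) {
        double p = 0;
        for (int k = successes; k <= trials; k++) {
            p += binomialPmf(trials, k, p0);
        }
        return p;
    }

    public static void main(String[] args) {
        // 9 successes in 10 trials with p0 = 0.5: p-value = 11/1024
        System.out.println(pValueGreater(10, 9, 0.5));
    }
}
```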
@@ -830,9 +847,11 @@ new PearsonsCorrelation().correlation(ra
           <a href="../apidocs/org/apache/commons/math3/stat/inference/OneWayAnova.html">
           OneWayAnova</a>,
           <a href="../apidocs/org/apache/commons/math3/stat/inference/MannWhitneyUTest.html">
-          MannWhitneyUTest</a>, and
+          MannWhitneyUTest</a>,
           <a href="../apidocs/org/apache/commons/math3/stat/inference/WilcoxonSignedRankTest.html">
-          WilcoxonSignedRankTest</a>.          
+          WilcoxonSignedRankTest</a> and
+          <a href="../apidocs/org/apache/commons/math3/stat/inference/BinomialTest.html">
+          BinomialTest</a>.                    
           The <a href="../apidocs/org/apache/commons/math3/stat/inference/TestUtils.html">
           TestUtils</a> class provides static methods to get test instances or
           to compute test statistics directly.  The examples below all use the
@@ -886,7 +905,7 @@ new PearsonsCorrelation().correlation(ra
         <strong>Examples:</strong>
         <dl>
           <dt><strong>One-sample <code>t</code> tests</strong></dt>
-          <br></br>
+          <br/>
           <dd>To compare the mean of a double[] array to a fixed value:
           <source>
 double[] observed = {1d, 2d, 3d};
@@ -932,9 +951,9 @@ TestUtils.tTest(mu, observed, alpha);
           To test, for example at the 95% level of confidence, use
           <code>alpha = 0.05</code>
           </dd>
-          <br></br>
+          <br/>
           <dt><strong>Two-Sample t-tests</strong></dt>
-          <br></br>
+          <br/>
           <dd><strong>Example 1:</strong> Paired test evaluating
           the null hypothesis that the mean difference between corresponding
           (paired) elements of the <code>double[]</code> arrays
@@ -1005,9 +1024,9 @@ TestUtils.tTest(sample1, sample2, .05);
            replace "t" at the beginning of the method name with "homoscedasticT"
            </p>
            </dd>
-           <br></br>
+           <br/>
           <dt><strong>Chi-square tests</strong></dt>
-          <br></br>
+          <br/>
           <dd>To compute a chi-square statistic measuring the agreement between a
           <code>long[]</code> array of observed counts and a <code>double[]</code>
           array of expected counts, use:
@@ -1043,7 +1062,7 @@ TestUtils.chiSquareTest(expected, observ
 TestUtils.chiSquareTest(counts);
           </source>
           The rows of the 2-way table are
-          <code>count[0], ... , count[count.length - 1]. </code><br></br>
+          <code>count[0], ... , count[count.length - 1]. </code><br/>
           The chi-square statistic returned is
           <code>sum((counts[i][j] - expected[i][j])^2/expected[i][j])</code>
           where the sum is taken over all table entries and
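The 2-way-table statistic described above can be sketched in plain Java: expected counts come from the row and column marginal sums, and the statistic sums (count - expected)^2 / expected over all cells. `ChiSquareSketch` is an invented name for this illustration, not the commons-math `ChiSquareTest` API:

```java
// Sketch of the 2-way-table chi-square statistic:
//   expected[i][j] = rowSum[i] * colSum[j] / total
//   statistic = sum((counts[i][j] - expected[i][j])^2 / expected[i][j])
// Illustration only, not the commons-math ChiSquareTest class.
public class ChiSquareSketch {

    static double chiSquare(long[][] counts) {
        int rows = counts.length, cols = counts[0].length;
        double[] rowSum = new double[rows], colSum = new double[cols];
        double total = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSum[i] += counts[i][j];
                colSum[j] += counts[i][j];
                total += counts[i][j];
            }
        }
        double stat = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double expected = rowSum[i] * colSum[j] / total;
                double d = counts[i][j] - expected;
                stat += d * d / expected;
            }
        }
        return stat;
    }

    public static void main(String[] args) {
        long[][] counts = {{40, 22, 43}, {91, 21, 28}, {60, 10, 22}};
        System.out.println(chiSquare(counts));
    }
}
```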
@@ -1066,9 +1085,9 @@ TestUtils.chiSquareTest(counts, alpha);
           The boolean value returned will be <code>true</code> iff the null
           hypothesis can be rejected with confidence <code>1 - alpha</code>.
           </dd>
-          <br></br>
+          <br/>
           <dt><strong>G tests</strong></dt>
-          <br></br>
+          <br/>
           <dd>G tests are an alternative to chi-square tests that are recommended
           when observed counts are small and / or incidence probabilities for
           some cells are small. See Ted Dunning's paper,
@@ -1077,8 +1096,8 @@ TestUtils.chiSquareTest(counts, alpha);
          background and an empirical analysis showing how chi-square
           statistics can be misleading in the presence of low incidence probabilities.
           This paper also derives the formulas used in computing G statistics and the
-          root log likelihood ratio provided by the <code>GTest</code> class.</dd>
-          <dd>
+          root log likelihood ratio provided by the <code>GTest</code> class.
+          </dd>
           <dd>To compute a G-test statistic measuring the agreement between a
           <code>long[]</code> array of observed counts and a <code>double[]</code>
           array of expected counts, use:
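The G statistic itself, G = 2 * sum(observed[i] * log(observed[i]/expected[i])), is simple enough to sketch directly. `GStatSketch` is an invented name for this stand-alone illustration (which assumes all observed counts are positive); the real API is the commons-math `GTest` class:

```java
// Stand-alone sketch of the G-test statistic:
//   G = 2 * sum(observed[i] * log(observed[i] / expected[i]))
// Assumes every observed count is positive (0 * log(0) is not handled).
// Illustration only, not the commons-math GTest class.
public class GStatSketch {

    static double g(double[] expected, long[] observed) {
        double g = 0;
        for (int i = 0; i < observed.length; i++) {
            g += observed[i] * Math.log(observed[i] / expected[i]);
        }
        return 2 * g;
    }

    public static void main(String[] args) {
        double[] expected = {10, 10, 10, 10};
        long[] observed = {5, 15, 10, 10};
        System.out.println(g(expected, observed));
    }
}
```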
@@ -1090,13 +1109,13 @@ System.out.println(TestUtils.g(expected,
           the value displayed will be
          <code>2 * sum(observed[i] * log(observed[i]/expected[i]))</code>
           </dd>
-          <dd> To get the p-value associated with the null hypothesis that
+          <dd>To get the p-value associated with the null hypothesis that
           <code>observed</code> conforms to <code>expected</code> use:
           <source>
 TestUtils.gTest(expected, observed);
           </source>
           </dd>
-          <dd> To test the null hypothesis that <code>observed</code> conforms to
+          <dd>To test the null hypothesis that <code>observed</code> conforms to
          <code>expected</code> with <code>alpha</code> significance level
           (equiv. <code>100 * (1-alpha)%</code> confidence) where <code>
           0 &lt; alpha &lt; 1 </code> use:
@@ -1128,9 +1147,10 @@ new GTest().rootLogLikelihoodRatio(5, 19
           returns the root log likelihood associated with the null hypothesis that A 
           and B are independent.
           </dd>
-          <br></br>
+          <br/>
           <dt><strong>One-Way ANOVA tests</strong></dt>
-          <br></br>
+          <br/>
+          <dd>
           <source>
 double[] classA =
    {93.0, 103.0, 95.0, 101.0, 91.0, 105.0, 96.0, 94.0, 101.0 };