Posted to dev@commons.apache.org by ps...@apache.org on 2004/08/02 06:20:09 UTC
cvs commit: jakarta-commons/math/xdocs/userguide stat.xml
psteitz 2004/08/01 21:20:09
Modified: math/src/java/org/apache/commons/math/stat/inference
TTest.java TTestImpl.java
math/src/test/org/apache/commons/math/stat/inference
TTestTest.java
math/xdocs/userguide stat.xml
Log:
Removed boolean equalVariances flag from t-test API.
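For readers following the change, here is a minimal Java sketch of the two statistics that the removed flag used to select between, written directly from the formulas in the Javadoc below. It is an illustration only, not the code in TTestImpl: the class name TTestSketch and its mean and variance helpers are hypothetical, and it assumes the pooled variance is the weighted average of the two sample variances, with the square root applied only in the denominator of t.

```java
public class TTestSketch {

    /** Arithmetic mean of the sample. */
    public static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    public static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        return ss / (x.length - 1);
    }

    /** Enforces the documented precondition: both samples need at least 2 values. */
    private static void checkLengths(double[] s1, double[] s2) {
        if (s1 == null || s2 == null || s1.length < 2 || s2.length < 2) {
            throw new IllegalArgumentException("each sample needs at least 2 values");
        }
    }

    /** Two-sample t statistic without the equal variances assumption. */
    public static double t(double[] s1, double[] s2) {
        checkLengths(s1, s2);
        double n1 = s1.length;
        double n2 = s2.length;
        return (mean(s1) - mean(s2))
                / Math.sqrt(variance(s1) / n1 + variance(s2) / n2);
    }

    /** Two-sample t statistic using a pooled variance estimate. */
    public static double homoscedasticT(double[] s1, double[] s2) {
        checkLengths(s1, s2);
        double n1 = s1.length;
        double n2 = s2.length;
        double pooledVar = ((n1 - 1) * variance(s1) + (n2 - 1) * variance(s2))
                / (n1 + n2 - 2);
        return (mean(s1) - mean(s2))
                / (Math.sqrt(1 / n1 + 1 / n2) * Math.sqrt(pooledVar));
    }

    public static void main(String[] args) {
        double[] sample1 = {1, 2, 3, 4, 5};
        double[] sample2 = {2, 4, 6, 8, 10, 12};
        // Formerly t(sample1, sample2, false) and t(sample1, sample2, true).
        System.out.println("t              = " + t(sample1, sample2));
        System.out.println("homoscedasticT = " + homoscedasticT(sample1, sample2));
    }
}
```

When the two samples have the same size the two statistics coincide; they can differ once the sizes differ, as in the sample data above.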
Revision Changes Path
1.7 +337 -161 jakarta-commons/math/src/java/org/apache/commons/math/stat/inference/TTest.java
Index: TTest.java
===================================================================
RCS file: /home/cvs/jakarta-commons/math/src/java/org/apache/commons/math/stat/inference/TTest.java,v
retrieving revision 1.6
retrieving revision 1.7
diff -u -r1.6 -r1.7
--- TTest.java 23 Jun 2004 16:26:14 -0000 1.6
+++ TTest.java 2 Aug 2004 04:20:08 -0000 1.7
@@ -20,12 +20,30 @@
/**
* An interface for Student's t-tests.
+ * <p>
+ * Tests can be:<ul>
+ * <li>One-sample or two-sample</li>
+ * <li>One-sided or two-sided</li>
+ * <li>Paired or unpaired (for two-sample tests)</li>
+ * <li>Homoscedastic (equal variance assumption) or heteroscedastic
+ * (for two sample tests)</li>
+ * <li>Fixed significance level (boolean-valued) or returning p-values.
+ * </li></ul>
+ * <p>
+ * Test statistics are available for all tests. Methods including "Test" in
+ * their names perform tests; all other methods return t-statistics. Among
+ * the "Test" methods, <code>double-</code>valued methods return p-values;
+ * <code>boolean-</code>valued methods perform fixed significance level tests.
+ * Significance levels are always specified as numbers between 0 and 0.5
+ * (e.g. tests at the 95% level use <code>alpha=0.05</code>).
+ * <p>
+ * Input to tests can be either <code>double[]</code> arrays or
+ * {@link StatisticalSummary} instances.
+ *
*
* @version $Revision$ $Date$
*/
public interface TTest {
-
-
/**
* Computes a paired, 2-sample t-statistic based on the data in the input
* arrays. The t-statistic returned is equivalent to what would be returned by
@@ -46,13 +64,11 @@
* @throws MathException if the statistic cannot be computed due to a
* convergence or other numerical error.
*/
- double pairedT(double[] sample1, double[] sample2)
- throws IllegalArgumentException, MathException;
-
+ public abstract double pairedT(double[] sample1, double[] sample2)
+ throws IllegalArgumentException, MathException;
/**
* Returns the <i>observed significance level</i>, or
- * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
- * p-value</a>, associated with a paired, two-sample, two-tailed t-test
+ * <i> p-value</i>, associated with a paired, two-sample, two-tailed t-test
* based on the data in the input arrays.
* <p>
* The number returned is the smallest significance level
@@ -83,11 +99,10 @@
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- double pairedTTest(double[] sample1, double[] sample2)
- throws IllegalArgumentException, MathException;
-
+ public abstract double pairedTTest(double[] sample1, double[] sample2)
+ throws IllegalArgumentException, MathException;
/**
- * Performs a paired t-test</a> evaluating the null hypothesis that the
+ * Performs a paired t-test evaluating the null hypothesis that the
* mean of the paired differences between <code>sample1</code> and
* <code>sample2</code> is 0 in favor of the two-sided alternative that the
* mean paired difference is not equal to 0, with significance level
@@ -118,9 +133,11 @@
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
- boolean pairedTTest(double[] sample1, double[] sample2, double alpha)
- throws IllegalArgumentException, MathException;
-
+ public abstract boolean pairedTTest(
+ double[] sample1,
+ double[] sample2,
+ double alpha)
+ throws IllegalArgumentException, MathException;
/**
* Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section2/prc22.htm#formula">
* t statistic </a> given observed values and a comparison constant.
@@ -136,9 +153,8 @@
* @return t statistic
* @throws IllegalArgumentException if input array length is less than 2
*/
- double t(double mu, double[] observed)
- throws IllegalArgumentException;
-
+ public abstract double t(double mu, double[] observed)
+ throws IllegalArgumentException;
/**
* Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section2/prc22.htm#formula">
* t statistic </a> to use in comparing the mean of the dataset described by
@@ -155,19 +171,19 @@
* @return t statistic
* @throws IllegalArgumentException if the precondition is not met
*/
- double t(double mu, StatisticalSummary sampleStats)
- throws IllegalArgumentException;
-
+ public abstract double t(double mu, StatisticalSummary sampleStats)
+ throws IllegalArgumentException;
/**
- * Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
- * 2-sample t statistic. </a>
+ * Computes a 2-sample t statistic, under the hypothesis of equal
+ * subpopulation variances. To compute a t-statistic without the
+ * equal variances hypothesis, use {@link #t(double[], double[])}.
* <p>
- * This statistic can be used to perform a two-sample t-test to compare
- * sample means.
+ * This statistic can be used to perform a (homoscedastic) two-sample
+ * t-test to compare sample means.
* <p>
- * If <code>equalVariances</code> is <code>true</code>, the t-statisitc is
+ * The t-statistic is
* <p>
- * (1) <code> t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
+ * <code> t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
* <p>
* where <strong><code>n1</code></strong> is the size of first sample;
* <strong><code> n2</code></strong> is the size of second sample;
@@ -181,9 +197,35 @@
* with <strong><code>var1</code></strong> the variance of the first sample and
* <strong><code>var2</code></strong> the variance of the second sample.
* <p>
- * If <code>equalVariances</code> is <code>false</code>, the t-statisitc is
+ * <strong>Preconditions</strong>: <ul>
+ * <li>The observed array lengths must both be at least 2.
+ * </li></ul>
+ *
+ * @param sample1 array of sample data values
+ * @param sample2 array of sample data values
+ * @return t statistic
+ * @throws IllegalArgumentException if the precondition is not met
+ */
+ public abstract double homoscedasticT(double[] sample1, double[] sample2)
+ throws IllegalArgumentException;
+ /**
+ * Computes a 2-sample t statistic, without the hypothesis of equal
+ * subpopulation variances. To compute a t-statistic assuming equal
+ * variances, use {@link #homoscedasticT(double[], double[])}.
* <p>
- * (2) <code> t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
+ * This statistic can be used to perform a two-sample t-test to compare
+ * sample means.
+ * <p>
+ * The t-statistic is
+ * <p>
+ * <code> t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
+ * <p>
+ * where <strong><code>n1</code></strong> is the size of the first sample;
+ * <strong><code> n2</code></strong> is the size of the second sample;
+ * <strong><code> m1</code></strong> is the mean of the first sample;
+ * <strong><code> m2</code></strong> is the mean of the second sample;
+ * <strong><code> var1</code></strong> is the variance of the first sample;
+ * <strong><code> var2</code></strong> is the variance of the second sample;
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The observed array lengths must both be at least 2.
@@ -191,32 +233,64 @@
*
* @param sample1 array of sample data values
* @param sample2 array of sample data values
- * @param equalVariances are the sample variances assumed equal?
* @return t statistic
* @throws IllegalArgumentException if the precondition is not met
- * @throws MathException if the statistic can not be computed do to a
- * convergence or other numerical error.
*/
- double t(double[] sample1, double[] sample2, boolean equalVariances)
- throws IllegalArgumentException, MathException;
-
+ public abstract double t(double[] sample1, double[] sample2)
+ throws IllegalArgumentException;
/**
- * Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
- * 2-sample t statistic </a>, comparing the means of the datasets described
- * by two {@link StatisticalSummary} instances.
+ * Computes a 2-sample t statistic, comparing the means of the datasets
+ * described by two {@link StatisticalSummary} instances, without the
+ * assumption of equal subpopulation variances. Use
+ * {@link #homoscedasticT(StatisticalSummary, StatisticalSummary)} to
+ * compute a t-statistic under the equal variances assumption.
* <p>
* This statistic can be used to perform a two-sample t-test to compare
* sample means.
* <p>
- * If <code>equalVariances</code> is <code>true</code>, the t-statisitc is
+ * The returned t-statistic is
+ * <p>
+ * <code> t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
* <p>
- * (1) <code> t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
+ * where <strong><code>n1</code></strong> is the size of the first sample;
+ * <strong><code> n2</code></strong> is the size of the second sample;
+ * <strong><code> m1</code></strong> is the mean of the first sample;
+ * <strong><code> m2</code></strong> is the mean of the second sample
+ * <strong><code> var1</code></strong> is the variance of the first sample;
+ * <strong><code> var2</code></strong> is the variance of the second sample
+ * <p>
+ * <strong>Preconditions</strong>: <ul>
+ * <li>The datasets described by the two Univariates must each contain
+ * at least 2 observations.
+ * </li></ul>
+ *
+ * @param sampleStats1 StatisticalSummary describing data from the first sample
+ * @param sampleStats2 StatisticalSummary describing data from the second sample
+ * @return t statistic
+ * @throws IllegalArgumentException if the precondition is not met
+ */
+ public abstract double t(
+ StatisticalSummary sampleStats1,
+ StatisticalSummary sampleStats2)
+ throws IllegalArgumentException;
+ /**
+ * Computes a 2-sample t statistic, comparing the means of the datasets
+ * described by two {@link StatisticalSummary} instances, under the
+ * assumption of equal subpopulation variances. To compute a t-statistic
+ * without the equal variances assumption, use
+ * {@link #t(StatisticalSummary, StatisticalSummary)}.
+ * <p>
+ * This statistic can be used to perform a (homoscedastic) two-sample
+ * t-test to compare sample means.
+ * <p>
+ * The t-statistic returned is
+ * <p>
+ * <code> t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
* <p>
* where <strong><code>n1</code></strong> is the size of first sample;
* <strong><code> n2</code></strong> is the size of second sample;
* <strong><code> m1</code></strong> is the mean of first sample;
- * <strong><code> m2</code></strong> is the mean of second sample</li>
- * </ul>
+ * <strong><code> m2</code></strong> is the mean of second sample
* and <strong><code>var</code></strong> is the pooled variance estimate:
* <p>
* <code>var = ((n1 - 1)var1 + (n2 - 1)var2) / ((n1-1) + (n2-1))</code>
@@ -224,10 +298,6 @@
* with <strong><code>var1</code></strong> the variance of the first sample and
* <strong><code>var2</code></strong> the variance of the second sample.
* <p>
- * If <code>equalVariances</code> is <code>false</code>, the t-statisitc is
- * <p>
- * (2) <code> t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
- * <p>
* <strong>Preconditions</strong>: <ul>
* <li>The datasets described by the two Univariates must each contain
* at least 2 observations.
@@ -235,18 +305,16 @@
*
* @param sampleStats1 StatisticalSummary describing data from the first sample
* @param sampleStats2 StatisticalSummary describing data from the second sample
- * @param equalVariances are the sample variances assumed equal?
* @return t statistic
* @throws IllegalArgumentException if the precondition is not met
*/
- double t(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
- boolean equalVariances)
- throws IllegalArgumentException;
-
+ public abstract double homoscedasticT(
+ StatisticalSummary sampleStats1,
+ StatisticalSummary sampleStats2)
+ throws IllegalArgumentException;
/**
* Returns the <i>observed significance level</i>, or
- * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
- * p-value</a>, associated with a one-sample, two-tailed t-test
+ * <i>p-value</i>, associated with a one-sample, two-tailed t-test
* comparing the mean of the input array with the constant <code>mu</code>.
* <p>
* The number returned is the smallest significance level
@@ -270,13 +338,12 @@
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- double tTest(double mu, double[] sample)
- throws IllegalArgumentException, MathException;
-
+ public abstract double tTest(double mu, double[] sample)
+ throws IllegalArgumentException, MathException;
/**
* Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* two-sided t-test</a> evaluating the null hypothesis that the mean of the population from
- * which <code>sample</code> is drawn equals <code>mu</code>.
+ * which <code>sample</code> is drawn equals <code>mu</code>.
* <p>
* Returns <code>true</code> iff the null hypothesis can be
* rejected with confidence <code>1 - alpha</code>. To
@@ -308,13 +375,11 @@
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- boolean tTest(double mu, double[] sample, double alpha)
- throws IllegalArgumentException, MathException;
-
+ public abstract boolean tTest(double mu, double[] sample, double alpha)
+ throws IllegalArgumentException, MathException;
/**
* Returns the <i>observed significance level</i>, or
- * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
- * p-value</a>, associated with a one-sample, two-tailed t-test
+ * <i>p-value</i>, associated with a one-sample, two-tailed t-test
* comparing the mean of the dataset described by <code>sampleStats</code>
* with the constant <code>mu</code>.
* <p>
@@ -327,7 +392,8 @@
* <strong>Usage Note:</strong><br>
* The validity of the test depends on the assumptions of the parametric
* t-test procedure, as discussed
- * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">here</a>
+ * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
+ * here</a>
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The sample must contain at least 2 observations.
@@ -339,17 +405,17 @@
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- double tTest(double mu, StatisticalSummary sampleStats)
- throws IllegalArgumentException, MathException;
-
+ public abstract double tTest(double mu, StatisticalSummary sampleStats)
+ throws IllegalArgumentException, MathException;
/**
* Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
- * two-sided t-test</a> evaluating the null hypothesis that the mean of the population from
- * which the dataset described by <code>stats</code> is drawn equals <code>mu</code>.
- * <p>
- * Returns <code>true</code> iff the null hypothesis can be
- * rejected with confidence <code>1 - alpha</code>. To
- * perform a 1-sided test, use <code>alpha / 2</code>
+ * two-sided t-test</a> evaluating the null hypothesis that the mean of the
+ * population from which the dataset described by <code>stats</code> is
+ * drawn equals <code>mu</code>.
+ * <p>
+ * Returns <code>true</code> iff the null hypothesis can be rejected with
+ * confidence <code>1 - alpha</code>. To perform a 1-sided test, use
+ * <code>alpha * 2</code>.
* <p>
* <strong>Examples:</strong><br><ol>
* <li>To test the (2-sided) hypothesis <code>sample mean = mu </code> at
@@ -377,13 +443,14 @@
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- boolean tTest(double mu, StatisticalSummary sampleStats, double alpha)
- throws IllegalArgumentException, MathException;
-
+ public abstract boolean tTest(
+ double mu,
+ StatisticalSummary sampleStats,
+ double alpha)
+ throws IllegalArgumentException, MathException;
/**
* Returns the <i>observed significance level</i>, or
- * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
- * p-value</a>, associated with a two-sample, two-tailed t-test
+ * <i>p-value</i>, associated with a two-sample, two-tailed t-test
* comparing the means of the input arrays.
* <p>
* The number returned is the smallest significance level
@@ -391,19 +458,50 @@
* equal in favor of the two-sided alternative that they are different.
* For a one-sided test, divide the returned value by 2.
* <p>
- * If the <code>equalVariances</code> parameter is <code>false,</code>
- * the test does not assume that the underlying popuation variances are
+ * The test does not assume that the underlying population variances are
* equal and it uses approximated degrees of freedom computed from the
- * sample data to compute the p-value. In this case, formula (1) for the
- * {@link #t(double[], double[], boolean)} statistic is used
- * and the Welch-Satterthwaite approximation to the degrees of freedom is used,
+ * sample data to compute the p-value. The t-statistic used is as defined in
+ * {@link #t(double[], double[])} and the Welch-Satterthwaite approximation
+ * to the degrees of freedom is used,
* as described
* <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
- * here.</a>
+ * here.</a> To perform the test under the assumption of equal subpopulation
+ * variances, use {@link #homoscedasticTTest(double[], double[])}.
+ * <p>
+ * <strong>Usage Note:</strong><br>
+ * The validity of the p-value depends on the assumptions of the parametric
+ * t-test procedure, as discussed
+ * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
+ * here</a>
+ * <p>
+ * <strong>Preconditions</strong>: <ul>
+ * <li>The observed array lengths must both be at least 2.
+ * </li></ul>
+ *
+ * @param sample1 array of sample data values
+ * @param sample2 array of sample data values
+ * @return p-value for t-test
+ * @throws IllegalArgumentException if the precondition is not met
+ * @throws MathException if an error occurs computing the p-value
+ */
+ public abstract double tTest(double[] sample1, double[] sample2)
+ throws IllegalArgumentException, MathException;
+ /**
+ * Returns the <i>observed significance level</i>, or
+ * <i>p-value</i>, associated with a two-sample, two-tailed t-test
+ * comparing the means of the input arrays, under the assumption that
+ * the two samples are drawn from subpopulations with equal variances.
+ * To perform the test without the equal variances assumption, use
+ * {@link #tTest(double[], double[])}.
+ * <p>
+ * The number returned is the smallest significance level
+ * at which one can reject the null hypothesis that the two means are
+ * equal in favor of the two-sided alternative that they are different.
+ * For a one-sided test, divide the returned value by 2.
* <p>
- * If <code>equalVariances</code> is <code>true</code>, a pooled variance
- * estimate is used to compute the t-statistic (formula (2)) and the sum of the
- * sample sizes minus 2 is used as the degrees of freedom.
+ * A pooled variance estimate is used to compute the t-statistic. See
+ * {@link #homoscedasticT(double[], double[])}. The sum of the sample sizes
+ * minus 2 is used as the degrees of freedom.
* <p>
* <strong>Usage Note:</strong><br>
* The validity of the p-value depends on the assumptions of the parametric
@@ -417,47 +515,99 @@
*
* @param sample1 array of sample data values
* @param sample2 array of sample data values
- * @param equalVariances are sample variances assumed to be equal?
* @return p-value for t-test
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- double tTest(double[] sample1, double[] sample2, boolean equalVariances)
- throws IllegalArgumentException, MathException;
-
+ public abstract double homoscedasticTTest(
+ double[] sample1,
+ double[] sample2)
+ throws IllegalArgumentException, MathException;
/**
- * Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
+ * Performs a
+ * <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* two-sided t-test</a> evaluating the null hypothesis that <code>sample1</code>
* and <code>sample2</code> are drawn from populations with the same mean,
- * with significance level <code>alpha</code>.
+ * with significance level <code>alpha</code>. This test does not assume
+ * that the subpopulation variances are equal. To perform the test assuming
+ * equal variances, use
+ * {@link #homoscedasticTTest(double[], double[], double)}.
* <p>
* Returns <code>true</code> iff the null hypothesis that the means are
* equal can be rejected with confidence <code>1 - alpha</code>. To
* perform a 1-sided test, use <code>alpha * 2</code>.
* <p>
- * If the <code>equalVariances</code> parameter is <code>false,</code>
- * the test does not assume that the underlying popuation variances are
- * equal and it uses approximated degrees of freedom computed from the
- * sample data to compute the p-value. In this case, formula (1) for the
- * {@link #t(double[], double[], boolean)} statistic is used
- * and the Welch-Satterthwaite approximation to the degrees of freedom is used,
- * as described
+ * See {@link #t(double[], double[])} for the formula used to compute the
+ * t-statistic. Degrees of freedom are approximated using the
* <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
- * here.</a>
+ * Welch-Satterthwaite approximation.</a>
+
+ * <p>
+ * <strong>Examples:</strong><br><ol>
+ * <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
+ * the 95% level, use
+ * <br><code>tTest(sample1, sample2, 0.05). </code>
+ * </li>
+ * <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>
+ * at the 99% level, first verify that the measured mean of <code>sample 1</code>
+ * is less than the mean of <code>sample 2</code> and then use
+ * <br><code>tTest(sample1, sample2, 0.02) </code>
+ * </li></ol>
* <p>
- * If <code>equalVariances</code> is <code>true</code>, a pooled variance
- * estimate is used to compute the t-statistic (formula (2)) and the sum of the
- * sample sizes minus 2 is used as the degrees of freedom.
+ * <strong>Usage Note:</strong><br>
+ * The validity of the test depends on the assumptions of the parametric
+ * t-test procedure, as discussed
+ * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
+ * here</a>
+ * <p>
+ * <strong>Preconditions</strong>: <ul>
+ * <li>The observed array lengths must both be at least 2.
+ * </li>
+ * <li> <code> 0 < alpha < 0.5 </code>
+ * </li></ul>
+ *
+ * @param sample1 array of sample data values
+ * @param sample2 array of sample data values
+ * @param alpha significance level of the test
+ * @return true if the null hypothesis can be rejected with
+ * confidence 1 - alpha
+ * @throws IllegalArgumentException if the preconditions are not met
+ * @throws MathException if an error occurs performing the test
+ */
+ public abstract boolean tTest(
+ double[] sample1,
+ double[] sample2,
+ double alpha)
+ throws IllegalArgumentException, MathException;
+ /**
+ * Performs a
+ * <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
+ * two-sided t-test</a> evaluating the null hypothesis that <code>sample1</code>
+ * and <code>sample2</code> are drawn from populations with the same mean,
+ * with significance level <code>alpha</code>, assuming that the
+ * subpopulation variances are equal. Use
+ * {@link #tTest(double[], double[], double)} to perform the test without
+ * the assumption of equal variances.
+ * <p>
+ * Returns <code>true</code> iff the null hypothesis that the means are
+ * equal can be rejected with confidence <code>1 - alpha</code>. To
+ * perform a 1-sided test, use <code>alpha * 2</code>. To perform the test
+ * without the assumption of equal subpopulation variances, use
+ * {@link #tTest(double[], double[], double)}.
+ * <p>
+ * A pooled variance estimate is used to compute the t-statistic. See
+ * {@link #homoscedasticT(double[], double[])} for the formula. The sum of the sample
+ * sizes minus 2 is used as the degrees of freedom.
* <p>
* <strong>Examples:</strong><br><ol>
* <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
- * the 95% level, under the assumption of equal subpopulation variances,
- * use <br><code>tTest(sample1, sample2, 0.05, true) </code>
+ * the 95% level, use <br><code>homoscedasticTTest(sample1, sample2, 0.05). </code>
* </li>
- * <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>
- * at the 99% level without assuming equal variances, first verify that the measured
- * mean of <code>sample 1</code> is less than the mean of <code>sample 2</code>
- * and then use <br><code>tTest(sample1, sample2, 0.005, false) </code>
+ * <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>
+ * at the 99% level, first verify that the measured mean of
+ * <code>sample 1</code> is less than the mean of <code>sample 2</code>
+ * and then use
+ * <br><code>homoscedasticTTest(sample1, sample2, 0.02) </code>
* </li></ol>
* <p>
* <strong>Usage Note:</strong><br>
@@ -475,40 +625,70 @@
* @param sample1 array of sample data values
* @param sample2 array of sample data values
* @param alpha significance level of the test
- * @param equalVariances are sample variances assumed to be equal?
* @return true if the null hypothesis can be rejected with
* confidence 1 - alpha
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
- boolean tTest(double[] sample1, double[] sample2, double alpha,
- boolean equalVariances)
- throws IllegalArgumentException, MathException;
-
+ public abstract boolean homoscedasticTTest(
+ double[] sample1,
+ double[] sample2,
+ double alpha)
+ throws IllegalArgumentException, MathException;
/**
* Returns the <i>observed significance level</i>, or
- * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
- * p-value</a>, associated with a two-sample, two-tailed t-test
- * comparing the means of the datasets described by two Univariates.
+ * <i>p-value</i>, associated with a two-sample, two-tailed t-test
+ * comparing the means of the datasets described by two StatisticalSummary
+ * instances.
* <p>
* The number returned is the smallest significance level
* at which one can reject the null hypothesis that the two means are
* equal in favor of the two-sided alternative that they are different.
* For a one-sided test, divide the returned value by 2.
* <p>
- * If the <code>equalVariances</code> parameter is <code>false,</code>
- * the test does not assume that the underlying popuation variances are
+ * The test does not assume that the underlying population variances are
* equal and it uses approximated degrees of freedom computed from the
- * sample data to compute the p-value. In this case, formula (1) for the
- * {@link #t(double[], double[], boolean)} statistic is used
- * and the Welch-Satterthwaite approximation to the degrees of freedom is used,
- * as described
- * <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
- * here.</a>
+ * sample data to compute the p-value. To perform the test assuming
+ * equal variances, use
+ * {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
+ * <p>
+ * <strong>Usage Note:</strong><br>
+ * The validity of the p-value depends on the assumptions of the parametric
+ * t-test procedure, as discussed
+ * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
+ * here</a>
+ * <p>
+ * <strong>Preconditions</strong>: <ul>
+ * <li>The datasets described by the two Univariates must each contain
+ * at least 2 observations.
+ * </li></ul>
+ *
+ * @param sampleStats1 StatisticalSummary describing data from the first sample
+ * @param sampleStats2 StatisticalSummary describing data from the second sample
+ * @return p-value for t-test
+ * @throws IllegalArgumentException if the precondition is not met
+ * @throws MathException if an error occurs computing the p-value
+ */
+ public abstract double tTest(
+ StatisticalSummary sampleStats1,
+ StatisticalSummary sampleStats2)
+ throws IllegalArgumentException, MathException;
+ /**
+ * Returns the <i>observed significance level</i>, or
+ * <i>p-value</i>, associated with a two-sample, two-tailed t-test
+ * comparing the means of the datasets described by two StatisticalSummary
+ * instances, under the hypothesis of equal subpopulation variances. To
+ * perform a test without the equal variances assumption, use
+ * {@link #tTest(StatisticalSummary, StatisticalSummary)}.
* <p>
- * If <code>equalVariances</code> is <code>true</code>, a pooled variance
- * estimate is used to compute the t-statistic (formula (2)) and the sum of the
- * sample sizes minus 2 is used as the degrees of freedom.
+ * The number returned is the smallest significance level
+ * at which one can reject the null hypothesis that the two means are
+ * equal in favor of the two-sided alternative that they are different.
+ * For a one-sided test, divide the returned value by 2.
+ * <p>
+ * See {@link #homoscedasticT(double[], double[])} for the formula used to
+ * compute the t-statistic. The sum of the sample sizes minus 2 is used as
+ * the degrees of freedom.
* <p>
* <strong>Usage Note:</strong><br>
* The validity of the p-value depends on the assumptions of the parametric
@@ -522,49 +702,44 @@
*
* @param sampleStats1 StatisticalSummary describing data from the first sample
* @param sampleStats2 StatisticalSummary describing data from the second sample
- * @param equalVariances are sample variances assumed to be equal?
* @return p-value for t-test
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- double tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
- boolean equalVariances)
- throws IllegalArgumentException, MathException;
-
- /**
- * Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
- * two-sided t-test</a> evaluating the null hypothesis that <code>sampleStats1</code>
- * and <code>sampleStats2</code> describe datasets drawn from populations with the
- * same mean, with significance level <code>alpha</code>.
+ public abstract double homoscedasticTTest(
+ StatisticalSummary sampleStats1,
+ StatisticalSummary sampleStats2)
+ throws IllegalArgumentException, MathException;
+ /**
+ * Performs a
+ * <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
+ * two-sided t-test</a> evaluating the null hypothesis that
+ * <code>sampleStats1</code> and <code>sampleStats2</code> describe
+ * datasets drawn from populations with the same mean, with significance
+ * level <code>alpha</code>. This test does not assume that the
+ * subpopulation variances are equal. To perform the test under the equal
+ * variances assumption, use
+ * {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
* <p>
* Returns <code>true</code> iff the null hypothesis that the means are
* equal can be rejected with confidence <code>1 - alpha</code>. To
* perform a 1-sided test, use <code>alpha * 2</code>.
* <p>
- * If the <code>equalVariances</code> parameter is <code>false,</code>
- * the test does not assume that the underlying popuation variances are
- * equal and it uses approximated degrees of freedom computed from the
- * sample data to compute the p-value. In this case, formula (1) for the
- * {@link #t(double[], double[], boolean)} statistic is used
- * and the Welch-Satterthwaite approximation to the degrees of freedom is used,
- * as described
+ * See {@link #t(double[], double[])} for the formula used to compute the
+ * t-statistic. Degrees of freedom are approximated using the
* <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
- * here.</a>
- * <p>
- * If <code>equalVariances</code> is <code>true</code>, a pooled variance
- * estimate is used to compute the t-statistic (formula (2)) and the sum of the
- * sample sizes minus 2 is used as the degrees of freedom.
+ * Welch-Satterthwaite approximation.</a>
* <p>
* <strong>Examples:</strong><br><ol>
* <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
- * the 95% level under the assumption of equal subpopulation variances, use
- * <br><code>tTest(sampleStats1, sampleStats2, 0.05, true) </code>
+ * the 95% level, use
+ * <br><code>tTest(sampleStats1, sampleStats2, 0.05) </code>
* </li>
* <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>
- * at the 99% level without assuming that subpopulation variances are equal,
- * first verify that the measured mean of <code>sample 1</code> is less than
- * the mean of <code>sample 2</code> and then use
- * <br><code>tTest(sampleStats1, sampleStats2, 0.005, false) </code>
+ * at the 99% level, first verify that the measured mean of
+ * <code>sample 1</code> is less than the mean of <code>sample 2</code>
+ * and then use
+ * <br><code>tTest(sampleStats1, sampleStats2, 0.02) </code>
* </li></ol>
* <p>
* <strong>Usage Note:</strong><br>
@@ -583,13 +758,14 @@
* @param sampleStats1 StatisticalSummary describing sample data values
* @param sampleStats2 StatisticalSummary describing sample data values
* @param alpha significance level of the test
- * @param equalVariances are sample variances assumed to be equal?
* @return true if the null hypothesis can be rejected with
* confidence 1 - alpha
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
- boolean tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
- double alpha, boolean equalVariances)
- throws IllegalArgumentException, MathException;
-}
+ public abstract boolean tTest(
+ StatisticalSummary sampleStats1,
+ StatisticalSummary sampleStats2,
+ double alpha)
+ throws IllegalArgumentException, MathException;
+}
\ No newline at end of file
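
For readers following the API split above, here is a minimal, self-contained sketch of the two statistics that the separated methods correspond to, written directly from the formulas in the Javadoc. The class and method names (TwoSampleTSketch, welchT, pooledT, welchDf) are illustrative assumptions and are not part of commons-math. P-values are left out because they need a t-distribution implementation. The pooled variance is taken as the weighted average of the two sample variances, with the square root applied only in the statistic itself.

```java
/** Illustrative sketch only; hypothetical names, not the commons-math API. */
class TwoSampleTSketch {

    /** Arithmetic mean of the values. */
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        return ss / (x.length - 1);
    }

    /** Same precondition as in the Javadoc: both samples need at least 2 values. */
    static void check(double[] s1, double[] s2) {
        if (s1 == null || s2 == null || Math.min(s1.length, s2.length) < 2) {
            throw new IllegalArgumentException("insufficient data for t statistic");
        }
    }

    /** Heteroscedastic statistic: t = (m1 - m2) / sqrt(var1/n1 + var2/n2). */
    static double welchT(double[] s1, double[] s2) {
        check(s1, s2);
        double se2 = variance(s1) / s1.length + variance(s2) / s2.length;
        return (mean(s1) - mean(s2)) / Math.sqrt(se2);
    }

    /** Homoscedastic statistic: t = (m1 - m2) / (sqrt(1/n1 + 1/n2) sqrt(var)). */
    static double pooledT(double[] s1, double[] s2) {
        check(s1, s2);
        double n1 = s1.length;
        double n2 = s2.length;
        // Pooled variance estimate; degrees of freedom are n1 + n2 - 2.
        double pooled = ((n1 - 1) * variance(s1) + (n2 - 1) * variance(s2))
                / (n1 + n2 - 2);
        return (mean(s1) - mean(s2)) / Math.sqrt(pooled * (1.0 / n1 + 1.0 / n2));
    }

    /** Welch-Satterthwaite approximation to the degrees of freedom. */
    static double welchDf(double[] s1, double[] s2) {
        check(s1, s2);
        double a = variance(s1) / s1.length;
        double b = variance(s2) / s2.length;
        return (a + b) * (a + b)
                / (a * a / (s1.length - 1) + b * b / (s2.length - 1));
    }

    public static void main(String[] args) {
        double[] sample1 = {1, 2, 3, 4, 5};
        double[] sample2 = {2, 4, 6};
        System.out.println("t (unequal variances): " + welchT(sample1, sample2));
        System.out.println("approximate df:        " + welchDf(sample1, sample2));
        System.out.println("t (pooled variance):   " + pooledT(sample1, sample2));
    }
}
```

With equal sample sizes the two statistics coincide and only the degrees of freedom differ; with unequal sizes, as in the main method above, the statistics themselves differ as well.
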
1.9 +395 -152 jakarta-commons/math/src/java/org/apache/commons/math/stat/inference/TTestImpl.java
Index: TTestImpl.java
===================================================================
RCS file: /home/cvs/jakarta-commons/math/src/java/org/apache/commons/math/stat/inference/TTestImpl.java,v
retrieving revision 1.8
retrieving revision 1.9
diff -u -r1.8 -r1.9
--- TTestImpl.java 23 Jun 2004 16:26:14 -0000 1.8
+++ TTestImpl.java 2 Aug 2004 04:20:08 -0000 1.9
@@ -23,6 +23,9 @@
/**
* Implements t-test statistics defined in the {@link TTest} interface.
+ * <p>
+ * Uses commons-math {@link org.apache.commons.math.distribution.TDistribution}
+ * implementation to estimate exact p-values.
*
* @version $Revision$ $Date$
*/
@@ -72,8 +75,7 @@
/**
* Returns the <i>observed significance level</i>, or
- * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
- * p-value</a>, associated with a paired, two-sample, two-tailed t-test
+ * <i> p-value</i>, associated with a paired, two-sample, two-tailed t-test
* based on the data in the input arrays.
* <p>
* The number returned is the smallest significance level
@@ -113,7 +115,7 @@
}
/**
- * Performs a paired t-test</a> evaluating the null hypothesis that the
+ * Performs a paired t-test evaluating the null hypothesis that the
* mean of the paired differences between <code>sample1</code> and
* <code>sample2</code> is 0 in favor of the two-sided alternative that the
* mean paired difference is not equal to 0, with significance level
@@ -172,7 +174,8 @@
if ((observed == null) || (observed.length < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
- return t(StatUtils.mean(observed), mu, StatUtils.variance(observed), observed.length);
+ return t(StatUtils.mean(observed), mu, StatUtils.variance(observed),
+ observed.length);
}
/**
@@ -196,19 +199,21 @@
if ((sampleStats == null) || (sampleStats.getN() < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
- return t(sampleStats.getMean(), mu, sampleStats.getVariance(), sampleStats.getN());
+ return t(sampleStats.getMean(), mu, sampleStats.getVariance(),
+ sampleStats.getN());
}
/**
- * Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
- * 2-sample t statistic. </a>
+ * Computes a 2-sample t statistic, under the hypothesis of equal
+ * subpopulation variances. To compute a t-statistic without the
+ * equal variances hypothesis, use {@link #t(double[], double[])}.
* <p>
- * This statistic can be used to perform a two-sample t-test to compare
- * sample means.
+ * This statistic can be used to perform a (homoscedastic) two-sample
+ * t-test to compare sample means.
* <p>
- * If <code>equalVariances</code> is <code>true</code>, the t-statisitc is
+ * The t-statistic is
* <p>
- * (1) <code> t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
+ * <code> t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
* <p>
* where <strong><code>n1</code></strong> is the size of first sample;
* <strong><code> n2</code></strong> is the size of second sample;
@@ -222,9 +227,44 @@
* with <strong><code>var1</code></strong> the variance of the first sample and
* <strong><code>var2</code></strong> the variance of the second sample.
* <p>
- * If <code>equalVariances</code> is <code>false</code>, the t-statisitc is
+ * <strong>Preconditions</strong>: <ul>
+ * <li>The observed array lengths must both be at least 2.
+ * </li></ul>
+ *
+ * @param sample1 array of sample data values
+ * @param sample2 array of sample data values
+ * @return t statistic
+ * @throws IllegalArgumentException if the precondition is not met
+ */
+ public double homoscedasticT(double[] sample1, double[] sample2)
+ throws IllegalArgumentException {
+ if ((sample1 == null) || (sample2 == null ||
+ Math.min(sample1.length, sample2.length) < 2)) {
+ throw new IllegalArgumentException("insufficient data for t statistic");
+ }
+ return homoscedasticT(StatUtils.mean(sample1), StatUtils.mean(sample2),
+ StatUtils.variance(sample1), StatUtils.variance(sample2),
+ (double) sample1.length, (double) sample2.length);
+ }
+
+ /**
+ * Computes a 2-sample t statistic, without the hypothesis of equal
+ * subpopulation variances. To compute a t-statistic assuming equal
+ * variances, use {@link #homoscedasticT(double[], double[])}.
+ * <p>
+ * This statistic can be used to perform a two-sample t-test to compare
+ * sample means.
+ * <p>
+ * The t-statistic is
* <p>
- * (2) <code> t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
+ * <code> t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
+ * <p>
+ * where <strong><code>n1</code></strong> is the size of the first sample;
+ * <strong><code> n2</code></strong> is the size of the second sample;
+ * <strong><code> m1</code></strong> is the mean of the first sample;
+ * <strong><code> m2</code></strong> is the mean of the second sample;
+ * <strong><code> var1</code></strong> is the variance of the first sample;
+ * <strong><code> var2</code></strong> is the variance of the second sample.
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The observed array lengths must both be at least 2.
@@ -232,38 +272,82 @@
*
* @param sample1 array of sample data values
* @param sample2 array of sample data values
- * @param equalVariances are the sample variances assumed equal?
* @return t statistic
* @throws IllegalArgumentException if the precondition is not met
*/
- public double t(double[] sample1, double[] sample2, boolean equalVariances)
+ public double t(double[] sample1, double[] sample2)
throws IllegalArgumentException {
if ((sample1 == null) || (sample2 == null ||
Math.min(sample1.length, sample2.length) < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
- return t(StatUtils.mean(sample1), StatUtils.mean(sample2), StatUtils.variance(sample1),
- StatUtils.variance(sample2), (double) sample1.length,
- (double) sample2.length, equalVariances);
+ return t(StatUtils.mean(sample1), StatUtils.mean(sample2),
+ StatUtils.variance(sample1), StatUtils.variance(sample2),
+ (double) sample1.length, (double) sample2.length);
}
/**
- * Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
- * 2-sample t statistic </a>, comparing the means of the datasets described
- * by two {@link StatisticalSummary} instances.
+ * Computes a 2-sample t statistic, comparing the means of the datasets
+ * described by two {@link StatisticalSummary} instances, without the
+ * assumption of equal subpopulation variances. Use
+ * {@link #homoscedasticT(StatisticalSummary, StatisticalSummary)} to
+ * compute a t-statistic under the equal variances assumption.
* <p>
* This statistic can be used to perform a two-sample t-test to compare
* sample means.
* <p>
- * If <code>equalVariances</code> is <code>true</code>, the t-statisitc is
+ * The returned t-statistic is
+ * <p>
+ * <code> t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
+ * <p>
+ * where <strong><code>n1</code></strong> is the size of the first sample;
+ * <strong><code> n2</code></strong> is the size of the second sample;
+ * <strong><code> m1</code></strong> is the mean of the first sample;
+ * <strong><code> m2</code></strong> is the mean of the second sample;
+ * <strong><code> var1</code></strong> is the variance of the first sample;
+ * <strong><code> var2</code></strong> is the variance of the second sample.
* <p>
- * (1) <code> t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
+ * <strong>Preconditions</strong>: <ul>
+ * <li>The datasets described by the two StatisticalSummary instances must each contain
+ * at least 2 observations.
+ * </li></ul>
+ *
+ * @param sampleStats1 StatisticalSummary describing data from the first sample
+ * @param sampleStats2 StatisticalSummary describing data from the second sample
+ * @return t statistic
+ * @throws IllegalArgumentException if the precondition is not met
+ */
+ public double t(StatisticalSummary sampleStats1,
+ StatisticalSummary sampleStats2)
+ throws IllegalArgumentException {
+ if ((sampleStats1 == null) ||
+ (sampleStats2 == null ||
+ Math.min(sampleStats1.getN(), sampleStats2.getN()) < 2)) {
+ throw new IllegalArgumentException("insufficient data for t statistic");
+ }
+ return t(sampleStats1.getMean(), sampleStats2.getMean(),
+ sampleStats1.getVariance(), sampleStats2.getVariance(),
+ (double) sampleStats1.getN(), (double) sampleStats2.getN());
+ }
+
+ /**
+ * Computes a 2-sample t statistic, comparing the means of the datasets
+ * described by two {@link StatisticalSummary} instances, under the
+ * assumption of equal subpopulation variances. To compute a t-statistic
+ * without the equal variances assumption, use
+ * {@link #t(StatisticalSummary, StatisticalSummary)}.
+ * <p>
+ * This statistic can be used to perform a (homoscedastic) two-sample
+ * t-test to compare sample means.
+ * <p>
+ * The t-statistic returned is
+ * <p>
+ * <code> t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
* <p>
* where <strong><code>n1</code></strong> is the size of first sample;
* <strong><code> n2</code></strong> is the size of second sample;
* <strong><code> m1</code></strong> is the mean of first sample;
- * <strong><code> m2</code></strong> is the mean of second sample</li>
- * </ul>
+ * <strong><code> m2</code></strong> is the mean of second sample
* and <strong><code>var</code></strong> is the pooled variance estimate:
* <p>
* <code>var = ((n1 - 1)var1 + (n2 - 1)var2) / ((n1-1) + (n2-1))</code>
@@ -271,10 +355,6 @@
* with <strong><code>var1</code></strong> the variance of the first sample and
* <strong><code>var2</code></strong> the variance of the second sample.
* <p>
- * If <code>equalVariances</code> is <code>false</code>, the t-statisitc is
- * <p>
- * (2) <code> t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
- * <p>
* <strong>Preconditions</strong>: <ul>
* <li>The datasets described by the two StatisticalSummary instances must each contain
* at least 2 observations.
@@ -282,27 +362,25 @@
*
* @param sampleStats1 StatisticalSummary describing data from the first sample
* @param sampleStats2 StatisticalSummary describing data from the second sample
- * @param equalVariances are the sample variances assumed equal?
* @return t statistic
* @throws IllegalArgumentException if the precondition is not met
*/
- public double t(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
- boolean equalVariances)
+ public double homoscedasticT(StatisticalSummary sampleStats1,
+ StatisticalSummary sampleStats2)
throws IllegalArgumentException {
if ((sampleStats1 == null) ||
(sampleStats2 == null ||
Math.min(sampleStats1.getN(), sampleStats2.getN()) < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
- return t(sampleStats1.getMean(), sampleStats2.getMean(), sampleStats1.getVariance(),
- sampleStats2.getVariance(), (double) sampleStats1.getN(),
- (double) sampleStats2.getN(), equalVariances);
+ return homoscedasticT(sampleStats1.getMean(), sampleStats2.getMean(),
+ sampleStats1.getVariance(), sampleStats2.getVariance(),
+ (double) sampleStats1.getN(), (double) sampleStats2.getN());
}
/**
* Returns the <i>observed significance level</i>, or
- * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
- * p-value</a>, associated with a one-sample, two-tailed t-test
+ * <i>p-value</i>, associated with a one-sample, two-tailed t-test
* comparing the mean of the input array with the constant <code>mu</code>.
* <p>
* The number returned is the smallest significance level
@@ -331,13 +409,14 @@
if ((sample == null) || (sample.length < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
- return tTest( StatUtils.mean(sample), mu, StatUtils.variance(sample), sample.length);
+ return tTest( StatUtils.mean(sample), mu, StatUtils.variance(sample),
+ sample.length);
}
/**
* Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* two-sided t-test</a> evaluating the null hypothesis that the mean of the population from
- * which <code>sample</code> is drawn equals <code>mu</code>.
+ * which <code>sample</code> is drawn equals <code>mu</code>.
* <p>
* Returns <code>true</code> iff the null hypothesis can be
* rejected with confidence <code>1 - alpha</code>. To
@@ -379,8 +458,7 @@
/**
* Returns the <i>observed significance level</i>, or
- * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
- * p-value</a>, associated with a one-sample, two-tailed t-test
+ * <i>p-value</i>, associated with a one-sample, two-tailed t-test
* comparing the mean of the dataset described by <code>sampleStats</code>
* with the constant <code>mu</code>.
* <p>
@@ -393,7 +471,8 @@
* <strong>Usage Note:</strong><br>
* The validity of the test depends on the assumptions of the parametric
* t-test procedure, as discussed
- * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">here</a>
+ * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
+ * here</a>
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The sample must contain at least 2 observations.
@@ -410,17 +489,19 @@
if ((sampleStats == null) || (sampleStats.getN() < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
- return tTest(sampleStats.getMean(), mu, sampleStats.getVariance(), sampleStats.getN());
+ return tTest(sampleStats.getMean(), mu, sampleStats.getVariance(),
+ sampleStats.getN());
}
/**
* Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
- * two-sided t-test</a> evaluating the null hypothesis that the mean of the population from
- * which the dataset described by <code>stats</code> is drawn equals <code>mu</code>.
- * <p>
- * Returns <code>true</code> iff the null hypothesis can be
- * rejected with confidence <code>1 - alpha</code>. To
- * perform a 1-sided test, use <code>alpha / 2</code>
+ * two-sided t-test</a> evaluating the null hypothesis that the mean of the
+ * population from which the dataset described by <code>stats</code> is
+ * drawn equals <code>mu</code>.
+ * <p>
+ * Returns <code>true</code> iff the null hypothesis can be rejected with
+ * confidence <code>1 - alpha</code>. To perform a 1-sided test, use
+ * <code>alpha * 2.</code>
* <p>
* <strong>Examples:</strong><br><ol>
* <li>To test the (2-sided) hypothesis <code>sample mean = mu </code> at
@@ -448,7 +529,8 @@
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- public boolean tTest( double mu, StatisticalSummary sampleStats, double alpha)
+ public boolean tTest( double mu, StatisticalSummary sampleStats,
+ double alpha)
throws IllegalArgumentException, MathException {
if ((alpha <= 0) || (alpha > 0.5)) {
throw new IllegalArgumentException("bad significance level: " + alpha);
@@ -458,8 +540,7 @@
/**
* Returns the <i>observed significance level</i>, or
- * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
- * p-value</a>, associated with a two-sample, two-tailed t-test
+ * <i>p-value</i>, associated with a two-sample, two-tailed t-test
* comparing the means of the input arrays.
* <p>
* The number returned is the smallest significance level
@@ -467,19 +548,59 @@
* equal in favor of the two-sided alternative that they are different.
* For a one-sided test, divide the returned value by 2.
* <p>
- * If the <code>equalVariances</code> parameter is <code>false,</code>
- * the test does not assume that the underlying popuation variances are
+ * The test does not assume that the underlying population variances are
* equal and it uses approximated degrees of freedom computed from the
- * sample data to compute the p-value. In this case, formula (1) for the
- * {@link #t(double[], double[], boolean)} statistic is used
- * and the Welch-Satterthwaite approximation to the degrees of freedom is used,
+ * sample data to compute the p-value. The t-statistic used is as defined in
+ * {@link #t(double[], double[])} and the Welch-Satterthwaite approximation
+ * to the degrees of freedom is used,
* as described
* <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
- * here.</a>
+ * here.</a> To perform the test under the assumption of equal subpopulation
+ * variances, use {@link #homoscedasticTTest(double[], double[])}.
+ * <p>
+ * <strong>Usage Note:</strong><br>
+ * The validity of the p-value depends on the assumptions of the parametric
+ * t-test procedure, as discussed
+ * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
+ * here</a>
+ * <p>
+ * <strong>Preconditions</strong>: <ul>
+ * <li>The observed array lengths must both be at least 2.
+ * </li></ul>
+ *
+ * @param sample1 array of sample data values
+ * @param sample2 array of sample data values
+ * @return p-value for t-test
+ * @throws IllegalArgumentException if the precondition is not met
+ * @throws MathException if an error occurs computing the p-value
+ */
+ public double tTest(double[] sample1, double[] sample2)
+ throws IllegalArgumentException, MathException {
+ if ((sample1 == null) || (sample2 == null ||
+ Math.min(sample1.length, sample2.length) < 2)) {
+ throw new IllegalArgumentException("insufficient data");
+ }
+ return tTest(StatUtils.mean(sample1), StatUtils.mean(sample2),
+ StatUtils.variance(sample1), StatUtils.variance(sample2),
+ (double) sample1.length, (double) sample2.length);
+ }
+
+ /**
+ * Returns the <i>observed significance level</i>, or
+ * <i>p-value</i>, associated with a two-sample, two-tailed t-test
+ * comparing the means of the input arrays, under the assumption that
+ * the two samples are drawn from subpopulations with equal variances.
+ * To perform the test without the equal variances assumption, use
+ * {@link #tTest(double[], double[])}.
+ * <p>
+ * The number returned is the smallest significance level
+ * at which one can reject the null hypothesis that the two means are
+ * equal in favor of the two-sided alternative that they are different.
+ * For a one-sided test, divide the returned value by 2.
* <p>
- * If <code>equalVariances</code> is <code>true</code>, a pooled variance
- * estimate is used to compute the t-statistic (formula (2)) and the sum of the
- * sample sizes minus 2 is used as the degrees of freedom.
+ * A pooled variance estimate is used to compute the t-statistic. See
+ * {@link #homoscedasticT(double[], double[])}. The sum of the sample sizes
+ * minus 2 is used as the degrees of freedom.
* <p>
* <strong>Usage Note:</strong><br>
* The validity of the p-value depends on the assumptions of the parametric
@@ -493,55 +614,112 @@
*
* @param sample1 array of sample data values
* @param sample2 array of sample data values
- * @param equalVariances are sample variances assumed to be equal?
* @return p-value for t-test
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- public double tTest(double[] sample1, double[] sample2, boolean equalVariances)
+ public double homoscedasticTTest(double[] sample1, double[] sample2)
throws IllegalArgumentException, MathException {
if ((sample1 == null) || (sample2 == null ||
Math.min(sample1.length, sample2.length) < 2)) {
throw new IllegalArgumentException("insufficient data");
}
- return tTest(StatUtils.mean(sample1), StatUtils.mean(sample2), StatUtils.variance(sample1),
+ return homoscedasticTTest(StatUtils.mean(sample1),
+ StatUtils.mean(sample2), StatUtils.variance(sample1),
StatUtils.variance(sample2), (double) sample1.length,
- (double) sample2.length, equalVariances);
+ (double) sample2.length);
}
+
/**
- * Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
+ * Performs a
+ * <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* two-sided t-test</a> evaluating the null hypothesis that <code>sample1</code>
* and <code>sample2</code> are drawn from populations with the same mean,
- * with significance level <code>alpha</code>.
+ * with significance level <code>alpha</code>. This test does not assume
+ * that the subpopulation variances are equal. To perform the test assuming
+ * equal variances, use
+ * {@link #homoscedasticTTest(double[], double[], double)}.
* <p>
* Returns <code>true</code> iff the null hypothesis that the means are
* equal can be rejected with confidence <code>1 - alpha</code>. To
* perform a 1-sided test, use <code>alpha * 2</code>.
* <p>
- * If the <code>equalVariances</code> parameter is <code>false,</code>
- * the test does not assume that the underlying popuation variances are
- * equal and it uses approximated degrees of freedom computed from the
- * sample data to compute the p-value. In this case, formula (1) for the
- * {@link #t(double[], double[], boolean)} statistic is used
- * and the Welch-Satterthwaite approximation to the degrees of freedom is used,
- * as described
+ * See {@link #t(double[], double[])} for the formula used to compute the
+ * t-statistic. Degrees of freedom are approximated using the
* <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
- * here.</a>
+ * Welch-Satterthwaite approximation.</a>
+
+ * <p>
+ * <strong>Examples:</strong><br><ol>
+ * <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
+ * the 95% level, use
+ * <br><code>tTest(sample1, sample2, 0.05). </code>
+ * </li>
+ * <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>
+ * at the 99% level, first verify that the measured mean of
+ * <code>sample 1</code> is less than the mean of <code>sample 2</code>
+ * and then use
+ * <br><code>tTest(sample1, sample2, 0.02) </code>
+ * </li></ol>
+ * <p>
+ * <strong>Usage Note:</strong><br>
+ * The validity of the test depends on the assumptions of the parametric
+ * t-test procedure, as discussed
+ * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
+ * here</a>
+ * <p>
+ * <strong>Preconditions</strong>: <ul>
+ * <li>The observed array lengths must both be at least 2.
+ * </li>
+ * <li> <code> 0 < alpha < 0.5 </code>
+ * </li></ul>
+ *
+ * @param sample1 array of sample data values
+ * @param sample2 array of sample data values
+ * @param alpha significance level of the test
+ * @return true if the null hypothesis can be rejected with
+ * confidence 1 - alpha
+ * @throws IllegalArgumentException if the preconditions are not met
+ * @throws MathException if an error occurs performing the test
+ */
+ public boolean tTest(double[] sample1, double[] sample2,
+ double alpha)
+ throws IllegalArgumentException, MathException {
+ if ((alpha <= 0) || (alpha > 0.5)) {
+ throw new IllegalArgumentException("bad significance level: " + alpha);
+ }
+ return (tTest(sample1, sample2) < alpha);
+ }
+
+ /**
+ * Performs a
+ * <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
+ * two-sided t-test</a> evaluating the null hypothesis that <code>sample1</code>
+ * and <code>sample2</code> are drawn from populations with the same mean,
+ * with significance level <code>alpha</code>, assuming that the
+ * subpopulation variances are equal. Use
+ * {@link #tTest(double[], double[], double)} to perform the test without
+ * the assumption of equal variances.
* <p>
- * If <code>equalVariances</code> is <code>true</code>, a pooled variance
- * estimate is used to compute the t-statistic (formula (2)) and the sum of the
- * sample sizes minus 2 is used as the degrees of freedom.
+ * Returns <code>true</code> iff the null hypothesis that the means are
+ * equal can be rejected with confidence <code>1 - alpha</code>. To
+ * perform a 1-sided test, use <code>alpha * 2.</code> To perform the test
+ * without the assumption of equal subpopulation variances, use
+ * {@link #tTest(double[], double[], double)}.
+ * <p>
+ * A pooled variance estimate is used to compute the t-statistic. See
+ * {@link #homoscedasticT(double[], double[])} for the formula. The sum of the sample
+ * sizes minus 2 is used as the degrees of freedom.
* <p>
* <strong>Examples:</strong><br><ol>
* <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
- * the 95% level, under the assumption of equal subpopulation variances,
- * use <br><code>tTest(sample1, sample2, 0.05, true) </code>
+ * the 95% level, use <br><code>homoscedasticTTest(sample1, sample2, 0.05). </code>
* </li>
- * <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>
- * at the 99% level without assuming equal variances, first verify that the measured
- * mean of <code>sample 1</code> is less than the mean of <code>sample 2</code>
- * and then use <br><code>tTest(sample1, sample2, 0.005, false) </code>
+ * <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2, </code>
+ * at the 99% level, first verify that the measured mean of
+ * <code>sample 1</code> is less than the mean of <code>sample 2</code>
+ * and then use
+ * <br><code>homoscedasticTTest(sample1, sample2, 0.02) </code>
* </li></ol>
* <p>
* <strong>Usage Note:</strong><br>
@@ -559,45 +737,81 @@
* @param sample1 array of sample data values
* @param sample2 array of sample data values
* @param alpha significance level of the test
- * @param equalVariances are sample variances assumed to be equal?
* @return true if the null hypothesis can be rejected with
* confidence 1 - alpha
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
- public boolean tTest(double[] sample1, double[] sample2, double alpha,
- boolean equalVariances)
+ public boolean homoscedasticTTest(double[] sample1, double[] sample2,
+ double alpha)
throws IllegalArgumentException, MathException {
if ((alpha <= 0) || (alpha > 0.5)) {
throw new IllegalArgumentException("bad significance level: " + alpha);
}
- return (tTest(sample1, sample2, equalVariances) < alpha);
+ return (homoscedasticTTest(sample1, sample2) < alpha);
}
/**
* Returns the <i>observed significance level</i>, or
- * <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
- * p-value</a>, associated with a two-sample, two-tailed t-test
- * comparing the means of the datasets described by two Univariates.
+ * <i>p-value</i>, associated with a two-sample, two-tailed t-test
+ * comparing the means of the datasets described by two StatisticalSummary
+ * instances.
* <p>
* The number returned is the smallest significance level
* at which one can reject the null hypothesis that the two means are
* equal in favor of the two-sided alternative that they are different.
* For a one-sided test, divide the returned value by 2.
* <p>
- * If the <code>equalVariances</code> parameter is <code>false,</code>
- * the test does not assume that the underlying popuation variances are
+ * The test does not assume that the underlying population variances are
* equal and it uses approximated degrees of freedom computed from the
- * sample data to compute the p-value. In this case, formula (1) for the
- * {@link #t(double[], double[], boolean)} statistic is used
- * and the Welch-Satterthwaite approximation to the degrees of freedom is used,
- * as described
- * <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
- * here.</a>
+ * sample data to compute the p-value. To perform the test assuming
+ * equal variances, use
+ * {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
+ * <p>
+ * <strong>Usage Note:</strong><br>
+ * The validity of the p-value depends on the assumptions of the parametric
+ * t-test procedure, as discussed
+ * <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
+ * here</a>
+ * <p>
+ * <strong>Preconditions</strong>: <ul>
+ * <li>The datasets described by the two StatisticalSummary instances must each contain
+ * at least 2 observations.
+ * </li></ul>
+ *
+ * @param sampleStats1 StatisticalSummary describing data from the first sample
+ * @param sampleStats2 StatisticalSummary describing data from the second sample
+ * @return p-value for t-test
+ * @throws IllegalArgumentException if the precondition is not met
+ * @throws MathException if an error occurs computing the p-value
+ */
+ public double tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2)
+ throws IllegalArgumentException, MathException {
+ if ((sampleStats1 == null) || (sampleStats2 == null ||
+ Math.min(sampleStats1.getN(), sampleStats2.getN()) < 2)) {
+ throw new IllegalArgumentException("insufficient data for t statistic");
+ }
+ return tTest(sampleStats1.getMean(), sampleStats2.getMean(), sampleStats1.getVariance(),
+ sampleStats2.getVariance(), (double) sampleStats1.getN(),
+ (double) sampleStats2.getN());
+ }
+
+ /**
+ * Returns the <i>observed significance level</i>, or
+ * <i>p-value</i>, associated with a two-sample, two-tailed t-test
+ * comparing the means of the datasets described by two StatisticalSummary
+ * instances, under the hypothesis of equal subpopulation variances. To
+ * perform a test without the equal variances assumption, use
+ * {@link #tTest(StatisticalSummary, StatisticalSummary)}.
+ * <p>
+ * The number returned is the smallest significance level
+ * at which one can reject the null hypothesis that the two means are
+ * equal in favor of the two-sided alternative that they are different.
+ * For a one-sided test, divide the returned value by 2.
* <p>
- * If <code>equalVariances</code> is <code>true</code>, a pooled variance
- * estimate is used to compute the t-statistic (formula (2)) and the sum of the
- * sample sizes minus 2 is used as the degrees of freedom.
+ * See {@link #homoscedasticT(double[], double[])} for the formula used to
+ * compute the t-statistic. The sum of the sample sizes minus 2 is used as
+ * the degrees of freedom.
* <p>
* <strong>Usage Note:</strong><br>
* The validity of the p-value depends on the assumptions of the parametric
@@ -611,57 +825,53 @@
*
* @param sampleStats1 StatisticalSummary describing data from the first sample
* @param sampleStats2 StatisticalSummary describing data from the second sample
- * @param equalVariances are sample variances assumed to be equal?
* @return p-value for t-test
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
- public double tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
- boolean equalVariances)
+ public double homoscedasticTTest(StatisticalSummary sampleStats1,
+ StatisticalSummary sampleStats2)
throws IllegalArgumentException, MathException {
if ((sampleStats1 == null) || (sampleStats2 == null ||
Math.min(sampleStats1.getN(), sampleStats2.getN()) < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
- return tTest(sampleStats1.getMean(), sampleStats2.getMean(), sampleStats1.getVariance(),
+ return homoscedasticTTest(sampleStats1.getMean(),
+ sampleStats2.getMean(), sampleStats1.getVariance(),
sampleStats2.getVariance(), (double) sampleStats1.getN(),
- (double) sampleStats2.getN(), equalVariances);
+ (double) sampleStats2.getN());
}
/**
- * Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
- * two-sided t-test</a> evaluating the null hypothesis that <code>sampleStats1</code>
- * and <code>sampleStats2</code> describe datasets drawn from populations with the
- * same mean, with significance level <code>alpha</code>.
+ * Performs a
+ * <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
+ * two-sided t-test</a> evaluating the null hypothesis that
+ * <code>sampleStats1</code> and <code>sampleStats2</code> describe
+ * datasets drawn from populations with the same mean, with significance
+ * level <code>alpha</code>. This test does not assume that the
+ * subpopulation variances are equal. To perform the test under the equal
+ * variances assumption, use
+ * {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
* <p>
* Returns <code>true</code> iff the null hypothesis that the means are
* equal can be rejected with confidence <code>1 - alpha</code>. To
* perform a 1-sided test, use <code>alpha * 2</code>
* <p>
- * If the <code>equalVariances</code> parameter is <code>false,</code>
- * the test does not assume that the underlying popuation variances are
- * equal and it uses approximated degrees of freedom computed from the
- * sample data to compute the p-value. In this case, formula (1) for the
- * {@link #t(double[], double[], boolean)} statistic is used
- * and the Welch-Satterthwaite approximation to the degrees of freedom is used,
- * as described
+ * See {@link #t(double[], double[])} for the formula used to compute the
+ * t-statistic. Degrees of freedom are approximated using the
* <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
- * here.</a>
- * <p>
- * If <code>equalVariances</code> is <code>true</code>, a pooled variance
- * estimate is used to compute the t-statistic (formula (2)) and the sum of the
- * sample sizes minus 2 is used as the degrees of freedom.
+ * Welch-Satterthwaite approximation.</a>
* <p>
* <strong>Examples:</strong><br><ol>
* <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
- * the 95% level under the assumption of equal subpopulation variances, use
- * <br><code>tTest(sampleStats1, sampleStats2, 0.05, true) </code>
+ * the 95% level, use
+ * <br><code>tTest(sampleStats1, sampleStats2, 0.05) </code>
* </li>
* <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>
- * at the 99% level without assuming that subpopulation variances are equal,
- * first verify that the measured mean of <code>sample 1</code> is less than
- * the mean of <code>sample 2</code> and then use
- * <br><code>tTest(sampleStats1, sampleStats2, 0.005, false) </code>
+ * at the 99% level, first verify that the measured mean of
+ * <code>sample 1</code> is less than the mean of <code>sample 2</code>
+ * and then use
+ * <br><code>tTest(sampleStats1, sampleStats2, 0.02) </code>
* </li></ol>
* <p>
* <strong>Usage Note:</strong><br>
@@ -680,19 +890,18 @@
* @param sampleStats1 StatisticalSummary describing sample data values
* @param sampleStats2 StatisticalSummary describing sample data values
* @param alpha significance level of the test
- * @param equalVariances are sample variances assumed to be equal?
* @return true if the null hypothesis can be rejected with
* confidence 1 - alpha
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
- public boolean tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
- double alpha, boolean equalVariances)
+ public boolean tTest(StatisticalSummary sampleStats1,
+ StatisticalSummary sampleStats2, double alpha)
throws IllegalArgumentException, MathException {
if ((alpha <= 0) || (alpha > 0.5)) {
throw new IllegalArgumentException("bad significance level: " + alpha);
}
- return (tTest(sampleStats1, sampleStats2, equalVariances) < alpha);
+ return (tTest(sampleStats1, sampleStats2) < alpha);
}
//----------------------------------------------- Protected methods
@@ -738,8 +947,8 @@
/**
* Computes t test statistic for 2-sample t-test.
- * If equalVariance is true, the pooled variance
- * estimate is computed and used.
+ * <p>
+ * Does not assume that subpopulation variances are equal.
*
* @param m1 first sample mean
* @param m2 second sample mean
@@ -747,17 +956,29 @@
* @param v2 second sample variance
* @param n1 first sample n
* @param n2 second sample n
- * @param equalVariances are variances assumed equal?
* @return t test statistic
*/
protected double t(double m1, double m2, double v1, double v2, double n1,
- double n2, boolean equalVariances) {
- if (equalVariances) {
- double pooledVariance = ((n1 - 1) * v1 + (n2 -1) * v2 ) / (n1 + n2 - 2);
- return (m1 - m2) / Math.sqrt(pooledVariance * (1d / n1 + 1d / n2));
- } else {
+ double n2) {
return (m1 - m2) / Math.sqrt((v1 / n1) + (v2 / n2));
- }
+ }
+
+ /**
+ * Computes t test statistic for 2-sample t-test under the hypothesis
+ * of equal subpopulation variances.
+ *
+ * @param m1 first sample mean
+ * @param m2 second sample mean
+ * @param v1 first sample variance
+ * @param v2 second sample variance
+ * @param n1 first sample n
+ * @param n2 second sample n
+ * @return t test statistic
+ */
+ protected double homoscedasticT(double m1, double m2, double v1,
+ double v2, double n1, double n2) {
+ double pooledVariance = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2);
+ return (m1 - m2) / Math.sqrt(pooledVariance * (1d / n1 + 1d / n2));
}
/**
@@ -780,8 +1001,9 @@
/**
* Computes p-value for 2-sided, 2-sample t-test.
- * If equalVariances is true, the sum of the sample sizes minus 2
- * is used as df; otherwise df is approximated from the data.
+ * <p>
+ * Does not assume subpopulation variances are equal. Degrees of freedom
+ * are estimated from the data.
*
* @param m1 first sample mean
* @param m2 second sample mean
@@ -789,20 +1011,41 @@
* @param v2 second sample variance
* @param n1 first sample n
* @param n2 second sample n
- * @param equalVariances are variances assumed equal?
* @return p-value
* @throws MathException if an error occurs computing the p-value
*/
protected double tTest(double m1, double m2, double v1, double v2,
- double n1, double n2, boolean equalVariances)
+ double n1, double n2)
+ throws MathException {
+ double t = Math.abs(t(m1, m2, v1, v2, n1, n2));
+ double degreesOfFreedom = 0;
+ degreesOfFreedom = df(v1, v2, n1, n2);
+ TDistribution tDistribution =
+ getDistributionFactory().createTDistribution(degreesOfFreedom);
+ return 1.0 - tDistribution.cumulativeProbability(-t, t);
+ }
+
+ /**
+ * Computes p-value for 2-sided, 2-sample t-test, under the assumption
+ * of equal subpopulation variances.
+ * <p>
+ * The sum of the sample sizes minus 2 is used as degrees of freedom.
+ *
+ * @param m1 first sample mean
+ * @param m2 second sample mean
+ * @param v1 first sample variance
+ * @param v2 second sample variance
+ * @param n1 first sample n
+ * @param n2 second sample n
+ * @return p-value
+ * @throws MathException if an error occurs computing the p-value
+ */
+ protected double homoscedasticTTest(double m1, double m2, double v1,
+ double v2, double n1, double n2)
throws MathException {
- double t = Math.abs(t(m1, m2, v1, v2, n1, n2, equalVariances));
+ double t = Math.abs(homoscedasticT(m1, m2, v1, v2, n1, n2));
double degreesOfFreedom = 0;
- if (equalVariances) {
degreesOfFreedom = (double) (n1 + n2 - 2);
- } else {
- degreesOfFreedom= df(v1, v2, n1, n2);
- }
TDistribution tDistribution =
getDistributionFactory().createTDistribution(degreesOfFreedom);
return 1.0 - tDistribution.cumulativeProbability(-t, t);
1.6 +24 -24 jakarta-commons/math/src/test/org/apache/commons/math/stat/inference/TTestTest.java
Index: TTestTest.java
===================================================================
RCS file: /home/cvs/jakarta-commons/math/src/test/org/apache/commons/math/stat/inference/TTestTest.java,v
retrieving revision 1.5
retrieving revision 1.6
diff -u -r1.5 -r1.6
--- TTestTest.java 2 Jun 2004 13:08:55 -0000 1.5
+++ TTestTest.java 2 Aug 2004 04:20:09 -0000 1.6
@@ -166,73 +166,73 @@
// Target comparison values computed using R version 1.8.1 (Linux version)
assertEquals("two sample heteroscedastic t stat", 1.603717,
- testStatistic.t(sample1, sample2, false), 1E-6);
+ testStatistic.t(sample1, sample2), 1E-6);
assertEquals("two sample heteroscedastic t stat", 1.603717,
- testStatistic.t(sampleStats1, sampleStats2, false), 1E-6);
+ testStatistic.t(sampleStats1, sampleStats2), 1E-6);
assertEquals("two sample heteroscedastic p value", 0.1288394,
- testStatistic.tTest(sample1, sample2, false), 1E-7);
+ testStatistic.tTest(sample1, sample2), 1E-7);
assertEquals("two sample heteroscedastic p value", 0.1288394,
- testStatistic.tTest(sampleStats1, sampleStats2, false), 1E-7);
+ testStatistic.tTest(sampleStats1, sampleStats2), 1E-7);
assertTrue("two sample heteroscedastic t-test reject",
- testStatistic.tTest(sample1, sample2, 0.2, false));
+ testStatistic.tTest(sample1, sample2, 0.2));
assertTrue("two sample heteroscedastic t-test reject",
- testStatistic.tTest(sampleStats1, sampleStats2, 0.2, false));
+ testStatistic.tTest(sampleStats1, sampleStats2, 0.2));
assertTrue("two sample heteroscedastic t-test accept",
- !testStatistic.tTest(sample1, sample2, 0.1, false));
+ !testStatistic.tTest(sample1, sample2, 0.1));
assertTrue("two sample heteroscedastic t-test accept",
- !testStatistic.tTest(sampleStats1, sampleStats2, 0.1, false));
+ !testStatistic.tTest(sampleStats1, sampleStats2, 0.1));
try {
- testStatistic.tTest(sample1, sample2, .95, false);
+ testStatistic.tTest(sample1, sample2, .95);
fail("alpha out of range, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
- // exptected
+ // expected
}
try {
- testStatistic.tTest(sampleStats1, sampleStats2, .95, false);
+ testStatistic.tTest(sampleStats1, sampleStats2, .95);
fail("alpha out of range, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
- testStatistic.tTest(sample1, tooShortObs, .01, false);
+ testStatistic.tTest(sample1, tooShortObs, .01);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
- testStatistic.tTest(sampleStats1, tooShortStats, .01, false);
+ testStatistic.tTest(sampleStats1, tooShortStats, .01);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
- testStatistic.tTest(sample1, tooShortObs, false);
+ testStatistic.tTest(sample1, tooShortObs);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
- testStatistic.tTest(sampleStats1, tooShortStats, false);
+ testStatistic.tTest(sampleStats1, tooShortStats);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
- testStatistic.t(sample1, tooShortObs, false);
+ testStatistic.t(sample1, tooShortObs);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
- testStatistic.t(sampleStats1, tooShortStats, false);
+ testStatistic.t(sampleStats1, tooShortStats);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
@@ -252,13 +252,13 @@
// Target comparison values computed using R version 1.8.1 (Linux version)
assertEquals("two sample homoscedastic t stat", -1.120897,
- testStatistic.t(sample1, sample2, true), 10E-6);
+ testStatistic.homoscedasticT(sample1, sample2), 10E-6);
assertEquals("two sample homoscedastic p value", 0.2948490,
- testStatistic.tTest(sampleStats1, sampleStats2, true), 1E-6);
+ testStatistic.homoscedasticTTest(sampleStats1, sampleStats2), 1E-6);
assertTrue("two sample homoscedastic t-test reject",
- testStatistic.tTest(sample1, sample2, 0.3, true));
+ testStatistic.homoscedasticTTest(sample1, sample2, 0.3));
assertTrue("two sample homoscedastic t-test accept",
- !testStatistic.tTest(sample1, sample2, 0.2, true));
+ !testStatistic.homoscedasticTTest(sample1, sample2, 0.2));
}
public void testSmallSamples() throws Exception {
@@ -266,8 +266,8 @@
double[] sample2 = {4d, 5d};
// Target values computed using R, version 1.8.1 (linux version)
- assertEquals(-2.2361, testStatistic.t(sample1, sample2, false), 1E-4);
- assertEquals(0.1987, testStatistic.tTest(sample1, sample2, false), 1E-4);
+ assertEquals(-2.2361, testStatistic.t(sample1, sample2), 1E-4);
+ assertEquals(0.1987, testStatistic.tTest(sample1, sample2), 1E-4);
}
public void testPaired() throws Exception {
1.20 +11 -9 jakarta-commons/math/xdocs/userguide/stat.xml
Index: stat.xml
===================================================================
RCS file: /home/cvs/jakarta-commons/math/xdocs/userguide/stat.xml,v
retrieving revision 1.19
retrieving revision 1.20
diff -u -r1.19 -r1.20
--- stat.xml 23 Jun 2004 16:26:16 -0000 1.19
+++ stat.xml 2 Aug 2004 04:20:09 -0000 1.20
@@ -411,7 +411,10 @@
Welch-Satterthwaite approximation</a> is used to compute the degrees
of freedom. Methods to return t-statistics and p-values are provided in each
case, as well as boolean-valued methods to perform fixed significance
- level tests. See the examples below and the API documentation for
+ level tests. The names of test or test-statistic methods that assume equal
+ subpopulation variances always start with "homoscedastic." Test or
+ test-statistic methods that just start with "t" do not assume equal
+ variances. See the examples below and the API documentation for
more details.</li>
<li>The validity of the p-values returned by the t-test depends on the
assumptions of the parametric t-test procedure, as discussed
@@ -536,26 +539,25 @@
To compute the t-statistic:
<source>
TTestImpl testStatistic = new TTestImpl();
-testStatistic.t(summary1, summary2, false);
+testStatistic.t(summary1, summary2);
</source>
</p>
<p>
To compute the (two-sided) p-value:
<source>
-testStatistic.tTest(sample1, sample2, false);
+testStatistic.tTest(sample1, sample2);
</source>
</p>
<p>
To perform a fixed significance level test with alpha = .05:
<source>
-testStatistic.tTest(sample1, sample2, .05, false);
+testStatistic.tTest(sample1, sample2, .05);
</source>
</p>
<p>
- In each case above, the last (boolean) parameter determines
- whether or not the test should assume that subpopulation variances
- are equal. Replacing this with <code>true</code> will result in
- homoscedastic (equal variances) tests / test statistics.
+ In each case above, the test does not assume that the subpopulation
+ variances are equal. To perform the tests under this assumption,
+ replace "t" at the beginning of the method name with "homoscedasticT".
</p>
</dd>
<dt>Computing <code>chi-square</code> test statistics</dt>
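
For readers comparing the two code paths this commit separates, here is a
minimal standalone sketch of the statistics involved. It is not the Commons
Math API: the class name TStatSketch and the method names welchT, pooledT,
welchDf and pooledDf are hypothetical. The two t-statistic formulas mirror
t(...) and homoscedasticT(...) as they appear in the diff above. The body of
df(...) is not shown in this diff, so welchDf uses the standard textbook form
of the Welch-Satterthwaite approximation as an assumption.

```java
/**
 * Illustrative sketch only; names are hypothetical and not part of
 * org.apache.commons.math.stat.inference.
 */
final class TStatSketch {

    private TStatSketch() {
        // static helpers only
    }

    /** Two-sample t statistic without the equal-variances assumption. */
    static double welchT(double m1, double m2, double v1, double v2,
                         double n1, double n2) {
        return (m1 - m2) / Math.sqrt((v1 / n1) + (v2 / n2));
    }

    /** Two-sample t statistic using a pooled variance estimate. */
    static double pooledT(double m1, double m2, double v1, double v2,
                          double n1, double n2) {
        double pooledVariance =
            ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2);
        return (m1 - m2) / Math.sqrt(pooledVariance * (1d / n1 + 1d / n2));
    }

    /**
     * Welch-Satterthwaite approximate degrees of freedom.
     * Assumption: textbook form; the df(...) body is not in this diff.
     */
    static double welchDf(double v1, double v2, double n1, double n2) {
        double a = v1 / n1;
        double b = v2 / n2;
        return ((a + b) * (a + b))
            / ((a * a) / (n1 - 1) + (b * b) / (n2 - 1));
    }

    /** Degrees of freedom under the equal-variances assumption. */
    static double pooledDf(double n1, double n2) {
        return n1 + n2 - 2;
    }
}
```

With equal sample sizes the two statistics coincide and only the degrees of
freedom differ. With unequal sample sizes both the statistic and the degrees
of freedom differ, which is why the p-value computation for each variant
should pair its own statistic with its own degrees of freedom.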