Posted to user@commons.apache.org by William J Rust <wj...@weru.ksu.edu> on 2007/09/17 23:36:49 UTC

normal deviates don't pass t test

I'm working on a climate simulation program that takes monthly averages 
and generates daily readings that are assumed to be normally 
distributed. The following program creates 10 sets of 100,000 random 
deviates with mean 10 and SD 5. It then applies a t test (results below) 
to ensure that the generated numbers are good enough. As the results 
show, they aren't. I'm wondering a) I am doing something wrong or b) is 
there something wrong with the stats routines?

Thanks,

wjr

package usda.weru.cligen2;

import org.apache.commons.math.MathException;

/**
 *
 * @author wjr
 */
public class TestNormal {
        
    static org.apache.commons.math.distribution.NormalDistributionImpl nd =
            new org.apache.commons.math.distribution.NormalDistributionImpl(10, 5);

    public static void main(String[] args) {
        double[] arry = new double[100000];
        java.util.Random ran = new java.util.Random(1L);
        
        for (int jdx = 0; jdx < 10; jdx++) {
            for (int idx = 0; idx < arry.length; idx++) {
                try {
                    arry[idx] = nd.inverseCumulativeProbability(ran.nextDouble());
                } catch (MathException ex) {
                    ex.printStackTrace();
                }
            }
            try {
                System.out.println("ttest "
                        + org.apache.commons.math.stat.inference.TestUtils.tTest(10, arry));
            } catch (IllegalArgumentException ex) {
                ex.printStackTrace();
            } catch (MathException ex) {
                ex.printStackTrace();
            }
        }
    }
}

Output:

>
> run-single:
> ttest 0.3433300114960922
> ttest 0.1431930575825282
> ttest 0.12336027805916228
> ttest 0.49478850669361796
> ttest 0.9216887341410063
> ttest 0.9937228334312525
> ttest 0.13669784550400177
> ttest 0.9646134537758599
> ttest 0.9965741269090211
> ttest 0.03815948891784959
> BUILD SUCCESSFUL (total time: 20 seconds)



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org


Re: normal deviates don't pass t test

Posted by Phil Steitz <ph...@gmail.com>.
On 9/24/07, Bill Rust <wj...@weru.ksu.edu> wrote:
> Thanks for the reply.
>
> First, the reason for using NormalDistributionImpl is that I'm
> translating from FORTRAN and I wasn't thinking a whole lot. Using Random
> makes sense.
>
> Second, I'm not really caring, at least not yet, about how normal my
> output samples are. What I do really care about is that the means of my
> generated samples match the observed means. For example, if I start with
> an observed max temperature of 30, I want to determine that 30 is within
> the confidence interval of the sample at a 90% level. What I think that
> I am getting is, using the same technique, half the time I am hitting my
> goal and the other half my samples stink. If my understanding is
> correct, I should be getting a 90% confidence level 9 times out of 10,
> more or less, which clearly isn't happening.
>

You can't conclude that from the t-test, which as I said does not
support rejecting the null hypothesis that the means are equal at the
90% level except in the last case presented (one in the ten you have
presented).  So the results appear to show that the sample means are
distributed as they should be.

Sorry if I was not clear in my explanation of what the t-test p-value
represents.  I think you may be reversing the probabilities in your
interpretation.  The p-value (what t-test reports) represents the
probability that, assuming the sample is drawn from a population
with mean equal to the target, a difference at least as large as that
observed between the sample mean and the true mean will be obtained in
a random sample of the given size.  If that probability is less than
.1, you can reject the null hypothesis that the means are the same
with 90% confidence.  Only one out of ten of your samples supports
this (exactly as would be expected by chance).

If what you want is to compute 90% confidence intervals and observe
coverage, you should do exactly that.  You can compute the widths
using the provided standard deviation and sample size and just count
directly the number of means that fall within the expected intervals.
You should see, on average, nine out of ten sample means falling within
the 90% confidence interval range, 95 out of 100 in the 95% interval, etc.
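
A minimal sketch of that coverage count, assuming the known SD of 5 and
drawing the deviates with java.util.Random.nextGaussian.  The class and
method names are hypothetical, and 1.6449 is the standard normal quantile
for a two-sided 90% interval; treat it as an illustration rather than a
reference implementation.

```java
import java.util.Random;

// Hypothetical sketch: counts how often the sample mean lands inside the
// 90% interval around the target mean, assuming a known standard deviation.
class CoverageCheck {

    // Standard normal quantile for a two-sided 90% interval.
    static final double Z_90 = 1.6448536269514722;

    /** Returns the fraction of sample means within mu +/- Z_90 * sd / sqrt(n). */
    static double coverage(double mu, double sd, int n, int samples, long seed) {
        Random ran = new Random(seed);
        double halfWidth = Z_90 * sd / Math.sqrt(n);
        int inside = 0;
        for (int j = 0; j < samples; j++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += mu + sd * ran.nextGaussian();
            }
            double mean = sum / n;
            if (Math.abs(mean - mu) <= halfWidth) {
                inside++;
            }
        }
        return (double) inside / samples;
    }

    public static void main(String[] args) {
        // 1,000 samples of 1,000 deviates each, mean 10 and SD 5 as in the thread.
        System.out.println("coverage " + coverage(10, 5, 1000, 1000, 1L));
    }
}
```

The printed fraction should sit near 0.9; with only ten samples, as in the
original run, the count can easily be 8 or 10 out of 10 by chance.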

Once again, if you do not find that the means are falling within
expected confidence intervals, please open a Jira ticket.

Phil



Re: normal deviates don't pass t test

Posted by Bill Rust <wj...@weru.ksu.edu>.
Thanks for the reply.

First, the reason for using NormalDistributionImpl is that I'm 
translating from FORTRAN and I wasn't thinking a whole lot. Using Random 
makes sense.

Second, I'm not really caring, at least not yet, about how normal my 
output samples are. What I do really care about is that the means of my 
generated samples match the observed means. For example, if I start with 
an observed max temperature of 30, I want to determine that 30 is within 
the confidence interval of the sample at a 90% level. What I think that 
I am getting is, using the same technique, half the time I am hitting my 
goal and the other half my samples stink. If my understanding is 
correct, I should be getting a 90% confidence level 9 times out of 10, 
more or less, which clearly isn't happening.

wjr



Re: normal deviates don't pass t test

Posted by Phil Steitz <ph...@gmail.com>.
On 9/17/07, William J Rust <wj...@weru.ksu.edu> wrote:
> I'm working on a climate simulation program that takes monthly averages
> and generates daily readings that are assumed to be normally
> distributed. The following program creates 10 sets of 100,000 random
> deviates with mean 10 and SD 5. It then applies a t test (results below)
> to ensure that the generated numbers are good enough. As the results
> show, they aren't. I'm wondering a) I am doing something wrong or b) is
> there something wrong with the stats routines?

There are a couple of problems here.  First, while your inversion
method should generate approximately normally distributed values, it
is better to use the JDK-supplied method for this (much faster and a
better algorithm).  There is a wrapped version of this provided in
org.apache.commons.math.random.RandomDataImpl. To use that:

import org.apache.commons.math.random.RandomData;
import org.apache.commons.math.random.RandomDataImpl;
RandomData randomData = new RandomDataImpl();
...
arry[idx] = randomData.nextGaussian(10, 5);

Second, I don't understand what you are expecting from the t-test.
TestUtils.tTest(mu, array) returns the p-value associated with a
two-tailed test with the null hypothesis that the values in the array
come from a distribution with mean = mu.  So small p-values, say less
than .01, would indicate that the mean appears to differ significantly
from 10. This should happen roughly one in every 100 times.
Differences as large as what you observed on your first run should
happen about 34 out of every 100 times, etc.  The values reported
below do not look surprising to me. They do not support rejecting the
null hypothesis that the mean is what it is supposed to be, which is a
good thing.
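
To make the "one in every 100 times" point concrete, here is a minimal
sketch that repeats the experiment many times and counts how often the
t statistic crosses the rejection threshold.  It assumes large samples, so
the standard normal quantiles 1.6449 (p < .10) and 2.5758 (p < .01) stand
in for the exact t critical values, and it uses java.util.Random rather
than Commons Math.  All class and method names are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch: how often a correct generator is "rejected" by a
// two-tailed one-sample t-test at the 10% and 1% levels.
class TTestRejectionRate {

    // Large-sample two-sided critical values (standard normal quantiles).
    static final double CRIT_10 = 1.6448536269514722; // corresponds to p < 0.10
    static final double CRIT_01 = 2.5758293035489004; // corresponds to p < 0.01

    /** One-sample t statistic for the null hypothesis mean = mu. */
    static double tStatistic(double mu, double[] x) {
        int n = x.length;
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        double mean = sum / n;
        double ss = 0.0;
        for (double v : x) {
            ss += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(ss / (n - 1));
        return (mean - mu) / (sd / Math.sqrt(n));
    }

    /** Fraction of samples (mean 10, SD 5) whose |t| exceeds the critical value. */
    static double rejectionRate(double crit, int n, int samples, long seed) {
        Random ran = new Random(seed);
        double[] x = new double[n];
        int rejected = 0;
        for (int j = 0; j < samples; j++) {
            for (int i = 0; i < n; i++) {
                x[i] = 10 + 5 * ran.nextGaussian();
            }
            if (Math.abs(tStatistic(10, x)) > crit) {
                rejected++;
            }
        }
        return (double) rejected / samples;
    }

    public static void main(String[] args) {
        System.out.println("rate at 10% level " + rejectionRate(CRIT_10, 500, 2000, 1L));
        System.out.println("rate at 1% level  " + rejectionRate(CRIT_01, 500, 2000, 1L));
    }
}
```

The two rates should come out near 0.10 and 0.01, which is the expected
behaviour when the null hypothesis is true, not a sign of a bad generator.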

To test normality of the deviates, you should apply a normality test
to the deviates themselves, e.g. a Kolmogorov-Smirnov test.  Commons
Math does not currently include normality tests (patches welcome :).
To do this, you would need to dump the generated arrays to a file and
then do the test with R or some other package that includes normality
tests.
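
If exporting to R is inconvenient, a hand-rolled check is also possible.
The following is a minimal sketch of a one-sample K-S statistic against a
fully specified N(10, 5), not part of Commons Math.  It assumes the
Abramowitz and Stegun 7.1.26 approximation for erf (absolute error about
1.5e-7) and the asymptotic 5% critical value 1.358 / sqrt(n).  All names
are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: one-sample Kolmogorov-Smirnov statistic against a
// normal distribution with known mean and SD.
class KsNormalCheck {

    /** erf approximation (Abramowitz and Stegun 7.1.26). */
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    /** Cumulative distribution function of N(mean, sd). */
    static double normalCdf(double x, double mean, double sd) {
        return 0.5 * (1.0 + erf((x - mean) / (sd * Math.sqrt(2.0))));
    }

    /** Largest gap between the empirical CDF and the N(mean, sd) CDF. */
    static double ksStatistic(double[] data, double mean, double sd) {
        double[] x = data.clone();
        Arrays.sort(x);
        int n = x.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            double f = normalCdf(x[i], mean, sd);
            // The empirical CDF jumps from i/n to (i+1)/n at x[i].
            d = Math.max(d, Math.max((i + 1.0) / n - f, f - (double) i / n));
        }
        return d;
    }

    /** Draws n normal deviates with the given mean and SD. */
    static double[] sample(double mean, double sd, int n, long seed) {
        Random ran = new Random(seed);
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = mean + sd * ran.nextGaussian();
        }
        return x;
    }

    public static void main(String[] args) {
        int n = 10000;
        double d = ksStatistic(sample(10, 5, n, 1L), 10, 5);
        System.out.println("D = " + d + ", 5% critical value = " + 1.358 / Math.sqrt(n));
    }
}
```

A value of D above the critical value would be evidence against
normality at the 5% level; note that the critical value applies only when
the mean and SD are specified in advance rather than estimated from the data.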

Unless I am missing something, I don't think a t-test is going to give
you the information that you need to verify that the generated values
are normally distributed.  Another thing that you could do is to
examine the empirical distribution of the generated values - lay a
grid over the range and count how many fall into each range and
compare these counts to what you would expect under the hypothesis of
normality (similar in spirit to the K-S test, which compares the
empirical and theoretical cumulative distributions directly).  You can use
org.apache.commons.math.random.EmpiricalDistribution to bin the generated
data and get bin counts.
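
As a rough illustration of the grid idea without Commons Math, the sketch
below uses a deliberately coarse grid of four bands, measured in standard
deviations from the mean, and compares observed fractions with the
textbook normal probabilities (0.6827, 0.2718, 0.0428, 0.0027).  The class
and method names are hypothetical, and a real check would use a finer grid.

```java
import java.util.Random;

// Hypothetical sketch: bins deviates by distance from the mean (in SDs)
// and compares observed fractions with those expected under normality.
class BinCounts {

    // P(|z| in [0,1)), [1,2), [2,3), [3,inf) for a standard normal z.
    static final double[] EXPECTED = {0.682689, 0.271810, 0.042800, 0.002700};

    /** Counts how many values fall into each band of |z| = |x - mean| / sd. */
    static int[] count(double[] x, double mean, double sd) {
        int[] counts = new int[EXPECTED.length];
        for (double v : x) {
            double z = Math.abs((v - mean) / sd);
            // Everything at 3 SDs or beyond goes into the last band.
            int bin = (int) Math.min(EXPECTED.length - 1, Math.floor(z));
            counts[bin]++;
        }
        return counts;
    }

    /** Draws n normal deviates with the given mean and SD. */
    static double[] sample(double mean, double sd, int n, long seed) {
        Random ran = new Random(seed);
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = mean + sd * ran.nextGaussian();
        }
        return x;
    }

    public static void main(String[] args) {
        int n = 100000;
        int[] counts = count(sample(10, 5, n, 1L), 10, 5);
        for (int b = 0; b < counts.length; b++) {
            System.out.println("band " + b + ": observed " + (double) counts[b] / n
                    + ", expected " + EXPECTED[b]);
        }
    }
}
```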

If you do find that normality tests fail on the generated values using
either your inversion method or the RandomDataImpl.nextGaussian
method, please open a Jira ticket
(http://commons.apache.org/math/issue-tracking.html) including the R
script or output from the package that you used for testing.  Thanks!

hth,

Phil




Re: normal deviates don't pass t test

Posted by James Carman <ja...@carmanconsulting.com>.
Try using SecureRandom.
