Posted to dev@commons.apache.org by "Mark R. Diggory" <md...@latte.harvard.edu> on 2003/06/16 21:20:31 UTC

[math] RandomData and ValueServer Failures . . .

I appear to "occasionally" get JUnit test failures from the ValueServer and 
RandomData tests. This would appear to be because the mean of the sampled 
values can sometimes deviate from the expected mean even for 1000-case 
draws. I know this happens "rarely", but just often enough over the last month 
or so for me to start to notice the behavior. Oddly, when it's off, it's off 
in a big way, so it's not just a matter of loosening the tolerance.

This is probably a "weak fix", but maybe you could increase the sample 
sizes on those tests to get a little better tolerance than 0.1 (the standard 
error of the sample mean shrinks like 1/sqrt(n), so quadrupling the sample 
size roughly halves it).

-Mark






Re: [math] RandomData and ValueServer Failures . . .

Posted by Phil Steitz <st...@yahoo.com>.
--- Phil Steitz <st...@yahoo.com> wrote:
> 
> --- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> > 
> > 
> > Al Chou wrote:
> > > --- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> > > 
> > >>I appear to "occasionally" get JUnit test failures from the ValueServer and 
> > >>RandomData tests. This would appear to be because the mean of the sampled 
> > >>values can sometimes deviate from the expected mean even for 1000-case 
> > >>draws. I know this happens "rarely", but just often enough over the last month 
> > >>or so for me to start to notice the behavior. Oddly, when it's off, it's off 
> > >>in a big way, so it's not just a matter of loosening the tolerance.

One more remark on this.  My previous response explains the RandomDataTest
failures -- these are to be expected roughly once in every 1000/(number of
tests) runs.  The ValueServer is more complex (at least the testNextDigest
test).  What is going on there is that the test first "digests" an input file
to compute an empirical distribution describing the data.  This is done by an
EmpiricalDistribution instance.  The technique used divides the data range into
"bins", computes bin frequencies and within-bin means and variances, and then
stores this information for use in generating data.  The default bin count is
set at 1000 and the ValueServer tests do not override this.  This is a mistake,
since the test file has only 1000 records in it.  (Generally the bin count
should be an order of magnitude less than the number of records, as indicated
in the javadoc ;-))  I should have changed this when I cut the test file down
to 1000 records.  This may be what is causing the relatively high incidence of
failures for the ValueServer tests.  I will submit a patch decreasing the bin
count for these tests when I get back at the end of this week.  Thanks for
pointing this out.  Strangely, it hasn't happened to me (i.e., I have not seen
the ValueServer tests fail).
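
For illustration, here is a rough, self-contained sketch of the digest
technique described above.  The class and method names are hypothetical, not
the actual EmpiricalDistribution API; the point is to show why a bin count
close to the number of records is degenerate.

import java.util.Random;

/** Hypothetical sketch of the bin-digest technique (not the real API). */
public class BinDigestSketch {

    /** Summary statistics for one bin. */
    static class Bin {
        int count;
        double sum;
        double sumSq;
        double mean() { return count == 0 ? 0.0 : sum / count; }
        double stdDev() {
            // Degenerate when a bin holds fewer than two values, which is
            // what happens to almost every bin when binCount ~= data.length.
            if (count < 2) return 0.0;
            return Math.sqrt(Math.max(0.0, sumSq - sum * sum / count) / (count - 1));
        }
    }

    /** Divide the data range into binCount bins; collect per-bin stats. */
    static Bin[] digest(double[] data, int binCount) {
        double min = data[0], max = data[0];
        for (int i = 1; i < data.length; i++) {
            min = Math.min(min, data[i]);
            max = Math.max(max, data[i]);
        }
        double width = (max - min) / binCount;
        Bin[] bins = new Bin[binCount];
        for (int i = 0; i < binCount; i++) bins[i] = new Bin();
        for (int i = 0; i < data.length; i++) {
            int b = (int) ((data[i] - min) / width);
            if (b >= binCount) b = binCount - 1;  // max value lands in last bin
            bins[b].count++;
            bins[b].sum += data[i];
            bins[b].sumSq += data[i] * data[i];
        }
        return bins;
    }

    /** Generate a value: pick a bin with probability proportional to its
        frequency, then draw from a Gaussian with that bin's mean/stdDev. */
    static double nextValue(Bin[] bins, int n, Random random) {
        int target = random.nextInt(n);
        for (int i = 0; i < bins.length; i++) {
            target -= bins[i].count;
            if (target < 0) {
                return bins[i].mean() + random.nextGaussian() * bins[i].stdDev();
            }
        }
        throw new IllegalStateException("bin counts do not sum to n");
    }
}

With 1000 bins over a 1000-record file, almost every bin holds zero or one
value, so the within-bin standard deviations are all zero and the digest
provides none of the smoothing it is designed for.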

Phil




Re: [math] RandomData and ValueServer Failures . . .

Posted by Phil Steitz <st...@yahoo.com>.
--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> 
> 
> Al Chou wrote:
> > --- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> > 
> >>I appear to "occasionally" get JUnit test failures from the ValueServer and 
> >>RandomData tests. This would appear to be because the mean of the sampled 
> >>values can sometimes deviate from the expected mean even for 1000-case 
> >>draws. I know this happens "rarely", but just often enough over the last month 
> >>or so for me to start to notice the behavior. Oddly, when it's off, it's off 
> >>in a big way, so it's not just a matter of loosening the tolerance.
> > 
> > 
> > Approximately how big is "off in a big way"?  Is it because a pseudorandom
> > number generator is used dynamically in the test?  I guess I should look
> > at the test code (which I'll try to do on the train home), but offhand it
> > surprises me that the tests are ever far off from their expected results.
> > 
> 
> Greater than the tolerance for the tests, which is surprising to me 
> because it's set to 0.1; the last time I saw it fail, the value was 
> approx 5.1xxxxxxxx. Remember, this probably doesn't happen very often. 
> It would be interesting to run a batch test and actually get an estimate.
> 
> assertEquals("mean", 5.069831575018909, stats.getMean(), tolerance);
> 
> What one has to consider is that stats.getMean() is a sample mean that 
> can "vary" over the range of variance of the sampled values. Rarely, 
> the sample is of poor enough quality that it does not effectively describe 
> the population's mean and variance. There is always this small probability 
> that the mean of the sample will not match the mean of the values. So 
> testing with an "assertEquals" isn't very helpful in terms of testing 
> the mean. (I remember another discussion being had earlier on the list 
> concerning having something like an assertApproximatelyEquals.)
> 
> But what I think we could really use are statistically based assertion 
> tests! :-)
> 
> double[] population;
> double tolerance = 0.05;
> 
> assertStudentsT("mean", population, stats, tolerance);
> 
> Then the test would be more like a t-test of sorts, testing whether the 
> sampled set of values is "significantly different" from the population 
> set. I get the feeling that this would be a stronger assertion than 
> testing whether the means and standard deviations are equal.
> 
> Any ideas?
> -Mark
> 

There are two concepts mixed up above.  The first one has to do with testing
randomly generated data for conformity to expected distributions.  The current
JUnit test cases for RandomData and ValueServer actually do that using
statistical methods.  Specifically, most of the RandomData and ValueServer
tests use chi-square tests to compare the observed data to what would be
expected under the hypothesis that the data actually come from the advertised
distributions.  Like all statistical significance tests, these tests have a
"significance level", which is generally set at .001.  In this context, what
that means is that the tests will "randomly fail" with probability .001, even
if there is nothing wrong with the code.  That is why the JUnit messages have
a disclaimer saying something like "will fail approximately 1 in 1000 times.
Repeated failures indicate a problem".

If you look at the test cases, you will notice that the tests for generating
data from non-Uniform continuous distributions do not use this type of test.
I think there is a FIXME somewhere indicating that once we have t-test
capabilities (which we now have) we should replace the "absolute" test with an
actual t-test.  That might be a good idea, but I would not place a high
priority on it, since there is not much going on in these methods.  If we do
this, we should make sure to set the significance level to .001 to be
consistent with the other tests and to control the incidence of "false"
failures.
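
To make the mechanics concrete, here is a rough sketch of the kind of
chi-square goodness-of-fit check described above.  This is not the actual
test code: the bin layout and the hardcoded critical value (16.27, the .001
upper tail of a chi-square distribution with 3 degrees of freedom) are
assumptions for illustration only.

import java.util.Random;

/** Sketch of a chi-square goodness-of-fit check at alpha = .001. */
public class ChiSquareSketch {

    public static void main(String[] args) {
        Random random = new Random();
        int n = 1000;
        int binCount = 4;  // 4 equiprobable bins => 3 degrees of freedom
        long[] observed = new long[binCount];

        // Draw uniform values in [0, 1) and tally them into equiprobable bins.
        for (int i = 0; i < n; i++) {
            observed[(int) (random.nextDouble() * binCount)]++;
        }

        // Chi-square statistic: sum over bins of (observed - expected)^2 / expected.
        double expected = (double) n / binCount;
        double chiSquare = 0.0;
        for (int i = 0; i < binCount; i++) {
            double diff = observed[i] - expected;
            chiSquare += diff * diff / expected;
        }

        // Critical value for df = 3, alpha = .001.  Even with a perfect
        // generator this check fails about 1 time in 1000 runs.
        double criticalValue = 16.27;
        if (chiSquare > criticalValue) {
            System.out.println("FAIL: chi-square = " + chiSquare);
        } else {
            System.out.println("ok: chi-square = " + chiSquare);
        }
    }
}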

Phil






Re: [math] RandomData and ValueServer Failures . . .

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

Al Chou wrote:
> --- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> 
>>I appear to "occasionally" get JUnit test failures from the ValueServer and 
>>RandomData tests. This would appear to be because the mean of the sampled 
>>values can sometimes deviate from the expected mean even for 1000-case 
>>draws. I know this happens "rarely", but just often enough over the last month 
>>or so for me to start to notice the behavior. Oddly, when it's off, it's off 
>>in a big way, so it's not just a matter of loosening the tolerance.
> 
> 
> Approximately how big is "off in a big way"?  Is it because a pseudorandom
> number generator is used dynamically in the test?  I guess I should look at the
> test code (which I'll try to do on the train home), but offhand it surprises me
> that the tests are ever far off from their expected results.
> 

Greater than the tolerance for the tests, which is surprising to me 
because it's set to 0.1; the last time I saw it fail, the value was 
approx 5.1xxxxxxxx. Remember, this probably doesn't happen very often. 
It would be interesting to run a batch test and actually get an estimate.

assertEquals("mean", 5.069831575018909, stats.getMean(), tolerance);

What one has to consider is that stats.getMean() is a sample mean that 
can "vary" over the range of variance of the sampled values. Rarely, 
the sample is of poor enough quality that it does not effectively describe 
the population's mean and variance. There is always this small probability 
that the mean of the sample will not match the mean of the values. So 
testing with an "assertEquals" isn't very helpful in terms of testing 
the mean. (I remember another discussion being had earlier on the list 
concerning having something like an assertApproximatelyEquals.)

But what I think we could really use are statistically based assertion 
tests! :-)

double[] population;
double tolerance = 0.05;

assertStudentsT("mean", population, stats, tolerance);

Then the test would be more like a t-test of sorts, testing whether the 
sampled set of values is "significantly different" from the population 
set. I get the feeling that this would be a stronger assertion than 
testing whether the means and standard deviations are equal.
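
Along those lines, a hypothetical sketch of what such an assertion might look
like.  None of this exists yet -- the method name, the hardcoded critical
value, and the use of a raw double[] sample are all assumptions for
illustration:

/** Hypothetical statistically based assertion (one-sample t-test). */
public class StatAssertSketch {

    /**
     * Fails if a one-sample t-test rejects, at roughly alpha = .001, the
     * hypothesis that the sample was drawn from a population with the
     * given mean.  The critical value assumes a large sample, where the
     * t distribution is close to normal.
     */
    static void assertMeanStudentsT(String label, double expectedMean,
                                    double[] sample) {
        int n = sample.length;
        double sum = 0.0;
        for (int i = 0; i < n; i++) sum += sample[i];
        double mean = sum / n;

        double sumSqDev = 0.0;
        for (int i = 0; i < n; i++) {
            sumSqDev += (sample[i] - mean) * (sample[i] - mean);
        }
        double stdErr = Math.sqrt(sumSqDev / (n - 1)) / Math.sqrt(n);

        // Two-sided critical value for alpha = .001 at large n is about 3.29.
        double t = (mean - expectedMean) / stdErr;
        if (Math.abs(t) > 3.29) {
            throw new AssertionError(label + ": |t| = " + Math.abs(t)
                    + " > 3.29 (expect ~1 false failure in 1000 runs)");
        }
    }
}

The tolerance then becomes a significance level rather than an absolute bound
on the mean, so "false" failures happen at a known, controllable rate.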

Any ideas?
-Mark




Re: [math] RandomData and ValueServer Failures . . .

Posted by Al Chou <ho...@yahoo.com>.
--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> I appear to "occasionally" get JUnit test failures from the ValueServer and 
> RandomData tests. This would appear to be because the mean of the sampled 
> values can sometimes deviate from the expected mean even for 1000-case 
> draws. I know this happens "rarely", but just often enough over the last month 
> or so for me to start to notice the behavior. Oddly, when it's off, it's off 
> in a big way, so it's not just a matter of loosening the tolerance.

Approximately how big is "off in a big way"?  Is it because a pseudorandom
number generator is used dynamically in the test?  I guess I should look at the
test code (which I'll try to do on the train home), but offhand it surprises me
that the tests are ever far off from their expected results.


Al

=====
Albert Davidson Chou

    Get answers to Mac questions at http://www.Mac-Mgrs.org/ .
