
Posted to dev@commons.apache.org by "Becksfort, Jared" <Ja...@STJUDE.ORG> on 2012/07/25 06:28:39 UTC

[math] Unit Tests for Multivariate Distribution Sampling

Hello,

I am working on submitting code for multivariate normal distributions, including sampling and unit tests (issue Math-815).  It is my first submission, and it has had some issues with style and other guidelines.  Gilles has given me some useful feedback about several pieces, but I thought I would also ask a question on this list.

I need to have a unit test pass deterministically even though the sampling algorithm is inherently stochastic.  I assumed that resetting the seed before sampling would be sufficient to test a few values to within a specified tolerance.  It has worked so far for me.  Gilles suggested, though, that I use the testSampling method in RealDistributionAbstractTest.java as a model.  But it uses a statistical test (Chi-Squared) in addition to resetting the seed.  Aside from the added difficulty of hypothesis testing in more dimensions, is it actually necessary?  Wouldn't resetting the seed give you the same values each time when you sample in the unit test?  Doesn't that make it essentially a deterministic test, eliminating the need for a hypothesis test of the samples?
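A minimal sketch of why a fixed seed makes such a test repeatable. It uses plain java.util.Random as a stand-in for the distribution's generator, and the class and method names are hypothetical, not Commons Math API. Two generators built from the same seed produce bit-for-bit identical draws, so hard-coded expected values always compare the same way. The sketch shows that the draws are reproducible; it does not show that they follow the target distribution.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical demo: re-seeding a generator reproduces exactly the same draws. */
class SeedDeterminismDemo {

    /** Draws n standard-normal values from a generator created with the given seed. */
    static double[] draw(long seed, int n) {
        Random rng = new Random(seed);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = rng.nextGaussian();
        }
        return out;
    }

    public static void main(String[] args) {
        double[] first = draw(42L, 5);
        double[] second = draw(42L, 5);
        // Same seed and same algorithm: the two sequences are identical,
        // so a test comparing against recorded values passes on every run.
        System.out.println(Arrays.equals(first, second)); // true
        // A different seed gives a different, equally valid sequence.
        System.out.println(Arrays.toString(draw(43L, 5)));
    }
}
```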

Thanks,
Jared

Email Disclaimer:  www.stjude.org/emaildisclaimer
Consultation Disclaimer:  www.stjude.org/consultationdisclaimer


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [math] Unit Tests for Multivariate Distribution Sampling

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

Hello Jared.

> 
> I think it will be difficult (for me, at least) to provide a general method for testing sampling across all multivariate distributions.  I imagine it can be done, but I would prefer for now just to make it an abstract method and expect the writers of future multivariate distribution classes to provide ways to verify their sampling works.  I am new here, though, so that may not be your preference.  I do think a general statistical test such as TestUtils Chi-Squared as applied in RealDistributionAbstractTest may be difficult to apply in general to all multivariate distributions.
> 
> Unless someone has a better idea, I would suggest that for the time being multivariate sampling unit tests be implemented in their own classes, and the base class method remain abstract.

Since you didn't get help from this list about that issue, you are
absolutely entitled to leave it as is.


Thanks for the contribution,
Gilles


RE: [math] Unit Tests for Multivariate Distribution Sampling

Posted by "Becksfort, Jared" <Ja...@STJUDE.ORG>.

Gilles,

I think it will be difficult (for me, at least) to provide a general method for testing sampling across all multivariate distributions.  I imagine it can be done, but I would prefer for now just to make it an abstract method and expect the writers of future multivariate distribution classes to provide ways to verify their sampling works.  I am new here, though, so that may not be your preference.  I do think a general statistical test such as TestUtils Chi-Squared as applied in RealDistributionAbstractTest may be difficult to apply in general to all multivariate distributions.

Unless someone has a better idea, I would suggest that for the time being multivariate sampling unit tests be implemented in their own classes, and the base class method remain abstract.
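A minimal sketch of the arrangement proposed above, in which the shared base class only fixes the seed and each distribution's test supplies its own check. The class names are hypothetical, JUnit is left out so the sketch runs stand-alone, and the subclass uses a simple moment check on java.util.Random draws purely as a placeholder for a real sampler test.

```java
import java.util.Random;

/** Hypothetical base class: the sampling check itself is left abstract. */
abstract class MultivariateDistributionAbstractTest {

    /** Returns true if draws made with the given seed match the distribution. */
    protected abstract boolean samplingLooksCorrect(long seed);

    /** Shared driver: fixes the seed so that a pass or a fail is reproducible. */
    public final void testSampling() {
        if (!samplingLooksCorrect(1000L)) {
            throw new AssertionError("sample does not match the distribution");
        }
    }
}

/** Hypothetical subclass for two independent N(0, 1) coordinates (stand-in sampler). */
class IndependentNormalSamplingTest extends MultivariateDistributionAbstractTest {

    @Override
    protected boolean samplingLooksCorrect(long seed) {
        Random rng = new Random(seed);
        int n = 10000;
        double[] sum = new double[2];
        double[] sumSq = new double[2];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < 2; k++) {
                double x = rng.nextGaussian();
                sum[k] += x;
                sumSq[k] += x * x;
            }
        }
        for (int k = 0; k < 2; k++) {
            double mean = sum[k] / n;
            double var = sumSq[k] / n - mean * mean;
            // Loose moment check (about 5 standard errors); a real subclass
            // could run a goodness-of-fit test suited to its own distribution.
            if (Math.abs(mean) > 0.05 || Math.abs(var - 1.0) > 0.1) {
                return false;
            }
        }
        return true;
    }
}
```

The design keeps the seed handling in one place while leaving each distribution free to choose a check that makes sense in its own number of dimensions.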

Jared

-----Original Message-----
From: Gilles Sadowski [mailto:gilles@harfang.homelinux.org]
Sent: Wednesday, July 25, 2012 4:37 AM
To: dev@commons.apache.org
Subject: Re: [math] Unit Tests for Multivariate Distribution Sampling

Hi Jared.

>
> I am working on submitting code for multivariate normal distributions,
> including sampling and unit tests (issue Math-815).  It is my first
> submission, and it has had some issues with style and other guidelines.
> Gilles has given me some useful feedback about several pieces, but I
> thought I would also ask a question on this list.
>
> I need to have a unit test pass deterministically even though the
> sampling algorithm is inherently stochastic.  I assumed that resetting
> the seed before sampling would be sufficient to test a few values to
> within a specified tolerance.  It has worked so far for me.  Gilles
> suggested, though, that I use the testSampling method in
> RealDistributionAbstractTest.java as a model.  But it uses a
> statistical test (Chi-Squared) in addition to resetting the seed.
> Aside from the added difficulty of hypothesis testing in more
> dimensions, is it actually necessary?  Wouldn't resetting the seed
> give you the same values each time when you sample in the unit test?
> Doesn't that make it essentially a deterministic test, eliminating the
> need for a hypothesis test of the samples?

There are 2 things:
1. Avoiding a test that sometimes fails just because of one "bad" draw.
   This is indeed solved by selecting a seed.
2. Testing that the "sample" of the distribution provides the expected
   result. The "testSampling" method referred to is nice because it is set up
   independently of the actual distribution: the expected result of an
   infinite number of draws is known, and the statistical test (of the test
   result) checks that the set of actual draws is close enough to the
   one theoretically expected.
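A rough sketch of the idea behind point 2, for a univariate standard normal: split the real line at the known quartiles, count how many of n draws fall into each bin, and compare the counts with the expected n/4 per bin using Pearson's chi-squared statistic. It is written against java.util.Random with hypothetical names and is not the actual testSampling code. The four quartile bins are an assumption of the sketch; 16.266 is the chi-squared critical value for 3 degrees of freedom (4 bins minus 1) at significance level 0.001.

```java
import java.util.Random;
import java.util.function.DoubleSupplier;

/** Hypothetical chi-squared check of a sampler against N(0, 1). */
class ChiSquareSamplingCheck {

    // Quartiles of N(0, 1): they split the line into four bins of probability 1/4.
    static final double[] QUARTILES = {-0.6744897501960817, 0.0, 0.6744897501960817};

    // Chi-squared critical value, 3 degrees of freedom, significance level 0.001.
    static final double CRITICAL_3DF_0_001 = 16.266;

    /** Counts how many of n draws fall into each of the four quartile bins. */
    static long[] binCounts(DoubleSupplier sampler, int n) {
        long[] counts = new long[QUARTILES.length + 1];
        for (int i = 0; i < n; i++) {
            double x = sampler.getAsDouble();
            int bin = 0;
            while (bin < QUARTILES.length && x >= QUARTILES[bin]) {
                bin++;
            }
            counts[bin]++;
        }
        return counts;
    }

    /** Pearson's statistic: sum over bins of (observed - expected)^2 / expected. */
    static double chiSquare(long[] observed, double expectedPerBin) {
        double stat = 0;
        for (long o : observed) {
            double d = o - expectedPerBin;
            stat += d * d / expectedPerBin;
        }
        return stat;
    }

    /** True if the draws are consistent with N(0, 1) at the 0.001 level. */
    static boolean acceptsStandardNormal(DoubleSupplier sampler, int n) {
        return chiSquare(binCounts(sampler, n), n / 4.0) < CRITICAL_3DF_0_001;
    }

    public static void main(String[] args) {
        Random rng = new Random(1000L); // fixed seed: the verdict never changes
        System.out.println(acceptsStandardNormal(rng::nextGaussian, 1000));
        // A deliberately wrong sampler (uniform on [0, 1)) is rejected.
        System.out.println(acceptsStandardNormal(new Random(1000L)::nextDouble, 1000)); // false
    }
}
```

Because the seed is fixed, the verdict is the same on every run (point 1), while the comparison is against the theoretical distribution rather than against previously recorded draws (point 2).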

As you say, adapting the hypothesis testing is not necessarily obvious (I don't know), but people here might be able to explain what to do...
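One possible way to adapt the hypothesis test to a multivariate normal, offered as an assumption rather than an established recipe for this code base: if X ~ N(mu, Sigma) in d dimensions, the squared Mahalanobis distance (X - mu)^T Sigma^{-1} (X - mu) follows a chi-squared distribution with d degrees of freedom, so the multivariate draws can be reduced to a univariate sample and binned as in the one-dimensional case. This sketch does it for d = 2, with a hand-written Cholesky sampler standing in for the distribution's own sampling method; all names are hypothetical.

```java
import java.util.Random;
import java.util.function.Supplier;

/** Hypothetical check of a 2D normal sampler via squared Mahalanobis distances. */
class MahalanobisSamplingCheck {

    /** Draws from N(mean, cov) for a 2x2 covariance, using its Cholesky factor L. */
    static Supplier<double[]> bivariateNormal(double[] mean, double[][] cov, Random rng) {
        double l11 = Math.sqrt(cov[0][0]);
        double l21 = cov[1][0] / l11;
        double l22 = Math.sqrt(cov[1][1] - l21 * l21);
        return () -> {
            double z1 = rng.nextGaussian();
            double z2 = rng.nextGaussian();
            // x = mean + L z, with z a pair of independent N(0, 1) draws.
            return new double[] {mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2};
        };
    }

    /** Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean) in 2D. */
    static double mahalanobisSq(double[] x, double[] mean, double[][] cov) {
        double a = cov[0][0], b = cov[0][1], c = cov[1][1];
        double det = a * c - b * b;
        double dx = x[0] - mean[0], dy = x[1] - mean[1];
        return (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det;
    }

    /** Chi-squared statistic over the four quartile bins of the chi-squared(2) law. */
    static double statistic(Supplier<double[]> sampler, double[] mean, double[][] cov, int n) {
        // Chi-squared with 2 degrees of freedom has CDF 1 - exp(-r/2),
        // so its q-quantile is -2 ln(1 - q).
        double[] cuts = {-2 * Math.log(0.75), -2 * Math.log(0.5), -2 * Math.log(0.25)};
        long[] counts = new long[4];
        for (int i = 0; i < n; i++) {
            double r = mahalanobisSq(sampler.get(), mean, cov);
            int bin = 0;
            while (bin < cuts.length && r >= cuts[bin]) {
                bin++;
            }
            counts[bin]++;
        }
        double expected = n / 4.0, stat = 0;
        for (long o : counts) {
            stat += (o - expected) * (o - expected) / expected;
        }
        return stat;
    }

    public static void main(String[] args) {
        double[] mean = {1.0, -2.0};
        double[][] cov = {{2.0, 0.5}, {0.5, 1.0}};
        Supplier<double[]> sampler = bivariateNormal(mean, cov, new Random(1000L));
        // Four bins again give 3 degrees of freedom; 16.266 is the 0.001 critical value.
        System.out.println(statistic(sampler, mean, cov, 1000) < 16.266);
    }
}
```

Passing this check is necessary but not sufficient: a sampler whose draws have the right distances from the mean but, for example, a non-uniform direction around it would still pass, so it complements rather than replaces checks of the mean and covariance.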


Thanks,
Gilles


Re: [math] Unit Tests for Multivariate Distribution Sampling

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.

Hi Jared.

> 
> I am working on submitting code for multivariate normal distributions, 
> including sampling and unit tests (issue Math-815).  It is my first 
> submission, and it has had some issues with style and other guidelines. 
> Gilles has given me some useful feedback about several pieces, but I
> thought I would also ask a question on this list.
> 
> I need to have a unit test pass deterministically even though the
> sampling algorithm is inherently stochastic.  I assumed that resetting
> the seed before sampling would be sufficient to test a few values to
> within a specified tolerance.  It has worked so far for me.  Gilles
> suggested, though, that I use the testSampling method in
> RealDistributionAbstractTest.java as a model.  But it uses a
> statistical test (Chi-Squared) in addition to resetting the seed.
> Aside from the added difficulty of hypothesis testing in more
> dimensions, is it actually necessary?  Wouldn't resetting the seed
> give you the same values each time when you sample in the unit test?
> Doesn't that make it essentially a deterministic test, eliminating the
> need for a hypothesis test of the samples?

There are 2 things:
1. Avoiding a test that sometimes fails just because of one "bad" draw.
   This is indeed solved by selecting a seed.
2. Testing that the "sample" of the distribution provides the expected
   result. The "testSampling" method referred to is nice because it is set up
   independently of the actual distribution: the expected result of an
   infinite number of draws is known, and the statistical test (of the test
   result) checks that the set of actual draws is close enough to the
   one theoretically expected.

As you say, adapting the hypothesis testing is not necessarily obvious (I
don't know), but people here might be able to explain what to do...


Thanks,
Gilles
