Posted to dev@mxnet.apache.org by Pedro Larroy <pe...@gmail.com> on 2017/10/16 12:56:21 UTC

Improving and rationalizing unit tests

Hi

Some of the unit tests are extremely costly in terms of memory and compute.

As an example, in the gluon tests we are loading all the datasets:

test_gluon_data.test_datasets

We also run huge networks like resnets in test_gluon_model_zoo.

This is ridiculously slow, straight-up impossible on some embedded /
memory-constrained devices, and in any case makes the tests run longer than
needed.

Unit tests should be small, self-contained, and if possible pure (avoiding
this kind of dataset IO).
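
For instance, a gluon data test could run against small in-memory arrays
instead of a downloaded dataset. A minimal sketch (hypothetical test name,
assuming mx.gluon.data's ArrayDataset/DataLoader):

    import numpy as np
    import mxnet as mx

    def test_dataloader_smoke():
        # Small synthetic tensors instead of downloading MNIST/CIFAR.
        data = mx.nd.array(np.random.RandomState(0).rand(32, 3, 8, 8))
        label = mx.nd.arange(32)
        dataset = mx.gluon.data.ArrayDataset(data, label)
        loader = mx.gluon.data.DataLoader(dataset, batch_size=8)
        batches = list(loader)
        assert len(batches) == 4
        assert batches[0][0].shape == (8, 3, 8, 8)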

I think it would be better to split them into real unit tests and extended
integration test suites that do the more intensive computation. This would
also help with feedback time for PRs and CI infrastructure.


Thoughts?

Re: Improving and rationalizing unit tests

Posted by Chris Olivier <cj...@gmail.com>.
I'd like to respectfully dispute the assumption that it's hard to debug
with random values:

If a test is failing with any sort of frequency, it's easy to come up with
offending values by running the test in a loop 10,000 times or so.  I did
this just yesterday to prove a test case was also failing in v0.11.0 with
similar frequency.  If you can consistently run it 10k+ times without it
failing, then it's probably not causing any serious problems in CI.
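
(For illustration, that kind of loop as a sketch in Python;
test_under_suspicion is a hypothetical stand-in for the failing test:)

    import numpy as np

    for seed in range(10000):
        np.random.seed(seed)
        try:
            test_under_suspicion()
        except AssertionError:
            print('test failed with seed %d' % seed)
            raise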


Re: Improving and rationalizing unit tests

Posted by kellen sunderland <ke...@gmail.com>.
Interesting package Sheng, but I think one problem with that approach is
that tests already take an extremely long time to run.  Re-running failing
tests could make it even more difficult to merge PRs.


Re: Improving and rationalizing unit tests

Posted by "Zha, Sheng" <zh...@amazon.com>.
Here’s a package that may help us with flaky tests: https://pypi.python.org/pypi/flaky. It can retry tests that are marked flaky and pass or fail them based on a specified threshold. In the short term we can use this to pass tests that are not 100% reliable.
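
A minimal sketch of the decorator usage (hypothetical test; max_runs and
min_passes are the retry thresholds the flaky package accepts):

    import random
    from flaky import flaky

    # Retry up to 3 times; the test passes if at least one run succeeds.
    @flaky(max_runs=3, min_passes=1)
    def test_occasionally_unstable():
        assert random.random() > 0.1  # stand-in for a rarely failing check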

Best regards,
-sz



RE: Improving and rationalizing unit tests

Posted by kellen sunderland <ke...@gmail.com>.
I think you’ve covered the pros/cons of having determinism in your tests.  It seems like a maintenance-effort versus forced-code-robustness trade-off to me.  I’d suggest you have a separate vote on this topic.

For me the main topic to focus on is making the CI system fast and stable for PR builds.  I think the best road to doing this, as previously suggested, is to segment the tests and move those that are long running, fail occasionally, or require external components into nightly/weekly tests.

I would also propose that:

6) Test fixtures are created to test a subset of functionality, so that we don’t have test fixtures like test_operator.py that are nearly 5000 lines long and take 20 minutes to run.  There are a few advantages to breaking these tests into smaller files:

- We will have fewer merge conflicts, because fewer people will be editing the same test files across PRs.
- Debugging issues with tests will become easier, as hopefully there will be fewer potential side effects between tests (this does happen now).
- We may be a little more confident that the tests run independently, eventually meaning that we could run them in parallel more easily, which would reduce test run latency (but not throughput).
- Last, we will be able to disable tests at some convenient level of granularity, for example when running on IoT devices, or without OpenCV.  At the moment we’d have to disable tests individually.

7) We clean up tests that are no longer needed:

I’ve personally found it quite unintuitive in MXNet to discover which tests are actually needed, where they are run, how often, etc.  Are the nightly tests actually run nightly?  Are the cpp tests run?  Why is the Travis CI folder still there?  What is the difference between the ci_build folder and the Jenkins folder?  If we’re going to take a look at revamping the tests folder, I’d recommend we clean up the folder structure a bit and delete the non-relevant files to make it easier for newcomers to know what’s happening.  We’ll always have these files for reference in source control.
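
One way to get both the segmentation and the per-group disabling with the
nose setup we already use could be the attrib plugin; a minimal sketch,
with hypothetical test names:

    from nose.plugins.attrib import attr

    @attr('nightly')  # excluded from PR runs with: nosetests -a '!nightly'
    def test_resnet_end_to_end():
        pass  # long-running, downloads data: nightly only

    def test_conv_shape_inference():
        pass  # fast and self-contained: runs on every PR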

-Kellen



Re: Improving and rationalizing unit tests

Posted by Chris Olivier <cj...@gmail.com>.
My argument is that I am actually categorically against having a
requirement that the same input values be used for testing for every run.

I don't personally view "convenience in reproducing" as outweighing
"finding edge cases that I didn't think of or that haven't been tried
before".


Re: Improving and rationalizing unit tests

Posted by Pedro Larroy <pe...@gmail.com>.
It's always going to be deterministic one way or another unless you draw
randomness from the entropy pool, such as /dev/random. I don't think it's
good practice to skip proper seeding and have values depend on execution
order / parallelism / time or whatever, but that's just my opinion. I would
want to use the same values for all test runs, for reproducibility.

I think your argument points more towards the previously mentioned
"property-based testing" approach, which is in the spirit of what you are
supporting, if I'm not mistaken.
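
For example, a sketch of what consistent seeding could look like in our
Python tests (the SEED value is arbitrary, and mx.random.seed assumes
MXNet's own RNGs are in play):

    import random
    import numpy as np
    import mxnet as mx

    SEED = 1234  # any fixed value; log it so failures can be replayed

    def setup_module():
        # Seed every RNG the tests touch so all runs see the same values.
        random.seed(SEED)
        np.random.seed(SEED)
        mx.random.seed(SEED)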


Re: Improving and rationalizing unit tests

Posted by Chris Olivier <cj...@gmail.com>.
My take on the suggestion of purely deterministic inputs (including
deterministic seeding) is:

"I want the same values to be used for all test runs because it is
inconvenient when a unit test fails for some edge cases.  I prefer that
unforeseen edge-case failures occur in the field and not during testing."

Is this the motivation?  Seems strange to me.



Re: Improving and rationalizing unit tests

Posted by Pedro Larroy <pe...@gmail.com>.
I think using a properly seeded and initialized (pseudo)random generator is
actually beneficial (and deterministic); handpicked examples are usually too
simplistic and miss corner cases.

Better yet is to use property-based testing, which will pick corner cases
and do fuzzing automatically to check with a high degree of confidence that
a testing condition holds.

It would probably be good to use a property-based testing library in
addition to nose to check invariants.

A quick googling yields this one for Python, for example:
https://hypothesis.readthedocs.io/en/latest/quickstart.html. Does anyone
have experience with, or can recommend, a nice property-based testing
library for Python?
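
To make it concrete, a minimal Hypothesis sketch (a trivial list property,
just to show the shape of such a test):

    from hypothesis import given
    from hypothesis import strategies as st

    # Hypothesis generates inputs (including corner cases like the empty
    # list) and shrinks any failing example to a minimal reproduction.
    @given(st.lists(st.integers()))
    def test_reverse_twice_is_identity(xs):
        assert list(reversed(list(reversed(xs)))) == xs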


Regards

On Mon, Oct 16, 2017 at 4:56 PM, Bhavin Thaker <bh...@gmail.com>
wrote:

> I agree with Pedro.
>
> Based on various observations of unit test failures, I would like to
> propose a few guidelines to follow for the unit tests. Even though I use
> the word “must” for my humble opinions below, please feel free to suggest
> alternatives or modifications to these guidelines:
>
> 1a) Each unit test must have a run-time budget <= X minutes. Say, X = 2
> minutes max.
> 1b) The total run-time budget for all unit tests must be <= Y minutes.
> Say, Y = 60 minutes max.
>
> 2) All unit tests must have deterministic (not stochastic) behavior. That
> is, instead of using the random() function to test a range of input
> values, each input test value must be carefully hand-picked to represent
> commonly used input scenarios. The correct place to stochastically test
> random input values is continuously running nightly tests, NOT the
> sanity/smoke/unit tests run for each PR.
>
> 3) All unit tests must be as self-contained and independent of external
> components as possible. For example, datasets required for a unit test
> must NOT live only on an external website which, if unreachable, can cause
> test run failures. Instead, all datasets must be available locally.
>
> 4) It is impossible to test everything in unit tests, so only common
> use-cases and code-paths must be tested in unit tests. Less common
> scenarios, like integration with 3rd-party products, must be tested in
> nightly/weekly tests.
>
> 5) A unit test must NOT be disabled on a failure unless proven to exhibit
> unreliable behavior. The burden of proof for a test failure must be on the
> PR submitter, and the PR must NOT be merged without opening a new GitHub
> issue explaining the problem. If a unit test is disabled for some reason,
> it must NOT be removed from the unit test list; instead it must be
> modified to add the following lines at the start of the test:
>     Print(“Unit Test DISABLED; see GitHub issue: NNNN”)
>     Exit(0)
>
> Please suggest modifications to the above proposal so that we can make the
> unit test framework a rock-solid foundation for the active development of
> Apache MXNet (Incubating).
>
> Regards,
> Bhavin Thaker.
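
(A sketch of what guidelines 1a and 5 could look like with nose, which we
already use; the test names are hypothetical and NNNN is the placeholder
from guideline 5:)

    from nose.plugins.skip import SkipTest
    from nose.tools import timed

    @timed(120)  # guideline 1a: fail the test if it exceeds a 2-minute budget
    def test_within_budget():
        pass

    def test_currently_disabled():
        # guideline 5: disabled but not deleted, with a tracking issue
        raise SkipTest('Unit Test DISABLED; see GitHub issue: NNNN')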

Re: Improving and rationalizing unit tests

Posted by Tianqi Chen <tq...@cs.washington.edu>.
It would be great to have a few test cases that reflect these principles,
so we have a concrete basis for discussion.

Having a seeded random number generator is good, but usually it is not the
cause of non-deterministic errors (most of which are already resolved by a
relaxed tolerance level).
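
(For reference, the kind of tolerance-based comparison meant here, sketched
with numpy; the rtol/atol values are placeholders:)

    import numpy as np

    def check_close(actual, desired, rtol=1e-4, atol=1e-6):
        # Small numerical noise passes; real regressions still fail.
        np.testing.assert_allclose(actual, desired, rtol=rtol, atol=atol)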

Tianqi
On Mon, Oct 16, 2017 at 9:01 AM Bhavin Thaker <bh...@gmail.com> wrote:

> For the randomness argument, I am more concerned about a unit test that
> exhibits different behaviors on different runs. A stochastic test, IMHO,
> is not a good sanity test, because the code-entry quality bar becomes
> stochastic rather than deterministic — causing a lot of churn in
> diagnosing unit-test failures for PRs.
>
> There are other places (nightly) to do extensive tests.
>
> PR-unit-tests are sanity tests and must be quick, reliable and consistent
> for every PR.
>
> Bhavin Thaker.
>
> On Mon, Oct 16, 2017 at 8:51 AM Pedro Larroy <pedro.larroy.lists@gmail.com>
> wrote:
>
> > That's not true. random() and similar functions are based on a PRNG.
> > They can be debugged and are completely deterministic; a good practice
> > is to use a known seed for this.
> >
> > More info: https://en.wikipedia.org/wiki/Pseudorandom_number_generator
> >
> > On Mon, Oct 16, 2017 at 5:42 PM, pracheer gupta <
> > pracheer_gupta@hotmail.com> wrote:
> >
> >> @Chris: Any particular reason for -1? Randomness just gets in the way
> >> of writing tests that you can rely on and/or debug later in case of
> >> failure.
> >>
> >> On Oct 16, 2017, at 8:28 AM, Chris Olivier <cjolivier01@gmail.com> wrote:
> >>
> >> -1 for "must not use random numbers for input"
> >>
> >> On Mon, Oct 16, 2017 at 7:56 AM, Bhavin Thaker <bhavinthaker@gmail.com
> >> <ma...@gmail.com>>
> >
> >
> >> wrote:
> >>
> >> I agree with Pedro.
> >>
> >> Based on various observations on unit test failures, I would like to
> >> propose a few guidelines to follow for the unit tests. Even though I use
> >> the word, “must” for my humble opinions below, please feel free to
> suggest
> >> alternatives or modifications to these guidelines:
> >>
> >> 1) 1a) Each unit test must have a run time budget <= X minutes. Say, X
> = 2
> >> minutes max.
> >> 1b) The total run time budget for all unit tests <= Y minutes. Say, Y =
> 60
> >> minutes max.
> >>
> >> 2) All Unit tests must have deterministic (not Stochastic) behavior.
> That
> >> is, instead of using the random() function to test a range of input
> >> values,
> >> each input test value must be carefully hand-picked to represent the
> >> commonly used input scenarios. The correct place to stochastically test
> >> random input values is to have continuously running nightly tests and
> NOT
> >> the sanity/smoke/unit tests for each PR.
> >>
> >> 3) All Unit tests must be as much self-contained and independent of
> >> external components as possible. For example, datasets required for the
> >> unit test must NOT be present on an external website which, if unreachable,
> >> can cause test run failures. Instead, all datasets must be available
> >> locally.
> >>
> >> 4) It is impossible to test everything in unit tests and so only common
> >> use-cases and code-paths must be tested in unit-tests. Less common
> >> scenarios like integration with 3rd party products must be tested in
> >> nightly/weekly tests.
> >>
> >> 5) A unit test must NOT be disabled on a failure unless proven to
> exhibit
> >> unreliable behavior. The burden-of-proof for a test failure must be on
> the
> >> PR submitter and the PR must NOT be merged without opening a new
> >> GitHub
> >> issue explaining the problem. If the unit test is disabled for some
> >> reason,
> >> then the unit test must NOT be removed from the unit tests list; instead
> >> the unit test must be modified to add the following lines at the start
> of
> >> the test:
> >>    Print(“Unit Test DISABLED; see GitHub issue: NNNN”)
> >>    Exit(0)
> >>
> >> Please suggest modifications to the above proposal so that we can make
> >> the unit-test framework a rock-solid foundation for the active
> >> development of Apache MXNet (Incubating).
> >>
> >> Regards,
> >> Bhavin Thaker.
> >>
> >>
> >> On Mon, Oct 16, 2017 at 5:56 AM Pedro Larroy <
> >> pedro.larroy.lists@gmail.com<ma...@gmail.com>
> >
> >
> >>
> >> wrote:
> >>
> >> Hi
> >>
> >> Some of the unit tests are extremely costly in terms of memory and
> >> compute.
> >>
> >> As an example in the gluon tests we are loading all the datasets.
> >>
> >> test_gluon_data.test_datasets
> >>
> >> Also running huge networks like resnets in test_gluon_model_zoo.
> >>
> >> This is ridiculously slow, and simply impossible on some embedded /
> >> memory-constrained devices, and in any case it makes tests run longer
> >> than needed.
> >>
> >> Unit tests should be small, self-contained, and ideally pure (avoiding
> >> this kind of dataset I/O where possible).
> >>
> >> I think it would be better to split them into real unit tests and
> >> extended integration test suites that do more intensive computation.
> >> This would also help with feedback time for PRs and CI infrastructure.
> >>
> >>
> >> Thoughts?
> >>
> >>
> >>
>

Re: Improving and rationalizing unit tests

Posted by Bhavin Thaker <bh...@gmail.com>.
On the randomness argument, I am more concerned about a unit test that
exhibits different behavior on different runs. A stochastic test, IMHO, is
not a good sanity test, because it makes the quality bar for code entering
the repository stochastic rather than deterministic, causing a lot of churn
in diagnosing unit-test failures for PRs.

There are other places (nightly) to do extensive tests.

PR-unit-tests are sanity tests and must be quick, reliable and consistent
for every PR.

Bhavin Thaker.

On Mon, Oct 16, 2017 at 8:51 AM Pedro Larroy <pe...@gmail.com>
wrote:

> That's not true. random() and similar functions are based on a PRNG. It
> can be debugged and it's completely deterministic; a good practice is to
> use a known seed for this.
>
> More info: https://en.wikipedia.org/wiki/Pseudorandom_number_generator
>
> On Mon, Oct 16, 2017 at 5:42 PM, pracheer gupta <
> pracheer_gupta@hotmail.com> wrote:
>
>> @Chris: Any particular reason for -1? Randomness just prevents you from
>> writing tests that you can rely on and/or debug later in case of failure.
>>
>> On Oct 16, 2017, at 8:28 AM, Chris Olivier <cjolivier01@gmail.com<mailto:
>> cjolivier01@gmail.com>> wrote:
>>
>> -1 for "must not use random numbers for input"
>>
>> On Mon, Oct 16, 2017 at 7:56 AM, Bhavin Thaker <bhavinthaker@gmail.com
>> <ma...@gmail.com>>
>
>
>> wrote:
>>
>> I agree with Pedro.
>>
>> Based on various observations on unit test failures, I would like to
>> propose a few guidelines to follow for the unit tests. Even though I use
>> the word, “must” for my humble opinions below, please feel free to suggest
>> alternatives or modifications to these guidelines:
>>
>> 1) 1a) Each unit test must have a run time budget <= X minutes. Say, X = 2
>> minutes max.
>> 1b) The total run time budget for all unit tests <= Y minutes. Say, Y = 60
>> minutes max.
>>
>> 2) All Unit tests must have deterministic (not Stochastic) behavior. That
>> is, instead of using the random() function to test a range of input
>> values,
>> each input test value must be carefully hand-picked to represent the
>> commonly used input scenarios. The correct place to stochastically test
>> random input values is to have continuously running nightly tests and NOT
>> the sanity/smoke/unit tests for each PR.
>>
>> 3) All Unit tests must be as much self-contained and independent of
>> external components as possible. For example, datasets required for the
>> unit test must NOT be present on an external website which, if unreachable,
>> can cause test run failures. Instead, all datasets must be available
>> locally.
>>
>> 4) It is impossible to test everything in unit tests and so only common
>> use-cases and code-paths must be tested in unit-tests. Less common
>> scenarios like integration with 3rd party products must be tested in
>> nightly/weekly tests.
>>
>> 5) A unit test must NOT be disabled on a failure unless proven to exhibit
>> unreliable behavior. The burden-of-proof for a test failure must be on the
>> PR submitter and the PR must NOT be merged without opening a new GitHub
>> issue explaining the problem. If the unit test is disabled for some
>> reason,
>> then the unit test must NOT be removed from the unit tests list; instead
>> the unit test must be modified to add the following lines at the start of
>> the test:
>>    Print(“Unit Test DISABLED; see GitHub issue: NNNN”)
>>    Exit(0)
>>
>> Please suggest modifications to the above proposal so that we can make
>> the unit-test framework a rock-solid foundation for the active
>> development of Apache MXNet (Incubating).
>>
>> Regards,
>> Bhavin Thaker.
>>
>>
>> On Mon, Oct 16, 2017 at 5:56 AM Pedro Larroy <
>> pedro.larroy.lists@gmail.com<ma...@gmail.com>
>
>
>>
>> wrote:
>>
>> Hi
>>
>> Some of the unit tests are extremely costly in terms of memory and
>> compute.
>>
>> As an example in the gluon tests we are loading all the datasets.
>>
>> test_gluon_data.test_datasets
>>
>> Also running huge networks like resnets in test_gluon_model_zoo.
>>
>> This is ridiculously slow, and simply impossible on some embedded /
>> memory-constrained devices, and in any case it makes tests run longer
>> than needed.
>>
>> Unit tests should be small, self-contained, and ideally pure (avoiding
>> this kind of dataset I/O where possible).
>>
>> I think it would be better to split them into real unit tests and
>> extended integration test suites that do more intensive computation.
>> This would also help with feedback time for PRs and CI infrastructure.
>>
>>
>> Thoughts?
>>
>>
>>

Re: Improving and rationalizing unit tests

Posted by pracheer gupta <pr...@hotmail.com>.
That’s true, Pedro. I assumed that, in this particular context, “random” numbers meant numbers that have not been explicitly seeded, which is what makes intermittently failing unit tests hard to reproduce.
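
As a sketch of that idea (MXNET_TEST_SEED is a hypothetical environment
variable name, not an existing one), a test module could log the seed it
uses so that an intermittent failure can be replayed:

    import os
    import time

    import numpy as np

    # Take the seed from MXNET_TEST_SEED if set; otherwise pick one and
    # print it so that a failing run can be reproduced later.
    seed = int(os.environ.get("MXNET_TEST_SEED", int(time.time())))
    print("Using seed %d; set MXNET_TEST_SEED=%d to reproduce" % (seed, seed))
    np.random.seed(seed)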

On Oct 16, 2017, at 8:51 AM, Pedro Larroy <pe...@gmail.com>> wrote:

That's not true. random() and similar functions are based on a PRNG. It can
be debugged and it's completely deterministic; a good practice is to use a
known seed for this.

More info: https://en.wikipedia.org/wiki/Pseudorandom_number_generator

On Mon, Oct 16, 2017 at 5:42 PM, pracheer gupta <pr...@hotmail.com>>
wrote:

@Chris: Any particular reason for -1? Randomness just prevents you from
writing tests that you can rely on and/or debug later in case of failure.

On Oct 16, 2017, at 8:28 AM, Chris Olivier <cj...@gmail.com><mailto:
cjolivier01@gmail.com<ma...@gmail.com>>> wrote:

-1 for "must not use random numbers for input"

On Mon, Oct 16, 2017 at 7:56 AM, Bhavin Thaker <bh...@gmail.com><
mailto:bhavinthaker@gmail.com>>
wrote:

I agree with Pedro.

Based on various observations on unit test failures, I would like to
propose a few guidelines to follow for the unit tests. Even though I use
the word, “must” for my humble opinions below, please feel free to suggest
alternatives or modifications to these guidelines:

1) 1a) Each unit test must have a run time budget <= X minutes. Say, X = 2
minutes max.
1b) The total run time budget for all unit tests <= Y minutes. Say, Y = 60
minutes max.

2) All Unit tests must have deterministic (not Stochastic) behavior. That
is, instead of using the random() function to test a range of input values,
each input test value must be carefully hand-picked to represent the
commonly used input scenarios. The correct place to stochastically test
random input values is to have continuously running nightly tests and NOT
the sanity/smoke/unit tests for each PR.

3) All Unit tests must be as much self-contained and independent of
external components as possible. For example, datasets required for the
unit test must NOT be present on an external website which, if unreachable,
can cause test run failures. Instead, all datasets must be available
locally.

4) It is impossible to test everything in unit tests and so only common
use-cases and code-paths must be tested in unit-tests. Less common
scenarios like integration with 3rd party products must be tested in
nightly/weekly tests.

5) A unit test must NOT be disabled on a failure unless proven to exhibit
unreliable behavior. The burden-of-proof for a test failure must be on the
PR submitter and the PR must NOT be merged without opening a new GitHub
issue explaining the problem. If the unit test is disabled for some reason,
then the unit test must NOT be removed from the unit tests list; instead
the unit test must be modified to add the following lines at the start of
the test:
  Print(“Unit Test DISABLED; see GitHub issue: NNNN”)
  Exit(0)

Please suggest modifications to the above proposal so that we can make
the unit-test framework a rock-solid foundation for the active
development of Apache MXNet (Incubating).

Regards,
Bhavin Thaker.


On Mon, Oct 16, 2017 at 5:56 AM Pedro Larroy <pe...@gmail.com>
<ma...@gmail.com>

wrote:

Hi

Some of the unit tests are extremely costly in terms of memory and
compute.

As an example in the gluon tests we are loading all the datasets.

test_gluon_data.test_datasets

Also running huge networks like resnets in test_gluon_model_zoo.

This is ridiculously slow, and simply impossible on some embedded /
memory-constrained devices, and in any case it makes tests run longer
than needed.

Unit tests should be small, self-contained, and ideally pure (avoiding
this kind of dataset I/O where possible).

I think it would be better to split them into real unit tests and
extended integration test suites that do more intensive computation.
This would also help with feedback time for PRs and CI infrastructure.


Thoughts?




Re: Improving and rationalizing unit tests

Posted by Pedro Larroy <pe...@gmail.com>.
That's not true. random() and similar functions are based on a PRNG. It can
be debugged and it's completely deterministic; a good practice is to use a
known seed for this.

More info: https://en.wikipedia.org/wiki/Pseudorandom_number_generator
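
For instance, a minimal sketch (numpy and mxnet both expose a seed call):

    import mxnet as mx
    import numpy as np

    np.random.seed(1234)  # fix the PRNG state: the "random" inputs below
    mx.random.seed(1234)  # are identical on every run, so a failure is
                          # fully reproducible and debuggable
    data = mx.nd.array(np.random.uniform(-1.0, 1.0, (3, 4)))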

On Mon, Oct 16, 2017 at 5:42 PM, pracheer gupta <pr...@hotmail.com>
wrote:

> @Chris: Any particular reason for -1? Randomness just prevents you from
> writing tests that you can rely on and/or debug later in case of failure.
>
> On Oct 16, 2017, at 8:28 AM, Chris Olivier <cjolivier01@gmail.com<mailto:
> cjolivier01@gmail.com>> wrote:
>
> -1 for "must not use random numbers for input"
>
> On Mon, Oct 16, 2017 at 7:56 AM, Bhavin Thaker <bhavinthaker@gmail.com<
> mailto:bhavinthaker@gmail.com>>
> wrote:
>
> I agree with Pedro.
>
> Based on various observations on unit test failures, I would like to
> propose a few guidelines to follow for the unit tests. Even though I use
> the word, “must” for my humble opinions below, please feel free to suggest
> alternatives or modifications to these guidelines:
>
> 1) 1a) Each unit test must have a run time budget <= X minutes. Say, X = 2
> minutes max.
> 1b) The total run time budget for all unit tests <= Y minutes. Say, Y = 60
> minutes max.
>
> 2) All Unit tests must have deterministic (not Stochastic) behavior. That
> is, instead of using the random() function to test a range of input values,
> each input test value must be carefully hand-picked to represent the
> commonly used input scenarios. The correct place to stochastically test
> random input values is to have continuously running nightly tests and NOT
> the sanity/smoke/unit tests for each PR.
>
> 3) All Unit tests must be as much self-contained and independent of
> external components as possible. For example, datasets required for the
> unit test must NOT be present on an external website which, if unreachable,
> can cause test run failures. Instead, all datasets must be available
> locally.
>
> 4) It is impossible to test everything in unit tests and so only common
> use-cases and code-paths must be tested in unit-tests. Less common
> scenarios like integration with 3rd party products must be tested in
> nightly/weekly tests.
>
> 5) A unit test must NOT be disabled on a failure unless proven to exhibit
> unreliable behavior. The burden-of-proof for a test failure must be on the
> PR submitter and the PR must NOT be merged without opening a new GitHub
> issue explaining the problem. If the unit test is disabled for some reason,
> then the unit test must NOT be removed from the unit tests list; instead
> the unit test must be modified to add the following lines at the start of
> the test:
>    Print(“Unit Test DISABLED; see GitHub issue: NNNN”)
>    Exit(0)
>
> Please suggest modifications to the above proposal so that we can make
> the unit-test framework a rock-solid foundation for the active
> development of Apache MXNet (Incubating).
>
> Regards,
> Bhavin Thaker.
>
>
> On Mon, Oct 16, 2017 at 5:56 AM Pedro Larroy <pedro.larroy.lists@gmail.com
> <ma...@gmail.com>
>
> wrote:
>
> Hi
>
> Some of the unit tests are extremely costly in terms of memory and
> compute.
>
> As an example in the gluon tests we are loading all the datasets.
>
> test_gluon_data.test_datasets
>
> Also running huge networks like resnets in test_gluon_model_zoo.
>
> This is ridiculously slow, and simply impossible on some embedded /
> memory-constrained devices, and in any case it makes tests run longer
> than needed.
>
> Unit tests should be small, self-contained, and ideally pure (avoiding
> this kind of dataset I/O where possible).
>
> I think it would be better to split them into real unit tests and
> extended integration test suites that do more intensive computation.
> This would also help with feedback time for PRs and CI infrastructure.
>
>
> Thoughts?
>
>
>

Re: Improving and rationalizing unit tests

Posted by pracheer gupta <pr...@hotmail.com>.
@Chris: Any particular reason for -1? Randomness just prevents you from writing tests that you can rely on and/or debug later in case of failure.

On Oct 16, 2017, at 8:28 AM, Chris Olivier <cj...@gmail.com>> wrote:

-1 for "must not use random numbers for input"

On Mon, Oct 16, 2017 at 7:56 AM, Bhavin Thaker <bh...@gmail.com>>
wrote:

I agree with Pedro.

Based on various observations on unit test failures, I would like to
propose a few guidelines to follow for the unit tests. Even though I use
the word, “must” for my humble opinions below, please feel free to suggest
alternatives or modifications to these guidelines:

1) 1a) Each unit test must have a run time budget <= X minutes. Say, X = 2
minutes max.
1b) The total run time budget for all unit tests <= Y minutes. Say, Y = 60
minutes max.

2) All Unit tests must have deterministic (not Stochastic) behavior. That
is, instead of using the random() function to test a range of input values,
each input test value must be carefully hand-picked to represent the
commonly used input scenarios. The correct place to stochastically test
random input values is to have continuously running nightly tests and NOT
the sanity/smoke/unit tests for each PR.

3) All Unit tests must be as much self-contained and independent of
external components as possible. For example, datasets required for the
unit test must NOT be present on an external website which, if unreachable,
can cause test run failures. Instead, all datasets must be available
locally.

4) It is impossible to test everything in unit tests and so only common
use-cases and code-paths must be tested in unit-tests. Less common
scenarios like integration with 3rd party products must be tested in
nightly/weekly tests.

5) A unit test must NOT be disabled on a failure unless proven to exhibit
unreliable behavior. The burden-of-proof for a test failure must be on the
PR submitter and the PR must NOT be merged without opening a new GitHub
issue explaining the problem. If the unit test is disabled for some reason,
then the unit test must NOT be removed from the unit tests list; instead
the unit test must be modified to add the following lines at the start of
the test:
   Print(“Unit Test DISABLED; see GitHub issue: NNNN”)
   Exit(0)

Please suggest modifications to the above proposal so that we can make
the unit-test framework a rock-solid foundation for the active
development of Apache MXNet (Incubating).

Regards,
Bhavin Thaker.


On Mon, Oct 16, 2017 at 5:56 AM Pedro Larroy <pe...@gmail.com>

wrote:

Hi

Some of the unit tests are extremely costly in terms of memory and
compute.

As an example in the gluon tests we are loading all the datasets.

test_gluon_data.test_datasets

Also running huge networks like resnets in test_gluon_model_zoo.

This is ridiculously slow, and simply impossible on some embedded /
memory-constrained devices, and in any case it makes tests run longer
than needed.

Unit tests should be small, self-contained, and ideally pure (avoiding
this kind of dataset I/O where possible).

I think it would be better to split them into real unit tests and
extended integration test suites that do more intensive computation.
This would also help with feedback time for PRs and CI infrastructure.


Thoughts?



Re: Improving and rationalizing unit tests

Posted by Chris Olivier <cj...@gmail.com>.
-1 for "must not use random numbers for input"

On Mon, Oct 16, 2017 at 7:56 AM, Bhavin Thaker <bh...@gmail.com>
wrote:

> I agree with Pedro.
>
> Based on various observations on unit test failures, I would like to
> propose a few guidelines to follow for the unit tests. Even though I use
> the word, “must” for my humble opinions below, please feel free to suggest
> alternatives or modifications to these guidelines:
>
> 1) 1a) Each unit test must have a run time budget <= X minutes. Say, X = 2
> minutes max.
> 1b) The total run time budget for all unit tests <= Y minutes. Say, Y = 60
> minutes max.
>
> 2) All Unit tests must have deterministic (not Stochastic) behavior. That
> is, instead of using the random() function to test a range of input values,
> each input test value must be carefully hand-picked to represent the
> commonly used input scenarios. The correct place to stochastically test
> random input values is to have continuously running nightly tests and NOT
> the sanity/smoke/unit tests for each PR.
>
> 3) All Unit tests must be as much self-contained and independent of
> external components as possible. For example, datasets required for the
> unit test must NOT be present on an external website which, if unreachable,
> can cause test run failures. Instead, all datasets must be available
> locally.
>
> 4) It is impossible to test everything in unit tests and so only common
> use-cases and code-paths must be tested in unit-tests. Less common
> scenarios like integration with 3rd party products must be tested in
> nightly/weekly tests.
>
> 5) A unit test must NOT be disabled on a failure unless proven to exhibit
> unreliable behavior. The burden-of-proof for a test failure must be on the
> PR submitter and the PR must NOT be merged without opening a new GitHub
> issue explaining the problem. If the unit test is disabled for some reason,
> then the unit test must NOT be removed from the unit tests list; instead
> the unit test must be modified to add the following lines at the start of
> the test:
>     Print(“Unit Test DISABLED; see GitHub issue: NNNN”)
>     Exit(0)
>
> Please suggest modifications to the above proposal so that we can make
> the unit-test framework a rock-solid foundation for the active
> development of Apache MXNet (Incubating).
>
> Regards,
> Bhavin Thaker.
>
>
> On Mon, Oct 16, 2017 at 5:56 AM Pedro Larroy <pedro.larroy.lists@gmail.com
> >
> wrote:
>
> > Hi
> >
> > Some of the unit tests are extremely costly in terms of memory and
> compute.
> >
> > As an example in the gluon tests we are loading all the datasets.
> >
> > test_gluon_data.test_datasets
> >
> > Also running huge networks like resnets in test_gluon_model_zoo.
> >
> > This is ridiculously slow, and simply impossible on some embedded /
> > memory-constrained devices, and in any case it makes tests run longer
> > than needed.
> >
> > Unit tests should be small, self-contained, and ideally pure (avoiding
> > this kind of dataset I/O where possible).
> >
> > I think it would be better to split them into real unit tests and
> > extended integration test suites that do more intensive computation.
> > This would also help with feedback time for PRs and CI infrastructure.
> >
> >
> > Thoughts?
> >
>

Re: Improving and rationalizing unit tests

Posted by Bhavin Thaker <bh...@gmail.com>.
I agree with Pedro.

Based on various observations on unit test failures, I would like to
propose a few guidelines to follow for the unit tests. Even though I use
the word, “must” for my humble opinions below, please feel free to suggest
alternatives or modifications to these guidelines:

1) 1a) Each unit test must have a run time budget <= X minutes. Say, X = 2
minutes max.
1b) The total run time budget for all unit tests <= Y minutes. Say, Y = 60
minutes max.

2) All Unit tests must have deterministic (not Stochastic) behavior. That
is, instead of using the random() function to test a range of input values,
each input test value must be carefully hand-picked to represent the
commonly used input scenarios. The correct place to stochastically test
random input values is to have continuously running nightly tests and NOT
the sanity/smoke/unit tests for each PR.

3) All Unit tests must be as much self-contained and independent of
external components as possible. For example, datasets required for the
unit test must NOT be present on an external website which, if unreachable,
can cause test run failures. Instead, all datasets must be available
locally.

4) It is impossible to test everything in unit tests and so only common
use-cases and code-paths must be tested in unit-tests. Less common
scenarios like integration with 3rd party products must be tested in
nightly/weekly tests.

5) A unit test must NOT be disabled on a failure unless proven to exhibit
unreliable behavior. The burden-of-proof for a test failure must be on the
PR submitter and the PR must NOT be merged without opening a new GitHub
issue explaining the problem. If the unit test is disabled for some reason,
then the unit test must NOT be removed from the unit tests list; instead
the unit test must be modified to add the following lines at the start of
the test:
    Print(“Unit Test DISABLED; see GitHub issue: NNNN”)
    Exit(0)
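
In Python with nose, which our test suite already uses, a sketch of the
same idea could look like this (NNNN stays a placeholder for the real
issue number):

    from nose.plugins.skip import SkipTest

    def test_disabled_example():
        # Keep the test in the list, but skip it with a pointer to the issue.
        print("Unit Test DISABLED; see GitHub issue: NNNN")
        raise SkipTest("disabled; see GitHub issue NNNN")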

Please suggest modifications to the above proposal so that we can make
the unit-test framework a rock-solid foundation for the active
development of Apache MXNet (Incubating).

Regards,
Bhavin Thaker.


On Mon, Oct 16, 2017 at 5:56 AM Pedro Larroy <pe...@gmail.com>
wrote:

> Hi
>
> Some of the unit tests are extremely costly in terms of memory and compute.
>
> As an example in the gluon tests we are loading all the datasets.
>
> test_gluon_data.test_datasets
>
> Also running huge networks like resnets in test_gluon_model_zoo.
>
> This is ridiculously slow, and simply impossible on some embedded /
> memory-constrained devices, and in any case it makes tests run longer
> than needed.
>
> Unit tests should be small, self-contained, and ideally pure (avoiding
> this kind of dataset I/O where possible).
>
> I think it would be better to split them into real unit tests and
> extended integration test suites that do more intensive computation.
> This would also help with feedback time for PRs and CI infrastructure.
>
>
> Thoughts?
>

Re: Improving and rationalizing unit tests

Posted by Dick Carter <dc...@nvidia.com>.
This thread is very timely, as it relates to issues I have been trying to
solve in my coding over the last week. I think it simplest to just present
the code (and the detailed PR description) as the clearest statement of my
thinking. Happy to discuss further. Please see:

https://github.com/apache/incubator-mxnet/pull/8313

On 2017-10-16 05:56, Pedro Larroy <p....@gmail.com> wrote:
> Hi
>
> Some of the unit tests are extremely costly in terms of memory and
> compute.
>
> As an example in the gluon tests we are loading all the datasets.
>
> test_gluon_data.test_datasets
>
> Also running huge networks like resnets in test_gluon_model_zoo.
>
> This is ridiculously slow, and simply impossible on some embedded /
> memory-constrained devices, and in any case it makes tests run longer
> than needed.
>
> Unit tests should be small, self-contained, and ideally pure (avoiding
> this kind of dataset I/O where possible).
>
> I think it would be better to split them into real unit tests and
> extended integration test suites that do more intensive computation.
> This would also help with feedback time for PRs and CI infrastructure.
>
> Thoughts?