
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/07/20 18:39:51 UTC

[GitHub] [incubator-mxnet] DickJC123 opened a new pull request #18762: Improve test seeding and robustness in test_numpy_interoperability.py

DickJC123 opened a new pull request #18762:
URL: https://github.com/apache/incubator-mxnet/pull/18762


   ## Description ##
   I recently ran into a CI failure in test_numpy_interoperability.py::test_np_array_function_protocol: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-cpu/detail/PR-18694/14/pipeline. I was not able to reproduce it using the reported seed. I investigated why and am supplying this PR as a fix: reported seeds can now be used to reproduce failures. I then used the new facility to determine which tests needed loosened tolerances for increased robustness, and have included those changes as well. I estimate the failure rate is now around 1 in 10,000.
   
   As a reminder, a test's robustness can be explored with:
   ```
   MXNET_TEST_COUNT=10000 pytest --verbose -s --log-cli-level=DEBUG <my_test>
   <see a failure, note failure seed NNN>
   MXNET_TEST_SEED=NNN pytest --verbose -s <my_test>
   ```
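This workflow depends on every random draw a test makes being derived from one per-trial seed that is logged when the trial fails. A minimal sketch of that idea in plain Python, assuming only NumPy: `pick_seeds` and `run_seeded` are hypothetical names for illustration, not MXNet's test utilities, although they read the same `MXNET_TEST_COUNT` and `MXNET_TEST_SEED` variables used above.

```python
import logging
import os
import random

import numpy as np


def pick_seeds(count=None, seed=None):
    """Return the seeds to run with (hypothetical helper, not MXNet's API).

    An explicit seed, or MXNET_TEST_SEED, means "reproduce one run";
    otherwise draw MXNET_TEST_COUNT fresh seeds (default 1).
    """
    if seed is None:
        seed = os.environ.get("MXNET_TEST_SEED")
    if seed is not None:
        return [int(seed)]
    if count is None:
        count = int(os.environ.get("MXNET_TEST_COUNT", "1"))
    rng = random.SystemRandom()  # the seeds themselves come from OS entropy
    return [rng.randrange(2**31) for _ in range(count)]


def run_seeded(test_fn, count=None, seed=None):
    """Run test_fn once per seed, seeding numpy and random before each trial."""
    seeds = pick_seeds(count, seed)
    for s in seeds:
        np.random.seed(s)
        random.seed(s)
        try:
            test_fn()
        except AssertionError:
            # Report the seed so the failing trial can be replayed exactly.
            logging.warning("Use MXNET_TEST_SEED=%d to reproduce.", s)
            raise
    return seeds
```

Running a test with `count=10000` mirrors the first command; passing the logged seed back (or exporting `MXNET_TEST_SEED`) replays only the failing trial, with the same inputs.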
   The problem with test_numpy_interoperability.py was that it created its test workload at file import time using unseeded random values. The fix regenerates the workload for each test at runtime, after the test's seed has been set, so the inputs depend on that seed.
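A minimal sketch of the before/after pattern, assuming NumPy only. `build_workload` and `check_workload` are hypothetical stand-ins rather than the functions in test_numpy_interoperability.py, and the single `linalg.solve` case is an illustrative input, not the real workload.

```python
import numpy as np

# Problem pattern (illustrative): inputs drawn at import time, before any
# per-test seed exists, so a reported seed cannot recreate them.
#   _INPUTS = np.random.uniform(size=(4, 4))   # runs on `import`


def build_workload():
    """Hypothetical workload builder, called at test runtime after seeding.

    Every value it draws is therefore a function of the test's seed.
    """
    # Diagonally dominant matrix, so the illustrative solve is well posed.
    a = np.random.uniform(-1.0, 1.0, size=(4, 4)) + 4.0 * np.eye(4)
    b = np.random.uniform(-1.0, 1.0, size=(4,))
    return {"linalg.solve": [(a, b)]}


def check_workload(seed):
    np.random.seed(seed)         # seed first ...
    workload = build_workload()  # ... then regenerate the inputs
    for name, cases in workload.items():
        for a, b in cases:
            x = np.linalg.solve(a, b)
            np.testing.assert_allclose(a @ x, b, rtol=1e-7, atol=1e-9, err_msg=name)
    return workload
```

Because seeding happens before `build_workload()` runs, two calls with the same seed produce identical inputs, which is what a reported seed needs in order to reproduce a failure.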
   
   The two tests that required loosened tolerances were linalg.tensorinv and linalg.solve. With the tolerances as I left them, I saw 1 failure in 10K trials. Rather than loosening the tolerances further, I will leave it to the code owners to diagnose the situation and propose a fix if they see fit. The tolerances could be loosened further, but other approaches could involve changing the scale or other properties of the input data. The remaining failure can (after this PR is merged) be reproduced with:
   ```
   MXNET_TEST_SEED=801992040 pytest --verbose -s tests/python/unittest/test_numpy_interoperability.py::test_np_array_function_protocol
   ```
   A curious property of the remaining failure is that many of the values are consistently about 1.9% smaller than the golden copy:
   ```
   Dispatch test: linalg.tensorinv
   
   *** Maximum errors for vector of size 3600:  rtol=0.01, atol=0.005
   
     1: Error 1.934343  Location of error: (1, 1, 0, 10, 4), a=128.42663574, b=130.96971130
     2: Error 1.933410  Location of error: (2, 0, 2, 6, 0), a=80.68855286, b=82.28920746
     3: Error 1.933032  Location of error: (2, 0, 2, 8, 3), a=61.98265076, b=63.21426773
     4: Error 1.931998  Location of error: (1, 2, 2, 4, 4), a=-151.11050415, b=-154.09732056
     5: Error 1.931560  Location of error: (1, 1, 0, 4, 4), a=-97.56709290, b=-99.49862671
     6: Error 1.931458  Location of error: (0, 0, 2, 10, 4), a=343.97329712, b=350.75769043
     7: Error 1.931435  Location of error: (1, 2, 2, 10, 4), a=199.16923523, b=203.10166931
     8: Error 1.931303  Location of error: (1, 2, 2, 9, 0), a=116.00872803, b=118.30317688
     9: Error 1.931238  Location of error: (1, 2, 0, 4, 2), a=1058.37841797, b=1079.23059082
    10: Error 1.931191  Location of error: (1, 1, 1, 10, 4), a=702.60571289, b=716.45141602
   [WARNING] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=801992040 to reproduce.
   ```
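The logged numbers are consistent with "Error" being the absolute difference divided by the allowed tolerance, |a - b| / (atol + rtol*|b|), so a value above 1 means the element fails; this reading is inferred from the log itself rather than taken from MXNet's source. The sketch below recomputes it for three of the logged rows, together with the relative shortfall (|b| - |a|)/|b| behind the ~1.9% figure. `violation` and `shortfall` are hypothetical helper names.

```python
# (location, a, b) for rows 1, 4 and 9 of the log above.
ROWS = [
    ((1, 1, 0, 10, 4), 128.42663574, 130.96971130),
    ((1, 2, 2, 4, 4), -151.11050415, -154.09732056),
    ((1, 2, 0, 4, 2), 1058.37841797, 1079.23059082),
]
RTOL, ATOL = 0.01, 0.005


def violation(a, b, rtol=RTOL, atol=ATOL):
    """Error in units of the allowed tolerance; > 1 means the element fails."""
    return abs(a - b) / (atol + rtol * abs(b))


def shortfall(a, b):
    """Fraction by which |a| falls short of the golden |b|."""
    return (abs(b) - abs(a)) / abs(b)


for loc, a, b in ROWS:
    print(f"{loc}: error={violation(a, b):.6f} shortfall={shortfall(a, b):.2%}")
```

Because atol is small next to rtol*|b| for these values, the error is roughly the shortfall divided by rtol, which is why both read about 1.93. Calling `violation(a, b, rtol=0.02)` on the same rows gives values just under 1, which illustrates what loosening the tolerance further would mean for these elements.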
   
   [This PR may have additional fixes to other tests if I can't get a clean CI]
   
   ## Checklist ##
   ### Essentials ###
   Please feel free to remove inapplicable items for your PR.
   - [X] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
   - [X] Changes are complete (i.e. I finished coding on this PR)
   - [X] All changes have test coverage:
     - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
     - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
     - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
   - [X] Code is well-documented:
     - For user-facing API changes, API doc string has been updated.
     - For new C++ functions in header files, their functionalities and arguments are documented.
     - For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
     - Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
   - [X] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be made.
   - Interesting edge cases to note here
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-mxnet] DickJC123 commented on pull request #18762: Improve test seeding and robustness in test_numpy_interoperability.py

Posted by GitBox <gi...@apache.org>.

DickJC123 commented on pull request #18762:
URL: https://github.com/apache/incubator-mxnet/pull/18762#issuecomment-661277350


   Tagging @reminisce and @szha (who introduced/modified test_np_array_function_protocol) for comment.



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18762: Improve test seeding and robustness in test_numpy_interoperability.py


mxnet-bot commented on pull request #18762:
URL: https://github.com/apache/incubator-mxnet/pull/18762#issuecomment-661264572


   Hey @DickJC123, thanks for submitting the PR.
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [centos-gpu, windows-cpu, unix-gpu, centos-cpu, unix-cpu, sanity, edge, windows-gpu, miscellaneous, clang, website]
   *** 
   _Note_: 
   Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
   All CI tests must pass before the PR can be merged. 
   



[GitHub] [incubator-mxnet] ptrendx merged pull request #18762: Improve test seeding and robustness in test_numpy_interoperability.py


ptrendx merged pull request #18762:
URL: https://github.com/apache/incubator-mxnet/pull/18762


   

