Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/07/17 23:16:56 UTC

[GitHub] [incubator-mxnet] DickJC123 opened a new issue #18747: unittests using @retry decorator can segfault if they fail

DickJC123 opened a new issue #18747:
URL: https://github.com/apache/incubator-mxnet/issues/18747


   ## Description
   This is a problem I ran into during the development of PR https://github.com/apache/incubator-mxnet/pull/18694, and **I have included a fix** in commit https://github.com/apache/incubator-mxnet/pull/18694/commits/95bfe3a642f07ffd0c78d965b7f590cee75a44fd.
   
   An example invocation of a test that is decorated with @retry(3) and that fails on its first attempt (succeeding on its 2nd) is:
   ```
   MXNET_TEST_SEED=757747955 pytest --verbose -s --log-cli-level=DEBUG tests/python/gpu/test_operator_gpu.py::test_np_mixedType_unary_funcs[float16-4-rint-None--5.0-5.0]
   ```
   I've posted the error message showing the segfault below.
   
   The problem seems to center on the current retry() implementation, which copies each caught exception to a variable `err` and retains it while it makes further retry attempts of the test. I believe the segfault is triggered when the `err` object is finally garbage collected (does the exception hold stack trace pointers that are now stale?). The fix is to not retain the exception past the iteration that generated it.
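
To illustrate the mechanism this explanation relies on, here is a minimal pure-Python sketch showing that a retained exception keeps the failing function's local objects alive through its traceback. It does not reproduce the segfault and does not confirm the root cause. `Resource`, `failing_test`, and `run_and_keep_exception` are hypothetical names, with `Resource` standing in for an array-like object backed by native memory.

```python
import gc
import weakref

class Resource:
    """Hypothetical stand-in for an object with native backing storage."""

refs = []  # weak references, so this list does not itself keep objects alive

def failing_test():
    local_resource = Resource()
    refs.append(weakref.ref(local_resource))
    assert False, "simulated test failure"

def run_and_keep_exception():
    err = None
    try:
        failing_test()
    except AssertionError as e:
        err = e  # retained past the except block, as in the old retry()
    return err

err = run_and_keep_exception()
# The exception's traceback references failing_test's frame and its locals.
alive_while_retained = refs[0]() is not None

del err
gc.collect()  # break the frame <-> exception reference cycle
alive_after_release = refs[0]() is not None

print(alive_while_retained, alive_after_release)
```

If locals like these wrap native resources, their destruction is deferred until the retained exception is released, which may be long after the attempt that created them.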
   
   In coming up with the above explanation, I determined that retaining only the exception string also avoids the segfault and so would work as a fix.
   So before:
   ```
   err = e
   ...
   raise err
   ```
   could become:
   ```
   err_msg = str(e)
   ...
   raise AssertionError(err_msg)
   ```
   I prefer to stick with the initial fix in the PR, which doesn't regenerate the exception.
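
As a rough sketch of the "do not retain the exception" approach (an assumption about its general shape, not the exact code from the PR), a retry decorator can re-raise from inside the `except` block on the final attempt, so no exception object outlives the iteration that produced it. The names `decorator`, `wrapper`, and `flaky_test` are illustrative only.

```python
import functools

def retry(n):
    """Retry a test up to n times, re-raising only the final failure."""
    assert n > 0

    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(n):
                try:
                    test_fn(*args, **kwargs)
                    return  # success: stop retrying
                except AssertionError:
                    # Re-raise in place on the last attempt; otherwise the
                    # exception is dropped when this block exits.
                    if attempt == n - 1:
                        raise
        return wrapper
    return decorator

calls = []

@retry(3)
def flaky_test():
    calls.append(1)
    assert len(calls) >= 2, "fails on the first attempt only"

flaky_test()  # first attempt fails, second succeeds
```

Because the bare `raise` propagates the original exception, the traceback of the last failure is preserved without constructing a new `AssertionError`.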
   
   ### Error Message
   ```
   --------------------------------------------------------------------------------------- live log call ------------------------------------------------------------------------------[0/18716]
   INFO     common:common.py:221 Setting test np/mx/python random seeds, use MXNET_TEST_SEED=757747955 to reproduce.
   rint float16 (2, 2, 2, 2)
   
   *** Maximum errors for vector of size 16:  rtol=0.001, atol=1e-05
   
     1: Error 99864.382812  Location of error: (0, 1, 1, 1), a=-1.00000000, b=-0.00000000
   rint float16 (3, 3, 3, 2)
   rint float16 (1, 0, 2)
   PASSEDFatal Python error: Segmentation fault
   
   Current thread 0x00007f393667f740 (most recent call first):
     File "/opt/mxnet/python/mxnet/ndarray/ndarray.py", line 2570 in asnumpy
     File "/opt/mxnet/python/mxnet/numpy/multiarray.py", line 1251 in __repr__
     File "/usr/local/lib/python3.6/dist-packages/_pytest/_io/saferepr.py", line 56 in repr_instance
     File "/usr/lib/python3.6/reprlib.py", line 65 in repr1
     File "/usr/lib/python3.6/reprlib.py", line 55 in repr
     File "/usr/local/lib/python3.6/dist-packages/_pytest/_io/saferepr.py", line 47 in repr
     File "/usr/local/lib/python3.6/dist-packages/_pytest/_io/saferepr.py", line 82 in saferepr
     File "/usr/local/lib/python3.6/dist-packages/_pytest/_code/code.py", line 689 in repr_args
     File "/usr/local/lib/python3.6/dist-packages/_pytest/_code/code.py", line 780 in repr_traceback_entry
     File "/usr/local/lib/python3.6/dist-packages/_pytest/_code/code.py", line 821 in repr_traceback
     File "/usr/local/lib/python3.6/dist-packages/_pytest/_code/code.py", line 877 in repr_excinfo
     File "/usr/local/lib/python3.6/dist-packages/_pytest/_code/code.py", line 631 in getrepr
     File "/usr/local/lib/python3.6/dist-packages/_pytest/nodes.py", line 326 in _repr_failure_py
     File "/usr/local/lib/python3.6/dist-packages/_pytest/reports.py", line 296 in from_item_and_call
     File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 253 in pytest_runtest_makereport
     File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
     File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
     File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
     File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
     File "/usr/local/lib/python3.6/dist-packages/flaky/flaky_pytest_plugin.py", line 132 in call_and_report
     File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 100 in runtestprotocol
     File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 84 in pytest_runtest_protocol
     File "/usr/local/lib/python3.6/dist-packages/flaky/flaky_pytest_plugin.py", line 92 in pytest_runtest_protocol
     File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
     File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
     File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
     File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
     File "/usr/local/lib/python3.6/dist-packages/_pytest/main.py", line 271 in pytest_runtestloop
     File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
     File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
     File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
     File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
     File "/usr/local/lib/python3.6/dist-packages/_pytest/main.py", line 247 in _main
     File "/usr/local/lib/python3.6/dist-packages/_pytest/main.py", line 197 in wrap_session
     File "/usr/local/lib/python3.6/dist-packages/_pytest/main.py", line 240 in pytest_cmdline_main
     File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
     File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
     File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
     File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
     File "/usr/local/lib/python3.6/dist-packages/_pytest/config/__init__.py", line 93 in main
     File "/usr/local/bin/pytest", line 8 in <module>
   Segmentation fault (core dumped)
   
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha closed issue #18747: unittests using @retry decorator can segfault if they fail

Posted by GitBox <gi...@apache.org>.
szha closed issue #18747:
URL: https://github.com/apache/incubator-mxnet/issues/18747


   





[GitHub] [incubator-mxnet] szha commented on issue #18747: unittests using @retry decorator can segfault if they fail

Posted by GitBox <gi...@apache.org>.
szha commented on issue #18747:
URL: https://github.com/apache/incubator-mxnet/issues/18747#issuecomment-660710842


   fixed by #18694 

