Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/07/17 23:16:56 UTC
[GitHub] [incubator-mxnet] DickJC123 opened a new issue #18747: unittests using @retry decorator can segfault if they fail
DickJC123 opened a new issue #18747:
URL: https://github.com/apache/incubator-mxnet/issues/18747
## Description
This is a problem I ran into during the development of PR https://github.com/apache/incubator-mxnet/pull/18694, and **I have included a fix** in commit https://github.com/apache/incubator-mxnet/pull/18694/commits/95bfe3a642f07ffd0c78d965b7f590cee75a44fd.
An example invocation of a test that is decorated with @retry(3) and that fails on its first attempt (succeeding on its 2nd) is:
```
MXNET_TEST_SEED=757747955 pytest --verbose -s --log-cli-level=DEBUG tests/python/gpu/test_operator_gpu.py::test_np_mixedType_unary_funcs[float16-4-rint-None--5.0-5.0]
```
I've posted the error message showing the segfault below.
The problem seems to center on the current retry() implementation, which copies any exception it catches into a variable `err` and retains it while it makes further attempts at the test. I believe the segfault is triggered when the `err` object is finally garbage collected (does the exception hold stack trace pointers that are now stale?). The fix is to not retain the exception past the iteration that generated it.
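As a rough illustration of why retaining the exception matters, the sketch below shows standard CPython behavior: an exception object references its traceback, the traceback references the frames that were active, and those frames keep their local variables alive. The `Payload` class and `failing_test` function are hypothetical stand-ins for a test that holds an array. This demonstrates only the reference retention, not the segfault itself, whose exact cause is not confirmed here.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large object such as an array."""


def failing_test():
    data = Payload()                          # local owned by this frame
    failing_test.ref = weakref.ref(data)      # lets us observe its lifetime
    raise AssertionError("mismatch")


# Retaining the exception keeps its traceback, the traceback's frames,
# and therefore the frame-local `data` alive.
try:
    failing_test()
except AssertionError as e:
    err = e

gc.collect()
alive_while_retained = failing_test.ref() is not None

# Dropping the last reference to the exception releases the whole chain.
del err
gc.collect()
alive_after_release = failing_test.ref() is not None

print(alive_while_retained, alive_after_release)
```

In a retry loop, this means that locals from a failed attempt can outlive that attempt for as long as `err` is held.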
In coming up with the above explanation, I determined that retaining only the exception string also avoids the segfault and so would work as a fix.
So before:
```
err = e
...
raise err
```
could become:
```
err_msg = str(e)
...
raise AssertionError(err_msg)
```
I prefer to stick with the initial fix in the PR, which doesn't regenerate the exception.
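For reference, a minimal sketch of a retry decorator that does not retain the exception past the iteration that generated it. This is an illustration under stated assumptions, not the code from the PR: the names `decorate`, `wrapper`, and `flaky_test` are hypothetical, and it assumes only `AssertionError` is retried.

```python
import functools


def retry(n):
    """Retry a flaky test up to n times (illustrative sketch only)."""
    assert n > 0

    def decorate(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(n):
                try:
                    return test_fn(*args, **kwargs)
                except AssertionError:
                    # No `err = e` here: the exception, its traceback, and
                    # the frames it references are released when this
                    # block exits.
                    if attempt == n - 1:
                        raise  # final attempt: propagate the live exception
        return wrapper
    return decorate


# Usage: a hypothetical test that fails on its first attempt only.
attempts = []


@retry(3)
def flaky_test():
    attempts.append(1)
    assert len(attempts) >= 2, "fails on the first attempt only"


flaky_test()  # passes on the second attempt
```

Re-raising with a bare `raise` on the last attempt propagates the original exception unchanged, so nothing has to be regenerated from a saved message.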
### Error Message
```
--------------------------------------------------------------------------------------- live log call ------------------------------------------------------------------------------[0/18716]
INFO common:common.py:221 Setting test np/mx/python random seeds, use MXNET_TEST_SEED=757747955 to reproduce.
rint float16 (2, 2, 2, 2)
*** Maximum errors for vector of size 16: rtol=0.001, atol=1e-05
1: Error 99864.382812 Location of error: (0, 1, 1, 1), a=-1.00000000, b=-0.00000000
rint float16 (3, 3, 3, 2)
rint float16 (1, 0, 2)
PASSEDFatal Python error: Segmentation fault
Current thread 0x00007f393667f740 (most recent call first):
File "/opt/mxnet/python/mxnet/ndarray/ndarray.py", line 2570 in asnumpy
File "/opt/mxnet/python/mxnet/numpy/multiarray.py", line 1251 in __repr__
File "/usr/local/lib/python3.6/dist-packages/_pytest/_io/saferepr.py", line 56 in repr_instance
File "/usr/lib/python3.6/reprlib.py", line 65 in repr1
File "/usr/lib/python3.6/reprlib.py", line 55 in repr
File "/usr/local/lib/python3.6/dist-packages/_pytest/_io/saferepr.py", line 47 in repr
File "/usr/local/lib/python3.6/dist-packages/_pytest/_io/saferepr.py", line 82 in saferepr
File "/usr/local/lib/python3.6/dist-packages/_pytest/_code/code.py", line 689 in repr_args
File "/usr/local/lib/python3.6/dist-packages/_pytest/_code/code.py", line 780 in repr_traceback_entry
File "/usr/local/lib/python3.6/dist-packages/_pytest/_code/code.py", line 821 in repr_traceback
File "/usr/local/lib/python3.6/dist-packages/_pytest/_code/code.py", line 877 in repr_excinfo
File "/usr/local/lib/python3.6/dist-packages/_pytest/_code/code.py", line 631 in getrepr
File "/usr/local/lib/python3.6/dist-packages/_pytest/nodes.py", line 326 in _repr_failure_py
File "/usr/local/lib/python3.6/dist-packages/_pytest/reports.py", line 296 in from_item_and_call
File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 253 in pytest_runtest_makereport
File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
File "/usr/local/lib/python3.6/dist-packages/flaky/flaky_pytest_plugin.py", line 132 in call_and_report
File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 100 in runtestprotocol
File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 84 in pytest_runtest_protocol
File "/usr/local/lib/python3.6/dist-packages/flaky/flaky_pytest_plugin.py", line 92 in pytest_runtest_protocol
File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
File "/usr/local/lib/python3.6/dist-packages/_pytest/main.py", line 271 in pytest_runtestloop
File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
File "/usr/local/lib/python3.6/dist-packages/_pytest/main.py", line 247 in _main
File "/usr/local/lib/python3.6/dist-packages/_pytest/main.py", line 197 in wrap_session
File "/usr/local/lib/python3.6/dist-packages/_pytest/main.py", line 240 in pytest_cmdline_main
File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
File "/usr/local/lib/python3.6/dist-packages/_pytest/config/__init__.py", line 93 in main
File "/usr/local/bin/pytest", line 8 in <module>
Segmentation fault (core dumped)
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [incubator-mxnet] szha closed issue #18747: unittests using @retry decorator can segfault if they fail
Posted by GitBox <gi...@apache.org>.
szha closed issue #18747:
URL: https://github.com/apache/incubator-mxnet/issues/18747
[GitHub] [incubator-mxnet] szha commented on issue #18747: unittests using @retry decorator can segfault if they fail
Posted by GitBox <gi...@apache.org>.
szha commented on issue #18747:
URL: https://github.com/apache/incubator-mxnet/issues/18747#issuecomment-660710842
fixed by #18694