Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/07/16 23:54:52 UTC

[GitHub] [incubator-mxnet] leezu opened a new issue #18740: test_sparse_operator.py::test_elemwise_binary_ops

leezu opened a new issue #18740:
URL: https://github.com/apache/incubator-mxnet/issues/18740


   ## Description
   Test crashes are affecting multiple PRs: https://github.com/apache/incubator-mxnet/pull/18711 https://github.com/apache/incubator-mxnet/pull/18694 https://github.com/apache/incubator-mxnet/pull/18722 https://github.com/apache/incubator-mxnet/pull/18733
   
   
   ```
   [2020-07-15T23:41:55.453Z] Fatal Python error: Aborted
   [2020-07-15T23:41:55.453Z] 
   [2020-07-15T23:41:55.453Z] Thread 0x00007f6de68a6700 (most recent call first):
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 400 in read
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 432 in from_io
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 967 in _thread_receiver
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 220 in run
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 285 in _perform_spawn
   [2020-07-15T23:41:55.453Z] 
   [2020-07-15T23:41:55.453Z] Current thread 0x00007f6de857a740 (most recent call first):
   [2020-07-15T23:41:55.453Z]   File "/work/mxnet/python/mxnet/_ctypes/ndarray.py", line 178 in __call__
   [2020-07-15T23:41:55.453Z]   File "/work/mxnet/python/mxnet/executor.py", line 184 in forward
   [2020-07-15T23:41:55.453Z]   File "/work/mxnet/python/mxnet/test_utils.py", line 937 in numeric_grad
   [2020-07-15T23:41:55.453Z]   File "/work/mxnet/python/mxnet/test_utils.py", line 1088 in check_numeric_gradient
   [2020-07-15T23:41:55.453Z]   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line 312 in test_elemwise_binary_op
   [2020-07-15T23:41:55.453Z]   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line 417 in check_elemwise_binary_ops
   [2020-07-15T23:41:55.453Z]   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line 520 in test_elemwise_binary_ops
   [2020-07-15T23:41:55.453Z]   File "/work/mxnet/tests/python/unittest/common.py", line 223 in test_new
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/python.py", line 167 in pytest_pyfunc_call
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/python.py", line 1445 in runtest
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/runner.py", line 134 in pytest_runtest_call
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/runner.py", line 210 in <lambda>
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/runner.py", line 237 in from_call
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/runner.py", line 210 in call_runtest_hook
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/flaky/flaky_pytest_plugin.py", line 129 in call_and_report
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/runner.py", line 99 in runtestprotocol
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/runner.py", line 84 in pytest_runtest_protocol
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/flaky/flaky_pytest_plugin.py", line 92 in pytest_runtest_protocol
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/xdist/remote.py", line 87 in run_one_test
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/xdist/remote.py", line 70 in pytest_runtestloop
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/main.py", line 247 in _main
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/main.py", line 197 in wrap_session
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/_pytest/main.py", line 240 in pytest_cmdline_main
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/callers.py", line 187 in _multicall
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/manager.py", line 87 in <lambda>
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/manager.py", line 93 in _hookexec
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/pluggy/hooks.py", line 286 in __call__
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/xdist/remote.py", line 258 in <module>
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 1084 in executetask
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 220 in run
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 285 in _perform_spawn
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 267 in integrate_as_primary_thread
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 1060 in serve
   [2020-07-15T23:41:55.453Z]   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 1554 in serve
   [2020-07-15T23:41:55.453Z]   File "<string>", line 8 in <module>
   [2020-07-15T23:41:55.453Z]   File "<string>", line 1 in <module>
   [2020-07-15T23:41:55.453Z] [gw0] [ 95%] PASSED tests/python/unittest/test_sparse_ndarray.py::test_sparse_getnnz 
   [2020-07-15T23:41:55.707Z] tests/python/unittest/test_sparse_operator.py::test_elemwise_binary_ops 
   [2020-07-15T23:41:55.707Z] [gw0] node down: Not properly terminated
   [2020-07-15T23:41:55.707Z] [gw0] [ 95%] FAILED tests/python/unittest/test_sparse_operator.py::test_elemwise_binary_ops 
   [2020-07-15T23:41:55.707Z] 
   [2020-07-15T23:41:55.707Z] replacing crashed worker gw0
   [2020-07-15T23:41:56.266Z] 
   [gw4] linux Python 3.6.9 cwd: /work/mxnet
   [2020-07-15T23:41:58.149Z] 
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu edited a comment on issue #18740: test_sparse_operator.py::test_elemwise_binary_ops

Posted by GitBox <gi...@apache.org>.
leezu edited a comment on issue #18740:
URL: https://github.com/apache/incubator-mxnet/issues/18740#issuecomment-660324478


   The pytest workers all run in separate processes. The only difference I'm aware of is that `OMP_NUM_THREADS=$(expr $(nproc) / 4)` is exported before the parallel pytest processes run the non-serial tests. Serial tests run afterwards in a separate process, without the `OMP_NUM_THREADS` variable, once all non-serial tests have finished.
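The difference described above can be illustrated with a minimal Python sketch. It is not the CI script itself: the helper names (`parallel_env`, `serial_env`, `omp_seen_by_child`) are hypothetical, `os.cpu_count()` only approximates `nproc`, and the `max(1, ...)` guard is an addition that the shell expression does not have. It shows that a child process sees `OMP_NUM_THREADS` only when the variable is present in the environment it was launched with.

```python
import os
import subprocess
import sys

# Code run in the child: report what that process sees for OMP_NUM_THREADS.
CHILD_CODE = "import os; print(os.environ.get('OMP_NUM_THREADS', 'unset'))"


def parallel_env():
    """Environment for the parallel (non-serial) phase, as a copy of ours."""
    env = dict(os.environ)
    # Mirrors `expr $(nproc) / 4` (integer division). os.cpu_count() is an
    # approximation of nproc; the max(1, ...) guard is this sketch's addition.
    env["OMP_NUM_THREADS"] = str(max(1, (os.cpu_count() or 1) // 4))
    return env


def serial_env():
    """Environment for the serial phase: the variable is not set at all."""
    env = dict(os.environ)
    env.pop("OMP_NUM_THREADS", None)
    return env


def omp_seen_by_child(env):
    """Start a separate Python process with `env` and return what it reports."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("parallel phase sees:", omp_seen_by_child(parallel_env()))
    print("serial phase sees:", omp_seen_by_child(serial_env()))
```

Because each phase is a separate process with its own environment, the OpenMP thread count is one concrete setting that differs between a test run in parallel and the same test run with the serial marker.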





[GitHub] [incubator-mxnet] leezu commented on issue #18740: test_sparse_operator.py::test_elemwise_binary_ops

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18740:
URL: https://github.com/apache/incubator-mxnet/issues/18740#issuecomment-660324478


   The pytest workers all run in separate processes. The only difference I'm aware of is that `OMP_NUM_THREADS=$(expr $(nproc) / 4)` is exported before the parallel pytest processes run the non-serial tests.





[GitHub] [incubator-mxnet] DickJC123 commented on issue #18740: test_sparse_operator.py::test_elemwise_binary_ops

Posted by GitBox <gi...@apache.org>.
DickJC123 commented on issue #18740:
URL: https://github.com/apache/incubator-mxnet/issues/18740#issuecomment-660322294


   I had some success marking this test with `@pytest.mark.serial`, without understanding the underlying issue or why this fixed it. Could someone enlighten me: what do the serial-marked tests do that forces them to be run serially? Are the pytest workers all in the same process?
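For readers unfamiliar with the marker, here is a minimal sketch of what applying `@pytest.mark.serial` looks like. The test functions and the `marker_names` helper are hypothetical placeholders, not code from `test_sparse_operator.py`. On its own, a marker is only a label attached to the test function; it changes nothing at run time until the test runner selects or deselects tests by that label.

```python
import pytest


@pytest.mark.serial  # marker name taken from the comment above
def test_runs_in_serial_phase():
    # Placeholder body standing in for a test such as test_elemwise_binary_ops.
    assert sum([1, 2, 3]) == 6


def test_runs_in_parallel_phase():
    # Unmarked test: eligible for the parallel workers.
    assert max(1, 2) == 2


def marker_names(func):
    """Return the names of the pytest markers attached directly to `func`."""
    return [mark.name for mark in getattr(func, "pytestmark", [])]


if __name__ == "__main__":
    print(marker_names(test_runs_in_serial_phase))
    print(marker_names(test_runs_in_parallel_phase))
```

Selection by marker uses pytest's `-m` option: `pytest -m "not serial"` would pick only the unmarked tests (combined with pytest-xdist's `-n` to spread them over workers), and `pytest -m serial` would pick the marked ones for a later single-process run. Whether the MXNet CI uses exactly these invocations is an assumption here. Custom markers should also be registered in the pytest configuration to avoid unknown-marker warnings.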

