Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/09/29 18:43:11 UTC

[GitHub] [incubator-mxnet] leezu commented on issue #18564: [Flaky Test] test_gpu_memory_profiler_gluon fails

leezu commented on issue #18564:
URL: https://github.com/apache/incubator-mxnet/issues/18564#issuecomment-700907614


   Also flaky on CI. @ArmageddonKnight, can you take a look at why the test is causing a segfault?
   
   ```
   [2020-09-29T17:47:32.445Z] tests/python/gpu/test_profiler_gpu.py::test_gpu_memory_profiler_gluon 
   [2020-09-29T17:47:32.445Z] Fatal Python error: Segmentation fault
   [2020-09-29T17:47:32.445Z] 
   [2020-09-29T17:47:32.445Z] Thread 0x00007f1161e1c700 (most recent call first):
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/execnet/gateway_base.py", line 400 in read
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/execnet/gateway_base.py", line 432 in from_io
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/execnet/gateway_base.py", line 967 in _thread_receiver
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/execnet/gateway_base.py", line 220 in run
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/execnet/gateway_base.py", line 285 in _perform_spawn
   [2020-09-29T17:47:32.445Z] 
   [2020-09-29T17:47:32.445Z] Current thread 0x00007f11633a4740 (most recent call first):
   [2020-09-29T17:47:32.445Z]   File "/work/mxnet/python/mxnet/ndarray/ndarray.py", line 2907 in backward
   [2020-09-29T17:47:32.445Z]   File "/work/mxnet/tests/python/gpu/test_profiler_gpu.py", line 129 in test_gpu_memory_profiler_gluon
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/_pytest/python.py", line 167 in pytest_pyfunc_call
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/_pytest/python.py", line 1445 in runtest
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 134 in pytest_runtest_call
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 210 in <lambda>
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 237 in from_call
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 210 in call_runtest_hook
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/flaky/flaky_pytest_plugin.py", line 129 in call_and_report
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 99 in runtestprotocol
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/_pytest/runner.py", line 84 in pytest_runtest_protocol
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/flaky/flaky_pytest_plugin.py", line 92 in pytest_runtest_protocol
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/xdist/remote.py", line 87 in run_one_test
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/xdist/remote.py", line 70 in pytest_runtestloop
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/_pytest/main.py", line 247 in _main
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/_pytest/main.py", line 197 in wrap_session
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/_pytest/main.py", line 240 in pytest_cmdline_main
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/callers.py", line 187 in _multicall
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 87 in <lambda>
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/manager.py", line 93 in _hookexec
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/pluggy/hooks.py", line 286 in __call__
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/xdist/remote.py", line 258 in <module>
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/execnet/gateway_base.py", line 1084 in executetask
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/execnet/gateway_base.py", line 220 in run
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/execnet/gateway_base.py", line 285 in _perform_spawn
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/execnet/gateway_base.py", line 267 in integrate_as_primary_thread
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/execnet/gateway_base.py", line 1060 in serve
   [2020-09-29T17:47:32.445Z]   File "/usr/local/lib/python3.6/dist-packages/execnet/gateway_base.py", line 1554 in serve
   [2020-09-29T17:47:32.445Z]   File "<string>", line 8 in <module>
   [2020-09-29T17:47:32.445Z]   File "<string>", line 1 in <module>
   [2020-09-29T17:47:32.445Z] tests/python/gpu/test_profiler_gpu.py::test_gpu_memory_profiler_symbolic 
   [2020-09-29T17:47:32.699Z] [gw0] [ 90%] PASSED tests/python/gpu/test_profiler_gpu.py::test_gpu_memory_profiler_symbolic 
   [2020-09-29T17:47:32.699Z] tests/python/gpu/test_profiler_gpu.py::test_profile_create_domain 
   [2020-09-29T17:47:32.699Z] [gw0] [ 90%] PASSED tests/python/gpu/test_profiler_gpu.py::test_profile_create_domain 
   [2020-09-29T17:47:32.699Z] [gw3] [ 90%] PASSED tests/python/gpu/test_gluon_gpu.py::test_cosine_loss[False] 
   [2020-09-29T17:47:32.699Z] [gw1] node down: Not properly terminated
   [2020-09-29T17:47:32.699Z] [gw1] [ 91%] FAILED tests/python/gpu/test_profiler_gpu.py::test_gpu_memory_profiler_gluon 
   [2020-09-29T17:47:32.699Z] 
   [2020-09-29T17:47:32.699Z] replacing crashed worker gw1
   ```
   
   https://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-gpu/branches/PR-19185/runs/2/nodes/277/steps/307/log/?start=0
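
Since the crash is intermittent, one way to try reproducing it outside CI is to rerun the single failing test in a fresh process until one run is killed by a signal. The following is only a rough sketch, not something taken from the CI setup: the helper name `run_until_crash`, the `CMD` list, and the retry count are illustrative assumptions, and it relies on POSIX behaviour, where a child killed by signal N reports return code -N.

```python
import signal
import subprocess
import sys


def run_until_crash(cmd, max_runs=20):
    """Run cmd up to max_runs times.

    Returns (run_number, signal_name) for the first run that was killed by a
    signal, or None if no run died that way. Ordinary test failures (positive
    exit codes) are ignored, since only hard crashes are of interest here.
    """
    for run in range(1, max_runs + 1):
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # POSIX only: a negative return code -N means "terminated by signal N".
        if proc.returncode < 0:
            return run, signal.Signals(-proc.returncode).name
    return None


# Hypothetical invocation; the test id is copied from the log above and is
# assumed to be run from the root of an MXNet checkout with a GPU build.
CMD = [
    sys.executable, "-m", "pytest",
    "tests/python/gpu/test_profiler_gpu.py::test_gpu_memory_profiler_gluon",
]
```

Calling `run_until_crash(CMD)` would then return something like a run number paired with `"SIGSEGV"` if the segfault reproduces, or `None` if all runs exit normally. Running the test alone also shows whether the crash depends on other tests sharing the same worker.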

