Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/10/16 15:52:39 UTC

[GitHub] [incubator-mxnet] ptrendx commented on issue #19360: SegFault while testing MXNet binaries for CUDA-11.0 using pytest

ptrendx commented on issue #19360:
URL: https://github.com/apache/incubator-mxnet/issues/19360#issuecomment-710129655


   We recently saw this issue too, and I am looking for a fix now. I do not believe it is CUDA 11 specific; rather, it depends on code layout, timing, and environment. For example, in our setup we did not see it on Ubuntu 18.04 but do encounter it on 20.04. The problem is that MXNet does not actually wait for the side thread to finish before program teardown. During main-thread teardown, CUDA deinitializes itself. If the side thread is still running at that point and tries to destroy its mshadow stream, it calls `cudnnDestroy` on the cuDNN handle, which internally calls `cudaStreamDestroy` on cuDNN's internal CUDA streams. (CUDA is statically linked into cuDNN, which is why the segfault appears to come from `libcudnn_ops_infer.so.8`.) When this call happens after CUDA has been deinitialized, the program crashes.
   
   I started looking at this yesterday. A brief look at the destructors suggests that `join` should in fact be called on the side threads, so I am not yet sure why this does not do the right thing. If anyone has more experience with the internals of `ThreadedEnginePerDevice`, I would be happy to leave this issue to them; in the meantime I will keep poking at it.
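   The shutdown order the destructors are meant to guarantee can be sketched in C++. This is a minimal, hypothetical illustration, not MXNet's `ThreadedEnginePerDevice` code: `FakeRuntime`, `StreamHandle`, `WorkerPool`, and `RunShutdownDemo` are invented names, with `FakeRuntime` standing in for CUDA and `StreamHandle` standing in for an mshadow stream that owns a cuDNN handle. The idea it shows is that the owner signals its side threads and `join`s them before the runtime is torn down, so every per-thread handle is destroyed while the runtime is still alive.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a device runtime such as CUDA. It must not be
// torn down while any worker still holds a per-thread handle.
struct FakeRuntime {
  std::atomic<bool> alive{true};
  std::atomic<int> live_handles{0};
};

// Hypothetical per-thread handle, playing the role of an mshadow stream
// that owns a cuDNN handle.
struct StreamHandle {
  FakeRuntime& rt;
  explicit StreamHandle(FakeRuntime& r) : rt(r) { ++rt.live_handles; }
  ~StreamHandle() {
    // Destroying a handle after runtime teardown is the failure mode
    // discussed above; the sketch aborts instead of segfaulting.
    if (!rt.alive.load()) std::abort();
    --rt.live_handles;
  }
};

class WorkerPool {
 public:
  WorkerPool(FakeRuntime& rt, int n) {
    for (int i = 0; i < n; ++i) {
      threads_.emplace_back([this, rtp = &rt] {
        StreamHandle handle(*rtp);  // created on the side thread
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stop_; });
        // lock, then handle, are destroyed when this function returns
      });
    }
  }

  // Wake every worker and wait for it to exit, so each StreamHandle is
  // destroyed before Shutdown() returns. Safe to call more than once.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_all();
    for (auto& t : threads_) {
      if (t.joinable()) t.join();
    }
  }

  ~WorkerPool() { Shutdown(); }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool stop_ = false;
  std::vector<std::thread> threads_;
};

// Returns the number of handles still alive at the moment the runtime is
// "deinitialized"; 0 means the teardown order is correct.
int RunShutdownDemo(int workers = 4) {
  FakeRuntime rt;
  {
    WorkerPool pool(rt, workers);
    // ... work would be submitted to the pool here ...
  }  // ~WorkerPool() joins every worker before this scope ends
  int remaining = rt.live_handles.load();
  rt.alive = false;  // tear down the runtime only after the join
  return remaining;
}
```

   Calling `RunShutdownDemo()` from a `main` (built with `-pthread`) should return 0. If the pool were detached or leaked instead, or if `rt.alive` were cleared before the pool's destructor ran, a worker could reach `~StreamHandle` after teardown; in the sketch that trips `std::abort()`, which is roughly analogous to the `cudnnDestroy` call landing after CUDA has deinitialized.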

