Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/11/18 02:49:59 UTC

[GitHub] [incubator-mxnet] waytrue17 opened a new issue #19556: segfault on mx1.8-cu110 with python3.7

waytrue17 opened a new issue #19556:
URL: https://github.com/apache/incubator-mxnet/issues/19556


   ## Description
   Running the MXNet Horovod example `incubator-mxnet/example/distributed_training-horovod/gluon_mnist.py` on mxnet1.8-cuda11.0 with Python 3.7 ends in a segmentation fault. The segfault occurs after the example script has finished.
   The same script runs without error on mxnet1.8-cuda10.2 with Python 3.7 and on mxnet1.8-cuda11.0 with Python 3.6.
   
   ## To Reproduce
   ### Steps to reproduce
   1. Launch an EC2 p3.8xlarge GPU instance with the DLAMI `ami-02440419a5afe47ab`
   2. Build mx1.8-cu110 from source
   3. Install Horovod: `python3 -m pip install horovod`
   4. Run `LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64:$LD_LIBRARY_PATH python3 \
   incubator-mxnet/example/distributed_training-horovod/gluon_mnist.py` to reproduce the error
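
Because the crash happens only after the script has finished, it can be easy to miss in the training output. The following is a minimal sketch, not part of the original report, of one way to confirm it from the exit status: on POSIX, `subprocess` reports a negative return code when the child was killed by a signal. The helper names `describe_exit` and `run_and_report` are hypothetical, the script path is the one from step 4 and assumes the same working directory, and `LD_LIBRARY_PATH` is assumed to be exported in the calling shell as in step 4.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short readable summary."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed
        # by the signal with that number (e.g. -11 for SIGSEGV on Linux).
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_report(cmd):
    """Run cmd with the inherited environment and summarize how it ended."""
    proc = subprocess.run(cmd)
    return proc.returncode, describe_exit(proc.returncode)


if __name__ == "__main__":
    # Path taken from step 4; adjust it to the local checkout.
    script = "incubator-mxnet/example/distributed_training-horovod/gluon_mnist.py"
    # "-X faulthandler" asks CPython to dump the Python stack on a fatal signal.
    code, summary = run_and_report([sys.executable, "-X", "faulthandler", script])
    print(summary)
```

If the segfault reproduces, the summary would be expected to name `SIGSEGV` even though the training output itself looks complete.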
   
   ## What have you tried to solve it?
   
   1. Backporting #19378 to v1.8.x resolved the issue.
   

