Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/07/21 01:30:55 UTC

[GitHub] [incubator-mxnet] ChaiBapchya opened a new issue #18764: Horovod issue with stable PyPi mxnet versions 1.6.0cu102

ChaiBapchya opened a new issue #18764:
URL: https://github.com/apache/incubator-mxnet/issues/18764


   ## Description
   Importing Horovod fails with an undefined symbol error when using the stable MXNet release from PyPI.
   
   Related to https://github.com/apache/incubator-mxnet/issues/16193
   ### Error Message
   
   ```
   python example/distributed_training-horovod/resnet50_imagenet.py
   Traceback (most recent call last):
     File "example/distributed_training-horovod/resnet50_imagenet.py", line 25, in <module>
       import horovod.mxnet as hvd
     File "/home/ubuntu/incubator-mxnet/mx_stable_pypi_cu102/lib/python3.7/site-packages/horovod/mxnet/__init__.py", line 25, in <module>
       from horovod.mxnet.mpi_ops import allgather
     File "/home/ubuntu/incubator-mxnet/mx_stable_pypi_cu102/lib/python3.7/site-packages/horovod/mxnet/mpi_ops.py", line 29, in <module>
       _basics = _HorovodBasics(__file__, 'mpi_lib')
     File "/home/ubuntu/incubator-mxnet/mx_stable_pypi_cu102/lib/python3.7/site-packages/horovod/common/basics.py", line 27, in __init__
       self.MPI_LIB_CTYPES = ctypes.CDLL(full_path, mode=ctypes.RTLD_GLOBAL)
     File "/home/ubuntu/anaconda3/lib/python3.7/ctypes/__init__.py", line 364, in __init__
       self._handle = _dlopen(self._name, mode)
   OSError: /home/ubuntu/incubator-mxnet/mx_stable_pypi_cu102/lib/python3.7/site-packages/horovod/mxnet/mpi_lib.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN5mxnet10CopyFromToERKNS_7NDArrayEPS1_i
   ```
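
To narrow down whether the installed `libmxnet.so` actually provides the symbol named in the `OSError`, something like the sketch below may help. It is not part of the original report: the helper name `exports_symbol` is hypothetical, and the path to `libmxnet.so` has to be supplied by whoever runs it. The mangled name should correspond to `mxnet::CopyFromTo(mxnet::NDArray const&, mxnet::NDArray const*, int)`.

```python
import ctypes


def exports_symbol(library_path, symbol):
    """Return True if `symbol` can be resolved through the given shared library.

    `exports_symbol` is a hypothetical helper for illustration only.
    Passing None as `library_path` searches the current process instead.
    """
    # Loading the library raises OSError if the library itself (or one of
    # its dependencies) cannot be loaded.
    lib = ctypes.CDLL(library_path)
    # Attribute access on a CDLL object looks the name up with dlsym() and
    # raises AttributeError when it is absent, so hasattr() is sufficient.
    # Note that dlsym() also searches the library's own dependencies.
    return hasattr(lib, symbol)


# The symbol reported as undefined in the traceback above.
MISSING_SYMBOL = "_ZN5mxnet10CopyFromToERKNS_7NDArrayEPS1_i"

# Example call (the path is an assumption; adjust it to the local install):
# exports_symbol("/path/to/site-packages/mxnet/libmxnet.so", MISSING_SYMBOL)
```

If the call returns `False` for the `libmxnet.so` that Horovod was built against, the Horovod extension and the MXNet binary presumably disagree on the signature of that function.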
   
   
   
   ### Steps to reproduce
   ```
   virtualenv -p python3 mx16cu101
   source mx16cu101/bin/activate
   pip install mxnet-cu101==1.6.0
   pip install gluoncv
   pip install horovod
   cd incubator-mxnet/
   python example/distributed_training-horovod/resnet50_imagenet.py
   ```
   ## What have you tried to solve it?
   
   1. Tried nightly releases from PyPI
   2. Tried releases [stable & nightly] from repo.mxnet.io
   3. Tried non-MKL and MKL releases from PyPI as well as repo.mxnet.io


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on issue #18764: Horovod issue with stable PyPi mxnet versions 1.6.0cu102

Posted by GitBox <gi...@apache.org>.
szha commented on issue #18764:
URL: https://github.com/apache/incubator-mxnet/issues/18764#issuecomment-661557773


   > Tried nightly releases from PyPi
   > Tried releases [stable & nightly] from repo.mxnet.io
   > Tried non mkl & mkl releases for PyPi as well as repo.mxnet.io
   
   What are the results?





[GitHub] [incubator-mxnet] ChaiBapchya commented on issue #18764: Horovod issue with stable PyPi mxnet versions 1.6.0cu102

Posted by GitBox <gi...@apache.org>.
ChaiBapchya commented on issue #18764:
URL: https://github.com/apache/incubator-mxnet/issues/18764#issuecomment-662262347


   Same error
   ```
   OSError: /home/ubuntu/incubator-mxnet/mx_stable_pypi_cu102/lib/python3.7/site-packages/horovod/mxnet/mpi_lib.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN5mxnet10CopyFromToERKNS_7NDArrayEPS1_i
   ```





[GitHub] [incubator-mxnet] szha commented on issue #18764: Horovod issue with stable PyPi mxnet versions 1.6.0cu102

Posted by GitBox <gi...@apache.org>.
szha commented on issue #18764:
URL: https://github.com/apache/incubator-mxnet/issues/18764#issuecomment-662832752


   It still seems to be the case in 2.0; see #18772.





[GitHub] [incubator-mxnet] leezu commented on issue #18764: Horovod issue with stable PyPi mxnet versions 1.6.0cu102

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #18764:
URL: https://github.com/apache/incubator-mxnet/issues/18764#issuecomment-662839367


   That's a separate problem. @eric-haibin-lin mentioned that the problem does not apply to the 1.x nightly build.

