Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/08/08 13:49:20 UTC

[GitHub] whu-lyh opened a new issue #12086: distribute training on local machines not AWS cloud

URL: https://github.com/apache/incubator-mxnet/issues/12086
 
 
   Hello everyone. Recently I ran into a problem when trying to run distributed training across several machines, each with a 12 GB GTX 1080 Ti GPU. I built MXNet on these machines against different CUDA versions (CUDA 9.0 on some, CUDA 8.0 on others). Then I got this error:
   
   Traceback (most recent call last):
     File "train_face.py", line 25, in <module>
       from common import find_mxnet, fit
     File "/tmp/mxnet/common/find_mxnet.py", line 20, in <module>
       import mxnet as mx
     File "/tmp/mxnet/mxnet/__init__.py", line 25, in <module>
       from . import engine
     File "/tmp/mxnet/mxnet/engine.py", line 23, in <module>
       from .base import _LIB, check_call
     File "/tmp/mxnet/mxnet/base.py", line 113, in <module>
       _LIB = _load_lib()
     File "/tmp/mxnet/mxnet/base.py", line 105, in _load_lib
       lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
     File "/usr/lib/python2.7/ctypes/__init__.py", line 362, in __init__
       self._handle = _dlopen(self._name, mode)
   OSError: libcudart.so.9.0: cannot open shared object file: No such file or directory
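
The last frames show `mxnet/base.py` opening the library with `ctypes.CDLL`, and the dynamic loader failing to find the CUDA 9.0 runtime. A minimal sketch, assuming Linux and that the two CUDA versions mentioned above are the only ones in play (the `CANDIDATES` list is an assumption), which can be run on each host to see which `libcudart` versions the loader can open there:

```python
import ctypes

# CUDA runtime sonames to probe; assumed to match the CUDA 8.0 / 9.0 mix described above.
CANDIDATES = ["libcudart.so.8.0", "libcudart.so.9.0"]


def can_load(soname):
    """Return True if the dynamic loader can open `soname`, like base.py's ctypes.CDLL call."""
    try:
        ctypes.CDLL(soname, ctypes.RTLD_LOCAL)
    except OSError:
        # Raised ("cannot open shared object file") when the library is not on the search path.
        return False
    return True


def probe(sonames=CANDIDATES):
    """Map each soname to whether it can be loaded on this host."""
    return {name: can_load(name) for name in sonames}


if __name__ == "__main__":
    for name, ok in probe().items():
        print("%-20s %s" % (name, "found" if ok else "MISSING"))
```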
   
   The training command is:
   
   export DMLC_INTERFACE=eth0; PS_VERBOSE=1; DMLC_PS_ROOT_PORT=22; ../../tools/launch.py -n 2 -H /home/hosts --sync-dst-dir /tmp/mxnet python2 train_face.py --network resnet-50 --gpus 0 --kv-store dist_sync
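
In the command above, `export` applies only to `DMLC_INTERFACE`; `PS_VERBOSE=1;` and `DMLC_PS_ROOT_PORT=22;` are separate statements that set plain shell variables, which are not passed to `launch.py` or its children unless they were already exported. Port 22 is also normally occupied by the SSH daemon. A minimal Python sketch, assuming Python 3, that runs the same launcher arguments with all three variables in the child environment; the port 9091 is a hypothetical replacement, not a value from the issue, and by default the sketch only prints what it would run:

```python
import os
import subprocess

# Values copied from the issue, except the port: 22 is the usual sshd port,
# so 9091 here is a hypothetical replacement.
PS_ENV = {
    "DMLC_INTERFACE": "eth0",
    "PS_VERBOSE": "1",
    "DMLC_PS_ROOT_PORT": "9091",
}

# Same launcher arguments as the command in the issue.
LAUNCH_ARGV = [
    "../../tools/launch.py", "-n", "2", "-H", "/home/hosts",
    "--sync-dst-dir", "/tmp/mxnet",
    "python2", "train_face.py", "--network", "resnet-50",
    "--gpus", "0", "--kv-store", "dist_sync",
]


def build_launch(extra_env=PS_ENV, argv=LAUNCH_ARGV):
    """Return (argv, env): the current environment plus every PS variable."""
    env = dict(os.environ)
    env.update(extra_env)  # every key here reaches launch.py, unlike unexported shell variables
    return list(argv), env


def launch(dry_run=True):
    argv, env = build_launch()
    if dry_run:
        # Print the variables and command instead of starting the job.
        for key in sorted(PS_ENV):
            print("%s=%s" % (key, env[key]))
        print(" ".join(argv))
        return None
    return subprocess.call(argv, env=env)


if __name__ == "__main__":
    launch(dry_run=True)
```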
   
   So I'm wondering: must all machines in distributed training use the same CUDA version, or did I miss a step before training? Thanks a lot. @ry @pluskid 
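
Judging from the traceback paths, the workers import the copy of MXNet synced to `/tmp/mxnet`, so each host likely needs the CUDA runtime that this particular build was linked against. A minimal sketch, assuming Linux with `ldd` available, that lists the `libcudart` dependencies of a shared library and whether they resolve on the current host; the default path `/tmp/mxnet/lib/libmxnet.so` is a hypothetical location:

```python
import os
import re
import subprocess
import sys

# Hypothetical location of the synced library; adjust to where libmxnet.so actually lives.
DEFAULT_LIB = "/tmp/mxnet/lib/libmxnet.so"

# Matches ldd lines such as "libcudart.so.9.0 => not found" or "libcudart.so.9.0 => /path (0x...)".
CUDART_LINE = re.compile(r"^\s*(libcudart\.so[\w.]*)\s*=>\s*(.*?)\s*$")


def parse_cudart(ldd_output):
    """Return {soname: resolved path or None} for every libcudart entry in ldd output."""
    deps = {}
    for line in ldd_output.splitlines():
        m = CUDART_LINE.match(line)
        if m:
            target = m.group(2)
            # ldd prints "not found" when the loader cannot resolve the soname on this host.
            deps[m.group(1)] = None if target == "not found" else target.split(" (")[0]
    return deps


def cudart_deps(lib_path=DEFAULT_LIB):
    """Run ldd on lib_path and report its CUDA runtime dependencies."""
    out = subprocess.check_output(["ldd", lib_path]).decode()
    return parse_cudart(out)


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_LIB
    if not os.path.exists(path):
        print("no such file: %s" % path)
    else:
        try:
            for soname, resolved in cudart_deps(path).items():
                print("%s -> %s" % (soname, resolved or "not found"))
        except (OSError, subprocess.CalledProcessError) as exc:
            print("ldd failed: %s" % exc)
```

A `not found` entry for `libcudart.so.9.0` on a CUDA 8.0 host would correspond to the error in the traceback.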
