Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/05/04 01:32:20 UTC

[GitHub] threeleafzerg commented on issue #10696: [MXNET-366]Extend MXNet Distributed Training by MPI AllReduce

threeleafzerg commented on issue #10696: [MXNET-366]Extend MXNet Distributed Training by MPI AllReduce
URL: https://github.com/apache/incubator-mxnet/pull/10696#issuecomment-386484809
 
 
   @rahul003 
   The build instructions are in the design doc:
   USE_DIST_KVSTORE = 1
   USE_MPI_DIST_KVSTORE = 1
   MPI_ROOT=/usr/lib/openmpi
   We let the end user select which MPI to use (Open MPI, MPICH, or Intel MPI). That's why we don't include the source as a third-party lib. You can check Horovod; they take the same approach: https://github.com/uber/horovod#install
   So the end user needs to install MPI separately.
   Can you try the latest Open MPI? We tried both Open MPI and Intel MPI; their release dir structure looks like the following:
   /home/zhouhaiy/openmpi/build
   [zhouhaiy@mlt-ace build]$ ls
   bin  etc  include  lib  share
   
   It looks like the MPICH release dir is not the same as Open MPI's; I will check.
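
   To compare an install against the layout listed above, a quick check along these lines might help. This is only a sketch: the function name `missing_mpi_subdirs` is hypothetical, and treating all five directories from the `ls` output as required under MPI_ROOT is an assumption, since the build may only need a subset such as `include` and `lib`.

```python
from pathlib import Path

# Subdirectories seen in the Open MPI / Intel MPI release trees listed above.
# Assumption: this is the layout expected under MPI_ROOT; the build itself
# may only depend on some of these.
EXPECTED_SUBDIRS = ("bin", "etc", "include", "lib", "share")


def missing_mpi_subdirs(mpi_root, expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectories that are absent under mpi_root."""
    root = Path(mpi_root)
    return [name for name in expected if not (root / name).is_dir()]
```

   Calling `missing_mpi_subdirs("/usr/lib/openmpi")` returns an empty list when the directory matches that layout, and otherwise the names that are missing, which should make it easier to see how an MPICH install differs.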

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services