Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/05/04 01:32:20 UTC
[GitHub] threeleafzerg commented on issue #10696: [MXNET-366]Extend MXNet Distributed Training by MPI AllReduce
threeleafzerg commented on issue #10696: [MXNET-366]Extend MXNet Distributed Training by MPI AllReduce
URL: https://github.com/apache/incubator-mxnet/pull/10696#issuecomment-386484809
@rahul003
The build instructions are in the design doc.
USE_DIST_KVSTORE = 1
USE_MPI_DIST_KVSTORE = 1
MPI_ROOT=/usr/lib/openmpi
We let the end user choose which MPI implementation to use (Open MPI, MPICH, or Intel MPI). That's why we don't include the MPI source as a third-party library. Horovod takes the same approach; see https://github.com/uber/horovod#install
So the end user needs to install MPI separately.
Can you try the latest Open MPI? We tried both Open MPI and Intel MPI, and their release directory structure looks like the following:
/home/zhouhaiy/openmpi/build
[zhouhaiy@mlt-ace build]$ ls
bin etc include lib share
It looks like the MPICH release directory is not laid out the same way as Open MPI's. I will check.
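
For anyone who wants to compare their own MPI install against the layout listed above, here is a minimal sketch. The helper name `missing_mpi_subdirs` is hypothetical and not part of this PR, and the list of expected directories is simply taken from the `ls` output above; which of them the build actually requires is not stated here, so treat this as a rough check only.

```python
import os

# Top-level directories from the Open MPI / Intel MPI listing above.
# Assumption: the build expects this layout under MPI_ROOT.
EXPECTED_SUBDIRS = ("bin", "etc", "include", "lib", "share")


def missing_mpi_subdirs(mpi_root, expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectories that are absent under mpi_root."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(mpi_root, name))]


if __name__ == "__main__":
    # Fall back to the example path from the build flags if MPI_ROOT is unset.
    root = os.environ.get("MPI_ROOT", "/usr/lib/openmpi")
    missing = missing_mpi_subdirs(root)
    if missing:
        print("%s is missing: %s" % (root, ", ".join(missing)))
    else:
        print("%s matches the expected layout" % root)
```

Running it with `MPI_ROOT` set to an MPICH install should show which directories differ from the Open MPI layout.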