Posted to issues@mxnet.apache.org by "zhouhai ye (JIRA)" <ji...@apache.org> on 2018/04/28 01:54:00 UTC
[jira] [Created] (MXNET-366) Extend MXNet Distributed Training by MPI AllReduce
zhouhai ye created MXNET-366:
--------------------------------
Summary: Extend MXNet Distributed Training by MPI AllReduce
Key: MXNET-366
URL: https://issues.apache.org/jira/browse/MXNET-366
Project: Apache MXNet
Issue Type: New Feature
Reporter: zhouhai ye
Attachments: performance-allreduce.png, resnet-50.png
We add a new kvstore type (dist_sync_mpi) which extends MXNet distributed training with MPI AllReduce. Since this kvstore type has no parameter server, we replace the original kvstore APIs push and pull with a single API, pushpull. You can refer to the API Spec section of the design doc for details.
Our design doc: [https://docs.google.com/document/d/1e4anwDiS18cWP49FAghU6tqqdtnRKUcbNJJxvhIfvIA/edit#heading=h.t762l56r1094]
The attachments contain the performance and accuracy results.
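To illustrate the idea behind the fused pushpull API, here is a minimal pure-Python sketch of AllReduce semantics. This is not the proposed MXNet/MPI implementation (see the design doc for the actual API spec); the function names and the simple sum-based reduction are assumptions for illustration only. The key point is that a single collective operation replaces the separate push (send gradients to a server) and pull (fetch aggregated results) steps, and every worker ends up with an identical aggregated copy.

```python
# Toy sketch of pushpull semantics for a dist_sync_mpi-style kvstore.
# Hypothetical names: allreduce_sum / pushpull are illustrations, not
# the real MXNet or MPI API.

def allreduce_sum(worker_grads):
    """Elementwise-sum the gradients from all workers; every worker
    receives an identical copy of the result, as in MPI AllReduce."""
    n = len(worker_grads[0])
    total = [0.0] * n
    for grad in worker_grads:
        for i, g in enumerate(grad):
            total[i] += g
    # Each rank gets its own identical copy of the reduced values.
    return [list(total) for _ in worker_grads]

def pushpull(key, worker_grads):
    """One fused call standing in for the former push + pull pair.
    `key` identifies the parameter being reduced."""
    return allreduce_sum(worker_grads)

if __name__ == "__main__":
    # Three workers, each holding a local gradient for one parameter.
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    reduced = pushpull("weight_0", grads)
    print(reduced[0])  # every worker sees the same sum: [9.0, 12.0]
```

In the real design the reduction runs over MPI ranks on dense NDArrays rather than Python lists, but the contract is the same: one collective call, no parameter-server round trip.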
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org