Posted to dev@singa.apache.org by "Wang Ji (JIRA)" <ji...@apache.org> on 2016/07/19 05:57:20 UTC

[jira] [Created] (SINGA-226) Add parallel training on a single machine for singa v1.0

Wang Ji created SINGA-226:
-----------------------------

             Summary: Add parallel training on a single machine for singa v1.0
                 Key: SINGA-226
                 URL: https://issues.apache.org/jira/browse/SINGA-226
             Project: Singa
          Issue Type: New Feature
            Reporter: Wang Ji
            Assignee: Wang Ji


In this ticket, we implement parallel training using multiple devices on a single machine.
To support parallel training, an Updater class needs to be implemented to aggregate the partial gradients from parallel workers and use the Optimizer to update the Parameters. The Updater can be designed for different kinds of topological structure, i.e., *local-cpu*, *local-gpu*, *local-allreduce*.
*local-cpu:* Aggregate and update parameters using the CPU. In this mode, the host CPU needs to copy the gradient and parameter tensors from the GPU workers, perform the update, and copy the results back.
*local-gpu:* Aggregate and update parameters using a chosen GPU. In this mode, the updater GPU needs to copy the gradient and parameter tensors from the other GPU workers, perform the update, and copy the results back.
*local-allreduce:* In this mode, each parameter is sliced among all GPU workers. In each iteration, gradients are aggregated and parameters updated in an MPI AllReduce style.
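
The three modes above differ in where aggregation runs and how data moves, not in the math. A minimal pure-Python sketch (function names and the plain-SGD optimizer step are illustrative assumptions, not the actual SINGA API) contrasting *local-cpu* with the sliced *local-allreduce* scheme:

```python
def local_cpu_update(worker_grads, param, lr=0.01):
    # local-cpu sketch: the host gathers every worker's partial gradient,
    # averages them, applies one SGD step, and the result is copied back.
    n = len(worker_grads)
    agg = [sum(g[i] for g in worker_grads) / n for i in range(len(param))]
    return [p - lr * a for p, a in zip(param, agg)]

def local_allreduce_update(worker_grads, param, lr=0.01):
    # local-allreduce sketch: each of the k workers owns one contiguous
    # slice of the parameter; it reduces the matching gradient slices from
    # all peers, updates its own slice (reduce-scatter), and the updated
    # slices are then gathered back into the full parameter (all-gather).
    k = len(worker_grads)
    n = len(param)
    bounds = [(w * n // k, (w + 1) * n // k) for w in range(k)]
    out = [0.0] * n
    for w, (lo, hi) in enumerate(bounds):  # worker w handles slice [lo, hi)
        for i in range(lo, hi):
            agg = sum(g[i] for g in worker_grads) / k
            out[i] = param[i] - lr * agg
    return out

# Both strategies produce the same updated parameter; only the placement
# of the aggregation (and hence the copy traffic) differs.
param = [float(i) for i in range(8)]
grads = [[1.0] * 8, [3.0] * 8]  # two simulated workers
a = local_cpu_update(grads, param)
b = local_allreduce_update(grads, param)
assert a == b
```

In the allreduce variant no single device touches the whole gradient, so the per-device copy volume shrinks as workers are added, which is the usual motivation for that topology.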



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)