Posted to dev@singa.apache.org by "wangwei (JIRA)" <ji...@apache.org> on 2016/01/12 04:40:39 UTC

[jira] [Created] (SINGA-132) Optimize training on a single node with GPUs

wangwei created SINGA-132:
-----------------------------

             Summary: Optimize training on a single node with GPUs
                 Key: SINGA-132
                 URL: https://issues.apache.org/jira/browse/SINGA-132
             Project: Singa
          Issue Type: Improvement
            Reporter: wangwei
            Assignee: Haibo Chen


There are two training scenarios.
1. A single worker. In this case there is no need to launch a separate server thread, since that would only add communication cost between the worker and the server. Instead, we can create an Updater inside the Worker and call it to update the parameters locally. The driver's workflow should be changed accordingly, i.e., there is no need for a stub thread or a server thread; the worker should run in the main thread and the program terminates once the worker finishes (see the first sketch after this list).

2. Multiple workers. In this case we need both workers and servers. First, we can make zookeeper an optional dependency, as it is only used for Job ID generation and termination-condition checks. If no Job ID is available, we can always use the default Job ID (0); since there is only one process, we don't need zookeeper to track the status of workers in other processes (sketched in the second example below). Second, the communication along the worker-stub-server path should be optimized, e.g., using GPU-Direct (see the last sketch below).
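
Below is a minimal C++ sketch of the single-worker path. The Param, Updater and Worker types here are placeholders for illustration, not SINGA's actual classes; the point is the control flow: the worker owns an Updater, applies gradients locally, and the driver runs the worker in the main thread with no stub or server thread.

#include <cstddef>
#include <utility>
#include <vector>

// Placeholder parameter type: a value buffer plus its gradient.
struct Param {
  std::vector<float> data;
  std::vector<float> grad;
};

class Updater {
 public:
  explicit Updater(float lr) : lr_(lr) {}
  // Plain SGD applied in place, in the worker's own thread.
  void Update(Param* p) {
    for (std::size_t i = 0; i < p->data.size(); ++i)
      p->data[i] -= lr_ * p->grad[i];
  }
 private:
  float lr_;
};

class Worker {
 public:
  Worker(std::vector<Param*> params, float lr)
      : params_(std::move(params)), updater_(lr) {}
  void Run(int num_steps) {
    for (int step = 0; step < num_steps; ++step) {
      TrainOneBatch();        // forward + backward; fills each p->grad
      for (Param* p : params_)
        updater_.Update(p);   // local update, no message to a server
    }
  }
 private:
  void TrainOneBatch() { /* model-specific; omitted */ }
  std::vector<Param*> params_;
  Updater updater_;
};

// The driver runs the worker in the main thread and the program
// terminates once Run() returns; no stub or server threads are launched.
int main() {
  Param w{{0.5f}, {0.1f}};
  Worker worker({&w}, /*lr=*/0.01f);
  worker.Run(/*num_steps=*/100);
  return 0;
}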
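
Next, a sketch of making zookeeper optional behind a compile-time flag. USE_ZOOKEEPER and GenerateJobIDViaZookeeper are hypothetical names chosen for illustration; the idea is simply that a single-process build compiles and runs without the dependency by falling back to the default Job ID (0).

#include <cstdio>

#ifdef USE_ZOOKEEPER
// Existing multi-process path; declaration only, body elided here.
int GenerateJobIDViaZookeeper();
#endif

int GenerateJobID() {
#ifdef USE_ZOOKEEPER
  // Multi-process job: ask zookeeper for a cluster-unique ID.
  return GenerateJobIDViaZookeeper();
#else
  // Single process: the default Job ID (0) is always safe, because no
  // other process needs to distinguish this job, and worker status can
  // be checked inside the process rather than through zookeeper.
  return 0;
#endif
}

int main() {
  std::printf("job id = %d\n", GenerateJobID());
  return 0;
}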
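
Finally, a sketch of the kind of intra-node transfer that GPU-Direct peer-to-peer enables, using standard CUDA runtime calls (compile with nvcc; this is not SINGA code). A parameter buffer is copied directly from one GPU to another over the PCIe/NVLink fabric instead of staging through host memory, which is the optimization the worker-stub-server path could use.

#include <cstdio>
#include <cuda_runtime.h>

int main() {
  const size_t n = 1 << 20;
  float *src = nullptr, *dst = nullptr;

  cudaSetDevice(0);
  cudaMalloc(&src, n * sizeof(float));
  cudaSetDevice(1);
  cudaMalloc(&dst, n * sizeof(float));

  // Enable direct peer access in both directions; this returns an
  // error (which we don't check here) if the GPUs are not peer-capable.
  cudaSetDevice(0);
  cudaDeviceEnablePeerAccess(1, 0);
  cudaSetDevice(1);
  cudaDeviceEnablePeerAccess(0, 0);

  // Device-to-device copy that avoids a round trip through the host.
  cudaMemcpyPeer(dst, /*dstDevice=*/1, src, /*srcDevice=*/0,
                 n * sizeof(float));
  cudaDeviceSynchronize();

  std::printf("copied %zu floats GPU0 -> GPU1\n", n);
  cudaFree(dst);
  cudaSetDevice(0);
  cudaFree(src);
  return 0;
}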



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)