Posted to dev@singa.apache.org by "wangwei (JIRA)" <ji...@apache.org> on 2016/01/12 04:40:39 UTC
[jira] [Created] (SINGA-132) Optimize training on a single node with GPUs
wangwei created SINGA-132:
-----------------------------
Summary: Optimize training on a single node with GPUs
Key: SINGA-132
URL: https://issues.apache.org/jira/browse/SINGA-132
Project: Singa
Issue Type: Improvement
Reporter: wangwei
Assignee: Haibo Chen
There are two training scenarios.
1. A single worker. In this case there is no need to launch a separate server thread, since doing so only adds communication cost between the worker and the server. Instead, we can create an Updater inside the Worker and call it to update the parameters locally. The driver's workflow should be changed accordingly: there is no need for a stub thread or a server thread; the worker runs in the main thread, and the program terminates once the worker finishes.
2. Multiple workers. In this case we need both workers and servers. First, we can make zookeeper an optional dependency, since it is only used for job ID generation and termination-condition checking. If no job ID is available, we can always use the default job ID (0); and since there is only one process, we do not need zookeeper to track the status of workers in other processes. Second, the worker-stub-server communication should be optimized, e.g., using GPUDirect.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)