Posted to dev@singa.apache.org by "Sheng Wang (JIRA)" <ji...@apache.org> on 2015/05/28 04:51:17 UTC

[jira] [Commented] (SINGA-3) Use Zookeeper to check stopping (finish) time of the system

    [ https://issues.apache.org/jira/browse/SINGA-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14562217#comment-14562217 ] 

Sheng Wang commented on SINGA-3:
--------------------------------

Pull Request #3 has been merged to add this feature.

> Use Zookeeper to check stopping (finish) time of the system
> -----------------------------------------------------------
>
>                 Key: SINGA-3
>                 URL: https://issues.apache.org/jira/browse/SINGA-3
>             Project: Singa
>          Issue Type: New Feature
>         Environment: Linux, gcc>4.8
>            Reporter: wangwei
>
> To stop each process (node), we need to stop both its local workers and its local servers. A worker thread exits once it has finished all of its training steps. A server thread can exit only after all workers connected to it have stopped.
> We use Zookeeper to detect the worker state. Specifically, the main thread of each process first registers all local servers with Zookeeper. It then registers each worker under a dedicated server group (a folder in Zookeeper), where that worker's parameters are maintained. When a worker finishes execution, it deregisters from its server group folder in Zookeeper and reports its state to the main thread. When all workers registered in a server group have finished, the callback function registered for that server group sends a stop message to the group's server. Upon receiving this message, the server reports its state to the main thread and stops. Once all local workers and local servers have finished, the main thread exits.
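The following is a minimal single-process sketch of the shutdown protocol described in the issue. It is illustrative only. It does not use Zookeeper. Instead, a hypothetical `InMemoryRegistry` class stands in for the server-group folders, and its `on_empty` callback plays the role of the callback registered for a server group. The names `run_process`, `InMemoryRegistry`, the `"stop"` message, and the group/worker layout are all assumptions made for this example and are not part of SINGA's code.

```python
import queue
import threading


class InMemoryRegistry:
    """Hypothetical stand-in for the per-server-group folders kept in Zookeeper.

    Each group holds the ids of its registered workers. The callback registered
    for a group fires when the last worker deregisters from it.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._groups = {}     # group name -> set of registered worker ids
        self._callbacks = {}  # group name -> callback fired when the group empties

    def register_server_group(self, group, on_empty):
        with self._lock:
            self._groups[group] = set()
            self._callbacks[group] = on_empty

    def register_worker(self, group, worker_id):
        with self._lock:
            # KeyError if the server group was not registered first.
            self._groups[group].add(worker_id)

    def deregister_worker(self, group, worker_id):
        with self._lock:
            self._groups[group].discard(worker_id)
            now_empty = not self._groups[group]
            callback = self._callbacks[group]
        # Run the callback outside the lock so it cannot deadlock the registry.
        if now_empty:
            callback(group)


def run_process(groups, steps=3):
    """Simulate one process. groups maps a server-group name to its worker ids."""
    if any(not workers for workers in groups.values()):
        # A group with no workers would never trigger its stop callback.
        raise ValueError("every server group needs at least one worker")

    registry = InMemoryRegistry()
    status = queue.Queue()                       # workers/servers report here
    inboxes = {g: queue.Queue() for g in groups}  # one message queue per server

    def server(group):
        msg = inboxes[group].get()  # block until the stop message arrives
        assert msg == "stop"
        status.put(("server", group))

    def worker(group, worker_id):
        for _ in range(steps):
            pass  # placeholder for one training step
        registry.deregister_worker(group, worker_id)
        status.put(("worker", worker_id))

    # 1. Register all local servers (one per server group) first.
    for g in groups:
        registry.register_server_group(g, lambda grp: inboxes[grp].put("stop"))
    # 2. Register every worker under its server group before any worker starts.
    for g, workers in groups.items():
        for w in workers:
            registry.register_worker(g, w)

    threads = [threading.Thread(target=server, args=(g,)) for g in groups]
    threads += [threading.Thread(target=worker, args=(g, w))
                for g, workers in groups.items() for w in workers]
    for t in threads:
        t.start()

    # 3. The main thread waits until every local worker and server has reported.
    expected = len(groups) + sum(len(w) for w in groups.values())
    finished = [status.get() for _ in range(expected)]
    for t in threads:
        t.join()
    return finished
```

For example, `run_process({"g0": ["w0", "w1"], "g1": ["w2"]})` returns five status entries, one per worker and one per server, in an order that depends on thread scheduling. All workers are registered before any of them starts. This ordering matters because a group that empties early would stop its server while other workers still expect it to be running.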



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)