Posted to dev@singa.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2015/09/15 12:07:45 UTC

[jira] [Commented] (SINGA-57) Improve Distributed Hogwild

    [ https://issues.apache.org/jira/browse/SINGA-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745178#comment-14745178 ] 

ASF subversion and git services commented on SINGA-57:
------------------------------------------------------

Commit ed9e37369c69dd76078e8285bc33d6b04ba60e9f in incubator-singa's branch refs/heads/master from [~flytosky]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=ed9e373 ]

SINGA-57 Improve Distributed Hogwild

The ClusterProto::sync_freq field controls the frequency of synchronization
between server groups.
After updating a Param (slice), the server checks the number of updates
since the last sync. It also checks the number of pending syncs (i.e., requests
that have not yet received responses) to avoid sending too many messages to
stopped servers (such messages would occupy memory in the sending buffer).
The server responds to every sync request with the latest Param values.
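The server-side sync decision described above can be sketched in Python. This is a minimal sketch under stated assumptions, not SINGA's C++ code: `ServerSlice`, `SyncRequest`, the plain SGD update, and `max_pending` (the cap on unanswered sync requests) are hypothetical, because the commit names only ClusterProto::sync_freq and does not give the cap's value. How a server merges the values it receives in a response is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SyncRequest:
    """Hypothetical sync message carrying one Param slice to a peer server group."""
    slice_id: int
    values: List[float]


@dataclass
class ServerSlice:
    """Hypothetical bookkeeping for the Param slice one server masters.

    sync_freq mirrors ClusterProto::sync_freq; max_pending is an assumed cap,
    since the commit does not state the limit SINGA uses.
    """
    slice_id: int
    values: List[float]
    sync_freq: int
    max_pending: int = 1
    updates_since_sync: int = 0
    pending_syncs: int = 0
    outbox: List[SyncRequest] = field(default_factory=list)
    last_peer_values: List[float] = field(default_factory=list)

    def apply_update(self, grad: List[float], lr: float = 0.1) -> None:
        # Illustrative SGD step on the local slice.
        self.values = [v - lr * g for v, g in zip(self.values, grad)]
        self.updates_since_sync += 1
        # Sync only after sync_freq local updates AND only while few requests
        # are unanswered, so a stopped peer cannot fill the send buffer.
        if (self.updates_since_sync >= self.sync_freq
                and self.pending_syncs < self.max_pending):
            self.outbox.append(SyncRequest(self.slice_id, list(self.values)))
            self.pending_syncs += 1
            self.updates_since_sync = 0

    def on_sync_response(self, latest: List[float]) -> None:
        # A peer answered with its latest values; record them (merging is out
        # of scope for this sketch) and free one pending slot.
        self.last_peer_values = list(latest)
        self.pending_syncs -= 1

    def handle_sync_request(self, req: SyncRequest) -> List[float]:
        # Every sync request is answered with this server's latest values.
        return list(self.values)
```

With sync_freq=3 and max_pending=1, the third update queues a sync; further updates keep counting but queue nothing until the response arrives, after which the next update sends immediately.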

Note: the current implementation does not support (due to a bug) multiple
worker groups in one process for the distributed Hogwild framework. We
recommend replacing this cluster topology with in-memory Hogwild, i.e.,
launching one worker group with multiple workers and one server group.
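The recommended change of topology can be sketched as a small check, assuming a single-process job. The `Topology` fields and both functions are hypothetical names for illustration, not ClusterProto fields. The sketch flags distributed Hogwild (more than one server group) with several worker groups packed into one process, and rewrites it as in-memory Hogwild with the same total number of workers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    # Hypothetical field names; not SINGA's actual ClusterProto fields.
    nworker_groups: int
    nworkers_per_group: int
    nserver_groups: int
    worker_groups_per_process: int


def is_unsupported_hogwild(t: Topology) -> bool:
    # Distributed Hogwild (several server groups syncing with each other)
    # combined with more than one worker group in a single process.
    return t.nserver_groups > 1 and t.worker_groups_per_process > 1


def to_in_memory_hogwild(t: Topology) -> Topology:
    # Keep the total number of workers, but run them as one worker group
    # attached to one server group (in-memory Hogwild).
    total = t.nworker_groups * t.nworkers_per_group
    return Topology(nworker_groups=1, nworkers_per_group=total,
                    nserver_groups=1, worker_groups_per_process=1)
```

For example, four worker groups of two workers each, paired with four server groups in one process, would become one group of eight workers and one server group.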


> Improve Distributed Hogwild
> ---------------------------
>
>                 Key: SINGA-57
>                 URL: https://issues.apache.org/jira/browse/SINGA-57
>             Project: Singa
>          Issue Type: Improvement
>            Reporter: wangwei
>
> The SINGA-8 implementation of distributed Hogwild uses the stub thread to monitor the network bandwidth. When the available bandwidth is >0, the stub sends a sync reminder message to a server, which triggers the server to sync one Param slice with other server groups.
> The code is messy because of the bandwidth monitoring and the processing of sync reminder messages. Another problem is that reminder messages may not be generated at a steady rate, because they are generated only when the router times out. If the workers and servers run so fast that the router rarely times out, sync reminders are almost never sent. Conversely, if the router times out frequently, many reminder messages are generated.
> This ticket improves the implementation by fixing the frequency of synchronization between server groups: a server sends a sync message every sync_freq updates for the Param slice it masters.
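A toy model of the two trigger policies, assuming a job can be reduced to a stream of "update" and "timeout" events (the event names and both functions are hypothetical). It ignores the pending-sync cap and only counts how many syncs each policy would trigger.

```python
from typing import Sequence


def reminders_old(events: Sequence[str]) -> int:
    # SINGA-8 style, as described in the ticket: one sync reminder per router
    # timeout, so the count tracks idle time rather than training progress.
    return sum(1 for e in events if e == "timeout")


def syncs_new(events: Sequence[str], sync_freq: int) -> int:
    # SINGA-57 style: one sync per sync_freq updates of the slice a server masters.
    updates = sum(1 for e in events if e == "update")
    return updates // sync_freq
```

With a busy router (100 updates, no timeouts) the old policy produces no reminders while sync_freq=10 yields 10 syncs; with 10 updates followed by 50 timeouts the counts are 50 and 1.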


