Posted to issues@systemml.apache.org by "LI Guobao (JIRA)" <ji...@apache.org> on 2018/05/30 20:16:00 UTC

[jira] [Commented] (SYSTEMML-2349) Local worker error handling

    [ https://issues.apache.org/jira/browse/SYSTEMML-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495614#comment-16495614 ] 

LI Guobao commented on SYSTEMML-2349:
-------------------------------------

[~mboehm7], do we need to handle errors thrown by the agg service? If so, I don't see how to catch such an error outside its thread. If the agg service goes down, all the workers block in the pull method and cannot be stopped. Conversely, the agg service does not stop until the workers have finished their work. As a result, we never reach the join on the agg service thread, because we are already stuck joining the workers.
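
One possible way to surface the error outside the thread, sketched under assumptions: run the agg service and the workers as tasks on an ExecutorService and wait on them in completion order, so that a failure in any task is seen first and the remaining blocked tasks can be interrupted. The class name, the runAll method, and the queue standing in for the pull method are hypothetical and are not taken from the SystemML code base; the sketch also assumes that the blocking pull is interruptible.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the SystemML implementation.
class PsErrorPropagationSketch {

    // Runs one "agg service" task and numWorkers "worker" tasks.
    // Returns "ok" if all finish, or "failed: <message>" if any task throws.
    static String runAll(int numWorkers, boolean aggFails) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numWorkers + 1);
        ExecutorCompletionService<String> tasks = new ExecutorCompletionService<>(pool);
        // Stand-in for the model that workers pull from the agg service (assumption).
        BlockingQueue<Integer> model = new LinkedBlockingQueue<>();

        // Agg service: either fails right away or publishes one item per worker.
        tasks.submit(() -> {
            if (aggFails) {
                throw new IllegalStateException("agg service failed");
            }
            for (int i = 0; i < numWorkers; i++) {
                model.put(i);
            }
            return "agg";
        });

        // Workers: block in the "pull" until the agg service publishes something.
        for (int w = 0; w < numWorkers; w++) {
            tasks.submit(() -> {
                model.take(); // interruptible blocking pull
                return "worker";
            });
        }

        try {
            // take() hands back tasks in completion order, so a failed task is
            // observed even while other tasks are still blocked.
            for (int i = 0; i < numWorkers + 1; i++) {
                tasks.take().get(); // rethrows a task failure as ExecutionException
            }
            return "ok";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } finally {
            // Interrupt anything still blocked so that termination is guaranteed.
            pool.shutdownNow();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAll(3, false));
        System.out.println(runAll(3, true));
    }
}
```

The point of waiting in completion order rather than joining the workers first is that the caller no longer depends on the order in which threads happen to be joined. Whether this fits the existing thread handling of the parameter server is an open question.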

> Local worker error handling
> ---------------------------
>
>                 Key: SYSTEMML-2349
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2349
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: LI Guobao
>            Assignee: LI Guobao
>            Priority: Major
>
> While playing around with the locking scheme of the parameter server, I encountered unrelated errors that led to the parameter server hanging. We need to make sure all worker errors are correctly propagated so that we can guarantee termination.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)