Posted to dev@reef.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2015/10/11 19:34:05 UTC

[jira] [Resolved] (REEF-817) Group comm hangs when root task is added after child tasks start running

     [ https://issues.apache.org/jira/browse/REEF-817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved REEF-817.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 0.14

Resolved via https://github.com/apache/incubator-reef/pull/551 .

> Group comm hangs when root task is added after child tasks start running
> ------------------------------------------------------------------------
>
>                 Key: REEF-817
>                 URL: https://issues.apache.org/jira/browse/REEF-817
>             Project: REEF
>          Issue Type: Bug
>          Components: REEF-IO
>            Reporter: Joo Seong (Jason) Jeong
>            Assignee: Joo Seong (Jason) Jeong
>             Fix For: 0.14
>
>
> The Java-side group communication service implicitly assumes that the root task (also called the controller task or master task) is added to the topology before any child task starts running. That is not always true. For example, the evaluator on which the root task is to be launched may be delayed by problems with the underlying machine. Topology formation starts only after the root task has been added, so child tasks that start running early never learn who their parent or children are, even if the root task is added later. This usually leads to a job timeout. The bug can be reproduced by deliberately delaying the call to {{CommunicationGroupDriver.addTask(rootTaskConf)}}, e.g. with a simple {{Thread.sleep()}}.
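
The ordering problem described above can be illustrated with a small, self-contained toy model. This is a hypothetical sketch only. `ToyTopology`, `addChild`, `addRoot`, `onTaskRunning` and the `replayOnRootAdd` flag are invented for illustration. They are not part of REEF's `CommunicationGroupDriver` API, and the model does not claim to reproduce the actual change made in pull request 551. To keep the model small, it tracks only "child learns its parent" messages. With `replayOnRootAdd` off, a child that reports running before the root is added is never told its parent, which mirrors the hang. With it on, the driver remembers early-running children and notifies them once the root arrives.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of driver-side topology bookkeeping; NOT the REEF API.
class ToyTopology {
  private final boolean replayOnRootAdd;                        // true = remember and replay early children
  private final Set<String> children = new LinkedHashSet<>();   // child task ids added so far
  private final Set<String> running = new LinkedHashSet<>();    // task ids that reported running
  private final List<String> notifications = new ArrayList<>(); // topology messages "sent" to tasks
  private String rootId;                                        // null until the root task is added

  ToyTopology(boolean replayOnRootAdd) {
    this.replayOnRootAdd = replayOnRootAdd;
  }

  void addChild(String id) {
    children.add(id);
  }

  // Called when a task reports that it has started running.
  void onTaskRunning(String id) {
    running.add(id);
    if (rootId != null && children.contains(id)) {
      notifyParent(id); // root already known: tell the child who its parent is
    }
    // Otherwise the child keeps waiting; only addRoot() can unblock it.
  }

  void addRoot(String id) {
    rootId = id;
    if (!replayOnRootAdd) {
      return; // buggy variant: children that started earlier are never told
    }
    // Replaying variant: notify every child that was already running.
    for (String child : running) {
      if (children.contains(child)) {
        notifyParent(child);
      }
    }
  }

  private void notifyParent(String child) {
    notifications.add(child + " -> parent " + rootId);
  }

  List<String> notifications() {
    return notifications;
  }
}

public class LateRootDemo {
  public static void main(String[] args) {
    for (boolean replay : new boolean[] {false, true}) {
      ToyTopology topology = new ToyTopology(replay);
      topology.addChild("child-1");
      topology.onTaskRunning("child-1"); // child starts before the root exists
      topology.addRoot("root");          // root task is added late
      System.out.println("replayOnRootAdd=" + replay + ": " + topology.notifications());
    }
  }
}
```

Running `LateRootDemo` prints `replayOnRootAdd=false: []` followed by `replayOnRootAdd=true: [child-1 -> parent root]`. Remembering tasks that are already running and replaying topology information when the root arrives is only one way to remove the ordering assumption. It is shown here for illustration, not as the project's chosen fix.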



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)