Posted to dev@reef.apache.org by "Mariia Mykhailova (JIRA)" <ji...@apache.org> on 2017/02/02 22:58:51 UTC

[jira] [Resolved] (REEF-1725) IMRU Job fails when UpdateTask is done but another evaluator fails at the same time causing system state change to ShutDown

     [ https://issues.apache.org/jira/browse/REEF-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mariia Mykhailova resolved REEF-1725.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 0.16

Resolved via [PR 1245|https://github.com/apache/reef/pull/1245]

> IMRU Job fails when UpdateTask is done but another evaluator fails at the same time causing system state change to ShutDown
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: REEF-1725
>                 URL: https://issues.apache.org/jira/browse/REEF-1725
>             Project: REEF
>          Issue Type: Bug
>            Reporter: Julia
>            Assignee: Julia
>            Priority: Critical
>             Fix For: 0.16
>
>
> Currently, in the IMRU fault-tolerant system, if the master task finishes at the same moment it receives a Close event caused by another evaluator's failure, the system state changes to ShutDown and the job is retried.
> In fact, as soon as the master task is done, that event should be passed clearly to the driver, and the driver should execute DoneAction regardless of any other failures that happen at the same time.
> There are multiple possible solutions:
> 1. Let CompletedTask carry the “done” information. The major issue with this solution is not just the complexity of updating the protocol buffer message and both the Java and C# code; the task also needs a way to let TaskRuntime know it is “done”. For that, we would need to change the ITask interface, which we should be careful not to change unless it is really necessary.
> 2. Use a task message. This is simple to implement. However, a task message is sent with the heartbeat for a “running task”. If the task status changes to closed before the heartbeat is sent, the message won't reach the driver.
> 3. Send different events for update task COMPLETE and CLOSE. Currently, whether the update task is really done or was closed by the driver, ITask.Call() returns and ICompletedTask is sent. If we send ICompletedTask only when the task is really done, regardless of whatever else happens, and send IFailedTask when the update task is closed by the driver and is not “done”, then the driver will be able to differentiate the two events. This is an easier and quicker solution.
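
The idea behind option 3 can be sketched roughly as follows. This is only an illustration of the decision logic described in the issue, not the code from the PR and not the REEF API: the class, enum, and method names (UpdateTaskOutcomeSketch, TaskEvent, SystemState, eventForUpdateTask, onUpdateTaskEvent) are hypothetical, and the DONE state is an assumed stand-in for "driver executes DoneAction".

```java
public class UpdateTaskOutcomeSketch {

    /** Hypothetical stand-ins for the ICompletedTask / IFailedTask events. */
    public enum TaskEvent { COMPLETED, FAILED }

    /** Hypothetical driver-side states; DONE stands for "execute DoneAction". */
    public enum SystemState { TASKS_RUNNING, SHUT_DOWN, DONE }

    /**
     * Evaluator side: choose the event to report when the update task returns.
     * "Done" takes priority, so a Close that races with completion is ignored.
     */
    public static TaskEvent eventForUpdateTask(boolean workFinished, boolean closedByDriver) {
        if (workFinished) {
            return TaskEvent.COMPLETED;
        }
        if (closedByDriver) {
            // Closed before finishing: report a failure, not a completion.
            return TaskEvent.FAILED;
        }
        throw new IllegalStateException("update task returned without finishing or being closed");
    }

    /**
     * Driver side: a completed update task always ends the job, even if another
     * evaluator failure has already moved the system to SHUT_DOWN.
     */
    public static SystemState onUpdateTaskEvent(SystemState current, TaskEvent event) {
        if (event == TaskEvent.COMPLETED) {
            return SystemState.DONE;
        }
        // A failed (closed) update task keeps the shutdown-and-retry path.
        return SystemState.SHUT_DOWN;
    }

    public static void main(String[] args) {
        // The race from the issue: the task finished while a Close was arriving.
        TaskEvent event = eventForUpdateTask(true, true);
        System.out.println(event + " -> " + onUpdateTaskEvent(SystemState.SHUT_DOWN, event));
    }
}
```

With this split, the driver no longer has to guess whether a returned update task actually finished; the event type alone carries that information, which is why no change to the task interface or the wire format would be needed under these assumptions.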



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)