Posted to dev@reef.apache.org by "Julia (JIRA)" <ji...@apache.org> on 2016/09/08 22:06:20 UTC

[jira] [Commented] (REEF-1549) Resolve the issue in WaitingForRegistration

    [ https://issues.apache.org/jira/browse/REEF-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475165#comment-15475165 ] 

Julia commented on REEF-1549:
-----------------------------

Adding a context layer would introduce a system state change and therefore more changes in the fault-tolerant code.
To resolve the WaitingForRegistration issue, I would like to propose this approach:
* Move WaitingForRegistration into the Call() method as the first line
* Pass a cancellation token to WaitingForRegistration
* When the driver is in the shutdown state, it will send a close event to all the running tasks as before
* When a task sees the cancellation token signalled during WaitingForRegistration, it will return right away on the next pass of the retry loop.

This way,
* The driver would be able to get IRunningTask quickly, so that it can use this reference to send events to the task
* We won't mix up communication errors with Injection Exception.
* The behavior is controlled by the cancellation token, so it takes effect promptly with no delay.
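
To make the proposal concrete, here is a minimal sketch of a cancellation-aware retry loop. It is written in Java purely for illustration and is not the REEF implementation: the class, the Result enum, the parameter names, and the use of a CountDownLatch as a stand-in for the cancellation token are all assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; none of these names come from the REEF code base.
final class RegistrationWaitSketch {

    enum Result { REGISTERED, CANCELLED, TIMED_OUT }

    /**
     * Polls isRegistered up to maxRetries times. Between attempts it waits on
     * cancelSignal instead of sleeping, so a cancellation wakes the loop
     * immediately rather than after the full retry delay.
     */
    static Result waitForRegistration(BooleanSupplier isRegistered,
                                      CountDownLatch cancelSignal,
                                      int maxRetries,
                                      long retryDelayMillis) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (isRegistered.getAsBoolean()) {
                return Result.REGISTERED;
            }
            try {
                // await returns true as soon as the latch count reaches zero,
                // and false if the delay elapses first.
                if (cancelSignal.await(retryDelayMillis, TimeUnit.MILLISECONDS)) {
                    return Result.CANCELLED;
                }
            } catch (InterruptedException e) {
                // Treat interruption like cancellation and keep the flag set.
                Thread.currentThread().interrupt();
                return Result.CANCELLED;
            }
        }
        return Result.TIMED_OUT;
    }
}
```

In this sketch the close-event handler would call cancelSignal.countDown(), and the caller can tell a cancelled wait apart from a timeout by the returned value, which keeps the communication failure separate from other errors.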




> Resolve the issue in WaitingForRegistration
> -------------------------------------------
>
>                 Key: REEF-1549
>                 URL: https://issues.apache.org/jira/browse/REEF-1549
>             Project: REEF
>          Issue Type: Improvement
>    Affects Versions: 0.16
>            Reporter: Julia
>              Labels: FT
>
> Currently, if an evaluator fails while we are still in the phase of task submission, the newly created tasks will wait in WaitForRegistration during group communication initialization until timeout.
> One way to handle this is to cancel the task that is still being constructed. The issue is that the driver has not received IRunningTask yet at this point, so there is no way to send an event to the task with the current system.
> Another way is to add a context layer for group communication initialization and let the Driver/GroupCommuDriver control whether all such contexts are created, based on the context events, then submit tasks on those contexts. This would keep the control for group communications in a centralized place. It would also make task initialization much quicker and reduce the chance of failures in the task constructor before the task is running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)