Posted to dev@reef.apache.org by "Pei Jiang (JIRA)" <ji...@apache.org> on 2017/11/17 00:24:00 UTC

[jira] [Comment Edited] (REEF-1949) Closing ThreadPoolStage before tasks are finished

    [ https://issues.apache.org/jira/browse/REEF-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256196#comment-16256196 ] 

Pei Jiang edited comment on REEF-1949 at 11/17/17 12:23 AM:
------------------------------------------------------------

[^ReefDriverDebug.zip]

Please find the repro case attached.


was (Author: pejian):
Repro code

> Closing ThreadPoolStage before tasks are finished
> -------------------------------------------------
>
>                 Key: REEF-1949
>                 URL: https://issues.apache.org/jira/browse/REEF-1949
>             Project: REEF
>          Issue Type: Bug
>          Components: REEF Driver
>    Affects Versions: 0.17
>            Reporter: Pei Jiang
>         Attachments: ReefDriverDebug.zip
>
>
> In EvaluatorManager.onEvaluatorDone(),
> {code}
> // This relies on the dispatcher to call the CompletedEvaluator handler.
> this.messageDispatcher.onEvaluatorCompleted(new CompletedEvaluatorImpl(this.evaluatorId)); 
> // This will close the dispatcher, which in turn shuts down the executor in ThreadPoolStage.
> this.close(); 
> {code}
> Because onEvaluatorCompleted() only submits the message-sending task to an executor, there is no guarantee that the CompletedEvaluator message is sent before the executor is terminated by the this.close() call. When that happens, the CompletedEvaluator handler is never triggered, so the driver believes some evaluators are still alive and hangs.
> Relevant logs:
> {code}
> Nov 01, 2017 11:05:57 PM org.apache.reef.wake.impl.ThreadPoolStage close
> SEVERE: Closing ThreadPoolStage EvaluatorMessageDispatcher:container_1508975419755_0006_01_000004: Executor did not terminate in 1,000 ms. Dropping 2 tasks
> Nov 01, 2017 11:05:57 PM org.apache.reef.wake.impl.ThreadPoolStage close
> SEVERE: Closing ThreadPoolStage EvaluatorMessageDispatcher:container_1508975419755_0006_01_000004: Executor failed to terminate.
> End of LogType:driver.stderr
> {code}
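The race described above can be modeled with a minimal, self-contained Java sketch that uses only java.util.concurrent, not REEF. It assumes, based on the log lines, that ThreadPoolStage.close() calls shutdown(), waits a bounded time, and then calls shutdownNow(), which discards any tasks still queued. The class and method names (CloseBeforeDrainDemo, closeStage, runScenario) are hypothetical stand-ins. The "wait for the notification" branch is only one possible mitigation, not necessarily how REEF resolved this issue.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the REEF-1949 race using plain java.util.concurrent.
// No REEF classes are used; all names here are illustrative.
class CloseBeforeDrainDemo {

  // Assumed close sequence (inferred from the logs): graceful shutdown,
  // a bounded wait, then shutdownNow(), which returns the dropped tasks.
  static int closeStage(ExecutorService executor, long timeoutMs)
      throws InterruptedException {
    executor.shutdown();
    if (!executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
      List<Runnable> dropped = executor.shutdownNow();
      return dropped.size();
    }
    return 0;
  }

  // Returns true if the stand-in "CompletedEvaluator" handler actually ran.
  // If waitForNotification is true, the caller blocks until the handler has
  // run before closing the executor (one possible mitigation).
  static boolean runScenario(long busyMs, long closeTimeoutMs, boolean waitForNotification)
      throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    AtomicBoolean handlerFired = new AtomicBoolean(false);
    CountDownLatch busyStarted = new CountDownLatch(1);

    // An earlier message that keeps the single worker thread busy.
    executor.submit(() -> {
      busyStarted.countDown();
      try {
        Thread.sleep(busyMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // interrupted by shutdownNow()
      }
    });
    busyStarted.await(); // guarantees the notification below is queued behind it

    // Stand-in for onEvaluatorCompleted(): it only *submits* the handler call.
    Future<?> notification = executor.submit(() -> handlerFired.set(true));

    if (waitForNotification) {
      notification.get(); // block until the handler has actually run
    }
    int dropped = closeStage(executor, closeTimeoutMs); // stand-in for this.close()
    System.out.println("dropped=" + dropped + ", handlerFired=" + handlerFired.get());
    return handlerFired.get();
  }

  public static void main(String[] args) throws Exception {
    // Race: close gives up after 100 ms while the worker is still busy,
    // so the queued notification is dropped and the handler never runs.
    runScenario(2_000, 100, false);
    // Mitigation: wait for the notification before closing the executor.
    runScenario(200, 100, true);
  }
}
```

Running main should print `dropped=1, handlerFired=false` for the first scenario and `dropped=0, handlerFired=true` for the second. Blocking on a Future inside a real driver event handler has its own trade-offs, so treat this only as an illustration of the ordering requirement: the notification must be processed before the executor is shut down.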



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)