Posted to dev@reef.apache.org by "Mariia Mykhailova (JIRA)" <ji...@apache.org> on 2016/09/30 18:21:20 UTC

[jira] [Commented] (REEF-1482) IMRU driver does not exit even if all the task exit normally

    [ https://issues.apache.org/jira/browse/REEF-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15536632#comment-15536632 ] 

Mariia Mykhailova commented on REEF-1482:
-----------------------------------------

From the discussion on the mailing list (to keep relevant information in one place):

{quote}
The problem with the previous Driver code was that, when checking whether an Evaluator had ended, it only looked for events still waiting in the Evaluator's event queue. If the queue was not empty, the EvaluatorManager reported not-idle to the Driver, which prevented shutdown. However, the previous code did not account for events that had already left the queue and were still being processed by an EventHandler.
The PR fixed this by incrementing a counter when entering an EventHandler and decrementing it when exiting (see `ThreadPoolStage` in the PR; the relevant calls are `beforeOnNext()` and `afterOnNext()`). It also added a Thread that, once an Evaluator is shut down, checks the counter repeatedly until the Evaluator can be declared completed. `EvaluatorIdlenessThreadPoolSize` is the size of the thread pool that checks the counters on all `EvaluatorManagers`. This seems like the most suspicious code, and it may be what causes the Driver to fail to shut down after all Evaluators have completed. It might be useful to add logging statements at a higher log level around the Evaluator completion check to see whether this is really the problem.
{quote}
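To make the mechanism described in the quote concrete, here is a minimal Java sketch. It is not the REEF `ThreadPoolStage` or `EvaluatorManager` code; `CountingStage`, `IdlenessChecker`, `whenIdle`, `isIdle` and `pollMillis` are hypothetical names, and `beforeOnNext()`/`afterOnNext()` only borrow the names mentioned in the PR. One deliberate assumption: the counter is incremented when an event is submitted rather than when the handler is entered, so an event that has been dequeued but has not yet reached the handler is never invisible to the idleness check. The checker pool plays the role of `EvaluatorIdlenessThreadPoolSize` and logs at INFO, along the lines of the logging suggested above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stage: counts every event from submission until its handler returns. */
final class CountingStage<T> implements AutoCloseable {
  private final ExecutorService executor;
  private final Consumer<T> handler;
  // Number of events that are either queued or currently inside the handler.
  private final AtomicInteger inFlight = new AtomicInteger();

  CountingStage(final Consumer<T> handler, final int numThreads) {
    this.handler = handler;
    this.executor = Executors.newFixedThreadPool(numThreads);
  }

  void onNext(final T event) {
    beforeOnNext(); // counted before it is queued, so it is never invisible
    executor.execute(() -> {
      try {
        handler.accept(event);
      } finally {
        afterOnNext(); // decremented only after the handler has returned
      }
    });
  }

  private void beforeOnNext() { inFlight.incrementAndGet(); }

  private void afterOnNext() { inFlight.decrementAndGet(); }

  /** Idle only when nothing is queued AND nothing is being processed. */
  boolean isIdle() { return inFlight.get() == 0; }

  @Override
  public void close() throws InterruptedException {
    executor.shutdown();
    executor.awaitTermination(10, TimeUnit.SECONDS);
  }
}

/** Hypothetical checker: a fixed-size pool, analogous to EvaluatorIdlenessThreadPoolSize. */
final class IdlenessChecker implements AutoCloseable {
  private static final Logger LOG = Logger.getLogger(IdlenessChecker.class.getName());
  private final ExecutorService pool;
  private final long pollMillis;

  IdlenessChecker(final int poolSize, final long pollMillis) {
    this.pool = Executors.newFixedThreadPool(poolSize);
    this.pollMillis = pollMillis;
  }

  /** Polls the stage until it is idle, then runs onIdle (e.g. "declare evaluator completed"). */
  Future<Boolean> whenIdle(final String id, final CountingStage<?> stage, final Runnable onIdle) {
    final Callable<Boolean> check = () -> {
      LOG.log(Level.INFO, "Waiting for {0} to become idle", id);
      while (!stage.isIdle()) {
        Thread.sleep(pollMillis); // this pool thread stays busy while polling
      }
      LOG.log(Level.INFO, "{0} is idle; declaring it completed", id);
      onIdle.run();
      return true;
    };
    return pool.submit(check);
  }

  @Override
  public void close() {
    pool.shutdownNow();
  }
}
```

In this sketch each pending check occupies one checker thread for as long as it polls, so with a pool smaller than the number of Evaluators shutting down at once, later checks wait behind earlier ones. Whether the real implementation behaves the same way is an open question; the INFO-level "waiting"/"idle" messages are the kind of evidence that would show where a stuck Driver is blocked.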


> IMRU driver does not exit even if all the task exit normally
> ------------------------------------------------------------
>
>                 Key: REEF-1482
>                 URL: https://issues.apache.org/jira/browse/REEF-1482
>             Project: REEF
>          Issue Type: Bug
>          Components: REEF.NET
>         Environment: C#
>            Reporter: Dhruv Mahajan
>
> Recently, when running IMRU with a large number of mappers, it has been observed intermittently that the IMRU driver does not exit, even though all other tasks (map and update) exit normally without any issues.
> The aim of this JIRA is to fix it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)