Posted to dev@reef.apache.org by "Julia (JIRA)" <ji...@apache.org> on 2017/01/05 01:13:58 UTC

[jira] [Created] (REEF-1691) Should not request extra evaluators if evaluator failed at WaitingForEvaluator state

Julia created REEF-1691:
---------------------------

             Summary: Should not request extra evaluators if evaluator failed at WaitingForEvaluator state
                 Key: REEF-1691
                 URL: https://issues.apache.org/jira/browse/REEF-1691
             Project: REEF
          Issue Type: Bug
            Reporter: Julia


When Evaluators fail in both the WaitingForEvaluator state and TaskRunningState, we use _failedEvaluatorsCount in recovery to request new Evaluators. That number includes the Evaluators that failed in both states, even though we have already requested replacements for the Evaluators that failed in the WaitingForEvaluator state. This causes additional Evaluators to be requested. It is a regression caused by REEF-1677.

With REEF-1688, even though we loosen the condition so that additional Evaluators are ignored, an additional allocated Evaluator can still be received in another state, because we change the system state right after we get all the Evaluators needed. When we receive an additional allocated Evaluator in an unexpected state, it results in an IMRUSystemException.

The fix is to request, in recovery, only replacements for the Evaluators that failed during or after task submission.
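
The counting problem and the proposed fix can be sketched as follows. This is only an illustration in Python, not the REEF code: the class FailureCounter, its fields, and its methods are hypothetical names, and the state strings are taken from the description above. It assumes a replacement is requested immediately for each failure in the WaitingForEvaluator state, as described.

```python
from dataclasses import dataclass


@dataclass
class FailureCounter:
    """Hypothetical bookkeeping for failed Evaluators (not REEF's API)."""

    # Failures seen while waiting for Evaluators; a replacement was
    # already requested for each of these at the time of failure.
    failed_while_waiting: int = 0
    # Failures seen during or after task submission; no replacement
    # has been requested for these yet.
    failed_after_submit: int = 0

    def record_failure(self, state: str) -> int:
        """Record a failure; return how many Evaluators to request right now."""
        if state == "WaitingForEvaluator":
            self.failed_while_waiting += 1
            return 1  # replacement requested immediately
        self.failed_after_submit += 1
        return 0  # replacement deferred to recovery

    def total_failed(self) -> int:
        """Count over both states; using this in recovery over-requests."""
        return self.failed_while_waiting + self.failed_after_submit

    def to_request_in_recovery(self) -> int:
        """Proposed fix: only count failures not yet replaced."""
        return self.failed_after_submit


counter = FailureCounter()
requested_now = counter.record_failure("WaitingForEvaluator")
counter.record_failure("TaskRunningState")
counter.record_failure("TaskRunningState")
```

In this sketch, requesting total_failed() Evaluators in recovery would ask for one more than needed, because the failure in the WaitingForEvaluator state was already replaced; to_request_in_recovery() asks only for the two that are still missing.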




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)