Posted to dev@reef.apache.org by "Byung-Gon Chun (JIRA)" <ji...@apache.org> on 2015/08/15 05:52:45 UTC

[jira] [Comment Edited] (REEF-594) Add NodeDescriptor, number of cores, and memory to construct complete EvaluatorManager for recovered evaluator

    [ https://issues.apache.org/jira/browse/REEF-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698092#comment-14698092 ] 

Byung-Gon Chun edited comment on REEF-594 at 8/15/15 3:52 AM:
--------------------------------------------------------------

[~afchung90] For the first option, you can use {{Checkpoint}}. But the second option (reconstructing state from the evaluators) sounds better. You don't need to carry the information in every heartbeat. Maybe the driver can ask an evaluator to send the information in its next heartbeat (or right away) once that evaluator contacts the driver.
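
To make the second option concrete, here is a rough Java sketch of the request-on-contact idea. It is not REEF code: {{EvaluatorInfo}}, {{Heartbeat}}, {{Evaluator}}, {{RestartedDriver}}, and all of their methods are hypothetical names made up for illustration, and the real change would presumably go through EvaluatorStatusProto instead. The only point is that the extra fields travel once, when the driver asks for them, and not in every heartbeat.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class RecoverySketch {

    /** Hypothetical stand-in for the node descriptor, core count, and memory of an evaluator. */
    static final class EvaluatorInfo {
        final String nodeName;
        final int cores;
        final int memoryMb;

        EvaluatorInfo(String nodeName, int cores, int memoryMb) {
            this.nodeName = nodeName;
            this.cores = cores;
            this.memoryMb = memoryMb;
        }
    }

    /** Hypothetical heartbeat message: the info field is empty unless the driver asked for it. */
    static final class Heartbeat {
        final String evaluatorId;
        final Optional<EvaluatorInfo> info;

        Heartbeat(String evaluatorId, Optional<EvaluatorInfo> info) {
            this.evaluatorId = evaluatorId;
            this.info = info;
        }
    }

    /** Evaluator side: attaches its info to exactly one heartbeat after a request. */
    static final class Evaluator {
        private final String id;
        private final EvaluatorInfo info;
        private boolean infoRequested = false;

        Evaluator(String id, EvaluatorInfo info) {
            this.id = id;
            this.info = info;
        }

        void requestInfo() {
            infoRequested = true;
        }

        Heartbeat nextHeartbeat() {
            Optional<EvaluatorInfo> payload =
                infoRequested ? Optional.of(info) : Optional.empty();
            infoRequested = false; // send the info once, then go back to plain heartbeats
            return new Heartbeat(id, payload);
        }
    }

    /** Driver side after a restart: it starts with no knowledge of the surviving evaluators. */
    static final class RestartedDriver {
        private final Map<String, EvaluatorInfo> known = new HashMap<>();

        /** Records any attached info; returns true if the driver still needs it from this evaluator. */
        boolean onHeartbeat(Heartbeat hb) {
            if (hb.info.isPresent()) {
                known.put(hb.evaluatorId, hb.info.get());
                return false;
            }
            return !known.containsKey(hb.evaluatorId);
        }

        Optional<EvaluatorInfo> lookup(String evaluatorId) {
            return Optional.ofNullable(known.get(evaluatorId));
        }
    }

    public static void main(String[] args) {
        Evaluator evaluator = new Evaluator("eval-1", new EvaluatorInfo("node-a", 4, 2048));
        RestartedDriver driver = new RestartedDriver();

        // First contact after the restart: a plain heartbeat, so the driver asks for the info.
        if (driver.onHeartbeat(evaluator.nextHeartbeat())) {
            evaluator.requestInfo();
        }
        // The next heartbeat carries the info; later ones do not.
        driver.onHeartbeat(evaluator.nextHeartbeat());

        EvaluatorInfo recovered = driver.lookup("eval-1").get();
        System.out.println(recovered.nodeName + " " + recovered.cores + " " + recovered.memoryMb);
    }
}
```

In this sketch the request is delivered by a direct method call; in practice it would have to ride on whatever message the driver already sends back to the evaluator, which is left open here.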



was (Author: bgchun):
[~afchung90] For the first option, you can use {{Checkpoint}}. But the second option (reconstructing state from evaluators) sounds better.


> Add NodeDescriptor, number of cores, and memory to construct complete EvaluatorManager for recovered evaluator
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: REEF-594
>                 URL: https://issues.apache.org/jira/browse/REEF-594
>             Project: REEF
>          Issue Type: Sub-task
>          Components: REEF Driver, REEF.NET Driver
>            Reporter: Andrew Chung
>            Assignee: Andrew Chung
>
> Currently, when we recover an evaluator, we cannot provide complete information about the evaluator back to the user because we do not persist that information anywhere. There are a few options for keeping it. The first is to persist the information in the DFS; the second is to add the information to EvaluatorStatusProto.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)