Posted to yarn-issues@hadoop.apache.org by "Jian He (JIRA)" <ji...@apache.org> on 2014/05/05 21:35:17 UTC

[jira] [Updated] (YARN-1368) Common work to re-populate containers’ state into scheduler

     [ https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-1368:
--------------------------

    Attachment: YARN-1368.1.patch

Uploaded a new patch.
- AbstractYarnScheduler#recoverContainersOnNode() performs most of the recovery: it recovers the RMContainer, SchedulerNode, Queue, SchedulerApplicationAttempt, and appSchedulingInfo accordingly.
- ResourceTrackerService#handleContainerStatus is no longer needed; that is handled in the common recovery flow.
- Changed RMAppRecoveredTransition to add the current attempt to scheduler.
- Changed a few RMAppAttempt transitions to capture the completed containers that are recovered.
- Modified CapacityScheduler so that it does not send unnecessary app_accepted/attempt_added events to the recovered apps/attempts.
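
As a rough illustration of the common recovery flow listed above (this is not the code in the patch), the sketch below replays the containers an NM reports at registration into node, attempt, and queue bookkeeping. Every type and field here (RecoverySketch, ReportedContainer, Node, Attempt, memoryMb, queueUsedMb) is a hypothetical stand-in rather than a real YARN class, and it assumes each report already carries the container's resource size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the recovery flow; none of these
// types are the real YARN classes.
final class RecoverySketch {

    enum State { RUNNING, COMPLETE }

    // Stand-in for the per-container status an NM sends when it registers.
    // Assumption: the report includes the container's memory size.
    static final class ReportedContainer {
        final String containerId;
        final String attemptId;
        final int memoryMb;
        final State state;

        ReportedContainer(String containerId, String attemptId, int memoryMb, State state) {
            this.containerId = containerId;
            this.attemptId = attemptId;
            this.memoryMb = memoryMb;
            this.state = state;
        }
    }

    // Stand-in for the scheduler's per-node bookkeeping.
    static final class Node {
        final int totalMb;
        int usedMb;
        final List<String> running = new ArrayList<>();

        Node(int totalMb) {
            this.totalMb = totalMb;
        }
    }

    // Stand-in for the scheduler's per-attempt bookkeeping.
    static final class Attempt {
        int usedMb;
        final List<String> live = new ArrayList<>();
        final List<String> completed = new ArrayList<>();
    }

    // Attempts already re-added to the scheduler during app recovery.
    final Map<String, Attempt> attempts = new HashMap<>();
    // Single-queue simplification: total memory charged to the queue.
    int queueUsedMb;

    // Replays reported containers into node, attempt, and queue state.
    void recoverContainersOnNode(Node node, List<ReportedContainer> reports) {
        for (ReportedContainer r : reports) {
            Attempt attempt = attempts.get(r.attemptId);
            if (attempt == null) {
                continue; // unknown attempt: simply skipped in this sketch
            }
            if (r.state == State.COMPLETE) {
                // Completed containers are recorded but hold no resources.
                attempt.completed.add(r.containerId);
                continue;
            }
            node.running.add(r.containerId);
            node.usedMb += r.memoryMb;
            attempt.live.add(r.containerId);
            attempt.usedMb += r.memoryMb;
            queueUsedMb += r.memoryMb;
        }
    }

    public static void main(String[] args) {
        RecoverySketch scheduler = new RecoverySketch();
        scheduler.attempts.put("attempt_1", new Attempt());
        Node node = new Node(8192);
        scheduler.recoverContainersOnNode(node, Arrays.asList(
                new ReportedContainer("c1", "attempt_1", 1024, State.RUNNING),
                new ReportedContainer("c2", "attempt_1", 2048, State.COMPLETE)));
        System.out.println("node used MB: " + node.usedMb);
        System.out.println("queue used MB: " + scheduler.queueUsedMb);
    }
}
```

The sketch charges resources only for running containers and merely records completed ones, which mirrors the split described in the bullets between scheduler-state recovery and capturing completed containers.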

Todo:
- Replace the containerStatus sent across via NM registration with a new object which captures the resource capability of the container.
- FSQueue needs to implement its own recoverContainer method.

> Common work to re-populate containers’ state into scheduler
> -----------------------------------------------------------
>
>                 Key: YARN-1368
>                 URL: https://issues.apache.org/jira/browse/YARN-1368
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Anubhav Dhoot
>         Attachments: YARN-1368.1.patch, YARN-1368.preliminary.patch
>
>
> YARN-1367 adds support for the NM to tell the RM about all currently running containers upon registration. The RM needs to send this information to the schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)