Posted to yarn-issues@hadoop.apache.org by "Xuan Gong (JIRA)" <ji...@apache.org> on 2015/02/24 06:50:11 UTC

[jira] [Commented] (YARN-3245) Find a way to reserve AMContainer resource to launch clean-up container in CapacityScheduler

    [ https://issues.apache.org/jira/browse/YARN-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334446#comment-14334446 ] 

Xuan Gong commented on YARN-3245:
---------------------------------

More details here:
Currently, we have two directions:
* NM->RM: When the AM finishes successfully or fails, the NM informs the RM through the regular heartbeat, and the RM then changes the status of the related RMContainer/RMAppAttempt/RMApp.
* RM->NM: When the user kills the app or the app is pre-empted, the RM changes the status first, then informs the NM through the NM heartbeat. The NM then kills the AMContainer.

Regardless of direction, both paths go through the common function CapacityScheduler#completeContainer. In this function, if the container is an AM container and the clean-up container is enabled, we can reserve the resource by triggering only the containerFinishedEvent, which informs the RMContainer/RMAppAttempt/RMApp to change their status, without informing the queue to release the resource.

If this attempt is not the last attempt, we will release the container resource. If it is, we will use the resource to launch the clean-up container.
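
As a rough illustration of the reserve/release decision described above, here is a minimal, self-contained sketch. It is a toy model under stated assumptions, not the actual CapacityScheduler code: ToyQueue, ToyScheduler, heldForCleanupMb and releaseHeldResource are hypothetical names made up for the example, and the "last attempt" check is folded into the same call for brevity.

```java
import java.util.ArrayList;
import java.util.List;

class ReserveAmResourceSketch {

    /** Hypothetical stand-in for a queue's resource accounting. */
    static final class ToyQueue {
        int usedMb;
        ToyQueue(int usedMb) { this.usedMb = usedMb; }
        void release(int mb) { usedMb -= mb; }
    }

    /** Hypothetical scheduler model; NOT the real CapacityScheduler. */
    static final class ToyScheduler {
        final boolean cleanupEnabled;
        final ToyQueue queue;
        final List<String> events = new ArrayList<>();
        int heldForCleanupMb = 0;

        ToyScheduler(boolean cleanupEnabled, ToyQueue queue) {
            this.cleanupEnabled = cleanupEnabled;
            this.queue = queue;
        }

        /** Common completion path, reached from both NM->RM and RM->NM. */
        void completeContainer(String containerId, int mb,
                               boolean isAm, boolean isLastAttempt) {
            // The container/attempt/app state machines are always notified.
            events.add("containerFinished:" + containerId);
            if (isAm && cleanupEnabled && isLastAttempt) {
                // Keep the resource: the queue is not told to release it.
                heldForCleanupMb += mb;
            } else {
                // Normal path: give the resource back to the queue.
                queue.release(mb);
            }
        }

        /** Called once the held resource is no longer needed. */
        void releaseHeldResource() {
            queue.release(heldForCleanupMb);
            heldForCleanupMb = 0;
        }
    }

    public static void main(String[] args) {
        ToyQueue queue = new ToyQueue(4096);
        ToyScheduler scheduler = new ToyScheduler(true, queue);
        scheduler.completeContainer("am_attempt_1", 1024, true, false);
        scheduler.completeContainer("am_attempt_2", 1024, true, true);
        System.out.println("used=" + queue.usedMb
                + " held=" + scheduler.heldForCleanupMb);
        scheduler.releaseHeldResource();
        System.out.println("used=" + queue.usedMb
                + " held=" + scheduler.heldForCleanupMb);
    }
}
```

In this model the finished event is recorded in every case, and only the queue accounting differs between the two branches.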

Whichever direction applies (NM->RM or RM->NM), we need to make sure the AMContainer really exists. The only way to make sure of this is through the NodeStatusUpdate. If we can get the AMContainer from NodeStatusUpdate#completeContainerList, it means the AMContainer exists. At that point, we could add a flag/trigger to indicate that now is a good time to launch the clean-up container.
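
The flag/trigger idea could look roughly like the sketch below. Again, this is only an assumption-laden toy: CleanupTracker and its methods are hypothetical, container IDs are plain strings, and the completed-container list is passed in directly rather than read from a real node status update.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CleanupTriggerSketch {

    /** Hypothetical tracker; not an actual YARN class. */
    static final class CleanupTracker {
        // AM containers whose resource is held, pending a report from the NM.
        private final Set<String> awaitingNodeReport = new HashSet<>();
        // AM containers the NM has reported in its completed-container list.
        private final Set<String> readyForCleanup = new HashSet<>();

        void expectAmContainer(String amContainerId) {
            awaitingNodeReport.add(amContainerId);
        }

        /** Process the completed-container list from one node status update. */
        void onNodeStatusUpdate(List<String> completedContainerIds) {
            for (String id : completedContainerIds) {
                // Only flag containers we were actually waiting for.
                if (awaitingNodeReport.remove(id)) {
                    readyForCleanup.add(id);
                }
            }
        }

        boolean isReadyForCleanup(String amContainerId) {
            return readyForCleanup.contains(amContainerId);
        }
    }

    public static void main(String[] args) {
        CleanupTracker tracker = new CleanupTracker();
        tracker.expectAmContainer("am_container_1");
        tracker.onNodeStatusUpdate(Arrays.asList("task_container_7"));
        System.out.println(tracker.isReadyForCleanup("am_container_1"));
        tracker.onNodeStatusUpdate(Arrays.asList("am_container_1"));
        System.out.println(tracker.isReadyForCleanup("am_container_1"));
    }
}
```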

So, in this ticket, we expect to cover reserving the AMContainer resource and releasing the resource afterwards.
How/when to launch the clean-up container will be handled separately.


> Find a way to reserve AMContainer resource to launch clean-up container in CapacityScheduler
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-3245
>                 URL: https://issues.apache.org/jira/browse/YARN-3245
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>
> The clean-up container will be launched after the application is finished/killed/failed. The clean-up container may not get resources if we negotiate the resource for it separately, because the cluster may have gotten busy after the final AM exit. The proposal is to reserve the AMContainer resource and use it to launch the clean-up container. In that case, we do not need to re-negotiate the resource, and the clean-up container can be launched on the same NM as the AM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)