Posted to yarn-issues@hadoop.apache.org by "Arun Suresh (JIRA)" <ji...@apache.org> on 2016/09/14 00:45:20 UTC
[jira] [Comment Edited] (YARN-5637) Changes in NodeManager to support Container upgrade and rollback/commit

    [ https://issues.apache.org/jira/browse/YARN-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15488712#comment-15488712 ] 

Arun Suresh edited comment on YARN-5637 at 9/14/16 12:45 AM:
-------------------------------------------------------------

Updating the patch based on [~jianhe]'s suggestions and rebasing on the latest YARN-5620 patch.

bq. Do we need to do something for this condition? Otherwise, it can be removed.
Yes, it can be removed. I had put it there as a reminder to myself and forgot to take it out :)

bq. In RollbackContainerTransition: the container.getResourceSet() will return all resources, including both the current and previous versions. We should re-request only the previous version's resources, rather than the union of both?
In the latest patch, the resourceSet is also reverted to its previous state.
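A minimal sketch of what "reverting the resourceSet" means, assuming a plain set of resource names stands in for the NM's actual ResourceSet. ReInitResources and its methods are hypothetical and not the classes in the patch; the point is that rollback restores a snapshot of the old version instead of re-requesting the union of both versions.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for the NM's per-container resource bookkeeping.
final class ReInitResources {
    private Set<String> current;
    private Set<String> previous; // snapshot kept only while a rollback is still possible

    ReInitResources(Set<String> initial) {
        this.current = new LinkedHashSet<>(initial);
    }

    // On upgrade: remember the old version's resources, then switch to the new ones.
    void upgrade(Set<String> newResources) {
        previous = current;
        current = new LinkedHashSet<>(newResources);
    }

    // On rollback: restore the snapshot, so only the previous version's resources are used.
    Set<String> rollback() {
        if (previous == null) {
            throw new IllegalStateException("nothing to roll back to");
        }
        current = previous;
        previous = null;
        return current;
    }

    Set<String> current() {
        return current;
    }
}
```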

bq. I still have a question on the commit API: how does the AM use this API in practice?
Commit is just a way for the AM to tell the NM that it is satisfied with the upgrade (perhaps after running some upgrade diagnostics on the container) and that the container is working as it should. After the AM commits, the container can no longer be rolled back, and any bookkeeping the NM keeps for a rollback (e.g. the reInitContext) is deleted.

Prior to a commit, if the upgraded Container fails, NM can choose to automatically rollback.

Of course, the AM is still free to call 'upgrade' again with an old launch context.

By default, autoCommit is 'true', which means that as soon as the container is upgraded, it is also committed.
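A minimal sketch of the commit semantics described above, assuming a String stands in for the ContainerLaunchContext and a single saved context stands in for the reInitContext. UpgradableContainer and all of its methods are hypothetical, not the NM's ContainerImpl or the actual upgrade API; the sketch only models the rules: autoCommit commits immediately, a failure before commit may trigger an automatic rollback, and after a commit the only way back is another upgrade with the old context.

```java
// Hypothetical model of upgrade/commit/rollback rules; not the NM's ContainerImpl.
final class UpgradableContainer {
    private String launchContext;   // stands in for ContainerLaunchContext
    private String previousContext; // reInitContext-style bookkeeping needed for a rollback

    UpgradableContainer(String launchContext) {
        this.launchContext = launchContext;
    }

    void upgrade(String newContext, boolean autoCommit) {
        previousContext = launchContext;
        launchContext = newContext;
        if (autoCommit) {
            commit(); // the default: the upgrade is committed as soon as it happens
        }
    }

    // The AM is satisfied with the upgrade, so the rollback bookkeeping is dropped.
    void commit() {
        previousContext = null;
    }

    boolean canRollback() {
        return previousContext != null;
    }

    void rollback() {
        if (!canRollback()) {
            throw new IllegalStateException(
                "upgrade already committed; call upgrade again with the old context");
        }
        launchContext = previousContext;
        previousContext = null;
    }

    // Before a commit, the NM may choose to roll back a failed upgrade automatically.
    void onUpgradedContainerFailure() {
        if (canRollback()) {
            rollback();
        }
    }

    String launchContext() {
        return launchContext;
    }
}
```

Usage: after `upgrade("v2", true)` a call to `rollback()` throws, so the AM would call `upgrade("v1", true)` instead.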

bq. ..one implication of this API is that we'll have to persist the commit state for NM recovery later on.
Yes, we would. I plan to open a JIRA to address the NMStateStore changes for this as well as for YARN-5620.

bq. Also, should the rollback API always be able to roll back?
Once commit has been called, you cannot roll back. The AM would have to explicitly call the upgrade API again with the previous launchContext.

bq. ContainerLaunchContext already has a ContainerRetryContext; can we reuse that retryContext?
I wanted to distinguish between the retry policy used to retry a failed container and the policy used to decide failure retries during upgrades. The two may well be the same; I put that argument in the _upgrade()_ API to make the distinction explicit.
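A minimal sketch of keeping the two retry policies apart, assuming a single retry count stands in for ContainerRetryContext. RetrySpec and RetryPolicySelector are hypothetical, and the fallback to the launch context's policy when upgrade() is given none is an assumption for illustration, not necessarily what the patch does.

```java
// Hypothetical stand-in for ContainerRetryContext: only a retry count is modelled.
final class RetrySpec {
    final int maxRetries;

    RetrySpec(int maxRetries) {
        this.maxRetries = maxRetries;
    }
}

// Hypothetical selector: which policy judges a failure depends on whether an upgrade is in progress.
final class RetryPolicySelector {
    private final RetrySpec launchRetry; // from the container's launch context
    private RetrySpec upgradeRetry;      // passed explicitly to upgrade(); may be null

    RetryPolicySelector(RetrySpec launchRetry) {
        this.launchRetry = launchRetry;
    }

    void startUpgrade(RetrySpec upgradeRetry) {
        this.upgradeRetry = upgradeRetry;
    }

    void finishUpgrade() {
        this.upgradeRetry = null;
    }

    // During an upgrade the upgrade-specific policy applies if one was given (assumed fallback:
    // otherwise, and outside upgrades, the ordinary container retry policy is used).
    RetrySpec policyForFailure() {
        return upgradeRetry != null ? upgradeRetry : launchRetry;
    }
}
```

Keeping the upgrade policy as a separate argument lets an AM, for example, disable retries during an upgrade so that a failure leads to a rollback quickly, while the container keeps its normal retry behaviour otherwise.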

bq. The ContainerImpl#ContainerRetryContext is not updated to the new value on upgrade.
This is fixed in the latest YARN-5620 patch.

bq. RetryFailureTransition: it's a bit complicated. Is it possible to simplify it to something like below:
I refactored it a bit; let me know if it's OK.






> Changes in NodeManager to support Container upgrade and rollback/commit
> -----------------------------------------------------------------------
>
>                 Key: YARN-5637
>                 URL: https://issues.apache.org/jira/browse/YARN-5637
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Arun Suresh
>            Assignee: Arun Suresh
>         Attachments: YARN-5637.001.patch, YARN-5637.002.patch
>
>
> YARN-5620 added support for re-initialization of Containers using a new launch context.
> This JIRA proposes to use the above feature to support upgrade and subsequent rollback or commit of the upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
