Posted to yarn-issues@hadoop.apache.org by "Arun Suresh (JIRA)" <ji...@apache.org> on 2017/10/20 21:41:00 UTC
[jira] [Comment Edited] (YARN-7373) The atomicity of container update in RM is not clear
[ https://issues.apache.org/jira/browse/YARN-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213280#comment-16213280 ]
Arun Suresh edited comment on YARN-7373 at 10/20/17 9:40 PM:
-------------------------------------------------------------
[~haibochen] / [~miklos.szegedi@cloudera.com]
So, like I mentioned in the earlier JIRA, what we have in trunk currently is mostly atomic because:
# The {{swapContainer}} call happens within the {{pullNewlyUpdatedContainers}} method in the SchedulerApplicationAttempt, during which the thread holds a write lock on the application. You don't need a lock on the queue, and since there are no changes to the node, there is no need for a lock there either.
# The only concurrent action that can happen is that the node where the container is running might have heartbeated in - but that operation, releaseContainer, tries to take a lock on the app too, so it has to contend with the write lock acquired in {{pullNewlyUpdatedContainers}} - so we are good there.
# It is possible that multiple container update requests (say container increase requests) for containers running on the same node can come in concurrently - but the flow is such that the actual resource allocation for the update is internally treated as a new (temporary) container allocation - and just like any normal container allocation in the scheduler, these are serialized.
# It is possible that multiple update requests for the SAME container can come in too - but we have a container version that takes care of that.
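Points 1, 2 and 4 above can be sketched roughly as follows. This is only an illustration, not the actual YARN code: {{AppAttemptSketch}}, its method names, and the string container IDs are hypothetical stand-ins for SchedulerApplicationAttempt, and it assumes the release path takes the same application write lock as the update path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for SchedulerApplicationAttempt (not the real class).
class AppAttemptSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // containerId -> current container version (assumed representation)
  private final Map<String, Integer> versions = new HashMap<>();

  void addContainer(String id) {
    lock.writeLock().lock();
    try {
      versions.put(id, 0);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Models the swap performed while pulling newly updated containers:
  // it runs under the app write lock and is applied only if the request
  // targets the current version of the container.
  boolean applyUpdate(String id, int expectedVersion) {
    lock.writeLock().lock();
    try {
      Integer current = versions.get(id);
      if (current == null || current != expectedVersion) {
        return false; // container already released, or a stale request
      }
      versions.put(id, current + 1);
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Models a release triggered by a node heartbeat. It contends on the
  // same write lock, so it cannot interleave with applyUpdate.
  boolean release(String id) {
    lock.writeLock().lock();
    try {
      return versions.remove(id) != null;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Returns the current version, or null if the container is gone.
  Integer versionOf(String id) {
    lock.readLock().lock();
    try {
      return versions.get(id);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Under these assumptions, a second request carrying an already-consumed version is rejected, and a release can only land entirely before or entirely after an update.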
One caveat, though: the code you pasted above, which is part of the changes in YARN-4511, can cause a few problems. It updates the node as well, so you might need a lock on the node before you do that.
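To make the node-lock concern concrete, here is a minimal sketch of holding one node-level lock across both accounting updates. {{SchedulerNodeSketch}} and its numeric {{containerUpdated(oldMb, newMb)}} signature are hypothetical simplifications; the real method in the pasted snippet takes an RMContainer and a Container, and the real node's locking may differ.

```java
// Hypothetical stand-in for a scheduler node's resource accounting.
class SchedulerNodeSketch {
  private long allocatedMb;

  SchedulerNodeSketch(long initialAllocatedMb) {
    this.allocatedMb = initialAllocatedMb;
  }

  // Adjusts the node's allocated memory by the container's size change.
  synchronized void containerUpdated(long oldMb, long newMb) {
    allocatedMb += newMb - oldMb;
  }

  synchronized long getAllocatedMb() {
    return allocatedMb;
  }

  // Performs both adjustments under a single hold of the node monitor,
  // so a reader never observes the state between the two calls.
  // Java monitors are reentrant, so the nested synchronized calls are safe.
  void swap(long existingOldMb, long existingNewMb,
            long tempOldMb, long tempNewMb) {
    synchronized (this) {
      containerUpdated(existingOldMb, existingNewMb);
      containerUpdated(tempOldMb, tempNewMb);
    }
  }
}
```

For example, if the existing container grows from 1024 to 2048 MB while the temporary one shrinks from 1024 MB to 0, the node total is 2048 MB before and after; without the outer lock, a concurrent reader could briefly see 3072 MB between the two calls.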
> The atomicity of container update in RM is not clear
> ----------------------------------------------------
>
> Key: YARN-7373
> URL: https://issues.apache.org/jira/browse/YARN-7373
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Reporter: Haibo Chen
> Assignee: Haibo Chen
>
> While reviewing YARN-4511, Miklos noticed that
> {code:java}
> // notify schedulerNode of the update to correct resource accounting
> node.containerUpdated(existingRMContainer, existingContainer);
>
> ((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer);
> // notify SchedulerNode of the update to correct resource accounting
> node.containerUpdated(tempRMContainer, tempContainer);
> {code}
> bq. I think that it would be nicer to lock around these two calls to become atomic.
> Container update, and thus container swap as part of that, is atomic according to [~asuresh].
> It'd be nice to discuss this in more detail to see if we want to be more conservative.