You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Arun Suresh (JIRA)" <ji...@apache.org> on 2017/10/20 21:41:00 UTC
[jira] [Comment Edited] (YARN-7373) The atomicity of container update in RM is not clear

    [ https://issues.apache.org/jira/browse/YARN-7373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213280#comment-16213280 ] 

Arun Suresh edited comment on YARN-7373 at 10/20/17 9:40 PM:
-------------------------------------------------------------

[~haibochen] / [~miklos.szegedi@cloudera.com]

So, like I mentioned in the earlier JIRA, what we have in trunk currently is mostly atomic because:
# the {{swapContainer}} is called within the {{pullNewlyUpdatedContainers}} method in the SchedulerApplicationAttempt - during which the thread has acquired a write lock on the application. You don't need a lock on the queue and since there are no changes to the node, there is not need for that either. 
# The only concurrent action that can happen, is that the Node where the Container is running might have heart-beaten in - but that operation, releaseContainer, tries to take a lock on the app too, which will have to contend with the writelock acquired in {{pullNewlyUpdatedContainers}} - so we are good there
# It is possible that multiple container update requests (say container increase requests) for containers running on the same node can come in concurrently - but the flow is such that the actual resource allocation for the update is internally treated as a new (temporary) container container allocation - and just like any normal container allocation in the scheduler, they are serialized.
# It is possible that multiple container requests for the SAME container can come in too - but we have a container version that takes care of that.

Although - I do have to mention, that the code you pasted above - which is part of the changes in YARN-4511 can cause a few problems, since you are updating the node as well, and you might need a lock on the node before you do that.


was (Author: asuresh):
[~haibochen] / [~miklos.szegedi@cloudera.com]

So, like I mentioned in the earlier JIRA, what we have in trunk currently is mostly atomic because:
# the {{swapContainer}} is called within the {{pullNewlyUpdatedContainers}} method in the SchedulerApplicationAttempt - during which the thread has acquired a write lock on the application. You don't need a lock on the queue and since there are no changes to the node, there is not need for that either. 
# The only concurrent action that can happen, is that the Node where the Container is running might have heart-beaten in - but that operation, releaseContainer, tries to take a lock on the app too, which will have to contend with the writelock acquired in {{pullNewlyUpdatedContainers}} - so we are good there
# It is possible that multiple container update requests (say container increase requests) for containers running on the same node can come in concurrently - but the flow is such that the actual resource allocation for the update is internally treated as a new (temporary) container container allocation - and just like any normal container allocation in the scheduler, they are serialized.
# It is possible that multiple container requests for the SAME container can come in too - but we have a container version that takes care of that.

> The atomicity of container update in RM is not clear
> ----------------------------------------------------
>
>                 Key: YARN-7373
>                 URL: https://issues.apache.org/jira/browse/YARN-7373
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>
> While reviewing YARN-4511, Miklos noticed that  
> {code:java}
> 342	    // notify schedulerNode of the update to correct resource accounting
> 343	    node.containerUpdated(existingRMContainer, existingContainer);
> 344	
> 345	    ((RMContainerImpl)tempRMContainer).setContainer(updatedTempContainer);
> 346	    // notify SchedulerNode of the update to correct resource accounting
> 347	    node.containerUpdated(tempRMContainer, tempContainer);
> 348	
> {code}
> bq. I think that it would be nicer to lock around these two calls to become atomic.
> Container update, and thus container swap as part of that, is atomic according to [~asuresh].
> It'd be nice to discuss this in more details to see if we want to be more conservative.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org