Posted to yarn-issues@hadoop.apache.org by "Botong Huang (JIRA)" <ji...@apache.org> on 2018/11/05 18:37:00 UTC

[jira] [Commented] (YARN-7631) ResourceRequest with different Capacity (Resource) overrides each other in RM and thus lost

    [ https://issues.apache.org/jira/browse/YARN-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675566#comment-16675566 ] 

Botong Huang commented on YARN-7631:
------------------------------------

Please consider directly using _ResourceRequestSetKey_ to replace _SchedulerRequestKey_ for this, thx!

> ResourceRequest with different Capacity (Resource) overrides each other in RM and thus lost
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-7631
>                 URL: https://issues.apache.org/jira/browse/YARN-7631
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Botong Huang
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: resourcebug.patch
>
>
> Today in AMRMClientImpl, ResourceRequests (RRs) are kept in a nested map: RequestId -> Priority -> ResourceName -> ExecutionType -> Resource (Capacity) -> ResourceRequestInfo (the actual RR). This means that only RRs with the same (requestId, priority, resourceName, executionType, resource) are grouped and aggregated together.
> On the RM side, however, the mapping is SchedulerRequestKey (requestId, priority) -> LocalityAppPlacementAllocator (ResourceName -> RR).
> The issue is that on the RM side, Resource is not part of the key to the RR at all. (Note that executionType is also absent on the RM side, but that is fine because the RM handles it separately as container update requests.) This means that for the same (requestId, priority, resourceName), RRs with different Resource values are grouped together and override each other in the RM. As a result, some container requests are lost and will never be allocated. Furthermore, since the two RRs (call them RR1 and RR2) are kept under different keys on the AMRMClient side, allocation of RR1 only triggers a cancel for RR1, and the pending RR2 is not resent either.
> I’ve attached a unit test (resourcebug.patch), which currently fails on trunk, to illustrate this issue.
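
To make the override described above concrete, here is a minimal, self-contained sketch. It does not use the real YARN classes: Request, ResourceKeySketch and both pending* methods are hypothetical stand-ins, and a single memoryMb field stands in for Resource. It only shows that two requests differing solely in Resource collapse into one entry when Resource is left out of the key.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ResourceKeySketch {

    /** Hypothetical, simplified stand-in for a YARN ResourceRequest. */
    static final class Request {
        final long requestId;
        final int priority;
        final String resourceName;
        final int memoryMb;      // stands in for the Resource (capability)
        final int numContainers;

        Request(long requestId, int priority, String resourceName,
                int memoryMb, int numContainers) {
            this.requestId = requestId;
            this.priority = priority;
            this.resourceName = resourceName;
            this.memoryMb = memoryMb;
            this.numContainers = numContainers;
        }
    }

    /** Keys by (requestId, priority, resourceName) only, as on the RM side. */
    static int pendingWithoutResourceInKey(List<Request> asks) {
        Map<List<Object>, Request> table = new HashMap<>();
        for (Request r : asks) {
            // A later request with the same key replaces the earlier one.
            table.put(Arrays.<Object>asList(
                r.requestId, r.priority, r.resourceName), r);
        }
        return totalContainers(table);
    }

    /** Same as above, but with the resource added to the key. */
    static int pendingWithResourceInKey(List<Request> asks) {
        Map<List<Object>, Request> table = new HashMap<>();
        for (Request r : asks) {
            table.put(Arrays.<Object>asList(
                r.requestId, r.priority, r.resourceName, r.memoryMb), r);
        }
        return totalContainers(table);
    }

    private static int totalContainers(Map<List<Object>, Request> table) {
        int total = 0;
        for (Request r : table.values()) {
            total += r.numContainers;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two asks that differ only in resource size.
        List<Request> asks = Arrays.asList(
            new Request(0L, 1, "*", 1024, 2),
            new Request(0L, 1, "*", 2048, 3));
        System.out.println(pendingWithoutResourceInKey(asks)); // prints 3
        System.out.println(pendingWithResourceInKey(asks));    // prints 5
    }
}
```

With the coarse key, the 1024 MB ask for two containers is silently replaced by the 2048 MB ask, so only three of the five requested containers remain pending. Whether the actual fix adds Resource to the key or restructures the mapping in some other way is not something this sketch assumes.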


