Posted to yarn-issues@hadoop.apache.org by "Lee young gon (Jira)" <ji...@apache.org> on 2021/09/24 03:28:00 UTC
[jira] [Created] (YARN-10968) SchedulingRequests can be wrong when multiple containers stopped at the same time
Lee young gon created YARN-10968:
------------------------------------
Summary: SchedulingRequests can be wrong when multiple containers stopped at the same time
Key: YARN-10968
URL: https://issues.apache.org/jira/browse/YARN-10968
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 3.1.2
Reporter: Lee young gon
There are two ways to request containers from the RM through AMRMClientImpl:
# addContainerRequest
# addSchedulingRequests
Each of these maps to a corresponding parameter of the Scheduler's allocate() method:
{code:java}
// addContainerRequest   <-> ask
// addSchedulingRequests <-> schedulingRequests
public Allocation allocate(ApplicationAttemptId applicationAttemptId,
    List<ResourceRequest> ask, List<SchedulingRequest> schedulingRequests,
    List<ContainerId> release, List<String> blacklistAdditions,
    List<String> blacklistRemovals, ContainerUpdates updateRequests) {
  FiCaSchedulerApp application = getApplicationAttempt(applicationAttemptId);
{code}
We are using yarn-service with a placement_policy, in which case addSchedulingRequests is used, and addSchedulingRequests has a problem.
When two containers are terminated at the same time in the presence of a placement_policy, the AM submits the scheduling request twice, as in the following log:
{code:java}
2021-03-31 17:56:07,485 [Component dispatcher] INFO component.Component - [COMPONENT sleep] Requesting for 1 container(s)
2021-03-31 17:56:07,485 [Component dispatcher] INFO component.Component - [COMPONENT sleep] Submitting scheduling request: SchedulingRequestPBImpl{priority=0, allocationReqId=0, executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, allocationTags=[testapp], resourceSizing=ResourceSizingPBImpl{numAllocations=1, resources=<memory:512, vCores:1>}, placementConstraint=notin,node,yarn_node_partition/=[test2]:notin,node,testapp}
2021-03-31 17:56:07,486 [Component dispatcher] INFO component.Component - [COMPONENT sleep] Requesting for 1 container(s)
2021-03-31 17:56:07,487 [Component dispatcher] INFO component.Component - [COMPONENT sleep] Submitting scheduling request: SchedulingRequestPBImpl{priority=0, allocationReqId=0, executionType={Execution Type: GUARANTEED, Enforce Execution Type: true}, allocationTags=[testapp], resourceSizing=ResourceSizingPBImpl{numAllocations=1, resources=<memory:512, vCores:1>}, placementConstraint=notin,node,yarn_node_partition/=[test2]:notin,node,testapp} {code}
Each of these requests is sent to the RM. When both are received, the SingleConstraintAppPlacementAllocator keeps only the last value.
In other words, when multiple containers die at the same time, multiple scheduling requests are created, but the RM accepts only the final request and therefore allocates only one replacement container.
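The failure mode can be illustrated with a small self-contained sketch (plain Java, no YARN dependencies; the class, method, and map names are simplified stand-ins for SchedulingRequest and the allocator's internal state, not the real Hadoop API): because pending asks are keyed by allocationRequestId and a new request replaces rather than accumulates the stored one, two requests for one container each collapse into a single pending container.

```java
import java.util.HashMap;
import java.util.Map;

public class SchedulingRequestOverwriteDemo {

    // Simplified stand-in for the allocator's per-allocationRequestId state:
    // one pending ask (numAllocations) per allocation request id.
    private static final Map<Long, Integer> pendingByAllocReqId = new HashMap<>();

    // Models the "last value wins" update: a new scheduling request for the
    // same allocationRequestId REPLACES the stored one instead of adding
    // its numAllocations to it.
    public static void updatePendingAsk(long allocationRequestId, int numAllocations) {
        pendingByAllocReqId.put(allocationRequestId, numAllocations);
    }

    public static int pendingContainers() {
        return pendingByAllocReqId.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        // Two containers die at the same time: the AM submits two separate
        // scheduling requests, each asking for 1 container, both with
        // allocationReqId=0 (as in the log above).
        updatePendingAsk(0, 1);
        updatePendingAsk(0, 1);

        // Only 1 container ends up pending, although 2 were requested.
        System.out.println("pending containers = " + pendingContainers());
    }
}
```

An accumulating update (put the sum of the old and new numAllocations) would instead leave 2 containers pending, which is the behavior the AM expects here.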
--
This message was sent by Atlassian Jira
(v8.3.4#803005)