Posted to yarn-issues@hadoop.apache.org by "Szilard Nemeth (JIRA)" <ji...@apache.org> on 2019/03/16 22:25:00 UTC

[jira] [Created] (YARN-9393) Asking for more resources than cluster capacity errors are handled in a different layer for custom resources

Szilard Nemeth created YARN-9393:
------------------------------------

             Summary: Asking for more resources than cluster capacity errors are handled in a different layer for custom resources
                 Key: YARN-9393
                 URL: https://issues.apache.org/jira/browse/YARN-9393
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Szilard Nemeth


*1. If I start an MR sleep job, asking for more memory than the cluster has:*
 
Command: 
{code:java}
MY_HADOOP_VERSION=3.3.0-SNAPSHOT;pushd /opt/hadoop;bin/yarn jar "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" pi -Dmapreduce.map.resource.memory-mb=8000 -Dmapreduce.map.resource.resource1=5000M 1 1000;popd{code}
 
Error message (coming from org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator#handleMapContainerRequest):
 
{code:java}
2019-03-16 13:14:58,963 INFO mapreduce.Job: Job job_1552766296556_0003 failed with state KILLED due to: The required MAP capability is more than the supported max container capability in the cluster. Killing the Job. mapResourceRequest: <memory:8000, vCores:1, resource1: 5000M> maxContainerCapability:<memory:6144, vCores:8, resource1: 6000000000> Job received Kill while in RUNNING state.{code}
 
 
 
*2. If I start an MR sleep job, asking for more vcores than the cluster has:*
 
Command: 
{code:java}
MY_HADOOP_VERSION=3.3.0-SNAPSHOT;pushd /opt/hadoop;bin/yarn jar "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" pi -Dmapreduce.map.resource.memory-mb=2000 -Dmapreduce.map.resource.vcores=9 -Dmapreduce.map.resource.resource1=5000M 1 1000;popd{code}
 
Error message (coming from org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator#handleMapContainerRequest):
 
{code:java}
2019-03-16 13:17:59,546 INFO mapreduce.Job: Job job_1552766296556_0005 failed with state KILLED due to: The required MAP capability is more than the supported max container capability in the cluster. Killing the Job. mapResourceRequest: <memory:2000, vCores:9, resource1: 5000M> maxContainerCapability:<memory:6144, vCores:8, resource1: 6000000000> Job received Kill while in RUNNING state{code}
 
 
*3. However, if I start an MR sleep job, asking for more of "resource1" than the cluster has:*
 
Command:
{code:java}
MY_HADOOP_VERSION=3.3.0-SNAPSHOT;pushd /opt/hadoop;bin/yarn jar "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" pi -Dmapreduce.map.resource.memory-mb=200 -Dmapreduce.map.resource.resource1=18G 1 1000;popd{code}
 
 
Error stacktrace (coming from the ResourceManager, *ApplicationMasterService.allocate*):
{code:java}
2019-03-16 15:05:32,893 WARN org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor: Invalid resource ask by application appattempt_1552773851229_0001_000001
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request! Cannot allocate containers as requested resource is greater than maximum allowed allocation. Requested resource type=[resource1], Requested resource=<memory:200, vCores:1, resource1: 18000000000>, maximum allowed allocation=<memory:6144, vCores:8, resource1: 6000000000>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:8192, vCores:8192, resource1: 9223372036854775807>
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.throwInvalidResourceException(SchedulerUtils.java:492)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkResourceRequestAgainstAvailableResource(SchedulerUtils.java:388)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:316)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:294)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:302)
	at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:259)
	at org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor.allocate(DefaultAMSProcessor.java:243)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor.allocate(DisabledPlacementProcessor.java:75)
	at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:93)
	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:429)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:531)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2827)
2019-03-16 15:05:32,894 INFO org.apache.hadoop.ipc.Server: IPC Server handler 28 on default port 8030, call Call#37 Retry#0 org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.allocate from 172.28.196.136:40734
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request! Cannot allocate containers as requested resource is greater than maximum allowed allocation. Requested resource type=[resource1], Requested resource=<memory:200, vCores:1, resource1: 18000000000>, maximum allowed allocation=<memory:6144, vCores:8, resource1: 6000000000>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:8192, vCores:8192, resource1: 9223372036854775807>
{code}
 
 
For normal resources (memory, vcores), exceeding the cluster capacity is handled on the MR client side (RMContainerAllocator), but for custom resources it is handled in ApplicationMasterService.allocate on the ResourceManager. This means the AM is started first and only then fails to allocate the mapper container because the request is too big.
 
This behavior is inconsistent; we should aim to handle all resource types the same way. My vote is to add code to RMContainerAllocator that checks custom resources as well and fails fast, just as it already happens for normal resources.
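A minimal sketch of the kind of check I have in mind. Note that this is hypothetical illustration code, not the actual RMContainerAllocator implementation: resources are modeled here as plain name-to-amount maps, whereas the real fix would iterate the resource types of org.apache.hadoop.yarn.api.records.Resource when comparing the map request against maxContainerCapability:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class CapabilityCheck {
  // Returns the name of the first resource type whose requested amount
  // exceeds the cluster's maximum container capability, or null if the
  // request fits. Covers memory, vcores and custom resources uniformly.
  static String firstExceededResource(Map<String, Long> requested,
                                      Map<String, Long> maxCapability) {
    for (Map.Entry<String, Long> e : requested.entrySet()) {
      long max = maxCapability.getOrDefault(e.getKey(), 0L);
      if (e.getValue() > max) {
        return e.getKey();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // maxContainerCapability from the logs above (amounts in base units).
    Map<String, Long> max = new LinkedHashMap<>();
    max.put("memory-mb", 6144L);
    max.put("vcores", 8L);
    max.put("resource1", 6000000000L);

    // The request from case 3: resource1=18G exceeds the 6G maximum.
    Map<String, Long> request = new LinkedHashMap<>();
    request.put("memory-mb", 200L);
    request.put("vcores", 1L);
    request.put("resource1", 18000000000L);

    String exceeded = firstExceededResource(request, max);
    if (exceeded != null) {
      // Fail fast in the AM, as already happens for memory and vcores.
      System.out.println(
          "Required MAP capability exceeds max for: " + exceeded);
    }
  }
}
{code}

With such a check in place, case 3 would be killed by RMContainerAllocator with the same "required MAP capability is more than the supported max container capability" message as cases 1 and 2, instead of surfacing as an InvalidResourceRequestException from the RM.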



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
