Posted to yarn-issues@hadoop.apache.org by "Adam Kawa (JIRA)" <ji...@apache.org> on 2014/06/29 01:03:24 UTC
[jira] [Created] (YARN-2230) Fix description of
yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code to
show)
Adam Kawa created YARN-2230:
-------------------------------
Summary: Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code to show)
Key: YARN-2230
URL: https://issues.apache.org/jira/browse/YARN-2230
Project: Hadoop YARN
Issue Type: Bug
Reporter: Adam Kawa
Priority: Minor
When a user requests more vcores than the allocation limit (e.g. mapreduce.map.cpu.vcores is larger than yarn.scheduler.maximum-allocation-vcores), an InvalidResourceRequestException is thrown. See SchedulerUtils.java: https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
{code}
if (resReq.getCapability().getVirtualCores() < 0 ||
    resReq.getCapability().getVirtualCores() >
    maximumResource.getVirtualCores()) {
  throw new InvalidResourceRequestException("Invalid resource request"
      + ", requested virtual cores < 0"
      + ", or requested virtual cores > max configured"
      + ", requestedVirtualCores="
      + resReq.getCapability().getVirtualCores()
      + ", maxVirtualCores=" + maximumResource.getVirtualCores());
}
{code}
According to the documentation in yarn-default.xml (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml), however, the request should be capped to the allocation limit:
{code}
<property>
  <description>The maximum allocation for every container request at the RM,
  in terms of virtual CPU cores. Requests higher than this won't take effect,
  and will get capped to this value.</description>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>32</value>
</property>
{code}
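To make the mismatch concrete, here is a minimal, self-contained sketch contrasting the two behaviors. It is not the actual YARN code: the class VcoreLimitSketch, the exception InvalidRequestException, and the methods validateVcores and capVcores are hypothetical stand-ins. The values 32 and 3 are the requested and maximum vcores from the RM log in this report.

```java
// Hypothetical, self-contained sketch; these are NOT the real YARN classes.
public class VcoreLimitSketch {

    /** Stand-in (assumption) for YARN's InvalidResourceRequestException. */
    public static class InvalidRequestException extends Exception {
        public InvalidRequestException(String message) {
            super(message);
        }
    }

    /** What the code does today: reject a request outside [0, max]. */
    public static int validateVcores(int requested, int max)
            throws InvalidRequestException {
        if (requested < 0 || requested > max) {
            throw new InvalidRequestException("Invalid resource request"
                + ", requestedVirtualCores=" + requested
                + ", maxVirtualCores=" + max);
        }
        return requested;
    }

    /** What yarn-default.xml describes: cap an oversized request at max. */
    public static int capVcores(int requested, int max) {
        return Math.min(requested, max);
    }

    public static void main(String[] args) {
        int requested = 32; // requestedVirtualCores from the RM log
        int max = 3;        // maxVirtualCores from the RM log

        System.out.println("capped: " + capVcores(requested, max));
        try {
            validateVcores(requested, max);
        } catch (InvalidRequestException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the documented behavior the request would simply be reduced to the limit; with the current behavior the same request is rejected. Note that the capping variant here only handles oversized requests, since that is all the description mentions.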
* Either the documentation or the code should be corrected (unless this exception is handled accordingly elsewhere, but that does not appear to be the case).
This behavior is confusing because when such a job (one where mapreduce.map.cpu.vcores is larger than yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any progress. The warnings/exceptions are thrown on the scheduler (RM) side, e.g.:
{code}
2014-06-29 00:34:51,469 WARN org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Invalid resource ask by application appattempt_1403993411503_0002_000001
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=32, maxVirtualCores=3
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
{code}
* IMHO, such an exception should be forwarded to the client. Otherwise, it is not obvious why a job does not make any progress.
The same problem appears to apply to memory.