Posted to mapreduce-issues@hadoop.apache.org by "Bibin A Chundatt (JIRA)" <ji...@apache.org> on 2015/11/21 15:44:11 UTC

[jira] [Commented] (MAPREDUCE-6476) InvalidResourceException when Nodelabel don't have access to queue should be handled

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15020508#comment-15020508 ] 

Bibin A Chundatt commented on MAPREDUCE-6476:
---------------------------------------------

We can solve this in two ways:

# Create an InvalidLabelResourceException that extends InvalidResourceException. When the AM receives this exception, it can kill the application/job using {{eventHandler.handle(new JobEvent(jobId, JobEventType.JOB_KILL))}}.
# In the resource request allocate response, also send the accessible labels for the queue.
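
A rough sketch of the first option follows. It uses small stand-in types rather than the real Hadoop classes, so everything here is an assumption made for illustration: {{Allocator}}, the simplified {{JobEvent}} and {{EventHandler}}, and the boolean return value are hypothetical, and InvalidLabelResourceException is only the proposed name.

```java
import java.util.ArrayList;
import java.util.List;

public class LabelAccessSketch {
    // Stand-in for the existing exception (named as in this comment).
    static class InvalidResourceException extends Exception {
        InvalidResourceException(String message) { super(message); }
    }

    // Proposed subclass: raised only when the queue cannot access the label.
    static class InvalidLabelResourceException extends InvalidResourceException {
        InvalidLabelResourceException(String message) { super(message); }
    }

    // Simplified stand-ins for the AM event types (hypothetical shapes).
    enum JobEventType { JOB_KILL }

    static class JobEvent {
        final String jobId;
        final JobEventType type;
        JobEvent(String jobId, JobEventType type) {
            this.jobId = jobId;
            this.type = type;
        }
    }

    interface EventHandler { void handle(JobEvent event); }

    // Hypothetical stand-in for the allocate call made to the RM.
    interface Allocator { void allocate() throws InvalidResourceException; }

    // Returns true if the job was killed because of a label access failure.
    // Any other InvalidResourceException still propagates to the caller.
    static boolean makeRemoteRequest(Allocator rm, EventHandler eventHandler, String jobId)
            throws InvalidResourceException {
        try {
            rm.allocate();
            return false;
        } catch (InvalidLabelResourceException e) {
            // Retrying cannot succeed, so kill the job instead of waiting for the timeout.
            eventHandler.handle(new JobEvent(jobId, JobEventType.JOB_KILL));
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        List<JobEvent> events = new ArrayList<>();
        boolean killed = makeRemoteRequest(
            () -> { throw new InvalidLabelResourceException("queue=b1 cannot access label"); },
            events::add,
            "job_1");
        System.out.println(killed + " " + events.get(0).type);
    }
}
```

Catching only the subclass keeps the existing handling for other invalid requests unchanged.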

The drawback of the first approach is that if the REDUCE request does not have access, the job keeps running until the exception is received at the AM. On the other hand, the implementation is simple.
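
For comparison, with the second option the AM could check label access itself before asking for containers. The sketch below is only an illustration under stated assumptions: {{canAccess}} is a hypothetical helper, the queue's accessible labels are assumed to arrive as a set of strings, a single label per request is assumed, and "*" is assumed to mean that the queue can access any label.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class AccessibleLabelCheck {
    // Hypothetical helper: true if a request with this label expression
    // could be served by a queue with the given accessible labels.
    static boolean canAccess(Set<String> queueLabels, String labelExpression) {
        if (labelExpression == null || labelExpression.trim().isEmpty()) {
            return true; // no label requested
        }
        if (queueLabels.contains("*")) {
            return true; // assumption: "*" grants access to every label
        }
        return queueLabels.contains(labelExpression.trim());
    }

    public static void main(String[] args) {
        Set<String> queueLabels = new HashSet<>(Arrays.asList("x"));
        System.out.println("map on x allowed: " + canAccess(queueLabels, "x"));
        System.out.println("reduce on y allowed: " + canAccess(queueLabels, "y"));
    }
}
```

With such a check the job could be failed as soon as the reduce label is known, without waiting for an exception from the RM.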

Thoughts?
cc [~leftnoteasy]


> InvalidResourceException when Nodelabel don't have access to queue should be handled
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6476
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6476
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>
> Steps to reproduce
> ===============
> Submit a MapReduce job with
> # map to label x
> # reduce to label y
> Precondition
> # Queue b, to which the reduce is submitted, does not have access to the specified label
> *Impact*
> # Jobs fail only after the RM-AM communication timeout
> (about 10 minutes, I think)
> The job should be killed immediately when InvalidResourceException is received in {{RMContainerRequestor#makeRemoteRequest}}
> *Logs*
> {noformat}
> 2015-09-11 16:44:30,116 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, queue=b1 doesn't have permission to access all labels in resource request. labelExpression of resource request=1. Queue labels=3
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:250)
> 	at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:106)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:457)
> 	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> 	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2230)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2224)
> 	at sun.reflect.GeneratedConstructorAccessor39.newInstance(Unknown Source)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> 	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> 	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateYarnException(RPCUtil.java:75)
> 	at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:116)
> 	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
> 	at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:251)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> 	at com.sun.proxy.$Proxy37.allocate(Unknown Source)
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor.makeRemoteRequest(RMContainerRequestor.java:203)
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:694)
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:263)
> 	at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:281)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException): Invalid resource request, queue=b1 doesn't have permission to access all labels in resource request. labelExpression of resource request=1. Queue labels=3
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:250)
> 	at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:106)
> 	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:457)
> 	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
> 	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2230)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2224)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1440)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1371)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy36.allocate(Unknown Source)
> 	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
> 	... 11 more
> 2015-09-11 16:44:31,120 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Could not contact RM after 360000 milliseconds.
> {noformat}


