Posted to yarn-issues@hadoop.apache.org by "Brian Murphy (JIRA)" <ji...@apache.org> on 2014/04/22 23:16:25 UTC

[jira] [Commented] (YARN-1842) InvalidApplicationMasterRequestException raised during AM-requested shutdown

    [ https://issues.apache.org/jira/browse/YARN-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977446#comment-13977446 ] 

Brian Murphy commented on YARN-1842:
------------------------------------

Hey there,

We are also seeing this bug when shutting down Samza containers. We are running Hadoop 2.3.0 on Ubuntu 12.10. The container hangs indefinitely in the KILLING state.

Here is the stack trace:

{code}
2014-04-22 20:25:08 SamzaAppMaster$ [ERROR] Error occured in amClient's callback
org.apache.samza.SamzaException: Received a reboot signal from the RM, so throwing an exception to reboot the AM.
	at org.apache.samza.job.yarn.SamzaAppMasterLifecycle.onReboot(SamzaAppMasterLifecycle.scala:59)
	at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$onShutdownRequest$1.apply(SamzaAppMaster.scala:136)
	at org.apache.samza.job.yarn.SamzaAppMaster$$anonfun$onShutdownRequest$1.apply(SamzaAppMaster.scala:136)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.apache.samza.job.yarn.SamzaAppMaster$.onShutdownRequest(SamzaAppMaster.scala:136)
	at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:285)
2014-04-22 20:25:09 ELContextCleaner [INFO] javax.el.BeanELResolver purged
2014-04-22 20:25:09 ContextHandler [INFO] stopped o.e.j.w.WebAppContext{/,jar:file:/mnt/data/hadoop/yarn/usercache/brian/appcache/application_1397507485520_0040/filecache/10/samza-job-package-0.7.0-dist.tar.gz/lib/samza-yarn_2.10-0.7.0.jar!/scalate}
2014-04-22 20:25:10 ELContextCleaner [INFO] javax.el.BeanELResolver purged
2014-04-22 20:25:10 ContextHandler [INFO] stopped o.e.j.w.WebAppContext{/,jar:file:/mnt/data/hadoop/yarn/usercache/brian/appcache/application_1397507485520_0040/filecache/10/samza-job-package-0.7.0-dist.tar.gz/lib/samza-yarn_2.10-0.7.0.jar!/scalate}
2014-04-22 20:25:10 SamzaAppMasterLifecycle [INFO] Shutting down.
2014-04-22 20:25:10 SamzaAppMaster$ [WARN] Listener org.apache.samza.job.yarn.SamzaAppMasterLifecycle@3c9ead34 failed to shutdown.
org.apache.hadoop.yarn.exceptions.InvalidApplicationMasterRequestException: Application doesn't exist in cache appattempt_1397507485520_0040_000001
	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.throwApplicationDoesNotExistInCacheException(ApplicationMasterService.java:329)
	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.finishApplicationMaster(ApplicationMasterService.java:288)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.finishApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:75)
	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:97)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1962)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1958)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1956)
{code}
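Reading the two traces together, the order seems to be: the RM answers a heartbeat with a shutdown/resync command (Samza's `onShutdownRequest`, which Samza reports as a "reboot signal"). The AM's lifecycle listener then calls `finishApplicationMaster` anyway, and the RM rejects it because the attempt is no longer in its cache. Below is a toy model of that ordering. It assumes, based only on the "doesn't exist in cache" message, that the RM had already dropped the attempt before the AM tried to unregister. `ToyResourceManager` and `ShutdownRace` are hypothetical names, not YARN or Samza classes.

{code:java}
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model, not YARN code: the RM tracks live attempts in a cache
// and rejects finishApplicationMaster for an attempt it no longer knows about.
class ToyResourceManager {
    // Stands in for the RM's per-attempt cache.
    private final Set<String> attemptCache = new HashSet<>();

    void register(String attemptId) {
        attemptCache.add(attemptId);
    }

    // The RM forgets the attempt (assumed to happen before the AM's next heartbeat,
    // which is then answered with a shutdown/resync command).
    void dropAttempt(String attemptId) {
        attemptCache.remove(attemptId);
    }

    // Mirrors the check behind the "doesn't exist in cache" error in the trace.
    void finishApplicationMaster(String attemptId) {
        if (!attemptCache.contains(attemptId)) {
            throw new IllegalStateException("Application doesn't exist in cache " + attemptId);
        }
        attemptCache.remove(attemptId);
    }
}

class ShutdownRace {
    static final String ATTEMPT = "appattempt_1397507485520_0040_000001";

    // Returns the RM's error message, or null if the AM unregistered cleanly.
    static String run(boolean rmDropsAttemptFirst) {
        ToyResourceManager rm = new ToyResourceManager();
        rm.register(ATTEMPT);
        if (rmDropsAttemptFirst) {
            rm.dropAttempt(ATTEMPT);
        }
        try {
            // The AM's shutdown listener unregisters regardless of why it is stopping.
            rm.finishApplicationMaster(ATTEMPT);
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("clean shutdown: " + run(false));
        System.out.println("RM dropped attempt first: " + run(true));
    }
}
{code}

With `rmDropsAttemptFirst` set to true, the unregister fails with the same message as the second trace. With it set to false, the unregister succeeds, which is the normal AM-initiated path.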

> InvalidApplicationMasterRequestException raised during AM-requested shutdown
> ----------------------------------------------------------------------------
>
>                 Key: YARN-1842
>                 URL: https://issues.apache.org/jira/browse/YARN-1842
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.3.0
>            Reporter: Steve Loughran
>            Priority: Minor
>         Attachments: hoyalogs.tar.gz
>
>
> Report of the RM raising a stack trace [https://gist.github.com/matyix/9596735] during AM-initiated shutdown. The AM could just swallow this and exit, but it could be a sign of a race condition on the YARN side, or maybe just a case of the RM client code and the AM both signalling the shutdown.
> I haven't replicated this myself; maybe the stack will help track down the problem. Otherwise: what policy should YARN apps adopt for AMs that hit errors on shutdown? Go straight to an exit(-1)?
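On the policy question, one possible approach (a sketch, not an official YARN recommendation) works as follows. The AM records that the RM asked it to shut down and skips unregistering in that case. When the AM initiates the shutdown itself, it treats a "doesn't exist in cache" failure during unregister as benign and exits anyway. `RmClient`, `AttemptNotInCacheException` and `ShutdownPolicy` are hypothetical stand-ins for the AM-side RM client and `InvalidApplicationMasterRequestException`, not the real AMRMClient API. The exit code of 1 is an arbitrary choice.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the "doesn't exist in cache" error returned by the RM.
class AttemptNotInCacheException extends Exception {
    AttemptNotInCacheException(String msg) {
        super(msg);
    }
}

// Hypothetical, minimal view of the AM-to-RM client (not the real AMRMClient API).
interface RmClient {
    void unregister(String finalStatus) throws AttemptNotInCacheException;
}

class ShutdownPolicy {
    private volatile boolean shutdownRequestedByRm = false;
    final List<String> log = new ArrayList<>();

    // Called from the heartbeat callback when the RM sends a shutdown/resync command.
    void onShutdownRequest() {
        shutdownRequestedByRm = true;
    }

    // Unregisters if that still makes sense, then returns a process exit code.
    int shutdown(RmClient rm, boolean succeeded) {
        if (shutdownRequestedByRm) {
            // Assumption: the RM has already forgotten this attempt, so unregistering would fail.
            log.add("RM requested shutdown; skipping unregister");
            return 1;
        }
        try {
            rm.unregister(succeeded ? "SUCCEEDED" : "FAILED");
        } catch (AttemptNotInCacheException e) {
            // The RM dropped the attempt between the last heartbeat and this call:
            // there is nothing left to tell it, so log and carry on exiting.
            log.add("Ignoring unregister failure: " + e.getMessage());
            return 1;
        }
        return succeeded ? 0 : 1;
    }
}
{code}

The design choice here is to avoid the second signal entirely when the RM started the shutdown, and to swallow the error only on the path where the AM still believed it was registered.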



--
This message was sent by Atlassian JIRA
(v6.2#6252)