You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Kavan Suresh (JIRA)" <ji...@apache.org> on 2016/10/13 21:48:20 UTC

[jira] [Created] (TEZ-3468) TezUncheckedException: Cannot get ApplicationACLs before all services have started

Kavan Suresh created TEZ-3468:
---------------------------------

             Summary: TezUncheckedException: Cannot get ApplicationACLs before all services have started
                 Key: TEZ-3468
                 URL: https://issues.apache.org/jira/browse/TEZ-3468
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Kavan Suresh


Discovered this issue while running some ACID queries concurrently over on 10 threads. TezTask failed with the below error:
{code:java}
2016-10-13 09:14:24,999 [INFO] [LlapScheduler] |tezplugins.LlapTaskSchedulerService|: ScheduleResult for Task: TaskInfo{task=attempt_1476340082683_0013_137_01_000019_0, priority=14, startTime=0, containerId=null, assignedInstance=null, uniqueId=7505, localityDelayTimeout=9223372036854775807} = DELAYED_RESOURCES
2016-10-13 09:14:24,999 [INFO] [LlapScheduler] |tezplugins.LlapTaskSchedulerService|: Preempting for task=attempt_1476340082683_0013_137_01_000019_0 on any available host
2016-10-13 09:14:24,999 [INFO] [LlapScheduler] |tezplugins.LlapTaskSchedulerService|: No tasks qualify as killable to schedule tasks at priority 14
2016-10-13 09:14:24,999 [INFO] [Socket Reader #1 for port 36800] |ipc.Server|: Auth successful for application_1476340082683_0013 (auth:SIMPLE)
2016-10-13 09:14:24,999 [ERROR] [Dispatcher thread {Central}] |common.AsyncDispatcher|: Error in dispatcher thread
org.apache.tez.dag.api.TezUncheckedException: Cannot get ApplicationACLs before all services have started
        at org.apache.tez.dag.app.DAGAppMaster$RunningAppContext.getApplicationACLs(DAGAppMaster.java:1711)
        at org.apache.tez.dag.app.rm.container.AMContainerImpl$LaunchRequestTransition.transition(AMContainerImpl.java:491)
        at org.apache.tez.dag.app.rm.container.AMContainerImpl$LaunchRequestTransition.transition(AMContainerImpl.java:448)
        at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:419)
        at org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:76)
        at org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:60)
        at org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:36)
        at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
        at java.lang.Thread.run(Thread.java:745)
2016-10-13 09:14:24,999 [WARN] [Dispatcher thread {Central}] |container.AMContainerImpl|: Unexpected TA Assignment: TAId: attempt_1476340082683_0013_137_01_000017_3  for ContainerId: container_222212222_0013_01_007508 while in state: ALLOCATED
{code}
{code}
2016-10-13 09:14:24,929 [INFO] [pool-1-thread-1] |app.DAGAppMaster|: DAGAppMasterShutdownHook invoked
2016-10-13 09:14:24,929 [INFO] [pool-1-thread-1] |app.DAGAppMaster|: DAGAppMaster received a signal. Signaling TaskScheduler
2016-10-13 09:14:24,931 [INFO] [pool-1-thread-1] |rm.TaskSchedulerManager|: TaskScheduler notified that iSignalled was : true
{code}

{code}
2016-10-13 09:14:25,089 [WARN] [AMRM Heartbeater thread] |retry.RetryInvocationHandler|: Exception while invoking ApplicationMasterProtocolPBClientImpl.allocate over null. Not retrying because try once and fail.
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1476340082683_0013_000
001 doesn't exist in ApplicationMasterService cache.
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:445)
at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBSer
viceImpl.java:60)
at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationM
asterProtocol.java:99)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
at com.sun.proxy.$Proxy42.allocate(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:237)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException): Application attempt appattempt_1476340082683_0013_000001 doesn't exist in ApplicationMasterService cache.
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:445)
at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552)
at org.apache.hadoop.ipc.Client.call(Client.java:1496)
at org.apache.hadoop.ipc.Client.call(Client.java:1396)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy41.allocate(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
... 9 more
2016-10-13 09:14:25,090 [INFO] [AMRM Heartbeater thread] |impl.AMRMClientAsyncImpl|: Shutdown requested. Stopping callback.
{code}

{code}
Application Report :
	Application-Id : application_1476340082683_0013
	Application-Name : HIVE-66493e0b-0ccd-4878-abe5-405d87d02b2d
	Application-Type : TEZ
	User : hive
	Queue : default
	Application Priority : null
	Start-Time : 1476343184157
	Finish-Time : 1476350064973
	Progress : 100%
	State : FAILED
	Final-State : FAILED
	Tracking-URL : http://hn0-c44488.hsic-hadl25-ssh.c2.internal.cloudapp.net:8088/cluster/app/application_1476340082683_0013
	RPC Port : -1
	AM Host : N/A
	Aggregate Resource Allocation : 7045512 MB-seconds, 6880 vcore-seconds
	Log Aggregation Status : SUCCEEDED
	Diagnostics : Application application_1476340082683_0013 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1476340082683_0013_000001 exited with  exitCode: -104
For more detailed output, check the application tracking page: http://hn0-c44488.hsic-hadl25-ssh.c2.internal.cloudapp.net:8088/cluster/app/application_1476340082683_0013 Then click on links to logs of each attempt.
Diagnostics: Container [pid=19619,containerID=container_1476340082683_0013_01_000001] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1476340082683_0013_01_000001 :
	|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
	|- 19633 19619 19619 19619 (java) 32742 3597 2764886016 263532 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m -Djava.io.tmpdir=/mnt/resource/hadoop/yarn/local/usercache/hive/appcache/application_1476340082683_0013/container_1476340082683_0013_01_000001/tmp -server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.2.0-37 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/mnt/resource/hadoop/yarn/log/application_1476340082683_0013/container_1476340082683_0013_01_000001 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel= org.apache.tez.dag.app.DAGAppMaster --session
	|- 19619 19617 19619 19619 (bash) 0 0 11575296 753 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java  -Xmx819m -Djava.io.tmpdir=/mnt/resource/hadoop/yarn/local/usercache/hive/appcache/application_1476340082683_0013/container_1476340082683_0013_01_000001/tmp -server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.2.0-37 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/mnt/resource/hadoop/yarn/log/application_1476340082683_0013/container_1476340082683_0013_01_000001 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster --session 1>/mnt/resource/hadoop/yarn/log/application_1476340082683_0013/container_1476340082683_0013_01_000001/stdout 2>/mnt/resource/hadoop/yarn/log/application_1476340082683_0013/container_1476340082683_0013_01_000001/stderr

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
	Unmanaged Application : false
	Application Node Label Expression : <Not set>
	AM container Node Label Expression : <DEFAULT_PARTITION>
{code}

[~hitesh] mentioned the AM was killed as it went over mem limits.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)