You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Feng Yuan (JIRA)" <ji...@apache.org> on 2016/05/24 12:42:12 UTC

[jira] [Commented] (TEZ-3266) DAG failed when yarn resources is rare like " No groups available for user A" because DAGAppMaster launched and exit_with_sucessful immediately.

    [ https://issues.apache.org/jira/browse/TEZ-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298106#comment-15298106 ] 

Feng Yuan commented on TEZ-3266:
--------------------------------

i try to print yarn error:
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application killed by user.


> DAG failed when yarn resources is rare like " No groups available for user A" because DAGAppMaster launched and exit_with_sucessful  immediately.
> -------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-3266
>                 URL: https://issues.apache.org/jira/browse/TEZ-3266
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.2
>         Environment: hadoop-2.6.0, hive-0.14.0
>            Reporter: Feng Yuan
>
> When in a resource queue there is full of apps,if you submit a new tez app,there is infomations like :
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1463493135662_66844_01_000004 Container Transitioned from ALLOCATED to ACQUIRED
> 2016-05-24 01:52:02,963 INFO org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application with id 66847 submitted by user bae
> 2016-05-24 01:52:02,963 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Storing application with id application_1463493135662_66847
> 2016-05-24 01:52:02,963 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=bae      IP=192.168.44.40        OPERATION=Submit Application Request    TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1463493135662_66847
> 2016-05-24 01:52:02,963 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1463493135662_66847 State change from NEW to NEW_SAVING
> 2016-05-24 01:52:02,964 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Storing info for app: application_1463493135662_66847
> 2016-05-24 01:52:02,966 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1463493135662_66847 State change from NEW_SAVING to SUBMITTED
> 2016-05-24 01:52:02,966 WARN org.apache.hadoop.security.UserGroupInformation: No groups available for user bae
> 2016-05-24 01:52:02,966 WARN org.apache.hadoop.security.UserGroupInformation: No groups available for user bae
> 2016-05-24 01:52:02,966 WARN org.apache.hadoop.security.UserGroupInformation: No groups available for user bae
> 2016-05-24 01:52:02,966 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Accepted application application_1463493135662_66847 from user: bae, in queue: default, currently num of applications: 16
> 2016-05-24 01:52:02,967 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1463493135662_66847 State change from SUBMITTED to ACCEPTED
> 2016-05-24 01:52:02,967 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1463493135662_66847_000001
> 2016-05-24 01:52:02,967 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1463493135662_66847_000001 State change from NEW to SUBMITTED
> 2016-05-24 01:52:02,967 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added Application Attempt appattempt_1463493135662_66847_000001 to scheduler from user: bae
> 2016-05-24 01:52:02,967 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1463493135662_66847_000001 State change from SUBMITTED to SCHEDULED
> 2016-05-24 01:52:02,976 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1463493135662_66837_000001 (auth:SIMPLE)
> 2016-05-24 01:52:03,044 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
> and then a container assigned to AM but am cant running,it's log:
> 2016-05-24 01:51:57,610 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container container_1463493135662_66846_01_000004 succeeded
> 2016-05-24 01:51:57,610 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1463493135662_66846_01_000004 transitioned from RUNNING to EXITED_WITH_SUCCESS
> you can see it can launch but why exit right away?
> and because of AM is exit,you will see these log in hive.log:
> 2.2016-05-24 01:52:13,921 ERROR [Thread-18]: exec.Task (TezTask.java:execute(184)) - Failed to execute tez graph.
> org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown
>         at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:603)
>         at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:212)
>         at org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:234)
>         at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:136)
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>         at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72)
> 3.2016-05-24 01:52:14,597 INFO  [main]: client.TezClient (TezClient.java:stop(444)) - Failed to shutdown Tez Session via proxy
> org.apache.tez.dag.api.SessionNotRunning: Application not running, applicationId=application_1463493135662_66847, yarnApplicationState=FAILED, finalApplicationStatus=FAILED, trackingUrl=http://rm01:8088/cluster/app/application_1463493135662_66847
>         at org.apache.tez.client.TezClientUtils.getSessionAMProxy(TezClientUtils.java:787)
>         at org.apache.tez.client.TezClient.getSessionAMProxy(TezClient.java:623)
>         at org.apache.tez.client.TezClient.stop(TezClient.java:436)
>         at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.close(TezSessionState.java:269)
>         at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.close(TezSessionPoolManager.java:176)
>         at org.apache.hadoop.hive.ql.session.SessionState.close(SessionState.java:1256)
>         at org.apache.hadoop.hive.cli.CliSessionState.close(CliSessionState.java:108)
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:683)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> 2016-05-24 01:52:14,597 INFO  [main]: client.TezClient (TezClient.java:stop(449)) - Could not connect to AM, killing session via YARN, sessionName=XXX, applicationId=application_1463493135662_66847



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)