Posted to issues@tez.apache.org by "Jeff Zhang (JIRA)" <ji...@apache.org> on 2016/01/27 07:34:39 UTC

[jira] [Comment Edited] (TEZ-2307) Possible wrong error message when submitting new dag

    [ https://issues.apache.org/jira/browse/TEZ-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118448#comment-15118448 ] 

Jeff Zhang edited comment on TEZ-2307 at 1/27/16 6:34 AM:
----------------------------------------------------------

I think making the submit RPC call wait might not be a good option, because it is confusing that a user cannot submit a new DAG even after the previous DAG has completed. So I suggest that the user can still submit a new DAG, but the DAG is kept in the NEW state until the cleanup of the previous DAG is done. The only issue is that the TezDAGId cache is not cleared, but that should be fine. [~sseth] What do you think?
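
To make the suggestion concrete, here is a rough sketch of the intended behavior. It is not the actual DAGAppMaster code: the class DeferredDagSubmission, its methods, and the three-value DagState enum are all hypothetical names used only for illustration. A submission that arrives after the previous DAG has completed is accepted and held in NEW, and it only moves to RUNNING once the cleanup of the previous DAG finishes.

```java
// Hypothetical sketch only: none of these names are real Tez APIs.
class DeferredDagSubmission {
    enum DagState { NEW, RUNNING, SUCCEEDED }

    static final class Dag {
        final String name;
        DagState state = DagState.NEW;
        Dag(String name) { this.name = name; }
    }

    private Dag current;            // DAG that is running or still being cleaned up
    private Dag pending;            // accepted DAG held in NEW until cleanup is done
    private boolean cleanupPending; // true between DAG completion and end of cleanup

    // Accept a submission unless a DAG is really still running.
    synchronized Dag submit(String name) {
        if (current != null && current.state == DagState.RUNNING) {
            throw new IllegalStateException("App master already running a DAG");
        }
        if (pending != null) {
            throw new IllegalStateException("A submitted DAG is already waiting for cleanup");
        }
        Dag dag = new Dag(name);
        if (cleanupPending) {
            pending = dag;          // stays in NEW; the submit call itself does not block
        } else {
            start(dag);
        }
        return dag;
    }

    // Called when the current DAG finishes; its cleanup has not run yet.
    synchronized void onDagCompleted() {
        if (current == null) {
            throw new IllegalStateException("No DAG to complete");
        }
        current.state = DagState.SUCCEEDED;
        cleanupPending = true;
    }

    // Called when cleanup of the previous DAG is done; starts the held DAG, if any.
    synchronized void onCleanupDone() {
        cleanupPending = false;
        current = null;
        if (pending != null) {
            Dag next = pending;
            pending = null;
            start(next);
        }
    }

    private void start(Dag dag) {
        dag.state = DagState.RUNNING;
        current = dag;
    }
}
```

With this shape the client never sees "App master already running a DAG" for a DAG that has already completed; the error is kept for the case where a DAG is really still running.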


was (Author: zjffdu):
I think making the submit RPC call wait might not be a good option, because it is confusing that a user cannot submit a new DAG even after the previous DAG has completed. So I suggest that the user can still submit a new DAG, but the DAG is kept in the NEW state until the cleanup of the previous DAG is done. [~sseth] What do you think?

> Possible wrong error message when submitting new dag
> ----------------------------------------------------
>
>                 Key: TEZ-2307
>                 URL: https://issues.apache.org/jira/browse/TEZ-2307
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-2307-1.patch
>
>
> In the following two cases, the AM propagates the wrong error message to the client ("App master already running a DAG"):
> * The last DAG has completed but the AM is still in the RUNNING state
> * The AM is shutting down
> {code}
> 2015-04-10 06:01:50,369 INFO  [IPC Server handler 0 on 46821] ipc.Server (Server.java:run(2070)) - IPC Server handler 0 on 46821, call org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG from 10.0.0.223:48581 Call#411 Retry#0
> org.apache.tez.dag.api.TezException: App master already running a DAG
> 	at org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1131)
> 	at org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:118)
> 	at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:163)
> 	at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7471)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)