Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2014/08/26 19:33:57 UTC

[jira] [Commented] (TEZ-1493) Tez examples fail in recovery sometimes

    [ https://issues.apache.org/jira/browse/TEZ-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110995#comment-14110995 ] 

Hitesh Shah commented on TEZ-1493:
----------------------------------

[~zjffdu] Patch fails to apply. Also, is there some other change that needs to be done to ensure that state does not go from RUNNING back to SUBMITTED when the AM is recovering? 
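
To make the question concrete, here is a rough sketch of the kind of guard I have in mind. It is illustrative only: the State enum and the DagStateGuard class are hypothetical stand-ins, not Tez classes, and it assumes that only SUCCEEDED, KILLED, FAILED and ERROR count as final states and that a reported state should never move backwards while the AM recovers. Where such a check really belongs (AM or client) is still open.

```java
import java.util.EnumSet;
import java.util.Set;

public class DagStateGuard {
    /** Hypothetical states, declared in order of progress. Not the real Tez enum. */
    public enum State { SUBMITTED, INITING, RUNNING, SUCCEEDED, KILLED, FAILED, ERROR }

    /** Assumption: only these states may be reported as a final state. */
    private static final Set<State> TERMINAL =
        EnumSet.of(State.SUCCEEDED, State.KILLED, State.FAILED, State.ERROR);

    private State lastReported = State.SUBMITTED;

    public static boolean isTerminal(State s) {
        return TERMINAL.contains(s);
    }

    /**
     * Takes the state received from the AM and returns the state to report.
     * A state that is earlier than the last reported one (for example
     * SUBMITTED arriving after RUNNING during recovery) is ignored.
     */
    public State observe(State fromAm) {
        if (isTerminal(lastReported)) {
            return lastReported; // a final state never changes
        }
        if (fromAm.ordinal() < lastReported.ordinal()) {
            return lastReported; // ignore the regression
        }
        lastReported = fromAm;
        return lastReported;
    }

    public static void main(String[] args) {
        DagStateGuard guard = new DagStateGuard();
        State[] received = {
            State.RUNNING, State.RUNNING, State.SUBMITTED, State.RUNNING, State.SUCCEEDED
        };
        for (State s : received) {
            State shown = guard.observe(s);
            System.out.println("received=" + s + " reported=" + shown
                + " final=" + isTerminal(shown));
        }
    }
}
```

With a check like this, the SUBMITTED seen at 17:37:17 in the log below would neither replace RUNNING nor be treated as a final state.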

> Tez examples fail in recovery sometimes
> ---------------------------------------
>
>                 Key: TEZ-1493
>                 URL: https://issues.apache.org/jira/browse/TEZ-1493
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>            Priority: Blocker
>         Attachments: Tez-1493.patch
>
>
> {code}
> 14/08/25 17:37:03 INFO client.TezClient: Submitting DAG to YARN, applicationId=application_1408499461970_0053, dagName=WordCount
> 14/08/25 17:37:03 INFO impl.YarnClientImpl: Submitted application application_1408499461970_0053
> 14/08/25 17:37:03 INFO client.TezClient: The url to track the Tez AM: http://jzhangMBPr.local:8088/proxy/application_1408499461970_0053/
> 14/08/25 17:37:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
> 14/08/25 17:37:03 INFO client.AHSProxy: Connecting to Application History server at /0.0.0.0:10200
> 14/08/25 17:37:03 INFO rpc.DAGClientRPCImpl: Waiting for DAG to start running
> 14/08/25 17:37:07 INFO rpc.DAGClientRPCImpl: DAG: State: RUNNING Progress: 0% TotalTasks: 2 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
> 14/08/25 17:37:15 INFO rpc.DAGClientRPCImpl: DAG: State: RUNNING Progress: 50% TotalTasks: 2 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
> 14/08/25 17:37:17 INFO rpc.DAGClientRPCImpl: DAG completed. FinalState=SUBMITTED
> WordCount failed with diagnostics: []
> {code}
> The client reports that the job failed, but the logs show that recovery worked on the server side and the job eventually finished successfully.



--
This message was sent by Atlassian JIRA
(v6.2#6252)