Posted to issues@tez.apache.org by "Yesha Vora (JIRA)" <ji...@apache.org> on 2013/12/11 09:21:07 UTC

[jira] [Created] (TEZ-676) Tez job fails on client side if nodemanager running AM is lost

Yesha Vora created TEZ-676:
------------------------------

             Summary: Tez job fails on client side if nodemanager running AM is lost 
                 Key: TEZ-676
                 URL: https://issues.apache.org/jira/browse/TEZ-676
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Yesha Vora


Scenario:

1) Run a long-running TeraGen job.
2) Find the node where the AM has started.
3) Kill the nodemanager on the AM host using the kill -9 command.
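
Step 3 can be sketched roughly as below. This is a minimal sketch, not the exact commands used in the report: it assumes it is run on the AM host, that the JDK's jps tool is on the PATH, and that the nodemanager shows up in jps output under the main class name NodeManager. The function names nm_pid_from_jps and kill_local_nodemanager are hypothetical helpers, not part of Hadoop or Tez.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Hadoop or Tez.

# Read jps-style output ("<pid> <main class>") on stdin and print the PID
# of the first process whose main class is NodeManager, if any.
nm_pid_from_jps() {
    awk '$2 == "NodeManager" { print $1; exit }'
}

# Kill the local nodemanager abruptly with SIGKILL, as in step 3.
# Assumes a single NodeManager process on this host.
kill_local_nodemanager() {
    pid=$(jps | nm_pid_from_jps)
    if [ -z "$pid" ]; then
        echo "no NodeManager process found" >&2
        return 1
    fi
    kill -9 "$pid"
}
```

Sourcing the file only defines the functions; calling kill_local_nodemanager on the AM host performs the kill.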

Expected:
A 2nd AM attempt should be started and the job should resume. The job should also keep running on the client side.

Actual:
The 1st AM was started, and then the NM running that AM was killed. The job waited around 10 minutes before the 2nd AM attempt was started. Just as that attempt started, the client-side job output reported "job failed" and the client exited,
even though the RM had already started the 2nd AM. The 2nd AM eventually runs and the job finishes successfully.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)