Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2014/03/07 00:58:44 UTC

[jira] [Assigned] (TEZ-676) Tez job fails on client side if nodemanager running AM is lost

     [ https://issues.apache.org/jira/browse/TEZ-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah reassigned TEZ-676:
-------------------------------

    Assignee: Hitesh Shah

> Tez job fails on client side if nodemanager running AM is lost 
> ---------------------------------------------------------------
>
>                 Key: TEZ-676
>                 URL: https://issues.apache.org/jira/browse/TEZ-676
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Yesha Vora
>            Assignee: Hitesh Shah
>         Attachments: TEZ-676.1.patch
>
>
> Scenario:
> 1) Run a long-running Teragen job.
> 2) Find the node where the AM has started.
> 3) Kill the nodemanager on the AM host using kill -9.
> Expected:
> A 2nd AM should be started and the job should resume. The job should also keep running on the client side.
> Actual:
> The 1st AM was started and then the NM running the AM was killed. The job waited around 10 min for the 2nd AM to start. After that, the 2nd AM attempt was started. At the same time, the job output on the client said "job failed" and the client exited.
> However, the RM had already started the 2nd AM. The 2nd AM eventually runs and the job finishes successfully.



--
This message was sent by Atlassian JIRA
(v6.2#6252)