Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2013/02/11 17:09:13 UTC

[jira] [Created] (MAPREDUCE-4999) AM attempt ended up in ERROR state and generated history after node decommissioned

Jason Lowe created MAPREDUCE-4999:
-------------------------------------

             Summary: AM attempt ended up in ERROR state and generated history after node decommissioned
                 Key: MAPREDUCE-4999
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4999
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mr-am
    Affects Versions: 0.23.6
            Reporter: Jason Lowe


Saw a case where a job recorded history for an app attempt that ended up in the ERROR state after the node the AM was running on was decommissioned.  When the node was decommissioned, the RM marked all the containers on the node as killed and subsequently invalidated the application attempt.  In this case the AM attempt heartbeated in to the RM before the NM did (and therefore before the NM could kill the AM), so it discovered it was no longer a valid app attempt and exited in the ERROR state.  However, it also incorrectly concluded that it was the last attempt and generated the history for the job.

Decommissioning a node should not cause an app attempt to end up in the ERROR state with history, as the subsequent app attempt should be the one to generate the definitive history for the job.
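
To make the expected behavior concrete, here is a minimal sketch of the decision described above.  It is not the actual MR AM code: the class, enum, and method names are hypothetical, and it assumes the AM can tell an attempt invalidated by the RM apart from other kinds of shutdown.

```java
// Hypothetical sketch only; none of these names are real Hadoop classes or methods.
final class AttemptShutdownSketch {

    // Assumed set of reasons an AM attempt might shut down.
    enum ShutdownCause { JOB_COMPLETED, ATTEMPT_FAILED, ATTEMPT_INVALIDATED_BY_RM }

    // Decide whether this attempt should generate the job history.
    // believesLastAttempt is the attempt's own view of whether it is the final attempt.
    static boolean shouldGenerateHistory(ShutdownCause cause, boolean believesLastAttempt) {
        switch (cause) {
            case JOB_COMPLETED:
                // The job finished under this attempt, so it owns the history.
                return true;
            case ATTEMPT_INVALIDATED_BY_RM:
                // The RM no longer recognizes this attempt (e.g. node decommissioned).
                // Leave the definitive history to the subsequent attempt, regardless
                // of what this attempt believes about being the last one.
                return false;
            default:
                // Ordinary failure: only the final attempt records history.
                return believesLastAttempt;
        }
    }

    public static void main(String[] args) {
        // The scenario from this report: invalidated attempt that wrongly thinks it is last.
        System.out.println(
            shouldGenerateHistory(ShutdownCause.ATTEMPT_INVALIDATED_BY_RM, true));
    }
}
```

In the reported case the attempt behaved as if only the last-attempt check applied, which is why history was written by an attempt that was about to be superseded.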

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira