Posted to yarn-dev@hadoop.apache.org by "qus-jiawei (JIRA)" <ji...@apache.org> on 2013/12/03 09:49:36 UTC

[jira] [Created] (YARN-1469) ApplicationMaster crash: TaskAttemptImpl cannot handle TA_TOO_MANY_FETCH_FAILURE in the KILLED state

qus-jiawei created YARN-1469:
--------------------------------

             Summary: ApplicationMaster crash: TaskAttemptImpl cannot handle TA_TOO_MANY_FETCH_FAILURE in the KILLED state
                 Key: YARN-1469
                 URL: https://issues.apache.org/jira/browse/YARN-1469
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: qus-jiawei


This bug can occur when using the decommission command to decommission a NodeManager. The details are below:

1. A job is running on the YARN cluster. Some MapTasks finish on machine A, and the reduce tasks begin to be scheduled. At this point, the MapTasks' state is SUCCEEDED.
2. The Hadoop admin decommissions machine A's NodeManager.
3. The ApplicationMaster finds that some MapTasks finished on a decommissioned NodeManager and changes those MapTasks' state to KILLED.
4. Some running ReduceTasks cannot fetch data from the MapTasks and send a TA_TOO_MANY_FETCH_FAILURE event to TaskAttemptImpl.
5. TaskAttemptImpl cannot handle TA_TOO_MANY_FETCH_FAILURE in the KILLED state and throws an exception, causing the ApplicationMaster to transition to ERROR.

I think TaskAttemptImpl could simply ignore the TA_TOO_MANY_FETCH_FAILURE event while in the KILLED state.
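The scenario and the proposed fix can be sketched as a minimal event-driven state machine. The class and enum names below are illustrative only, not the real Hadoop MRAppMaster classes (the actual TaskAttemptImpl builds its transition table with a StateMachineFactory); the point is that a late TA_TOO_MANY_FETCH_FAILURE arriving after the attempt is already KILLED should be swallowed instead of raising an invalid-transition error:

```java
// Hypothetical sketch of the proposed behavior; not the real Hadoop classes.
public class TaskAttemptSketch {
    enum State { SUCCEEDED, KILLED, FAILED }
    enum EventType { TA_KILL, TA_TOO_MANY_FETCH_FAILURE }

    private State state = State.SUCCEEDED;

    State getState() { return state; }

    void handle(EventType event) {
        switch (state) {
            case SUCCEEDED:
                if (event == EventType.TA_KILL) {
                    // NodeManager decommissioned: map output is gone.
                    state = State.KILLED;
                } else if (event == EventType.TA_TOO_MANY_FETCH_FAILURE) {
                    // Normal path: reducers report repeated fetch failures.
                    state = State.FAILED;
                }
                break;
            case KILLED:
                // Proposed fix: ignore the late fetch-failure event rather
                // than throwing (which crashes the ApplicationMaster).
                if (event == EventType.TA_TOO_MANY_FETCH_FAILURE) {
                    return;
                }
                break;
            default:
                break;
        }
    }

    public static void main(String[] args) {
        TaskAttemptSketch attempt = new TaskAttemptSketch();
        attempt.handle(EventType.TA_KILL);                   // step 3: NM decommissioned
        attempt.handle(EventType.TA_TOO_MANY_FETCH_FAILURE); // step 4: late reducer report
        System.out.println(attempt.getState());              // prints KILLED, no exception
    }
}
```

With this ignore-transition, the event sequence from steps 3 and 4 above leaves the attempt in KILLED instead of bringing down the ApplicationMaster.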



--
This message was sent by Atlassian JIRA
(v6.1#6144)