Posted to mapreduce-issues@hadoop.apache.org by "Thomas Graves (Commented) (JIRA)" <ji...@apache.org> on 2012/04/13 17:30:20 UTC

[jira] [Commented] (MAPREDUCE-4152) map task left hanging after AM dies trying to connect to RM

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253460#comment-13253460 ] 

Thomas Graves commented on MAPREDUCE-4152:
------------------------------------------

The job did not kill the map task it had running before exiting.  In JobImpl, when the job moves from RUNNING to ERROR, all it does is send the JobUnsuccessfulCompletion event.  I would think it would at least try to kill any tasks it has.
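
To make the suggestion concrete, here is a minimal, self-contained sketch of the behavior I have in mind. It is not the real JobImpl code: the class JobErrorSketch, its enums, the method onInternalError, and the event strings are all hypothetical stand-ins, and the real transition would go through the AM's event dispatcher rather than a plain list. It only shows the ordering: mark every still-running task as killed first, then record the unsuccessful-completion event and move to ERROR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the job state machine; NOT the actual JobImpl.
public class JobErrorSketch {
    public enum JobState { RUNNING, ERROR }
    public enum TaskState { RUNNING, SUCCEEDED, KILLED }

    private JobState state = JobState.RUNNING;
    // Task id -> current task state (insertion order kept for readable output).
    private final Map<String, TaskState> tasks = new LinkedHashMap<>();
    // Stand-in for events that would be sent through a dispatcher.
    private final List<String> events = new ArrayList<>();

    public void addTask(String taskId, TaskState taskState) {
        tasks.put(taskId, taskState);
    }

    // Assumed handler for an internal error while the job is RUNNING.
    public void onInternalError() {
        if (state != JobState.RUNNING) {
            return; // only the RUNNING -> ERROR transition is modeled here
        }
        // Step 1: try to kill every task that is still running.
        for (Map.Entry<String, TaskState> entry : tasks.entrySet()) {
            if (entry.getValue() == TaskState.RUNNING) {
                entry.setValue(TaskState.KILLED);
                events.add("TASK_KILL " + entry.getKey());
            }
        }
        // Step 2: only then record the unsuccessful completion.
        events.add("JOB_UNSUCCESSFUL_COMPLETION");
        state = JobState.ERROR;
    }

    public JobState getState() { return state; }
    public TaskState getTaskState(String taskId) { return tasks.get(taskId); }
    public List<String> getEvents() { return events; }

    public static void main(String[] args) {
        JobErrorSketch job = new JobErrorSketch();
        job.addTask("m_000000", TaskState.RUNNING);
        job.addTask("m_000001", TaskState.SUCCEEDED);
        job.onInternalError();
        System.out.println(job.getState() + " " + job.getEvents());
    }
}
```

In this sketch the finished task is left alone and only the running one gets a kill request, so the completion event is always the last thing recorded.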

There might also be a separate issue with the NM that explains why it didn't kill the task.  The NM was also unable to connect to the RM, and I saw one of its threads restart.  I'm guessing it lost track of that container when the thread restarted, but I need to investigate further.


                
> map task left hanging after AM dies trying to connect to RM
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-4152
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4152
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>
> We had an instance where the RM went down for more than an hour.  The application master exited with "Could not contact RM after 360000 milliseconds"
> 2012-04-11 10:43:36,040 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1333003059741_15999Job Transitioned from RUNNING to ERROR

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira