Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2015/09/09 22:04:46 UTC

[jira] [Commented] (MAPREDUCE-5003) AM recovery should recreate records for attempts that were incomplete

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737484#comment-14737484 ] 

Jason Lowe commented on MAPREDUCE-5003:
---------------------------------------

Thanks for updating the patch, Chang!  I tried it out with a sleep job where I manually failed and killed various task attempts and then killed the AM attempt so it would recover.  It recovers information for task attempts that had completed, but for task attempts that were active at the time the AM failed (i.e., those with a diagnostic of "Killed during application recovery") the log link is broken and the host info is missing.  That means we can't figure out where those task attempts were running and can't get to their logs.

Other comments on the patch:

completedTasksFromPreviousRun is probably not the best name now that the code places tasks that have not completed in that collection.

Nit: rather than returning from the middle of the TaskImpl.recover function for a running task, I think it would be a bit cleaner to have the code recover a missing taskStatus as taskState=RUNNING and add a RUNNING case to the existing switch statement.
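
To illustrate the shape I have in mind, here is a rough, self-contained sketch. It is not the actual TaskImpl code: TaskRecoverySketch, RecoveredTask, RecoveryAction and recoveredState are hypothetical names made up for this example, and what the RUNNING case should actually do is left to the patch. The only point is that a missing taskStatus is mapped to RUNNING up front so that every case goes through the one switch, with no early return.

```java
// Hypothetical sketch only; none of these types are the real Hadoop classes.
class TaskRecoverySketch {

    // Stand-in for the task states relevant to recovery.
    enum TaskState { RUNNING, SUCCEEDED, FAILED, KILLED }

    // Stand-in label for what recovery decides to do with a task.
    enum RecoveryAction { RESTORE_SUCCEEDED, RESTORE_FAILED, RESTORE_KILLED, HANDLE_INCOMPLETE }

    // Stand-in for the task record read back from the previous AM attempt.
    static final class RecoveredTask {
        final String taskStatus; // null when the task had not finished (assumption)

        RecoveredTask(String taskStatus) {
            this.taskStatus = taskStatus;
        }
    }

    // Treat a missing status as RUNNING instead of special-casing it later.
    static TaskState recoveredState(RecoveredTask info) {
        return info.taskStatus == null
                ? TaskState.RUNNING
                : TaskState.valueOf(info.taskStatus);
    }

    // Single switch with no early return: RUNNING is just another case.
    static RecoveryAction recover(RecoveredTask info) {
        TaskState state = recoveredState(info);
        switch (state) {
            case SUCCEEDED:
                return RecoveryAction.RESTORE_SUCCEEDED;
            case FAILED:
                return RecoveryAction.RESTORE_FAILED;
            case KILLED:
                return RecoveryAction.RESTORE_KILLED;
            case RUNNING:
                // Task was still active when the previous AM attempt died.
                return RecoveryAction.HANDLE_INCOMPLETE;
            default:
                throw new IllegalStateException("Unexpected recovered state " + state);
        }
    }

    public static void main(String[] args) {
        // A record with no status is routed through the RUNNING case.
        System.out.println(recover(new RecoveredTask(null)));
        System.out.println(recover(new RecoveredTask("SUCCEEDED")));
    }
}
```

With this structure the handling for incomplete tasks sits next to the handling for finished ones, which makes it harder to miss a state when the switch changes later.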


> AM recovery should recreate records for attempts that were incomplete
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5003
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5003
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am
>            Reporter: Jason Lowe
>            Assignee: Chang Li
>         Attachments: MAPREDUCE-5003.1.patch, MAPREDUCE-5003.2.patch, MAPREDUCE-5003.3.patch, MAPREDUCE-5003.4.patch, MAPREDUCE-5003.5.patch, MAPREDUCE-5003.5.patch, MAPREDUCE-5003.6.patch
>
>
> As discussed in MAPREDUCE-4992, it would be nice if the AM recovered task attempt entries for *all* task attempts launched by the prior app attempt even if those task attempts did not complete.  The attempts would have to be marked as killed or something similar to indicate they are no longer running.  Having records for the task attempts enables the user to see what nodes were associated with the attempts and potentially access their logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)