You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Sharad Agarwal (JIRA)" <ji...@apache.org> on 2011/07/17 12:19:00 UTC

[jira] [Commented] (MAPREDUCE-2702) OutputCommitter changes for MR Application Master recovery

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066609#comment-13066609 ] 

Sharad Agarwal commented on MAPREDUCE-2702:
-------------------------------------------


1. Job level commit changes:
Currently in FileOutputCommitter:
- tasks write output to <outputpath>/_temporary/<taskid>
- task commit promotes it to <outputpath>/<taskid>

Currently there is no job level commit. The task commit promotes the output directly to the final destination.
This would have problem in the case of recovery. In case of network partitions etc, there is a possibility of more than
one AppMaster running simultaneously for the same job. To avoid them overstepping on each other, we need to make
changes in job level commit as follows: -
[Below <AppAttemptCount> is the count of the application master start]
- tasks write output to <outputpath>/_<AppAttemptCount>/_temporary/<taskid>
- task commit promotes it to <outputpath>/_<AppAttemptCount>/<taskid>
- job commit promotes all tasks from the current AppAttemptCount dir to <outputpath>/<taskid>

2. *New API* required in OutputCommitter:
To recover the *completed* task output from the previous life of appmaster, the output needs to be moved from
<outputpath>/_<AppAttemptCount-1>/<taskid> to <outputpath>/_<AppAttemptCount>/<taskid>. 
For this we will need a new recoverTask api in OutputCommitter.


> OutputCommitter changes for MR Application Master recovery
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-2702
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2702
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>
> In MR AM recovers from a crash, it only reruns the non completed tasks. The completed tasks (along with their output, if any) needs to be recovered from the previous life. This would require some changes in OutputCommitter.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira