Posted to mapreduce-issues@hadoop.apache.org by "Junping Du (JIRA)" <ji...@apache.org> on 2016/04/19 18:47:25 UTC
[jira] [Comment Edited] (MAPREDUCE-6608) Work Preserving AM Restart for MapReduce

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15248151#comment-15248151 ] 

Junping Du edited comment on MAPREDUCE-6608 at 4/19/16 4:46 PM:
----------------------------------------------------------------

[~vinodkv], thanks for the review and comments. I think most of your points here are solid; however, the comment about "Output Commit of previous tasks" is a bit stale.

bq. The new AM needs to make sure that output of previously running containers can be safely committed. IIRC, with today's FileOutputCommitter, new AM will only promote task-outputs that are present in $jobOutput/_temporary/$currentAttemptID/
This was true before MAPREDUCE-4815. However, after MAPREDUCE-4815, most of the work of committing task output to the final job output is handled by {{FileOutputCommitter.commitTask()}} instead of {{FileOutputCommitter.commitJob()}}, so commitJob() is left only with cleaning up $jobOutput/_temporary. So there is nothing we need to do here except make sure "mapreduce.fileoutputcommitter.algorithm.version" is set to 2.
This setting is also assumed by the work in MAPREDUCE-5485, which is a prerequisite for the feature here; otherwise the AM will fail immediately if the previous AM ended while the job was committing.
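
To make the difference concrete, here is a rough sketch of the two commit styles as a toy model on the local filesystem. It is not the actual FileOutputCommitter code: the class name {{CommitAlgorithmSketch}}, the method names, and the use of java.nio temp directories instead of HDFS are all assumptions for illustration. It only models the behaviour described above, i.e. with version 1 the job commit promotes what sits under $jobOutput/_temporary/$currentAttemptID/, while with version 2 the task commit writes straight into $jobOutput and the job commit only cleans up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Toy model only: hypothetical names, local files instead of HDFS.
public class CommitAlgorithmSketch {

    // v1 style: a committed task's file is parked under _temporary/<appAttempt>/.
    static void commitTaskV1(Path jobOutput, int appAttempt, String file, String data) throws IOException {
        Path committed = jobOutput.resolve("_temporary").resolve(String.valueOf(appAttempt));
        Files.createDirectories(committed);
        Files.write(committed.resolve(file), data.getBytes());
    }

    // v1 style: job commit promotes only the CURRENT attempt's directory, then cleans up.
    static void commitJobV1(Path jobOutput, int currentAppAttempt) throws IOException {
        Path committed = jobOutput.resolve("_temporary").resolve(String.valueOf(currentAppAttempt));
        if (Files.isDirectory(committed)) {
            List<Path> files;
            try (Stream<Path> s = Files.list(committed)) {
                files = s.collect(Collectors.toList());
            }
            for (Path f : files) {
                Files.move(f, jobOutput.resolve(f.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        cleanup(jobOutput);
    }

    // v2 style: task commit puts the file directly into the final job output directory.
    static void commitTaskV2(Path jobOutput, String file, String data) throws IOException {
        Files.createDirectories(jobOutput);
        Files.write(jobOutput.resolve(file), data.getBytes());
    }

    // v2 style: job commit has nothing left to promote; it only removes _temporary.
    static void commitJobV2(Path jobOutput) throws IOException {
        cleanup(jobOutput);
    }

    // Recursively delete $jobOutput/_temporary if it exists (children before parents).
    static void cleanup(Path jobOutput) throws IOException {
        Path tmp = jobOutput.resolve("_temporary");
        if (!Files.exists(tmp)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(tmp)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Sorted names of whatever ended up in the job output directory.
    static List<String> listOutput(Path jobOutput) throws IOException {
        try (Stream<Path> s = Files.list(jobOutput)) {
            return s.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path v1 = Files.createTempDirectory("v1-out");
        commitTaskV1(v1, 1, "part-00000", "a"); // committed under AM attempt 1
        commitTaskV1(v1, 2, "part-00001", "b"); // committed under AM attempt 2
        commitJobV1(v1, 2);                     // AM attempt 2 commits the job
        System.out.println("v1: " + listOutput(v1));

        Path v2 = Files.createTempDirectory("v2-out");
        commitTaskV2(v2, "part-00000", "a");    // committed while AM attempt 1 was running
        commitTaskV2(v2, "part-00001", "b");    // committed while AM attempt 2 was running
        commitJobV2(v2);
        System.out.println("v2: " + listOutput(v2));
    }
}
```

In this toy model the v1 run ends with only part-00001 in the output directory, because the file committed under attempt 1 is never promoted, while the v2 run keeps both files. That is the reason nothing extra is needed for previously committed tasks once the algorithm version is 2.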

I am investigating the rest of the issues and will bring some possible proposals later.


bq. I'd suggest spending more time on the design, atleast on some of the areas I pointed above and then create a branch, create sub-tasks, do some prototypes etc.
+1. This feature is turning out to be more work than I expected. I agree we may need a separate branch to develop this in parallel. I will create a branch once we finalize the design work.



was (Author: djp):
[~vinodkv], thanks for the review and comments. I think most of your points here are solid; however, the comment about "Output Commit of previous tasks" is a bit stale.

bq. The new AM needs to make sure that output of previously running containers can be safely committed. IIRC, with today's FileOutputCommitter, new AM will only promote task-outputs that are present in $jobOutput/_temporary/$currentAttemptID/
This was true before MAPREDUCE-4815. However, after MAPREDUCE-4815, most of the work of committing task output to the final job output is handled by {{FileOutputCommitter.commitTask()}} instead of {{FileOutputCommitter.commitJob()}}, so commitJob() is left only with cleaning up $jobOutput/_temporary. So there is nothing we need to do here unless we make sure "mapreduce.fileoutputcommitter.algorithm.version" is set to 2.
This setting is also assumed by the work in MAPREDUCE-5485, which is a prerequisite for the feature here; otherwise the AM will fail immediately if the previous AM ended while the job was committing.

I am investigating the rest of the issues and will propose some possible solutions later.


bq. I'd suggest spending more time on the design, atleast on some of the areas I pointed above and then create a branch, create sub-tasks, do some prototypes etc.
+1. This feature is turning out to be more work than I expected. I agree we may need a separate branch to develop this in parallel. I will create a branch once we finalize the design work.


> Work Preserving AM Restart for MapReduce
> ----------------------------------------
>
>                 Key: MAPREDUCE-6608
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6608
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Srikanth Sampath
>            Assignee: Srikanth Sampath
>         Attachments: Patch1.patch, WorkPreservingMRAppMaster-1.pdf, WorkPreservingMRAppMaster-2.pdf, WorkPreservingMRAppMaster.pdf
>
>
> A framework for work-preserving AM restart is provided by [YARN-1489|https://issues.apache.org/jira/browse/YARN-1489].  We would like to take advantage of this for MapReduce (MR) applications.  There are some challenges, which are described in the attached document along with a few options.  We solicit feedback from the community.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)