Posted to mapreduce-issues@hadoop.apache.org by "Chris Douglas (JIRA)" <ji...@apache.org> on 2013/05/09 11:27:16 UTC

[jira] [Commented] (MAPREDUCE-4584) Umbrella: Preemption and restart of MapReduce tasks

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13652832#comment-13652832 ] 

Chris Douglas commented on MAPREDUCE-4584:
------------------------------------------

[~ozawa]: I've been reading some of the iterations of your patch(es) as you've updated them over the last few months. Our proposals are absolutely complementary. Your approach (IIRC) involved reusing map tasks to aggregate map output on the same host, right? MAPREDUCE-4502 can accomplish more than checkpointing by aggregating across partitions.

We added some metadata to {{IFile}} to track which task attempts a segment contains. I haven't looked at a recent version of your patch, but that's certainly shared functionality.
                
> Umbrella: Preemption and restart of MapReduce tasks
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-4584
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4584
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: applicationmaster, mrv2, performance, resourcemanager, task
>            Reporter: Sriram Rao
>            Assignee: Chris Douglas
>
> This JIRA will track the implementation of improvements to the handling of intermediate data (e.g., map output). Specifically, it tracks changes in support of preempting running tasks, checkpointing completed work, and spawning one or more tasks to complete the original split/partition. These mechanisms allow one to manage skew in intermediate data, respond to resource abundance or scarcity (particularly with preemption), speculatively execute on the remaining work from checkpointed tasks, and automatically tune parameters for performance.
> Iterations will build on learnings from previous work, including the following:
> Technical reports:
> http://research.yahoo.com/files/yl-2012-002.pdf
> http://research.yahoo.com/files/yl-2012-003.pdf
> Source code:
> http://code.google.com/p/sailfish
