Posted to common-user@hadoop.apache.org by Jay Vyas <ja...@gmail.com> on 2012/10/13 04:16:18 UTC

speculative execution before mappers finish

Is it possible for reducers to start (not just copying, but actually)
"reducing" before all mappers are done, speculatively?

In particular, I'm asking because I'm curious about the internals of how
the shuffle and sort might (or might not :)) be able to support this.

Re: speculative execution before mappers finish

Posted by Harsh J <ha...@cloudera.com>.
Think of it in partition terms. If you know that your map splits X, Y
and Z won't emit any key of partition P, then the Pth reducer can jump
ahead and run without waiting for X, Y and Z to complete their processing.
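
The partition argument can be sketched as a small simulation. This is not
Hadoop code: can_start_reducer, the may_emit table and the split names are
hypothetical, and it assumes you somehow know in advance which partitions
each split can emit, which stock MapReduce does not track.

```python
# Toy model, not a Hadoop API: decide whether the reducer for one partition
# may start, given which map splits are done and which partitions each
# split could possibly emit (assumed known up front for illustration).

def can_start_reducer(partition, may_emit, finished):
    """Return True if every split that could emit `partition` has finished."""
    return all(
        split in finished
        for split, partitions in may_emit.items()
        if partition in partitions
    )

# Hypothetical job: splits X, Y and Z never emit keys for partition 2.
may_emit = {
    "W": {0, 1, 2},
    "X": {0, 1},
    "Y": {0},
    "Z": {1},
}
finished = {"W"}  # X, Y and Z are still running

print(can_start_reducer(2, may_emit, finished))  # reducer 2 depends only on W
print(can_start_reducer(0, may_emit, finished))  # reducer 0 still needs X and Y
```

In this model reducer 2 can jump ahead while reducers 0 and 1 must keep
waiting for the splits that may still feed them.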

Otherwise, a reducer can't run until all maps have completed, for fear
of losing keys emitted by the maps it skipped fetching from. To some
this loss may be tolerable, and others would be OK with receiving those
keys later, but that adds complexity when you could just fetch
continuously and wait.
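
To make the lost-keys risk concrete, here is a word-count style sketch in
plain Python. The map outputs and the reduce_counts helper are made up for
illustration; this only mimics what a reducer would see and is not how
Hadoop's shuffle is implemented.

```python
from collections import defaultdict

def reduce_counts(map_outputs):
    """Sum the values per key across the given map outputs."""
    totals = defaultdict(int)
    for output in map_outputs:
        for key, value in output:
            totals[key] += value
    return dict(totals)

# Hypothetical outputs of three map tasks, all for the same partition.
map_a = [("apple", 1), ("pear", 1)]
map_b = [("apple", 1)]
map_c = [("apple", 1), ("plum", 1)]  # slow map, not finished yet

early = reduce_counts([map_a, map_b])        # reducing before map_c is done
full = reduce_counts([map_a, map_b, map_c])  # reducing after all maps finish

print(early)
print(full)
```

The early result undercounts "apple" and never sees "plum", so the reducer
would either emit a wrong answer or need extra machinery to merge the late
records afterwards.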

It should be easy to take the MRv2 application [0] and add such a thing
today, if you need it.

[0] - Given the confusion between what MRv2 and YARN mean individually
(they get mixed up too much), hope this blog post of mine helps:
http://www.cloudera.com/blog/2012/10/mr2-and-yarn-briefly-explained/




-- 
Harsh J