Posted to dev@spark.apache.org by Xuelin Cao <xu...@gmail.com> on 2015/02/03 06:48:53 UTC

Can Spark provide an option to start reduce stage early?

In Hadoop MapReduce, there is an option, *mapred.reduce.slowstart.completed.maps*,
which can be used to start the reduce stage once X% of the mappers have
completed. By doing this, the data shuffling process can run in parallel
with the map process.
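To make the option's semantics concrete, here is a minimal Python sketch of the threshold check it controls. It assumes a simplified model in which the only condition for launching reducers is the fraction of completed map tasks. The function name `reducers_may_start` is hypothetical and not part of Hadoop. The real scheduler also considers resource availability.

```python
def reducers_may_start(completed_maps: int, total_maps: int, slowstart: float) -> bool:
    """Return True once the fraction of finished map tasks reaches `slowstart`.

    Hypothetical, simplified model of the slow-start gate: real Hadoop
    scheduling also weighs resource availability, which is ignored here.
    """
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be a fraction in [0, 1]")
    if total_maps == 0:
        return True  # no map tasks, so nothing to wait for
    return completed_maps / total_maps >= slowstart


# 100 map tasks; reducers may start once 80% of them have finished
print(reducers_may_start(79, 100, 0.80))  # False
print(reducers_may_start(80, 100, 0.80))  # True
# slowstart = 1.0 makes reducers wait for every map task
print(reducers_may_start(99, 100, 1.0))   # False
```

A lower fraction lets shuffling overlap with more of the map phase. The trade-off is that early reducers hold cluster slots while mostly waiting for map output, which is why shared clusters tend to keep the threshold high.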

In a large multi-tenant cluster, this option is usually turned off. But in
some cases, turning the option on could accelerate high-priority jobs.

Will Spark provide a similar option?

Re: Can Spark provide an option to start reduce stage early?

Posted by Kay Ousterhout <ke...@eecs.berkeley.edu>.
There's a JIRA tracking this here:
https://issues.apache.org/jira/browse/SPARK-2387
