You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by 曹雪林 <xu...@gmail.com> on 2015/01/08 03:01:42 UTC

Fwd: When will spark support "push" style shuffle?

Hi,

      I've heard a lot of complain about spark's "pull" style shuffle. Is
there any plan to support "push" style shuffle in the near future?

      Currently, the shuffle phase must be completed before the next stage
starts. While, it is said, in Impala, the shuffled data is "streamed" to
the next stage handler, which greatly saves time. Will spark support this
mechanism one day?

Thanks

Re: When will spark support "push" style shuffle?

Posted by "Xuelin Cao.2015" <xu...@gmail.com>.
Got it. The explain makes sense. Thank you.


On Thu, Jan 8, 2015 at 1:06 PM, Patrick Wendell [via Apache Spark
Developers List] <ml...@n3.nabble.com> wrote:

> This question is conflating a few different concepts. I think the main
> question is whether Spark will have a shuffle implementation that
> streams data rather than persisting it to disk/cache as a buffer.
> Spark currently decouples the shuffle write from the read using
> disk/OS cache as a buffer. The two benefits of this approach this are
> that it allows intra-query fault tolerance and it makes it easier to
> elastically scale and reschedule work within a job. We consider these
> to be design requirements (think about jobs that run for several hours
> on hundreds of machines). Impala, and similar systems like dremel and
> f1, not offer fault tolerance within a query at present. They also
> require gang scheduling the entire set of resources that will exist
> for the duration of a query.
>
> A secondary question is whether our shuffle should have a barrier or
> not. Spark's shuffle currently has a hard barrier between map and
> reduce stages. We haven't seen really strong evidence that removing
> the barrier is a net win. It can help the performance of a single job
> (modestly), but in the a multi-tenant workload, it leads to poor
> utilization since you have a lot of reduce tasks that are taking up
> slots waiting for mappers to finish. Many large scale users of
> Map/Reduce disable this feature in production clusters for that
> reason. Thus, we haven't seen compelling evidence for removing the
> barrier at this point, given the complexity of doing so.
>
> It is possible that future versions of Spark will support push-based
> shuffles, potentially in a mode that remove some of Spark's fault
> tolerance properties. But there are many other things we can still
> optimize about the shuffle that would likely come before this.
>
> - Patrick
>
> On Wed, Jan 7, 2015 at 6:01 PM, 曹雪林 <[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=10029&i=0>> wrote:
>
> > Hi,
> >
> >       I've heard a lot of complain about spark's "pull" style shuffle.
> Is
> > there any plan to support "push" style shuffle in the near future?
> >
> >       Currently, the shuffle phase must be completed before the next
> stage
> > starts. While, it is said, in Impala, the shuffled data is "streamed" to
> > the next stage handler, which greatly saves time. Will spark support
> this
> > mechanism one day?
> >
> > Thanks
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> <http:///user/SendEmail.jtp?type=node&node=10029&i=1>
> For additional commands, e-mail: [hidden email]
> <http:///user/SendEmail.jtp?type=node&node=10029&i=2>
>
>
>
> ------------------------------
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-When-will-spark-support-push-style-shuffle-tp10028p10029.html
>  To start a new topic under Apache Spark Developers List, email
> ml-node+s1001551n1h40@n3.nabble.com
> To unsubscribe from Apache Spark Developers List, click here
> <http://apache-spark-developers-list.1001551.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=1&code=eHVlbGluY2FvMjAxNEBnbWFpbC5jb218MXwtOTc3NDY2MzAy>
> .
> NAML
> <http://apache-spark-developers-list.1001551.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>




--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-When-will-spark-support-push-style-shuffle-tp10028p10031.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: When will spark support "push" style shuffle?

Posted by Patrick Wendell <pw...@gmail.com>.
This question is conflating a few different concepts. I think the main
question is whether Spark will have a shuffle implementation that
streams data rather than persisting it to disk/cache as a buffer.
Spark currently decouples the shuffle write from the read using
disk/OS cache as a buffer. The two benefits of this approach this are
that it allows intra-query fault tolerance and it makes it easier to
elastically scale and reschedule work within a job. We consider these
to be design requirements (think about jobs that run for several hours
on hundreds of machines). Impala, and similar systems like dremel and
f1, not offer fault tolerance within a query at present. They also
require gang scheduling the entire set of resources that will exist
for the duration of a query.

A secondary question is whether our shuffle should have a barrier or
not. Spark's shuffle currently has a hard barrier between map and
reduce stages. We haven't seen really strong evidence that removing
the barrier is a net win. It can help the performance of a single job
(modestly), but in the a multi-tenant workload, it leads to poor
utilization since you have a lot of reduce tasks that are taking up
slots waiting for mappers to finish. Many large scale users of
Map/Reduce disable this feature in production clusters for that
reason. Thus, we haven't seen compelling evidence for removing the
barrier at this point, given the complexity of doing so.

It is possible that future versions of Spark will support push-based
shuffles, potentially in a mode that remove some of Spark's fault
tolerance properties. But there are many other things we can still
optimize about the shuffle that would likely come before this.

- Patrick

On Wed, Jan 7, 2015 at 6:01 PM, 曹雪林 <xu...@gmail.com> wrote:
> Hi,
>
>       I've heard a lot of complain about spark's "pull" style shuffle. Is
> there any plan to support "push" style shuffle in the near future?
>
>       Currently, the shuffle phase must be completed before the next stage
> starts. While, it is said, in Impala, the shuffled data is "streamed" to
> the next stage handler, which greatly saves time. Will spark support this
> mechanism one day?
>
> Thanks

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org