Posted to issues@spark.apache.org by "Michail Giannakopoulos (JIRA)" <ji...@apache.org> on 2018/07/17 16:21:01 UTC

[jira] [Commented] (SPARK-9850) Adaptive execution in Spark

    [ https://issues.apache.org/jira/browse/SPARK-9850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546840#comment-16546840 ] 

Michail Giannakopoulos commented on SPARK-9850:
-----------------------------------------------

Hello [~yhuai]! Is anyone currently working on this Epic? In other words, is this work in progress, or has it been put on hold?
I am asking because I recently logged an issue related to adaptive execution (SPARK-24826). It would be nice to know whether this is being actively worked on, since adaptive execution greatly reduces the number of partitions during shuffles when executing SQL queries (one of the main bottlenecks for Spark). Thanks a lot!
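The shuffle-partition reduction mentioned above is driven by Spark SQL configuration. A minimal sketch, assuming the experimental adaptive-execution settings available in Spark 2.x (property names and defaults may differ in other releases, so treat the values below as illustrative):

```
# spark-defaults.conf (illustrative values, Spark 2.x experimental settings)
# Enable adaptive query execution
spark.sql.adaptive.enabled                             true
# Target size per post-shuffle partition: adjacent small partitions are
# coalesced until each reaches roughly this many bytes (64 MB here)
spark.sql.adaptive.shuffle.targetPostShuffleInputSize  67108864
```

With these set, an exchange can coalesce the default 200 shuffle partitions down to a handful when the shuffled data is small, instead of running many tiny tasks.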

> Adaptive execution in Spark
> ---------------------------
>
>                 Key: SPARK-9850
>                 URL: https://issues.apache.org/jira/browse/SPARK-9850
>             Project: Spark
>          Issue Type: Epic
>          Components: Spark Core, SQL
>            Reporter: Matei Zaharia
>            Assignee: Yin Huai
>            Priority: Major
>         Attachments: AdaptiveExecutionInSpark.pdf
>
>
> Query planning is one of the main factors in high performance, but the current Spark engine requires the execution DAG for a job to be set in advance. Even with cost-based optimization, it is hard to know the behavior of data and user-defined functions well enough to always get great execution plans. This JIRA proposes to add adaptive query execution, so that the engine can change the plan for each query as it sees what data earlier stages produced.
> We propose adding this to Spark SQL / DataFrames first, using a new API in the Spark engine that lets libraries run DAGs adaptively. In future JIRAs, the functionality could be extended to other libraries or the RDD API, but that is more difficult than adding it in SQL.
> I've attached a design doc by Yin Huai and myself explaining how it would work in more detail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org