Posted to issues@spark.apache.org by "Eyal Farago (JIRA)" <ji...@apache.org> on 2019/03/08 19:12:00 UTC

[jira] [Commented] (SPARK-17556) Executor side broadcast for broadcast joins

    [ https://issues.apache.org/jira/browse/SPARK-17556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788206#comment-16788206 ] 

Eyal Farago commented on SPARK-17556:
-------------------------------------

why was this abandoned?

[~viirya]'s pull request seems promising.

I think the last comment by [~LI,Xiao] applies to the current implementation as well: executors hold the entire broadcast anyway (assuming they ran a task that used it), so the memory footprint on the executor side doesn't change. Regarding the performance regression in the case of multiple smaller partitions, this also applies to the current implementation, as the RDD partitions have to be computed and transferred to the driver.

One thing I personally think could be improved in [~viirya]'s PR is the requirement for the RDD to be pre-persisted; I think the blocks could be evaluated in the mapPartitions operation performed in the newly introduced RDD.broadcast method, which would have addressed most of [~holdenk_amp]'s comments on the PR.
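For context, the contrast between the current driver-side path and the executor-side idea above could be sketched roughly as follows (Scala-flavored pseudocode; `broadcastFromExecutors`, `registerBroadcastBlock`, and the handle construction are hypothetical names illustrating the PR discussion, not actual Spark API):

```
// Current path: the small side is collected to the driver, then
// re-broadcast from the driver to every executor (two network hops).
val rows  = smallSideRdd.collect()   // executors -> driver
val bcast = sc.broadcast(rows)       // driver -> executors

// Executor-side idea: evaluate each partition inside mapPartitions and
// register the result as broadcast blocks directly on the executors,
// so the RDD would not need to be pre-persisted first.
def broadcastFromExecutors[T](rdd: RDD[T]): Broadcast[Seq[T]] = {
  rdd.mapPartitions { iter =>
    val block = iter.toSeq               // computed here, on the executor
    registerBroadcastBlock(block)        // hypothetical block-manager call
    Iterator.single(block.size)
  }.count()                              // force evaluation; nothing collected
  newExecutorBroadcastHandle()           // hypothetical handle over the blocks
}
// Other executors would then fetch the blocks peer-to-peer,
// as TorrentBroadcast already does today.
```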

> Executor side broadcast for broadcast joins
> -------------------------------------------
>
>                 Key: SPARK-17556
>                 URL: https://issues.apache.org/jira/browse/SPARK-17556
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core, SQL
>            Reporter: Reynold Xin
>            Priority: Major
>         Attachments: executor broadcast.pdf, executor-side-broadcast.pdf
>
>
> Currently in Spark SQL, in order to perform a broadcast join, the driver must collect the result of an RDD and then broadcast it. This introduces some extra latency. It might be possible to broadcast directly from executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org