Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/12/20 13:03:58 UTC

[jira] [Resolved] (SPARK-18944) Understanding BroadcastNestedLoopJoin and number of partitions

     [ https://issues.apache.org/jira/browse/SPARK-18944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-18944.
-------------------------------
    Resolution: Invalid

Questions belong on user@spark.apache.org

> Understanding BroadcastNestedLoopJoin and number of partitions
> --------------------------------------------------------------
>
>                 Key: SPARK-18944
>                 URL: https://issues.apache.org/jira/browse/SPARK-18944
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 1.6.2, 2.0.2
>         Environment: Spark 1.6.2
>            Reporter: David Hodeffi
>            Priority: Trivial
>              Labels: question
>
> I am joining two DataFrames, one small and one big. The optimizer suggests using BroadcastNestedLoopJoin.
> The big DataFrame has 200 partitions, while the small DataFrame has 5 partitions.
> The joined DataFrame ends up with 205 partitions (joined.rdd.partitions.size). I tried to understand where this number comes from and figured out that BroadcastNestedLoopJoin is actually implemented with a union.
> Code:
> case class BroadcastNestedLoopJoin(...) {
>   def doExecute(): RDD[InternalRow] = {
>     ...
>     ...
>     // The result is the union of two RDDs
>     sparkContext.union(
>       matchedStreamRows,
>       sparkContext.makeRDD(notMatchedBroadcastRows)
>     )
>   }
> }
> Can someone explain what exactly the code of doExecute() does? Can you elaborate on all the null checks and why we can have nulls? Why do we have 205 partitions? A link to a JIRA discussion that explains the code would help.
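
The partition arithmetic in the question can be reproduced outside the join. The following is a minimal sketch, not the code of BroadcastNestedLoopJoin itself: it assumes a local SparkContext, and the object name, method name, sample data and the explicit partition counts (200 and 5, chosen only to mirror the numbers reported above) are all hypothetical. It relies on the fact that SparkContext.union of RDDs that do not share a partitioner keeps every input partition, so the partition counts add up.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object UnionPartitionsSketch {
  // Returns (partitions of first RDD, partitions of second RDD, partitions of their union).
  def partitionCounts(): (Int, Int, Int) = {
    // Assumption: a throwaway local context, used only for this illustration.
    val conf = new SparkConf().setAppName("union-partitions-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical stand-in for the streamed (big) side: 200 partitions.
      val streamed = sc.parallelize(1 to 1000, 200)
      // Hypothetical stand-in for rows collected locally and turned back into an RDD.
      // The partition count of 5 is set explicitly here to mirror the question.
      val fromLocal = sc.makeRDD(Seq(-1, -2, -3), 5)
      // Neither RDD has a partitioner, so the union keeps all partitions of both inputs.
      val unioned = sc.union(streamed, fromLocal)
      (streamed.partitions.length, fromLocal.partitions.length, unioned.partitions.length)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (a, b, u) = partitionCounts()
    println(s"first: $a, second: $b, union: $u")
  }
}
```

Under these assumptions the union reports 200 + 5 = 205 partitions, the same sum observed on the joined DataFrame. Whether the 5 in the original report comes from the small DataFrame or from how makeRDD chooses its number of slices is not something this sketch settles.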



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
