Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2018/10/03 23:37:15 UTC

How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2?

Hi All,

How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2?

Thanks

Re: How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2?

Posted by kathleen li <ka...@gmail.com>.
Not sure what you mean by "raw" Spark SQL, but there is one parameter that determines whether the optimizer chooses a broadcast join automatically:

spark.sql.autoBroadcastJoinThreshold

You can read the Spark documentation on this parameter, and use EXPLAIN to check whether your join is using a broadcast.
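
As a rough sketch of those two steps in plain SQL (the tables orders and countries and their columns are hypothetical, made up for illustration, and the 50 MB value is only an example):

```sql
-- Hypothetical tables: orders (large) and countries (small).
-- The threshold is given in bytes; the default is 10485760 (10 MB),
-- and -1 disables automatic broadcast joins.
SET spark.sql.autoBroadcastJoinThreshold=52428800;

-- Print the physical plan. A broadcast join normally appears
-- as a BroadcastHashJoin node.
EXPLAIN
SELECT o.order_id, c.country_name
FROM orders o
JOIN countries c ON o.country_code = c.country_code;
```

If the plan shows SortMergeJoin instead, the optimizer's size estimate for the smaller side is probably above the threshold, or no usable estimate is available.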

Make sure you gather statistics for the tables, since the optimizer compares its size estimate for each table against the threshold.
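
A minimal sketch of gathering statistics, assuming countries is a hypothetical table registered in the catalog (not a temporary view):

```sql
-- Compute table-level statistics (size in bytes, row count)
-- for the hypothetical table countries.
ANALYZE TABLE countries COMPUTE STATISTICS;

-- Check what was recorded; look for the Statistics entry in the output.
DESCRIBE EXTENDED countries;
```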
 
There is also a broadcast hint. Please be aware that if the table being broadcast to all worker nodes is fairly big, it will not always be a good option.
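
For illustration, the hint could be written like this (orders and countries are hypothetical table names; the hint should reference the alias of the table you want broadcast):

```sql
-- Ask Spark to broadcast the small side (alias c) of the join.
-- BROADCASTJOIN and MAPJOIN are accepted as aliases for BROADCAST.
SELECT /*+ BROADCAST(c) */ o.order_id, c.country_name
FROM orders o
JOIN countries c ON o.country_code = c.country_code;
```

The hint takes precedence over spark.sql.autoBroadcastJoinThreshold, so it is worth confirming with EXPLAIN that the hinted table really is small enough to fit on every executor.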

Kathleen


> On Oct 3, 2018, at 4:37 PM, kant kodali <ka...@gmail.com> wrote:
> 
> Hi All,
> 
> How to do a broadcast join using raw Spark SQL 2.3.1 or 2.3.2? 
> 
> Thanks
> 
