Posted to user@spark.apache.org by Akhilanand <ak...@gmail.com> on 2019/03/05 07:20:04 UTC

Join selection

Hello,

I was going through the SparkStrategies class and found that, by default,
sort merge join is preferred over shuffled hash join. The
preferSortMergeJoin setting has to be explicitly set to false to force
a shuffled hash join.
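
To make the question concrete, here is a rough sketch of the selection rule
as I understand it from reading the source. It is plain Python, not Spark
code. The function name choose_join and its parameters are hypothetical, the
two constants assume the default values of
spark.sql.autoBroadcastJoinThreshold (10 MB) and spark.sql.shuffle.partitions
(200), and the factor of 3 is my reading of the "much smaller" check. It only
covers an equi-join with sortable keys and ignores broadcast joins and join
type restrictions.

```python
# Simplified, hypothetical model of the planner's choice between shuffled
# hash join and sort merge join. This is an illustration, not Spark code.

# Assumed default of spark.sql.autoBroadcastJoinThreshold (10 MB).
AUTO_BROADCAST_THRESHOLD = 10 * 1024 * 1024
# Assumed default of spark.sql.shuffle.partitions.
SHUFFLE_PARTITIONS = 200


def choose_join(build_bytes, stream_bytes, prefer_sort_merge_join=True):
    """Return the join assumed to be picked when the hash map would be
    built on the side whose estimated size is build_bytes."""
    # A single shuffle partition of the build side should fit in memory.
    can_build_local_hash_map = (
        build_bytes < AUTO_BROADCAST_THRESHOLD * SHUFFLE_PARTITIONS
    )
    # The build side should be much smaller than the other side
    # (assumed factor: 3).
    much_smaller = build_bytes * 3 <= stream_bytes

    if (not prefer_sort_merge_join
            and can_build_local_hash_map
            and much_smaller):
        return "shuffled hash join"
    return "sort merge join"


MB = 1024 * 1024

# Default setting: sort merge join is chosen regardless of the sizes.
print(choose_join(100 * MB, 1024 * MB))
# Preference turned off and a small build side: shuffled hash join.
print(choose_join(100 * MB, 1024 * MB, prefer_sort_merge_join=False))
# Preference turned off but the build side is not much smaller.
print(choose_join(800 * MB, 1024 * MB, prefer_sort_merge_join=False))
```

If this reading is right, setting the flag to false does not strictly force a
shuffled hash join. It only allows one when the size checks also pass.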

1) Why is sort merge join preferred over shuffled hash join?
2) Are there any performance implications we need to watch for when we
force shuffled hash joins?
