Posted to user@spark.apache.org by shyla deshpande <de...@gmail.com> on 2017/03/29 06:47:58 UTC

Re: dataframe join questions. Appreciate your input.

On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande <de...@gmail.com>
wrote:

> Following are my questions. Thank you.
>
> 1. When joining DataFrames, is it a good idea to repartition on the key column used in the join, or is
> the optimizer smart enough that I can forget about it?
>
> 2. With RDD joins, wherever possible we call reduceByKey before the join to avoid a big shuffle of data. Do we need
> to do anything similar with DataFrame joins, or is the optimizer smart enough that I can forget about it?
>
>
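Question 1 asks about repartitioning on the join key. In the DataFrame API that would look something like `df.repartition(col("key"))` on each side before calling `join` (`key` is a hypothetical column name here). The sketch below is not Spark code. It is a toy, pure-Python model with made-up helper names, and it only illustrates the idea behind the question: once both inputs are hash-partitioned on the same key into the same number of partitions, every matching pair sits at the same partition index, so the join can run partition by partition. Whether Spark's planner already arranges this for a particular query can be checked by looking for Exchange operators in the output of `df.explain()`.

```python
from collections import defaultdict

# Toy model of a hash-partitioned join; helper names are hypothetical, not Spark's API.
# Python's str hash is randomized per process but stable within one run,
# which is all this sketch needs.

def hash_partition(rows, num_partitions):
    """Assign each (key, value) row to a bucket chosen by hash(key)."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in rows:
        parts[hash(key) % num_partitions].append((key, value))
    return parts

def partitionwise_join(left_parts, right_parts):
    """Inner-join partition i of the left side with partition i of the right.
    Correct only because both sides used the same key and partition count."""
    out = []
    for left, right in zip(left_parts, right_parts):
        lookup = defaultdict(list)
        for key, rval in right:
            lookup[key].append(rval)
        for key, lval in left:
            for rval in lookup.get(key, []):
                out.append((key, lval, rval))
    return out

orders = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
customers = [("a", "Alice"), ("b", "Bob"), ("d", "Dan")]

num_partitions = 4
joined = partitionwise_join(hash_partition(orders, num_partitions),
                            hash_partition(customers, num_partitions))
print(sorted(joined))  # [('a', 1, 'Alice'), ('a', 3, 'Alice'), ('b', 2, 'Bob')]
```

The partition-wise result matches a plain nested-loop join over the unpartitioned data. The saving in a real cluster comes from not having to move rows again once they are already co-located.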
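Question 2 refers to the RDD habit of calling `reduceByKey` before a join. `reduceByKey` combines values per key inside each partition before the shuffle, so fewer records cross the network than with `groupByKey`. The toy model below (plain Python, hypothetical data and helper names, not Spark's API) counts the records that would be shuffled with and without that map-side combine, then joins the aggregated result against a small lookup table. The DataFrame analogue of the pattern would be a `groupBy(...).agg(...)` before the `join`. This sketch makes no claim about what Spark's optimizer does on its own.

```python
from collections import defaultdict

# Toy model only: hypothetical data and helper names, not Spark's API.
# Each inner list stands for one input partition before the shuffle.
partitions = [
    [("a", 1), ("a", 2), ("b", 5), ("a", 4)],
    [("b", 1), ("a", 1), ("b", 2)],
]

def shuffle_all(parts):
    """No map-side combine (groupByKey-style): every record is shuffled."""
    return [record for part in parts for record in part]

def shuffle_combined(parts):
    """Map-side combine (reduceByKey-style): sum per key inside each
    partition first, so at most one record per key per partition is shuffled."""
    sent = []
    for part in parts:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        sent.extend(local.items())
    return sent

def sum_by_key(records):
    """Final reduce after the shuffle."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)

raw = shuffle_all(partitions)
combined = shuffle_combined(partitions)
print(len(raw), len(combined))  # 7 4

# Join the smaller, already-aggregated result with a lookup table.
names = {"a": "Alice", "b": "Bob"}
totals = sum_by_key(combined)
joined = [(key, names[key], total) for key, total in sorted(totals.items())]
print(joined)  # [('a', 'Alice', 8), ('b', 'Bob', 8)]
```

Both paths produce the same totals. Only the number of records shuffled differs, and the join then runs over one row per key instead of every raw row.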